python - Comparing and replacing values inside DataFrames -


i have 2 lists, 1 being main list used "key" , other 1 being updated due missing information.

main_df:

+---------+--------+--------+--------+--------+ | id      | value1 | value2 | value3 | value4 | +=========+========+========+========+========+ | 9845213 | 1      | 11     |      | aa     | +---------+--------+--------+--------+--------+ | 545167  | 2      | 22     | b      | bb     | +---------+--------+--------+--------+--------+ | 132498  | 3      | 33     | c      | cc     | +---------+--------+--------+--------+--------+ | 89465   | 4      | 44     | d      | dd     | +---------+--------+--------+--------+--------+ | 871564  | 5      | 55     | e      | ee     | +---------+--------+--------+--------+--------+ | 646879  | 6      | 66     | f      | ff     | +---------+--------+--------+--------+--------+ ... 

data_df:

+----------+--------+--------+--------+--------+--------+ | id       | value1 | value2 | value3 | value4 | value5 | +==========+========+========+========+========+========+ | 4968712  | nan    | nan    |      | aa     | a1     | +----------+--------+--------+--------+--------+--------+ | 21347987 | 2      | 22     | b      | bb     | b2     | +----------+--------+--------+--------+--------+--------+ | 4168512  | nan    | nan    | c      | cc     | c3     | +----------+--------+--------+--------+--------+--------+ | 31468612 | 4      | 44     | d      | dd     | d4     | +----------+--------+--------+--------+--------+--------+ | 9543213  | 5      | 55     | e      | ee     | e5     | +----------+--------+--------+--------+--------+--------+ | 324798   | nan    | nan    | f      | ff     | f6     | +----------+--------+--------+--------+--------+--------+ 

what i'm trying use value3 , value4 main_df in order update values1 , values2 in data_df.

none of merge, join, , concatenate work me since need keep 2 files separate.

i tried using working missing data , .replace() i'm not sure how extract values needed main_df replace nan values in data_df.

try following code uses update() function.

import numpy np import pandas pd  main_df = pd.read_csv('/home/jian/downloads/main.txt', sep='|') data_df = pd.read_csv('/home/jian/downloads/data.csv')  out[229]:        id      lat     long                        city state                 time 0  12345      nan      nan           cape hinchinbrook    ak  2015-06-27 21:03:19 1  12346      nan      nan              delenia island    ak  2015-06-27 21:03:19 2  12347  29.7401 -95.4636                     houston    tx  2015-06-27 21:03:19 3  12348  41.7132 -83.7032                    sylvania    oh  2015-06-27 21:03:19 4  12349      nan      nan                  alaskaland    ak  2015-06-27 21:03:19 5  12350      nan      nan  badger road baptist church    ak  2015-06-27 21:03:19  main_df_part = main_df[['prim_lat_dec', 'prim_long_dec','feature_name', 'state_alpha']] main_df_part.columns = ['lat', 'long', 'city', 'state'] main_df_part = main_df_part.set_index(['city', 'state'])  out[230]:                                        lat      long city                       state                    pacific ocean              ca     39.3103 -123.8447 cape hinchinbrook          ak     60.2347 -146.6417 delenia island             ak     60.3394 -148.1383 alaskaland                 ak     64.8394 -147.7700 badger road baptist church ak     64.8167 -147.5661 barnes creek               ak     65.0014 -147.2939 barnette magnet school     ak     64.8383 -147.7300 bentley park               ak     64.8364 -147.6942  data_df = data_df.set_index(['city', 'state'])  out[233]:                                       id      lat     long                 time city                       state                                               cape hinchinbrook          ak     12345      nan      nan  2015-06-27 21:03:19 delenia island             ak     12346      nan      nan  2015-06-27 21:03:19 houston                    tx     12347  29.7401 -95.4636  2015-06-27 21:03:19 sylvania                   oh     12348  41.7132 -83.7032  2015-06-27 21:03:19 alaskaland                 ak     12349      nan      nan  2015-06-27 21:03:19 badger road baptist church ak     12350      nan      nan  2015-06-27 21:03:19   data_df.update(main_df_part)  out[235]:                                       id      lat      long                 time city                       state                                                cape hinchinbrook          ak     12345  60.2347 -146.6417  2015-06-27 21:03:19 delenia island             ak     12346  60.3394 -148.1383  2015-06-27 21:03:19 houston                    tx     12347  29.7401  -95.4636  2015-06-27 21:03:19 sylvania                   oh     12348  41.7132  -83.7032  2015-06-27 21:03:19 alaskaland                 ak     12349  64.8394 -147.7700  2015-06-27 21:03:19 badger road baptist church ak     12350  64.8167 -147.5661  2015-06-27 21:03:19 

Comments

Popular posts from this blog

python - No exponential form of the z-axis in matplotlib-3D-plots -

php - Best Light server (Linux + Web server + Database) for Raspberry Pi -

c# - "Newtonsoft.Json.JsonSerializationException unable to find constructor to use for types" error when deserializing class -