python - Comparing and replacing values inside DataFrames -
i have 2 lists, 1 being main list used "key" , other 1 being updated due missing information.
main_df:
+---------+--------+--------+--------+--------+ | id | value1 | value2 | value3 | value4 | +=========+========+========+========+========+ | 9845213 | 1 | 11 | | aa | +---------+--------+--------+--------+--------+ | 545167 | 2 | 22 | b | bb | +---------+--------+--------+--------+--------+ | 132498 | 3 | 33 | c | cc | +---------+--------+--------+--------+--------+ | 89465 | 4 | 44 | d | dd | +---------+--------+--------+--------+--------+ | 871564 | 5 | 55 | e | ee | +---------+--------+--------+--------+--------+ | 646879 | 6 | 66 | f | ff | +---------+--------+--------+--------+--------+ ...
data_df:
+----------+--------+--------+--------+--------+--------+ | id | value1 | value2 | value3 | value4 | value5 | +==========+========+========+========+========+========+ | 4968712 | nan | nan | | aa | a1 | +----------+--------+--------+--------+--------+--------+ | 21347987 | 2 | 22 | b | bb | b2 | +----------+--------+--------+--------+--------+--------+ | 4168512 | nan | nan | c | cc | c3 | +----------+--------+--------+--------+--------+--------+ | 31468612 | 4 | 44 | d | dd | d4 | +----------+--------+--------+--------+--------+--------+ | 9543213 | 5 | 55 | e | ee | e5 | +----------+--------+--------+--------+--------+--------+ | 324798 | nan | nan | f | ff | f6 | +----------+--------+--------+--------+--------+--------+
what i'm trying use value3
, value4
main_df
in order update values1
, values2
in data_df
.
none of merge, join, , concatenate work me since need keep 2 files separate.
i tried using working missing data , .replace()
i'm not sure how extract values needed main_df
replace nan
values in data_df
.
try following code uses update()
function.
import numpy np import pandas pd main_df = pd.read_csv('/home/jian/downloads/main.txt', sep='|') data_df = pd.read_csv('/home/jian/downloads/data.csv') out[229]: id lat long city state time 0 12345 nan nan cape hinchinbrook ak 2015-06-27 21:03:19 1 12346 nan nan delenia island ak 2015-06-27 21:03:19 2 12347 29.7401 -95.4636 houston tx 2015-06-27 21:03:19 3 12348 41.7132 -83.7032 sylvania oh 2015-06-27 21:03:19 4 12349 nan nan alaskaland ak 2015-06-27 21:03:19 5 12350 nan nan badger road baptist church ak 2015-06-27 21:03:19 main_df_part = main_df[['prim_lat_dec', 'prim_long_dec','feature_name', 'state_alpha']] main_df_part.columns = ['lat', 'long', 'city', 'state'] main_df_part = main_df_part.set_index(['city', 'state']) out[230]: lat long city state pacific ocean ca 39.3103 -123.8447 cape hinchinbrook ak 60.2347 -146.6417 delenia island ak 60.3394 -148.1383 alaskaland ak 64.8394 -147.7700 badger road baptist church ak 64.8167 -147.5661 barnes creek ak 65.0014 -147.2939 barnette magnet school ak 64.8383 -147.7300 bentley park ak 64.8364 -147.6942 data_df = data_df.set_index(['city', 'state']) out[233]: id lat long time city state cape hinchinbrook ak 12345 nan nan 2015-06-27 21:03:19 delenia island ak 12346 nan nan 2015-06-27 21:03:19 houston tx 12347 29.7401 -95.4636 2015-06-27 21:03:19 sylvania oh 12348 41.7132 -83.7032 2015-06-27 21:03:19 alaskaland ak 12349 nan nan 2015-06-27 21:03:19 badger road baptist church ak 12350 nan nan 2015-06-27 21:03:19 data_df.update(main_df_part) out[235]: id lat long time city state cape hinchinbrook ak 12345 60.2347 -146.6417 2015-06-27 21:03:19 delenia island ak 12346 60.3394 -148.1383 2015-06-27 21:03:19 houston tx 12347 29.7401 -95.4636 2015-06-27 21:03:19 sylvania oh 12348 41.7132 -83.7032 2015-06-27 21:03:19 alaskaland ak 12349 64.8394 -147.7700 2015-06-27 21:03:19 badger road baptist church ak 12350 64.8167 -147.5661 2015-06-27 21:03:19
Comments
Post a Comment