Python: Pandas generating downward-filling variables in a DataFrame
I have the following DataFrame df:

                s
    2011-01-26  1
    2011-01-27  0
    2011-01-28  0
    2011-01-29  0
    2011-01-30  0
    2011-01-31  0
    2011-02-01  0
    2011-02-02  0
    2011-02-03  0
    2011-02-04  0
    2011-02-05  0
    2011-02-06  0
    2011-02-07  0
    2011-02-08  0
    2011-02-09  0
I am trying to generate the following DataFrame df:

                s  s1  s2  s3
    2011-01-26  1   0   0   0
    2011-01-27  0   1   0   0
    2011-01-28  0   1   0   0
    2011-01-29  0   0   1   0
    2011-01-30  0   0   1   0
    2011-01-31  0   0   1   0
    2011-02-01  0   0   1   0
    2011-02-02  0   0   0   1
    2011-02-03  0   0   0   1
    2011-02-04  0   0   0   1
    2011-02-05  0   0   0   1
    2011-02-06  0   0   0   1
    2011-02-07  0   0   0   1
    2011-02-08  0   0   0   1
    2011-02-09  0   0   0   1
As you can see, the number of 1s doubles from each column to the next going downward (1 in s, 2 in s1, 4 in s2, 8 in s3). Is there a pandas function, like fillna, that lets me specify filling downward for x rows?
Update: in fact, I have a more complicated task. If df is:

                s
    2011-01-26  1
    2011-01-27  0
    2011-01-28  0
    2011-01-29  0
    2011-01-30  0
    2011-01-31  0
    2011-02-01  0
    2011-02-02  0
    2011-02-03  0
    2011-02-04  0
    2011-02-05  0
    2011-02-06  0
    2011-02-07  0
    2011-02-08  0
    2011-02-09  0
    ... (all zeros)
                s
    2011-04-26  1
    2011-04-27  0
    2011-04-28  0
    2011-04-29  0
    2011-04-30  0
    2011-04-31  0
    2011-05-01  0
    2011-05-02  0
    2011-05-03  0
    2011-05-04  0
    2011-05-05  0
    2011-05-06  0
    2011-05-07  0
    2011-05-08  0
    2011-05-09  0
and I need this:

                s  s1  s2  s3
    2011-01-26  1   0   0   0
    2011-01-27  0   1   0   0
    2011-01-28  0   1   0   0
    2011-01-29  0   0   1   0
    2011-01-30  0   0   1   0
    2011-01-31  0   0   1   0
    2011-02-01  0   0   1   0
    2011-02-02  0   0   0   1
    2011-02-03  0   0   0   1
    2011-02-04  0   0   0   1
    2011-02-05  0   0   0   1
    2011-02-06  0   0   0   1
    2011-02-07  0   0   0   1
    2011-02-08  0   0   0   1
    2011-02-09  0   0   0   1
    ... (zeros everywhere)
                s  s1  s2  s3
    2011-04-26  1   0   0   0
    2011-04-27  0   1   0   0
    2011-04-28  0   1   0   0
    2011-04-29  0   0   1   0
    2011-04-30  0   0   1   0
    2011-04-31  0   0   1   0
    2011-05-01  0   0   1   0
    2011-05-02  0   0   0   1
    2011-05-03  0   0   0   1
    2011-05-04  0   0   0   1
    2011-05-05  0   0   0   1
    2011-05-06  0   0   0   1
    2011-05-07  0   0   0   1
    2011-05-08  0   0   0   1
    2011-05-09  0   0   0   1
To the best of my knowledge, there is no ready-made function for this. You can use the following trick to do something similar.
    import pandas as pd
    import numpy as np

    # data
    # ========================================
    df = pd.DataFrame(0, index=pd.date_range('2015-01-01', periods=100, freq='D'), columns=['col'])
    df.iloc[[0, 71], 0] = 1

    # each 1 starts a new group; group_keys=False keeps the original date index after apply
    grouped = df.groupby(df.col.cumsum(), group_keys=False)

    grouped.get_group(1)

    Out[275]:
                col
    2015-01-01    1
    2015-01-02    0
    2015-01-03    0
    2015-01-04    0
    2015-01-05    0
    2015-01-06    0
    2015-01-07    0
    2015-01-08    0
    ...         ...
    2015-03-05    0
    2015-03-06    0
    2015-03-07    0
    2015-03-08    0
    2015-03-09    0
    2015-03-10    0
    2015-03-11    0
    2015-03-12    0

    [71 rows x 1 columns]

    grouped.get_group(2)

    Out[276]:
                col
    2015-03-13    1
    2015-03-14    0
    2015-03-15    0
    2015-03-16    0
    2015-03-17    0
    2015-03-18    0
    2015-03-19    0
    2015-03-20    0
    ...         ...
    2015-04-03    0
    2015-04-04    0
    2015-04-05    0
    2015-04-06    0
    2015-04-07    0
    2015-04-08    0
    2015-04-09    0
    2015-04-10    0

    [29 rows x 1 columns]

    # processing
    # ==================================
    def func(group):
        group = group.copy()  # work on a copy so the assignments below are not chained
        group['temp'] = 0
        # positions 0, 1, 3, 7, ... (2**k - 1) are where each new block of 1s starts
        starts = 2 ** np.arange(int(np.log2(len(group))) + 1) - 1
        group.iloc[starts, group.columns.get_loc('temp')] = 1
        # the running sum labels the blocks 1, 2, 3, ... with lengths 1, 2, 4, ...
        group['new_col'] = group['temp'].cumsum()
        return pd.get_dummies(group['new_col'])

    grouped.apply(func)

    Out[281]:
                1  2  3  4  5    6    7
    2015-01-01  1  0  0  0  0    0    0
    2015-01-02  0  1  0  0  0    0    0
    2015-01-03  0  1  0  0  0    0    0
    2015-01-04  0  0  1  0  0    0    0
    2015-01-05  0  0  1  0  0    0    0
    2015-01-06  0  0  1  0  0    0    0
    2015-01-07  0  0  1  0  0    0    0
    2015-01-08  0  0  0  1  0    0    0
    ...        .. .. .. .. ..   ..   ..
    2015-04-03  0  0  0  0  1  NaN  NaN
    2015-04-04  0  0  0  0  1  NaN  NaN
    2015-04-05  0  0  0  0  1  NaN  NaN
    2015-04-06  0  0  0  0  1  NaN  NaN
    2015-04-07  0  0  0  0  1  NaN  NaN
    2015-04-08  0  0  0  0  1  NaN  NaN
    2015-04-09  0  0  0  0  1  NaN  NaN
    2015-04-10  0  0  0  0  1  NaN  NaN
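As a variation on the trick above, here is a minimal vectorized sketch that skips apply and get_dummies and builds the indicator columns directly, so shorter groups get 0 instead of NaN in the trailing columns. The function name doubling_dummies and the column names s, s1, s2, ... are illustrative assumptions chosen to match the question's desired output, not part of pandas. It assumes the input Series holds only 0s and 1s and contains at least one 1.

```python
import numpy as np
import pandas as pd


def doubling_dummies(s):
    """Indicator columns whose run lengths are 1, 2, 4, ... after each 1 in s.

    Hypothetical helper: assumes s contains only 0s and 1s, with at least one 1.
    """
    run_id = s.cumsum().to_numpy()                 # 0 before the first 1, then 1, 2, ...
    pos = s.groupby(s.cumsum()).cumcount().to_numpy()  # 0-based position inside each run
    started = run_id > 0                           # rows at or after the first 1
    n_cols = int(pos[started].max() + 1).bit_length()
    starts = 2 ** np.arange(n_cols) - 1            # block start positions: 0, 1, 3, 7, ...
    block = np.searchsorted(starts, pos, side="right") - 1
    block[~started] = -1                           # rows before the first 1 stay all zero
    dummies = (block[:, None] == np.arange(n_cols)).astype(int)
    names = ["s"] + [f"s{k}" for k in range(1, n_cols)]
    return pd.DataFrame(dummies, index=s.index, columns=names)


# the 15-row example from the question
idx = pd.date_range("2011-01-26", periods=15, freq="D")
s = pd.Series(0, index=idx)
s.iloc[0] = 1
result = doubling_dummies(s)
print(result)
```

Because the position counter restarts at every 1, the same call should also cover the updated question, where the pattern repeats after each new 1 in the series.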