2016-10-19 47 views
5

I am trying to convert daily data into fiscal quarter data. How can dates be grouped in a pandas DataFrame?

Company Period Quarter_End 
M  2016Q1 05/02/2015 
M  2016Q2 08/01/2015 
M  2016Q3 10/31/2015 
M  2016Q4 01/30/2016 
WFM  2015Q2 04/12/2015 
WFM  2015Q3 07/05/2015 
WFM  2015Q4 09/27/2015 
WFM  2016Q1 01/17/2016 

For example, the table above lists the fiscal quarter end dates, and I also have a table of daily data:

Company Date  Price 
M  06/20/2015 1.05 
M  06/22/2015 4.05 
M  07/10/2015 3.45 
M  07/29/2015 1.86 
M  08/24/2015 1.58 
M  09/02/2015 8.64 
M  09/22/2015 2.56 
M  10/20/2015 5.42 
M  11/02/2015 1.58 
M  11/24/2015 4.58 
M  12/03/2015 6.48 
M  12/05/2015 4.56 
M  01/03/2016 7.14 
M  01/30/2016 6.34 
WFM  06/20/2015 1.05 
WFM  06/22/2015 4.05 
WFM  07/10/2015 3.45 
WFM  07/29/2015 1.86 
WFM  08/24/2015 1.58 
WFM  09/02/2015 8.64 
WFM  09/22/2015 2.56 
WFM  10/20/2015 5.42 
WFM  11/02/2015 1.58 
WFM  11/24/2015 4.58 
WFM  12/03/2015 6.48 
WFM  12/05/2015 4.56 
WFM  01/03/2016 7.14 
WFM  01/17/2016 6.34 

I want to create the table below.

Company Period Quarter_end Sum(Price) 
M  2016Q2 8/1/2015 10.41 
M  2016Q3 10/31/2015 18.2 
M  2016Q4 1/30/2016 30.68 
WFM  2015Q3 7/5/2015 5.1 
WFM  2015Q4 9/27/2015 18.09 
WFM  2016Q1 1/17/2016 36.1 

However, I don't know how to group by these varying date ranges without looping through each record. Any help is greatly appreciated.

Thanks!

Answers

6

I think you can use merge_ordered:

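import pandas as pd

# df1 is the quarter-end table and df2 is the daily price table.
# First convert the date columns to datetime (the sample dates are month/day/year).
df1['Quarter_End'] = pd.to_datetime(df1['Quarter_End'], format='%m/%d/%Y')
df2['Date'] = pd.to_datetime(df2['Date'], format='%m/%d/%Y')

# Ordered outer merge: each quarter-end row lands after the daily rows of its quarter
df = pd.merge_ordered(df1,
                      df2,
                      left_on=['Company', 'Quarter_End'],
                      right_on=['Company', 'Date'],
                      how='outer')
print(df)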
    Company Period Quarter_End  Date Price 
0  M 2016Q1 2015-05-02  NaT NaN 
1  M  NaN   NaT 2015-06-20 1.05 
2  M  NaN   NaT 2015-06-22 4.05 
3  M  NaN   NaT 2015-07-10 3.45 
4  M  NaN   NaT 2015-07-29 1.86 
5  M 2016Q2 2015-08-01  NaT NaN 
6  M  NaN   NaT 2015-08-24 1.58 
7  M  NaN   NaT 2015-09-02 8.64 
8  M  NaN   NaT 2015-09-22 2.56 
9  M  NaN   NaT 2015-10-20 5.42 
10  M 2016Q3 2015-10-31  NaT NaN 
11  M  NaN   NaT 2015-11-02 1.58 
12  M  NaN   NaT 2015-11-24 4.58 
13  M  NaN   NaT 2015-12-03 6.48 
14  M  NaN   NaT 2015-12-05 4.56 
15  M  NaN   NaT 2016-01-03 7.14 
16  M 2016Q4 2016-01-30 2016-01-30 6.34 
17  WFM 2015Q2 2015-04-12  NaT NaN 
18  WFM  NaN   NaT 2015-06-20 1.05 
19  WFM  NaN   NaT 2015-06-22 4.05 
20  WFM 2015Q3 2015-07-05  NaT NaN 
21  WFM  NaN   NaT 2015-07-10 3.45 
22  WFM  NaN   NaT 2015-07-29 1.86 
23  WFM  NaN   NaT 2015-08-24 1.58 
24  WFM  NaN   NaT 2015-09-02 8.64 
25  WFM  NaN   NaT 2015-09-22 2.56 
26  WFM 2015Q4 2015-09-27  NaT NaN 
27  WFM  NaN   NaT 2015-10-20 5.42 
28  WFM  NaN   NaT 2015-11-02 1.58 
29  WFM  NaN   NaT 2015-11-24 4.58 
30  WFM  NaN   NaT 2015-12-03 6.48 
31  WFM  NaN   NaT 2015-12-05 4.56 
32  WFM  NaN   NaT 2016-01-03 7.14 
33  WFM 2016Q1 2016-01-17 2016-01-17 6.34 

Then fill the NaN values in the Period and Quarter_End columns with bfill and aggregate with sum. To remove the groups that contain no prices, add Series.dropna, and finish with reset_index:

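# Back-fill so each daily row takes the period of the next quarter-end row
df['Period'] = df['Period'].bfill()
df['Quarter_End'] = df['Quarter_End'].bfill()

# min_count=1 keeps price-less groups as NaN (newer pandas would sum them to 0)
print(df.groupby(['Company', 'Period', 'Quarter_End'])['Price']
        .sum(min_count=1).dropna().reset_index())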

    Company Period Quarter_End Price 
0  M 2016Q2 2015-08-01 10.41 
1  M 2016Q3 2015-10-31 18.20 
2  M 2016Q4 2016-01-30 30.68 
3  WFM 2015Q3 2015-07-05 5.10 
4  WFM 2015Q4 2015-09-27 18.09 
5  WFM 2016Q1 2016-01-17 36.10 
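
The back-fill and aggregation step can be tried in isolation. The sketch below uses a hypothetical toy frame (`toy`, not data from the question) shaped like the merged result. It back-fills per company, which is an assumption beyond the answer above: a plain bfill could pull the next company's period into trailing daily rows that fall after a company's last quarter end.

```python
import pandas as pd

# Hypothetical toy frame shaped like the merged result: quarter-end rows
# carry a Period but no Price, daily rows carry a Price but no Period.
toy = pd.DataFrame({
    "Company": ["M", "M", "M", "M", "WFM", "WFM"],
    "Period": ["2016Q1", None, None, "2016Q2", None, "2015Q3"],
    "Price": [None, 1.0, 2.0, None, 4.0, None],
})

# Back-fill within each company so periods never leak across companies.
toy["Period"] = toy.groupby("Company")["Period"].bfill()

# min_count=1 leaves a group without any price as NaN, so dropna() removes it.
totals = (
    toy.groupby(["Company", "Period"])["Price"]
    .sum(min_count=1)
    .dropna()
    .reset_index()
)
print(totals)
```

Here the M/2016Q1 group has no prices at all, so it is the one that dropna removes.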
4
  • set_index
  • pd.concat to align the indexes
  • groupby with agg

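# Index both tables by company and date (dates assumed already converted to datetime)
prd_df = period_df.set_index(['Company', 'Quarter_End'])

prc_df = price_df.set_index(['Company', 'Date'], drop=False)

# Align on the shared index; sort so bfill runs in date order within each company
df = pd.concat([prd_df, prc_df], axis=1).sort_index()

(df.groupby([df.index.get_level_values(0), df.Period.bfill()])
   .agg(dict(Date='last', Price='sum')).dropna())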
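
As a possible alternative that neither answer above uses, pd.merge_asof with direction='forward' can attach to each daily row the first quarter end on or after its date, per company. The following is only a sketch under that assumption. The names `quarters`, `daily`, `matched` and `result` are made up for illustration, and the data is retyped from the question.

```python
import pandas as pd

quarters = pd.DataFrame({
    "Company": ["M"] * 4 + ["WFM"] * 4,
    "Period": ["2016Q1", "2016Q2", "2016Q3", "2016Q4",
               "2015Q2", "2015Q3", "2015Q4", "2016Q1"],
    "Quarter_End": ["05/02/2015", "08/01/2015", "10/31/2015", "01/30/2016",
                    "04/12/2015", "07/05/2015", "09/27/2015", "01/17/2016"],
})

# Both companies share the first 13 dates and prices in the question's sample.
dates = ["06/20/2015", "06/22/2015", "07/10/2015", "07/29/2015", "08/24/2015",
         "09/02/2015", "09/22/2015", "10/20/2015", "11/02/2015", "11/24/2015",
         "12/03/2015", "12/05/2015", "01/03/2016"]
prices = [1.05, 4.05, 3.45, 1.86, 1.58, 8.64, 2.56,
          5.42, 1.58, 4.58, 6.48, 4.56, 7.14]
daily = pd.DataFrame({
    "Company": ["M"] * 14 + ["WFM"] * 14,
    "Date": dates + ["01/30/2016"] + dates + ["01/17/2016"],
    "Price": prices + [6.34] + prices + [6.34],
})

quarters["Quarter_End"] = pd.to_datetime(quarters["Quarter_End"], format="%m/%d/%Y")
daily["Date"] = pd.to_datetime(daily["Date"], format="%m/%d/%Y")

# merge_asof needs both frames sorted by their date key. direction="forward"
# picks, within each company, the first Quarter_End on or after the Date.
matched = pd.merge_asof(
    daily.sort_values("Date"),
    quarters.sort_values("Quarter_End"),
    left_on="Date",
    right_on="Quarter_End",
    by="Company",
    direction="forward",
)

# Daily rows after a company's last quarter end would get a missing Period
# and be left out by groupby; none occur in this sample.
result = (
    matched.groupby(["Company", "Period", "Quarter_End"])["Price"]
    .sum()
    .reset_index()
)
print(result)
```

Because the match is made per company through the by argument, no back-fill step is needed and periods cannot spill from one company into another.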
