2016-03-28 13 views
1

Pandalarla önceden işlem yapmak istediğim bir veri kümem var. Bu, o veri kümesinin iki satır bir örnek:Csv dosyasını pandalar ile yeniden biçimlendirme (değerleri kümeleri sütunlara dönüştür, başka bir grubu eşleşen değerlerle eşleştir)

| text | rank | date  | provinces.0 | provinces.1  | provinces.2  | provinces.3 | provinces.4 | provinces.5 | provinces.6    | provinces.7    | provinces.8 | provinces.9   | provinces.10 | provinceFrequency.0 | provinceFrequency.1 | provinceFrequency.2 | provinceFrequency.3 | provinceFrequency.4 | provinceFrequency.5 | provinceFrequency.6 | provinceFrequency.7 | provinceFrequency.8 | provinceFrequency.9 | provinceFrequency.10 | | 
|--------|------|-----------|-------------|------------------|------------------|-------------|-------------|---------------|---------------------------|---------------------------|--------------|----------------------|--------------|---------------------|---------------------|---------------------|---------------------|---------------------|---------------------|---------------------|---------------------|---------------------|---------------------|----------------------|---| 
| Topic1 | 0 | 9/13/2015 | Ontario  | Quebec   | British Columbia | Alberta  | Manitoba | Saskatchewan | Newfoundland and Labrador |       |    |      |    | 192     | 378     | 8     | 10     | 1     | 1     | 1     |      |      |      |      | | 
| Topic2 | 1 | 9/13/2015 | Manitoba | British Columbia | Alberta   | Ontario  | Nova Scotia | New Brunswick | Quebec     | Newfoundland and Labrador | Saskatchewan | Prince Edward Island | Nunavut  | 7     | 61     | 51     | 112     | 7     | 8     | 11     | 2     | 2     | 1     | 2     | | 
|  |  |   |    |     |     |    |    |    |       |       |    |      |    |      |      |      |      |      |      |      |      |      |      |      | | 

farklı ilde Kanada'da tweeter konuyu gidişli, 'provinces.0' ile kolonlar ilin adını ve 'provinceFrequency.0' işaret içeren bu veri kümesi sayısını gösterir Bu ilin bu trend konusu için isabetlerin listesi.

bu biçime dönüştürmek istediğiniz

:

| Topic | Date  | Ontario | Quebec | Nova Scotia | New Brunswick | Manitoba | British Columbia | Prince Edward Island | Saskatchewan | Alberta | Newfoundland and Labrador | 
|--------|-----------|---------|--------|-------------|---------------|----------|------------------|----------------------|--------------|---------|---------------------------| 
| Topic1 | 9/13/2015 | 192  | 378 | -   | -    | 1  | 8    | -     | 1   | 10  | 1       | 

Ben yerli Python kodu ve kod dolu yapabilir, pandalar bunu yapmak için herhangi bir kolay yolu var mı?

+1

@moderators, IMO bu soru oldukça zorlu, bu yüzden SO – MaxU

+0

'da __keep__ oylamasına oy verir. Hedefe kaynak eşleştirmeleri sağladıysanız yardımcı olur. Aksi halde okuyucular çok yatay olarak kaydırmalı ve bu eşlemeleri türetmelidir. – J0e3gan

cevap

2

Bu görevdi zor biraz:

import pandas as pd 

df = pd.read_csv(r'D:\download\Sheet1.csv') 

# `id_vars` helper list for `melt()` 
id_vars = df.columns[df.columns.str.contains('provinces\.')].tolist() 

# `value_vars` helper list for `melt()` 
val_vars = df.columns[df.columns.str.contains('provinceFrequency\.')].tolist() 

mlt = pd.melt(df, id_vars=id_vars, value_vars=val_vars) 

mlt['variable'] = mlt['variable'].str.replace(r'provinceFrequency', 'provinces') 

# add column with the _correct_ province 
mlt['prov'] = mlt.apply(lambda row: row[row['variable']], axis=1) 

new = mlt[['prov', 'value']].reset_index() 

# free memory 
del mlt 

# set original df's index for joining in future 
new['idx'] = new['index']%len(df) 

# pivot (convert rows to columns) 
pvt = pd.pivot_table(new, index='idx', columns='prov', values='value', aggfunc='first') 

# free memory 
del new 

# join original `DF` with the pivoted DF `PVT` using index 
rslt = df[['text','rank','date']].join(pvt) 

print(rslt) 

Çıktı: adım

 text rank  date Alberta British Columbia Manitoba \ 
0 Topic1  0 9/13/2015  10.0    8.0  1.0 
1 Topic2  1 9/13/2015  51.0    61.0  7.0 

    New Brunswick Newfoundland and Labrador Nova Scotia Nunavut Ontario \ 
0   NaN      1.0   NaN  NaN 192.0 
1   8.0      2.0   7.0  2.0 112.0 

    Prince Edward Island Quebec Saskatchewan 
0     NaN 378.0   1.0 
1     1.0 11.0   2.0 

Adım:

In [263]: id_vars 
Out[263]: 
['provinces.0', 
'provinces.1', 
'provinces.2', 
'provinces.3', 
'provinces.4', 
'provinces.5', 
'provinces.6', 
'provinces.7', 
'provinces.8', 
'provinces.9', 
'provinces.10'] 

In [264]: val_vars 
Out[264]: 
['provinceFrequency.0', 
'provinceFrequency.1', 
'provinceFrequency.2', 
'provinceFrequency.3', 
'provinceFrequency.4', 
'provinceFrequency.5', 
'provinceFrequency.6', 
'provinceFrequency.7', 
'provinceFrequency.8', 
'provinceFrequency.9', 
'provinceFrequency.10'] 

In [265]: mlt 
Out[265]: 
    provinces.0  provinces.1  provinces.2 provinces.3 provinces.4 \ 
0  Ontario   Quebec British Columbia  Alberta  Manitoba 
1  Manitoba British Columbia   Alberta  Ontario Nova Scotia 
2  Ontario   Quebec British Columbia  Alberta  Manitoba 
3  Manitoba British Columbia   Alberta  Ontario Nova Scotia 
4  Ontario   Quebec British Columbia  Alberta  Manitoba 
5  Manitoba British Columbia   Alberta  Ontario Nova Scotia 
6  Ontario   Quebec British Columbia  Alberta  Manitoba 
7  Manitoba British Columbia   Alberta  Ontario Nova Scotia 
8  Ontario   Quebec British Columbia  Alberta  Manitoba 
9  Manitoba British Columbia   Alberta  Ontario Nova Scotia 
10  Ontario   Quebec British Columbia  Alberta  Manitoba 
11 Manitoba British Columbia   Alberta  Ontario Nova Scotia 
12  Ontario   Quebec British Columbia  Alberta  Manitoba 
13 Manitoba British Columbia   Alberta  Ontario Nova Scotia 
14  Ontario   Quebec British Columbia  Alberta  Manitoba 
15 Manitoba British Columbia   Alberta  Ontario Nova Scotia 
16  Ontario   Quebec British Columbia  Alberta  Manitoba 
17 Manitoba British Columbia   Alberta  Ontario Nova Scotia 
18  Ontario   Quebec British Columbia  Alberta  Manitoba 
19 Manitoba British Columbia   Alberta  Ontario Nova Scotia 
20  Ontario   Quebec British Columbia  Alberta  Manitoba 
21 Manitoba British Columbia   Alberta  Ontario Nova Scotia 

     provinces.5    provinces.6    provinces.7 \ 
0 Saskatchewan Newfoundland and Labrador      NaN 
1 New Brunswick      Quebec Newfoundland and Labrador 
2 Saskatchewan Newfoundland and Labrador      NaN 
3 New Brunswick      Quebec Newfoundland and Labrador 
4 Saskatchewan Newfoundland and Labrador      NaN 
5 New Brunswick      Quebec Newfoundland and Labrador 
6 Saskatchewan Newfoundland and Labrador      NaN 
7 New Brunswick      Quebec Newfoundland and Labrador 
8 Saskatchewan Newfoundland and Labrador      NaN 
9 New Brunswick      Quebec Newfoundland and Labrador 
10 Saskatchewan Newfoundland and Labrador      NaN 
11 New Brunswick      Quebec Newfoundland and Labrador 
12 Saskatchewan Newfoundland and Labrador      NaN 
13 New Brunswick      Quebec Newfoundland and Labrador 
14 Saskatchewan Newfoundland and Labrador      NaN 
15 New Brunswick      Quebec Newfoundland and Labrador 
16 Saskatchewan Newfoundland and Labrador      NaN 
17 New Brunswick      Quebec Newfoundland and Labrador 
18 Saskatchewan Newfoundland and Labrador      NaN 
19 New Brunswick      Quebec Newfoundland and Labrador 
20 Saskatchewan Newfoundland and Labrador      NaN 
21 New Brunswick      Quebec Newfoundland and Labrador 

    provinces.8   provinces.9 provinces.10  variable value \ 
0   NaN     NaN   NaN provinces.0 192.0 
1 Saskatchewan Prince Edward Island  Nunavut provinces.0 7.0 
2   NaN     NaN   NaN provinces.1 378.0 
3 Saskatchewan Prince Edward Island  Nunavut provinces.1 61.0 
4   NaN     NaN   NaN provinces.2 8.0 
5 Saskatchewan Prince Edward Island  Nunavut provinces.2 51.0 
6   NaN     NaN   NaN provinces.3 10.0 
7 Saskatchewan Prince Edward Island  Nunavut provinces.3 112.0 
8   NaN     NaN   NaN provinces.4 1.0 
9 Saskatchewan Prince Edward Island  Nunavut provinces.4 7.0 
10   NaN     NaN   NaN provinces.5 1.0 
11 Saskatchewan Prince Edward Island  Nunavut provinces.5 8.0 
12   NaN     NaN   NaN provinces.6 1.0 
13 Saskatchewan Prince Edward Island  Nunavut provinces.6 11.0 
14   NaN     NaN   NaN provinces.7 NaN 
15 Saskatchewan Prince Edward Island  Nunavut provinces.7 2.0 
16   NaN     NaN   NaN provinces.8 NaN 
17 Saskatchewan Prince Edward Island  Nunavut provinces.8 2.0 
18   NaN     NaN   NaN provinces.9 NaN 
19 Saskatchewan Prince Edward Island  Nunavut provinces.9 1.0 
20   NaN     NaN   NaN provinces.10 NaN 
21 Saskatchewan Prince Edward Island  Nunavut provinces.10 2.0 

         prov 
0      Ontario 
1     Manitoba 
2      Quebec 
3   British Columbia 
4   British Columbia 
5      Alberta 
6      Alberta 
7      Ontario 
8     Manitoba 
9     Nova Scotia 
10    Saskatchewan 
11    New Brunswick 
12 Newfoundland and Labrador 
13      Quebec 
14      NaN 
15 Newfoundland and Labrador 
16      NaN 
17    Saskatchewan 
18      NaN 
19  Prince Edward Island 
20      NaN 
21     Nunavut 

In [269]: mlt[['prov', 'value']].reset_index() 
Out[269]: 
    index      prov value 
0  0     Ontario 192.0 
1  1     Manitoba 7.0 
2  2      Quebec 378.0 
3  3   British Columbia 61.0 
4  4   British Columbia 8.0 
5  5     Alberta 51.0 
6  6     Alberta 10.0 
7  7     Ontario 112.0 
8  8     Manitoba 1.0 
9  9    Nova Scotia 7.0 
10  10    Saskatchewan 1.0 
11  11    New Brunswick 8.0 
12  12 Newfoundland and Labrador 1.0 
13  13      Quebec 11.0 
14  14      NaN NaN 
15  15 Newfoundland and Labrador 2.0 
16  16      NaN NaN 
17  17    Saskatchewan 2.0 
18  18      NaN NaN 
19  19  Prince Edward Island 1.0 
20  20      NaN NaN 
21  21     Nunavut 2.0 

In [270]: # set original index for joining in future 

In [271]: new['idx'] = new['index']%len(df) 

In [272]: new 
Out[272]: 
    index      prov value idx 
0  0     Ontario 192.0 0 
1  1     Manitoba 7.0 1 
2  2      Quebec 378.0 0 
3  3   British Columbia 61.0 1 
4  4   British Columbia 8.0 0 
5  5     Alberta 51.0 1 
6  6     Alberta 10.0 0 
7  7     Ontario 112.0 1 
8  8     Manitoba 1.0 0 
9  9    Nova Scotia 7.0 1 
10  10    Saskatchewan 1.0 0 
11  11    New Brunswick 8.0 1 
12  12 Newfoundland and Labrador 1.0 0 
13  13      Quebec 11.0 1 
14  14      NaN NaN 0 
15  15 Newfoundland and Labrador 2.0 1 
16  16      NaN NaN 0 
17  17    Saskatchewan 2.0 1 
18  18      NaN NaN 0 
19  19  Prince Edward Island 1.0 1 
20  20      NaN NaN 0 
21  21     Nunavut 2.0 1 

In [273]: pvt = pd.pivot_table(new, index='idx', columns='prov', values='value', aggfunc='first') 

In [274]: pvt 
Out[274]: 
prov Alberta British Columbia Manitoba New Brunswick \ 
idx 
0  10.0    8.0  1.0   NaN 
1  51.0    61.0  7.0   8.0 

prov Newfoundland and Labrador Nova Scotia Nunavut Ontario \ 
idx 
0       1.0   NaN  NaN 192.0 
1       2.0   7.0  2.0 112.0 

prov Prince Edward Island Quebec Saskatchewan 
idx 
0      NaN 378.0   1.0 
1      1.0 11.0   2.0 

In [275]: rslt = df[['text','rank','date']].join(pvt) 

In [276]: rslt 
Out[276]: 
    text rank  date Alberta British Columbia Manitoba \ 
0 Topic1  0 9/13/2015  10.0    8.0  1.0 
1 Topic2  1 9/13/2015  51.0    61.0  7.0 

    New Brunswick Newfoundland and Labrador Nova Scotia Nunavut Ontario \ 
0   NaN      1.0   NaN  NaN 192.0 
1   8.0      2.0   7.0  2.0 112.0 

    Prince Edward Island Quebec Saskatchewan 
0     NaN 378.0   1.0 
1     1.0 11.0   2.0 
+0

Bu kesinlikle muhteşemdi! – Iman

+0

@Iman, yardım edebildiğime sevindim. Sorunuzu düzenleyebilir ve örnek girdi verilerinizi ve beklenen çıktınızı sorunuzun içine metin olarak ekleyebilir misiniz? Belki de moderatörler bu soruyu kapatmazlar. Bence başkalarına da yardımcı olabileceğini umuyorum ... – MaxU

+0

Bunu http://www.tablesgenerator.com/markdown_tables kullanarak yaptım, yardımın için teşekkürler, Bu soru için daha iyi bir başlık biliyorsan, ben hazırım başlığı – Iman

İlgili konular