Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - how to use a column of df in a pivot table more than once

I have a dataframe like the one below, and I need to convert it to a pivot table. In short, I need to use the item column more than once in the pivot table. I have tried aggfunc, but I don't see how to define it for the items themselves. Could anyone suggest a trick for this?

index  item  interval  transaction
0      a     x1        1
1      a     x2        2
2      b     x1        2
Question from: https://stackoverflow.com/questions/65541032/how-to-use-a-column-of-df-in-a-pivot-table-more-than-once


1 Reply


The first step is to obtain the information you want in a natural way ("natural": easy to express in pandas, e.g. using pivot_table() or groupby()). In order to make the full product of interval x item (with 0 for missing pairs), you may use:

df.pivot_table(index='interval', columns='item', values='transaction',
               aggfunc='sum', fill_value=0)

# out:
item      a  b
interval      
x1        1  2
x2        2  0
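
To run this step yourself, here is a minimal sketch. The df construction is an assumption reconstructed from the table in the question, and by_pivot / by_groupby are illustrative names only. It also tries the groupby() route mentioned above, which should produce the same table:

```python
import pandas as pd

# Sample data reconstructed from the question's table (an assumption).
df = pd.DataFrame({
    "item": ["a", "a", "b"],
    "interval": ["x1", "x2", "x1"],
    "transaction": [1, 2, 2],
})

# pivot_table route: one row per interval, one column per item.
by_pivot = df.pivot_table(index="interval", columns="item",
                          values="transaction", aggfunc="sum", fill_value=0)

# groupby route: sum per (interval, item), then spread item across columns.
by_groupby = (df.groupby(["interval", "item"])["transaction"]
                .sum()
                .unstack("item", fill_value=0))

print(by_pivot)
print(by_groupby)
```

Either form should work as the starting point for the reshaping that follows; pivot_table() is simply the shorter spelling.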

The trick, however, is reshaping this into the specific format you asked for. This involves duplicating the 'item' column or level (something that pandas, understandably, is not particularly fond of). The following is the full operation in one chained sequence:

df2 = (df
     .pivot_table(index='interval', columns='item', values='transaction',
                  aggfunc='sum', fill_value=0)
     .stack().to_frame('count')
     .reset_index('item').set_index('item', append=True, drop=False)
     .unstack('interval').swaplevel(axis=1)
     .sort_index(axis=1, ascending=[True, False])
     .reset_index(drop=True)
    )

# df2:
interval   x1         x2      
         item count item count
0           a     1    a     2
1           b     2    b     0

You can comment out the steps one at a time, starting from the end, to see the intermediate stages. Let's break this down line by line after the pivot_table:

Move item from the columns into the second index level (level 1 of the MultiIndex) and name the summed values 'count':

...     .stack().to_frame('count')
               count
interval item       
x1       a         1
         b         2
x2       a         2
         b         0

Duplicate the item column (in order to unstack later):

...     .reset_index('item').set_index('item', append=True, drop=False)
              item  count
interval item            
x1       a       a      1
         b       b      2
x2       a       a      2
         b       b      0
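
The duplication step can be tried in isolation. This is a small sketch, assuming a hand-built frame (stacked, an illustrative name) that matches the output of the previous stage. The key point is drop=False, which keeps item as a column while also putting it back into the index:

```python
import pandas as pd

# Hand-built stand-in for the result of .stack().to_frame('count').
idx = pd.MultiIndex.from_product([["x1", "x2"], ["a", "b"]],
                                 names=["interval", "item"])
stacked = pd.DataFrame({"count": [1, 2, 2, 0]}, index=idx)

# Pull item out of the index, then append it back without dropping the column.
dup = stacked.reset_index("item").set_index("item", append=True, drop=False)

print(dup.index.names)    # item is an index level...
print(list(dup.columns))  # ...and also a regular column
```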

Unstack interval and swap the levels of the resulting MultiIndex columns. This is why item had to be duplicated: without the extra index level, unstack() would operate on a regular Index (not a MultiIndex) and return a Series instead of a DataFrame:

...     .unstack('interval').swaplevel(axis=1)
interval   x1   x2    x1    x2
         item item count count
item                          
a           a    a     1     2
b           b    b     2     0
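
The remark about unstack() returning a Series can be checked with a short sketch. The two frames here (single and multi) are made-up illustrations, not part of the original data flow:

```python
import pandas as pd

# Single-level index: unstack() has no level left for the rows,
# so the result collapses to a Series.
single = pd.DataFrame({"count": [1, 2]},
                      index=pd.Index(["x1", "x2"], name="interval"))
flat = single.unstack()

# Two-level index: unstacking interval leaves item as the row index,
# so the result stays a DataFrame.
idx = pd.MultiIndex.from_product([["x1", "x2"], ["a", "b"]],
                                 names=["interval", "item"])
multi = pd.DataFrame({"count": [1, 2, 2, 0]}, index=idx)
wide = multi.unstack("interval")

print(type(flat).__name__)
print(type(wide).__name__)
```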

Finally, sort the columns MultiIndex and drop the (now useless) index:

...     .sort_index(axis=1, ascending=[True, False])
...     .reset_index(drop=True)
interval   x1         x2      
         item count item count
0           a     1    a     2
1           b     2    b     0
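
For comparison, the same layout can probably be reached by a different route that avoids duplicating an index level: build one small (item, count) frame per interval and join them side by side with pd.concat. This is an alternative technique, not the chain above. The df construction is reconstructed from the question, and wide, blocks and alt are illustrative names:

```python
import pandas as pd

# Sample data reconstructed from the question's table (an assumption).
df = pd.DataFrame({
    "item": ["a", "a", "b"],
    "interval": ["x1", "x2", "x1"],
    "transaction": [1, 2, 2],
})

# One row per item, one column per interval, 0 for missing pairs.
wide = df.pivot_table(index="item", columns="interval",
                      values="transaction", aggfunc="sum", fill_value=0)

# One (item, count) frame per interval; the dict keys become the top
# level of the column MultiIndex when concatenating along axis=1.
blocks = {
    interval: pd.DataFrame({"item": wide.index.to_numpy(),
                            "count": wide[interval].to_numpy()})
    for interval in wide.columns
}
alt = pd.concat(blocks, axis=1)

print(alt)
```

Unlike df2 above, the top column level carries no 'interval' name here; alt.columns.names = ['interval', None] would add it if needed.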


...