python - how to use a column of df in a pivot table more than once

posted Oct 7, 2021 in Technique by 深蓝 (71.8m points)

I have the dataframe below and need to convert it into a pivot table in which the item column is used more than once. I have tried aggfunc, but I don't see how to define it for the items themselves. Could anyone suggest a trick for this?

index	item	interval	transaction
0	a	x1	1
1	a	x2	2
2	b	x1	2
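
For anyone who wants to reproduce the examples, the sample frame can be built roughly as follows. This is a minimal sketch; it assumes the "index" column in the table above is just the default RangeIndex and not a real column.

```python
import pandas as pd

# Sample data from the question. Assumption: "index" in the table is the
# default RangeIndex, so it is not created as a column here.
df = pd.DataFrame({
    "item": ["a", "a", "b"],
    "interval": ["x1", "x2", "x1"],
    "transaction": [1, 2, 2],
})
print(df)
```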

Question from: https://stackoverflow.com/questions/65541032/how-to-use-a-column-of-df-in-a-pivot-table-more-than-once

1 Reply

深蓝 · Answer 1 · 2021-10-06T18:52:47+0000

The first step is to obtain the information you want in a natural way ("natural": easy to express in pandas, e.g. using pivot_table() or groupby()). In order to make the full product of interval x item (with 0 for missing pairs), you may use:

df.pivot_table(index='interval', columns='item', values='transaction',
               aggfunc='sum', fill_value=0)

# out:
item      a  b
interval      
x1        1  2
x2        2  0
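
Since groupby() is mentioned as the other natural route, here is a sketch of what that might look like. It rebuilds the sample frame from the question (assuming the default RangeIndex) and should produce the same interval x item table as the pivot_table() call; the names by_group and by_pivot are only illustrative.

```python
import pandas as pd

# Sample frame from the question (default RangeIndex assumed).
df = pd.DataFrame({
    "item": ["a", "a", "b"],
    "interval": ["x1", "x2", "x1"],
    "transaction": [1, 2, 2],
})

# Sum transactions per (interval, item) pair, then spread item across the
# columns; fill_value=0 fills the pairs that never occur, such as (x2, b).
by_group = (df
    .groupby(["interval", "item"])["transaction"].sum()
    .unstack("item", fill_value=0))

# The pivot_table() form, for comparison.
by_pivot = df.pivot_table(index="interval", columns="item",
                          values="transaction", aggfunc="sum", fill_value=0)

print(by_group)
```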

The trick however is how to reshape this into the specific format you asked for. This will involve duplicating the 'item' column or level (something that pandas, understandably, is not particularly fond of). The following is the full operation in one chained sequence:

df2 = (df
     .pivot_table(index='interval', columns='item', values='transaction',
                  aggfunc='sum', fill_value=0)
     .stack().to_frame('count')
     .reset_index('item').set_index('item', append=True, drop=False)
     .unstack('interval').swaplevel(axis=1)
     .sort_index(axis=1, ascending=[True, False])
     .reset_index(drop=True)
    )

# df2:
interval   x1         x2      
         item count item count
0           a     1    a     2
1           b     2    b     0

You can comment out lines starting from the end of the chain to see the intermediate stages. Let's break this down line by line after the pivot_table:

Move item into level 1 of the row MultiIndex and name the summed values 'count':

...     .stack().to_frame('count')
               count
interval item       
x1       a         1
         b         2
x2       a         2
         b         0

Duplicate the item column (in order to unstack later):

...     .reset_index('item').set_index('item', append=True, drop=False)
              item  count
interval item            
x1       a       a      1
         b       b      2
x2       a       a      2
         b       b      0

Unstack the interval, and swap the levels of the new MultiIndex columns (note: this is why item had to be kept as an index level as well as a column; with item only as a column, the row index would be a regular Index rather than a MultiIndex, and unstack() on such a frame returns a Series):

...     .unstack('interval').swaplevel(axis=1)
interval   x1   x2    x1    x2
         item item count count
item                          
a           a    a     1     2
b           b    b     2     0
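
The note about unstack() can be checked on small toy frames. The sketch below is illustrative only: flat and multi are hypothetical frames, not stages of the chain above. It shows that unstack() on a single-level index returns a Series, while the same call on a two-level index returns a DataFrame.

```python
import pandas as pd

# Hypothetical toy frame with a regular (single-level) index.
flat = pd.DataFrame({"count": [1, 2]},
                    index=pd.Index(["x1", "x2"], name="interval"))
as_series = flat.unstack()
print(type(as_series).__name__)

# Hypothetical toy frame with an (interval, item) MultiIndex.
multi = pd.DataFrame(
    {"count": [1, 2, 2, 0]},
    index=pd.MultiIndex.from_product([["x1", "x2"], ["a", "b"]],
                                     names=["interval", "item"]))
as_frame = multi.unstack("interval")
print(type(as_frame).__name__)
```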

Finally, sort the columns MultiIndex and drop the (now useless) index:

...     .sort_index(axis=1, ascending=[True, False])
...     .reset_index(drop=True)
interval   x1         x2      
         item count item count
0           a     1    a     2
1           b     2    b     0
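
As a possible alternative to the chain (not the method used above), the same layout can be assembled with pd.concat, building one (item, count) block per interval and joining the blocks side by side. This is a sketch under the same assumptions about the sample data; wide, blocks and df2_alt are illustrative names.

```python
import pandas as pd

# Sample frame from the question (default RangeIndex assumed).
df = pd.DataFrame({
    "item": ["a", "a", "b"],
    "interval": ["x1", "x2", "x1"],
    "transaction": [1, 2, 2],
})

# One row per item, one column per interval (0 where a pair is missing).
wide = df.pivot_table(index="item", columns="interval",
                      values="transaction", aggfunc="sum", fill_value=0)

# Build an (item, count) block per interval; when the dict is concatenated
# along axis=1, its keys become the top level of the column MultiIndex.
blocks = {
    interval: pd.DataFrame({"item": wide.index.to_numpy(),
                            "count": wide[interval].to_numpy()})
    for interval in wide.columns
}
df2_alt = pd.concat(blocks, axis=1)
df2_alt.columns.names = ["interval", None]
print(df2_alt)
```

This avoids duplicating an index level, at the cost of a Python-level loop over the intervals, which is usually fine when there are only a handful of them.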
