Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Pandas: Sort groups and sort within group

My dataframe df contains products. Each product has an EAN and two rows, one for an earlier date and one for a later date; each row carries a 'yes' or 'no' Start label and a value.

EAN-Unique  Date         Start  Value 
3324324     2019-04-30   no      0.11
3324324     2018-06-01   yes    56.03
asd2343     2015-03-23   yes     8.02
asd2343     2015-07-11   no      8.45
Xjkhfsd     1999-04-12   yes    12.33
Xjkhfsd     2001-02-01   no      9.11
5234XAR     2013-12-13   no     15.75
5234XAR     2000-12-13   yes     9.00
3434343     1972-05-23   yes     1.26
3434343     1980-11-01   no      2.77

I want to sort the groups of EAN-Unique values (for example, 3324324 is one group, asd2343 is another, and so on):

  • order the groups from lowest to highest by the Value on each group's earlier date, and
  • within each group, order the rows from the earlier date to the later date.

The resulting df should look like this:

EAN-Unique  Date         Start  Value 
3434343     1972-05-23   yes     1.26
3434343     1980-11-01   no      2.77
asd2343     2015-03-23   yes     8.02
asd2343     2015-07-11   no      8.45
5234XAR     2000-12-13   yes     9.00
5234XAR     2013-12-13   no     15.75
Xjkhfsd     1999-04-12   yes    12.33
Xjkhfsd     2001-02-01   no      9.11
3324324     2018-06-01   yes    56.03
3324324     2019-04-30   no      0.11

My attempt was to sort it

df = df.sort_values(by=['EAN-Unique','Date','Value'], ascending=[True,True,True]).reset_index(drop=True)

But it didn't work as intended. Can anybody help me out?

Thanks!



1 Reply


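Create an auxiliary column seq that stores each group's position in the desired order. The order comes from sorting the Start == 'yes' rows by Value; this assumes, as in the sample data, that the 'yes' row is the one with the earlier date.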

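# Put all 'yes' rows first (descending Start), ordered by Value;
# unique() then yields the EANs in the desired group order.
group_order = df.sort_values(['Start', 'Value'], ascending=[False, True])['EAN-Unique'].unique()
# Map each EAN to its position in that order.
seq_map = dict(zip(group_order, range(len(group_order))))
df['seq'] = df['EAN-Unique'].map(seq_map)
# Sort by group position, then by date within each group.
df.sort_values(['seq', 'Date'], inplace=True)
print(df)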
  EAN-Unique        Date Start  Value  seq
8    3434343  1972-05-23   yes   1.26    0
9    3434343  1980-11-01    no   2.77    0
2    asd2343  2015-03-23   yes   8.02    1
3    asd2343  2015-07-11    no   8.45    1
7    5234XAR  2000-12-13   yes   9.00    2
6    5234XAR  2013-12-13    no  15.75    2
4    Xjkhfsd  1999-04-12   yes  12.33    3
5    Xjkhfsd  2001-02-01    no   9.11    3
1    3324324  2018-06-01   yes  56.03    4
0    3324324  2019-04-30    no   0.11    4
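
If the Start label cannot be relied on to mark the earlier row, the group key can be derived from the dates instead. The following is a minimal sketch of that variant, not the method used above: it rebuilds the sample frame from the question, parses Date as datetimes (an assumption, since the question does not state the column's dtype), and uses a hypothetical helper column first_value that holds each group's Value on its earliest date.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "EAN-Unique": ["3324324", "3324324", "asd2343", "asd2343", "Xjkhfsd",
                   "Xjkhfsd", "5234XAR", "5234XAR", "3434343", "3434343"],
    "Date": ["2019-04-30", "2018-06-01", "2015-03-23", "2015-07-11", "1999-04-12",
             "2001-02-01", "2013-12-13", "2000-12-13", "1972-05-23", "1980-11-01"],
    "Start": ["no", "yes", "yes", "no", "yes", "no", "no", "yes", "yes", "no"],
    "Value": [0.11, 56.03, 8.02, 8.45, 12.33, 9.11, 15.75, 9.00, 1.26, 2.77],
})
# Assumption: dates are parsed so they sort chronologically.
df["Date"] = pd.to_datetime(df["Date"])

# Sort by date within each group first, so "first" below means "earliest".
by_date = df.sort_values(["EAN-Unique", "Date"])

# Hypothetical helper column: the Value on each group's earliest date,
# broadcast to every row of that group.
by_date["first_value"] = by_date.groupby("EAN-Unique")["Value"].transform("first")

# Order groups by that key, then by date within each group.
# EAN-Unique is a tie-breaker that keeps groups contiguous if two
# groups happen to share the same first_value.
result = (
    by_date.sort_values(["first_value", "EAN-Unique", "Date"])
    .drop(columns="first_value")
    .reset_index(drop=True)
)
print(result)
```

On the sample data this gives the same group order as the seq approach. The trade-off is one extra sort in exchange for not depending on the 'yes'/'no' labels.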
