Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
117 views
in Technique[技术] by (71.8m points)

python - pandas sort a column by values in another column

I have a dataset that I want to sort and assign rank based on it.

Suppose it has two columns, one is year and the other is the column that I want to sort.

import pandas as pd
data = {'year': pd.Series([2006, 2006, 2007, 2007]), 
        'value': pd.Series([5, 10, 4, 1])}
df = pd.DataFrame(data)

I want to sort the column 'value' by each year and then give rank to it. What I would like to have is

data2= {'year': pd.Series([2006, 2006, 2007, 2007]), 
        'value': pd.Series([10, 5, 4, 1]),  
        'rank': pd.Series([1, 2, 1, 2]}
df2=pd.DataFrame(data2)

>>> df2
   rank  value  year
0     1     10  2006
1     2      5  2006
2     1      4  2007
3     2      1  2007
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You can use groupby and then use rank (with ascending=False to get the largest values first). You don't need to sort in the groupby, as the result is indexed to the dataframe (slightly faster performance).

df['yearly_rank'] = df.groupby('year', sort=False)['value'].rank(ascending=False)

>>> df.sort_values(['year', 'yearly_rank'])
   value  year  yearly_rank
1     10  2006            1
0      5  2006            2
2      4  2007            1
3      1  2007            2

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...