python - Pandas, group dataframe and normalize values in each group

Question

posted Feb 5, 2021 in Technique by 深蓝 (71.8m points)

I have a CSV file containing several groups, each identified by an ID, something like:

ID,X
aaa,3
aaa,5
aaa,4
bbb,50
bbb,54
bbb,52

I need to:

calculate the mean of X in each group;
divide each value of X by the mean of X for that specific group.

So, in my example above, the mean in the 'aaa' group is 4, while in 'bbb' it's 52. I need to obtain a new dataframe with a third column, where each row holds the original value of X divided by its group's mean:

ID,X,x/group_mean
aaa,3,3/4
aaa,5,5/4
aaa,4,4/4
bbb,50,50/52
bbb,54,54/52
bbb,52,52/52

I can group the dataframe and calculate each group's mean with:

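    import pandas as pd

    # index_col=0 makes ID the index; groupby('ID') still finds it by index level name
    df_data = pd.read_csv('test.csv', index_col=0)
    df_grouped = df_data.groupby('ID')
    for group_name, group_content in df_grouped:
        # the CSV header is 'X' (upper case), so the column must be selected as 'X'
        mean_x_group = group_content['X'].mean()
        print(f'mean = {mean_x_group}')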

but how do I add the third column?


1 Reply

深蓝 · Answer 1 · 2021-02-05T04:23:46+0000

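Use GroupBy.transform, which returns a result with the same length and index as the original dataframe, so it can be assigned directly as a new column: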

In [1874]: df['mean'] = df.groupby('ID')['X'].transform('mean')

In [1879]: df['newcol'] = df.X.div(df['mean'])

In [1880]: df
Out[1880]: 
    ID   X  mean    newcol
0  aaa   3     4  0.750000
1  aaa   5     4  1.250000
2  aaa   4     4  1.000000
3  bbb  50    52  0.961538
4  bbb  54    52  1.038462
5  bbb  52    52  1.000000
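
For anyone who wants to try this without the original test.csv, here is a minimal self-contained sketch of the same idea. It assumes the sample rows from the question, inlined through io.StringIO, and it skips the intermediate 'mean' column. The variable names csv_text and group_mean are only illustrative; the column name 'x/group_mean' is taken from the question's target table.

```python
import io

import pandas as pd

# Sample data from the question, inlined so the sketch runs without test.csv.
csv_text = """ID,X
aaa,3
aaa,5
aaa,4
bbb,50
bbb,54
bbb,52
"""

# Read without index_col so that ID stays a regular column.
df = pd.read_csv(io.StringIO(csv_text))

# transform('mean') yields one value per original row (the mean of that
# row's group), so the result lines up with df row for row.
group_mean = df.groupby('ID')['X'].transform('mean')

# Divide each value by the mean of its own group.
df['x/group_mean'] = df['X'] / group_mean

print(df)
```

Selecting ['X'] before calling transform keeps the result a single Series even if the dataframe later gains more numeric columns, which is probably the safer habit.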
