python - Pandas: Difference between largest and smallest value within group

Question

posted Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

Given a data frame that looks like this

GROUP VALUE
  1     5
  2     2
  1     10
  2     20
  1     7

I would like to compute the difference between the largest and smallest value within each group. That is, the result should be

GROUP   DIFF
  1      5
  2      18

What is an easy way to do this in Pandas?

What is a fast way to do this in Pandas for a data frame with about 2 million rows and 1 million groups?

1 Reply

深蓝 · Answer 1 · 2021-10-17T00:10:20+0000

Using @unutbu's df.

Per the timings, unutbu's solution is the fastest on large data sets.

import pandas as pd
import numpy as np

df = pd.DataFrame({'GROUP': [1, 2, 1, 2, 1], 'VALUE': [5, 2, 10, 20, 7]})

df.groupby('GROUP')['VALUE'].agg(np.ptp)

GROUP
1     5
2    18
Name: VALUE, dtype: int64

Per the NumPy docs, np.ptp ("peak to peak") returns the range of an array, i.e. its maximum minus its minimum.
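Since np.ptp is just maximum minus minimum, the same per-group result can also be written with the built-in max and min aggregations. The following is a minimal sketch of that equivalent formulation, not necessarily the code from any other answer; the names `grouped`, `diff`, and `result` are illustrative, and the `DIFF` column name is taken from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'GROUP': [1, 2, 1, 2, 1], 'VALUE': [5, 2, 10, 20, 7]})

# np.ptp on a plain array: maximum minus minimum (here 10 - 5)
group1_range = np.ptp(np.array([5, 10, 7]))

# Same per-group range, computed from the built-in max and min aggregations
grouped = df.groupby('GROUP')['VALUE']
diff = (grouped.max() - grouped.min()).rename('DIFF')

# Back to the two-column GROUP / DIFF layout asked for in the question
result = diff.reset_index()
print(result)
```

The max/min version runs two vectorised aggregations rather than calling a Python function once per group, which tends to matter when the number of groups is large.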

timing
small df

large df
df = pd.DataFrame(dict(GROUP=np.arange(1000000) % 100, VALUE=np.random.rand(1000000)))

large df, many groups
df = pd.DataFrame(dict(GROUP=np.arange(1000000) % 10000, VALUE=np.random.rand(1000000)))
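The timings themselves can be reproduced with a small harness along these lines. This is a sketch under stated assumptions: `make_df`, `range_ptp`, and `range_max_min` are hypothetical helper names, the max-minus-min variant is included only as a comparison point, and the frame is smaller than the ones above so the script finishes quickly.

```python
import timeit

import numpy as np
import pandas as pd


def make_df(n_rows, n_groups, seed=0):
    # Same shape as the frames above: cyclic group labels, random values
    rng = np.random.default_rng(seed)
    return pd.DataFrame({'GROUP': np.arange(n_rows) % n_groups,
                         'VALUE': rng.random(n_rows)})


def range_ptp(df):
    # Approach from the answer: apply np.ptp to each group
    return df.groupby('GROUP')['VALUE'].agg(np.ptp)


def range_max_min(df):
    # Comparison point: built-in max and min aggregations
    g = df.groupby('GROUP')['VALUE']
    return g.max() - g.min()


# Smaller than the 1,000,000-row frames above; raise the sizes to match them
bench_df = make_df(100_000, 1_000)

for fn in (range_ptp, range_max_min):
    seconds = timeit.timeit(lambda: fn(bench_df), number=3) / 3
    print(f'{fn.__name__}: {seconds:.4f} s per call')
```

Absolute numbers depend on the machine and on the pandas and NumPy versions, so only the relative ordering on the same setup is meaningful.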
