python - get first and last values in a groupby

Question

Welcome To Ask or Share your Answers For Others

python - get first and last values in a groupby

posted Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

python - get first and last values in a groupby

I have a dataframe df

df = pd.DataFrame(np.arange(20).reshape(10, -1),
                  [['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd'],
                   ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']],
                  ['X', 'Y'])

How do I get the first and last rows, grouped by the first level of the index?

I tried

df.groupby(level=0).agg(['first', 'last']).stack()

and got

          X   Y
a first   0   1
  last    6   7
b first   8   9
  last   12  13
c first  14  15
  last   16  17
d first  18  19
  last   18  19

This is so close to what I want. How can I preserve the level 1 index and get this instead:

      X   Y
a a   0   1
  d   6   7
b e   8   9
  g  12  13
c h  14  15
  i  16  17
d j  18  19
  j  18  19

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-17T01:06:43+0000

Option 1

def first_last(df):
    return df.ix[[0, -1]]

df.groupby(level=0, group_keys=False).apply(first_last)

Option 2 - only works if index is unique

idx = df.index.to_series().groupby(level=0).agg(['first', 'last']).stack()
df.loc[idx]

Option 3 - per notes below, this only makes sense when there are no NAs

I also abused the agg function. The code below works, but is far uglier.

df.reset_index(1).groupby(level=0).agg(['first', 'last']).stack() 
    .set_index('level_1', append=True).reset_index(1, drop=True) 
    .rename_axis([None, None])

Note

per @unutbu: agg(['first', 'last']) take the firs non-na values.

I interpreted this as, it must then be necessary to run this column by column. Further, forcing index level=1 to align may not even make sense.

Let's include another test

df = pd.DataFrame(np.arange(20).reshape(10, -1),
                  [list('aaaabbbccd'),
                   list('abcdefghij')],
                  list('XY'))

df.loc[tuple('aa'), 'X'] = np.nan

def first_last(df):
    return df.ix[[0, -1]]

df.groupby(level=0, group_keys=False).apply(first_last)

df.reset_index(1).groupby(level=0).agg(['first', 'last']).stack() 
    .set_index('level_1', append=True).reset_index(1, drop=True) 
    .rename_axis([None, None])

Sure enough! This second solution is taking the first valid value in column X. It is now nonsensical to have forced that value to align with the index a.

Categories

python - get first and last values in a groupby

python - get first and last values in a groupby

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Option 1

Option 2 - only works if index is unique

Option 3 - per notes below, this only makes sense when there are no NAs

Note

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags