python - What is the difference between groupby.first, groupby.nth, and groupby.head when as_index=False?

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

Edit: I made the rookie mistake of storing np.nan as a string, as pointed out by @coldspeed, @wen-ben, and @ALollz. The answers are quite good, so I am keeping this question rather than deleting it.

Original:
I have read the question and answer "What's the difference between groupby.first() and groupby.head(1)?"

That answer explains that the difference lies in how they handle NaN values. However, when I call groupby with as_index=False, both of them pick up NaN just fine.

Furthermore, pandas has groupby.nth, with functionality similar to head and first.

What are the differences between groupby.first(), groupby.nth(0), and groupby.head(1) with as_index=False?

Example below:

In [448]: df
Out[448]:
   A       B
0  1  np.nan
1  1       4
2  1      14
3  2       8
4  2      19
5  2      12

In [449]: df.groupby('A', as_index=False).head(1)
Out[449]:
   A       B
0  1  np.nan
3  2       8

In [450]: df.groupby('A', as_index=False).first()
Out[450]:
   A       B
0  1  np.nan
1  2       8

In [451]: df.groupby('A', as_index=False).nth(0)
Out[451]:
   A       B
0  1  np.nan
3  2       8

I saw that `first()` resets the index while the other two don't. Besides that, are there any differences?


1 Reply

深蓝 · Answer 1 · 2021-10-23T18:23:26+0000

The major issue is that you likely have the string 'np.nan' stored and not a real null value. Here is how the three methods handle null values differently:

Sample Data:

import numpy as np
import pandas as pd
df = pd.DataFrame({'A': [1,1,2,2,3,3], 'B': [None, '1', np.nan, '2', 3, 4]})

`first`/`last`

This returns the first/last non-null value within each group. Oddly enough, in the pandas version used here it does not skip None, though it can be made to with the kwarg dropna=True. Because the first non-null value is picked separately for each column, the values in one result row may come from different rows of the original data:

df.groupby('A', as_index=False).first()
#   A     B
#0  1  None
#1  2     2
#2  3     3

df.groupby('A', as_index=False).first(dropna=True)
#   A  B
#0  1  1
#1  2  2
#2  3  3
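
To make the row-mixing concrete, here is a minimal sketch with a hypothetical two-column frame (the name df2 and its values are made up for illustration and are not part of the sample data above). It uses real NaN values only, so it should not depend on how None is treated:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: each row of the single group has a NaN in a different column.
df2 = pd.DataFrame({
    "A": [1, 1],
    "B": [np.nan, 10.0],
    "C": [20.0, np.nan],
})

# first() picks the first non-null value per column, so B comes from
# row 1 while C comes from row 0.
first_row = df2.groupby("A", as_index=False).first()

# head(1) keeps row 0 exactly as it is, NaN included.
head_row = df2.groupby("A", as_index=False).head(1)

print(first_row)
print(head_row)
```

The row returned by first() never existed in df2, which is worth keeping in mind whenever the columns are supposed to stay aligned.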

`head(n)`/`tail(n)`

Returns the top/bottom n rows within a group. Values remain bound within rows. If you give it an n that is more than the number of rows, it returns all rows in that group without complaining:

df.groupby('A', as_index=False).head(1)
#   A     B
#0  1  None
#2  2   NaN
#4  3     3

df.groupby('A', as_index=False).head(200)
#   A     B
#0  1  None
#1  1     1
#2  2   NaN
#3  2     2
#4  3     3
#5  3     4

`nth`

This takes the nth row of each group, so again values remain bound within the row. .nth(0) is the same as .head(1), though they have different uses. For instance, if you need the 0th and 2nd rows, that is difficult to do with .head() but easy with .nth([0,2]). Conversely, it is far easier to write .head(10) than .nth(list(range(10))).

df.groupby('A', as_index=False).nth(0)
#   A     B
#0  1  None
#2  2   NaN
#4  3     3
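
As a rough illustration of the two use cases just mentioned, the sketch below uses a hypothetical frame with three rows per group (df3 and its values are invented for this example). Only the B values are compared, since the index of the nth result can differ between pandas versions:

```python
import pandas as pd

# Hypothetical frame: two groups of three rows each.
df3 = pd.DataFrame({
    "A": [1, 1, 1, 2, 2, 2],
    "B": [10, 11, 12, 20, 21, 22],
})

grouped = df3.groupby("A", as_index=False)

# Non-contiguous positions: the 0th and 2nd row of every group.
picked = grouped.nth([0, 2])
picked_values = picked["B"].tolist()  # [10, 12, 20, 22]

# Contiguous leading rows: head(2) and nth([0, 1]) select the same rows.
head_values = grouped.head(2)["B"].tolist()
nth_values = grouped.nth([0, 1])["B"].tolist()

print(picked_values)
print(head_values == nth_values)
```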

nth also supports dropping rows with any null values, so you can use it to return the first row without any null values, unlike .head():

df.groupby('A', as_index=False).nth(0, dropna='any')
#   A  B
#A      
#1  1  1
#2  2  2
#3  3  3
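
Coming back to the original problem, one possible way to spot and repair a string 'np.nan' is sketched below. The frame df_q is a reconstruction of the question's data and assumes the other B values are plain integers, which the question does not actually show; pd.to_numeric with errors="coerce" is just one option for the conversion:

```python
import pandas as pd

# Assumed reconstruction of the question's frame: 'np.nan' is a string here.
df_q = pd.DataFrame({
    "A": [1, 1, 1, 2, 2, 2],
    "B": ["np.nan", 4, 14, 8, 19, 12],
})

# The string is not treated as missing, so nothing is flagged as null.
n_missing_before = int(df_q["B"].isna().sum())

# Coerce anything that cannot be parsed as a number into a real NaN.
df_q["B"] = pd.to_numeric(df_q["B"], errors="coerce")
n_missing_after = int(df_q["B"].isna().sum())

# With a real NaN in place, first() skips it while head(1) keeps it.
first_vals = df_q.groupby("A", as_index=False).first()
head_vals = df_q.groupby("A", as_index=False).head(1)

print(n_missing_before, n_missing_after)
print(first_vals)
print(head_vals)
```

After the conversion the three methods should no longer agree on group 1, which is the difference the linked answer describes.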
