Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
298 views
in Technique[技术] by (71.8m points)

python - Pandas: assign an index to each group identified by groupby

When using groupby(), how can I create a DataFrame with a new column containing an index of the group number, similar to dplyr::group_indices in R. For example, if I have

>>> df=pd.DataFrame({'a':[1,1,1,2,2,2],'b':[1,1,2,1,1,2]})
>>> df
   a  b
0  1  1
1  1  1
2  1  2
3  2  1
4  2  1
5  2  2

How can I get a DataFrame like

   a  b  idx
0  1  1  1
1  1  1  1
2  1  2  2
3  2  1  3
4  2  1  3
5  2  2  4

(the order of the idx indexes doesn't matter)

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Here is the solution using ngroup (available as of pandas 0.20.2) from a comment above by Constantino, for those still looking for this function (the equivalent of dplyr::group_indices in R, or egen group() in Stata if you were trying to search with those keywords like me). This is also about 25% faster than the solution given by maxliving according to my own timing.

>>> import pandas as pd
>>> df = pd.DataFrame({'a':[1,1,1,2,2,2],'b':[1,1,2,1,1,2]})
>>> df['idx'] = df.groupby(['a', 'b']).ngroup()
>>> df
   a  b  idx
0  1  1    0
1  1  1    0
2  1  2    1
3  2  1    2
4  2  1    2
5  2  2    3

>>> %timeit df['idx'] = create_index_usingduplicated(df, grouping_cols=['a', 'b'])
1.83 ms ± 67.2 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit df['idx'] = df.groupby(['a', 'b']).ngroup()
1.38 ms ± 30 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...