Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Pandas Percentage count on a DataFrame groupby

I have a DataFrame (mydf) along the lines of the following:

Index   Feature ID  Stuff1  Stuff2
1       True    1   23      12
2       True    1   54      12
3       False   0   45      67
4       True    0   38      29
5       False   1   32      24
6       False   1   59      39
7       True    0   37      32
8       False   0   76      65
9       False   1   32      12
10      True    0   23      15
..n     True    1   21      99

I am trying to calculate the percentage of True and False values of Feature for each ID (0 or 1), and I am looking for two outputs, one per ID:

Feature ID  Percent
True    1   20%
False   1   30%

Feature ID  Percent
True    0   30%
False   0   20%

I have made a few attempts, but I end up with counts for every column and then a percentage for every column.

Here's my bad attempt:

percentageID0 = mydf[ mydf['ID']==0 ].set_index(['Feature']).count()
percentageID1 = mydf[ mydf['ID']==1 ].set_index(['Feature']).count()
fullcount = (mydf.groupby(['ID']).count()).sum()

print((percentageID0 / fullcount) * 100)
print((percentageID1 / fullcount) * 100)

I think I am getting mixed up with the groupby/index format.


1 Reply


It could be just this (the values are fractions of all rows; multiply by 100 for percentages):

In [73]:

print(pd.DataFrame({'Percentage': mydf.groupby(['ID', 'Feature']).size() / len(mydf)}))
            Percentage
ID Feature            
0  False           0.2
   True            0.3
1  False           0.3
   True            0.2
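
For anyone who wants to run this end to end, here is a minimal sketch under a few assumptions: the frame is rebuilt from only the ten fully specified rows in the question (the "..n" row is left out), and the names percent, percent_id0 and percent_id1 are made up for illustration. It uses size() rather than count(), since count() returns a non-null count for every remaining column, which seems to be where the per-column results in the question come from.

```python
import pandas as pd

# The ten fully specified rows from the question; the "..n" row is omitted.
mydf = pd.DataFrame({
    "Feature": [True, True, False, True, False, False, True, False, False, True],
    "ID":      [1, 1, 0, 0, 1, 1, 0, 0, 1, 0],
    "Stuff1":  [23, 54, 45, 38, 32, 59, 37, 76, 32, 23],
    "Stuff2":  [12, 12, 67, 29, 24, 39, 32, 65, 12, 15],
})

# size() gives one row count per (ID, Feature) pair.
# Dividing by the total number of rows turns it into a share of the whole frame.
percent = (
    mydf.groupby(["ID", "Feature"]).size()
    .mul(100)
    .div(len(mydf))
    .rename("Percent")
    .reset_index()
)

# One table per ID, as the question asks for.
percent_id0 = percent[percent["ID"] == 0]
percent_id1 = percent[percent["ID"] == 1]

print(percent_id0)
print(percent_id1)
```

Note that these percentages are relative to all rows, matching the output the question asks for. If the share within each ID is wanted instead, the divisor would have to be the per-ID row count rather than len(mydf).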
