Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
680 views
in Technique[技术] by (71.8m points)

python - How to drop unique rows in a pandas dataframe?

I am stuck with a seemingly easy problem: dropping unique rows in a pandas dataframe. Basically, the opposite of drop_duplicates().

Let's say this is my data:

    A       B   C  
0   foo     0   A
1   foo     1   A
2   foo     1   B
3   bar     1   A

I would like to drop the rows when A, and B are unique, i.e. I would like to keep only the rows 1 and 2.

I tried the following:

# Load Dataframe
df = pd.DataFrame({"A":["foo", "foo", "foo", "bar"], "B":[0,1,1,1], "C":["A","A","B","A"]})

uniques = df[['A', 'B']].drop_duplicates()
duplicates = df[~df.index.isin(uniques.index)]

But I only get the row 2, as 0, 1, and 3 are in the uniques!

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Solutions for select all duplicated rows:

You can use duplicated with subset and parameter keep=False for select all duplicates:

df = df[df.duplicated(subset=['A','B'], keep=False)]
print (df)
     A  B  C
1  foo  1  A
2  foo  1  B

Solution with transform:

df = df[df.groupby(['A', 'B'])['A'].transform('size') > 1]
print (df)
     A  B  C
1  foo  1  A
2  foo  1  B

A bit modified solutions for select all unique rows:

#invert boolean mask by ~
df = df[~df.duplicated(subset=['A','B'], keep=False)]
print (df)
     A  B  C
0  foo  0  A
3  bar  1  A

df = df[df.groupby(['A', 'B'])['A'].transform('size') == 1]
print (df)
     A  B  C
0  foo  0  A
3  bar  1  A

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...