Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
272 views
in Technique[技术] by (71.8m points)

pandas - Find duplicate values in two arrays, Python

I have two arrays (A and B) with about 50 000 values in each. Every value represents an ID. I want to create a pandas dataframe with three columns, col1: values from array A, col2: values from array B, col3: a string with the labels "unique" or "duplicate". In each array the ID:s are unique.

The arrays is of different length. So I can't do something like this to get started.

a = np.array([1, 2, 3, 4, 5])
a = np.array([5, 6, 7, 8, 9, 10])
pd.DataFrame({'a':a, 'a':b})

I was then thinking to create a different pandas dataframe, also with three columns. One for ID, another for which array the ID comes from (a or b). And thereafter group on ID and count occurrences. if >=2 then we have a duplicate.

But I couldn’t figure out how get to numpy arrays after one another in the same column (like rbind in R) and at the same time create the other column based on which array the value come from.

Most likely there are far better solutions that those I have suggested above. Any ideas?

question from:https://stackoverflow.com/questions/65923114/find-duplicate-values-in-two-arrays-python

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

For finding duplicate elements in two arrays, use numpy.intersect1d:

In [458]: a = np.array([1, 2, 3, 4, 5])

In [459]: b = np.array([5, 6, 7, 8, 9, 10])

In [462]: np.intersect1d(a,b)
Out[462]: array([5])

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...