Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Pairwise count of common elements in 2D numpy array

I have a NumPy array of shape (5000, 9) and dtype int. I am trying to create an array of shape (5000, 5000), also of dtype int, that contains the number of elements shared by each pair of rows.

I can accomplish this using itertools.combinations and a loop, but that approach is pretty slow (3-4 minutes on my machine), so I'm searching for a more efficient alternative. Any suggestions would be greatly appreciated!

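from itertools import combinations
import numpy as np

# create a random integer array; each column is a permutation of 0..4999,
# so duplicates within a row are rare but possible
data = np.random.rand(5000, 9).argsort(axis=0)
# one cell per pair of rows, so the result must be 5000 x 5000
counts = np.zeros((5000, 5000), dtype=int)
# fill the upper triangle only (i < j)
for i, j in combinations(range(len(data)), 2):
    counts[i, j] = len(np.intersect1d(data[i], data[j]))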
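
To make the target concrete, here is a small brute-force sketch on a hypothetical 4-row array (the values and the names `small` and `expected` are made up for illustration). Unlike the loop above, it fills both triangles and the diagonal, so the result is a full symmetric matrix.

```python
import numpy as np

# Hypothetical toy input: 4 rows, 3 values each, no duplicates within a row.
small = np.array([[1, 2, 3],
                  [3, 4, 5],
                  [1, 3, 5],
                  [7, 8, 9]])

n = len(small)
expected = np.zeros((n, n), dtype=int)
for i in range(n):
    for j in range(n):
        # np.intersect1d returns the sorted unique values present in both rows
        expected[i, j] = len(np.intersect1d(small[i], small[j]))

print(expected)
# [[3 1 2 0]
#  [1 3 2 0]
#  [2 2 3 0]
#  [0 0 0 3]]
```

Each diagonal entry equals the row length, because a duplicate-free row shares every element with itself.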

Question from: https://stackoverflow.com/questions/65557481/pairwise-count-of-common-elements-in-2d-numpy-array


1 Reply

Let's try one-hot encoding each row over the unique values, then taking pairwise dot products:

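import numpy as np

# sample data: 5000 rows of 9 distinct values drawn from 200 possible values
np.random.seed(1)
data = np.array([np.random.choice(np.arange(200), size=9, replace=False)
                 for _ in range(5000)])

# identify the unique values
uniques = np.unique(data)

# indicator ("dummy") matrix: a[i, u] is 1 if row i contains uniques[u]
a = (data[..., None] == uniques).sum(1)

# output: out[i, k] is the dot product of rows i and k of a,
# i.e. the number of values the two data rows share
out = np.einsum('ij,kj->ik', a, a)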

This takes about 4.5 s on my system.
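
A possible variation, offered as a sketch rather than as part of the answer: build the same kind of indicator matrix through `np.unique(..., return_inverse=True)`, which avoids the intermediate (rows, 9, uniques) boolean array, and compute the product in float32 so that NumPy can hand the matrix multiplication to BLAS (integer products are not dispatched to BLAS). The function name `pairwise_common_counts` and the small `sample` array are hypothetical. Whether this is faster depends on the machine and the NumPy build, so no timing is claimed here.

```python
import numpy as np

def pairwise_common_counts(data):
    """Count the values shared by every pair of rows of a 2D integer array.

    A value repeated within a row is counted once, matching np.intersect1d.
    """
    n_rows, n_cols = data.shape
    # Map every entry to its index in the sorted array of unique values.
    uniques, inverse = np.unique(data, return_inverse=True)
    # The shape of `inverse` differs across NumPy versions, so normalize it.
    inverse = inverse.reshape(n_rows, n_cols)
    # indicator[i, u] is 1.0 if row i contains uniques[u], else 0.0.
    indicator = np.zeros((n_rows, len(uniques)), dtype=np.float32)
    indicator[np.arange(n_rows)[:, None], inverse] = 1.0
    # Row i dot row k adds 1 for each value present in both rows.
    counts = indicator @ indicator.T
    # Counts are small whole numbers, so float32 represents them exactly.
    return np.rint(counts).astype(int)

rng = np.random.default_rng(1)
# Small sample: 50 rows of 9 distinct values drawn from 0..199.
sample = np.array([rng.choice(200, size=9, replace=False) for _ in range(50)])
out = pairwise_common_counts(sample)
print(out.shape)  # (50, 50)
```

For the full 5000-row input the indicator matrix has one column per unique value, so its memory use grows with the number of distinct values in the data.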

