python - Pairwise count of common elements in 2D numpy array

posted Oct 7, 2021 in Technique by 深蓝 (71.8m points)

I have a NumPy array of shape (5000, 9) and dtype int. I am trying to create an array of shape (5000, 5000), also of dtype int, that contains the number of elements shared by each pair of rows.

I can accomplish this using itertools.combinations and a loop, but that approach is pretty slow (3-4 minutes on my machine), so I'm searching for a more efficient alternative. Any suggestions would be greatly appreciated!

from itertools import combinations
import numpy as np

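# create a random int array; each column is a permutation of 0..4999,
# so duplicates within a row are rare
data = np.random.rand(5000, 9).argsort(axis=0)
# one count per pair of rows, so the result must be (5000, 5000)
counts = np.zeros((len(data), len(data)), dtype=int)
# only the upper triangle (i < j) is filled
for i, j in combinations(range(len(data)), 2):
    counts[i, j] = len(np.intersect1d(data[i], data[j]))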

Question from: https://stackoverflow.com/questions/65557481/pairwise-count-of-common-elements-in-2d-numpy-array

1 Reply

深蓝 · Answer 1 · 2021-10-06T18:50:44+0000

Let's try:

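import numpy as np

# sample data: 5000 rows of 9 distinct values drawn from 200 possible values
np.random.seed(1)
data = np.array([np.random.choice(np.arange(200), size=9, replace=False)
                 for _ in range(5000)])

# identify the unique values
uniques = np.unique(data)

# indicator (dummy) encoding: a[i, k] is 1 if uniques[k] occurs in row i
a = (data[..., None] == uniques).sum(1)

# output: out[i, k] is the dot product of rows i and k of a,
# i.e. the number of values the two rows share
out = np.einsum('ij,kj->ik', a, a)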

Takes about 4.5s on my system.
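
To see why the indicator-matrix trick gives the right counts, here is a minimal sketch that checks it against np.intersect1d on a small sample. The function name pairwise_common_counts, the sample size of 50 rows, and the use of a @ a.T in place of einsum are my own choices for illustration, and the sketch assumes that no row contains duplicate values.

```python
import numpy as np

def pairwise_common_counts(data):
    """Count shared values for every pair of rows via an indicator matrix."""
    uniques = np.unique(data)
    # a[i, k] == 1 when uniques[k] occurs in row i (rows assumed duplicate-free)
    a = (data[..., None] == uniques).sum(axis=1)
    # the dot product of two indicator rows counts the values they share;
    # a @ a.T computes the same contraction as einsum('ij,kj->ik', a, a)
    return a @ a.T

rng = np.random.default_rng(0)
# small sample: 50 rows of 9 distinct values drawn from 0..199
small = np.array([rng.choice(200, size=9, replace=False) for _ in range(50)])

fast = pairwise_common_counts(small)

# brute-force reference using np.intersect1d on every pair, including i == j
slow = np.array([[len(np.intersect1d(r, s)) for s in small] for r in small])

assert np.array_equal(fast, slow)
```

Unlike the loop in the question, this fills the whole symmetric matrix, including the diagonal (each row shares all 9 values with itself). If only the pairs with i < j are wanted, np.triu(fast, k=1) should reproduce the loop's layout.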
