I have a NumPy array of shape (5000, 9) and dtype int. I am trying to create an array of shape (5000, 5000) and dtype int that contains the number of elements shared by each pair of rows.
I can accomplish this using itertools.combinations and a loop, but that approach is pretty slow (3-4 minutes on my machine), so I'm searching for a more efficient alternative. Any suggestions would be greatly appreciated!
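from itertools import combinations
import numpy as np
# create a random array; each column is a permutation of 0..4999,
# so duplicates within a row are rare (though not strictly ruled out)
data = np.random.rand(5000, 9).argsort(axis=0)
# one count per pair of rows, so the result is square: (5000, 5000)
counts = np.zeros((len(data), len(data)), dtype=int)
# only the upper triangle (i < j) is filled
for i, j in combinations(range(len(data)), 2):
    counts[i, j] = len(np.intersect1d(data[i], data[j]))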
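One direction that might avoid the Python-level loop, offered as a rough sketch rather than a tested solution at full size: build an indicator matrix with one column per distinct value, then take the dot product of every pair of indicator rows. The function name pairwise_shared_counts is made up for illustration, and the sketch assumes the number of distinct values is small enough for a dense float32 matrix to fit in memory.

```python
import numpy as np

def pairwise_shared_counts(data):
    """Count the distinct values shared by every pair of rows (hypothetical helper)."""
    # Map the values to 0..n_values-1 so they can serve as column indices.
    values, inverse = np.unique(data, return_inverse=True)
    inverse = inverse.reshape(data.shape)

    # indicator[i, v] is 1 when row i contains the v-th distinct value.
    # float32 lets the matrix product use the fast floating-point path;
    # the counts are small integers, so they are represented exactly.
    indicator = np.zeros((data.shape[0], values.size), dtype=np.float32)
    rows = np.arange(data.shape[0])[:, None]
    indicator[rows, inverse] = 1.0

    # The dot product of two indicator rows is the number of shared values.
    shared = indicator @ indicator.T

    # Keep only i < j, matching the combinations-based loop.
    return np.triu(np.rint(shared).astype(int), k=1)

small = np.array([[1, 2, 3], [3, 4, 5], [1, 3, 5]])
print(pairwise_shared_counts(small))
```

Because the indicator is set-based, a value repeated within a row is counted once, which is also how np.intersect1d treats it. For the (5000, 9) case the indicator and the result are each about 5000 x 5000 entries, so memory use is worth checking before running it.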
question from:
https://stackoverflow.com/questions/65557481/pairwise-count-of-common-elements-in-2d-numpy-array