Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
310 views
in Technique[技术] by (71.8m points)

python - Why is B = numpy.dot(A,x) so much slower looping through doing B[i,:,:] = numpy.dot(A[i,:,:],x) )?

I'm getting some efficiency test results that I can't explain.

I want to assemble a matrix B whose i-th entries B[i,:,:] = A[i,:,:].dot(x), where each A[i,:,:] is a 2D matrix, and so is x.

I can do this three ways, to test performance I make random (numpy.random.randn) matrices A = (10,1000,1000), x = (1000,1200). I get the following time results:

(1) single multidimensional dot product

B = A.dot(x)

total time: 102.361 s

(2) looping through i and performing 2D dot products

   # initialize B = np.zeros([dim1, dim2, dim3])
   for i in range(A.shape[0]):
       B[i,:,:] = A[i,:,:].dot(x)

total time: 0.826 s

(3) numpy.einsum

B3 = np.einsum("ijk, kl -> ijl", A, x)

total time: 8.289 s

So, option (2) is the fastest by far. But, considering just (1) and (2), I don't see the big difference between them. How can looping through and doing 2D dot products be ~ 124 times faster? They both use numpy.dot. Any insights?

I include the code used for the above results just below:

import numpy as np
import numpy.random as npr
import time

dim1, dim2, dim3 = 10, 1000, 1200
A = npr.randn(dim1, dim2, dim2)
x = npr.randn(dim2, dim3)

# consider three ways of assembling the same matrix B: B1, B2, B3

t = time.time()
B1 = np.dot(A,x)
td1 = time.time() - t
print "a single dot product of A [shape = (%d, %d, %d)] with x [shape = (%d, %d)] completes in %.3f s" 
  % (A.shape[0], A.shape[1], A.shape[2], x.shape[0], x.shape[1], td1)


B2 = np.zeros([A.shape[0], x.shape[0], x.shape[1]])
t = time.time()
for i in range(A.shape[0]):
    B2[i,:,:] = np.dot(A[i,:,:], x)
td2 = time.time() - t
print "taking %d dot products of 2D dot products A[i,:,:] [shape = (%d, %d)] with x [shape = (%d, %d)] completes in %.3f s" 
  % (A.shape[0], A.shape[1], A.shape[2], x.shape[0], x.shape[1], td2)

t = time.time()
B3 = np.einsum("ijk, kl -> ijl", A, x)
td3 = time.time() - t
print "using np.einsum, it completes in %.3f s" % td3
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

With smaller dims 10,100,200, I get a similar ranking

In [355]: %%timeit
   .....: B=np.zeros((N,M,L))
   .....: for i in range(N):
              B[i,:,:]=np.dot(A[i,:,:],x)
   .....: 
10 loops, best of 3: 22.5 ms per loop
In [356]: timeit np.dot(A,x)
10 loops, best of 3: 44.2 ms per loop
In [357]: timeit np.einsum('ijk,km->ijm',A,x)
10 loops, best of 3: 29 ms per loop

In [367]: timeit np.dot(A.reshape(-1,M),x).reshape(N,M,L)
10 loops, best of 3: 22.1 ms per loop

In [375]: timeit np.tensordot(A,x,(2,0))
10 loops, best of 3: 22.2 ms per loop

the itererative is faster, though not by as much as in your case.

This is probably true as long as that iterating dimension is small compared to the other ones. In that case the overhead of iteration (function calls etc) is small compared to the calculation time. And doing all the values at once uses more memory.

I tried a dot variation where I reshaped A into 2d, thinking that dot does that kind of reshaping internally. I'm surprised that it is actually fastest. tensordot is probably doing the same reshaping (that code if Python readable).


einsum sets up a 'sum of products' iteration involving 4 variables, the i,j,k,m - that is dim1*dim2*dim2*dim3 steps with the C level nditer. So the more indices you have the larger the iteration space.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...