Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
280 views
in Technique[技术] by (71.8m points)

python - Efficient way to compute the Vandermonde matrix

I'm calculating Vandermonde matrix for a fairly large 1D array. The natural and clean way to do this is using np.vander(). However, I found that this is approx. 2.5x slower than a list comprehension based approach.

In [43]: x = np.arange(5000)
In [44]: N = 4

In [45]: %timeit np.vander(x, N, increasing=True)
155 μs ± 205 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

# one of the listed approaches from the documentation
In [46]: %timeit np.flip(np.column_stack([x**(N-1-i) for i in range(N)]), axis=1)
65.3 μs ± 235 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [47]: np.all(np.vander(x, N, increasing=True) == np.flip(np.column_stack([x**(N-1-i) for i in range(N)]), axis=1))
Out[47]: True

I'm trying to understand where the bottleneck is and the reason why does the implementation of native np.vander() is ~ 2.5x slower.

Efficiency matters for my implementation. So, even faster alternatives are also welcome!

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

How about broadcasted exponentiation?

%timeit (x ** np.arange(N)[:, None]).T
43 μs ± 348 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Sanity check -

np.all((x ** np.arange(N)[:, None]).T == np.vander(x, N, increasing=True))
True

The caveat here is that this speedup is possible only if your input array x has a dtype of int. As @Warren Weckesser pointed out in a comment, the broadcasted exponentiation slows down for floating point arrays.


As for why np.vander is slow, take a look at the source code -

x = asarray(x)
if x.ndim != 1:
    raise ValueError("x must be a one-dimensional array or sequence.")
if N is None:
    N = len(x)

v = empty((len(x), N), dtype=promote_types(x.dtype, int))
tmp = v[:, ::-1] if not increasing else v

if N > 0:
    tmp[:, 0] = 1
if N > 1:
    tmp[:, 1:] = x[:, None]
    multiply.accumulate(tmp[:, 1:], out=tmp[:, 1:], axis=1)

return v

The function has to cater to a lot more use cases besides yours, so it uses a more generalized method of computation which is reliable, but slower (I'm specifically pointing to multiply.accumulate).


As a matter of interest, I found another way of computing the Vandermonde matrix, ending up with this:

%timeit x[:, None] ** np.arange(N)
150 μs ± 230 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

It does the same thing, but is so much slower. The answer lies in the fact that the operations are broadcast, but inefficiently.

On the flip side, for float arrays, this actually ends up performing the best.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...