python - numerically stable way to multiply log probability matrices in numpy

Question

Welcome To Ask or Share your Answers For Others

python - numerically stable way to multiply log probability matrices in numpy

posted Oct 6, 2021 in Technique[技术] by 深蓝 (71.8m points)

python - numerically stable way to multiply log probability matrices in numpy

I need to take the matrix product of two NumPy matrices (or other 2d arrays) containing log probabilities. The naive way np.log(np.dot(np.exp(a), np.exp(b))) is not preferred for obvious reasons.

Using

from scipy.misc import logsumexp
res = np.zeros((a.shape[0], b.shape[1]))
for n in range(b.shape[1]):
    # broadcast b[:,n] over rows of a, sum columns
    res[:, n] = logsumexp(a + b[:, n].T, axis=1)

works but runs about 100 times slower than np.log(np.dot(np.exp(a), np.exp(b)))

Using

logsumexp((tile(a, (b.shape[1],1)) + repeat(b.T, a.shape[0], axis=0)).reshape(b.shape[1],a.shape[0],a.shape[1]), 2).T

or other combinations of tile and reshape also work but run even slower than the loop above due to the prohibitively large amounts of memory required for realistically sized input matrices.

I am currently considering writing a NumPy extension in C to compute this, but of course I'd rather avoid that. Is there an established way to do this, or does anybody know of a less memory intensive way of performing this computation?

EDIT: Thanks to larsmans for this solution (see below for derivation):

def logdot(a, b):
    max_a, max_b = np.max(a), np.max(b)
    exp_a, exp_b = a - max_a, b - max_b
    np.exp(exp_a, out=exp_a)
    np.exp(exp_b, out=exp_b)
    c = np.dot(exp_a, exp_b)
    np.log(c, out=c)
    c += max_a + max_b
    return c

A quick comparison of this method to the method posted above (logdot_old) using iPython's magic %timeit function yields the following:

In  [1] a = np.log(np.random.rand(1000,2000))

In  [2] b = np.log(np.random.rand(2000,1500))

In  [3] x = logdot(a, b)

In  [4] y = logdot_old(a, b) # this takes a while

In  [5] np.any(np.abs(x-y) > 1e-14)
Out [5] False

In  [6] %timeit logdot_old(a, b)
1 loops, best of 3: 1min 18s per loop

In  [6] %timeit logdot(a, b)
1 loops, best of 3: 264 ms per loop

Obviously larsmans' method obliterates mine!

question from:https://stackoverflow.com/questions/23630277/numerically-stable-way-to-multiply-log-probability-matrices-in-numpy

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-06T05:13:08+0000

logsumexp works by evaluating the right-hand side of the equation

log(∑ exp[a]) = max(a) + log(∑ exp[a - max(a)])

I.e., it pulls out the max before starting to sum, to prevent overflow in exp. The same can be applied before doing vector dot products:

log(exp[a] ? exp[b])
 = log(∑ exp[a] × exp[b])
 = log(∑ exp[a + b])
 = max(a + b) + log(∑ exp[a + b - max(a + b)])     { this is logsumexp(a + b) }

but by taking a different turn in the derivation, we obtain

log(∑ exp[a] × exp[b])
 = max(a) + max(b) + log(∑ exp[a - max(a)] × exp[b - max(b)])
 = max(a) + max(b) + log(exp[a - max(a)] ? exp[b - max(b)])

The final form has a vector dot product in its innards. It also extends readily to matrix multiplication, so we get the algorithm

def logdotexp(A, B):
    max_A = np.max(A)
    max_B = np.max(B)
    C = np.dot(np.exp(A - max_A), np.exp(B - max_B))
    np.log(C, out=C)
    C += max_A + max_B
    return C

This creates two A-sized temporaries and two B-sized ones, but one of each can be eliminated by

exp_A = A - max_A
np.exp(exp_A, out=exp_A)

and similarly for B. (If the input matrices may be modified by the function, all the temporaries can be eliminated.)

Categories

python - numerically stable way to multiply log probability matrices in numpy

python - numerically stable way to multiply log probability matrices in numpy

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags