Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
235 views
in Technique[技术] by (71.8m points)

python - Working with floating point NumPy arrays for comparison and related operations

I have an array of random floats and I need to compare it to another one that has the same values in a different order. For that matter I use the sum, product (and other combinations depending on the dimension of the table hence the number of equations needed).

Nevertheless, I encountered a precision issue when I perform the sum (or product) on the array depending on the order of the values.

Here is a simple standalone example to illustrate this issue :

import numpy as np

n = 10
m = 4

tag = np.random.rand(n, m)

s1 = np.sum(tag, axis=1)
s2 = np.sum(tag[:, ::-1], axis=1)

# print the number of times s1 is not equal to s2 (should be 0)
print np.nonzero(s1 != s2)[0].shape[0]

If you execute this code it sometimes tells you that s1 and s2 are not equal and the differents is of magnitude of the computer precision.

The problem is I need to use those in functions like np.in1d where I can't really give a tolerance...

Is there a way to avoid this issue?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

For the listed code, you can use np.isclose and with it tolerance values could be specified too.

Using the provided sample, let's see how it could be used -

In [201]: n = 10
     ...: m = 4
     ...: 
     ...: tag = np.random.rand(n, m)
     ...: 
     ...: s1 = np.sum(tag, axis=1)
     ...: s2 = np.sum(tag[:, ::-1], axis=1)
     ...: 

In [202]: np.nonzero(s1 != s2)[0].shape[0]
Out[202]: 4

In [203]: (~np.isclose(s1,s2)).sum() # So, all matches!
Out[203]: 0

To make use of tolerance values in other scenarios, we need to work on a case-by-case basis. So, let's say for an implementation that involve elementwise comparison like in np.in1d, we can bring in broadcasting to do those elementwise equality checks for all elems in first input against all elems in the second one. Then, we use np.abs to get the "closeness factor" and finally compare against the input tolerance to decide the matches. As needed to simulate np.in1d, we do ANY operation along one of the axis. Thus, np.in1d with tolerance using broadcasting could be implemented like so -

def in1d_with_tolerance(A,B,tol=1e-05):
    return (np.abs(A[:,None] - B) < tol).any(1)

As suggested in the comments by OP, we can also round floating-pt numbers after scaling them up and this should be memory efficient, as being needed for working with large arrays. So, a modified version would be like so -

def in1d_with_tolerance_v2(A,B,tol=1e-05):
    S = round(1/tol)
    return np.in1d(np.around(A*S).astype(int),np.around(B*S).astype(int))

Sample run -

In [372]: A = np.random.rand(5)
     ...: B = np.random.rand(7)
     ...: B[3] = A[1] + 0.0000008
     ...: B[6] = A[4] - 0.0000007
     ...: 

In [373]: np.in1d(A,B) # Not the result we want!
Out[373]: array([False, False, False, False, False], dtype=bool)

In [374]: in1d_with_tolerance(A,B)
Out[374]: array([False,  True, False, False,  True], dtype=bool)

In [375]: in1d_with_tolerance_v2(A,B)
Out[375]: array([False,  True, False, False,  True], dtype=bool)

Finally, on how to make it work for other implementations and use cases - It would depend on the implementation itself. But for most cases, np.isclose and broadcasting should help.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...