Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - numpy.sum() giving strange results on large arrays

I seem to have found a pitfall with .sum() on NumPy arrays, but I can't find an explanation. If I sum a large array, I start getting nonsensical answers. This happens silently, and I can't make sense of the output well enough to Google the cause.

For example, this works exactly as expected:

import numpy as np

a = sum(xrange(2000))  # Python 2; use range() on Python 3
print('a is {}'.format(a))

b = np.arange(2000).sum()
print('b is {}'.format(b))

Giving the same output for both:

a is 1999000
b is 1999000

However, this does not work:

c = sum(xrange(200000)) 
print('c is {}'.format(c))

d = np.arange(200000).sum()
print('d is {}'.format(d))

Giving the following output:

c is 19999900000
d is -1474936480

And on an even larger array, it's possible to get back a positive result. This is more insidious because I might not identify that something unusual was happening at all. For example this:

e = sum(xrange(100000000))
print('e is {}'.format(e))

f = np.arange(100000000).sum()
print('f is {}'.format(f))

Gives this:

e is 4999999950000000
f is 887459712

I guessed that this had something to do with data types, and indeed simply using the Python float type seems to fix the problem:

e = sum(xrange(100000000))
print('e is {}'.format(e))

f = np.arange(100000000, dtype=float).sum()
print('f is {}'.format(f))

Giving:

e is 4999999950000000
f is 4.99999995e+15

I have no background in Comp. Sci. and found myself stuck (perhaps this is a dupe). Things I've tried:

  1. numpy arrays have a fixed size. Nope; this seems to show I should hit a MemoryError first.
  2. I might somehow have a 32-bit installation (probably not relevant); nope, I followed this and confirmed I have 64-bit.
  3. Other examples of weird sum behaviour; nope (?) I found this but I can't see how it applies.

Can someone please explain briefly what I'm missing and tell me what I need to read up on? Also, other than remembering to define a dtype each time, is there a way to stop this happening or give a warning?

Possibly relevant:

Windows 7

numpy 1.11.3

Running out of Enthought Canopy on Python 2.7.9


1 Reply


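On Windows, even on a 64-bit system, the default integer type NumPy uses when converting from Python ints is 32-bit (at least in NumPy versions before 2.0, which includes your 1.11.3). On Linux and Mac it is 64-bit. With a 32-bit dtype, .sum() also accumulates in 32 bits, so once the running total exceeds 2**31 - 1 = 2147483647 it silently wraps around modulo 2**32. That is where both the negative result and the misleadingly positive one come from.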
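To see where the strange numbers in the question come from, here is a small pure-Python sketch of the wraparound. wrap_int32 is a hypothetical helper name chosen for illustration, not a NumPy function. It reinterprets the exact total as a signed 32-bit value, which is what a 32-bit accumulator effectively ends up holding:

```python
def wrap_int32(value):
    """Reinterpret a Python int as a signed 32-bit two's-complement value."""
    value &= 0xFFFFFFFF          # keep only the low 32 bits
    if value >= 0x80000000:      # top bit set means the value is negative
        value -= 0x100000000
    return value


def exact_sum(n):
    """Exact value of 0 + 1 + ... + (n - 1), using the closed form."""
    return n * (n - 1) // 2


for n in (2000, 200000, 100000000):
    print('n = {}: exact {}, wrapped {}'.format(
        n, exact_sum(n), wrap_int32(exact_sum(n))))
```

The wrapped values for 200000 and 100000000 are -1474936480 and 887459712, the same as d and f in the question, while the sum for 2000 is small enough to be unaffected.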

Specify a 64-bit integer and it will work:

d = np.arange(200000, dtype=np.int64).sum()
print('d is {}'.format(d))

Output (c is the plain Python sum from the question, shown for comparison):

c is 19999900000
d is 19999900000
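As an alternative to changing how the array is created, .sum() itself accepts a dtype argument that sets the accumulator type. The sketch below forces a 32-bit array and accumulator so the overflow reproduces on any platform, then sums the same array in 64 bits. The variable names are only for illustration:

```python
import numpy as np

# Force 32-bit storage so the example behaves the same on every platform.
a32 = np.arange(200000, dtype=np.int32)
print('dtype is {}, max value is {}'.format(a32.dtype, np.iinfo(a32.dtype).max))

# A 32-bit accumulator wraps around silently.
wrapped = a32.sum(dtype=np.int32)
print('wrapped is {}'.format(wrapped))

# A 64-bit accumulator gives the right answer without rebuilding the array.
correct = a32.sum(dtype=np.int64)
print('correct is {}'.format(correct))
```

Checking arr.dtype and np.iinfo(arr.dtype).max before summing is a quick way to tell whether a given array is at risk.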

While it is not the most elegant solution, you can monkey-patch np.arange using functools.partial:

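from functools import partial

import numpy as np

# Rebind np.arange so it uses 64-bit integers unless a dtype is passed explicitly.
np.arange = partial(np.arange, dtype=np.int64)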

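From now on, np.arange produces 64-bit integers by default. Note that this only changes np.arange; arrays created in other ways, such as np.array([1, 2, 3]), still get the platform's default integer type.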
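On the question of getting a warning: as far as I know NumPy does not warn about integer overflow in array reductions, so one option is a small wrapper. The following is only a sketch. checked_sum is a hypothetical helper, not part of NumPy, and it assumes the true total fits in a 64-bit integer. It is also conservative, because it warns whenever the total would not fit the array's own dtype, even on platforms where a plain .sum() would accumulate in a wider type:

```python
import warnings

import numpy as np


def checked_sum(arr):
    """Sum an integer array in 64 bits; warn if the total exceeds the array's dtype.

    Hypothetical helper, not a NumPy API. Assumes the true total fits in int64.
    """
    arr = np.asarray(arr)
    if arr.dtype.kind not in 'iu':        # not an integer array: nothing to check
        return arr.sum()
    total = int(arr.sum(dtype=np.int64))  # accumulate in a wide type
    info = np.iinfo(arr.dtype)
    if not (info.min <= total <= info.max):
        warnings.warn(
            'sum {} does not fit in {}'.format(total, arr.dtype),
            RuntimeWarning)
    return total


small = checked_sum(np.arange(2000, dtype=np.int32))

# Capture warnings here only so the example can report how many were raised.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    large = checked_sum(np.arange(200000, dtype=np.int32))

print('small is {}'.format(small))
print('large is {} ({} warning raised)'.format(large, len(caught)))
```

Unlike the monkey patch, this leaves np.arange untouched and works for arrays from any source, at the cost of having to call the wrapper explicitly.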

