
python - Why are numpy arrays slower than lists with for loops?

Aren't arrays supposed to be faster, since they consume less memory and, as far as I know, Python doesn't have to check the type of each element the way it does with lists?

import numpy as np
import time


length = 150000000

my_list = range(length)


list_start_time = time.time()


for item in my_list:
    pass

print(f'my_list finished in: {time.time() - list_start_time}')
# Output => my_list finished in: 3.57804799079895

my_array = np.arange(length)

array_start_time = time.time()


for item in my_array:
    pass

print(f'my_array finished in: {time.time() - array_start_time}')
# Output => my_array finished in: 11.598113536834717


1 Reply


my_list = range(length) is a range object: a lazy sequence that produces values on demand, closer to a generator than to a list.

In the loop:

 for i in range(10):
      pass

there's no significant memory use. But even if we did iterate on a list, each i would just be a reference to an item in the list. In effect a simple pointer. The list has a data buffer, which contains pointers to objects elsewhere in memory. Iteration simply requires fetching those pointers, without any object creation or processing.
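A minimal sketch of both points (the sizes are illustrative): the range object stays tiny no matter how long it is, and indexing a list just hands back an existing reference.

import sys

r = range(150_000_000)        # lazy sequence: values are produced on demand
lst = list(range(1_000_000))  # materialized list: a buffer of pointers to int objects

print(sys.getsizeof(r))       # a few dozen bytes, regardless of the length
print(sys.getsizeof(lst))     # roughly 8 MB of pointers for a million items

x = lst[10]                   # indexing returns an existing reference
print(x is lst[10])           # True: no new object is created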

In arr = np.arange(10), arr is an array object with a data buffer containing the bytes that represent the integer values, 8 bytes per item (with the default dtype).

 for i in arr:
      pass

numpy indexes each element, fetching the relevant 8 bytes (relatively fast) and converting them into a numpy scalar object. The whole process is more involved than simply fetching a reference from a list's data buffer. This conversion of raw bytes into a Python-level object is sometimes called 'boxing'.
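A rough check of the raw-bytes point, assuming the default integer dtype (which can vary by platform):

import numpy as np

arr = np.arange(10)
print(arr.dtype)     # typically int64: 8 raw bytes per element in the data buffer
print(arr.itemsize)  # size of each element in bytes
print(type(arr[0]))  # numpy.int64: a scalar object boxed from those bytes on each access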

To illustrate, make alist and array from that list:

In [4]: alist = list(range(1000))
In [5]: arr = np.array(alist)

Indexing the list returns a python int object; from the array we get a numpy object:

In [6]: type(alist[0])
Out[6]: int
In [7]: type(arr[0])
Out[7]: numpy.int64

Some timings:

In [8]: timeit [i for i in alist]
27.9 μs ± 889 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [9]: timeit [i for i in arr]
124 μs ± 625 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Iteration on the list is much faster (as you note), and based on the following timing it looks like iterating over the array effectively does [i for i in list(arr)]:

In [10]: timeit list(arr)
98 μs ± 661 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

The tolist method converts the array to a list all the way down (to native Python ints) and is much faster, so [i for i in arr.tolist()] will actually save time.

In [11]: timeit arr.tolist()
22.8 μs ± 28 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
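The same comparison can be scripted with the timeit module; this is just a sketch (exact numbers depend on the machine), but it should reproduce the ordering above:

from timeit import timeit
import numpy as np

arr = np.arange(1000)

# iterating the array boxes a fresh numpy scalar for every element
print(timeit(lambda: [i for i in arr], number=1000))

# converting once with tolist(), then iterating plain Python ints
print(timeit(lambda: [i for i in arr.tolist()], number=1000))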

Another way to illustrate this boxing is to look at the id of the same element (holding both references at once, to avoid memory reuse):

In [13]: x, y = alist[10], alist[10]; id(x), id(y)
Out[13]: (10914784, 10914784)
In [14]: x, y = arr[10], arr[10]; id(x), id(y)
Out[14]: (140147220887808, 140147220887832)

Each time we index a list element, we get the same id, the same object.

Each time we index an array element, we get a new object. That object creation takes time.
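A shorter, self-contained way to see the same thing, mirroring the id check above:

import numpy as np

alist = list(range(1000))
arr = np.array(alist)

print(alist[10] is alist[10])  # True: the list returns the same int object both times
print(arr[10] is arr[10])      # False: each indexing builds a brand-new numpy scalar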

numpy arrays are faster when the iteration is done in compiled C code, i.e. when we use whole-array operations instead of a Python-level loop.

For example, to add 100 to each element of the array or list:

In [17]: timeit arr + 100
3.46 μs ± 136 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [18]: timeit [i+100 for i in alist]
60.1 μs ± 125 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
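So the practical takeaway, sketched with an arbitrary size: express the work as whole-array operations when possible, and fall back to tolist() when a Python-level loop is unavoidable.

import numpy as np

arr = np.arange(1_000_000)

# preferred: the loop runs in compiled code
out = arr + 100

# if a Python-level loop is truly needed, convert to a list first
out2 = [i + 100 for i in arr.tolist()]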

