Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - NumPy "record array" or "structured array" or "recarray"

What, if any, is the difference between a NumPy "structured array", a "record array" and a "recarray"?

The NumPy docs imply that the first two are the same: if they are, which is the preferred term for this object?

The same documentation says (at the bottom of the page): "You can find some more information on recarrays and structured arrays (including the difference between the two) here." Is there a simple explanation of this difference?



1 Reply


The answer in a nutshell: you should generally use structured arrays rather than recarrays, because structured arrays are faster. The only advantage of recarrays is that they let you write arr.x instead of arr['x']. That can be a convenient shortcut, but it is also error-prone if your column names conflict with NumPy methods or attributes.
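To make the trade-off concrete, here is a minimal sketch. The field names `x` and `shape` and the sample values are made up for illustration. It shows that both access styles return the same data, and that a field named like an existing ndarray attribute is shadowed under attribute access.

```python
import numpy as np

# Structured array with two fields; 'shape' deliberately clashes with ndarray.shape
struct_array = np.array(
    [(1.0, 10), (2.0, 20), (3.0, 30)],
    dtype=[('x', 'f8'), ('shape', 'i8')],
)

# A recarray is the same data viewed through the np.recarray subclass
rec_array = struct_array.view(np.recarray)

x_by_key = struct_array['x']      # works on both kinds of array
x_by_attr = rec_array.x           # attribute access is recarray-only

shape_attr = rec_array.shape      # the ndarray attribute, not the field
shape_field = rec_array['shape']  # the field is still reachable by key
```

Key-based access works for every field name, which is one reason to prefer it in code that does not control the column names.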

See this excerpt from @jakevdp's book for a more detailed explanation. In particular, he notes that simply accessing columns of structured arrays can be around 20x to 30x faster than accessing columns of recarrays. However, his example uses a very small array with just 4 rows and times only the column access itself, not any standard operation on the data.

For simple operations on larger arrays, the difference is likely to be much smaller, although structured arrays are still faster. For example, here are a structured array and a record array, each with 10,000 rows (the code to create the arrays from a dataframe is borrowed from @jpp's answer here).

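import numpy as np
import pandas as pd

n = 10_000
df = pd.DataFrame({'x': np.random.randn(n)})
df['y'] = df.x.astype(int)

# Record array built directly from the dataframe
rec_array = df.to_records(index=False)

# Structured array with the same field names and dtypes as the dataframe columns
s = df.dtypes
struct_array = np.array([tuple(x) for x in df.values], dtype=list(zip(s.index, s)))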
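The two kinds of array can also be converted into each other without going through pandas. The following is a sketch using the view-based idiom described in NumPy's structured-array documentation; the small sample array is an assumption made for illustration, not part of the benchmark.

```python
import numpy as np

# Small structured array standing in for real data (illustrative values)
struct_array = np.array([(0.5, 0), (1.5, 1)], dtype=[('x', 'f8'), ('y', 'i8')])

# Structured array -> recarray: a view, no data is copied
rec_array = struct_array.view(np.recarray)

# recarray -> plain structured array: reset both the dtype and the array type
plain = rec_array.view(rec_array.dtype.fields or rec_array.dtype, np.ndarray)

# The views share memory, so a write through one is visible in the others
plain['y'][0] = 7
```

Because these are views rather than copies, the conversion itself is cheap regardless of the number of rows.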

If we do a standard operation such as multiplying a column by 2, the structured array is about 50% faster (roughly 9 μs versus 14 μs per loop):

%timeit struct_array['x'] * 2
9.18 μs ± 88.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit rec_array.x * 2
14.2 μs ± 314 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
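The `%timeit` lines above only work inside IPython or Jupyter. As a rough sketch, the same comparison can be run as a plain script with the standard-library `timeit` module. The arrays here are built directly with NumPy rather than from a dataframe, and the loop count is an arbitrary choice; absolute numbers will vary with the machine and the NumPy version.

```python
import timeit

import numpy as np

n = 10_000

# Structured array with a float field and an int field, filled with random data
struct_array = np.zeros(n, dtype=[('x', 'f8'), ('y', 'i8')])
struct_array['x'] = np.random.randn(n)
struct_array['y'] = struct_array['x'].astype(int)

# Same data viewed as a recarray
rec_array = struct_array.view(np.recarray)

loops = 10_000  # arbitrary; raise it for more stable averages

# Average seconds per call for each access style
t_struct = timeit.timeit(lambda: struct_array['x'] * 2, number=loops) / loops
t_rec = timeit.timeit(lambda: rec_array.x * 2, number=loops) / loops

print(f"structured: {t_struct * 1e6:.2f} us per loop")
print(f"recarray:   {t_rec * 1e6:.2f} us per loop")
```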
