python - Why does Pandas iterate over DataFrame columns by default?

posted Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

Trying to understand the design rationale behind some of Pandas' features.

If I have a DataFrame with 3560 rows and 18 columns, then

len(frame)

is 3560, but

len([a for a in frame])

is 18.

Maybe this feels natural to someone coming from R; to me it doesn't feel very 'Pythonic'. Is there an introduction to the underlying design rationales for Pandas somewhere?

1 Reply

深蓝 · Answer 1 · 2021-10-17T00:50:12+0000

A DataFrame is primarily a column-oriented data structure. Under the hood, its data is stored in blocks, with roughly one block per dtype, and each column has exactly one dtype. Accessing a column therefore only requires selecting that column from a single block.

Selecting a single row, in contrast, requires picking the matching row out of every block, creating a new Series, and copying each block's values into it. Iterating through the rows of a DataFrame is thus (under the hood) a less natural operation than iterating through its columns.
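As a rough illustration of the point above, here is a minimal sketch using a small made-up frame (the column names `a`, `b`, `c` and their values are assumptions, not taken from the question). It shows that iterating over a DataFrame yields its column labels, much like iterating over a dict yields its keys, and that a row pulled out of mixed-dtype columns has to be packed into a new Series with a common dtype.

```python
import pandas as pd

# Hypothetical example frame: three rows, three columns of different dtypes
frame = pd.DataFrame({
    "a": [1, 2, 3],          # integer column
    "b": [0.5, 1.5, 2.5],    # float column
    "c": ["x", "y", "z"],    # string column
})

# len() counts rows, but iteration walks the column labels (dict-like)
row_count = len(frame)
column_labels = [label for label in frame]

# Each column is a Series with a single dtype of its own
column_dtypes = {label: frame[label].dtype for label in frame}

# A single row spans all columns, so pandas builds a new Series for it;
# with mixed column dtypes the row falls back to the generic object dtype
first_row = frame.iloc[0]

print(row_count, column_labels)
print(column_dtypes)
print(first_row.dtype)
```

Seen this way, `for label in frame` behaves like looping over a dict that maps column labels to Series, which is why `len([a for a in frame])` in the question returns the number of columns.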

If you need to iterate through the rows, you still can by calling df.iterrows(). Avoid df.iterrows() where possible, though, for the same reason it is unnatural: it builds a new Series for every row, and that copying makes it slower than iterating through columns.
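To make the trade-off concrete, here is a small sketch comparing `df.iterrows()` with two commonly used alternatives, `df.itertuples()` and a plain column-wise (vectorized) expression. The frame and its column names `qty` and `price` are invented for illustration, and this only demonstrates that the three approaches agree on the result; it does not measure speed.

```python
import pandas as pd

# Hypothetical data: one integer column and one float column
df = pd.DataFrame({"qty": [1, 2, 3], "price": [0.5, 1.5, 2.5]})

# iterrows(): a new Series is built for every row; because the row mixes
# int and float columns, its values are upcast to a common float dtype
first_row_dtype = next(df.iterrows())[1].dtype
totals_iterrows = [row["qty"] * row["price"] for _, row in df.iterrows()]

# itertuples(): yields lightweight namedtuples instead of Series
totals_itertuples = [t.qty * t.price for t in df.itertuples(index=False)]

# Column-wise arithmetic: operates on whole columns, no Python-level row loop
totals_vectorized = (df["qty"] * df["price"]).tolist()

print(first_row_dtype)
print(totals_iterrows, totals_itertuples, totals_vectorized)
```

When the computation can be expressed on whole columns, as in the last line, that form usually fits the column-oriented layout best; `itertuples()` is a reasonable fallback when a per-row loop is genuinely needed.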
