Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - In Pandas, does .iloc method give a copy or view?

The result seems a bit random to me: sometimes it's a copy, sometimes it's a view. For example:

import pandas as pd

df = pd.DataFrame([{'name': 'Marry', 'age': 21}, {'name': 'John', 'age': 24}],
                  index=['student1', 'student2'], columns=['age', 'name'])

df
          age   name
student1   21  Marry
student2   24   John

Now let me try to modify it a little:

df2 = df.loc['student1']
df2['age'] = 23
df
          age   name
student1   21  Marry
student2   24   John

As you can see, nothing changed: df2 is a copy. However, if I add another student to the DataFrame...

df.loc['student3'] = ['old', 'Tom']
df
          age   name
student1   21  Marry
student2   24   John
student3  old    Tom

...and then try to change the age again:

df3 = df.loc['student1']
df3['age'] = 33
df
          age   name
student1   33  Marry
student2   24   John
student3  old    Tom

Now df3 has suddenly become a view. What is going on? Is the value 'old' the key?


1 Reply

You are starting with a DataFrame that has two columns with two different dtypes:

df.dtypes
Out: 
age      int64
name    object
dtype: object

Under the hood, columns of different dtypes are stored in different NumPy arrays, so there are two separate blocks (df.blocks worked in the pandas version used here but has since been removed from pandas):

df.blocks

Out: 
{'int64':           age
 student1   21
 student2   24, 'object':            name
 student1  Marry
 student2   John}

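When you select the first row of this DataFrame, pandas has to take one value from each block and combine them into a new Series, so it must create a copy. In the pandas version used here, the is_copy attribute (also since removed) confirms this by holding a reference back to the parent DataFrame:

df2.is_copy
Out: <weakref at 0x7fc4487a9228; to 'DataFrame' at 0x7fc4488f9dd8>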
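To make the block argument concrete, here is a simplified NumPy model. It assumes each block is a plain 2-D array shaped (columns, rows); pandas' real block manager does more bookkeeping, so treat this as an illustration of the idea rather than pandas' implementation. Building a row from two blocks means allocating a new array, so writing to it cannot reach either block.

```python
import numpy as np

# Simplified model (assumption): one 2-D NumPy array per dtype "block",
# shaped (n_columns_in_block, n_rows).
int_block = np.array([[21, 24]], dtype=np.int64)         # column: age
obj_block = np.array([["Marry", "John"]], dtype=object)  # column: name

# Row "student1" needs one value from each block, so a new array is built.
row = np.concatenate([int_block[:, 0].astype(object), obj_block[:, 0]])
row[0] = 23  # modifies only the freshly built row

print(row)              # [23 'Marry']
print(int_block[0, 0])  # 21: the "DataFrame" data is unchanged
print(np.shares_memory(row, int_block), np.shares_memory(row, obj_block))  # False False
```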

In the second attempt, you change the dtypes. Since 'old' cannot be stored in an integer array, pandas upcasts the age column to object dtype:

df.loc['student3'] = ['old','Tom']

df.dtypes
Out: 
age     object
name    object
dtype: object
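The upcast itself can be checked with public pandas API only. This sketch rebuilds the frame from the question; passing columns= to pin the column order is an assumption added so the positional row ['old', 'Tom'] lines up with age and name. It compares the age dtype before and after adding the row.

```python
import pandas as pd

df = pd.DataFrame(
    [{"name": "Marry", "age": 21}, {"name": "John", "age": 24}],
    index=["student1", "student2"],
    columns=["age", "name"],  # pin the order so the list below lines up
)
age_before = df["age"].dtype  # int64: the ages are plain integers

# Setting with enlargement: 'old' cannot live in an int64 array,
# so the age column is upcast to object dtype.
df.loc["student3"] = ["old", "Tom"]
age_after = df["age"].dtype

print(age_before, age_after)  # int64 object
```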

Now all data in this DataFrame is stored in a single block (a single NumPy array):

df.blocks

Out: 
{'object':           age   name
 student1   21  Marry
 student2   24   John
 student3  old    Tom}

At this point, selecting the first row is just a slice of that one NumPy array, which needs no copy, so it returns a view:

df3._is_view
Out: True
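Continuing the same simplified NumPy model (again an assumption about layout, not pandas' internals verbatim): once every column lives in one object array, selecting a row is a basic slice, and a write through that slice changes the parent array.

```python
import numpy as np

# Simplified model (assumption): after the upcast, all columns share one
# object block shaped (n_columns, n_rows).
block = np.array([[21, 24, "old"], ["Marry", "John", "Tom"]], dtype=object)

# Row "student1" is column 0 of the block: basic slicing returns a view.
row = block[:, 0]
print(np.shares_memory(row, block))  # True

row[0] = 33         # write through the view
print(block[0, 0])  # 33: the underlying block changed too
```

Because whether you get a view depends on this block layout, the dependable way to change the DataFrame is to assign through it in a single step, e.g. df.loc['student1', 'age'] = 33. Note also that under pandas' Copy-on-Write mode (an option in pandas 2.x, the default in pandas 3.0), modifying a row obtained with df.loc[...] never writes back to df, so the second result in the question does not reproduce there.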


...