Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
343 views
in Technique[技术] by (71.8m points)

python - How to fill the missing record of Pandas dataframe in pythonic way?

I have a Pandas dataframe 'df' like this :

         X   Y  
IX1 IX2
A   A1  20  30
    A2  20  30
    A5  20  30
B   B2  20  30
    B4  20  30

It lost some rows, and I want to fill in the gap in the middle like this:

         X   Y  
IX1 IX2
A   A1  20  30
    A2  20  30
    A3  NaN NaN
    A4  NaN NaN
    A5  20  30
B   B2  20  30
    B3  NaN NaN
    B4  20  30

Is there a pythonic way to do this ?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You need to construct your full index, and then use the reindex method of the dataframe. Like so...

import pandas
import StringIO
datastring = StringIO.StringIO("""
C1,C2,C3,C4
A,A1,20,30
A,A2,20,30
A,A5,20,30
B,B2,20,30
B,B4,20,30""")

dataframe = pandas.read_csv(datastring, index_col=['C1', 'C2'])
full_index = [('A', 'A1'), ('A', 'A2'), ('A', 'A3'), 
              ('A', 'A4'), ('A', 'A5'), ('B', 'B1'), 
              ('B', 'B2'), ('B', 'B3'), ('B', 'B4')]
new_df = dataframe.reindex(full_index)
new_df
      C3  C4
A A1  20  30
  A2  20  30
  A3 NaN NaN
  A4 NaN NaN
  A5  20  30
B B1 NaN NaN
  B2  20  30
  B3  20  30
  B4  20  30

And then you can use the fillna method to set the NaNs to whatever you want.

update (June 2014)

Just had to revisit this myself... In the current version of pandas, there is a function to build MultiIndex from the Cartesian product of iterables. So the above solution could become:

datastring = StringIO.StringIO("""
C1,C2,C3,C4
A,1,20,30
A,2,20,30
A,5,20,30
B,2,20,30
B,4,20,30""")

dataframe = pandas.read_csv(datastring, index_col=['C1', 'C2'])
full_index = pandas.MultiIndex.from_product([('A', 'B'), range(6)], names=['C1', 'C2'])
new_df = dataframe.reindex(full_index)
new_df
      C3  C4
C1 C2
 A  1  20  30
    2  20  30
    3 NaN NaN
    4 NaN NaN
    5  20  30
 B  1 NaN NaN
    2  20  30
    3  20  30
    4  20  30
    5 NaN NaN

Pretty elegant, in my opinion.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...