Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

python - Sort pandas MultiIndex

I have created a DataFrame with a MultiIndex from another DataFrame:

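import pandas as pd

# One tuple per row, built from the columns that should become index levels
arrays = [df['bus_uid'], df['bus_type'], df['type'],
          df['obj_uid'], df['datetime']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['bus_uid', 'bus_type', 'type',
                                                 'obj_uid', 'datetime'])
# New frame holding only the values; its single column is labelled 0
multindexed_df = pd.DataFrame(df['val'].values, index=index)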
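To make the question reproducible, here is a minimal sketch with a small, made-up stand-in for df. The values are hypothetical; only the column names come from the question. It builds the same five-level index with pd.MultiIndex.from_arrays, which skips the intermediate list of tuples, and checks that the index is not yet sorted because rows keep their original order.

```python
import pandas as pd

# Hypothetical sample data; only the column names are taken from the question.
df = pd.DataFrame({
    'bus_uid':  ['b2', 'b1', 'b2', 'b1'],
    'bus_type': ['ac', 'dc', 'ac', 'dc'],
    'type':     ['load', 'gen', 'gen', 'load'],
    'obj_uid':  ['o3', 'o1', 'o2', 'o4'],
    'datetime': pd.to_datetime(['2015-01-01', '2015-01-01',
                                '2015-01-02', '2015-01-02']),
    'val':      [1.0, 2.0, 3.0, 4.0],
})

names = ['bus_uid', 'bus_type', 'type', 'obj_uid', 'datetime']

# Same index as zipping the columns and calling MultiIndex.from_tuples.
index = pd.MultiIndex.from_arrays([df[c] for c in names], names=names)
multindexed_df = pd.DataFrame({'val': df['val'].values}, index=index)

# The rows keep their original order, so the index is not sorted yet.
print(multindexed_df.index.is_monotonic_increasing)  # False

# set_index builds an equal MultiIndex in one call.
alt = df.set_index(names)
print(alt.index.equals(index))  # True
```

Passing a dict to the DataFrame constructor keeps the column name 'val' instead of the default label 0.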

This works fine, as described in the documentation (http://pandas.pydata.org/pandas-docs/stable/advanced.html).

The documentation also states, under "The need for sortedness with MultiIndex", that the labels must be sorted for indexing and slicing to work correctly.
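A small sketch of why sortedness matters, using a hypothetical two-level index whose outer labels are deliberately out of order. Label-range slicing with .loc fails on the unsorted index (recent pandas raises UnsortedIndexError, which subclasses KeyError, so catching KeyError also covers older versions) and succeeds once sort_index() has been applied.

```python
import pandas as pd

# Hypothetical two-level index with the outer labels out of order.
idx = pd.MultiIndex.from_tuples(
    [('b', 1), ('a', 2), ('b', 0), ('a', 1)], names=['outer', 'inner'])
df = pd.DataFrame({'val': [10, 20, 30, 40]}, index=idx)

print(df.index.is_monotonic_increasing)  # False

raised = False
try:
    # A label range on the outer level needs a lexsorted index.
    df.loc['a':'b']
except KeyError as exc:  # UnsortedIndexError is a KeyError subclass
    raised = True
    print(type(exc).__name__, exc)

# After sorting, the same slice works.
sorted_df = df.sort_index()
print(sorted_df.index.is_monotonic_increasing)  # True
print(sorted_df.loc['a':'b'])
```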

However, neither

multindexed_df.sort_index(level=0)

nor

multindexed_df.sort_index(level='bus_uid')

works; both raise TypeError: sort_index() got an unexpected keyword argument 'level'.

Looking at the docstring of sort_index(), it seems that "by" is my new friend instead of "level":

by : object
  Column name(s) in frame. Accepts a column name or a list for a nested sort. A tuple will be interpreted as the levels of a multi-index.

My question is: how can I sort my MultiIndex so that all functionality (slicing, etc.) works correctly?


The answer depends on the pandas version you are working with. With pandas 0.17.0 or newer, you can indeed use the level keyword to specify which level of the MultiIndex to sort by:

df = df.sort_index(level=0)
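A minimal sketch of the level keyword on pandas 0.17.0 or newer, with hypothetical data. A level can be given by position or by name, and with the default sort_remaining=True the other levels are also sorted within each group of the chosen level.

```python
import pandas as pd

# Hypothetical index using two of the question's level names.
idx = pd.MultiIndex.from_tuples(
    [('b2', 'o3'), ('b1', 'o4'), ('b2', 'o1'), ('b1', 'o2')],
    names=['bus_uid', 'obj_uid'])
df = pd.DataFrame({'val': [1, 2, 3, 4]}, index=idx)

# Sorting by position and by name gives the same result.
by_pos = df.sort_index(level=0)
by_name = df.sort_index(level='bus_uid')
print(by_pos.equals(by_name))  # True

# sort_remaining=True (the default) also orders obj_uid inside each bus_uid.
print(by_pos.index.tolist())
# [('b1', 'o2'), ('b1', 'o4'), ('b2', 'o1'), ('b2', 'o3')]
```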

If you have an older pandas (< 0.17.0), the level keyword is not yet available, but you can use the sortlevel method instead:

df = df.sortlevel(level=0)
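If the same code has to run on both old and new pandas, one option is a small wrapper that tries the newer keyword and falls back to sortlevel(). The helper name sort_by_level is hypothetical, not part of pandas. Keep in mind that DataFrame.sortlevel was deprecated in pandas 0.20 and is not available in current pandas, so on a modern install only the first branch ever runs.

```python
import pandas as pd


def sort_by_level(frame, level=0):
    """Return frame sorted by one MultiIndex level (hypothetical helper)."""
    try:
        # pandas >= 0.17.0 accepts the level keyword directly.
        return frame.sort_index(level=level)
    except TypeError:
        # pandas < 0.17.0 rejects the keyword; fall back to sortlevel().
        return frame.sortlevel(level=level)


idx = pd.MultiIndex.from_tuples(
    [('b', 1), ('a', 2), ('a', 1)], names=['outer', 'inner'])
df = pd.DataFrame({'val': [1, 2, 3]}, index=idx)

result = sort_by_level(df, level='outer')
print(result.index.tolist())  # [('a', 1), ('a', 2), ('b', 1)]
```

Catching TypeError is a blunt instrument: an unrelated TypeError raised during sorting would also trigger the fallback, so checking pd.__version__ explicitly is a stricter alternative.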

Note that if you want to sort by all levels, you don't need the level keyword at all; you can simply do:

df = df.sort_index()

This works in both older and newer versions of pandas.
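A short sketch of what sort_index() without a level does, on hypothetical data: it orders rows lexicographically by every level, outermost first. Because the question also asks about slicing, the sketch then slices on the inner level with pd.IndexSlice, which relies on the index being sorted in this way.

```python
import pandas as pd

# Hypothetical unsorted two-level index.
idx = pd.MultiIndex.from_tuples(
    [('b', 2), ('a', 9), ('b', 1), ('a', 3)], names=['outer', 'inner'])
df = pd.DataFrame({'val': [1, 2, 3, 4]}, index=idx)

# No level argument: sort by all levels, outermost first.
df = df.sort_index()
print(df.index.tolist())  # [('a', 3), ('a', 9), ('b', 1), ('b', 2)]
print(df.index.is_monotonic_increasing)  # True

# Slice every outer label, keeping inner values from 2 to 3 inclusive.
inner_slice = df.loc[pd.IndexSlice[:, 2:3], :]
print(inner_slice)
```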


For a summary of these changes in the sorting API, see http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#changes-to-sorting-api

