Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
271 views
in Technique[技术] by (71.8m points)

python - Convert pandas series of lists to dataframe

I have a series made of lists

import pandas as pd
s = pd.Series([[1, 2, 3], [4, 5, 6]])

and I want a DataFrame with each column a list.

None of from_items, from_records, DataFrame Series.to_frame seem to work.

How to do this?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

As @Hatshepsut pointed out in the comments, from_items is deprecated as of version 0.23. The link suggests to use from_dict instead, so the old answer can be modified to:

pd.DataFrame.from_dict(dict(zip(s.index, s.values)))

--------------------------------------------------OLD ANSWER-------------------------------------------------------------

You can use from_items like this (assuming that your lists are of the same length):

pd.DataFrame.from_items(zip(s.index, s.values))

   0  1
0  1  4
1  2  5
2  3  6

or

pd.DataFrame.from_items(zip(s.index, s.values)).T

   0  1  2
0  1  2  3
1  4  5  6

depending on your desired output.

This can be much faster than using an apply (as used in @Wen's answer which, however, does also work for lists of different length):

%timeit pd.DataFrame.from_items(zip(s.index, s.values))
1000 loops, best of 3: 669 μs per loop

%timeit s.apply(lambda x:pd.Series(x)).T
1000 loops, best of 3: 1.37 ms per loop

and

%timeit pd.DataFrame.from_items(zip(s.index, s.values)).T
1000 loops, best of 3: 919 μs per loop

%timeit s.apply(lambda x:pd.Series(x))
1000 loops, best of 3: 1.26 ms per loop

Also @Hatshepsut's answer is quite fast (also works for lists of different length):

%timeit pd.DataFrame(item for item in s)
1000 loops, best of 3: 636 μs per loop

and

%timeit pd.DataFrame(item for item in s).T
1000 loops, best of 3: 884 μs per loop

Fastest solution seems to be @Abdou's answer (tested for Python 2; also works for lists of different length; use itertools.zip_longest in Python 3.6+):

%timeit pd.DataFrame.from_records(izip_longest(*s.values))
1000 loops, best of 3: 529 μs per loop

An additional option:

pd.DataFrame(dict(zip(s.index, s.values)))

   0  1
0  1  4
1  2  5
2  3  6

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...