Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
268 views
in Technique[技术] by (71.8m points)

pandas - How to split data from a merged cell into other cells in its same row of a Python data frame?

I have a sample of a data frame which looks like this:

+---+--------------------------------------------------------------------------------------+---------------+--------------------------------------------+
|   | Date                                                                                 | Professional  | Description                                |
+---+--------------------------------------------------------------------------------------+---------------+--------------------------------------------+
| 0 | 2019-12-19 00:00:00                                                                  | Katie Cool    | Travel to Space ...                        |
+---+--------------------------------------------------------------------------------------+---------------+--------------------------------------------+
| 1 | 2019-12-20 00:00:00                                                                  | Jenn Blossoms | Review stuff; prepare cancellations of ... |
+---+--------------------------------------------------------------------------------------+---------------+--------------------------------------------+
| 2 | 2019-12-27 00:00:00                                                                  | Jenn Blossoms | Review lots of stuff/o...                  |
+---+--------------------------------------------------------------------------------------+---------------+--------------------------------------------+
| 3 | 2019-12-27 00:00:00                                                                  | Jenn Blossoms | Draft email to world leader...             |
+---+--------------------------------------------------------------------------------------+---------------+--------------------------------------------+
| 4 | 2019-12-30 00:00:00                                                                  | Jenn Blossoms | Review this thing.                         |
+---+--------------------------------------------------------------------------------------+---------------+--------------------------------------------+
| 5 | 12-30-2019 Jenn Blossoms Telephone   Call   to   A.   Bell   return   her   multiple | NaN           | NaN                                        |
|   | voicemails.                                                                          |               |                                            |
+---+--------------------------------------------------------------------------------------+---------------+--------------------------------------------+

Much of the row's data is in the date cell.

I would like for the sample to look like this:

+---+---------------------+---------------+-------------------------------------------------------------+
|   | Date                | Professional  | Description                                                 |
+---+---------------------+---------------+-------------------------------------------------------------+
| 0 | 2019-12-19 00:00:00 | Katie Cool    | Travel to Space ...                                         |
+---+---------------------+---------------+-------------------------------------------------------------+
| 1 | 2019-12-20 00:00:00 | Jenn Blossoms | Review stuff; prepare cancellations of ...                  |
+---+---------------------+---------------+-------------------------------------------------------------+
| 2 | 2019-12-27 00:00:00 | Jenn Blossoms | Review lots of stuff/o...                                   |
+---+---------------------+---------------+-------------------------------------------------------------+
| 3 | 2019-12-27 00:00:00 | Jenn Blossoms | Draft email to world leader...                              |
+---+---------------------+---------------+-------------------------------------------------------------+
| 4 | 2019-12-30 00:00:00 | Jenn Blossoms | Review this thing.                                          |
+---+---------------------+---------------+-------------------------------------------------------------+
| 5 | 12-30-2019          | Jenn Blossoms | Telephone   Call   to   A.   Bell   return   her   multiple |
|   |                     |               | voicemails.                                                 |
+---+---------------------+---------------+-------------------------------------------------------------+

I have tried this code:

date = dftopdata['Date'].str.extract('(d{2}-d{2}-d{4})(sw+sw+)s(w+.*)')[0]
name = dftopdata['Date'].str.extract('(d{2}-d{2}-d{4})(sw+sw+)s(w+.*)')[1]
description = dftopdata['Date'].str.extract('(d{2}-d{2}-d{4})(sw+sw+)s(w+.*)')[2]

dftopdata.loc[pd.to_datetime(dftopdata['Date'],errors='coerce').isnull(),'Professional'] = name
dftopdata.loc[pd.to_datetime(dftopdata['Date'],errors='coerce').isnull(),'Description'] = description
dftopdata.loc[pd.to_datetime(dftopdata['Date'],errors='coerce').isnull(),'Date'] = date

But when I run the above code, the data frame sample looks like this:

+---+------------+---------------+--------------------------------------------+
|   | Date       | Professional  | Description                                |
+---+------------+---------------+--------------------------------------------+
| 0 | 12/19/2019 | Katie Cool    | Travel to space ...                        |
+---+------------+---------------+--------------------------------------------+
| 1 | 12/20/2019 | Jenn Blossoms | Review stuff; prepare cancellations of ... |
+---+------------+---------------+--------------------------------------------+
| 2 | 12/27/2019 | Jenn Blossoms | Review lots of stuff/o…                    |
+---+------------+---------------+--------------------------------------------+
| 3 | 12/27/2019 | Jenn Blossoms | Draft email to world leader...             |
+---+------------+---------------+--------------------------------------------+
| 4 | 12/30/2019 | Jenn Blossoms | Review this thing.                         |
+---+------------+---------------+--------------------------------------------+
| 5 | NaN        | NaN           | NaN                                        |
+---+------------+---------------+--------------------------------------------+
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You can use the str.split method to split the string into "words".

df['list_of_words'] = dftopdata['Date'].str.split()

If there is a pattern to split the Professional and Description parts from this list_of_words - you can use it. For instance, if the first 2 words of list_of_words make up the name of the professional then you can do -

df['Professional'] = df.apply(lambda x: ' '.join(x['list_of_words'][:2]), axis=1)
df['Description'] = df.apply(lambda x: ' '.join(x['list_of_words'][2:]), axis=1)

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...