Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
269 views
in Technique[技术] by (71.8m points)

python - Merge two datasets in Pandas

I have previously worked with Stata and am now trying to get the same done with Python. However, I have troubles with the merge command. Somehow I must be missing something. My two dataframes I want to merge look like this:

 df1:
 Date id Market_Cap
 2000 1  400
 2000 2  200
 2001 1  410
 2001 2  220

 df2:
 id Ticker
 1   Shell
 2   ExxonMobil

My aim now is to get the following dataset:

Date id Market_Cap  Ticker
2000 1  400        Shell 
2000 2  200        ExxonMobil 
2001 1  410        Shell 
2001 2  220        ExxonMobil

I tried the following command:

merged= pd.merge(df1, df2, how="left", on="id")

This merges the datasets, but gives me only nan's in the Ticker column. I looked at several sources and maybe I am mistaken, but isn't the "left" command the right thing do to for my purpose? I also tried "right" and "outer". They don't get the result I want to and "inner" does not seem to work here in general.

Am I missing something crucial?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Thyere is problem your column id in one df is object (obviously string) and another int, so no match and get NaN.

If have same dtypes:

print (df1['id'].dtypes)
int64
print (df2['id'].dtypes)
int64

merged = pd.merge(df1, df2, how="left", on="id")
print (merged)
   Date  id  Market_Cap      Ticker
0  2000   1         400       Shell
1  2000   2         200  ExxonMobil
2  2001   1         410       Shell
3  2001   2         220  ExxonMobil

Another solution if need add only one new column is map:

df1['Ticker'] = df1['id'].map(df2.set_index('id')['Ticker'])
print (df1)
   Date  id  Market_Cap      Ticker
0  2000   1         400       Shell
1  2000   2         200  ExxonMobil
2  2001   1         410       Shell
3  2001   2         220  ExxonMobil

Simulate your problem:

print (df1['id'].dtypes)
object
print (df2['id'].dtypes)
int64

df1['Ticker'] = df1['id'].map(df2.set_index('id')['Ticker'])
print (df1)
   Date id  Market_Cap Ticker
0  2000  1         400    NaN
1  2000  2         200    NaN
2  2001  1         410    NaN
3  2001  2         220    NaN

And solution is convert to int by astype (or column id in df2 to str):

df1['id'] = df1['id'].astype(int)
#alternatively
#df2['id'] = df2['id'].astype(str)
df1['Ticker'] = df1['id'].map(df2.set_index('id')['Ticker'])
print (df1)
   Date  id  Market_Cap      Ticker
0  2000   1         400       Shell
1  2000   2         200  ExxonMobil
2  2001   1         410       Shell
3  2001   2         220  ExxonMobil

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...