Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
462 views
in Technique[技术] by (71.8m points)

python - Trying to merge 2 dataframes but get ValueError

These are my two dataframes saved in two variables:

> print(df.head())
>
          club_name  tr_jan  tr_dec  year
    0  ADO Den Haag    1368    1422  2010
    1  ADO Den Haag    1455    1477  2011
    2  ADO Den Haag    1461    1443  2012
    3  ADO Den Haag    1437    1383  2013
    4  ADO Den Haag    1386    1422  2014
> print(rankingdf.head())
>
           club_name  ranking  year
    0    ADO Den Haag    12    2010
    1    ADO Den Haag    13    2011
    2    ADO Den Haag    11    2012
    3    ADO Den Haag    14    2013
    4    ADO Den Haag    17    2014

I'm trying to merge these two using this code:

new_df = df.merge(ranking_df, on=['club_name', 'year'], how='left')

The how='left' is added because I have less datapoints in my ranking_df than in my standard df.

The expected behaviour is as such:

> print(new_df.head()) 
> 

      club_name  tr_jan  tr_dec  year    ranking
0  ADO Den Haag    1368    1422  2010    12
1  ADO Den Haag    1455    1477  2011    13
2  ADO Den Haag    1461    1443  2012    11
3  ADO Den Haag    1437    1383  2013    14
4  ADO Den Haag    1386    1422  2014    17

But I get this error:

ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat

But I do not wish to use concat since I want to merge the trees not just add them on.

Another behaviour that's weird in my mind is that my code works if I save the first df to .csv and then load that .csv into a dataframe.

The code for that:

df = pd.DataFrame(data_points, columns=['club_name', 'tr_jan', 'tr_dec', 'year'])
df.to_csv('preliminary.csv')

df = pd.read_csv('preliminary.csv', index_col=0)

ranking_df = pd.DataFrame(rankings, columns=['club_name', 'ranking', 'year'])

new_df = df.merge(ranking_df, on=['club_name', 'year'], how='left')

I think that it has to do with the index_col=0 parameter. But I have no idea to fix it without having to save it, it doesn't matter much but is kind of an annoyance that I have to do that.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

In one of your dataframes the year is a string and the other it is an int64 you can convert it first and then join (e.g. df['year']=df['year'].astype(int) or as RafaelC suggested df.year.astype(int))

Edit: Also note the comment by Anderson Zhu: Just in case you have None or missing values in one of your dataframes, you need to use Int64 instead of int. See the reference here.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...