Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

pandas - Change values in a column from a list

I've got a DataFrame whose 'Country' column I use as the index, and I want to change the names of several countries. I have the old/new values in a dictionary, as shown below.

I also tried splitting the values into a "from" list and a "to" list, but that didn't work either. The code doesn't raise an error, but the values in my DataFrame haven't changed.

`import pandas as pd
import numpy as np

energy = (pd.read_excel('Energy Indicators.xls', 
                        skiprows=17, 
                        skipfooter=38))

energy = (energy.drop(energy.columns[[0, 1]], axis=1))
energy.columns = ['Country', 'Energy Supply', 'Energy Supply per Capita', '% Renewable']          
energy['Energy Supply'] = energy['Energy Supply'].apply(lambda x: x*1000000)

#This code isn't working properly
energy['Country'] = energy['Country'].replace({'China, Hong Kong Special Administrative Region':'Hong Kong', 'United Kingdom of Great Britain and Northern Ireland':'United Kingdom', 'Republic of Korea':'South Korea', 'United States of America':'United States', 'Iran (Islamic Republic of)':'Iran'})`

SOLVED: This was a problem with the data that I hadn't noticed.

energy['Country'] = (energy['Country'].str.replace(r'\s*\(.*?\)\s*', '', regex=True).str.replace(r'\d+', '', regex=True))

This line sat below the 'problem' line, when it actually needed to run first to clean the data before the replace could work. For example, the Excel file actually contained "United States of America20", so replace skipped right over it.
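
To illustrate why the replace silently did nothing, here is a minimal sketch. The sample values are made up to imitate the footnote digits described above; they are not read from the real spreadsheet. It assumes a recent pandas, where str.replace needs regex=True to treat the pattern as a regular expression.

```python
import pandas as pd

# Hypothetical sample values imitating the stray footnote digits in the Excel file.
countries = pd.Series([
    'United States of America20',
    'Iran (Islamic Republic of)',
    'Republic of Korea',
])

mapping = {
    'United States of America': 'United States',
    'Iran (Islamic Republic of)': 'Iran',
    'Republic of Korea': 'South Korea',
}

# Series.replace with a dict only matches whole cell values,
# so the trailing "20" prevents the first entry from matching.
before = countries.replace(mapping)
print(before.tolist())
# ['United States of America20', 'Iran', 'South Korea']

# Strip the digits first, then apply the mapping.
after = countries.str.replace(r'\d+', '', regex=True).replace(mapping)
print(after.tolist())
# ['United States', 'Iran', 'South Korea']
```

No error is raised in the first case because an unmatched key is simply not an error for replace; the value is left as it was.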

Thanks for your help!!


1 Reply

You need to remove the superscript digits with str.replace before calling replace:

d = {'China, Hong Kong Special Administrative Region':'Hong Kong', 
     'United Kingdom of Great Britain and Northern Ireland':'United Kingdom', 
     'Republic of Korea':'South Korea', 'United States of America':'United States', 
     'Iran (Islamic Republic of)':'Iran'}

energy['Country'] = energy['Country'].str.replace(r'\d+', '', regex=True).replace(d)

You can also improve your solution: use the usecols parameter to filter columns and the names parameter to set the new column names:

names = ['Country', 'Energy Supply', 'Energy Supply per Capita', '% Renewable']

energy = pd.read_excel('Energy Indicators.xls', 
                        skiprows=17, 
                        skipfooter=38,
                        usecols=range(2,6), 
                        names=names)


d = {'China, Hong Kong Special Administrative Region':'Hong Kong', 
     'United Kingdom of Great Britain and Northern Ireland':'United Kingdom', 
     'Republic of Korea':'South Korea', 'United States of America':'United States', 
     'Iran (Islamic Republic of)':'Iran'}

# for multiplication, using * is faster than apply
energy['Energy Supply'] = energy['Energy Supply'] * 1000000
energy['Country'] = energy['Country'].str.replace(r'\d+', '', regex=True).replace(d)
#print (energy)
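
Because replace does not complain about keys it cannot find, a quick check for unmatched keys can help spot leftover dirt in the data. The sketch below is one possible way to do that; unmatched_keys is a hypothetical helper written for this illustration, not a pandas function, and the small DataFrame is made-up sample data.

```python
import pandas as pd

def unmatched_keys(series, mapping):
    """Return the mapping keys that do not occur as exact values in the series."""
    present = set(series.dropna().unique())
    return [key for key in mapping if key not in present]

# Hypothetical sample data with a stray footnote digit.
energy = pd.DataFrame({'Country': ['United States of America20', 'Republic of Korea']})
d = {'United States of America': 'United States',
     'Republic of Korea': 'South Korea'}

# Before cleaning, one key has no exact match in the column.
missing_before = unmatched_keys(energy['Country'], d)
print(missing_before)
# ['United States of America']

# After stripping digits, every key is found.
cleaned = energy['Country'].str.replace(r'\d+', '', regex=True)
missing_after = unmatched_keys(cleaned, d)
print(missing_after)
# []
```

Running the check just before the replace makes a silent no-op visible.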
