Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - How to replace column values based on a list?

I have a list like this:

x = ['Las Vegas', 'San Francisco', 'Dallas']

And a dataframe that looks a bit like this:

import pandas as pd
data = [['Las Vegas (Clark County)', 25], ['New York', 23],
        ['Dallas', 27]]
df = pd.DataFrame(data, columns=['City', 'Value'])

I want to replace city values in the DataFrame such as "Las Vegas (Clark County)" with the matching list entry, "Las Vegas".

The DataFrame contains several cities whose names differ from the list entries and need to be changed.

I know I could use a regex to strip off the part in parentheses, but I was wondering whether there is a more clever, generic way.

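For reference, the regex approach mentioned in the question might look roughly like the sketch below. It assumes the unwanted part is always a trailing group in parentheses; the name `stripped` is only illustrative and not part of the original post.

```python
import pandas as pd

df = pd.DataFrame(
    [['Las Vegas (Clark County)', 25], ['New York', 23], ['Dallas', 27]],
    columns=['City', 'Value'],
)

# Remove a trailing " (...)" group; rows without parentheses are left unchanged.
# Assumption: the extra text always appears in parentheses at the end of the name.
stripped = df['City'].str.replace(r'\s*\([^)]*\)\s*$', '', regex=True)
```

This only handles the parenthesized case, which is why a list-driven approach is more generic.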

  Asked by Kevin (translated from Stack Overflow)


1 Reply


Use Series.str.extract with the list values joined by | (regex OR), then use Series.fillna to restore the original value in rows where nothing matched:

df['City'] = df['City'].str.extract(f'({"|".join(x)})', expand=False).fillna(df['City'])
print(df)
        City  Value
0  Las Vegas     25
1   New York     23
2     Dallas     27
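
If the list values might contain regex metacharacters (for example "." or parentheses), a slightly more defensive variant of the same idea could escape each value first. The sketch below is an assumption-laden extension rather than part of the original answer: the `names` and `pattern` variables, the use of `re.escape`, and the longest-first ordering are all additions made for illustration.

```python
import re
import pandas as pd

x = ['Las Vegas', 'San Francisco', 'Dallas']
df = pd.DataFrame(
    [['Las Vegas (Clark County)', 25], ['New York', 23], ['Dallas', 27]],
    columns=['City', 'Value'],
)

# Escape each name so special characters are matched literally, and try
# longer names first so a short name cannot shadow a longer one.
names = sorted(x, key=len, reverse=True)
pattern = '(' + '|'.join(re.escape(name) for name in names) + ')'

# Rows without a match yield NaN, which fillna replaces with the original value.
df['City'] = df['City'].str.extract(pattern, expand=False).fillna(df['City'])
```

For the sample data this should give the same cities as the answer's one-liner; the escaping only matters once the list contains characters that regex treats specially.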

Another idea is to use Series.str.contains in a loop, but this is likely to be slow for a large DataFrame and a long list of values:

for val in x:
    # regex=False treats val as a literal substring rather than a pattern
    df.loc[df['City'].str.contains(val, regex=False), 'City'] = val
