Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

regex - re: remove string in brackets and its whitespace

What's the best way to use re to remove brackets and their content from a string, along with the whitespace left in front of them? Note that not every string is formatted the same way: some have no brackets at all.

Script:

import pandas as pd
import re

df = pd.DataFrame({'name':
          ['University of Southampton (UK)', 
          'The College of William and Mary', 
          'University of Reading (UK)', 
          'Queensland University (Australia)']})

def cleaning(text):
    cleaned = re.findall(re.compile('^([^,]+).+'), text)
    cleaned = re.findall(re.compile(r'\((.*)\)'), str(cleaned)) # Why do I have to str() here btw?
    return cleaned

df['name'].apply(lambda x: cleaning(x))

Returns:

0    []
1    []
2    []
3    []

Desired output (no whitespace at the end):

0    University of Southampton
1    The College of William and Mary
2    University of Reading
3    Queensland University

Thanks for your help!

1 Reply

This only works for this specific case, where everything from the first '(' onward should be dropped, but you can do:

df.name.str.split('(',expand=True)[0].str.strip()
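
If the names are less regular (several bracketed parts, or text after the brackets), a regular expression substitution may be more robust. Here is a minimal sketch, assuming the brackets are never nested; strip_brackets and BRACKETED are names made up for this illustration, not part of the original script:

```python
import re

# Optional whitespace followed by one parenthesised group, e.g. " (UK)".
# Assumption: groups are not nested, so [^)]* stops at the first ")".
BRACKETED = re.compile(r'\s*\([^)]*\)')

def strip_brackets(text):
    """Remove every '(...)' group and the whitespace in front of it."""
    return BRACKETED.sub('', text).strip()

names = [
    'University of Southampton (UK)',
    'The College of William and Mary',
    'University of Reading (UK)',
    'Queensland University (Australia)',
]

cleaned = [strip_brackets(name) for name in names]
for name in cleaned:
    print(name)
```

On the DataFrame from the question this could be applied with df['name'].apply(strip_brackets), or written directly as df['name'].str.replace(r'\s*\([^)]*\)', '', regex=True).str.strip(). As for the str() in the question: re.findall returns a list, and the re functions only accept a string, so the second findall fails on a list unless it is converted first. Substituting with re.sub on the original string avoids that step entirely.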
