Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

0 votes
456 views
in Technique by (71.8m points)

regex - Remove a substring using Python

I have already extracted some information from a forum. This is the raw string I have now:

string = 'i think mabe 124 + <font color="black"><font face="Times New Roman">but I don\'t have a big experience it just how I see it in my eyes <font color="green"><font face="Arial">fun stuff'

I want to remove the substrings '<font color="black"><font face="Times New Roman">' and '<font color="green"><font face="Arial">' and keep the rest of the string. The result should look like this:

resultString = "i think mabe 124 + but I don't have a big experience it just how I see it in my eyes fun stuff"

How can I do this? I used Beautiful Soup to extract the string above from a forum, but I would now prefer to use a regular expression to remove these parts.


1 Reply

0 votes
by (71.8m points)
>>> import re
>>> re.sub('<.*?>', '', string)
"i think mabe 124 + but I don't have a big experience it just how I see it in my eyes fun stuff"

The re.sub function takes a regular expression and replaces every match in the string with the second argument. Here, the pattern '<.*?>' matches each tag, and each match is replaced with the empty string (''). Note that re.sub returns a new string; the original string is left unchanged.
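If the same cleanup is needed in several places, the call above could be wrapped in a small helper. The following is only a sketch: the names TAG_PATTERN and strip_tags are made up for illustration, and the pattern is the one from this answer, not a general HTML parser.

```python
import re

# Hypothetical helper; the pattern is the one used in the answer above.
# Compiling it once avoids re-parsing the pattern on every call.
TAG_PATTERN = re.compile(r'<.*?>')

def strip_tags(text):
    """Return a copy of text with every <...> span removed."""
    return TAG_PATTERN.sub('', text)

# The string from the question, split across lines for readability.
raw = ('i think mabe 124 + <font color="black"><font face="Times New Roman">'
       "but I don't have a big experience it just how I see it in my eyes "
       '<font color="green"><font face="Arial">fun stuff')

result = strip_tags(raw)
print(result)
```

Two caveats apply to this pattern: by default . does not match a newline, so a tag that spans several lines is left in place unless re.DOTALL is passed, and any literal text that happens to sit between a < and a > is removed as well.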

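The ? after .* makes the quantifier non-greedy: it matches as few characters as possible, so each match stops at the first > it finds. Without the ?, the pattern '<.*>' would match everything from the first < to the last > in the string, including the text between the tags.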
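The difference between the greedy and the non-greedy form is easiest to see side by side. A minimal sketch, using a made-up sample string rather than the one from the question:

```python
import re

# Made-up sample with text between an opening and a closing tag.
sample = 'before <b>bold</b> after'

# Greedy: .* runs to the last '>' in the string, so 'bold' is removed too.
greedy = re.sub(r'<.*>', '', sample)

# Non-greedy: .*? stops at the first '>', so only the tags are removed.
lazy = re.sub(r'<.*?>', '', sample)

print(repr(greedy))  # 'before  after'
print(repr(lazy))    # 'before bold after'
```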

More about the re module.



...