Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


Get a string between two variable strings that contain regex-specific characters using Python

I have a string such as this:

r'irrelevant data (~symbol)relevant data(/~symbol) irrelevant data'

and I want to extract the relevant data. However, the (~symbol) tag is variable, so the regex has to be built from the tags at run time, something like:

import re
tags = ["(~symbol)","(/~symbol)"]
string = r'irrelevant data (~symbol)relevant data(/~symbol) irrelevant data'
regex = r'{}([^"]*){}'.format(tags[0],tags[1])
result = re.findall(regex , string)[0]

The problem is that the tags contain characters that have special meaning in a regular expression and would need to be escaped. Here the parentheses are interpreted as capturing groups, so the result contains the tags themselves instead of just the desired string.

Is there a good solution that doesn't involve replace?



1 Reply


There are several parts to your question, so I'll address them one by one:

  • To break the string into the pieces around the tags, look into re.split.
  • For separators that contain special characters, use re.escape.
  • To keep the separators out of the result, wrap them in a non-capturing group (?:...) rather than a capturing group.
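The last two points can be tried out in isolation. The snippet below is only a small sketch; the sample text "a(~symbol)b" is made up for the demonstration and is not from the question.

```python
import re

# re.escape backslash-escapes regex metacharacters so the tag matches literally.
tag = "(~symbol)"
escaped = re.escape(tag)
assert re.fullmatch(escaped, tag) is not None

# Hypothetical sample text, chosen only for this demonstration.
text = "a(~symbol)b"

# A capturing group makes re.split keep the separator in the result...
with_capture = re.split("(" + escaped + ")", text)
# ...while a non-capturing group drops it.
without_capture = re.split("(?:" + escaped + ")", text)

print(with_capture)     # ['a', '(~symbol)', 'b']
print(without_capture)  # ['a', 'b']
```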

For your example, it would be something like this:

import re

patterns = ["(~symbol)", "(/~symbol)"]
string = r'irrelevant data (~symbol)relevant data(/~symbol) irrelevant data'
# Escape each tag so it matches literally, then split on either one.
separator = '(?:' + '|'.join(map(re.escape, patterns)) + ')'
result = re.split(separator, string)

which then gives

['irrelevant data ', 'relevant data', ' irrelevant data']
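If only the text between the tags is needed, one possible alternative is to keep the pattern shape from the question but pass both tags through re.escape and capture what lies between them. This is a sketch rather than part of the answer above: the helper name extract_between is hypothetical, and the non-greedy (.*?) with re.DOTALL is an assumption about the desired matching behavior (shortest match, newlines allowed).

```python
import re

def extract_between(text, open_tag, close_tag):
    """Return every substring found between open_tag and close_tag.

    Hypothetical helper for illustration; both tags are treated as literal text.
    """
    pattern = re.escape(open_tag) + r"(.*?)" + re.escape(close_tag)
    # With exactly one capturing group, findall returns a list of the captured strings.
    return re.findall(pattern, text, flags=re.DOTALL)

tags = ["(~symbol)", "(/~symbol)"]
string = r'irrelevant data (~symbol)relevant data(/~symbol) irrelevant data'
print(extract_between(string, tags[0], tags[1]))  # ['relevant data']
```

Because the tags are escaped before being placed in the pattern, the only capturing group is the one around the wanted text, so the tags do not show up in the result.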
