Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - regex to extract a set number of words around a matched word

I was looking around for a way to grab the words around a found match, but the solutions I found were much too complicated for my case. All I need is a regex to grab, let's say, 10 words before and after a matched word. Would anybody be able to help me set up a pattern to do that?

For example, let's take the sentence (won't make sense):

    sentence = "The hairy yellow, stinkin' dog, sat round' the c4mpfir3 and ate the brown/yellow smore's that the kids(*adults) were makin."

and let's say we want to match 3 words before and after smore's (already cleaned to match). The output would be:

   "ate the brown/yellow smore's that the were"

Now let's take the example of wanting one word before and after stinkin':

   "yellow, stinkin' dog"

Another example. "sat":

   "yellow, stinkin' dog, round' the and"

Let's make a new sentence now:

   sentence = "If the problem is still there after 30 minutes. Give up"

If I were trying to match the word there and take 2 words before and after, the output would be:

   "is still there after minutes"

I know it's not 10, but I think you get the example? If not, let me know and I will provide more. As I made this, I realized how much more I want than I originally thought. I'm rather new to regex, but I'm going to give the pattern a shot.

    ('[a-zA-Z'.,/]{3}(word_to_match)[a-zA-Z'.,/]{3}')

Thanks



1 Reply


This regex will get you started

    ((?:\w*\s*){2})\s*word3\s*((?:\s*\w*){2})

Group 1 will have the words before your target (word3 in the pattern), and group 2 will have the words that come after.

In the example I chose to capture 2 words, but you can adjust this at will.
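
As a rough illustration of how the pattern above could be used from Python, here is a minimal sketch. The helper name `words_around` and its parameters are made up for this example; it simply fills the word count and an escaped target word into the same pattern shape and splits the two groups into lists.

```python
import re

def words_around(sentence, word, n=2):
    """Return (words_before, words_after) for the first match, or None."""
    # Same shape as the pattern above, with the count and target filled in.
    # re.escape keeps characters such as "." in the target literal.
    pattern = r"((?:\w*\s*){%d})\s*%s\s*((?:\s*\w*){%d})" % (n, re.escape(word), n)
    match = re.search(pattern, sentence)
    if match is None:
        return None
    # Each group is a run of words and whitespace; split() turns it into a list.
    return match.group(1).split(), match.group(2).split()

sentence = "If the problem is still there after 30 minutes. Give up"
print(words_around(sentence, "there"))
# (['is', 'still'], ['after', '30'])
```

Note that `\w` matches digits but not apostrophes, commas, or slashes, so "30" is captured here, and on input like the first sentence in the question the groups will stop short at punctuation. The pattern also has no word boundaries, so the target can match inside a longer word.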

Let me know how it goes and if it works on your input.

You can improve your question by reading this short piece of advice: http://worksol.be/regex.html
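
If the words can carry apostrophes, commas, or slashes, and tokens such as "c4mpfir3" should be skipped as in the question's expected output, a token-based approach may be easier than a single regex. The sketch below is an alternative technique, not the pattern given in this answer. The name `context_words` is hypothetical, and the allowed-character set is an assumption copied from the character class in the question's attempted pattern.

```python
import re

# Characters allowed in a "clean" word; an assumption taken from the
# character class in the question's attempted pattern.
CLEAN_WORD = re.compile(r"[a-zA-Z'.,/]+")

def context_words(sentence, word, n):
    """Return up to n clean tokens on each side of `word`, joined by spaces."""
    # Keep only whitespace-separated tokens made entirely of allowed
    # characters, so tokens such as "c4mpfir3" or "30" are dropped.
    tokens = [t for t in sentence.split() if CLEAN_WORD.fullmatch(t)]
    if word not in tokens:
        return None
    i = tokens.index(word)  # first occurrence only
    return " ".join(tokens[max(0, i - n): i + n + 1])

sentence = ("The hairy yellow, stinkin' dog, sat round' the c4mpfir3 and ate "
            "the brown/yellow smore's that the kids(*adults) were makin.")
print(context_words(sentence, "smore's", 3))
# ate the brown/yellow smore's that the were
```

Because tokens are compared exactly, trailing punctuation stays attached to a word ("dog," rather than "dog"), so the target has to be passed as it appears in the text.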


