python - Using BeautifulSoup to find a HTML tag that contains certain text

Question

Welcome To Ask or Share your Answers For Others

python - Using BeautifulSoup to find a HTML tag that contains certain text

posted Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

python - Using BeautifulSoup to find a HTML tag that contains certain text

I'm trying to get the elements in an HTML doc that contain the following pattern of text: #S{11}

<h2> this is cool #12345678901 </h2>

So, the previous would match by using:

soup('h2',text=re.compile(r' #S{11}'))

And the results would be something like:

[u'blahblah #223409823523', u'thisisinteresting #293845023984']

I'm able to get all the text that matches (see line above). But I want the parent element of the text to match, so I can use that as a starting point for traversing the document tree. In this case, I'd want all the h2 elements to return, not the text matches.

Ideas?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-17T00:06:59+0000

from BeautifulSoup import BeautifulSoup
import re

html_text = """
<h2>this is cool #12345678901</h2>
<h2>this is nothing</h2>
<h1>foo #126666678901</h1>
<h2>this is interesting #126666678901</h2>
<h2>this is blah #124445678901</h2>
"""

soup = BeautifulSoup(html_text)


for elem in soup(text=re.compile(r' #S{11}')):
    print elem.parent

Prints:

<h2>this is cool #12345678901</h2>
<h2>this is interesting #126666678901</h2>
<h2>this is blah #124445678901</h2>

Categories

python - Using BeautifulSoup to find a HTML tag that contains certain text

python - Using BeautifulSoup to find a HTML tag that contains certain text

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags