python - Extracting Text Between HTML Comments with BeautifulSoup

Question


posted Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

Using Python 3 and BeautifulSoup 4, I would like to extract text from an HTML page that is delineated only by a comment above it. An example:

<!--UNIQUE COMMENT-->
I would like to get this text
<!--SECOND UNIQUE COMMENT-->
I would also like to find this text

I have found various ways to extract a page's text or comments, but no way to do what I'm looking for. Any help would be greatly appreciated.



1 Reply

深蓝 · Answer 1 · 2021-10-17T03:06:54+0000

You just need to iterate through all of the comments in the document, check whether each one is one of the markers you are looking for, and then print the text node that immediately follows it:

from bs4 import BeautifulSoup, Comment

html = """
<html>
<body>
<p>p tag text</p>
<!--UNIQUE COMMENT-->
I would like to get this text
<!--SECOND UNIQUE COMMENT-->
I would also like to find this text
</body>
</html>
"""
soup = BeautifulSoup(html, 'lxml')

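# Comment is a subclass of NavigableString, so filter the strings by type.
for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
    if comment in ['UNIQUE COMMENT', 'SECOND UNIQUE COMMENT']:
        # next_element is the node right after the comment (here, a text node).
        print(comment.next_element.strip())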

This would display the following:

I would like to get this text
I would also like to find this text
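
Note that next_element only returns the single node directly after the comment. If the text after a marker might contain inline tags such as <b>, a possible variation is to walk next_elements and collect every string until the next comment. The following is only a sketch under that assumption: the helper name text_after_comments and the sample markup are made up for illustration, and it uses the built-in 'html.parser' so that lxml is not required.

```python
from bs4 import BeautifulSoup, Comment, NavigableString


def text_after_comments(html, wanted):
    """Map each wanted comment to the text between it and the next comment.

    Hypothetical helper for illustration; not part of BeautifulSoup.
    """
    soup = BeautifulSoup(html, "html.parser")
    result = {}
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        label = comment.strip()
        if label not in wanted:
            continue
        parts = []
        # next_elements walks the rest of the document in source order.
        for node in comment.next_elements:
            if isinstance(node, Comment):
                break  # the next comment ends this section
            if isinstance(node, NavigableString):
                parts.append(str(node))
        # Join the pieces and collapse runs of whitespace.
        result[label] = " ".join("".join(parts).split())
    return result


sample = """<body>
<p>p tag text</p>
<!--UNIQUE COMMENT-->
I would like to get <b>this</b> text
<!--SECOND UNIQUE COMMENT-->
I would also like to find this text
</body>"""

for label, text in text_after_comments(
    sample, {"UNIQUE COMMENT", "SECOND UNIQUE COMMENT"}
).items():
    print(label, "->", text)
```

The Comment check comes before the NavigableString check because Comment is itself a NavigableString subclass. Text after the last marker runs to the end of the document, so you may want an extra stop condition if unrelated content follows it.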
