Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - How do I ignore tags while getting the .string of a Beautiful Soup element?

I'm working with HTML elements that have child tags, and I want to "ignore" or remove those tags while keeping their text. Right now, if I access .string on any element that contains child tags, all I get is None.

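import bs4

# Name the parser explicitly so the result does not depend on what is installed.
soup = bs4.BeautifulSoup("""
    <div id="main">
      <p>This is a paragraph.</p>
      <p>This is a paragraph <span class="test">with a tag</span>.</p>
      <p>This is another paragraph.</p>
    </div>
""", "html.parser")

main = soup.find(id='main')
for child in main.children:
    # .string is None for a tag that has more than one child node
    print(child.string)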

Output:

This is a paragraph.
None
This is another paragraph.

I want the second line to be "This is a paragraph with a tag." How do I do this?


1 Reply

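# Iterating over a Tag yields its direct children, including whitespace text nodes.
for child in soup.find(id='main'):
    # Skip the text nodes; .text joins all strings nested inside the tag.
    if isinstance(child, bs4.Tag):
        print(child.text)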

This prints:

This is a paragraph.
This is a paragraph with a tag.
This is another paragraph.
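
To see why .string fails where .text works, here is a minimal self-contained sketch that compares the two on the markup from the question. It assumes Python 3 and the built-in "html.parser"; the variable names (html, paragraphs, strings, texts) are only illustrative, and find_all("p") is used here in place of the isinstance check as one possible way to skip whitespace nodes.

```python
import bs4

html = """
<div id="main">
  <p>This is a paragraph.</p>
  <p>This is a paragraph <span class="test">with a tag</span>.</p>
  <p>This is another paragraph.</p>
</div>
"""

soup = bs4.BeautifulSoup(html, "html.parser")

# find_all("p") returns only <p> tags, so whitespace text nodes never appear.
paragraphs = soup.find(id="main").find_all("p")

# .string is None when a tag has more than one child node
# (the second <p> holds text, a <span>, and more text).
strings = [p.string for p in paragraphs]

# get_text() concatenates every string nested inside the tag;
# .text is a property that gives the same result.
texts = [p.get_text() for p in paragraphs]

for text in texts:
    print(text)
```

If the markup contains stray whitespace inside the tags, get_text(strip=True) trims each piece of text before joining; whether that is desirable depends on the document.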
