Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - How to find the comment tag <!--...--> with BeautifulSoup?

I tried soup.find('!--') but it doesn't seem to work. Thanks in advance.

Edit: Thanks for the tip on how to find all comments. I have a follow-up question: how do I search for one specific comment?

For example, I have the following comment tag:

<!-- <span class="titlefont"> <i>Wednesday 110518</i>(05:00PM)<br /></span> -->

I really just want this part: <i>Wednesday 110518</i>. The "110518" is the date in YYMMDD format, which I'm leaning toward using as my search target. However, I don't know how to find something within a specific comment tag.


Fight the dragon too long and you become the dragon yourself; gaze into the abyss too long and the abyss gazes back…

1 Reply


You can find all the comments in a document with the findAll method. See the "Removing elements" example, which shows how to do exactly what you're trying to do.

In brief, you want this:

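from bs4 import Comment  # Beautiful Soup 4; in version 3 this was: from BeautifulSoup import Comment
comments = soup.findAll(text=lambda text: isinstance(text, Comment))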
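
For a self-contained version of the line above, here is a minimal sketch. It assumes Beautiful Soup 4 (the bs4 package) and the built-in html.parser; the sample HTML is made up for illustration and is not from the question. It uses find_all(string=...), which is the bs4 spelling of findAll(text=...).

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical sample document, not from the original question.
html = "<div><!-- first note --><p>Visible text</p><!-- second note --></div>"

soup = BeautifulSoup(html, "html.parser")

# Comments are stored as Comment objects (a NavigableString subclass), not as
# tags, which is why soup.find('!--') finds nothing.
comments = soup.find_all(string=lambda text: isinstance(text, Comment))

for comment in comments:
    # Each Comment behaves like a string holding the text between <!-- and -->.
    print(repr(str(comment)))
```

The ordinary text node inside the paragraph is skipped because it is a plain NavigableString rather than a Comment.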

Edit: If you're trying to search within the comments, you can try:

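import re
from bs4 import Comment

comments = soup.findAll(text=lambda text: isinstance(text, Comment))
for comment in comments:
    # re.search scans the whole comment; re.match only matches at the very start
    m = re.search(r'<i>([^<]*)</i>', comment)
    if m:  # skip comments that contain no <i>...</i>
        print(m.group(1))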
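
Since a comment is just a string, another option is to pick out the comment that contains the date and parse its contents as HTML again, rather than using a regex. The following is a minimal sketch, assuming Beautiful Soup 4 (the bs4 package) and the built-in html.parser; the wrapping div, the second comment, and the names target_date and results are illustrative and not part of the question.

```python
from bs4 import BeautifulSoup, Comment

# The comment from the question, wrapped in a hypothetical <div>.
html = '''<div>
<!-- <span class="titlefont"> <i>Wednesday 110518</i>(05:00PM)<br /></span> -->
<!-- some unrelated comment -->
</div>'''

target_date = "110518"  # YYMMDD, illustrative search target

soup = BeautifulSoup(html, "html.parser")

# Keep only the comments whose text mentions the date.
matches = soup.find_all(
    string=lambda text: isinstance(text, Comment) and target_date in text
)

results = []
for comment in matches:
    # Re-parse the comment body so the markup inside it becomes real tags.
    inner = BeautifulSoup(str(comment), "html.parser")
    italic = inner.find("i")
    if italic is not None:
        results.append(italic.get_text())

print(results)
```

Re-parsing avoids having to write a regex for the markup, at the cost of a second parse for each matching comment.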


...