Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Complex Beautiful Soup query

Here is a snippet of an HTML file I'm exploring with Beautiful Soup.

<td width="50%">
    <strong class="sans"><a href="http:/website">Site</a></strong> <br /> 

I would like to get the <a href> of every link that is wrapped in a <strong class="sans"> and sits inside a <td width="50%">.

Is it possible to query an HTML file for multiple conditions like these using Beautiful Soup?


1 Reply


Beautiful Soup's search methods accept a callable, which the docs appear to recommend for your case: "If you need to impose complex or interlocking restrictions on a tag's attributes, pass in a callable object for name,...". (Granted, that passage is about attributes specifically, but the advice reflects the underlying spirit of the Beautiful Soup API.)

If you want a one-liner:

soup.findAll(lambda tag: tag.name == 'a' and
             tag.findParent('strong', 'sans') and
             tag.findParent('strong', 'sans').findParent('td', attrs={'width': '50%'}))

I've used a lambda in this example, but with several chained requirements you may prefer to define a named function in practice. The lambda has to call findParent('strong', 'sans') twice so that it doesn't raise an exception when an <a> tag has no matching <strong> ancestor. A proper function can store the result of the first call and reuse it, which makes the test more efficient.
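For illustration, here is a minimal sketch of the named-function version. It assumes the bs4 package (Beautiful Soup 4) and its find_all/find_parent method names. The function name is_wanted_link and the two extra table cells in the sample markup are hypothetical, added only to show what gets rejected.

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the question's snippet. The second and third
# cells are hypothetical additions that should NOT match.
HTML = """
<table><tr>
<td width="50%"><strong class="sans"><a href="http:/website">Site</a></strong><br /></td>
<td width="25%"><strong class="sans"><a href="http:/wrong-width">Other</a></strong></td>
<td width="50%"><a href="http:/no-strong">Plain</a></td>
</tr></table>
"""


def is_wanted_link(tag):
    """True for an <a> inside <strong class="sans"> inside <td width="50%">."""
    if tag.name != "a":
        return False
    # Look up the <strong> ancestor once and reuse the result.
    strong = tag.find_parent("strong", class_="sans")
    if strong is None:
        return False
    return strong.find_parent("td", attrs={"width": "50%"}) is not None


soup = BeautifulSoup(HTML, "html.parser")
links = soup.find_all(is_wanted_link)
hrefs = [a["href"] for a in links]
print(hrefs)
```

Because the function returns early, an <a> with no suitable <strong> ancestor is rejected without a second lookup, and each condition sits on its own line where it is easy to change.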

