Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python-3.x - Scraping Python a

I have two a tags with different href values and I only want one of them. Is it possible for BeautifulSoup to select only the tag whose href starts with a particular word?

If anyone knows, thank you.

<a href="https://facebook.com/"></a>

and the other:

<a href="https://Instagram.com/"></a>
  Asked by Jacksuel Soares Braga (translated from Stack Overflow)


1 Reply

There are many ways to do this; here are the three most common (CSS selector, regex, and lambda):

import re

from bs4 import BeautifulSoup

data = '''
<a href="https://facebook.com/">TAG 1</a>
<a href="https://instagram.com/">TAG 2</a>
'''

soup = BeautifulSoup(data, 'html.parser')

# 1st option - CSS selector (^= means "attribute value starts with")
print(soup.select_one('a[href^="https://instagram"]'))

# 2nd option - using regexp
print(soup.find('a', {'href': re.compile(r'^https://instagram')}))

# 3rd option - using lambda
print(soup.find(lambda tag: 'href' in tag.attrs and tag['href'].startswith('https://instagram')))

Prints:

<a href="https://instagram.com/">TAG 2</a>
<a href="https://instagram.com/">TAG 2</a>
<a href="https://instagram.com/">TAG 2</a>
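
One thing to watch: the markup in the question spells the host as https://Instagram.com/ with a capital I, while the examples above match the lowercase prefix. The regex and startswith comparisons are case-sensitive, so they would likely miss that tag. A minimal sketch of a case-insensitive match, assuming the sample markup below (it is made up to mirror the question, not taken from a real page):

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup mirroring the question, with a capitalised host name.
data = '''
<a href="https://facebook.com/">TAG 1</a>
<a href="https://Instagram.com/">TAG 2</a>
'''

soup = BeautifulSoup(data, 'html.parser')

# Regex with IGNORECASE matches "Instagram" and "instagram" alike.
by_regex = soup.find('a', href=re.compile(r'^https://instagram', re.IGNORECASE))

# Lambda variant: lower-case the href before comparing the prefix.
by_lambda = soup.find(
    lambda tag: tag.name == 'a'
    and tag.get('href', '').lower().startswith('https://instagram')
)

print(by_regex)
print(by_lambda)
```

Both calls should find the second tag here. The href keeps its original capitalisation; only the comparison ignores case.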

EDIT: To select multiple links that start with a given string:

data = '''
<a href="https://facebook.com/">TAG 1</a>
<a href="https://instagram.com/A">TAG 2</a>
<a href="https://facebook.com/">TAG 3</a>
<a href="https://instagram.com/B">TAG 4</a>
'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(data, 'html.parser')

for link in soup.select('a[href^="https://instagram"]'):
    print(link)

Prints:

<a href="https://instagram.com/A">TAG 2</a>
<a href="https://instagram.com/B">TAG 4</a>
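
If you want the URLs themselves rather than the whole tags, you can read the href attribute from each match. A small sketch reusing the same sample data (the variable name instagram_links is just illustrative):

```python
from bs4 import BeautifulSoup

data = '''
<a href="https://facebook.com/">TAG 1</a>
<a href="https://instagram.com/A">TAG 2</a>
<a href="https://facebook.com/">TAG 3</a>
<a href="https://instagram.com/B">TAG 4</a>
'''

soup = BeautifulSoup(data, 'html.parser')

# Collect only the href strings of the matching <a> tags, in document order.
instagram_links = [a['href'] for a in soup.select('a[href^="https://instagram"]')]

print(instagram_links)
```

The selector only matches tags that have an href attribute, so indexing with a['href'] should be safe in this case.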

For a CSS selector reference, use this link.

