beautifulsoup - why is nothing getting parsed in my web scraping program?

Question

posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)

I wrote this code to fetch the top link from a Google search, but it returns None.

import webbrowser, requests
from bs4 import BeautifulSoup
string = 'selena+gomez'
website = f'http://google.com/search?q={string}'
req_web = requests.get(website).text
parser = BeautifulSoup(req_web, 'html.parser')
gotolink = parser.find('div', class_='r').a["href"]
print(gotolink)

1 Reply

深蓝 · Answer 1 · 2021-10-23T18:23:37+0000

Google requires a User-Agent HTTP header before it returns the full results page. Without a browser-like User-Agent, Google returns a page that does not contain <div> tags with the class r, so parser.find('div', class_='r') returns None. You can see the difference by calling print(parser) with and without the header.

For example:

import requests
from bs4 import BeautifulSoup

string = 'selena+gomez'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:76.0) Gecko/20100101 Firefox/76.0'}
website = f'http://google.com/search?hl=en&q={string}'

req_web = requests.get(website, headers=headers).text
parser = BeautifulSoup(req_web, 'html.parser')
gotolink = parser.find('div', class_='r').a["href"]
print(gotolink)

Prints:

https://www.instagram.com/selenagomez/?hl=en
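Because find() returns None when no matching tag exists, the lookup above can fail whenever the page layout differs from what the code expects. The following is a minimal sketch of a defensive version, not a definitive implementation. The helper name first_result_link and the two sample HTML strings are hypothetical, made up for illustration. The class name r is taken from the answer above and may not match the markup Google serves today.

```python
from bs4 import BeautifulSoup


def first_result_link(html, css_class="r"):
    """Return the href of the first <a> inside <div class=css_class>, or None.

    Hypothetical helper: it only guards against missing tags so that the
    caller gets None instead of an AttributeError.
    """
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", class_=css_class)
    # find() returns None when no tag matches, so check before using .a
    if container is None or container.a is None:
        return None
    return container.a.get("href")


# Made-up stand-ins for the two kinds of response described above:
# one page that contains <div class="r"> and one that does not.
page_with_results = '<div class="r"><a href="https://example.com/">Example</a></div>'
page_without_results = '<div class="other"><a href="https://example.com/">Example</a></div>'

print(first_result_link(page_with_results))     # https://example.com/
print(first_result_link(page_without_results))  # None
```

With a real request, the same helper could be called on requests.get(website, headers=headers).text. A None result then signals that the expected markup was not in the response, which is the situation described in the question.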
