Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.8k views
in Technique[技术] by (71.8m points)

beautifulsoup - Scraper in Python gives "Access Denied"

I'm trying to code a scraper in Python to get some info from a page. Like the title of the offers that appear on this page:
https://www.justdial.com/Panipat/Saree-Retailers/nct-10420585

By now I use this code :

import bs4
import requests

def extract_source(url):
    source=requests.get(url).text
    return source

def extract_data(source):
    soup=bs4.BeautifulSoup(source)
    names=soup.findAll('title')
    for i in names:
        print i

extract_data(extract_source('https://www.justdial.com/Panipat/Saree-Retailers/nct-10420585'))

But when I execute this code, it gives me an error:

<titlee> Access Denied</titlee>

What can I do to solve this?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

As was mentioned in comments, you need to specify allowable user-agent and pass it as headers:

def extract_source(url):
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0'}
    source=requests.get(url, headers=headers).text
    return source

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...