Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.4k views
in Technique[技术] by (71.8m points)

download image from url using python urllib but receiving HTTP Error 403: Forbidden

I want to download image file from a url using python module "urllib.request", which works for some website (e.g. mangastream.com), but does not work for another (mangadoom.co) receiving error "HTTP Error 403: Forbidden". What could be the problem for the latter case and how to fix it?

I am using python3.4 on OSX.

import urllib.request

# does not work
img_url = 'http://mangadoom.co/wp-content/manga/5170/886/005.png'
img_filename = 'my_img.png'
urllib.request.urlretrieve(img_url, img_filename)

At the end of error message it said:

... 
HTTPError: HTTP Error 403: Forbidden

However, it works for another website

# work
img_url = 'http://img.mangastream.com/cdn/manga/51/3140/006.png'
img_filename = 'my_img.png'
urllib.request.urlretrieve(img_url, img_filename)

I have tried the solutions from the post below, but none of them works on mangadoom.co.

Downloading a picture via urllib and python

How do I copy a remote image in python?

The solution here also does not fit because my case is to download image. urllib2.HTTPError: HTTP Error 403: Forbidden

Non-python solution is also welcome. Your suggestion will be very appreciated.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

This website is blocking the user-agent used by urllib, so you need to change it in your request. Unfortunately I don't think urlretrieve supports this directly.

I advise for the use of the beautiful requests library, the code becomes (from here) :

import requests
import shutil

r = requests.get('http://mangadoom.co/wp-content/manga/5170/886/005.png', stream=True)
if r.status_code == 200:
    with open("img.png", 'wb') as f:
        r.raw.decode_content = True
        shutil.copyfileobj(r.raw, f)

Note that it seems this website does not forbide requests user-agent. But if need to be modified it is easy :

r = requests.get('http://mangadoom.co/wp-content/manga/5170/886/005.png',
                 stream=True, headers={'User-agent': 'Mozilla/5.0'})

Also relevant : changing user-agent in urllib


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...