How can I create a Python script with BeautifulSoup on Windows to download the highest resolution of each picture in a Wikimedia Commons folder?

posted Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)

I'm a big fan of Gustave Doré, and I would like to download all of his engravings from the Wikimedia Commons folders, which are neatly organized.

So, given a Wikimedia Commons folder (a category page), I need to download every picture in it at the highest available resolution.

I started writing something, but I'm not that good, so it's just a template:

import os
import requests
import bs4

url = 'URL OF THE WIKIMEDIA COMMONS FOLDER'
folder = 'NAME OF THE FOLDER'
number_of_pictures = 0  # placeholder: NUMBER OF PICTURES IN THE PAGE

os.makedirs(folder, exist_ok=True)
for n in range(number_of_pictures - 1):
    print('I am downloading page number %s...' % (n + 1))
    res = requests.get(url)
    res.raise_for_status()

    soup = bs4.BeautifulSoup(res.text, 'html.parser')

    # STUFF I STILL NEED TO ADD

print('Done')

For example, I would feed this as the URL of the folder:

https://commons.wikimedia.org/wiki/Category:Crusades_by_Gustave_Dor%C3%A9

Then I would like to click every link and go to the picture page, like this one:

https://commons.wikimedia.org/wiki/File:Astonishment_of_the_Crusaders_at_the_Wealth_of_the_East.jpg

And then download the original file by following the link below the picture that says 'Original file'. Sometimes, though, the picture has no higher resolution available, like in this case:

https://commons.wikimedia.org/wiki/File:Andel_krizaci.jpg

In that case, the script would just need to follow the link below the picture to download it.

I am completely stuck, thanks in advance for your help!

Bonus points if each picture is saved under the name stated on its page (e.g. the picture in the second link should be saved as 'Astonishment of the Crusaders at the Wealth of the East.jpg').

question from:https://stackoverflow.com/questions/65923348/how-can-i-create-a-python-script-with-beautifulsoup-on-windows-to-download-the-h

1 Reply

深蓝 · Answer 1 · 2021-10-06T19:07:32+0000

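Hey, big fan of Gustave Doré, here is a way you can do it: read the thumbnail URLs from the category page and turn each one into the URL of the original file.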

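import requests
from bs4 import BeautifulSoup

r = requests.get('https://commons.wikimedia.org/wiki/Category:Crusades_by_Gustave_Dor%C3%A9')
soup = BeautifulSoup(r.text, 'html.parser')
# Collect the thumbnail URL of every picture listed on the category page
links = [i.find('img').get('src') for i in soup.find_all('a', class_='image')]
# Thumbnail URLs look like .../thumb/<path>/<file>/<width>px-<file>; dropping the
# last segment and '/thumb' gives the original file. Other URLs are kept as they are.
links = ['/'.join(i.split('/')[:-1]).replace('/thumb', '') if '/thumb/' in i else i for i in links]
for l in links:
    im = requests.get(l)
    im.raise_for_status()
    # Save each picture under the file name taken from its URL
    with open(l.split('/')[-1], 'wb') as f:
        f.write(im.content)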
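
For the bonus request (saving each picture under the name shown on its page), the URL handling can be sketched as two small helpers. This is only a sketch: the function names `original_url` and `readable_name` are made up for illustration, the sample URL and its `a/ab` directory are hypothetical, and it assumes thumbnail URLs keep the `.../thumb/<path>/<file>/<width>px-<file>` layout.

```python
from urllib.parse import unquote, urlsplit


def original_url(src):
    """Return the full-size file URL for a Commons thumbnail URL.

    Assumes thumbnails follow the layout
    .../thumb/<path>/<file>/<width>px-<file>; any other URL is returned
    unchanged, since it should already point at the original file.
    """
    if '/thumb/' not in src:
        return src
    without_last = src.rsplit('/', 1)[0]            # drop '<width>px-<file>'
    return without_last.replace('/thumb/', '/', 1)  # drop the 'thumb' directory


def readable_name(file_url):
    """Turn the last URL segment into a human-readable file name."""
    last_segment = urlsplit(file_url).path.rsplit('/', 1)[-1]
    # Decode %XX escapes and show underscores as spaces
    return unquote(last_segment).replace('_', ' ')


# Hypothetical thumbnail URL, used only to show the transformation
thumb = ('https://upload.wikimedia.org/wikipedia/commons/thumb/a/ab/'
         'Gustave_Dor%C3%A9_example.jpg/120px-Gustave_Dor%C3%A9_example.jpg')
full = original_url(thumb)
print(full)
print(readable_name(full))
```

In the download loop, `readable_name(l)` could then replace `l.split('/')[-1]` as the name passed to `open`. File names containing characters that Windows does not allow (such as `?` or `:`) would still need extra handling.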
