Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Scrapy Splash Screenshots?

I'm trying to scrape a site whilst taking a screenshot of every page. So far, I have managed to piece together the following code:

import json
import base64
import scrapy
from scrapy_splash import SplashRequest


class ExtractSpider(scrapy.Spider):
    name = 'extract'

    def start_requests(self):
        url = 'https://stackoverflow.com/'
        splash_args = {
            'html': 1,
            'png': 1
        }
        yield SplashRequest(url, self.parse_result, endpoint='render.json', args=splash_args)

    def parse_result(self, response):
        png_bytes = base64.b64decode(response.data['png'])

        imgdata = base64.b64decode(png_bytes)
        filename = 'some_image.png'
        with open(filename, 'wb') as f:
            f.write(imgdata)

The spider reaches the site fine (Stack Overflow, as an example) and png_bytes contains data, but the file it writes is a broken image that won't open.

Is there a way to fix this, or is there a more efficient approach? I have read that Splash Lua scripts can do this, but I haven't found a way to implement one. Thanks.


1 Reply


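You are decoding from base64 twice:

        png_bytes = base64.b64decode(response.data['png'])
        imgdata = base64.b64decode(png_bytes)

response.data['png'] is the base64-encoded screenshot, so the first call already gives you the raw PNG bytes. Running those bytes through b64decode a second time mangles them, which is why the saved file won't open. Decode once and write the result: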

    def parse_result(self, response):
        imgdata = base64.b64decode(response.data['png'])
        filename = 'some_image.png'
        with open(filename, 'wb') as f:
            f.write(imgdata)
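Since the goal is a screenshot of every page, a fixed filename like 'some_image.png' would be overwritten on each response. Below is a minimal sketch of one way to handle that, assuming you are happy to name files after a hash of the page URL. The helper names (decode_png, screenshot_path, save_screenshot) and the "screenshots" directory are made up for illustration and are not part of Scrapy or scrapy-splash. It also checks the PNG signature after decoding, so a double decode or an empty payload fails loudly instead of producing a broken file.

```python
import base64
import hashlib
from pathlib import Path

# Every valid PNG file starts with these eight bytes.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_png(b64_png):
    """Decode a base64 PNG payload exactly once and sanity-check the result."""
    data = base64.b64decode(b64_png)
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data does not start with the PNG signature")
    return data


def screenshot_path(url, directory="screenshots"):
    """Build a per-page filename from a hash of the URL (hypothetical scheme)."""
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()[:16]
    return Path(directory) / f"{digest}.png"


def save_screenshot(url, b64_png, directory="screenshots"):
    """Decode the payload and write it to a file unique to this URL."""
    path = screenshot_path(url, directory)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(decode_png(b64_png))
    return path
```

In the spider, parse_result would then shrink to something like save_screenshot(response.url, response.data['png']), assuming response.url holds the address of the page that was rendered.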
