Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
python - Using loginform with scrapy

The developers of the Scrapy framework (https://github.com/scrapy/scrapy) also provide loginform (https://github.com/scrapy/loginform), a library for logging into websites that require authentication.
I have looked through the docs for both projects, but I cannot figure out how to get Scrapy to call loginform before it starts crawling. Logging in works fine with loginform on its own.
Thanks


1 Reply


loginform is just a library, totally decoupled from Scrapy.

You have to write the code that plugs it into the spider you want, typically in a callback method.

Here is an example of a structure to do this:

import scrapy
from loginform import fill_login_form

class MySpiderWithLogin(scrapy.Spider):
    name = 'my-spider'

    start_urls = [
        'http://somewebsite.com/some-login-protected-page',
        'http://somewebsite.com/another-protected-page',
    ]

    login_url = 'http://somewebsite.com/login-page'

    login_user = 'your-username'
    login_password = 'secret-password-here'

    def start_requests(self):
        # let's start by sending a first request to login page
        yield scrapy.Request(self.login_url, self.parse_login)

    def parse_login(self, response):
        # got the login page, let's fill the login form...
        data, url, method = fill_login_form(response.url, response.body,
                                            self.login_user, self.login_password)

        # ... and send a request with our login data
        return scrapy.FormRequest(url, formdata=dict(data),
                                  method=method, callback=self.start_crawl)

    def start_crawl(self, response):
        # OK, we're in, let's start crawling the protected pages
        for url in self.start_urls:
            yield scrapy.Request(url)

    def parse(self, response):
        # do stuff with the logged-in response
        pass
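To see why the callback above can hand the form data straight to `scrapy.FormRequest`, it helps to know that `fill_login_form` returns a `(data, url, method)` tuple: the form fields with your credentials filled in, the absolute URL the form submits to, and the HTTP method. The sketch below approximates that behavior using only the standard library, so you can inspect the tuple shape without a running site. The sample HTML, the field heuristics, and the `fill_form_sketch` name are illustrative assumptions, not loginform's actual implementation.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class _FormParser(HTMLParser):
    """Collects the login form's action, method, and input fields."""

    def __init__(self):
        super().__init__()
        self.action = ""
        self.method = "GET"
        self.fields = []  # (name, type, value) for each non-submit input

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.action = attrs.get("action", "")
            self.method = attrs.get("method", "GET").upper()
        elif tag == "input" and attrs.get("type") != "submit":
            self.fields.append(
                (attrs.get("name"), attrs.get("type", "text"), attrs.get("value", ""))
            )

def fill_form_sketch(page_url, html, username, password):
    """Approximate fill_login_form: returns (data, url, method)."""
    parser = _FormParser()
    parser.feed(html)
    data = []
    for name, itype, value in parser.fields:
        if itype == "password":
            value = password            # fill the password field
        elif itype in ("text", "email"):
            value = username            # assume the text field is the username
        data.append((name, value))      # hidden fields (e.g. CSRF tokens) pass through
    return data, urljoin(page_url, parser.action), parser.method

LOGIN_PAGE = """
<html><body>
<form action="/do-login" method="post">
  <input type="hidden" name="csrf_token" value="abc123">
  <input type="text" name="username">
  <input type="password" name="password">
  <input type="submit" value="Log in">
</form>
</body></html>
"""

data, url, method = fill_form_sketch(
    "http://somewebsite.com/login-page", LOGIN_PAGE, "your-username", "secret"
)
# data   -> [('csrf_token', 'abc123'), ('username', 'your-username'), ('password', 'secret')]
# url    -> 'http://somewebsite.com/do-login'
# method -> 'POST'
```

Note how the hidden CSRF token is carried through untouched; that is exactly what lets the `FormRequest` in the spider above log in successfully on sites that use such tokens.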
