Here is my spider:
import scrapy

class PhonesCDSpider(scrapy.Spider):
    name = "phones_CD"
    custom_settings = {
        "FEEDS": {
            "Spiders/spiders/cd.json": {"format": "json"},
        },
    }
    start_urls = [
        'https://www.cdiscount.com/telephonie/telephone-mobile/smartphones/tous-nos-smartphones/l-144040211.html'
    ]

    def parse(self, response):
        for phone in response.css('div.prdtBlocInline.jsPrdtBlocInline'):
            phone_url = phone.css('div.prdtBlocInline.jsPrdtBlocInline a::attr(href)').get()
            # go to the phone page
            yield response.follow(phone_url, callback=self.parse_phone)

    def parse_phone(self, response):
        yield {
            'title': response.css('h1::text').get(),
            'price': response.css('span.fpPrice.price.jsMainPrice.jsProductPrice.hideFromPro::attr(content)').get(),
            'EAN': response.css('script').getall(),
            'image_url': response.css('div.fpMainImg a::attr(href)').get(),
            'url': response.url,
        }
If I start it from the terminal with scrapy crawl phones_CD -O test.json, it works fine. But when I run it from my Python script (where the other crawlers work and are configured the same way):
def all_crawlers():
    process = CrawlerProcess()
    process.crawl(PhonesCBSpider)
    process.crawl(PhonesKFSpider)
    process.crawl(PhonesMMSpider)
    process.crawl(PhonesCDSpider)
    process.start()

all_crawlers()
the spider crawls the start page but closes without scraping any items. Here is the log output:
2021-01-05 18:16:06 [scrapy.core.engine] INFO: Spider opened
2021-01-05 18:16:06 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2021-01-05 18:16:06 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6026
2021-01-05 18:16:06 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.cdiscount.com/telephonie/telephone-mobile/smartphones/tous-nos-smartphones/l-144040211.html> (referer: None)
2021-01-05 18:16:07 [scrapy.core.engine] INFO: Closing spider (finished)
Thanks in advance for your time!