Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

python - Scrapy will start by command line but not with CrawlerProcess

Here is my spider:

import scrapy


class PhonesCDSpider(scrapy.Spider):
    name = "phones_CD"

    custom_settings = {
        "FEEDS": {
            "Spiders/spiders/cd.json": {"format": "json"},
        },
    }

    start_urls = [
        'https://www.cdiscount.com/telephonie/telephone-mobile/smartphones/tous-nos-smartphones/l-144040211.html'
    ]

    def parse(self, response):
        for phone in response.css('div.prdtBlocInline.jsPrdtBlocInline'):
            phone_url = phone.css('div.prdtBlocInline.jsPrdtBlocInline a::attr(href)').get()

            # go to the phone page
            yield response.follow(phone_url, callback=self.parse_phone)


    def parse_phone(self, response):
        yield {
            'title': response.css('h1::text').get(),
            'price': response.css('span.fpPrice.price.jsMainPrice.jsProductPrice.hideFromPro::attr(content)').get(),
            'EAN': response.css('script').getall(),
            'image_url': response.css('div.fpMainImg a::attr(href)').get(),
            'url': response.url,
        }

If I start it in the terminal with scrapy crawl phones_CD -O test.json, it works fine. But if I run it from my Python script (where the other crawlers work and are configured the same way):

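    from scrapy.crawler import CrawlerProcess

    # The four spider classes are imported elsewhere in the script (imports not shown).

    def all_crawlers():
        process = CrawlerProcess()
        process.crawl(PhonesCBSpider)
        process.crawl(PhonesKFSpider)
        process.crawl(PhonesMMSpider)
        process.crawl(PhonesCDSpider)
        process.start()  # blocks until all spiders have finished

    all_crawlers()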

I get an error. Here is the traceback:

2021-01-05 18:16:06 [scrapy.core.engine] INFO: Spider opened
2021-01-05 18:16:06 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2021-01-05 18:16:06 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6026
2021-01-05 18:16:06 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.cdiscount.com/telephonie/telephone-mobile/smartphones/tous-nos-smartphones/l-144040211.html> (referer: None)
2021-01-05 18:16:07 [scrapy.core.engine] INFO: Closing spider (finished)

Thanks in advance for your time!



1 Reply


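According to the feed exports section of the Scrapy docs, the FEEDS setting does not support a relative path like your "Spiders/spiders/cd.json". The docs only allow omitting the URI scheme for an absolute path, so use an absolute path or a file:// URI as the key instead.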
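
One way to get an absolute key for FEEDS is to build it with pathlib. This is only a sketch: the helper name absolute_feeds and the choice of base directory are assumptions for illustration, not something from the question or from Scrapy itself.

```python
from pathlib import Path


def absolute_feeds(base_dir, filename="cd.json"):
    """Return a FEEDS-style dict keyed by an absolute file:// URI.

    absolute_feeds, base_dir and filename are hypothetical names used
    for this example only; pick whatever output location suits you.
    """
    # resolve() turns a relative directory into an absolute one,
    # which as_uri() requires.
    target = Path(base_dir).resolve() / filename
    return {target.as_uri(): {"format": "json"}}


# Example: export to cd.json in the current working directory.
FEEDS = absolute_feeds(Path.cwd())
```

In the spider you could then write custom_settings = {"FEEDS": absolute_feeds(Path(__file__).parent)}, so the output location no longer depends on the directory the script is started from.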

