Here is my spider:
import scrapy

class PhonesCDSpider(scrapy.Spider):
    name = "phones_CD"
    custom_settings = {
        "FEEDS": {
            "Spiders/spiders/cd.json": {"format": "json"},
        },
    }
    start_urls = [
        'https://www.cdiscount.com/telephonie/telephone-mobile/smartphones/tous-nos-smartphones/l-144040211.html'
    ]

    def parse(self, response):
        for phone in response.css('div.prdtBlocInline.jsPrdtBlocInline'):
            phone_url = phone.css('div.prdtBlocInline.jsPrdtBlocInline a::attr(href)').get()
            # go to the phone page
            yield response.follow(phone_url, callback=self.parse_phone)

    def parse_phone(self, response):
        yield {
            'title': response.css('h1::text').get(),
            'price': response.css('span.fpPrice.price.jsMainPrice.jsProductPrice.hideFromPro::attr(content)').get(),
            'EAN': response.css('script').getall(),
            'image_url': response.css('div.fpMainImg a::attr(href)').get(),
            'url': response.url,
        }
If I start it from the terminal with scrapy crawl phones_CD -O test.json, it works fine. But when I run it from my Python script (where the other crawlers work and are configured the same way):
def all_crawlers():
    process = CrawlerProcess()
    process.crawl(PhonesCBSpider)
    process.crawl(PhonesKFSpider)
    process.crawl(PhonesMMSpider)
    process.crawl(PhonesCDSpider)
    process.start()

all_crawlers()
the spider crawls the start page but closes without scraping any items. Here is the log output:
2021-01-05 18:16:06 [scrapy.core.engine] INFO: Spider opened
2021-01-05 18:16:06 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2021-01-05 18:16:06 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6026
2021-01-05 18:16:06 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.cdiscount.com/telephonie/telephone-mobile/smartphones/tous-nos-smartphones/l-144040211.html> (referer: None)
2021-01-05 18:16:07 [scrapy.core.engine] INFO: Closing spider (finished)
Thanks in advance for your time!