Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
316 views
in Technique[技术] by (71.8m points)

python - scrapy spider sends spider_close signal before it closes

I have a spider that takes a file as a parameter, this file contains the xpaths.

The spider parses the file and gets the xpaths and starts crawling.

Everything is working fine

Now, I want that spider to run many times, So I did this:

script.py

def setup_crawler(file):
    spider = MySpider(attributesXMLFilePath=file)
    settings = get_project_settings()
    crawler = Crawler(settings)
    crawler.configure()
    crawler.crawl(spider)
    crawler.start()


for oneFile in myFiles:
    setup_crawler(oneFile')
log.start()
reactor.run()

and in MySpider I do this:

def __init__(self, attributesXMLFilePath):
    dispatcher.connect(self.spider_closed, signals.spider_closed)
def spider_closed(self, spider):
        log.msg('The number of pages in the spider {1} are {0}'.format(self.numbers, self.attributesXMLFilePath))
        log.msg('The number of details pages in the spider {1} are {0}'.format(self.numbers2, self.attributesXMLFilePath))
        log.msg('The spider {0} with xml {2} finished working on {1}'.format(self.name, datetime.now(), self.attributesXMLFilePath), level=log.INFO)

but in the log file, I see this:

2014-06-08 18:18:03+0300 [scrapy] INFO: The number of pages in the spider file1.xml are 1
2014-06-08 18:18:03+0300 [scrapy] INFO: The number of pages in the spider file1.xml are 1
2014-06-08 18:18:03+0300 [scrapy] INFO: The number of details pages in the spider file1.xml are 0
2014-06-08 18:18:03+0300 [scrapy] INFO: The number of details pages in the spider file1.xml are 0
2014-06-08 18:18:03+0300 [scrapy] INFO: The spider MySpider with xml file1.xml finished working on 2014-06-08 18:18:03.746000
2014-06-08 18:18:03+0300 [scrapy] INFO: The spider MySpider with xml file1.xml finished working on 2014-06-08 18:18:03.746000
2014-06-08 18:18:03+0300 [scrapy] INFO: The number of pages in the spider file2.xml are 1
2014-06-08 18:18:03+0300 [scrapy] INFO: The number of pages in the spider file2.xml are 1
2014-06-08 18:18:03+0300 [scrapy] INFO: The number of details pages in the spider file2.xml are 0
2014-06-08 18:18:03+0300 [scrapy] INFO: The number of details pages in the spider file2.xml are 0
2014-06-08 18:18:03+0300 [scrapy] INFO: The spider MySpider with xml file2.xml finished working on 2014-06-08 18:18:03.748000
2014-06-08 18:18:03+0300 [scrapy] INFO: The spider MySpider with xml file2.xml finished working on 2014-06-08 18:18:03.748000

As you see:

  1. every line is duplicated twice, why?
  2. there are many times that the spider_close function is executed

Note

I have that log data in my log file a lot and I just showed you a sample to explain my problem

Note2

Ofc I am not using `MySpider`, `file1.xml`, and `file2.xml` names, but I couldn't show you the real name for privacy issues. See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)
Waitting for answers

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

1.4m articles

1.4m replys

5 comments

56.9k users

...