Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python scrapy conversion to exe file using pyinstaller

I am trying to convert a Scrapy script to an .exe file. The main.py file looks like this:

from scrapy.crawler import CrawlerProcess
from amazon.spiders.amazon_scraper import Spider

spider = Spider()
process = CrawlerProcess({
    'FEED_FORMAT': 'csv',
    'FEED_URI': 'data.csv',
    'DOWNLOAD_DELAY': 3,
    'RANDOMIZE_DOWNLOAD_DELAY': True,
    'ROTATING_PROXY_LIST_PATH': 'proxies.txt',
    'USER_AGENT_LIST': 'useragents.txt',
    'DOWNLOADER_MIDDLEWARES': {
        'rotating_proxies.middlewares.RotatingProxyMiddleware': 610,
        'rotating_proxies.middlewares.BanDetectionMiddleware': 620,
        'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
        'random_useragent.RandomUserAgentMiddleware': 400
    }
})

process.crawl(spider)
process.start() # the script will block here until the crawling is finished

The Scrapy spider itself is nothing unusual. I am using pyinstaller.exe --onefile main.py to convert it to an .exe file. When I run main.exe from the dist folder, it starts printing errors:

FileNotFoundError: [Errno 2] No such file or directory: '...\scrapy\VERSION'

I can fix it by creating a scrapy folder inside the dist folder and copying the VERSION file from lib/site-packages/scrapy into it. After that, many other errors occur, but I can fix them by copying over more Scrapy files.

In the end it prints this error:

ModuleNotFoundError: No module named 'email.mime'

I don't even know what it means. I have never seen it before.

I am using:

Python 3.6.5
Scrapy 1.5.0
pyinstaller 3.3.1

1 Reply


I had the same problem.
Instead of trying to make PyInstaller bundle this file (all my attempts failed), I decided to change part of the Scrapy code to avoid the error.

I noticed that the scrapy\VERSION file is used in only one place: scrapy\__init__.py.
I decided to hardcode the value from scrapy\VERSION by changing scrapy\__init__.py:

#import pkgutil
__version__ = "1.5.0" #pkgutil.get_data(__package__, 'VERSION').decode('ascii').strip()
version_info = tuple(int(v) if v.isdigit() else v
                     for v in __version__.split('.'))
#del pkgutil

After this change there is no need to store the version in an external file. Since nothing references the scrapy\VERSION file any more, that error no longer occurs.
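
A less invasive variant, sketched here only as an idea, is to keep the pkgutil lookup and fall back to a hardcoded value when the data file is missing. It assumes the loader in the frozen app raises an OSError (FileNotFoundError is a subclass) or returns None in that case. The helper names read_version and parse_version_info and the fallback string are hypothetical and not part of Scrapy.

```python
import pkgutil


def read_version(package, resource="VERSION", fallback="1.5.0"):
    """Return the version stored in a package data file, or a fallback.

    Hypothetical helper: `fallback` has to be kept in sync by hand with
    the Scrapy version that is actually installed.
    """
    try:
        data = pkgutil.get_data(package, resource)
    except OSError:
        # The data file was not bundled (for example FileNotFoundError).
        data = None
    if data is None:
        return fallback
    return data.decode("ascii").strip()


def parse_version_info(version):
    """Split '1.5.0' into (1, 5, 0); non-numeric parts stay strings."""
    return tuple(int(v) if v.isdigit() else v for v in version.split("."))


if __name__ == "__main__":
    # The stdlib 'json' package ships no VERSION file, so the fallback
    # path is exercised here without needing Scrapy installed.
    version = read_version("json")
    print(version, parse_version_info(version))
```

The same tuple expression as in the snippet above is reused, so version_info keeps the shape the rest of Scrapy expects.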

After that I got the same FileNotFoundError: [Errno 2], this time for the scrapy\mime.types file.
The situation is the same: scrapy\mime.types is used only in scrapy\responsetypes.py:

...
#from pkgutil import get_data
...
    def __init__(self):
        self.classes = {}
        self.mimetypes = MimeTypes()
        #mimedata = get_data('scrapy', 'mime.types').decode('utf8')
        mimedata = """
        Paste all 750 lines of scrapy/mime.types here
"""
        self.mimetypes.readfp(StringIO(mimedata))
        for mimetype, cls in six.iteritems(self.CLASSES):
            self.classes[mimetype] = load_object(cls)

This change resolved the FileNotFoundError: [Errno 2] for the scrapy\mime.types file. I agree that hardcoding 750 lines of text into Python code is not the best solution.
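
For context, MimeTypes.readfp accepts any file-like object whose lines have the form "type ext1 ext2 ...", which is why wrapping an embedded string in StringIO works. A minimal standalone sketch of that mechanism follows; the MIME type, the extensions, and the build_mimetypes helper are made up for illustration and are not taken from Scrapy's mime.types.

```python
from io import StringIO
from mimetypes import MimeTypes

# Made-up entries in mime.types format: a MIME type, then its extensions.
EMBEDDED_MIME_DATA = """
# comment lines are ignored
application/x-demo-feed    demofeed dfeed
"""


def build_mimetypes(mimedata=EMBEDDED_MIME_DATA):
    """Return a MimeTypes database extended with the embedded entries."""
    db = MimeTypes()
    # readfp parses the text line by line, exactly as it would a file.
    db.readfp(StringIO(mimedata))
    return db


if __name__ == "__main__":
    db = build_mimetypes()
    print(db.guess_type("export.demofeed"))
```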

After that I started to receive ModuleNotFoundError: No module named scrapy.spiderloader. I added "scrapy.spiderloader" to PyInstaller's hidden-import parameter.
The next issue was ModuleNotFoundError: No module named scrapy.statscollectors, which I fixed the same way.
The final version of the pyinstaller command for my Scrapy script contains 46 hidden imports; after that I got a working .exe file.
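
Typing dozens of hidden imports by hand is error-prone, so the list could be generated instead. The following is a rough sketch, not the command actually used above: list_submodules and build_pyinstaller_args are hypothetical helpers, and the data file paths passed in are placeholders. It relies on PyInstaller's --hidden-import and --add-data options; whether --add-data gets the VERSION and mime.types files picked up correctly with the versions listed in the question has not been verified here.

```python
import importlib
import os
import pkgutil


def list_submodules(package_name):
    """Return the dotted names of every module inside an importable package."""
    package = importlib.import_module(package_name)
    names = [package_name]
    walker = pkgutil.walk_packages(
        package.__path__,
        prefix=package_name + ".",
        onerror=lambda name: None,  # skip subpackages that fail to import
    )
    names.extend(info.name for info in walker)
    return sorted(names)


def build_pyinstaller_args(script, hidden_imports, data_files=()):
    """Assemble a PyInstaller argument list.

    data_files holds (source_path, destination_folder) pairs.
    """
    args = ["pyinstaller", "--onefile"]
    for name in hidden_imports:
        args.append("--hidden-import=" + name)
    for source, destination in data_files:
        # --add-data joins source and destination with os.pathsep
        # (';' on Windows, ':' elsewhere).
        args.append("--add-data=" + source + os.pathsep + destination)
    args.append(script)
    return args


if __name__ == "__main__":
    # The two module names come from the errors described above; the
    # data file paths are placeholders for the real site-packages paths.
    hidden = ["scrapy.spiderloader", "scrapy.statscollectors"]
    data = [("VERSION", "scrapy"), ("mime.types", "scrapy")]
    print(" ".join(build_pyinstaller_args("main.py", hidden, data)))
```

With Scrapy installed, list_submodules("scrapy") could replace the hand-written list, at the cost of bundling modules the spider never uses.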

