Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

0 votes
623 views
in Technique[技术] by (71.8m points)

PhantomJS memory leak problem

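Hi everyone,

I'm a beginner trying to crawl a website with PhantomJS + Scrapy, but I've found that PhantomJS's memory usage keeps growing as more pages are crawled, until memory runs out. I searched around and found nothing. My simple idea for now is to quit PhantomJS after crawling each site, but putting it in directly like this doesn't seem to work: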

self.browser.get(response.url)
sel = self.browser.find_element_by_xpath("//pre").text
self.browser.quit()

It fails immediately with the error below. Any help would be much appreciated.

Traceback (most recent call last):
  File "/usr/local/pythonbrew/venvs/Python-2.7.10/flask/lib/python2.7/site-packages/scrapy/utils/defer.py", line 102, in iter_errback
    yield next(it)
  File "/usr/local/pythonbrew/venvs/Python-2.7.10/flask/lib/python2.7/site-packages/scrapy/spidermiddlewares/offsite.py", line 29, in process_spider_output
    for x in result:
  File "/usr/local/pythonbrew/venvs/Python-2.7.10/flask/lib/python2.7/site-packages/scrapy/spidermiddlewares/referer.py", line 22, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/usr/local/pythonbrew/venvs/Python-2.7.10/flask/lib/python2.7/site-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/usr/local/pythonbrew/venvs/Python-2.7.10/flask/lib/python2.7/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/app/Project/scrapy/new_stock/new_stock/spiders/newstock.py", line 86, in parse_items
    self.browser.get(response.url)
  File "/usr/local/pythonbrew/venvs/Python-2.7.10/flask/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 250, in get
    self.execute(Command.GET, {'url': url})
  File "/usr/local/pythonbrew/venvs/Python-2.7.10/flask/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 236, in execute
    response = self.command_executor.execute(driver_command, params)
  File "/usr/local/pythonbrew/venvs/Python-2.7.10/flask/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 415, in execute
    return self._request(command_info[0], url, body=data)
  File "/usr/local/pythonbrew/venvs/Python-2.7.10/flask/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 489, in _request
    resp = opener.open(request, timeout=self._timeout)
  File "/usr/local/pythonbrew/pythons/Python-2.7.10/lib/python2.7/urllib2.py", line 431, in open
    response = self._open(req, data)
  File "/usr/local/pythonbrew/pythons/Python-2.7.10/lib/python2.7/urllib2.py", line 449, in _open
    '_open', req)
  File "/usr/local/pythonbrew/pythons/Python-2.7.10/lib/python2.7/urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "/usr/local/pythonbrew/pythons/Python-2.7.10/lib/python2.7/urllib2.py", line 1227, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/local/pythonbrew/pythons/Python-2.7.10/lib/python2.7/urllib2.py", line 1197, in do_open
    raise URLError(err)

URLError: <urlopen error [Errno 111] Connection refused>


Fight dragons for too long and you become a dragon yourself; gaze into the abyss for too long and the abyss gazes back…

1 Reply

0 votes
by (71.8m points)
try:
    self.browser.get(response.url)
    sel = self.browser.find_element_by_xpath("//pre").text
finally:
    self.browser.quit()

Make sure the browser is quit even when an exception is raised.
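
One more thing that may be worth checking: the traceback fails at self.browser.get(...), which is consistent with the same browser object being reused after an earlier quit(). Once quit() has run, the PhantomJS process behind that object is gone, so the next command gets "Connection refused". The question does not show where self.browser is created, so this is an assumption. If it is created only once (for example in the spider's __init__), a rough sketch of the "new browser per page" idea could look like the following. The names fresh_browser, fetch_pre_text and make_browser are made up for illustration, and the commented usage assumes an older Selenium release that still ships webdriver.PhantomJS and find_element_by_xpath, as in the question.

```python
from contextlib import contextmanager


@contextmanager
def fresh_browser(make_browser):
    """Start a new browser, hand it to the caller, and always quit it."""
    browser = make_browser()
    try:
        yield browser
    finally:
        # Runs on success and on exceptions, so no browser process lingers.
        browser.quit()


def fetch_pre_text(url, make_browser):
    """Load `url` in a throwaway browser and return the text of its <pre>."""
    with fresh_browser(make_browser) as browser:
        browser.get(url)
        return browser.find_element_by_xpath("//pre").text


# Hypothetical use inside the spider callback (older Selenium assumed):
#
#     from selenium import webdriver
#
#     def parse_items(self, response):
#         text = fetch_pre_text(response.url, webdriver.PhantomJS)
```

Starting a browser for every page is slow, so this trades speed for bounded memory. A variant that restarts the browser only every N pages would follow the same pattern.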

