Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Selenium headless browser webdriver [Errno 104] Connection reset by peer

I am trying to scrape data from the URLs below, but Selenium fails at driver.get(url). Sometimes the error is [Errno 104] Connection reset by peer, sometimes [Errno 111] Connection refused. On rare days it works just fine, and on my Mac with a real browser the same spider works fine every single time. So this isn't related to my spider.
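While the root cause is investigated, the intermittent errors can at least be handled explicitly. Below is a minimal sketch, assuming Python 3 (where errno 104 surfaces as ConnectionResetError and errno 111 as ConnectionRefusedError) and any object with a .get(url) method, such as a Selenium WebDriver. The function name get_with_retry and its parameters are hypothetical. Depending on the Selenium version, the socket error may reach your code wrapped in a different exception type, so check the actual traceback before relying on this.

```python
import time


def get_with_retry(driver, url, attempts=3, delay=2.0, sleep=time.sleep):
    """Call driver.get(url), retrying on connection reset/refused errors.

    Hypothetical helper: `driver` is anything with a .get(url) method,
    e.g. a selenium WebDriver. Other exceptions propagate immediately.
    """
    for attempt in range(1, attempts + 1):
        try:
            return driver.get(url)
        except (ConnectionResetError, ConnectionRefusedError):
            # [Errno 104] / [Errno 111]: give up after the last attempt.
            if attempt == attempts:
                raise
            # Wait a little longer before each new attempt (linear backoff).
            sleep(delay * attempt)
```

A retry only hides the symptom. If the failures come from the server running out of memory, the browser will keep failing until that is fixed.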

I have tried many solutions, such as waiting for selectors on the page, implicit waits, and using selenium-requests to pass proper request headers, but nothing seems to work.

http://www.snapdeal.com/offers/deal-of-the-day
https://paytm.com/shop/g/paytm-home/exclusive-discount-deals

I am using Python, Selenium and a headless Firefox webdriver to achieve this. The OS is CentOS 6.5.

Note: I have many AJAX-heavy pages that get scraped successfully; some of them are below.

http://www.infibeam.com/deal-of-the-day.html, http://www.amazon.in/gp/goldbox/ref=nav_topnav_deals

I have already spent many days trying to debug the issue with no luck. Any help would be appreciated.



1 Reply


After days of wrestling with this issue, I finally found the cause, and I am writing it here for the benefit of the community. The headless browser was failing because the server was running out of RAM; the misleading error messages from webdriver were a real pain.

The server had been running for 60 days straight without a reboot, and rebooting it did the trick. After tripling the swap, I have not faced the issue for the past few days. I also scheduled a task to clean up the page cache (http://www.yourownlinux.com/2013/10/how-to-free-up-release-unused-cached-memory-in-linux.html).
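Because low memory showed up only as connection errors, it may help to check memory headroom before launching the headless browser. This is a minimal sketch, assuming Linux and Python 3, that reads /proc/meminfo. The function names and the default thresholds (512 MB available, 256 MB free swap) are hypothetical, and the fallback estimate for kernels that do not report MemAvailable (it was added in Linux 3.14) is only an approximation.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of integer values (mostly kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def available_kb(info):
    """Estimate memory available to new processes, in kB."""
    if "MemAvailable" in info:
        return info["MemAvailable"]
    # Older kernels lack MemAvailable: roughly count free memory plus
    # buffers and page cache, which the kernel can usually reclaim.
    return info.get("MemFree", 0) + info.get("Buffers", 0) + info.get("Cached", 0)


def enough_memory(min_available_mb=512, min_swap_free_mb=256, path="/proc/meminfo"):
    """Return True if both available RAM and free swap exceed the thresholds."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    return (available_kb(info) >= min_available_mb * 1024
            and info.get("SwapFree", 0) >= min_swap_free_mb * 1024)
```

A scraper could call enough_memory() before creating the webdriver and postpone the run when it returns False, turning the confusing connection errors into an explicit low-memory condition.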



...