Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share


0 votes
206 views
in Technique by (71.8m points)

python - Bypass rate limit for requests.get

I want to scrape a website continuously, once every 3-5 seconds, with

requests.get('http://www.example.com', headers=headers2, timeout=35).json()

But the example website has a rate limit and I want to bypass that. How can I do so? I thought about using proxies, but I was hoping there were other ways.
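For reference, the 3-5 second polling loop described above can be sketched with the standard library alone (`urllib.request` is the Python 3 successor to `urllib2`; the URL and `headers2` are the question's own placeholders):

```python
import json
import random
import time
import urllib.request

def next_delay(low=3.0, high=5.0):
    """Random wait between polls so requests are not perfectly periodic."""
    return random.uniform(low, high)

def fetch_json(url, headers=None, timeout=35):
    """Stdlib equivalent of requests.get(url, headers=..., timeout=35).json()."""
    req = urllib.request.Request(url, headers=headers or {})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

def poll_forever(url, headers):
    """Fetch the URL every 3-5 seconds, as the question describes."""
    while True:
        print(fetch_json(url, headers=headers))
        time.sleep(next_delay())

# Usage (headers2 is the placeholder header dict from the question):
# poll_forever('http://www.example.com', headers2)
```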



1 Reply

0 votes
by (71.8m points)

You would have to do some fairly low-level work, likely using socket and urllib2 (urllib.request in Python 3).
First, do your research: how are they limiting your query rate? Is it by IP, session based (a server-side cookie), or by local cookies? As a first step, I suggest visiting the site manually and using your browser's developer tools to view all of the headers exchanged.
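As part of that research, it also helps to check the response headers programmatically. Many sites advertise their limits through conventional header names; the ones below (Retry-After, X-RateLimit-*) are common conventions, not a guarantee for any particular site:

```python
def rate_limit_info(headers):
    """Pull rate-limit related fields (if any) out of a response's headers.

    The header names checked here are widespread conventions only --
    inspect real responses to see what the target site actually sends.
    """
    lowered = {k.lower(): v for k, v in headers.items()}
    keys = ("retry-after", "x-ratelimit-limit",
            "x-ratelimit-remaining", "x-ratelimit-reset")
    return {k: lowered[k] for k in keys if k in lowered}
```

With requests this would be `rate_limit_info(response.headers)`; with urllib.request, `rate_limit_info(dict(resp.getheaders()))`.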

Once you have figured this out, plan how to work around it. Let's say it is session based: you could use multiple threads to run several independent scraper instances, each with its own session.
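A minimal sketch of that idea with the standard library (urllib.request is the Python 3 descendant of urllib2): each opener carries its own cookie jar, so each thread presents itself to the server as a separate session. The URL is the question's placeholder.

```python
import threading
import urllib.request
from http.cookiejar import CookieJar

def make_session():
    """Build an opener with its own CookieJar, i.e. its own server-side session."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar

def scrape_with_session(url, results, index):
    """One scraper instance; its cookies never mix with other threads'."""
    opener, _jar = make_session()
    with opener.open(url, timeout=35) as resp:
        results[index] = resp.read()

def scrape_concurrently(url, n_threads=3):
    """Run several independent sessions in parallel, as described above."""
    results = {}
    threads = [threading.Thread(target=scrape_with_session,
                                args=(url, results, i))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

# Usage:
# scrape_concurrently('http://www.example.com', n_threads=3)
```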

Now, if it is IP based, you must change your apparent IP address, which is much more complex. You cannot literally spoof your source IP for a normal TCP conversation and still receive the replies; in practice this means routing your requests through proxies, as you anticipated.
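A hedged sketch of that approach with the standard library: build one opener per proxy and rotate among them so successive requests leave from different addresses. The proxy URLs below are placeholders, not real servers.

```python
import itertools
import urllib.request

PROXIES = [  # placeholder proxy addresses -- substitute real ones
    "http://proxy1.example:8080",
    "http://proxy2.example:8080",
]

def proxy_openers(proxy_urls):
    """Build one opener per proxy; each routes both http and https through it."""
    return [urllib.request.build_opener(
                urllib.request.ProxyHandler({"http": p, "https": p}))
            for p in proxy_urls]

def round_robin(openers):
    """Cycle through the openers so successive requests use different proxies."""
    return itertools.cycle(openers)

# Usage:
# rotation = round_robin(proxy_openers(PROXIES))
# next(rotation).open('http://www.example.com', timeout=35)
```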

