python - Scraping ajax pages with scrapy; the case of slightly adjusted urls

I am scraping some websites with Scrapy and have run into some AJAX pages. I have enabled the Scrapy AJAX middleware, but it does not give me the HTML data for these pages.

I noticed that the website offers an HTML version of its AJAX pages, at URLs that differ only slightly from the regular ones:

ajax-page: https://www.example.com/general/search/#&ajax=true&page=2
html-equivalent: https://www.example.com/general/search/?&ajax=true&page=2

So basically, replacing the # with a ? changes the page from AJAX to HTML. Following this observation, I have several questions:

Is it common for websites to expose the HTML content of an AJAX page by changing just one character in the URL?
Shouldn't the Scrapy AJAX middleware take care of such "simple" changes?
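
For reference, the # to ? replacement described above can be sketched in plain Python. Everything after # is a URL fragment, which HTTP clients do not send to the server, so moving it into the query string is what makes the server see ajax=true and page=2. The function name ajax_to_html_url is hypothetical, and the merging rule for URLs that already have a query string is an assumption, not something this site documents:

```python
from urllib.parse import urlsplit


def ajax_to_html_url(url: str) -> str:
    """Move the fragment (the part after '#') into the query string.

    Hypothetical helper: it only mirrors the manual '#' -> '?' swap
    observed on this one site.
    """
    parts = urlsplit(url)
    if not parts.fragment:
        # Nothing after '#', so there is nothing to rewrite.
        return url
    if parts.query:
        # Assumption: an existing query is kept and the fragment appended.
        query = parts.query + "&" + parts.fragment.lstrip("&")
    else:
        query = parts.fragment
    return parts._replace(query=query, fragment="").geturl()


print(ajax_to_html_url("https://www.example.com/general/search/#&ajax=true&page=2"))
```

In a spider, the rewritten URL could then be passed to scrapy.Request in place of the original one.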
