I am scraping some websites with Scrapy and have run into some ajax pages. I have enabled the Scrapy ajax middleware, but that does not provide me with the html data for these pages.
I noticed that the website offers an html version of its ajax pages, which is only slightly different from the regular urls:
So basically replacing the # with a ? would change the page from ajax to html. Following this observation, I have several questions:
- Is it common for websites to expose the html content of an ajax page by replacing just one character in the url?
- Should the Scrapy ajax middleware not take care of these "simple" changes?
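
For reference, here is a minimal sketch of the rewrite described above, assuming the only difference between the two url forms is that the first # becomes a ?. The function name `ajax_to_html_url` and the example.com url are placeholders for illustration, not the real site or anything provided by Scrapy:

```python
def ajax_to_html_url(url: str) -> str:
    """Return the html variant of an ajax url by swapping the first '#' for '?'.

    Assumption: the url has no query string of its own before the '#';
    if it did, the result would contain two '?' characters.
    """
    # Replace only the first '#'; any later '#' is left untouched.
    return url.replace("#", "?", 1)


if __name__ == "__main__":
    # Hypothetical url, used only to show the transformation.
    print(ajax_to_html_url("http://example.com/list#page=2"))
```

The returned url could then be passed to a regular `scrapy.Request` in the spider instead of the original ajax url.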
He who fights dragons for too long becomes a dragon himself; gaze too long into the abyss, and the abyss will gaze back…