html - Can't Scrape webpage with Python Requests Library

Question

posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)

I am trying to get some information from a webpage (link below) using the requests library in Python. However, the HTML that I see in my browser doesn't seem to exist when I fetch the page with requests, and none of my XPath queries return anything. I can use requests on other sites such as Amazon (the site below is actually owned by Amazon), but I can't scrape any information from this one.

import requests
from lxml import html

url = 'http://www.myhabit.com/#page=d&dept=men&asin=B00R5TK3SS&cAsin=B00DNNZIIK&qid=aps-0QRWKNQG094M3PZKX5ST-1429238272673&sindex=0&discovery=search&ref=qd_men_sr_1_0'
user_agent = {'User-agent': 'Mozilla/5.0'}
page = requests.get(url, headers=user_agent)
tree = html.fromstring(page.text)
# The id value must be a quoted string; unquoted, ourPrice is read as a child element name
query = tree.xpath("//span[@id='ourPrice']/text()")
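A likely part of the explanation (an inference from the URL itself, not something the question confirms): everything after the # is a URL fragment. Neither browsers nor requests send the fragment to the server; presumably the page's own JavaScript reads it to decide which product to show. So requests.get(url) receives the same document as a plain request for http://www.myhabit.com/. A standard-library sketch showing which part of this URL actually reaches the server:

```python
from urllib.parse import urlsplit

url = ('http://www.myhabit.com/#page=d&dept=men&asin=B00R5TK3SS&cAsin=B00DNNZIIK'
       '&qid=aps-0QRWKNQG094M3PZKX5ST-1429238272673&sindex=0&discovery=search'
       '&ref=qd_men_sr_1_0')

parts = urlsplit(url)

# The HTTP request target is built from the path and query only;
# the fragment (everything after '#') never leaves the client.
request_target = parts.path + ('?' + parts.query if parts.query else '')

print('requested from server:', parts.netloc + request_target)
print('left for the page JavaScript:', parts.fragment)
```

Here the query string is empty and the path is just "/", so all of the product identifiers (asin, cAsin, and so on) live in the fragment and are invisible to the server.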

1 Reply

深蓝 · Answer 1 · 2021-10-23T18:23:33+0000

The element is generated by JavaScript, so it is not present in the HTML that requests downloads. You can use Selenium to get the rendered page source, and for headless browsing combine it with PhantomJS:

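from selenium import webdriver
from bs4 import BeautifulSoup

url = 'http://www.myhabit.com/#page=d&dept=men&asin=B00R5TK3SS&cAsin=B00DNNZIIK&qid=aps-0QRWKNQG094M3PZKX5ST-1429238272673&sindex=0&discovery=search&ref=qd_men_sr_1_0'

# webdriver.PhantomJS exists only in older Selenium releases (deprecated in 3.x,
# not available in Selenium 4). On Selenium 4, a headless browser is the usual
# replacement, for example:
#   options = webdriver.ChromeOptions()
#   options.add_argument("--headless")
#   browser = webdriver.Chrome(options=options)
browser = webdriver.PhantomJS()
browser.get(url)              # load the page in a real browser engine, which runs its JavaScript
_html = browser.page_source   # HTML of the DOM as the browser currently sees it
browser.quit()

# Name the parser explicitly to avoid BeautifulSoup's "no parser specified" warning
print(BeautifulSoup(_html, "html.parser").find("span", {"id": "ourPrice"}).text)
# Output:
# $50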
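If you would rather keep the lxml and XPath approach from the question, you can parse the rendered page source with lxml.html instead of BeautifulSoup. The attribute value in the predicate has to be a quoted string literal: unquoted, ourPrice is read as a path to a child element named ourPrice, so the query silently matches nothing. The sketch below uses a small hypothetical HTML string standing in for browser.page_source; it is not the real markup of the page.

```python
from lxml import html

# Hypothetical stand-in for browser.page_source after the JavaScript has run.
rendered = '<html><body><div><span id="ourPrice">$50</span></div></body></html>'

tree = html.fromstring(rendered)

# Unquoted: ourPrice is a child-element path, so the comparison is always false.
print(tree.xpath("//span[@id=ourPrice]/text()"))    # []

# Quoted: 'ourPrice' is a string compared against the id attribute.
print(tree.xpath("//span[@id='ourPrice']/text()"))  # ['$50']
```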
