The page is rendered dynamically by JavaScript after it loads, so the links you are after do not exist in the HTML until the page has been rendered. The requests module does not execute JavaScript, so it only ever sees the unrendered page.
You can use selenium to render the page first. Alternatively, you can use HTMLSession from the requests_html module to render it on the fly.
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from bs4 import BeautifulSoup
import re
from time import sleep
# Run Firefox without opening a browser window.
options = Options()
options.add_argument('--headless')
driver = webdriver.Firefox(options=options)
driver.get("https://www.sec.gov/ix?doc=/Archives/edgar/data/1090727/000109072720000003/form8-kq42019earningsr.htm")
# Give the page's JavaScript a moment to render the links.
sleep(1)
soup = BeautifulSoup(driver.page_source, 'html.parser')
# Keep only the anchors whose inline style starts with "text".
for item in soup.find_all("a", style=re.compile("^text")):
    print(item.get("href"))
driver.quit()
Output:
https://www.sec.gov/Archives/edgar/data/1090727/000109072720000003/exhibit991-q42019earni.htm
https://www.sec.gov/Archives/edgar/data/1090727/000109072720000003/exhibit992-q42019finan.htm
If you only want the first URL:
url = soup.find("a", style=re.compile("^text")).get("href")
print(url)
Output:
https://www.sec.gov/Archives/edgar/data/1090727/000109072720000003/exhibit991-q42019earni.htm
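The HTMLSession route mentioned above could look roughly like this. It is a minimal sketch and has not been run against this page: it assumes the requests_html package is installed (its first render() call downloads a Chromium build), and the helper names pick_links and fetch_rendered_links are made up for illustration. The style filter mirrors the one used in the selenium code.

```python
import re

# Same filter as the selenium example: inline style starting with "text".
STYLE_PATTERN = re.compile(r"^text")


def pick_links(anchors):
    """Return the href of every anchor whose style matches STYLE_PATTERN.

    `anchors` is an iterable of attribute dicts, one per <a> element.
    (Hypothetical helper, not part of requests_html.)
    """
    return [
        attrs["href"]
        for attrs in anchors
        if "href" in attrs and STYLE_PATTERN.match(attrs.get("style", ""))
    ]


def fetch_rendered_links(url):
    """Render `url` with requests_html and return the matching hrefs.

    (Hypothetical helper; needs network access and the requests_html package.)
    """
    # Imported lazily so pick_links can be used without requests_html installed.
    from requests_html import HTMLSession

    session = HTMLSession()
    try:
        response = session.get(url)
        # Execute the page's JavaScript, then pause briefly before reading the DOM.
        response.html.render(sleep=1)
        return pick_links(element.attrs for element in response.html.find("a"))
    finally:
        session.close()


# Usage (performs a real request):
# for href in fetch_rendered_links("https://www.sec.gov/ix?doc=/Archives/edgar/data/1090727/000109072720000003/form8-kq42019earningsr.htm"):
#     print(href)
```

Keeping the filter in its own function means it can be checked on plain dictionaries without starting a browser. The hrefs come back exactly as written in the page, so they may be relative rather than absolute.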