I've created a script to scrape names from this website after setting State Province to Alabama and Country to United States in the search box. The script can parse the names from the first page, but I can't figure out how to get the results from the subsequent pages as well using requests.
There are two options there to get all the names. Option one: using the show all 410 link. Option two: making use of the next button.
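As far as I can tell, both of those paging controls on this WebForms/Telerik grid are rendered as `__doPostBack('<target>', '')` links, so I assume the first step with requests is scraping that postback target out of the results page. A rough sketch of what I mean (the control name in the example HTML is made up, not the real id):

```python
import re
from bs4 import BeautifulSoup

def find_postback_target(html, link_text):
    """Find an <a> whose text contains link_text and return the
    __doPostBack target from its href/onclick, or None."""
    soup = BeautifulSoup(html, "html.parser")
    for a in soup.find_all("a"):
        if link_text.lower() in a.get_text().lower():
            # The target can live in either href or onclick
            attrs = a.get("href", "") + a.get("onclick", "")
            m = re.search(r"__doPostBack\('([^']+)'", attrs)
            if m:
                return m.group(1)
    return None
```
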
I've tried with the following (capable of grabbing names from the first page):
import re
import requests
from bs4 import BeautifulSoup

URL = "https://cci-online.org/CCI/Verify/CCI/Credential_Verification.aspx"

params = {
    'errorpath': '/CCI/Verify/CCI/Credential_Verification.aspx'
}

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36'
    r = s.get(URL)
    params['WebsiteKey'] = re.search(r"gWebsiteKey[^']+'(.*?)'", r.text).group(1)
    params['hkey'] = re.search(r"gHKey[^']+'(.*?)'", r.text).group(1)
    soup = BeautifulSoup(r.text, "lxml")
    payload = {i['name']: i.get('value', '') for i in soup.select('input[name]')}
    payload['ctl01$TemplateBody$WebPartManager1$gwpciPeopleSearch$ciPeopleSearch$ResultsGrid$Sheet0$Input4$DropDown1'] = 'AL'
    payload['ctl01$TemplateBody$WebPartManager1$gwpciPeopleSearch$ciPeopleSearch$ResultsGrid$Sheet0$Input5$DropDown1'] = 'United States'
    r = s.post(URL, params=params, data=payload)
    soup = BeautifulSoup(r.text, "lxml")
    for item in soup.select("table.rgMasterTable > tbody > tr a[title]"):
        print(item.text)
In case someone comes up with a solution based on selenium: I've already had success with that approach, but I'm not willing to go that route:
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

link = "https://cci-online.org/CCI/Verify/CCI/Credential_Verification.aspx"

with webdriver.Chrome() as driver:
    driver.get(link)
    wait = WebDriverWait(driver, 15)
    Select(wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "select[id$='Input4_DropDown1']")))).select_by_value("AL")
    Select(wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "select[id$='Input5_DropDown1']")))).select_by_value("United States")
    wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "input[id$='SubmitButton']"))).click()
    wait.until(EC.visibility_of_element_located((By.XPATH, "//a[contains(.,'show all')]"))).click()
    wait.until(EC.invisibility_of_element_located((By.XPATH, "//span[@id='ctl01_LoadingLabel' and .='Loading']")))
    soup = BeautifulSoup(driver.page_source, "lxml")
    for item in soup.select("table.rgMasterTable > tbody > tr a[title]"):
        print(item.text)
How can I fetch the rest of the names from the subsequent pages of that webpage using the requests module?
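To be concrete about what I suspect the follow-up request needs: since the grid pages via a WebForms postback, I imagine the next POST has to resend all the hidden fields from the current response with `__EVENTTARGET` set to the paging control. Something along these lines (the event-target name below is a placeholder, not the real control id, and dropping the submit-button field is my guess at how to keep the server from re-running the search):

```python
from bs4 import BeautifulSoup

def build_postback_payload(html, event_target, event_argument=""):
    """Collect every <input name=...> from the current page (this picks up
    __VIEWSTATE, __EVENTVALIDATION, etc.) and set __EVENTTARGET so the
    server treats the POST as a click on the paging control."""
    soup = BeautifulSoup(html, "html.parser")
    payload = {i["name"]: i.get("value", "") for i in soup.select("input[name]")}
    payload["__EVENTTARGET"] = event_target
    payload["__EVENTARGUMENT"] = event_argument
    # Guess: remove submit-button fields so this reads as a paging
    # postback rather than a fresh search submission
    return {k: v for k, v in payload.items() if not k.endswith("SubmitButton")}
```
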
question from:
https://stackoverflow.com/questions/65642333/unable-to-fetch-the-rest-of-the-names-leading-to-the-next-pages-from-a-webpage-u