Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - How to open multiple hrefs within a webtable to scrape through selenium

I'm trying to scrape this website using Python and Selenium. However, not all of the information I need is on the main page. How can I click the links in the 'Application number' column one by one, go to each page, scrape the information, and then return to the original page?

I've tried:

def getData():
  data = []
  select = Select(driver.find_elements_by_xpath('//*[@id="node-41"]/div/div/div/div/div/div[1]/table/tbody/tr/td/a/@href'))
  list_options = select.options
  for item in range(len(list_options)):
    item.click()
  driver.get(url)

URL: http://www.scilly.gov.uk/planning-development/planning-applications


He who fights dragons for too long becomes a dragon himself; gaze into the abyss for too long, and the abyss gazes back…
1 Reply

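To open multiple hrefs within a web table and scrape each page with Selenium, you can collect the href values first, then open each one in a new tab, scrape it, close the tab, and switch back to the parent window: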

  • Code Block:

      from selenium import webdriver
      from selenium.webdriver.chrome.options import Options
      from selenium.webdriver.support.ui import WebDriverWait
      from selenium.webdriver.common.by import By
      from selenium.webdriver.support import expected_conditions as EC
    
      hrefs = []
      options = Options()
      options.add_argument("start-maximized")
      options.add_argument("disable-infobars")
      options.add_argument("--disable-extensions")
      options.add_argument("--disable-gpu")
      options.add_argument("--no-sandbox")
      # Note: newer Selenium 4 releases take a Service object instead of executable_path
      driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\ChromeDriver\chromedriver_win32\chromedriver.exe')
      driver.get('http://www.scilly.gov.uk/planning-development/planning-applications')
      windows_before = driver.current_window_handle # store the parent window handle for later use
      # Wait until the links in the 'Application number' column are visible
      elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "td.views-field.views-field-title>a")))
      for element in elements:
          hrefs.append(element.get_attribute("href")) # collect the href attributes in a list
      for href in hrefs:
          driver.execute_script("window.open(arguments[0]);", href) # open each href in a new tab via execute_script
          WebDriverWait(driver, 10).until(EC.number_of_windows_to_be(2)) # wait until the second window exists
          windows_after = driver.window_handles
          new_window = [x for x in windows_after if x != windows_before][0] # identify the newly opened window
          # driver.switch_to_window(new_window) is deprecated; use switch_to.window instead
          driver.switch_to.window(new_window) # switch to the new window
          # perform your web scraping here
          print(driver.title) # print the page title, or perform your web scraping
          driver.close() # close the new window
          # driver.switch_to_window(windows_before) is deprecated; use switch_to.window instead
          driver.switch_to.window(windows_before) # switch back to the parent window
      driver.quit() # end the browser session
    
  • Console Output:

      Planning application: P/18/064 | Council of the ISLES OF SCILLY
      Planning application: P/18/063 | Council of the ISLES OF SCILLY
      Planning application: P/18/062 | Council of the ISLES OF SCILLY
      Planning application: P/18/061 | Council of the ISLES OF SCILLY
      Planning application: p/18/059 | Council of the ISLES OF SCILLY
      Planning application: P/18/058 | Council of the ISLES OF SCILLY
      Planning application: P/18/057 | Council of the ISLES OF SCILLY
      Planning application: P/18/056 | Council of the ISLES OF SCILLY
      Planning application: P/18/055 | Council of the ISLES OF SCILLY
      Planning application: P/18/054 | Council of the ISLES OF SCILLY
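
If the detail pages do not need to be opened in separate tabs, a simpler variant is to load each collected href in the current tab with driver.get. The following is only a minimal sketch of that alternative, not the method used above: scrape_detail_pages and page_title are hypothetical helper names, and the sketch assumes each href can be loaded directly without a click on the listing page. It relies only on driver.get and driver.title, so it needs no imports of its own.

```python
def scrape_detail_pages(driver, hrefs, scrape_page):
    """Visit each href in the current tab and collect what scrape_page returns.

    Hypothetical helper: `driver` is any object exposing WebDriver's get();
    `scrape_page` is a caller-supplied function that reads data from the
    page currently loaded in `driver`.
    """
    results = []
    for href in hrefs:
        driver.get(href)  # navigate the current tab straight to the detail page
        results.append(scrape_page(driver))  # run the caller's extraction logic
    return results


def page_title(driver):
    """Example extractor (hypothetical): return the title of the loaded page."""
    return driver.title
```

With the driver and the hrefs list built as in the code block above, scrape_detail_pages(driver, hrefs, page_title) would return the page titles as a list instead of printing them. Because no new tabs are opened, there are no window handles to track; the trade-off is that the listing page has to be reloaded with driver.get if it is needed again.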
    
