I wrote this code and it works fine for scraping H1 tags from a list of websites. Some of the websites don't have an H1, so an empty list is returned for them, which raises IndexError: list index out of range and stops the script.
    list_flagged = df['Websites'].to_list()
    new_flagged_list = []
    for site in list_flagged:
        quote_page = requests.get(site, headers=random_header)
        soup = BeautifulSoup(quote_page.text, 'html.parser')
        h1tag = soup.find_all('h1')
        titles = [(h1.get_text()).strip() for h1 in h1tag]
        appended = new_flagged_list.append(titles)
        print('appended')
    if new_flagged_list == ['']:
        ['x']
    new = [x[0] for x in new_flagged_list]
I tried using if new_flagged_list == ['']: to replace an empty entry, but the error still appears. I also don't understand why
    new = [x[0] for x in new_flagged_list]
raises an IndexError when it reaches an empty list inside the list. Why can't it just keep the empty list?
How can I replace each empty list inside the list with some placeholder string to avoid the error?
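To make the goal concrete, here is a minimal sketch that uses made-up data instead of real requests. The sample contents of new_flagged_list and the placeholder 'x' are assumptions for illustration only. It shows that an empty inner list has no element at index 0, which is where the IndexError comes from, and one possible way to substitute a placeholder inside the comprehension:

```python
# Hypothetical stand-in for the scraped results: one inner list per site.
new_flagged_list = [["First title"], [], ["Second title", "Another h1"]]

# An empty list has no element at index 0, so x[0] raises IndexError for it.
try:
    [x[0] for x in new_flagged_list]
    raised = False
except IndexError:
    raised = True

# Take the first title when the inner list is non-empty, else a placeholder.
PLACEHOLDER = "x"  # assumed placeholder value
new = [x[0] if x else PLACEHOLDER for x in new_flagged_list]
print(new)
```

The check relies on an empty list being falsy in Python, so no comparison against [''] is needed.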
Thanks!