Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

How to return the date and title from the link below using BeautifulSoup in Python

I want to extract data from the link below using the BeautifulSoup package in Python. I am trying to collect all the job links on the first page and then fetch the related data for each link,

for example: publish_date and title.

But the script crashes and displays the error below:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-95-0fd35627bc48> in <module>
     52     s = BeautifulSoup(requests.get(link).content, "lxml")
     53 
---> 54     date_published = s.find("span", class_="t-mute").getText(strip=True)
     55     title = s.find("h1", class_="h3 t-break").getText(strip=True)
     56     print(f"{date_published} {title}\n\n", "-" * 80)

AttributeError: 'NoneType' object has no attribute 'getText'

==================================

import time
import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    requests.get("https://www.bayt.com/en/international/jobs/executive-chef-jobs/").content,
    "lxml"
)


links = []
for a in soup.select("h2.m0.t-regular a"):
    if a['href'] not in links:
        links.append("https://www.bayt.com"+ a['href'])


for link in links:
    
    s = BeautifulSoup(requests.get(link).content, "lxml")

    date_published = s.find("span", class_="t-mute").getText(strip=True)
    title = s.find("h1", class_="h3 t-break").getText(strip=True)
    print(f"{date_published} {title}\n\n", "-" * 80)
Question from: https://stackoverflow.com/questions/65902564/how-to-return-date-and-title-from-the-below-link-using-beautifulsoup-in-python


1 Reply


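You are searching for the wrong element. On the job pages, s.find("span", class_="t-mute") returns None, so calling .getText() on it raises the AttributeError. The date sits in the last li element with class t-mute, inside a span with class u-none: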


import requests
from bs4 import BeautifulSoup

# Fetch the first page of search results.
soup = BeautifulSoup(
    requests.get("https://www.bayt.com/en/international/jobs/executive-chef-jobs/").content,
    "lxml"
)

# Collect the absolute URL of each job posting, skipping duplicates.
# The check is made on the full URL, since that is what the list stores.
links = []
for a in soup.select("h2.m0.t-regular a"):
    url = "https://www.bayt.com" + a["href"]
    if url not in links:
        links.append(url)

for link in links:
    s = BeautifulSoup(requests.get(link).content, "lxml")

    # The date is in the last <li class="t-mute">, inside <span class="u-none">.
    date_info = s.find_all("li", class_="t-mute")[-1]
    date_published = date_info.find("span", class_="u-none").getText(strip=True)

    title = s.find("h1", class_="h3 t-break").getText(strip=True)
    print(f"{date_published} {title}\n\n", "-" * 80)
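
The same AttributeError will come back if any job page is missing one of these elements, because find() returns None and find_all() returns an empty list when nothing matches. One way to guard against that is sketched below. The helper names text_or_none and extract_date_and_title, and the SAMPLE_HTML markup, are made up for illustration; the markup is only modelled on the selectors used above and may not match the real pages. It uses the built-in "html.parser" so it runs without lxml or network access.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, modelled only on the selectors used in the answer;
# the real page structure may differ.
SAMPLE_HTML = """
<h1 class="h3 t-break">Executive Chef</h1>
<ul>
  <li class="t-mute">Dubai</li>
  <li class="t-mute">Date Posted: <span class="u-none">2021-01-26</span></li>
</ul>
"""


def text_or_none(tag):
    """Return the stripped text of a tag, or None if the tag was not found."""
    return tag.get_text(strip=True) if tag is not None else None


def extract_date_and_title(html):
    """Return (date_published, title); either value is None if its element is missing."""
    s = BeautifulSoup(html, "html.parser")

    # find_all returns an empty list when nothing matches, so check before indexing.
    date_items = s.find_all("li", class_="t-mute")
    date_span = date_items[-1].find("span", class_="u-none") if date_items else None

    title_tag = s.find("h1", class_="h3 t-break")
    return text_or_none(date_span), text_or_none(title_tag)


if __name__ == "__main__":
    print(extract_date_and_title(SAMPLE_HTML))
    print(extract_date_and_title("<p>nothing here</p>"))
```

In the loop above you could call extract_date_and_title(requests.get(link).content) and skip or log any link where a value comes back as None, instead of letting the whole run crash.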
