python - Webscraping using Beautifulsoup 4 - extracting contact info

Question

posted Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)

This is my first post, so please forgive me if I break any rules. I'm trying to scrape vendor information using code that looks like this:

  soup.find_all('span', class_ = "class-name")

Please refer to the attached image. I want to get the contact number, but it is not present as text or anything similar. Each digit seems to sit in its own span with its own class, and even inside that span the digit isn't text. I'm also not familiar with web development, so I would really appreciate any suggestions.

url : https://www.justdial.com/Pune/Sunrise-Enterprises-Budhwar-Peth/020PXX20-XX20-130817131104-Z3I2_BZDET?xid=UHVuZSBFbGVjdHJvbmljIENvbXBvbmVudCBEZWFsZXJz

Another similar page, with multiple contact details: https://www.justdial.com/Pune/Galaxy-Enterprises-And-Electronics-Behind-Bharti-Vidyapeeth-Near-Ichapurti-Mandir-Ambegaon-Budruk/020PXX20-XX20-140930130951-K4X6_BZDET?xid=UHVuZSBFbGVjdHJvbmljIENvbXBvbmVudCBEZWFsZXJz

Thanks

Question from: https://stackoverflow.com/questions/65892112/webscraping-using-beautifulsoup-4-extracting-contact-info

1 Reply

深蓝 · Answer 1 · 2021-10-06T19:17:30+0000

The second style tag contains CSS in which the order of the icon-xx rules determines which character each class stands for. The page uses these classes to display an image of each character, so the digits never appear as text in the HTML and there are no numbers to scrape directly. The solution is to 1) map the icon-xx classes to characters based on their order in the CSS string; 2) find the phone number spans in the HTML body and look up the character for each inner span's class:

import requests
from bs4 import BeautifulSoup

url = 'https://www.justdial.com/Pune/Sunrise-Enterprises-Budhwar-Peth/020PXX20-XX20-130817131104-Z3I2_BZDET?xid=UHVuZSBFbGVjdHJvbmljIENvbXBvbmVudCBEZWFsZXJz'
# Send a browser-like User-Agent with the request.
r = requests.get(url, headers={'User-Agent': "Mozilla/5.0 (Windows NT 6.1; Win64; x64)"})
soup = BeautifulSoup(r.text, "html.parser")

# Step 1: the second non-empty <style> tag holds the icon-xx rules.
style = soup.find_all('style', {"type": "text/css"}, string=True)[1]
# Keep only the text between 'smoothing:grayscale}' and the next newline.
data = style.contents[0].split('smoothing:grayscale}', 1)[1].split('\n')[0]
# Each rule starts with '.icon-xx:', so collect the class names in order.
icon_items = [i.split(':')[0] for i in data.split('.') if len(i) > 0]
items = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '+', '-', ')', '(']
full_list = dict(zip(icon_items, items))

# Step 2: decode every phone number span.
phone_numbers = soup.find_all('span', {'class': 'telnowpr'})
for i in phone_numbers:
    numbers = i.find_all('span')
    # The second class of each inner span is its icon-xx name.
    number = [full_list[y.attrs['class'][1]] for y in numbers]
    print("phone number: " + ''.join(number))

Result:

phone number: 07947197693
phone number: 07947197693
phone number: 07947197693
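
The chain of split calls above depends on the exact layout of the style tag, so it may break if the site changes its CSS. As a rough alternative for step 1, the class names could be pulled out with a regular expression instead. The sketch below runs offline on a made-up CSS string: the class names in SAMPLE_NAMES, the content codes, and the helpers build_icon_map and decode are all hypothetical, and it assumes the rules look like .icon-xx:before{content:"..."} and appear in the same order as the symbol list in the answer.

```python
import re

# Symbols in the order used by the answer above.
SYMBOLS = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '+', '-', ')', '(']

# Hypothetical class names; the real ones on the site will differ.
SAMPLE_NAMES = ["icon-dc", "icon-fe", "icon-hg", "icon-ba", "icon-ji",
                "icon-lk", "icon-nm", "icon-po", "icon-rq", "icon-ts",
                "icon-vu", "icon-wx", "icon-yz", "icon-acb"]

# Hypothetical CSS string with one rule per symbol (assumed rule format).
SAMPLE_CSS = (
    ".mobilesv{font-family:icomoon;-moz-osx-font-smoothing:grayscale}"
    + "".join('.%s:before{content:"\\9d0%02d"}' % (name, idx)
              for idx, name in enumerate(SAMPLE_NAMES))
)


def build_icon_map(css, symbols=SYMBOLS):
    """Map each icon-xx class to a symbol by its order of appearance."""
    names = re.findall(r"\.(icon-[\w-]+)\s*:", css)
    # dict.fromkeys drops repeats while keeping first-seen order.
    ordered = list(dict.fromkeys(names))
    return dict(zip(ordered, symbols))


def decode(class_names, icon_map):
    """Turn a sequence of icon-xx class names into the string they spell."""
    return "".join(icon_map[name] for name in class_names)


icon_map = build_icon_map(SAMPLE_CSS)
print(decode(["icon-dc", "icon-po", "icon-ts"], icon_map))  # 079
```

With the real page, css would be the text of the style tag, and the list passed to decode would be [y.attrs['class'][1] for y in numbers] from the loop in the answer.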
