Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


beautifulsoup - Price scraping from array of urls with sendMail function

I am pretty new to scraping, but the main idea is simple. I want to keep an array of URLs for the products I am interested in on one website.

If I want to monitor a new product, I will just add its URL to the array.

Here is the problem: when I scrape the price, I always get the current price, so how can I tell whether it is now cheaper or more expensive than the last price?

Here is my test solution, for a single item for the moment:

import requests
from bs4 import BeautifulSoup
import smtplib

# Get a list of URLs instead of just one
# Loop through all URLs in the array and get the name and price
# Store the last price when inserting a new item for monitoring
# Assign a price to every URL to know what the current price is


url = 'https://www.example.com'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36'}
def check_price():
    page = requests.get(url, headers=headers)
    soup = BeautifulSoup(page.content, 'html.parser')
    title = soup.find(class_='base').get_text().strip()
    price = soup.find(class_='price').get_text().strip()
    replace_price = price.replace(",",".")
    converted_price = float(replace_price[0:4])
    print(converted_price)
    if(converted_price < 80):
        send_mail()
    

def send_mail():
    server = smtplib.SMTP('smtp.gmail.com', 587)
    server.ehlo()
    server.starttls()
    server.ehlo()

    server.login('example@gmail.com', 'example123')

    subject = 'Price change scraper'
    body = 'The price of the following item has just been changed: https://www.example.com/example.html'
    old_price = 'Old price is: 74,88  with VAT'
    msg = f"Subject: {subject}\n\n{body}\n\n{old_price}"
    server.sendmail(
        'example@gmail.com',
        'example@gmail.com',
        msg
    )
    print('Email has been sent successfully!') 
    server.quit() 


check_price()
question from:https://stackoverflow.com/questions/65845414/price-scraping-from-array-of-urls-with-sendmail-function


1 Reply


Put each URL to scrape, together with its initial price, into a list of dicts. Loop over the list as you normally would and compare the scraped price against the initial price.

Define your data to scrape:

initData = [
    {'url':'https://www.example.com', 'price': 100},
    {'url':'https://www.example1.com', 'price': 200},
    {'url':'https://www.example2.com', 'price': 300},
    {'url':'https://www.example3.com', 'price': 400},
    {'url':'https://www.example4.com', 'price': 500}
] 

Loop over the data:

for item in initData:
    check_price(item)

Scrape, compare, and send the mail if the price has dropped:

def check_price(data):

    url = data['url']
    initPrice = data['price']
    ...
    if(converted_price < initPrice):
        send_mail()
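
One thing to watch in the comparison step: slicing the text with replace_price[0:4] keeps only the first four characters, so '74,88' would become 74.8 and '129,90' would become 129.0. A minimal sketch of a more tolerant conversion is shown below. The helper name parse_price is hypothetical (it is not part of the original code), and it assumes prices use at most one separator, a decimal comma or point, with no thousands separator.

```python
import re
from typing import Optional


def parse_price(text: str) -> Optional[float]:
    """Return the first decimal number found in text, or None if there is none.

    Assumption: no thousands separators, so '1.234,56' is NOT handled correctly.
    """
    # Digits with an optional fractional part separated by ',' or '.'
    match = re.search(r"\d+(?:[.,]\d+)?", text)
    if match is None:
        return None
    # Normalise a decimal comma to a decimal point before converting
    return float(match.group().replace(",", "."))


print(parse_price("74,88  with VAT"))   # 74.88
print(parse_price("129,90 EUR"))        # 129.9
print(parse_price("out of stock"))      # None
```

Inside check_price the result could replace the slice, skipping the item when None comes back because no number was found on the page.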

Example

import requests
from bs4 import BeautifulSoup
import smtplib

# Get a list of URLs instead of just one
# Loop through all URLs in the array and get the name and price
# Store the last price when inserting a new item for monitoring
# Assign a price to every URL to know what the current price is


initData = [
    {'url':'https://www.example.com', 'price': 100},
    {'url':'https://www.example1.com', 'price': 200},
    {'url':'https://www.example2.com', 'price': 300},
    {'url':'https://www.example3.com', 'price': 400},
    {'url':'https://www.example4.com', 'price': 500}
]


# Request headers shared by every request; each URL now comes from initData
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36'}


def check_price(data):
    
    url = data['url']
    initPrice = data['price']
    
    page = requests.get(url, headers=headers)
    soup = BeautifulSoup(page.content, 'html.parser')
    title = soup.find(class_='base').get_text().strip()
    price = soup.find(class_='price').get_text().strip()
    replace_price = price.replace(",",".")
    converted_price = float(replace_price[0:4])
    print(converted_price)
    if(converted_price < initPrice):
        data['convertedPrice'] = converted_price
        send_mail(data)       

def send_mail(data):
    server = smtplib.SMTP('smtp.gmail.com', 587)
    server.ehlo()
    server.starttls()
    server.ehlo()

    server.login('example@gmail.com', 'example123')

    subject = 'Price change scraper'
    # Build the message from the item that triggered the alert
    body = f"The price of the following item has just been changed: {data['url']}"
    old_price = f"Old price is: {data['price']}, new price is: {data['convertedPrice']}"
    msg = f"Subject: {subject}\n\n{body}\n\n{old_price}"
    server.sendmail(
        'example@gmail.com',
        'example@gmail.com',
        msg
    )
    print('Email has been sent successfully!') 
    server.quit() 

for item in initData:
    check_price(item)
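
The example above compares against a fixed initial price. If the goal is to compare against the last scraped price instead, as the question asks, one option is to keep the last seen price per URL in a small JSON file between runs. The following is only a sketch under that assumption: the function names and the file name last_prices.json are hypothetical and not part of the original code.

```python
import json
from pathlib import Path


def load_prices(path):
    """Return the saved {url: price} mapping, or an empty dict on the first run."""
    path = Path(path)
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def save_prices(path, prices):
    """Write the {url: price} mapping back to disk as JSON."""
    Path(path).write_text(json.dumps(prices, indent=2), encoding="utf-8")


def compare_and_update(prices, url, current_price):
    """Store current_price for url and report how it moved.

    Returns 'new' (url not seen before), 'cheaper', 'higher', or 'same'.
    """
    last_price = prices.get(url)
    prices[url] = current_price
    if last_price is None:
        return "new"
    if current_price < last_price:
        return "cheaper"
    if current_price > last_price:
        return "higher"
    return "same"
```

A run would then call load_prices('last_prices.json') once, call compare_and_update(prices, url, converted_price) for each scraped item, send the mail only when the result is 'cheaper', and finish with save_prices('last_prices.json', prices). Adding a new product to monitor still only means adding its URL to the list, since an unseen URL is reported as 'new' and stored.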
