Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Webscraping with a loop returns only a single element

When I run a for loop to collect elements inside a <div> tag, it returns only the first one, even though the page contains a whole list of elements with the same class.

For example:

import requests
from bs4 import BeautifulSoup

r = requests.get("https://one-versus-one.com/en/rankings/all/statistics")

soup = BeautifulSoup(r.content, 'lxml')

data = {
    'players': [],
    'club': [],
    'rank': []
}
def getstuff(soup):
    products = soup.find_all('div', {'class':'rankings-table'})
    for name in products:
        players = name.find('div', {'class':'player-name rankings-table__player-name'}).text
        club = name.find('span', {'class':'rankings-table__club-name'}).text
        rank = name.find('div', {'class':'rankings-table-cell value rankings-table__value'}).text.strip()
        data['players'] = players
        data['club'] = club
        data['rank'] = rank
        print(data)

getstuff(soup)

This returns:

{'players': 'Lionel Messi', 'club': 'Barcelona', 'rank': '100'}

I expected every player, club and rank on the page to be printed.


1 Reply


You can try this:

import requests
from bs4 import BeautifulSoup

r = requests.get("https://one-versus-one.com/en/rankings/all/statistics")
soup = BeautifulSoup(r.content, 'lxml')

data = {'players': [],'club': [],'rank': []}

def getstuff(soup):
    products = soup.find('div', {'class':'rankings-table'}).find_all("a")
    for name in products:
        players = name.find('div', {'class':'player-name rankings-table__player-name'}).text
        club = name.find('span', {'class':'rankings-table__club-name'}).text
        rank = name.find('div', {'class':'rankings-table-cell value rankings-table__value'}).text.strip()
        data['players'].append(players)
        data['club'].append(club)
        data['rank'].append(rank)
    print(data)

getstuff(soup)
"""
{'players': ['Lionel Messi', 'Junior Neymar', 'Robert Lewandowski', 'Joao Cancelo', 'Kevin de Bruyne', 'Rodri', 'Jesse Lingard', 'Riyad Mahrez', 'Ilkay Gundogan', 'John Stones'], 'club': ['Barcelona', 'Paris Saint-Germain', 'Bayern Munich', 'Manchester City', 'Manchester City', 'Manchester City', 'West Ham United', 'Manchester City', 'Manchester City', 'Manchester City'], 'rank': ['100', '95', '93', '92', '91', '90', '90', '89', '88', '88']}
"""
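
The difference between calling find_all on the wrapper and calling find_all("a") inside it can be checked offline. The following is only a sketch: the HTML fragment and its class names (player-name, club) are hypothetical, simplified stand-ins for the real page, and the built-in html.parser is used instead of lxml so that no extra parser is required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a simplified stand-in for the rankings page, not its real HTML.
HTML = """
<div class="rankings-table">
  <a href="#1"><div class="player-name">Lionel Messi</div><span class="club">Barcelona</span></a>
  <a href="#2"><div class="player-name">Robert Lewandowski</div><span class="club">Bayern Munich</span></a>
  <a href="#3"><div class="player-name">Rodri</div><span class="club">Manchester City</span></a>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Only one element carries the wrapper class, so this list has a single item.
wrappers = soup.find_all("div", {"class": "rankings-table"})

# Searching inside the wrapper for the row links gives one item per player.
rows = soup.find("div", {"class": "rankings-table"}).find_all("a")

# .find() returns just the first match, so looping over the wrapper yields one name.
first_only = [w.find("div", {"class": "player-name"}).text for w in wrappers]
all_names = [row.find("div", {"class": "player-name"}).text for row in rows]

print(len(wrappers), first_only)
print(len(rows), all_names)
```

Looping over the wrapper runs once and .find() picks the first player in it; looping over the links runs once per player.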

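There are two problems in your code. First, soup.find_all('div', {'class':'rankings-table'}) matches the table wrapper itself, and judging by your output there is only one such element, so the loop runs once and .find() returns just the first player inside it. Select the table with .find() and then call .find_all("a") on it to get one element per player. Second, data['players'] = players replaces the list with a single string on every iteration instead of adding to it. Use data['players'].append(players), and do the same for club and rank.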
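
To see the assignment problem in isolation, here is a small sketch that needs no network access. The rows list is a hand-written sample taken from the output above, and the names overwritten and appended are only illustrative.

```python
# Sample rows copied from the output above (player, club, rank).
rows = [
    ("Lionel Messi", "Barcelona", "100"),
    ("Junior Neymar", "Paris Saint-Germain", "95"),
    ("Robert Lewandowski", "Bayern Munich", "93"),
]

# Assignment: each iteration replaces the previous value,
# so only the last row survives and the lists are gone.
overwritten = {'players': [], 'club': [], 'rank': []}
for player, club, rank in rows:
    overwritten['players'] = player
    overwritten['club'] = club
    overwritten['rank'] = rank

# Append: each iteration adds to the existing lists.
appended = {'players': [], 'club': [], 'rank': []}
for player, club, rank in rows:
    appended['players'].append(player)
    appended['club'].append(club)
    appended['rank'].append(rank)

print(overwritten)
print(appended)
```

With assignment the dictionary ends up holding plain strings for the last row only; with append it keeps one list entry per row.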

