As I am still quite new to web scraping I am currently practicing some basics such as this one. I have scraped the categories from 'th' tag and the players from the 'tr' tag and appended it to a couple empty lists. The categories come out fine from get_text(), but when I try printing the players it has a number rank before the first letter of the name, and the player's team abbreviation letters after the last name.
3 things I am trying to do:
1)output only the first and last name of each player by doing some slicing from the list but I cannot figure out any easier way to do it. There is probably a quicker way inside the tags where I can call the class or using soup.findAll again in the html, or something else I am unware of, but I currently do not know how or what I am missing.
2)take the number ranks before the name and append it to an empty list.
3)take the 3 last abbreviated letters and append it to an empty list
Any suggestions would be much appreciated!
from bs4 import BeautifulSoup as bs4
import requests
import pandas as pd
from time import sleep
players = []
categories = []
url ='https://www.espn.com/nba/stats/player/_/table/offensive/sort/avgPoints/dir/desc'
source = requests.get(url)
soup = bs4(source.text, 'lxml')
for i in soup.findAll('th'):
c = i.get_text()
categories.append(c)
for i in soup.findAll('tr'):
player = i.get_text()
players.append(player)
players = players[1:51]
print(categories)
print(players)
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…