Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

beautifulsoup - Python BS: Fetching rows with and without color attribute

I have some HTML that looks like this. It represents rows of data in a table, i.e., everything between <tr> and </tr> is one row:

<tr bgcolor="#f4f4f4">
<td height="25" nowrap="NOWRAP">&nbsp;CME_ES&nbsp;</td>
<td height="25" nowrap="NOWRAP">&nbsp;07:58:46&nbsp;</td>
<td height="25" nowrap="NOWRAP">&nbsp;Connected&nbsp;</td>
<td height="25" nowrap="NOWRAP">&nbsp;0&nbsp;</td>
<td height="25" nowrap="NOWRAP">&nbsp;0&nbsp;</td>
<td height="25" nowrap="NOWRAP">&nbsp;0&nbsp;</td>
<td height="25" nowrap="NOWRAP">&nbsp;0&nbsp;</td>
<td height="25" nowrap="NOWRAP">&nbsp;07:58:00&nbsp;</td>
**<td height="25" nowrap="NOWRAP" bgcolor="#55aa2a">&nbsp;--:--:--&nbsp;</td>**
<td height="25" nowrap="NOWRAP">&nbsp;0&nbsp;</td>
<td height="25" nowrap="NOWRAP">&nbsp;0&nbsp;</td>
<td height="25" nowrap="NOWRAP">&nbsp;01:25:00 &nbsp;</td>
<td height="25" nowrap="NOWRAP">&nbsp; 22:00:00&nbsp;</td>
</tr>
.
.
.
<tr bgcolor="#ffffff">
<td height="25" nowrap="NOWRAP">&nbsp;CME_NQ&nbsp;</td>
<td height="25" nowrap="NOWRAP">&nbsp;07:58:46&nbsp;</td>
<td height="25" nowrap="NOWRAP">&nbsp;Connected&nbsp;</td>
<td height="25" nowrap="NOWRAP">&nbsp;0&nbsp;</td>
<td height="25" nowrap="NOWRAP">&nbsp;0&nbsp;</td>
<td height="25" nowrap="NOWRAP">&nbsp;191&nbsp;</td>
<td height="25" nowrap="NOWRAP">&nbsp;0&nbsp;</td>
<td height="25" nowrap="NOWRAP">&nbsp;07:58:01&nbsp;</td>
**<td height="25" nowrap="NOWRAP">&nbsp;--:--:--&nbsp;</td>**
<td height="25" nowrap="NOWRAP">&nbsp;0&nbsp;</td>
<td height="25" nowrap="NOWRAP">&nbsp;0&nbsp;</td>
<td height="25" nowrap="NOWRAP">&nbsp;01:25:00 &nbsp;</td>
<td height="25" nowrap="NOWRAP">&nbsp; 22:00:00&nbsp;</td>
</tr>

I have code that grabs the color from each row set:

mrkt_stat = []
for td in site.findAll('td'):
    if 'bgcolor' in td.attrs:
        mrkt_stat.append(td.attrs['bgcolor'])

The issue is that when a row has no bgcolor attribute on any of its cells, nothing is added to the mrkt_stat list for that row.

How do I scrape this so that a row with no bgcolor attribute is still represented in the list, as NULL or 'N/A'?

It may help to know that the bgcolor attribute, when present, always appears on the 9th td of a row, and that the 9th td exists whether or not it carries the attribute (see the HTML lines enclosed in ** above).

EDIT: The output should look like the following: a list holding the bgcolor value from the 9th td of each row, with 'N/A' where that cell has no bgcolor attribute:

['#55aa2a',...,'N/A'] 

1 Reply


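You could add an else clause to your if statement. Note that this appends 'N/A' for every td that has no bgcolor, so the list gets one entry per cell rather than one per row: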

mrkt_stat = []

for td in site.findAll('td'):
    if 'bgcolor' in td.attrs:
        mrkt_stat.append(td.attrs['bgcolor'])
    else:
        mrkt_stat.append('N/A')
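
If the goal is one entry per row, as in the expected output in the question, a possible variation is to loop over the tr elements and look only at the 9th td of each. The following is a minimal sketch, not a tested drop-in for the real page: SAMPLE_HTML is a trimmed, made-up stand-in for the table in the question, status_colors is a hypothetical helper name, and skipping rows with fewer than 9 cells is an assumption about how header or short rows should be handled.

```python
from bs4 import BeautifulSoup

# Hypothetical sample mirroring the question's table, trimmed to 9 cells per row.
SAMPLE_HTML = """
<table>
<tr bgcolor="#f4f4f4">
  <td>CME_ES</td><td>07:58:46</td><td>Connected</td><td>0</td><td>0</td>
  <td>0</td><td>0</td><td>07:58:00</td><td bgcolor="#55aa2a">--:--:--</td>
</tr>
<tr bgcolor="#ffffff">
  <td>CME_NQ</td><td>07:58:46</td><td>Connected</td><td>0</td><td>0</td>
  <td>191</td><td>0</td><td>07:58:01</td><td>--:--:--</td>
</tr>
</table>
"""


def status_colors(site, cell_index=8, default="N/A"):
    """Return one value per row: the bgcolor of the cell at cell_index, or default."""
    colors = []
    for tr in site.find_all("tr"):
        cells = tr.find_all("td")
        if len(cells) <= cell_index:
            # Assumption: rows without a 9th cell (e.g. header rows) are skipped.
            continue
        # Tag.get returns the attribute value, or the default if it is missing.
        colors.append(cells[cell_index].get("bgcolor", default))
    return colors


site = BeautifulSoup(SAMPLE_HTML, "html.parser")
mrkt_stat = status_colors(site)
print(mrkt_stat)  # ['#55aa2a', 'N/A']
```

Reading the attribute from the td rather than the tr matters here, because every tr in the sample carries its own bgcolor for row striping.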
