I have a website that contains tables (tr and td elements), and I want to create a structured CSV file from the table data. I'm trying to build the field names from the scraped table itself, because those field names can change depending on the month or the selections made.
While I have been successful at iterating through the table and scraping the data I want to use as my field names, I have yet to figure out how to yield that data into the CSV file as column headers.
Right now I scrape them into an Item named "h1header", and when yielded to a CSV file they appear as rows under that item key, "h1header", like so:
Project Owning Org
Project Date Range
Fee Factor
Project Organization
Project Manager
Fee Calculation Method
Project Code
Project Lead
Status
Project Title
Total Project Value
Condition
External System Code
Funded Value
Billing Type
What I would ultimately like is the following:
Project Owning Org, Project Date Range, Fee Factor, Project Organization ...etc
That is, the names should become columns instead of rows, so that I can then populate those columns with the field values from each of the multiple tables on the page that share the same h1header format.
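A minimal sketch of that rows-to-columns pivot, using only the standard library. The `labels` and `values` lists are hypothetical stand-ins for what a spider would collect from one header block. The idea is to pair each label with its value in a single dict, so the labels end up as keys (columns) rather than as separate rows:

```python
import csv
import io

# Hypothetical flat lists, as a spider might collect them from one header block.
labels = ["Project Code", "Project Lead", "Status"]
values = ["PROJECT.001", "Doe, Jane", "Backlog"]

# Pair each label with its value so the labels become keys, not rows.
record = dict(zip(labels, values))

# Write the record as a single CSV row with the labels as the header line.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(record))
writer.writeheader()
writer.writerow(record)
csv_text = buffer.getvalue()
```

In a Scrapy spider, yielding one such dict per header block should have a similar effect: as far as I know, the CSV feed exporter takes its header from the keys of the first item when no explicit field list is configured.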
Below is an example of the HTML that I'm scraping. This particular tbody.h1 repeats multiple times on the page, depending on the results.
<table class="report">
<tbody class="h1"><tr><td colspan="22">
<table class="report" >
<tbody class="h1">
<tr>
<td class="label">Project Owning Organization:</td><td>1.02.10</td>
<td class="label">Project Date Range:</td><td>8/12/2020 - 8/11/2021</td>
<td class="label">Fee Factor:</td><td>—</td>
</tr>
<tr>
<td class="label">Project Organization:</td><td>1.2.26.1</td>
<td class="label">Project Manager:</td><td>Smith, John</td>
<td class="label">Fee Calculation Method:</td><td>—</td>
</tr>
<tr>
<td class="label">Project Code:</td><td>PROJECT.001</td>
<td class="label">Project Lead:</td><td>Doe, Jane</td>
<td class="label">Status:</td><td>Backlog</td>
</tr>
<tr>
<td class="label">Project Title:</td><td>Scrapy Project</td>
<td class="label">Total Project Value:</td><td>1,438.00</td>
<td class="label">Condition:</td><td>Green<img src="/images/status_green.png" alt="Green"
title="Green"></td>
</tr>
<tr>
<td class="label">External System Code:</td><td>—</td>
<td class="label">Funded Value:</td><td>1,438.00</td>
<td class="label">Billing Type:</td><td>FP</td>
</tr>
</tbody>
There are other tables within this HTML (tbody.h1 and tbody.detail) whose data I will then need to append as additional columns to the above.
I've done this in Java using Beautiful Soup by creating and writing to arrays, then ultimately exporting those built arrays as CSV files. Getting the data with Python and Scrapy is FAR easier than it was in Java, and I'm sure I'm overcomplicating this, but I am stuck trying to figure it out, so any guidance would be appreciated!
question from:
https://stackoverflow.com/questions/65927304/structuring-a-table-using-scrapy-data