Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

C# - How to get specific table data with Html Agility Pack

I am making a web scraper that pulls stock information and saves it to a database. My plan is to get only the company name and the prices (latest price, closing price, YCP (yesterday's closing price), etc.) and store them as objects.

URL: view-source:https://www.dsebd.org/latest_share_price_scroll_l.php (if needed, the table starts around line 5460 of the page source).

Here I need to skip the first tr (the header row) and then pull td[3] through td[7] from every remaining row.

    <div class="table-responsive inner-scroll">
        <table class='table table-bordered background-white shares-table fixedHeader'>
            <thead>
                <tr>
                    <th width="4%">#</th>
                    <th width="12%">TRADING CODE</th>
                    <th width="12%">LTP*</th>
                    <th width="12%">HIGH</th>
                    <th width="12%">LOW</th>
                    <th width="12%">CLOSEP*</th>
                    <th width="12%">YCP*</th>
                    <th width="12%">CHANGE</th>
                    <th width="12%">TRADE</th>
                    <th width="12%">VALUE (mn)</th>
                    <th width="12%">VOLUME</th>
                </tr>
            </thead>
            <tbody>
                <tr>
                    <td width="4%">1</td>
                    <td width="15%">
                        <a href="displayCompany.php?name=1JANATAMF" class='ab1'>
                            1JANATAMF
                        </a>
                    </td>
                    <td width="10%">6.3</td>
                    <td width="10%">6.7</td>
                    <td width="12%">6.3</td>
                    <td width="11%">6.5</td>
                    <td width="12%">6.6</td>
                    <td width="12%" style="color: red">-0.3</td>
                    <td width="11%">218</td>
                    <td width="11%">11.593</td>
                    <td width="11%">1,771,986</td>
                </tr>
            </tbody>
                <tr>
                    <td width="4%">2</td>
                    <td width="15%">
                        <a href="displayCompany.php?name=1STPRIMFMF" class='ab1'>
                            1STPRIMFMF
                        </a>
                    </td>
                    <td width="10%">20.2</td>
                    <td width="10%">21.9</td>
                    <td width="12%">20</td>
                    <td width="11%">20.2</td>
                    <td width="12%">21.3</td>
                    <td width="12%" style="color: red">-1.1</td>
                    <td width="11%">420</td>
                    <td width="11%">16.914</td>
                    <td width="11%">815,552</td>
                </tr>
            </tbody>
    ... More stocks
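The XPath side of this can be checked in isolation. Below is a minimal sketch that uses only System.Xml on a hand-trimmed, well-formed copy of the two sample rows. The live page is not valid XML (note the extra </tbody>), so treat this as an illustration of the expressions, not as a scraper. Html Agility Pack evaluates XPath through the .NET XPath engine, so the same expressions should carry over to its HtmlNode objects.

Two expressions do the work:

- .//tr[td] keeps only rows that contain <td> cells. This skips the header row, which uses <th>.
- td[position() >= 3 and position() <= 7] returns LTP through YCP.

The names tableXml, ReadRow and parsed are made up for this example.

```csharp
using System;
using System.Linq;
using System.Xml;

// Hand-trimmed, well-formed copy of the sample table: the header row plus the two data
// rows, cut down to the first seven columns (#, TRADING CODE, LTP .. YCP).
const string tableXml = @"<table class='shares-table'>
  <thead>
    <tr><th>#</th><th>TRADING CODE</th><th>LTP*</th><th>HIGH</th><th>LOW</th><th>CLOSEP*</th><th>YCP*</th></tr>
  </thead>
  <tbody>
    <tr><td>1</td><td><a href='displayCompany.php?name=1JANATAMF'>
      1JANATAMF
    </a></td><td>6.3</td><td>6.7</td><td>6.3</td><td>6.5</td><td>6.6</td></tr>
    <tr><td>2</td><td><a href='displayCompany.php?name=1STPRIMFMF'>1STPRIMFMF</a></td><td>20.2</td><td>21.9</td><td>20</td><td>20.2</td><td>21.3</td></tr>
  </tbody>
</table>";

var doc = new XmlDocument();
doc.LoadXml(tableXml);
XmlNode table = doc.DocumentElement!;

// ".//tr[td]": every <tr> inside this table that has at least one <td> child.
// The header row contains only <th> cells, so it drops out without any index juggling.
XmlNodeList rows = table.SelectNodes(".//tr[td]")!;

// For one row: the trading code (td[2]; XPath positions start at 1) and cells 3..7
// (LTP, HIGH, LOW, CLOSEP, YCP), each with surrounding whitespace trimmed.
(string Code, string[] Prices) ReadRow(XmlNode row)
{
    string code = row.SelectSingleNode("td[2]")?.InnerText.Trim() ?? "";
    string[] prices = row.SelectNodes("td[position() >= 3 and position() <= 7]")!
        .Cast<XmlNode>()
        .Select(td => td.InnerText.Trim())
        .ToArray();
    return (code, prices);
}

var parsed = rows.Cast<XmlNode>().Select(r => ReadRow(r)).ToList();
foreach (var (code, prices) in parsed)
{
    Console.WriteLine($"{code}: {string.Join(" ", prices)}");
}
```

Running it should print one line per data row, starting with 1JANATAMF: 6.3 6.7 6.3 6.5 6.6. The Trim() calls matter because the trading code sits between line breaks and indentation inside the <a> element.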

Here is my code.

    public Worker(ILogger<Worker> logger, IParseService parseService)
    {
        _logger = logger;
        _parseService = parseService;
        _url = "https://www.dsebd.org/latest_share_price_scroll_l.php";
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            // GetHtml: helper that loads the page into an HtmlDocument (not shown here)
            var HtmlDoc = GetHtml(_url);
            var mainNode = HtmlDoc.DocumentNode.SelectSingleNode("//div[@class='table-responsive inner-scroll']/table[contains(@class, 'table table-bordered background-white shares-table fixedHeader')]").ChildNodes;

            foreach (var nodes in mainNode)
            {
                //Code to get the info
            }
        }
    }

Thanks for reading my problem; any help is very much appreciated.

Question from: https://stackoverflow.com/questions/65833369/how-to-get-specific-table-data-by-html-agility-pack

He who fights with dragons too long becomes a dragon himself; gaze too long into the abyss, and the abyss will gaze back into you…

1 Reply

    // Needs: using System; using System.Linq; using System.Text.RegularExpressions; using HtmlAgilityPack;
    // tableNode must be the <table> element itself, e.g.
    // HtmlDoc.DocumentNode.SelectSingleNode("//table[contains(@class, 'shares-table')]"),
    // not its .ChildNodes collection (HtmlNodeCollection has no SelectNodes method).
    // ".//tr[td]" stays inside this table ("//tr" would search the whole document) and
    // skips the header row, whose cells are <th>. SelectNodes returns null when nothing matches.
    foreach (HtmlNode node in tableNode.SelectNodes(".//tr[td]") ?? Enumerable.Empty<HtmlNode>())
    {
        // Trimmed text of the n-th <td> (XPath positions start at 1), or "" if the cell is missing.
        string Cell(int n) => node.SelectSingleNode($"td[{n}]")?.InnerText.Trim() ?? "";

        var tradingCode = Cell(2);
        var latestPrice = Cell(3);    // LTP
        var highPrice = Cell(4);      // HIGH
        var lowPrice = Cell(5);       // LOW
        var closingPrice = Cell(6);   // CLOSEP
        var yesterdayPrice = Cell(7); // YCP
        var change = Cell(8);
        var trade = Cell(9);
        var value = Cell(10);         // VALUE (mn)
        var volume = Cell(11);

        // Optional sanity check on the trading code. Codes such as "1JANATAMF" start with a
        // digit, so digits are allowed; the pattern has no stray spaces or trailing "/".
        if (Regex.IsMatch(tradingCode, @"^[A-Za-z0-9]{3,}$"))
        {
            Console.WriteLine($"{tradingCode} {latestPrice} {highPrice} {lowPrice} {closingPrice} {yesterdayPrice} {change} {trade} {value} {volume}");
        }
    }
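The reply prints the cells as strings, but the question's plan is to store each row as an object. Below is a minimal sketch of that last step. It assumes the trimmed cell texts of one row are already in a string array in page order. The tuple fields, ToQuote and firstRow are hypothetical names, and a class or record would do the same job. The sample values come from the first row of the table in the question.

```csharp
using System;
using System.Globalization;

// Hypothetical row shape: trading code, the price columns the question asks for, and
// VOLUME (to show a value with "," group separators). A named tuple keeps the sketch to
// plain statements; a class or record would do the same job.
// cells: trimmed cell texts in page order (0 = "#", 1 = TRADING CODE, 2 = LTP, ...).
(string TradingCode, decimal Ltp, decimal High, decimal Low, decimal ClosePrice, decimal Ycp, long Volume)
    ToQuote(string[] cells)
{
    // NumberStyles.Number accepts a leading sign, a decimal point and "," group separators;
    // InvariantCulture fixes "." and "," as the separators regardless of the machine's locale.
    decimal Price(int i) => decimal.Parse(cells[i], NumberStyles.Number, CultureInfo.InvariantCulture);

    return (
        TradingCode: cells[1],
        Ltp: Price(2),
        High: Price(3),
        Low: Price(4),
        ClosePrice: Price(5),  // CLOSEP
        Ycp: Price(6),         // YCP
        Volume: long.Parse(cells[10], NumberStyles.Number, CultureInfo.InvariantCulture));
}

// Cell texts of the first sample row from the question, already trimmed.
string[] firstRow = { "1", "1JANATAMF", "6.3", "6.7", "6.3", "6.5", "6.6", "-0.3", "218", "11.593", "1,771,986" };

var quote = ToQuote(firstRow);
Console.WriteLine($"{quote.TradingCode}: LTP {quote.Ltp}, YCP {quote.Ycp}, volume {quote.Volume}");
```

The explicit invariant culture is a deliberate choice. On a server whose locale uses "," as the decimal separator, the default culture could misread or reject values like 6.3.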

...