Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

select - Parsing HTML: Reading Option Tag Content with HtmlAgilityPack

I am trying to use HtmlAgilityPack to parse HTML, but am having problems.

Sample HTML Doc:

<tr>
  <td class="css_lokalita" colspan="4">
    <select id="region" name="region">
      <option value="0"  selected>Všetky regiony</option>
      <optgroup>Banskobystricky kraj</optgroup>
      <option value="k_1"  style="color: #000000; font-weight:bold;">Banskobystricky kraj</option>
      <option value="1">&nbsp;&nbsp;&nbsp;Banská Bystrica</option>
          .
          .
          .
      <option value="174">&nbsp;&nbsp;&nbsp;CZ - ústecky kraj</option>
      <option value="175">&nbsp;&nbsp;&nbsp;CZ - Zlínsky kraj</option>     
    </select>
  </td>
</tr>

<tr>
  <td class="css_sfotkou"  colspan="4">
    <input type="checkbox" name="foto" value="1" id="foto" />
    <label for="foto">Iba používatelia s fotkou</label>
  </td>
</tr>

<tr>
  <td class="css_miestnost" colspan="4">
    <select name="akt-miest" id="onoffaci">
      <option value="a_0">Všetci</option>
          .
          .
          .
      <optgroup label="Záľuby a záujmy">
        <option value="m_1419307">&nbsp;&nbsp;&nbsp;Bez Lásky</option>
          .
          .
          .
        <option value="m_1108016">&nbsp;&nbsp;&nbsp;Drum N Bass</option>
      </optgroup>
    </select>
  </td>
</tr>

I need to parse the values from <select name="akt-miest" id="onoffaci">.

For example:

<option value="a_0">Všetci</option>

From this element I need to get the value a_0 and the text Všetci.

So I first access the select by its id:

var selectNode = htmlDoc.GetElementbyId("onoffaci");

Then I select all option nodes with XPath:

var nodes = selectNode.SelectNodes("//option");

And read the values:

foreach (var node in nodes)
{
    string roomName = node.NextSibling.InnerText;
    string roomId = node.Attributes["value"].Value;
    rooms.Add(new Room { RoomId = roomId, RoomName = roomName });
}

But I get the values from another select (<select id="region" name="region">), which is at the top of the HTML code.

EDITED:

I applied Darin Dimitrov's advice and tried this:

HtmlNode selectNode = htmlDoc.GetElementbyId("onoffaci");

var nodes = selectNode.SelectNodes("option");

foreach (var node in nodes)
{
    string roomName = node.NextSibling.InnerText;
    string roomId = node.Attributes["value"].Value;
    rooms.Add(new Room { RoomId = roomId, RoomName = roomName });
}

return rooms;

This parses only the first three option elements. I think the problem is that the select contains optgroup tags:

<select name="akt-miest" id="onoffaci">
  <option value="a_0">Všetci</option>
  <option value="a_1">Iba prihlásení</option>
  <option value="a_5" selected="selected">Teraz na Pokeci</option>
  <optgroup label="Hlavné miestnosti">
    <option value="m_13">&nbsp;&nbsp;&nbsp;Bez záväzkov</option>
    <option value="m_9">&nbsp;&nbsp;&nbsp;Do pohody</option>
    <option value="m_39">&nbsp;&nbsp;&nbsp;Dámsky klub</option>
  </optgroup>
  .
  .
  .

I tried to select all the following nodes with this:

var nodes = selectNode.SelectNodes("option::*");

But I get this error: xpath has an invalid token.

I would like to access all child nodes of selectNode:

HtmlNode selectNode = htmlDoc.GetElementbyId("onoffaci");

EDIT #2:

Here is the whole HTML file from which I need to parse the option tags:

http://hotfile.com/dl/98442053/577b556/source.html


1 Reply

By default, Html Agility Pack treats the <OPTION> tag as "Empty", which means it does not need a closing </OPTION>. In this case the closing tag is discarded, so the option text ends up in a sibling text node instead of inside the element. You can change this behavior using the HtmlNode.ElementsFlags collection.

Here is some code that should do what you want:

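// Requires: using System; using HtmlAgilityPack;
HtmlDocument doc = new HtmlDocument();
// Stop treating <option> as an empty element, so its text ends up in InnerText.
// This must happen before LoadHtml, because the flags are read while parsing.
HtmlNode.ElementsFlags.Remove("option");
doc.LoadHtml(yourHtml);

// Match every option under the target select, including those nested in optgroup.
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//select[@id='onoffaci']//option"))
{
    Console.WriteLine("Value=" + node.Attributes["value"].Value);
    Console.WriteLine("InnerText=" + node.InnerText);
    Console.WriteLine();
}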
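
To tie this back to the code in the question, here is a minimal sketch that fills a list of rooms starting from the node returned by GetElementbyId. It is an illustration under stated assumptions, not a tested drop-in: the Room class is not shown in the question, so the definition below is hypothetical, and RoomParser and ParseRooms are names made up for the example. The point it shows is that an XPath starting with "//" is evaluated from the document root even when called on selectNode, whereas ".//option" stays inside that node.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical model: mirrors the Room type used in the question,
// whose real definition is not shown there.
public class Room
{
    public string RoomId { get; set; }
    public string RoomName { get; set; }
}

// Hypothetical helper; the class and method names are illustrative only.
public static class RoomParser
{
    public static List<Room> ParseRooms(string html)
    {
        // Treat <option> as a normal element so its text becomes InnerText.
        // ElementsFlags is static, so this affects every later parse in the process.
        HtmlNode.ElementsFlags.Remove("option");

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var rooms = new List<Room>();

        HtmlNode selectNode = doc.GetElementbyId("onoffaci");
        if (selectNode == null)
        {
            return rooms; // no element with that id
        }

        // ".//option" is relative to selectNode and also reaches options
        // nested in <optgroup>. "//option" would search the whole document.
        HtmlNodeCollection options = selectNode.SelectNodes(".//option");
        if (options == null)
        {
            return rooms; // SelectNodes returns null when nothing matches
        }

        foreach (HtmlNode option in options)
        {
            rooms.Add(new Room
            {
                RoomId = option.GetAttributeValue("value", string.Empty),
                // Decode &nbsp; entities, then trim the resulting indentation.
                RoomName = HtmlEntity.DeEntitize(option.InnerText).Trim()
            });
        }

        return rooms;
    }
}
```

With the flag removed, NextSibling is no longer needed to reach the option text. As for the "invalid token" error, "option::*" fails because "option" is not an XPath axis name; an axis step would look like "descendant::option".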
