Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

Nokogiri: Searching for <div> using XPath

I use Nokogiri's (a Ruby gem) CSS search to look for certain <div> elements in my HTML. It looks like Nokogiri's CSS search doesn't support regular expressions. I would like to switch to Nokogiri's XPath search, as it seems to support regex in search strings.

How do I implement the (pseudo) CSS search below as an XPath search?

require 'rubygems'
require 'nokogiri'

value = Nokogiri::HTML.parse(<<-HTML_END)
  <html>
    <body>
      <p id='para-1'>A</p>
      <p id='para-22'>B</p>
      <h1>Bla</h1>
      <p id='para-3'>C</p>
      <p id='para-4'>D</p>
      <div class="foo" id="eq-1_bl-1">
        <p id='para-5'>3</p>
      </div>
    </body>
  </html>
HTML_END

# my_bl (the block number) is given
my_bl = "1"
# my_eq corresponds to this regex
my_eq = "/[0-9]+/"

# FIXME The following line should be changed to an xpath search.
if my_div = value.css("div#eq-#{my_eq}_bl-#{my_bl}.foo").first
  # doing some stuff with the <p> inside the div
end

1 Reply

Mike Dalessio (one half of the Nokogiri core development team) gave me an answer on #nokogiri (irc.freenode.net). It looks like neither Nokogiri's CSS search nor its XPath search supports regex matching. This is his solution for matching against a regular expression with Nokogiri:

require 'rubygems'
require 'nokogiri'

value = Nokogiri::HTML.parse(<<-HTML_END)
  <html>
    <body>
      <p id='para-1'>A</p>
      <p id='para-22'>B</p>
      <h1>Bla</h1>
      <p id='para-3'>C</p>
      <p id='para-4'>D</p>
      <div class="foo" id="eq-1_bl-1">
        <p id='para-5'>3</p>
      </div>
      <div class="bar" id="eq-1_bl-1">
        <p id='para-5'>3</p>
      </div>
    </body>
  </html>
HTML_END

# my_bl (the block number) is given
my_bl = "1"
# my_eq corresponds to this regex
my_eq = "[0-9]+"
# full regex to search for in node ids
full_regex = %r(eq-#{my_eq}_bl-#{my_bl})

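# Handler object for the custom :filter() pseudo-class. Nokogiri calls
# #filter for each div.foo candidate (the current node arrives wrapped in
# a NodeSet); nodes whose id matches the regex are collected in @matches.
filter_by_id = Class.new do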
  attr_accessor :matches

  def initialize(regex)
    @regex = regex
    @matches = []
  end

  def filter(node_set)
    @matches += node_set.find_all { |x| x['id'] =~ @regex }
  end
end.new(full_regex)

value.css("div.foo:filter()", filter_by_id)
filter_by_id.matches.each do |node|
  puts node
end
