Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


ruby - How can I create a case-insensitive XPath selector in Nokogiri?

I'm using Nokogiri to select the meta element whose name attribute is 'keywords', like this:

puts page.parser.xpath("//meta[@name='keywords']").to_html

One of the pages I'm working with spells the name with a capital "K", which has motivated me to make the query case insensitive. It should match both of these:

<meta name="keywords"> and <meta name="Keywords">

So, my question is: What is the best way to make a nokogiri selection case insensitive?

EDIT: Tomalak's suggestion below works great for this specific problem. I'd also like to use this example to understand Nokogiri better, though, and I have a couple of questions that I have not been able to answer by searching. For example, are the regex 'pseudo classes' described in the Nokogiri docs appropriate for a problem like this?

I'm also curious about the matches?() method in Nokogiri. I have not been able to find any documentation that clarifies it. Does it have anything to do with the 'matches' concept in XPath 2.0 (and could it therefore be used to solve this problem)?

Thanks very much.


1 Reply


Nokogiri allows custom XPath functions. The Nokogiri docs that you link to show an inline class definition, which is fine when you only use the function once. If you have a lot of custom functions, or if you use the case-insensitive match often, you may want to define them in a named class.

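class XpathFunctions
  # Called from XPath as case_insensitive_equals(@attr, 'text').
  # Keeps the nodes whose string value equals str_to_match, ignoring case.
  def case_insensitive_equals(node_set, str_to_match)
    node_set.find_all { |node| node.to_s.downcase == str_to_match.to_s.downcase }
  end
end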

Then call it like any other XPath function, passing an instance of your class as the second argument to xpath:

page.parser.xpath("//meta[case_insensitive_equals(@name,'keywords')]",
                  XpathFunctions.new).to_html

In your Ruby method, node_set will be bound to a Nokogiri::XML::NodeSet. When you pass in an attribute such as @name, it will be a NodeSet containing a single Nokogiri::XML::Attr (or an empty NodeSet if the element has no such attribute), so calling to_s on each node gives you its value. (Alternatively, you could use node.value.)

Unlike XPath's translate(), where you have to list every character you want to convert, this works on all the characters and character encodings that Ruby's downcase handles.

Also, if you want to do other things that XPath 1.0 doesn't support, beyond case-insensitive matching, the function body is just Ruby, so this is a good starting point.

