Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


ruby on rails - Sorting UTF-8 strings in RoR

I am trying to figure out a 'proper' way of sorting UTF-8 strings in Ruby on Rails.

In my application, I have a select box that is populated with countries. As my application is localized, each existing locale has a countries.yml file that relates a country's id to the localized name for that country. I can't sort the strings manually in the yml file because I need the ID to be consistent across all locales.

What I have done is create an ascii_name method, which uses the unidecode gem to convert accented and non-Latin characters to their ASCII equivalents (for instance, "Afeganistão" would become "Afeganistao"), and then sort on that:

require 'unidecode'

class Country
  def ascii_name
    Unidecoder.decode(name).gsub("[?]", "").gsub(/`/, "'").strip
  end
end

Country.all.sort_by(&:ascii_name)

However, there are obvious issues with this:

  • It cannot properly sort non-Latin locales, as there may not be a directly analogous Latin character.
  • It makes no distinction between a letter and all accented forms of that letter (so, for instance, A and Ä become interchangeable).
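The second issue can be reproduced without the unidecode gem. The sketch below uses a hypothetical ascii_fold helper built on String#unicode_normalize as a rough stand-in for Unidecoder.decode (an assumption, not the code from the question), and shows that an accented and an unaccented name collapse to the same sort key. Adding the original name as a secondary key at least makes the order deterministic, although it is still not locale-aware.

```ruby
# Hypothetical stand-in for Unidecoder.decode: decompose each character (NFD),
# then drop the combining marks, leaving only the base letters.
def ascii_fold(str)
  str.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
end

names = ["\u00C5land", "Aland", "Zambia"] # "\u00C5" is "Å"

# Both "Åland" and "Aland" fold to the same key, so sorting on the key alone
# cannot tell them apart.
keys = names.map { |n| ascii_fold(n) }

# Using the original name as a tie-breaker gives a stable, repeatable order
# (byte order within a tie, not a locale's collation rules).
sorted = names.sort_by { |n| [ascii_fold(n), n] }
```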

Does anyone know of a better way that I could sort my strings?


1 Reply


Ruby performs string comparisons based on the byte values of the characters:

%w[à a e].sort
# => ["a", "e", "à"]
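To see why "à" lands after "e", it can help to look at the bytes directly. This is a small illustration that assumes UTF-8 strings and the precomposed form of "à" (U+00E0), written as an escape so the result does not depend on how the source file is saved.

```ruby
a_grave = "\u00E0" # "à", precomposed

# ASCII letters are a single byte below 128; "à" is two bytes, both above 127.
bytes_a       = "a".bytes      # => [97]
bytes_e       = "e".bytes      # => [101]
bytes_a_grave = a_grave.bytes  # => [195, 160]

# The first byte of "à" (195) is greater than 101, so it sorts after "e".
sorted = [a_grave, "a", "e"].sort
```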

To properly collate strings according to locale, the ffi-icu gem could be used:

require "ffi-icu"

ICU::Collation.collate("it_IT", %w[à a e])
# => ["a", "à", "e"]

ICU::Collation.collate("de", %w[a s x ß])
# => ["a", "s", "ß", "x"]

As an alternative:

collator = ICU::Collation::Collator.new("it_IT")
%w[à a e].sort { |a, b| collator.compare(a, b) }
# => %w[a à e]
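Applied to the countries from the question, the comparator form might look like the sketch below. The Country struct, the sort_countries helper, and the ByteCollator class are all hypothetical: ByteCollator only exists so the example runs without ffi-icu installed, and it compares bytes rather than applying locale rules. The commented lines show where a real ICU collator would be plugged in; the "pt_BR" locale is an assumption.

```ruby
# Hypothetical minimal model; the real Country class comes from the application.
Country = Struct.new(:id, :name)

# Sorts countries by name using any object that responds to compare(a, b)
# and returns a value usable in a sort block.
def sort_countries(countries, collator)
  countries.sort { |a, b| collator.compare(a.name, b.name) }
end

# Stand-in collator (byte order only), so the sketch runs without ffi-icu.
class ByteCollator
  def compare(a, b)
    a <=> b
  end
end

countries = [
  Country.new(2, "Brasil"),
  Country.new(1, "Afeganist\u00E3o"),
  Country.new(3, "Alemanha")
]

sorted_ids = sort_countries(countries, ByteCollator.new).map(&:id)

# With ffi-icu installed, the same helper would take the collator from the answer:
#   require "ffi-icu"
#   collator = ICU::Collation::Collator.new("pt_BR")
#   sort_countries(countries, collator)
```

Because the sort key is the localized name and not the id, the ids in countries.yml stay consistent across locales.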

Update: To test how strings should collate according to locale rules, the ICU project provides a nice online collation demo.

