Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

utf 8 - Ruby method to remove accents from UTF-8 international characters

I am trying to create a 'normalized' copy of a string, to help reduce duplicate names in a database. The names contain many international characters (i.e., accented letters), and I want to create a copy with the accents removed.

I did come across the method below, but cannot get it to work. I can't seem to find what the Unicode Hacks plugin is.

  # Utility method that returns an ASCIIfied, downcased, and sanitized string.
  # It relies on the Unicode Hacks plugin by means of String#chars. We assume
  # $KCODE is 'u' in environment.rb. By now we support a wide range of Latin
  # accented letters, based on the Unicode Character Palette bundled in Macs.
  def self.normalize(str)
     n = str.chars.downcase.strip.to_s
     n.gsub!(/[àáâãäåāă]/u,           'a')
     n.gsub!(/æ/u,                    'ae')
     n.gsub!(/[ďđ]/u,                 'd')
     n.gsub!(/[çćčĉċ]/u,              'c')
     n.gsub!(/[èéêëēęěĕė]/u,          'e')
     n.gsub!(/ƒ/u,                    'f')
     n.gsub!(/[ĝğġģ]/u,               'g')
     n.gsub!(/[ĥħ]/,                  'h')
     n.gsub!(/[ìíîïīĩĭ]/u,            'i')
     n.gsub!(/[įıĳĵ]/u,               'j')
     n.gsub!(/[ķĸ]/u,                 'k')
     n.gsub!(/[łľĺļŀ]/u,              'l')
     n.gsub!(/[ñńňņŉŋ]/u,             'n')
     n.gsub!(/[òóôõöøōőŏ]/u,          'o')
     n.gsub!(/œ/u,                    'oe')
     n.gsub!(/ą/u,                    'q')
     n.gsub!(/[ŕřŗ]/u,                'r')
     n.gsub!(/[śšşŝș]/u,              's')
     n.gsub!(/[ťţŧț]/u,               't')
     n.gsub!(/[ùúûüūůűŭũų]/u,         'u')
     n.gsub!(/ŵ/u,                    'w')
     n.gsub!(/[ýÿŷ]/u,                'y')
     n.gsub!(/[žżź]/u,                'z')
     n.gsub!(/\s+/,                   ' ')
     n.gsub!(/[^\sa-z0-9_-]/,         '')
     n
  end

Do I need to 'require' a particular library/gem? Or maybe someone could recommend another way to go about this.

I am not using Rails, nor do I plan on doing so.

1 Reply

I generally use I18n to handle this:

1.9.3p392 :001 > require "i18n"
 => true
1.9.3p392 :002 > I18n.transliterate("Hé les mecs!")
 => "He les mecs!"
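
If you would rather avoid a gem, here is a minimal sketch that uses only the standard library. It assumes Ruby 2.2 or later (for `String#unicode_normalize`) and UTF-8 input. The helper names `strip_accents` and `normalize_name` are hypothetical, made up for this example. The idea is to decompose each accented letter into its base letter plus combining marks (NFD), then drop the marks. Note that this is a different technique from the transliteration table I18n uses: letters with no canonical decomposition, such as `ø`, `ł`, `đ` or `æ`, pass through `strip_accents` unchanged, so you may still want a small lookup table for those.

```ruby
# Hypothetical helper: removes combining accents using only the stdlib.
# Assumes Ruby >= 2.2 and a UTF-8 encoded string.
def strip_accents(str)
  # NFD turns "é" into "e" followed by U+0301 (combining acute accent);
  # \p{Mn} matches such nonspacing marks, which are then deleted.
  str.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
end

# Hypothetical helper mirroring the cleanup steps at the end of the
# question's method: downcase, trim, collapse whitespace, and drop
# anything that is not whitespace, a-z, 0-9, "_" or "-".
def normalize_name(str)
  strip_accents(str)
    .downcase
    .strip
    .gsub(/\s+/, " ")
    .gsub(/[^\sa-z0-9_-]/, "")
end

puts strip_accents("Hé les mecs!")
puts normalize_name("  José   Núñez ")
```

Because the final `gsub` in `normalize_name` deletes every character outside the allowed set, any letter that does not decompose is removed entirely rather than mapped to an ASCII look-alike, which may or may not be what you want for de-duplicating names.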
