Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share


unicode - Ruby 1.9: how can I properly upcase & downcase multibyte strings?

So Matz made the decision to keep upcase and downcase limited to ASCII letters (/[A-Z]/i) in Ruby 1.9.1.
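To make "limited to /[A-Z]/i" concrete: the 1.9.1 behavior amounts to swapping only the 26 ASCII letters and passing every other character through untouched. The sketch below reproduces that with String#tr; the helper names ascii_upcase and ascii_downcase are hypothetical, not part of Ruby. On Ruby 2.4 and later, the built-in upcase(:ascii) option should give the same result.

```ruby
# encoding: UTF-8

# Hypothetical helper that mimics Ruby 1.9's String#upcase:
# only the ASCII letters a-z are mapped; every other character is left as-is.
def ascii_upcase(str)
  str.tr('a-z', 'A-Z')
end

# Hypothetical counterpart that mimics Ruby 1.9's String#downcase.
def ascii_downcase(str)
  str.tr('A-Z', 'a-z')
end

puts ascii_upcase("Iñtërnâtiônàlizætiøn")    # prints IñTëRNâTIôNàLIZæTIøN
puts ascii_downcase("IÑTËRNÂTIÔNÀLIZÆTIØN")  # prints iÑtËrnÂtiÔnÀlizÆtiØn
```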

ActiveSupport::Multibyte has long provided good internationalized case conversion in Ruby 1.8.x via String#mb_chars.
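Case conversion beyond ASCII needs a per-character mapping table; ActiveSupport::Multibyte relies on Unicode character data for this. As a much smaller illustration of the same table-driven idea, here is a sketch assuming UTF-8 text limited to ASCII plus the Latin-1 Supplement letters that have a one-to-one case partner inside that block (so ß and ÿ are left unchanged). The helper and constant names are hypothetical. String#tr is encoding-aware from Ruby 1.9 on, so the ranges operate on characters rather than bytes.

```ruby
# encoding: UTF-8

# Hypothetical case tables: ASCII plus Latin-1 Supplement letters.
# The ranges skip U+00D7 (×) and U+00F7 (÷), which are not letters,
# and leave out ß and ÿ, whose uppercase forms fall outside Latin-1.
LATIN1_LOWER = 'a-zà-öø-þ'
LATIN1_UPPER = 'A-ZÀ-ÖØ-Þ'

# Maps each lowercase character to the uppercase one at the same position.
def latin1_upcase(str)
  str.tr(LATIN1_LOWER, LATIN1_UPPER)
end

# Inverse mapping: uppercase to lowercase.
def latin1_downcase(str)
  str.tr(LATIN1_UPPER, LATIN1_LOWER)
end

puts latin1_upcase("Iñtërnâtiônàlizætiøn")    # prints IÑTËRNÂTIÔNÀLIZÆTIØN
puts latin1_downcase("IÑTËRNÂTIÔNÀLIZÆTIØN")  # prints iñtërnâtiônàlizætiøn
```

This only covers one small Unicode block; a general solution needs the full Unicode case tables, which is what a library such as ActiveSupport::Multibyte supplies.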

However, under Ruby 1.9.1 it doesn't seem to work. Here's a simple test script I wrote, along with the output I'm getting:

$ cat test.rb
# encoding: UTF-8

puts("@ #{RUBY_VERSION} " + (__ENCODING__ rescue $KCODE).to_s)
sd, su = "Iñtërnâtiônàlizætiøn", "IÑTËRNÂTIÔNÀLIZÆTIØN"
def ps(u, d, k); puts "%-30s:  %24s / %-24s" % [k, u, d] end
ps sd.upcase, su.downcase, "Plain ruby"

require 'rubygems'; require 'active_support'
ps sd.upcase, su.downcase, "With active_support"
ps sd.mb_chars.upcase.to_s, su.mb_chars.downcase.to_s, "With active_support mb_chars"

$ ruby -KU test.rb
@ 1.8.7 UTF8
Plain ruby                    :  IñTëRNâTIôNàLIZæTIøN / iÑtËrnÂtiÔnÀlizÆtiØn
With active_support           :  IñTëRNâTIôNàLIZæTIøN / iÑtËrnÂtiÔnÀlizÆtiØn
With active_support mb_chars  :  IÑTËRNÂTIÔNÀLIZÆTIØN / iñtërnâtiônàlizætiøn

$ ruby1.9 test.rb
@ 1.9.1 UTF-8
Plain ruby                    :      IñTëRNâTIôNàLIZæTIøN / iÑtËrnÂtiÔnÀlizÆtiØn
With active_support           :      IñTëRNâTIôNàLIZæTIøN / iÑtËrnÂtiÔnÀlizÆtiØn
With active_support mb_chars  :      IñTëRNâTIôNàLIZæTIøN / iÑtËrnÂtiÔnÀlizÆtiØn
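The extra leading spaces in the 1.9.1 output are unrelated to the case-mapping problem; they come from the %24s in the ps helper. Ruby 1.9's format strings measure width in characters, while the 1.8.7 run appears to measure bytes, so the 20-character (27-byte) test string is padded only under 1.9. A small check of the character-based behavior, assuming Ruby 1.9 or later:

```ruby
# encoding: UTF-8

word = "Iñtërnâtiônàlizætiøn"  # 20 characters; 7 of them take two bytes in UTF-8
padded = "%24s" % word         # right-justify to a width of 24

puts word.length      # prints 20
puts word.bytesize    # prints 27
puts padded.length    # prints 24 (width is counted in characters)
puts "[#{padded}]"    # prints [    Iñtërnâtiônàlizætiøn]
```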

So, how do I get internationalized upcase and downcase with Ruby 1.9.1?

Update:

I should add that I also tested with ActiveSupport from the current master, 2-3-* and 3-0-unstable Rails branches on GitHub, with the same results.


1 Reply


For anybody coming from Google searching for "ruby upcase utf8":

> "your problem chars here çöğıü Iñtërnâtiônàlizætiøn".mb_chars.upcase.to_s
=> "YOUR PROBLEM CHARS HERE ÇÖĞIÜ IÑTËRNÂTIÔNÀLIZÆTIØN"

The solution is to use mb_chars.
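Later Ruby versions change the picture: starting with Ruby 2.4, String#upcase and String#downcase apply Unicode case mapping on their own, and they accept options such as :ascii (the old 1.9 behavior) and :turkic. A minimal sketch, assuming Ruby 2.4 or later and no ActiveSupport; on 1.9.x, mb_chars is still the way to go.

```ruby
# encoding: UTF-8

# Assumes Ruby >= 2.4, where String#upcase/#downcase use Unicode case mapping
# by default; no ActiveSupport needed.
word = "Iñtërnâtiônàlizætiøn"

puts word.upcase                      # prints IÑTËRNÂTIÔNÀLIZÆTIØN
puts "IÑTËRNÂTIÔNÀLIZÆTIØN".downcase  # prints iñtërnâtiônàlizætiøn
puts "çöğıü".upcase                   # prints ÇÖĞIÜ (dotless ı maps to ASCII I)
puts "i".upcase(:turkic)              # prints İ (Turkish dotted capital I)
puts word.upcase(:ascii)              # prints IñTëRNâTIôNàLIZæTIøN (ASCII only, as in 1.9)
```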

Documentation:

