Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
552 views
in Technique[技术] by (71.8m points)

encoding - Is there a way in ruby 1.9 to remove invalid byte sequences from strings?

Suppose you have a string like "€fooxA0", encoded UTF-8, Is there a way to remove invalid byte sequences from this string? ( so you get "€foo" )

In ruby-1.8 you could use Iconv.iconv('UTF-8//IGNORE', 'UTF-8', "€fooxA0") but that is now deprecated. "€fooxA0".encode('UTF-8') doesn't do anything, since it is already UTF-8. I tried:

"€fooxA0".force_encoding('BINARY').encode('UTF-8', :undef => :replace, :replace => '')

which yields

"foo"

But that also loses the valid multibyte character €

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)
"€fooxA0".encode('UTF-16le', invalid: :replace, replace: '').encode('UTF-8')

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...