Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

node.js - How to convert character encoding from CP932 to UTF-8 in Node.js JavaScript, using the node-iconv module (or another solution)

I'm attempting to convert a string from CP932 (aka Windows-31J) to UTF-8 in JavaScript. I'm crawling a site that ignores the UTF-8 request in my request headers and returns CP932-encoded text (even though the HTML meta tag declares the page as Shift_JIS).

Anyway, I have the entire page stored in a string variable called "html". From there I'm attempting to convert it to UTF-8 using this code:

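var Iconv = require('iconv').Iconv;
var conv = new Iconv('CP932', 'UTF-8//TRANSLIT//IGNORE');

// Caution: if html is already a JavaScript string, it has already been
// decoded, so writing it back out as 'utf8' yields UTF-8 bytes, not the
// page's original CP932 bytes. Decoding those as CP932 produces mojibake.
var myBuffer = new Buffer(html.length * 3);
myBuffer.write(html, 0, 'utf8');
// Also note: the byte count returned by write() is ignored, so the unused,
// uninitialized tail of myBuffer is passed to convert() as well.
var utf8html = (conv.convert(myBuffer)).toString('utf8');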

The result is not what it's supposed to be. For example, the string: "投稿者さんの 稚内全日空ホテル のクチコミ (感想?情報)" comes out as "??????e???ゑ?????????? ??t??????S??????????z??e???? ???ク??`??R??~ (??????z??E????????)"

If I remove //TRANSLIT//IGNORE (which should make iconv substitute similar characters for ones it can't map and, failing that, drop characters that can't be converted), I get this error: Error: EILSEQ, Illegal character sequence.
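A minimal sketch of the byte-level approach, swapping in the WHATWG TextDecoder built into Node.js instead of node-iconv. The point it illustrates is that CP932 decoding has to start from the raw bytes (a Buffer), never from a string that was already decoded with some other encoding. It assumes a Node.js build with full ICU (the default for official builds since Node 13); decodeCp932 is a hypothetical helper name, not part of any library.

```javascript
// Sketch: decode raw CP932 (Windows-31J) bytes without a native module.
// Assumption: Node.js built with full ICU, so TextDecoder supports the
// WHATWG "shift_jis" encoding (which also accepts the labels
// "windows-31j" and "ms932").

// Hypothetical helper: turn CP932-encoded bytes into a JavaScript string.
// The input must be the bytes exactly as received from the server.
function decodeCp932(bytes) {
  const decoder = new TextDecoder('shift_jis');
  return decoder.decode(bytes);
}

// "日本語" encoded as CP932: 0x93FA 0x967B 0x8CEA
const sample = Buffer.from([0x93, 0xfa, 0x96, 0x7b, 0x8c, 0xea]);
const text = decodeCp932(sample);
console.log(text); // 日本語

// JavaScript strings are UTF-16 internally; to get UTF-8 bytes, encode it.
const utf8Bytes = Buffer.from(text, 'utf8');
console.log(utf8Bytes.length); // 9 (three 3-byte characters)
```

The same principle should apply with node-iconv: hand conv.convert() the Buffer exactly as it came off the wire rather than bytes re-encoded from a string.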

I'm open to any solution that can be implemented in Node.js, but my searches haven't turned up many options besides the node-iconv module.

node-iconv reference: https://github.com/bnoordhuis/node-iconv

Thanks!

Edit 24.06.2011: I've gone ahead and implemented a solution in Java. However, I'd still be interested in a JavaScript solution to this problem if somebody can find one.


1 Reply


I ran into the same trouble today :)
It depends on libiconv: you need libiconv-1.13-ja-1.patch.
Please check the following.

Or you can avoid the problem by using iconv-jp. Try:

npm install iconv-jp
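If patching libiconv or installing a native module is not an option, here is a sketch using a different technique (not iconv-jp): read the HTTP response body as raw Buffers, concatenate them, then decode with TextDecoder. It assumes the server really sends CP932/Shift_JIS bytes and that Node.js has full ICU; readAllBytes and readCp932Body are hypothetical helper names, and the URL in the commented usage is a placeholder.

```javascript
// Sketch: collect a response body as bytes, then decode it as CP932.
// Assumption: Node.js with full ICU, so TextDecoder knows "shift_jis".
const { Readable } = require('stream');

// Concatenate every chunk of a readable stream into one Buffer.
// Do NOT call setEncoding() on the stream: chunks must stay Buffers, and a
// multi-byte character split across two chunks is only decoded correctly
// once all bytes are joined.
async function readAllBytes(stream) {
  const chunks = [];
  for await (const chunk of stream) {
    chunks.push(chunk);
  }
  return Buffer.concat(chunks);
}

// Hypothetical helper: whole stream in, decoded string out.
async function readCp932Body(stream) {
  const bytes = await readAllBytes(stream);
  return new TextDecoder('shift_jis').decode(bytes);
}

// Usage with a real request (placeholder URL):
// const https = require('https');
// https.get('https://example.com/', async (res) => {
//   const html = await readCp932Body(res);
//   console.log(html);
// });

// Offline demo: "日本語" split so that 本 (0x96 0x7B) straddles two chunks.
const demo = Readable.from([
  Buffer.from([0x93, 0xfa, 0x96]),
  Buffer.from([0x7b, 0x8c, 0xea]),
]);
readCp932Body(demo).then((s) => console.log(s)); // 日本語
```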
