I'm working on a Node module that parses RTF files and does some find and replace.
I have already come up with a solution for special characters expressed in escaped Unicode here, but have run into a wall when it comes to CJK characters.
Is there an easy way to do these conversions in JavaScript, either with a library or with something built in?
Example:
An RTF file viewed in plain text contains:
Now testing symbols {鈴:200638d}
When parsed in NodeJS, this part of the file looks like:
Now testing symbols {\f1 \'e2\'8f\f0 :200638d}
I understand that \f1 and \f0 denote font changes, and that the \'e2\'8f block is the actual character... but how can I take \'e2\'8f and convert it back to 鈴, or conversely, convert 鈴 to \'e2\'8f?
I have tried looking up the character in different encodings and am not seeing anything that remotely resembles \'e2\'8f.
I understand that the RTF control \'hh is "a hexadecimal value, based on the specified character set (may be used to identify 8-bit values)" (source), or maybe the better definition comes from the Microsoft RTF spec: "%xHH (OCTET with the hexadecimal value of HH)" (download). But I have no idea what to do with that information to get these conversions working.
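For reference, here is a minimal sketch of what "a hexadecimal value, based on the specified character set" could mean in practice. It assumes the \f1 run is encoded in GBK (Windows code page 936), under which the bytes E2 8F decode to 鈴. In a real file the code page would have to come from the font table (for example the \fcharset value of \f1), which this sketch does not parse. The function names rtfHexToText and textToRtfHex are hypothetical, and the sketch relies on a Node build with full ICU so that TextDecoder accepts the 'gbk' label.

```javascript
// Sketch only: converts RTF \'hh escapes to text and back,
// ASSUMING the escaped bytes are GBK (code page 936).
// Requires a Node build with full ICU (TextDecoder must know 'gbk').
const gbkDecoder = new TextDecoder('gbk');

// Replace each run of consecutive \'hh escapes with the decoded text.
// Runs are decoded as a whole so multi-byte characters stay together.
function rtfHexToText(rtf) {
  return rtf.replace(/(?:\\'[0-9a-fA-F]{2})+/g, (run) => {
    const bytes = run.match(/[0-9a-fA-F]{2}/g).map((h) => parseInt(h, 16));
    return gbkDecoder.decode(Uint8Array.from(bytes));
  });
}

// TextEncoder only produces UTF-8, so build a character -> [lead, trail]
// table by decoding every candidate two-byte GBK sequence once.
function buildReverseTable() {
  const table = new Map();
  for (let lead = 0x81; lead <= 0xfe; lead++) {
    for (let trail = 0x40; trail <= 0xfe; trail++) {
      if (trail === 0x7f) continue; // never a valid trail byte
      const ch = gbkDecoder.decode(Uint8Array.of(lead, trail));
      // Keep only pairs that decode to exactly one real character.
      if ([...ch].length === 1 && ch !== '\uFFFD' && !table.has(ch)) {
        table.set(ch, [lead, trail]);
      }
    }
  }
  return table;
}

const reverseTable = buildReverseTable();

// Encode non-ASCII characters as \'hh escapes; ASCII passes through as-is.
// Note: RTF-special characters (\, {, }) are NOT escaped by this sketch.
function textToRtfHex(text) {
  let out = '';
  for (const ch of text) {
    if (ch.codePointAt(0) < 0x80) {
      out += ch;
      continue;
    }
    const bytes = reverseTable.get(ch);
    if (!bytes) throw new Error(`No two-byte GBK mapping for "${ch}"`);
    out += bytes.map((b) => "\\'" + b.toString(16).padStart(2, '0')).join('');
  }
  return out;
}

console.log(rtfHexToText("{\\f1 \\'e2\\'8f\\f0 :200638d}")); // {\f1 鈴\f0 :200638d}
console.log(textToRtfHex('鈴')); // \'e2\'8f
```

The reverse table is a workaround for the lack of a built-in GBK encoder. A conversion library with GBK support (iconv-lite is one candidate) would likely be a simpler way to handle both directions.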
Asked by DjH (translated from Stack Overflow)