Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

0 votes
676 views
in Technique[技术] by (71.8m points)

document - Is there a general solution for converting documents to plain text?

By documents, I mean Word, LibreOffice, etc., and maybe also PDFs and web pages.

In particular, for purposes of comparison, it would be nice if the plain text was in the same order as it would appear to a reader of the printed document, and if the plain text was stable, which is to say that a trivial change such as making a word boldface shouldn't change the plain text version.

Unixy answers preferred, but I'll take what I can get!

  Asked by John Lawrence Aspden (translated from Stack Overflow)

1 Reply

0 votes
by (71.8m points)

LibreOffice does a good job on every file type it can read:

libreoffice --headless --convert-to txt:Text name.doc

Or, looping in bash:

for i in *; do
  echo "$i"
  libreoffice --headless --convert-to txt:Text "$i"
done
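
Since the question is about comparing versions, it may help to keep the converted text apart from the originals. Below is a rough sketch of the same loop wrapped in a function; the name convert_dir and the directory arguments are made up for illustration, and it assumes the launcher is on your PATH as libreoffice (on some installs it is called soffice instead). It skips anything that is not a regular file and uses LibreOffice's --outdir option to write the .txt files into a separate directory.

```shell
#!/usr/bin/env bash
# Sketch only: convert every regular file in SRC_DIR to plain text,
# writing the results into OUT_DIR instead of next to the originals.
# "convert_dir" is a hypothetical helper name, not part of LibreOffice.

convert_dir() {
  local src=$1 out=$2 f
  mkdir -p "$out" || return 1
  for f in "$src"/*; do
    [ -f "$f" ] || continue   # skip directories and an unmatched glob
    echo "converting: $f"
    # --outdir sends the generated .txt file to the output directory
    libreoffice --headless --convert-to txt:Text --outdir "$out" "$f" \
      || echo "failed: $f" >&2
  done
}
```

Something like convert_dir ./docs ./txt followed by diff -r ./txt-old ./txt would then compare two conversion runs (the directory names here are placeholders). Whether the output stays identical after a purely cosmetic change such as bolding a word is worth checking on your own documents rather than assuming.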
