Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


ghostscript - Converting searchable PDF to a non-searchable PDF

I have a searchable PDF and I need to convert it into a non-searchable one.

I tried using Ghostscript to convert it to JPEG and then back to PDF. That does the trick, but the resulting file is far too large to be acceptable.

I also tried using Ghostscript to convert the PDF to PS first and then back to PDF. That works as well, but the quality is not good enough:

gswin32.exe -q -dNOPAUSE -dBATCH -dSAFER -sDEVICE=pswrite -r1000 -sOutputFile=out.ps in.pdf
gswin32.exe -q -dNOPAUSE -dBATCH -dSAFER -dDEVICEWIDTHPOINTS=596 -dDEVICEHEIGHTPOINTS=834 -dPDFSETTINGS=/ebook -sDEVICE=pdfwrite -sOutputFile=out.pdf out.ps

Is there a way to get better quality in the resulting PDF?

Alternatively, is there an easier way to convert a searchable PDF to a non-searchable one?


1 Reply


You can use Ghostscript to achieve that. You need 2 steps:

  1. Convert the PDF to a PostScript file in which all fonts used are converted to outline shapes. The key here is the -dNOCACHE parameter:

    gs -o somepdf.ps -dNOCACHE -sDEVICE=pswrite somepdf.pdf

  2. Convert the PS back to PDF (and, optionally, delete the intermediate PS file afterwards):

    gs -o somepdf-with-outlines.pdf -sDEVICE=pdfwrite somepdf.ps
    rm somepdf.ps

Note that the resulting PDF will very likely be larger than the original one. Also, unless you add more command line parameters to control it, all images in the original PDF will likely be re-converted according to Ghostscript's built-in defaults. Even so, the quality should be better than what your own Ghostscript attempt produced.
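
As an illustration of the kind of additional parameters meant above, here is a minimal sketch of the second step (PS to PDF) with pdfwrite's image downsampling and automatic filter selection switched off, so that images are stored with lossless Flate compression. The function name ps_to_pdf_keep_images and the GS override variable are my own, not part of Ghostscript. How much this helps depends on what the first step already did to the images, so treat it as a starting point rather than a guaranteed fix:

```shell
#!/bin/sh
# GS is a hypothetical override so another binary (e.g. gswin32c.exe)
# can be substituted; it defaults to "gs".
GS="${GS:-gs}"

# Sketch only: convert PostScript to PDF without downsampling images
# and without letting pdfwrite pick a lossy filter on its own.
# $1 = input PostScript file, $2 = output PDF file
ps_to_pdf_keep_images() {
    "$GS" -o "$2" -sDEVICE=pdfwrite \
        -dAutoFilterColorImages=false -dColorImageFilter=/FlateEncode \
        -dAutoFilterGrayImages=false -dGrayImageFilter=/FlateEncode \
        -dDownsampleColorImages=false \
        -dDownsampleGrayImages=false \
        -dDownsampleMonoImages=false \
        "$1"
}
```

It would be called as `ps_to_pdf_keep_images somepdf.ps somepdf-with-outlines.pdf`. Expect the output to grow further, since lossless image storage trades file size for quality.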


Update

Apparently, from version 9.15 (to be released during September/October 2014), Ghostscript will support a new command line parameter:

 -dNoOutputFonts

which will cause the output devices pdfwrite, ps2write and eps2write "to 'flatten' glyphs into 'basic' marking operations (rather than writing fonts to the output)".

This means that the two steps above can be avoided, and the desired result can be achieved with a single command:

 gs -o somepdf-with-outlines.pdf -dNoOutputFonts -sDEVICE=pdfwrite somepdf.pdf

Caveat: I've tested this with a few input files, using a self-compiled Ghostscript built from the current Git sources. It worked flawlessly in each case.
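
If a script has to run on machines with different Ghostscript versions, the two approaches can be combined. The following is a rough sketch, assuming that `gs --version` prints a plain version string such as 9.15. The names supports_no_output_fonts and flatten_text and the GS variable are hypothetical helpers of mine. It uses -dNoOutputFonts when the installed version is at least 9.15 and otherwise falls back to the two-step route via PostScript:

```shell
#!/bin/sh
# GS is a hypothetical override for the Ghostscript binary name.
GS="${GS:-gs}"

# Succeeds if the version string $1 (e.g. "9.15") is 9.15 or newer.
supports_no_output_fonts() {
    major=${1%%.*}
    rest=${1#*.}
    minor=${rest%%.*}
    [ "$major" -gt 9 ] 2>/dev/null && return 0
    [ "$major" -eq 9 ] 2>/dev/null && [ "$minor" -ge 15 ] 2>/dev/null
}

# $1 = input PDF, $2 = output PDF with text turned into outlines
flatten_text() {
    if supports_no_output_fonts "$("$GS" --version)"; then
        # Single command, as in the update above.
        "$GS" -o "$2" -dNoOutputFonts -sDEVICE=pdfwrite "$1"
    else
        # Two-step fallback through an intermediate PostScript file.
        tmp_ps="${2%.pdf}.ps"
        "$GS" -o "$tmp_ps" -dNOCACHE -sDEVICE=pswrite "$1" &&
            "$GS" -o "$2" -sDEVICE=pdfwrite "$tmp_ps"
        status=$?
        rm -f "$tmp_ps"
        return "$status"
    fi
}
```

Usage would be `flatten_text somepdf.pdf somepdf-with-outlines.pdf`. The version check is deliberately simple and does not handle unusual version strings.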

