C# - Description of each constant defined in PdfName, since I need to retrieve both images and text at the same time

Question



posted Jan 31, 2022 in Technique by 深蓝 (71.8m points)


I am having trouble retrieving images and text from a PDF file at the same time. I was able to get the images and the text from a PDF file, but not at the same time (which raises the question of whether to render the image first or the text first, for example in my panel control). Maybe you can help me by explaining what each constant in PdfName means? I tried using PdfName.ALL, but it returns null; when using PdfName.RESOURCES, it returns ProcSet, Font and XObject. I used XObject for the images, but what are ProcSet and Font (could Font be the style of the text)? Is there a PdfName.TEXT for retrieving text?

Thanks in advance.




深蓝 · Answer 1 · 2022-01-31T07:15:51+0000

First of all, regarding

"I am having trouble retrieving images and text from a PDF file at the same time"

for this task you should use the iText(Sharp) parser API. In iTextSharp you essentially implement IRenderListener, an interface whose methods are called for each (bitmap) image and each text fragment in a content stream, and process the page contents with it:

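```csharp
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

PdfReader reader = new PdfReader(...);
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
int pageNumber = [... the number of the page you are interested in; may be a loop variable ...];

// Your IRenderListener implementation is called back for every text fragment and image on the page.
IRenderListener listener = new [... your IRenderListener implementation ...];
parser.ProcessContent(pageNumber, listener);
```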
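A minimal sketch of such a listener, assuming iTextSharp 5.x (the `iTextSharp.text.pdf.parser` namespace): it records every text fragment and every image in the order the parser reports them, together with a position in PDF user space. The class names `PageItem` and `TextAndImageCollector` are hypothetical; only `IRenderListener`, `TextRenderInfo`, `ImageRenderInfo` and the types they return come from iTextSharp.

```csharp
using System.Collections.Generic;
using iTextSharp.text.pdf.parser;

// Hypothetical item type: one text fragment or one image, with its position in PDF user space.
public class PageItem
{
    public string Text;           // null for images
    public PdfImageObject Image;  // null for text
    public float X, Y;            // PDF coordinates: origin at the bottom-left, y grows upwards
}

// Hypothetical listener that records text fragments and images in the order they are drawn.
public class TextAndImageCollector : IRenderListener
{
    public readonly List<PageItem> Items = new List<PageItem>();

    public void BeginTextBlock() { }  // called at the BT operator
    public void EndTextBlock() { }    // called at the ET operator

    public void RenderText(TextRenderInfo renderInfo)
    {
        // Start point of the baseline of this text fragment.
        Vector start = renderInfo.GetBaseline().GetStartPoint();
        Items.Add(new PageItem
        {
            Text = renderInfo.GetText(),
            X = start[Vector.I1],
            Y = start[Vector.I2]
        });
    }

    public void RenderImage(ImageRenderInfo renderInfo)
    {
        // The image CTM maps the unit square onto the page; its translation part (I31, I32)
        // is where the image's origin lands, i.e. the lower-left corner of an unrotated image.
        Matrix ctm = renderInfo.GetImageCTM();
        Items.Add(new PageItem
        {
            Image = renderInfo.GetImage(),
            X = ctm[Matrix.I31],
            Y = ctm[Matrix.I32]
        });
    }
}
```

Because `ProcessContent` returns the listener you pass in, usage can be as short as `var items = parser.ProcessContent(pageNumber, new TextAndImageCollector()).Items;`. `GetImage()` may throw for images iTextSharp cannot decode, so real code may want a try/catch around it.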

You ask

"whether to render the image first or the text first, for example in my panel control"

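The IRenderListener methods also receive information on the location of the bitmap or text fragment in question: TextRenderInfo exposes the baseline of each text fragment, and ImageRenderInfo exposes the image's transformation matrix. The callbacks arrive in the order in which the content stream paints the items, and later items are painted over earlier ones, so that order is also the stacking order to reproduce in your panel.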
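To place collected items on a WinForms panel you also need to flip the y axis: PDF user space has its origin at the bottom-left with y pointing up, while a panel has its origin at the top-left with y pointing down. The following library-free sketch assumes unscaled coordinates and an unrotated page; `PlacedItem` and `PanelLayout` are hypothetical names standing in for whatever your listener collects.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical, library-free stand-in for an item collected by a render listener.
public sealed class PlacedItem
{
    public string Label { get; }
    public float PdfX { get; }
    public float PdfY { get; }
    public PlacedItem(string label, float pdfX, float pdfY) { Label = label; PdfX = pdfX; PdfY = pdfY; }
}

public static class PanelLayout
{
    // PDF: origin bottom-left, y up. Panel: origin top-left, y down.
    // Flipping y against the page height converts one into the other (scale 1, no page rotation assumed).
    public static (float X, float Y) ToPanel(PlacedItem item, float pageHeight)
        => (item.PdfX, pageHeight - item.PdfY);

    // Reading order (top to bottom, then left to right), e.g. for listing items rather than drawing them.
    public static List<PlacedItem> ReadingOrder(IEnumerable<PlacedItem> items, float pageHeight)
        => items.OrderBy(i => ToPanel(i, pageHeight).Y)
                .ThenBy(i => i.PdfX)
                .ToList();
}
```

If you want to draw everything, keep the listener's order (the stacking order); `ReadingOrder` only helps when you want the items listed top to bottom. For an image the collected point is its lower-left corner, so its top edge on the panel is at `pageHeight - (y + imageHeight)`.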

For ideas on how to combine the text fragments in your listener, you may want to look at the SimpleTextExtractionStrategy and LocationTextExtractionStrategy implementations that ship with iTextSharp.
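If LocationTextExtractionStrategy already assembles the text the way you want, a listener can forward the text callbacks to it and handle the images itself. A minimal sketch, assuming iTextSharp 5.x; the class name `TextPlusImagesListener` is hypothetical.

```csharp
using System.Collections.Generic;
using iTextSharp.text.pdf.parser;

// Hypothetical listener: delegates text to iTextSharp's LocationTextExtractionStrategy
// and collects the images itself.
public class TextPlusImagesListener : IRenderListener
{
    private readonly LocationTextExtractionStrategy textStrategy = new LocationTextExtractionStrategy();
    public readonly List<PdfImageObject> Images = new List<PdfImageObject>();

    public void BeginTextBlock() => textStrategy.BeginTextBlock();
    public void EndTextBlock() => textStrategy.EndTextBlock();
    public void RenderText(TextRenderInfo renderInfo) => textStrategy.RenderText(renderInfo);

    public void RenderImage(ImageRenderInfo renderInfo)
    {
        PdfImageObject image = renderInfo.GetImage();
        if (image != null) Images.Add(image);
    }

    // Page text, ordered by location by the strategy.
    public string GetText() => textStrategy.GetResultantText();
}
```

This gives you the text and the images separately; if you need them positioned relative to each other, record the positions from TextRenderInfo and ImageRenderInfo yourself.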

If you insist on doing it manually, though...

"Maybe you can help me by explaining what each constant in PdfName means?"

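You will find the definitions of what these names stand for in the PDF specification, ISO 32000-1:2008, a copy of which Adobe has made publicly available. The PdfName constants are simply iTextSharp's predefined PDF name objects (PdfName.XOBJECT, for example, is the PDF name /XObject).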

"When using PdfName.RESOURCES, it returns ProcSet, Font and XObject. I used XObject for the images, but what are ProcSet and Font (could Font be the style of the text)?"

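The contents of the page resource dictionaries are explained in section 7.8.3 of the specification. In short: XObject maps resource names to external objects, which are either images (/Subtype /Image) or form XObjects (/Subtype /Form, reusable content that can itself contain text and images); Font maps resource names to the font dictionaries that the text operators select, so it describes how text is drawn, not the text itself; and ProcSet lists procedure sets, a feature the specification considers obsolete since PDF 1.4 (section 14.2).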
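If you do want to inspect the resources manually, a sketch like the following (assuming iTextSharp 5.x; `ResourceDump` is a hypothetical helper) lists what a page's /Resources dictionary contains. XObjects are streams, so they are fetched with `GetAsStream` rather than `GetAsDict`.

```csharp
using System;
using iTextSharp.text.pdf;

public static class ResourceDump
{
    // Hypothetical helper: prints the fonts, XObjects and ProcSet of one page (pageNumber is 1-based).
    public static void Dump(PdfReader reader, int pageNumber)
    {
        PdfDictionary resources = reader.GetPageN(pageNumber).GetAsDict(PdfName.RESOURCES);
        if (resources == null) return;

        // /Font: resource name (used by the Tf operator) -> font dictionary
        PdfDictionary fonts = resources.GetAsDict(PdfName.FONT);
        if (fonts != null)
            foreach (PdfName name in fonts.Keys)
                Console.WriteLine($"Font {name}: {fonts.GetAsDict(name)?.GetAsName(PdfName.BASEFONT)}");

        // /XObject: resource name (used by the Do operator) -> stream with /Subtype /Image or /Form
        PdfDictionary xobjects = resources.GetAsDict(PdfName.XOBJECT);
        if (xobjects != null)
            foreach (PdfName name in xobjects.Keys)
                Console.WriteLine($"XObject {name}: {xobjects.GetAsStream(name)?.GetAsName(PdfName.SUBTYPE)}");

        // /ProcSet: obsolete array of procedure set names
        Console.WriteLine($"ProcSet: {resources.GetAsArray(PdfName.PROCSET)}");
    }
}
```

A form XObject has its own /Resources dictionary, so fonts and images can also sit one or more levels deeper than the page.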

"Is there a PdfName.TEXT for retrieving text?"

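You will find how text is represented in page content streams and form XObjects in section 9 of the specification. There is no resource entry that holds the text: it is drawn by text-showing operators such as Tj and TJ inside text objects delimited by BT and ET, using the font selected from the Font resources with the Tf operator.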
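To see how the Font resource and the text operators fit together, here is a deliberately simplified, library-free sketch (the name `ToyTextScanner` is hypothetical) that scans a content stream such as `BT /F1 12 Tf 72 712 Td (Hello) Tj ET` and collects the strings shown by Tj and TJ. It ignores escape sequences, hex strings, the ' and " operators and, most importantly, font encodings: in real PDFs the string bytes are character codes that must be mapped through the font's encoding or ToUnicode CMap, which is the work the iTextSharp parser does for you.

```csharp
using System.Collections.Generic;
using System.Globalization;

public static class ToyTextScanner
{
    // Returns the strings shown by Tj and TJ, in content-stream order.
    // Toy only: no escape sequences, nested parentheses, hex strings or font encodings.
    public static List<string> ShownStrings(string content)
    {
        var shown = new List<string>();
        var operands = new List<string>(); // literal strings seen since the last operator
        int i = 0;
        while (i < content.Length)
        {
            char c = content[i];
            // Array brackets only group TJ operands; a stray ')' is skipped.
            if (char.IsWhiteSpace(c) || c == '[' || c == ']' || c == ')') { i++; continue; }
            if (c == '(')
            {
                int end = content.IndexOf(')', i + 1);
                if (end < 0) break; // unterminated string: give up
                operands.Add(content.Substring(i + 1, end - i - 1));
                i = end + 1;
                continue;
            }
            // Any other token: a number, a name such as /F1, or an operator such as Tf or Tj.
            int start = i;
            while (i < content.Length && !char.IsWhiteSpace(content[i]) && "()[]".IndexOf(content[i]) < 0) i++;
            string token = content.Substring(start, i - start);
            bool isOperand = token.StartsWith("/")
                || double.TryParse(token, NumberStyles.Float, CultureInfo.InvariantCulture, out _);
            if (isOperand) continue;
            // An operator consumes the operands before it; only Tj and TJ show text here.
            if (token == "Tj" || token == "TJ") shown.Add(string.Concat(operands));
            operands.Clear();
        }
        return shown;
    }
}
```

For the content `BT /F1 12 Tf 72 712 Td (Hello) Tj 0 -14 Td [(Wor) -20 (ld)] TJ ET` it returns `Hello` and `World`; the numbers inside the TJ array are spacing adjustments, not text.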
