Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

itext - How to read a table from a PDF using iTextSharp?

I am having a problem reading a table from a PDF file. It's a very simple PDF with some text and a table, and the tool I am using is iTextSharp. I know there is no table concept in PDF. After some googling, I found someone saying it might be possible to do this with iTextSharp plus a custom ITextExtractionStrategy, but I have no idea how to start. Can someone give me some hints, or a small piece of sample code?

Cheers

1 Reply

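This code reads the table content. In a simple PDF like this one, each value is written in the page's content stream as a string in parentheses followed by the Tj operator, for example "(Cell 1) Tj", so the code collects all of those values. You can then do whatever you need with the resulting strings. This only works when the text is drawn with literal strings and Tj; files that use TJ arrays or hex strings need a real extraction strategy.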

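    using System.Collections.Generic;
    using System.Text;
    using System.Text.RegularExpressions;
    using iTextSharp.text.pdf;

    public class PdfTableReader
    {
        string _filePath = @"~MyPDF.pdf";

        // Matches a literal string operand followed by the Tj operator, e.g. "(Cell 1) Tj".
        // Escaped characters such as "\)" are allowed inside the string and left as-is;
        // unescaped nested parentheses are not handled.
        static readonly Regex TjOperand = new Regex(@"\(((?:\\.|[^\\)])*)\)\s*Tj");

        public List<string> Read()
        {
            var pages = new List<string>();
            var pdfReader = new PdfReader(_filePath);
            try
            {
                // Page numbers in iTextSharp are 1-based.
                for (int i = 1; i <= pdfReader.NumberOfPages; i++)
                {
                    // GetPageContent returns the content stream bytes of the page.
                    string textFromPage = Encoding.Default.GetString(pdfReader.GetPageContent(i));
                    pages.Add(GetDataConvertedData(textFromPage));
                }
            }
            finally
            {
                pdfReader.Close();
            }
            return pages;
        }

        string GetDataConvertedData(string textFromPage)
        {
            // Collect every "(...) Tj" operand, even when several share one line,
            // and separate them with tabs so the cell boundaries are not lost.
            var values = new List<string>();
            foreach (Match m in TjOperand.Matches(textFromPage))
                values.Add(m.Groups[1].Value);
            return string.Join("\t", values);
        }
    }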
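The custom ITextExtractionStrategy route mentioned in the question avoids parsing the content stream by hand: iTextSharp calls the strategy's RenderText method for each piece of text together with its position, and the strategy can group pieces that share a baseline into table rows. The sketch below shows only that grouping step, in plain C# with no iTextSharp dependency, so it is an illustration rather than a drop-in strategy. The TextChunk class, the TableRows.Group method and the 2-point tolerance are hypothetical names and values chosen for this example. Assuming the iTextSharp 5 TextRenderInfo API, a real strategy would fill the chunks from renderInfo.GetText() and the start point of renderInfo.GetBaseline().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical container for one piece of text and the start of its baseline
// (PDF user space: x grows to the right, y grows upwards).
public sealed class TextChunk
{
    public TextChunk(float x, float y, string text) { X = x; Y = y; Text = text; }
    public float X { get; }
    public float Y { get; }
    public string Text { get; }
}

// Hypothetical helper: turns positioned text chunks into table rows.
public static class TableRows
{
    // Chunks whose baselines lie within `tolerance` points of the first chunk
    // of the current row are put in the same row. Rows come back top to
    // bottom, and the cells of each row left to right.
    public static List<List<string>> Group(IEnumerable<TextChunk> chunks, float tolerance = 2f)
    {
        var rows = new List<List<TextChunk>>();

        // Visit chunks from the top of the page downwards.
        foreach (var chunk in chunks.OrderByDescending(c => c.Y))
        {
            var last = rows.LastOrDefault();

            // Start a new row when the baseline moves by more than the tolerance.
            if (last == null || Math.Abs(last[0].Y - chunk.Y) > tolerance)
            {
                last = new List<TextChunk>();
                rows.Add(last);
            }
            last.Add(chunk);
        }

        // Order the cells of each row by their horizontal position.
        return rows
            .Select(row => row.OrderBy(c => c.X).Select(c => c.Text).ToList())
            .ToList();
    }
}
```

For example, the chunks ("Item", x=50, y=700), ("Price", x=200, y=700), ("Apple", x=50, y=680) and ("1.20", x=200, y=681) come back as two rows, ["Item", "Price"] and ["Apple", "1.20"]. The tolerance absorbs small baseline differences between cells of the same row and may need tuning per document; cells whose text wraps onto several lines would still need extra handling.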
