Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


Objective-C - Reading PDF files as a string in an iPhone application

I am having trouble reading a PDF in an iPhone application. I have tried the following code. I know the parsing callbacks I use are the wrong ones, since they are only meant for searching the content, but what I want is to convert the entire text of the PDF into a string. As an example, this code uses Apple's MobileHIG.pdf.

@implementation NetPDFViewController

size_t totalPages;  // a variable to store total pages

// a method to get the pdf ref
CGPDFDocumentRef MyGetPDFDocumentRef (const char *filename) {
    CFStringRef path;
    CFURLRef url;
    CGPDFDocumentRef document;
    path = CFStringCreateWithCString (NULL, filename,kCFStringEncodingUTF8);
    url = CFURLCreateWithFileSystemPath (NULL, path, kCFURLPOSIXPathStyle, 0);
    CFRelease (path);
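    document = CGPDFDocumentCreateWithURL (url);// 2
    CFRelease(url);
    if (document == NULL) {
        printf("`%s' could not be opened!\n", filename);
        return NULL;
    }
    size_t count = CGPDFDocumentGetNumberOfPages (document);// 3
    if (count == 0) {
        printf("`%s' needs at least one page!\n", filename);
        CGPDFDocumentRelease (document);  // do not leak the empty document
        return NULL;
    }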
    return document;
}

// table methods to parse pdf
static void op_MP (CGPDFScannerRef s, void *info) {
    const char *name;
    if (!CGPDFScannerPopName(s, &name))
        return;
    printf("MP /%s\n", name);
}

static void op_DP (CGPDFScannerRef s, void *info) {
    const char *name;
    if (!CGPDFScannerPopName(s, &name))
        return;
    printf("DP /%s\n", name);
}

static void op_BMC (CGPDFScannerRef s, void *info) {
    const char *name;
    if (!CGPDFScannerPopName(s, &name))
        return;
    printf("BMC /%s\n", name);
}

static void op_BDC (CGPDFScannerRef s, void *info) {
    const char *name;
    if (!CGPDFScannerPopName(s, &name))
        return;
    printf("BDC /%s\n", name);
}

static void op_EMC (CGPDFScannerRef s, void *info) {
    // EMC takes no operands, so there is no name to pop
    printf("EMC\n");
}

// a method to display pdf page.

void MyDisplayPDFPage (CGContextRef myContext,size_t pageNumber,const char *filename) {
    CGPDFDocumentRef document;
    CGPDFPageRef page;
    document = MyGetPDFDocumentRef (filename);// 1
    if (document == NULL)
        return;  // nothing to scan or draw
    totalPages=CGPDFDocumentGetNumberOfPages(document);
    page = CGPDFDocumentGetPage (document, pageNumber);// 2

    CGPDFDictionaryRef d;

    d = CGPDFPageGetDictionary(page);

// ----- edit   problem here - CGPDFDictionary is completely unknown 
// ----- as we don't know keys & values of it.
    CGPDFScannerRef myScanner; 
    CGPDFOperatorTableRef myTable;
    myTable = CGPDFOperatorTableCreate();
    CGPDFOperatorTableSetCallback (myTable, "MP", &op_MP);
    CGPDFOperatorTableSetCallback (myTable, "DP", &op_DP);
    CGPDFOperatorTableSetCallback (myTable, "BMC", &op_BMC);
    CGPDFOperatorTableSetCallback (myTable, "BDC", &op_BDC);
    CGPDFOperatorTableSetCallback (myTable, "EMC", &op_EMC);

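    CGPDFContentStreamRef myContentStream = CGPDFContentStreamCreateWithPage (page);// 3
    myScanner = CGPDFScannerCreate (myContentStream, myTable, NULL);// 4

    CGPDFScannerScan (myScanner);// 5
    CGPDFScannerRelease (myScanner);              // release the parsing objects
    CGPDFContentStreamRelease (myContentStream);  // once the scan is done
    CGPDFOperatorTableRelease (myTable);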

//  CGPDFDictionaryRef d;

    CGPDFStringRef str; // represents a sequence of bytes

    d = CGPDFPageGetDictionary(page);

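    // note: in a page dictionary /Thumb is a stream, so this string lookup normally fails
    if (CGPDFDictionaryGetString(d, "Thumb", &str)){
        CFStringRef s;
        s = CGPDFStringCopyTextString(str);
        if (s != NULL) {
            //need something in here in case it cant find anything
            NSLog(@"%@ testing it", s);
            CFRelease(s);  // release only a non-NULL string; CFRelease(NULL) crashes
        }
//      CFDataRef data = CGPDFStreamCopyData (stream, CGPDFDataFormatRaw);
    }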

// -----------------------------------  

    CGContextDrawPDFPage (myContext, page);// 3
    CGContextTranslateCTM(myContext, 0, 20);
    CGContextScaleCTM(myContext, 1.0, -1.0);
    CGPDFDocumentRelease (document);// 4
}

- (void)viewDidLoad {
    [super viewDidLoad];


// -------------------------------------------------------- 
// code for simple direct image from pdf docs.
    UIGraphicsBeginImageContext(CGSizeMake(320, 460));
    initialPage=28;
    MyDisplayPDFPage(UIGraphicsGetCurrentContext(), initialPage, [[[NSBundle mainBundle] pathForResource:@"MobileHIG" ofType:@"pdf"] UTF8String]);
    imgV.image=UIGraphicsGetImageFromCurrentImageContext();
    UIGraphicsEndImageContext();  // balance UIGraphicsBeginImageContext
    imgV.image=[imgV.image rotate:UIImageOrientationDownMirrored];
}

- (void)touchesBegan:(NSSet *)touches withEvent:(UIEvent *)event{
    UITouch *touch = [touches anyObject];
    CGPoint LasttouchPoint =  [touch locationInView:self.view];
    int LasttouchX = LasttouchPoint.x;
    startpoint=LasttouchX;
}


- (void)touchesMoved:(NSSet *)touches withEvent:(UIEvent *)event{

}

- (void)touchesEnded:(NSSet *)touches withEvent:(UIEvent *)event{
    UITouch *touch = [touches anyObject];
    CGPoint LasttouchPoint =  [touch locationInView:self.view];
    int LasttouchX = LasttouchPoint.x;
    endpoint=LasttouchX;
    if(startpoint>(endpoint+75)){
        initialPage++;
        [self loadPage:initialPage nextOne:YES];
    } else if((startpoint+75)<endpoint){
        initialPage--;
        [self loadPage:initialPage nextOne:NO];
    }
}


-(void)loadPage:(NSUInteger)page nextOne:(BOOL)yesOrNo{
    if(page<=totalPages && page>0){
        UIGraphicsBeginImageContext(CGSizeMake(720, 720));  
        MyDisplayPDFPage(UIGraphicsGetCurrentContext(), page, [[[NSBundle mainBundle] pathForResource:@"MobileHIG" ofType:@"pdf"] UTF8String]);

        CATransition *transition = [CATransition animation];
        transition.duration = 0.75;
        transition.timingFunction = [CAMediaTimingFunction functionWithName:kCAMediaTimingFunctionEaseInEaseOut];
        transition.type=kCATransitionPush;
        if(yesOrNo){
            transition.subtype=kCATransitionFromRight;
        } else {
            transition.subtype=kCATransitionFromLeft;
        }

        transition.delegate = self;
        [imgV.layer addAnimation:transition forKey:nil];
        imgV.image=UIGraphicsGetImageFromCurrentImageContext();
        UIGraphicsEndImageContext();  // balance UIGraphicsBeginImageContext
        imgV.image=[imgV.image rotate:UIImageOrientationDownMirrored];
    }
}

@end

But I have not managed to read even a single line of text from the PDF document. What is still missing?


1 Reply


If you want to extract content from a PDF file, read the "Parsing PDF Content" chapter of the Quartz 2D Programming Guide.

Basically, you use a CGPDFScanner object to parse the contents. It works as follows: you register callbacks that Quartz 2D invokes automatically whenever it encounters the corresponding PDF operators in the content stream. After this initial step, you start the actual parsing of the stream.

Looking briefly at your code, it appears that you are not following the steps required to parse the content of the page you get from CGPDFDocumentGetPage(). First set up the callbacks with CGPDFOperatorTableCreate() and CGPDFOperatorTableSetCallback(). Then get the page, create a content stream from it with CGPDFContentStreamCreateWithPage(), instantiate a CGPDFScanner with CGPDFScannerCreate(), and start scanning with CGPDFScannerScan().
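
The steps above can be sketched roughly as follows. This is only a minimal sketch, not the guide's own listing. It assumes ARC, and the names AppendPDFString, op_Tj, op_TJ and MyExtractPDFText are hypothetical helpers made up for illustration. It registers callbacks for the text-showing operators Tj and TJ (the marked-content operators MP, DP, BMC, BDC and EMC used in the question do not carry the page text) and appends every string operand to an NSMutableString that is handed to the scanner through its info pointer.

```objc
#import <Foundation/Foundation.h>
#import <CoreGraphics/CoreGraphics.h>

// Hypothetical helper: decode one PDF string operand and append it to the
// NSMutableString that was passed to the scanner as `info`.
static void AppendPDFString(CGPDFStringRef pdfString, void *info) {
    NSMutableString *text = (__bridge NSMutableString *)info;
    CFStringRef decoded = CGPDFStringCopyTextString(pdfString);
    if (decoded != NULL) {
        [text appendString:(__bridge NSString *)decoded];
        CFRelease(decoded);
    }
}

// Tj: show a single string.
static void op_Tj(CGPDFScannerRef scanner, void *info) {
    CGPDFStringRef string;
    if (CGPDFScannerPopString(scanner, &string)) {
        AppendPDFString(string, info);
    }
}

// TJ: show an array that mixes strings with numeric spacing adjustments.
// Only the strings are collected; the numbers are skipped.
static void op_TJ(CGPDFScannerRef scanner, void *info) {
    CGPDFArrayRef array;
    if (!CGPDFScannerPopArray(scanner, &array)) {
        return;
    }
    size_t count = CGPDFArrayGetCount(array);
    for (size_t i = 0; i < count; i++) {
        CGPDFStringRef string;
        if (CGPDFArrayGetString(array, i, &string)) {
            AppendPDFString(string, info);
        }
    }
}

// Hypothetical driver: returns the collected text of all pages,
// or nil if the document cannot be opened.
static NSString *MyExtractPDFText(NSURL *url) {
    CGPDFDocumentRef document =
        CGPDFDocumentCreateWithURL((__bridge CFURLRef)url);
    if (document == NULL) {
        return nil;
    }

    // Step 1: register the callbacks.
    CGPDFOperatorTableRef table = CGPDFOperatorTableCreate();
    CGPDFOperatorTableSetCallback(table, "Tj", &op_Tj);
    CGPDFOperatorTableSetCallback(table, "TJ", &op_TJ);

    NSMutableString *text = [NSMutableString string];
    size_t pageCount = CGPDFDocumentGetNumberOfPages(document);
    for (size_t pageNumber = 1; pageNumber <= pageCount; pageNumber++) {
        // Step 2: get the page (page numbers start at 1).
        CGPDFPageRef page = CGPDFDocumentGetPage(document, pageNumber);
        if (page == NULL) {
            continue;
        }
        // Steps 3 to 5: content stream, scanner, scan.
        CGPDFContentStreamRef stream = CGPDFContentStreamCreateWithPage(page);
        CGPDFScannerRef scanner =
            CGPDFScannerCreate(stream, table, (__bridge void *)text);
        CGPDFScannerScan(scanner);
        CGPDFScannerRelease(scanner);
        CGPDFContentStreamRelease(stream);
        [text appendString:@"\n"];  // separate pages with a newline
    }

    CGPDFOperatorTableRelease(table);
    CGPDFDocumentRelease(document);
    return text;
}
```

For the file in the question, this could be called as MyExtractPDFText([[NSBundle mainBundle] URLForResource:@"MobileHIG" withExtension:@"pdf"]). Keep in mind that the sketch ignores the ' and " text operators and the spacing numbers inside TJ arrays, so word boundaries can be lost. CGPDFStringCopyTextString also does not apply the current font's encoding, so documents that use custom-encoded or CID fonts may come out garbled.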

The "Parsing PDF Content" chapter of that guide gives you all of the information required to implement PDF parsing.

Hope this helps.

