Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share


cygwin - readPDF (tm package) in R

I tried to read an online PDF document in R using the readPDF function from the tm package. My script goes like this:

safex <- readPDF(PdftotextOptions='-layout')(elem=list(uri='C:/Users/FCG/Desktop/NoteF7000.pdf'),language='en',id='id1')

R showed a message saying that the running command had status 309. I tried different pdftotext options, but I got the same message each time, and the text file that was created was empty.

Can anyone read this PDF?


1 Reply


readPDF has bugs and probably isn't worth bothering with (check out this well-documented struggle with it).

Assuming that...

  1. you've got xpdf installed (see here for details)

  2. your PATHs are all in order (see here for details of how to do that) and you've restarted your computer.
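To check the second assumption from inside R, base R's `Sys.which()` returns the full path of an executable it finds on the PATH, or an empty string if it finds none. This is a minimal sketch that assumes the xpdf binary is called `pdftotext`:

```r
# Look up pdftotext on the PATH; Sys.which() returns "" when it is not found.
pdftotext_path <- Sys.which("pdftotext")

if (nzchar(pdftotext_path)) {
  message("pdftotext found at: ", pdftotext_path)
} else {
  message("pdftotext not found on the PATH; check the xpdf install and PATH settings")
}
```

If this reports "not found" even though xpdf is installed, the R session was most likely started before the PATH was changed. That is why a restart is part of the assumption.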

Then you might be better off avoiding readPDF and instead using this workaround:

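# pdftotext writes NoteF7000.txt next to the PDF; wait = TRUE makes R block
# until the conversion finishes, so the .txt exists before it is read below.
system(paste('"C:/Program Files/xpdf/pdftotext.exe"',
             '"C:/Users/FCG/Desktop/NoteF7000.pdf"'), wait = TRUE)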
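If you'd rather not assemble the quoted command string by hand, the same call can be sketched with base R's `system2()`. It quotes the command itself, waits by default, and returns the exit status. `txt_path_for` and `pdf_to_txt` are hypothetical helper names, not part of tm or xpdf. The `-layout` default just mirrors the option from the question.

```r
# Hypothetical helper: derive the .txt name for a given .pdf path
# (pdftotext's own default is file.pdf -> file.txt).
txt_path_for <- function(pdf) {
  if (!grepl("\\.pdf$", pdf, ignore.case = TRUE)) {
    stop("expected a path ending in .pdf: ", pdf)
  }
  sub("\\.pdf$", ".txt", pdf, ignore.case = TRUE)
}

# Hypothetical helper: run pdftotext and stop if it reports an error.
# 'exe' assumes pdftotext is on the PATH; pass the full path
# (e.g. "C:/Program Files/xpdf/pdftotext.exe") if it is not.
pdf_to_txt <- function(pdf, exe = "pdftotext", opts = "-layout") {
  txt <- txt_path_for(pdf)
  # system2() quotes 'exe' for us; file arguments are quoted explicitly.
  status <- system2(exe, args = c(opts, shQuote(pdf), shQuote(txt)))
  if (status != 0) {
    stop("pdftotext exited with status ", status)
  }
  txt
}
```

Usage would be something like `pdf_to_txt("C:/Users/FCG/Desktop/NoteF7000.pdf", exe = "C:/Program Files/xpdf/pdftotext.exe")`. Checking the exit status at least makes a failed conversion visible right away. Without the check, it only shows up later as an empty corpus.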

And then read the text file into R like so...

library(tm)
# Same base name as the PDF converted above, with a .txt extension
mycorpus <- Corpus(URISource("C:/Users/FCG/Desktop/NoteF7000.txt"))
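The question reports that the generated text file was empty, so it may be worth checking the file before building the corpus. This is a minimal base-R sketch; `txt_has_text` is a hypothetical helper, not part of tm. It assumes that "empty" means the file contains nothing but whitespace. pdftotext normally writes form feeds between pages, so a PDF with no text layer can still produce a file that is not zero bytes.

```r
# Hypothetical helper: TRUE if the file exists and contains at least one
# non-whitespace character.
txt_has_text <- function(txt) {
  if (!file.exists(txt)) return(FALSE)
  lines <- readLines(txt, warn = FALSE)
  # [[:space:]] also matches form feeds (\f), which pdftotext writes between pages.
  any(nzchar(gsub("[[:space:]]+", "", lines)))
}
```

For example, `if (txt_has_text("C:/Users/FCG/Desktop/NoteF7000.txt")) mycorpus <- Corpus(URISource("C:/Users/FCG/Desktop/NoteF7000.txt"))` only builds the corpus when there is something to read.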

And have a look to confirm that it went well:

inspect(mycorpus)

A corpus with 1 text document

The metadata consists of 2 tag-value pairs and a data frame
Available tags are:
  create_date creator 
Available variables in the data frame are:
  MetaID 

[[1]]
Market Notice
Number: Date F7001 08 May 2013

New IDX SSF (EWJG) The following new IDX SSF contract will be added to the list and will be available for trade today.

Summary Contract Specifications Contract Code Underlying Instrument Bloomberg Code ISIN Code EWJG EWJG IShares MSCI Japan Index Fund (US) EWJ US EQUITY US4642868487 1 (R1 per point)

Contract Size / Nominal

Expiry Dates & Times

10am New York Time; 14 Jun 2013 / 16 Sep 2013

Underlying Currency Quotations Minimum Price Movement (ZAR) Underlying Reference Price

USD/ZAR Bloomberg Code (USDZAR Currency) Price per underlying share to two decimals. R0.01 (0.01 in the share price)

4pm underlying spot level as captured by the JSE.

Currency Reference Price

The same method as the one utilized for the expiry of standard currency futures on standard quarterly SAFEX expiry dates.

JSE Limited Registration Number: 2005/022939/06 One Exchange Square, Gwen Lane, Sandown, South Africa. Private Bag X991174, Sandton, 2146, South Africa. Telephone: +27 11 520 7000, Facsimile: +27 11 520 8584, www.jse.co.za

Executive Director: NF Newton-King (CEO), A Takoordeen (CFO) Non-Executive Directors: HJ Borkum (Chairman), AD Botha, MR Johnston, DM Lawrence, A Mazwai, Dr. MA Matooane , NP Mnxasana, NS Nematswerani, N Nyembezi-Heita, N Payne Alternate Directors: JH Burke, LV Parsons

Member of the World Federation of Exchanges

Company Secretary: GC Clarke
Settlement Method

Cash Settled

-

Clearing House Fees -

On-screen IDX Futures Trading: o 1 BP for Taker (Aggressor) o Zero Booking Fees for Maker (Passive) o No Cap o Floor of 0.01 Reported IDX Futures Trades o 1.75 BP for both buyer and seller o No Cap o Floor of 0.01

Initial Margin Class Spread Margin V.S.R. Expiry Date

R 10.00 R 5.00 3.5 14/06/2013, 16/09/2013

The above instrument has been designated as "Foreign" by the South African Reserve Bank

Should you have any queries regarding IDX Single Stock Futures, please contact the IDX team on 011 520-7399 or idx@jse.co.za

Graham Smale Director: Bonds and Financial Derivatives Tel: +27 11 520 7831 Fax:+27 11 520 8831 E-mail: grahams@jse.co.za

Distributed by the Company Secretariat +27 11 520 7346

Page 2 of 2


...