Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Google Apps Script - Error getting PDF attachments from Gmail as text

I need help troubleshooting this code. It throws an error that I can't find documented anywhere.

The required APIs are enabled, and I followed the instructions in the original post: Get pdf-attachments from Gmail as text

The error is raised on this line:

  var blob = attachments[0].getAs(MimeType.PDF);

Converting from application/octet-stream to application/pdf is not supported. (line 16, file "bla bla")

The code from the original post is unaltered:

 /**
  * Get messages labeled 'templabel', and send myself the text contents of
  * pdf attachments in new emails.
  */
 function myFunction() {

   var threads = GmailApp.search('label:invoices-parsed');
   var threadsMessages = GmailApp.getMessagesForThreads(threads);

   for (var thread = 0; thread < threadsMessages.length; ++thread) {
     var message = threadsMessages[thread][0];
     var messageBody = message.getBody();
     var messageSubject = message.getSubject();
     var attachments = message.getAttachments();

     var blob = attachments[0].getAs(MimeType.PDF);
     var filetext = pdfToText( blob, {keepTextfile: false} );

     GmailApp.sendEmail(Session.getActiveUser().getEmail(), messageSubject, filetext);
   }
 }

 /**
  * See gist: https://gist.github.com/mogsdad/e6795e438615d252584f
  *
  * Convert pdf file (blob) to a text file on Drive, using built-in OCR.
  * By default, the text file will be placed in the root folder, with the same
  * name as source pdf (but extension 'txt'). Options:
  *   keepPdf (boolean, default false)     Keep a copy of the original PDF file.
  *   keepGdoc (boolean, default false)    Keep a copy of the OCR Google Doc file.
  *   keepTextfile (boolean, default true) Keep a copy of the text file.
  *   path (string, default blank)         Folder path to store file(s) in.
  *   ocrLanguage (ISO 639-1 code)         Default 'en'.
  *   textResult (boolean, default false)  If true and keepTextfile true, return
  *                                        string of text content. If keepTextfile
  *                                        is false, text content is returned without
  *                                        regard to this option. Otherwise, return
  *                                        id of textfile.
  *
  * @param {blob}   pdfFile    Blob containing pdf file
  * @param {object} options    (Optional) Object specifying handling details
  *
  * @returns {string}          id of text file (default) or text content
  */
 function pdfToText ( pdfFile, options ) {
   // Ensure Advanced Drive Service is enabled
   try {
     Drive.Files.list();
   }
   catch (e) {
     throw new Error( "To use pdfToText(), first enable 'Drive API' in Resources > Advanced Google Services." );
   }

   // Set default options
   options = options || {};
   options.keepTextfile = options.hasOwnProperty("keepTextfile") ? options.keepTextfile : true;

   // Prepare resource object for file creation
   var parents = [];
   if (options.path) {
     parents.push( getDriveFolderFromPath (options.path) );
   }
   var pdfName = pdfFile.getName();
   var resource = {
     title: pdfName,
     mimeType: pdfFile.getContentType(),
     parents: parents
   };

   // Save PDF to Drive, if requested
   if (options.keepPdf) {
     var file = Drive.Files.insert(resource, pdfFile);
   }

   // Save PDF as GDOC
   resource.title = pdfName.replace(/pdf$/, 'gdoc');
   var insertOpts = {
     ocr: true,
     ocrLanguage: options.ocrLanguage || 'en'
   }
   var gdocFile = Drive.Files.insert(resource, pdfFile, insertOpts);

   // Get text from GDOC  
   var gdocDoc = DocumentApp.openById(gdocFile.id);
   var text = gdocDoc.getBody().getText();

   // We're done using the Gdoc. Unless requested to keepGdoc, delete it.
   if (!options.keepGdoc) {
     Drive.Files.remove(gdocFile.id);
   }

   // Save text file, if requested
   if (options.keepTextfile) {
     resource.title = pdfName.replace(/pdf$/, 'txt');
     resource.mimeType = MimeType.PLAIN_TEXT;

     var textBlob = Utilities.newBlob(text, MimeType.PLAIN_TEXT, resource.title);
     var textFile = Drive.Files.insert(resource, textBlob);
   }

   // Return result of conversion
   if (!options.keepTextfile || options.textResult) {
     return text;
   }
   else {
     return textFile.id
   }
 }

 // From: http://ramblings.mcpher.com/Home/excelquirks/gooscript/driveapppathfolder
 function getDriveFolderFromPath (path) {
   return (path || "/").split("/").reduce ( function(prev,current) {
     if (prev && current) {
       var fldrs = prev.getFoldersByName(current);
       return fldrs.hasNext() ? fldrs.next() : null;
     }
     else { 
       return current ? null : prev; 
     }
   },DriveApp.getRootFolder()); 
 }

1 Reply

Try setting the content type of the attachment to "application/pdf".

var attachments = message.getAttachments();
attachments[0].setContentType("application/pdf");
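
A slightly more defensive variant, as a sketch only: the error suggests Gmail reported the attachment as application/octet-stream, so the helper below relabels a copy of the attachment as a PDF only when the file name ends in ".pdf". The helper name asPdfBlob and the file-name check are assumptions made for illustration and are not part of the original code; the calls used (getContentType, getName, copyBlob, setContentType) are standard Blob methods that Gmail attachments also expose.

```javascript
/**
 * Hypothetical helper (not from the original post): return a copy of the
 * attachment labeled as application/pdf, or null if it does not look like
 * a PDF. Assumes a mislabeled PDF still has a file name ending in ".pdf".
 */
function asPdfBlob(attachment) {
  var PDF_TYPE = 'application/pdf'; // same string as MimeType.PDF
  var type = attachment.getContentType();
  var name = attachment.getName() || '';

  // Accept real PDFs, plus generic binary attachments named *.pdf.
  var looksLikePdf = type === PDF_TYPE ||
      (type === 'application/octet-stream' && /\.pdf$/i.test(name));
  if (!looksLikePdf) {
    return null;
  }

  // Work on a copy so the original attachment object is left untouched.
  var blob = attachment.copyBlob();
  blob.setContentType(PDF_TYPE);
  return blob;
}
```

In myFunction, the line that calls getAs(MimeType.PDF) could then become var blob = asPdfBlob(attachments[0]); with a check that skips the message when the result is null. The returned blob is passed to pdfToText as before.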
