php - Weird characters when filling PDF with PDFTk

深蓝 · Answer 1 · 2021-10-23T20:04:48+0000

You're right, utf8_decode() will work for characters which can be encoded as ISO-8859-1 (Latin-1, i.e. U+0000–U+00FF).

However, it won't work for characters outside that range.
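To decide up front whether a value fits in that single-byte range, a check along these lines can help. This is only a sketch: fitsSingleByte() is a made-up name, and it assumes the input is valid UTF-8.

```php
<?php
// Hypothetical helper: true if every character is in U+0000–U+00FF,
// i.e. the range a single-byte conversion such as utf8_decode() can represent.
function fitsSingleByte(string $utf8): bool
{
    // The /u flag makes PCRE read the subject as UTF-8 code points.
    // Invalid UTF-8 makes preg_match() return false, which also yields false here.
    return preg_match('/\A[\x{00}-\x{FF}]*\z/u', $utf8) === 1;
}

var_dump(fitsSingleByte("\u{F6}zil"));  // ö is U+00F6  -> bool(true)
var_dump(fitsSingleByte("\u{141}odz")); // Ł is U+0141 -> bool(false)
```

Values that fail the check need one of the UTF-16 forms.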

You can always encode characters using UTF-16BE, though. You can do this for a single field only, e.g. to encode the word "özil":

<<
/V (þÿ^@ö^@z^@i^@l)
/T (name)
>>

(Here the "^@" indicates a NUL character (U+0000). This is how it looks in my editor (vim), if the file is encoded in Windows-1252 (latin1).)

Note that you need to use a byte order mark (which will appear as "þÿ" if your file is encoded in Windows-1252) and you'll need to encode the entire string (between the two parentheses) in UTF-16BE.

If you're generating the FDF in a PHP script you can do something like this:

<<
/V (<?php echo chr(0xfe) . chr(0xff) . str_replace(array('\\', '(', ')'), array('\\\\', '\\(', '\\)'), mb_convert_encoding("özil", 'UTF-16BE', 'UTF-8')); ?>)
/T (name)
>>
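The same idea can be wrapped in a function. This is a sketch rather than a tested pdftk recipe: fdfLiteralString() is a made-up name, and it assumes the mbstring extension and UTF-8 input. It also escapes CR and LF bytes, since a PDF reader treats an unescaped end-of-line inside a literal string as a line feed, and a byte such as 0x0D can turn up inside a UTF-16 character (e.g. U+010D).

```php
<?php
// Hypothetical helper: builds a PDF literal string "(...)" for an FDF /V entry.
function fdfLiteralString(string $utf8): string
{
    // Byte order mark (FE FF) followed by the text as UTF-16BE bytes.
    $bytes = "\xFE\xFF" . mb_convert_encoding($utf8, 'UTF-16BE', 'UTF-8');

    // Escape at the byte level. strtr() makes a single pass, so the
    // backslashes it inserts are not escaped a second time.
    $escaped = strtr($bytes, [
        "\\" => "\\\\",
        "("  => "\\(",
        ")"  => "\\)",
        "\r" => "\\r",
        "\n" => "\\n",
    ]);

    return '(' . $escaped . ')';
}

// Dump the bytes as hex so the NUL bytes are visible.
echo bin2hex(fdfLiteralString("\u{F6}zil")), "\n"; // 28feff00f6007a0069006c29
```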

You can also write out the hex codes like this (i.e. enclosed in angle brackets rather than parentheses):

<<
/V <FEFF00F6007A0069006C>
/T (name)
>>

This has exactly the same result (the string "özil"). It's less compact, but it actually seems to be more reliable in pdftk, which has some bugs I've found (in version 2.02).
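The hex form is also easy to generate, since nothing needs escaping. A minimal sketch, assuming the mbstring extension and UTF-8 input; fdfHexString() and fdfField() are made-up names for illustration, and the field name is assumed to be plain ASCII without parentheses or backslashes:

```php
<?php
// Hypothetical helper: PDF hex string "<FEFF...>" for a UTF-8 input.
function fdfHexString(string $utf8): string
{
    $utf16 = mb_convert_encoding($utf8, 'UTF-16BE', 'UTF-8');
    // FEFF is the byte order mark; the rest is the UTF-16BE bytes in hex.
    return '<FEFF' . strtoupper(bin2hex($utf16)) . '>';
}

// Hypothetical helper: one field dictionary in the layout used above.
// Assumes $name is plain ASCII with no "(", ")" or "\".
function fdfField(string $name, string $value): string
{
    return "<<\n/V " . fdfHexString($value) . "\n/T (" . $name . ")\n>>\n";
}

echo fdfField('name', "\u{F6}zil");
```

For the word "özil" this prints the same dictionary as the hand-written one, with /V <FEFF00F6007A0069006C>.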

Finally, you can also write out the Unicode code point for any character in octal notation (\ddd). For example, ö has code point U+00F6, which in octal is 366, so you can write:

<<
/V (\366zil)
/T (name)
>>

However, this only works up to U+00FF (octal 377). Beyond that, you'd have to use UTF-16.
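Generating the octal form could look like the sketch below. fdfOctalString() is a made-up name; it assumes UTF-8 input, PHP 7.4+ with mbstring, and that code points up to U+00FF map directly to byte values as described above. Code points in the U+0080–U+00A0 range may not map one-to-one in PDFDocEncoding, so treat those as untested.

```php
<?php
// Hypothetical helper: PDF literal string with octal escapes (\ddd).
// Returns null if any character is above U+00FF, which needs UTF-16 instead.
function fdfOctalString(string $utf8): ?string
{
    $out = '';
    foreach (mb_str_split($utf8, 1, 'UTF-8') as $char) {
        $code = mb_ord($char, 'UTF-8');
        if ($code > 0xFF) {
            return null;
        }
        // Escape everything outside printable ASCII, plus the three
        // characters that are special inside a literal string.
        $special = $char === '\\' || $char === '(' || $char === ')';
        if ($code < 0x20 || $code > 0x7E || $special) {
            $out .= sprintf('\\%03o', $code);
        } else {
            $out .= $char;
        }
    }
    return '(' . $out . ')';
}

echo fdfOctalString("\u{F6}zil"), "\n"; // (\366zil)
```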

The PDF standard allows you to set the encoding to UTF-8 for the whole FDF document. I tried this and it didn't work with pdftk, but in theory it would be done like this:

%FDF-1.2
1 0 obj
<<
/Version /1.3
/Encoding /utf_8
/FDF

(You would presumably have to set the FDF version to 1.3 (or more) in the header too, according to the standard.)

You can also do this at the field level:

<<
/V (özil)
/T (name)
/Encoding /utf_8
>>

But as I said, I didn't manage to get any of this to work. pdftk just seems to ignore it.
