Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

python - Message digest of PDF in digital signature

I want to manually verify the integrity of a signed PDF. So far I have managed the following:

  • I got the value of the /Contents entry from the PDF (using PyPDF2). This is a DER-encoded PKCS#7 certificate.

According to the PDF specification, the message digest of the PDF data is stored along with the certificate in the /Contents entry. I have tried a lot, but I cannot get at the digest value, which I would eventually compare with the hash of the PDF content specified by /ByteRange.
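A minimal sketch of the hashing step described here, assuming the whole file has already been read into memory and the /ByteRange integers have already been read from the signature dictionary (e.g. via PyPDF2; that part is not shown). The function name `digest_signed_ranges` is hypothetical, and SHA-256 is only a default: the algorithm has to match the digestAlgorithm recorded in the signature.

```python
import hashlib


def digest_signed_ranges(pdf: bytes, byte_range: list[int], algorithm: str = "sha256") -> bytes:
    """Hash the parts of the file covered by a /ByteRange array [off1 len1 off2 len2 ...].

    The gap between the ranges is the /Contents hex string (including its
    angle brackets); it is excluded simply because /ByteRange does not cover it.
    """
    # Assumption: byte_range was taken from the signature dictionary beforehand.
    if len(byte_range) % 2:
        raise ValueError("/ByteRange must contain offset/length pairs")
    h = hashlib.new(algorithm)
    for offset, length in zip(byte_range[0::2], byte_range[1::2]):
        h.update(pdf[offset:offset + length])
    return h.digest()


# Toy example: "<3082>" stands in for the /Contents hex string.
pdf = b"AAAA<3082>BBBB"
print(digest_signed_ranges(pdf, [0, 4, 10, 4]) == hashlib.sha256(b"AAAABBBB").digest())  # True
```

For a real file you would pass `open(path, "rb").read()` and the actual /ByteRange values, then compare the result with the digest found inside the signature container.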

  • PDF specification snapshot: snap

I don't understand the last part, which says to write the signature object data into the dictionary. Where does this write actually happen, and how can I extract the message digest?


1 Reply


(This is more of a comment than an answer. Due to the size and formatting restrictions of comments, I am posting it as an answer nonetheless.)

A signature in a PDF

In a prior question, the OP already posted a sketch illustrating a signature embedded in a PDF for the SubFilter values ETSI.CAdES.detached, adbe.pkcs7.detached, or adbe.pkcs7.sha1:

Figure 3 Digital ID and a signed PDF document

But this is merely a sketch, and interpreting it too literally may give the incorrect impression that the value of the Contents entry in the signature dictionary is something like a list containing a "Certificate", a "Signed message digest" and a "Timestamp". Furthermore, calling this list the "Signature value" can be confusing, as that name is also used for a small part of the content (see below).

The actual content is specified (cf. this document) as:

When PKCS#7 signatures are used, the value of Contents shall be a DER-encoded PKCS#7 binary data object containing the signature. The PKCS#7 object shall conform to RFC3852 Cryptographic Message Syntax.

(As an aside: While the specification here requires the data object to be DER-encoded, there are many signed PDFs in the wild that use the much less strict BER encoding for the object as a whole and DER only for those parts that RFC 3852 also requires to be DER-encoded.)
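The /Contents string is a placeholder of fixed size reserved before signing, so the DER object inside it is usually followed by zero padding. A minimal sketch, assuming a definite-length outer SEQUENCE, that uses the DER length header to cut the padding off. The name `trim_contents` is hypothetical, and the function deliberately refuses BER indefinite-length encoding (length byte 0x80), one of the BER variants that, as noted above, can occur in practice.

```python
def trim_contents(contents: bytes) -> bytes:
    """Return the DER signature container from a zero-padded /Contents value."""
    # Assumption: the outer element is a definite-length SEQUENCE (tag 0x30).
    if len(contents) < 2 or contents[0] != 0x30:
        raise ValueError("expected an outer SEQUENCE (tag 0x30)")
    length_byte = contents[1]
    if length_byte < 0x80:
        # Short form: the length fits into the low seven bits.
        header_len, body_len = 2, length_byte
    elif length_byte == 0x80:
        # BER indefinite length: the end is marked by 00 00 and needs a full parser.
        raise ValueError("indefinite-length BER encoding is not handled in this sketch")
    else:
        # Long form: the low seven bits give the number of following length bytes.
        num = length_byte & 0x7F
        header_len = 2 + num
        body_len = int.from_bytes(contents[2:header_len], "big")
    end = header_len + body_len
    if end > len(contents):
        raise ValueError("declared length exceeds the available data")
    return contents[:end]


# If you only have the hex digits from the file, bytes.fromhex converts them first.
padded = bytes.fromhex("3003020105") + bytes(8)
print(trim_contents(padded).hex())  # 3003020105
```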

The PKCS#7 binary data object

More precisely, the PKCS#7 binary data object containing the signature, conforming to RFC 3852, is a ContentInfo object with SignedData content, often called a "signature container".

According to RFC 3852:

The CMS associates a content type identifier with a content. The syntax MUST have ASN.1 type ContentInfo:

  ContentInfo ::= SEQUENCE {
    contentType ContentType,
    content [0] EXPLICIT ANY DEFINED BY contentType }

The signed-data content type shall have ASN.1 type SignedData:

  SignedData ::= SEQUENCE {
    version CMSVersion,
    digestAlgorithms DigestAlgorithmIdentifiers,
    encapContentInfo EncapsulatedContentInfo,
    certificates [0] IMPLICIT CertificateSet OPTIONAL,
    crls [1] IMPLICIT RevocationInfoChoices OPTIONAL,
    signerInfos SignerInfos }

Here you see the optional collection certificates, which usually contains at least the signer certificate and often also its chain of issuer certificates. This is where the "Certificate" from the sketch above is located.
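To make the structures above concrete, a minimal pure-Python sketch that walks a signature container (already stripped of padding) from ContentInfo into SignedData and returns the raw DER of the embedded certificates and of each SignerInfo. It assumes definite-length DER with single-byte tags and uses no ASN.1 library; the helper names `element`, `children` and `parse_signed_data` are hypothetical. A dedicated ASN.1 library would be more robust in practice.

```python
# Sketch only: definite-length DER, single-byte tags, no ASN.1 library.
SIGNED_DATA_OID = bytes.fromhex("2a864886f70d010702")  # 1.2.840.113549.1.7.2 (id-signedData)


def element(data: bytes, offset: int) -> tuple[int, int, int, int]:
    """Return (tag, element_start, value_start, value_end) for the DER element at offset."""
    tag, length, pos = data[offset], data[offset + 1], offset + 2
    if length == 0x80:
        raise ValueError("indefinite-length BER encoding is not handled in this sketch")
    if length > 0x80:
        num = length & 0x7F
        length = int.from_bytes(data[pos:pos + num], "big")
        pos += num
    return tag, offset, pos, pos + length


def children(data: bytes, start: int, end: int) -> list[tuple[int, int, int, int]]:
    """Split data[start:end] into consecutive DER elements."""
    items, pos = [], start
    while pos < end:
        items.append(element(data, pos))
        pos = items[-1][3]
    return items


def parse_signed_data(der: bytes) -> tuple[list[bytes], list[bytes]]:
    """Return (certificates, signer_infos) as lists of raw DER elements."""
    tag, _, start, end = element(der, 0)  # ContentInfo ::= SEQUENCE
    if tag != 0x30:
        raise ValueError("ContentInfo must be a SEQUENCE")
    content_type, content = children(der, start, end)[:2]
    if der[content_type[2]:content_type[3]] != SIGNED_DATA_OID:
        raise ValueError("contentType is not id-signedData")
    # content [0] EXPLICIT wraps the SignedData SEQUENCE.
    _, _, sd_start, sd_end = element(der, content[2])
    fields = children(der, sd_start, sd_end)
    certificates = []
    for tag, _, value_start, value_end in fields:
        if tag == 0xA0:  # certificates [0] IMPLICIT CertificateSet
            certificates = [der[s:e] for _, s, _, e in children(der, value_start, value_end)]
    # signerInfos is the last field of SignedData.
    _, _, si_start, si_end = fields[-1]
    signer_infos = [der[s:e] for _, s, _, e in children(der, si_start, si_end)]
    return certificates, signer_infos
```

Usage would be `certs, signer_infos = parse_signed_data(der)`, where `der` is the /Contents value with any trailing padding removed.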

You also see the signerInfos structure, which contains the actual signing information:

  SignerInfos ::= SET OF SignerInfo

Per-signer information is represented in the type SignerInfo:

  SignerInfo ::= SEQUENCE {
    version CMSVersion,
    sid SignerIdentifier,
    digestAlgorithm DigestAlgorithmIdentifier,
    signedAttrs [0] IMPLICIT SignedAttributes OPTIONAL,
    signatureAlgorithm SignatureAlgorithmIdentifier,
    signature SignatureValue,
    unsignedAttrs [1] IMPLICIT UnsignedAttributes OPTIONAL }
  SignedAttributes ::= SET SIZE (1..MAX) OF Attribute
  Attribute ::= SEQUENCE {
    attrType OBJECT IDENTIFIER,
    attrValues SET OF AttributeValue }
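A minimal pure-Python sketch (definite-length DER only, single-byte tags, hypothetical helper names) that splits one SignerInfo into the parts relevant here: the digestAlgorithm OID, the content of the optional signedAttrs [0] field, and the SignatureValue octets. It relies on the field order shown above: signedAttrs, if present, is the fourth field and carries the constructed context tag 0xA0, and signature is the only OCTET STRING (tag 0x04) at the top level of SignerInfo.

```python
from typing import Optional


# Sketch only: definite-length DER, single-byte tags, no ASN.1 library.
def element(data: bytes, offset: int) -> tuple[int, int, int, int]:
    """Return (tag, element_start, value_start, value_end) for the DER element at offset."""
    tag, length, pos = data[offset], data[offset + 1], offset + 2
    if length == 0x80:
        raise ValueError("indefinite-length BER encoding is not handled in this sketch")
    if length > 0x80:
        num = length & 0x7F
        length = int.from_bytes(data[pos:pos + num], "big")
        pos += num
    return tag, offset, pos, pos + length


def children(data: bytes, start: int, end: int) -> list[tuple[int, int, int, int]]:
    """Split data[start:end] into consecutive DER elements."""
    items, pos = [], start
    while pos < end:
        items.append(element(data, pos))
        pos = items[-1][3]
    return items


def split_signer_info(der: bytes) -> tuple[bytes, Optional[bytes], bytes]:
    """Return (digest algorithm OID bytes, signedAttrs content or None, signature bytes)."""
    tag, _, start, end = element(der, 0)  # SignerInfo ::= SEQUENCE
    if tag != 0x30:
        raise ValueError("SignerInfo must be a SEQUENCE")
    fields = children(der, start, end)
    # fields[0] = version, fields[1] = sid, fields[2] = digestAlgorithm
    _, _, alg_start, alg_end = fields[2]
    oid = children(der, alg_start, alg_end)[0]  # AlgorithmIdentifier starts with the OID
    signed_attrs = None
    if fields[3][0] == 0xA0:  # signedAttrs [0] IMPLICIT SET OF Attribute
        signed_attrs = der[fields[3][2]:fields[3][3]]
    signature = next(der[f[2]:f[3]] for f in fields if f[0] == 0x04)
    return der[oid[2]:oid[3]], signed_attrs, signature
```

The returned signedAttrs bytes are the concatenated Attribute SEQUENCEs; if the value is None, you are in the first of the two cases described below.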

(Here you see the structure the RFCs call SignatureValue. As already mentioned, the sketch above calling the whole signature container "Signature value" can be confusing, because down here there already is an entity of a type with that name.)

You are after the message digest of the signed PDF byte ranges for an adbe.pkcs7.detached type PDF signature. There are actually two possibilities:

  • In the rare case of the simplest SignerInfo instances, there are no SignedAttributes. In this case the SignatureValue is the result of a signature algorithm applied directly to the signed byte ranges.

If the signature algorithm is based on RSA (with the common PKCS#1 v1.5 padding), you can retrieve the document digest value by applying the signer's public key (from the signer's certificate) to the signature value and extracting the digest from the resulting DigestInfo object.

    DigestInfo ::= SEQUENCE {
      digestAlgorithm DigestAlgorithmIdentifier,
      digest Digest }

If the signature algorithm is based on DSA or ECDSA, you cannot retrieve the digest value at all. These algorithms only allow you to check whether a digest value you provide (e.g. one computed by hashing the signed byte ranges of the document as you retrieved it) is the one originally signed.

  • In the far more common case of SignerInfo instances with SignedAttributes, you have to search these SignedAttributes for the message digest attribute, which is identified by

    id-messageDigest OBJECT IDENTIFIER ::= { iso(1) member-body(2)
        us(840) rsadsi(113549) pkcs(1) pkcs9(9) 4 }
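The two possibilities above can be sketched in Python as follows, as a minimal, library-free illustration under stated assumptions. The RSA branch assumes PKCS#1 v1.5 padding (RSASSA-PSS signatures contain no DigestInfo) and that the modulus `n` and exponent `e` have already been extracted from the signer certificate (not shown). The attribute branch expects the content of the signedAttrs field, i.e. the concatenated Attribute SEQUENCEs. All function names are hypothetical, and the padding checks are only as strict as extraction needs, not as strict as a real verifier must be.

```python
# Sketch only: definite-length DER, single-byte tags, no ASN.1 library.
MESSAGE_DIGEST_OID = bytes.fromhex("2a864886f70d010904")  # 1.2.840.113549.1.9.4


def element(data: bytes, offset: int) -> tuple[int, int, int, int]:
    """Return (tag, element_start, value_start, value_end) for the DER element at offset."""
    tag, length, pos = data[offset], data[offset + 1], offset + 2
    if length == 0x80:
        raise ValueError("indefinite-length BER encoding is not handled in this sketch")
    if length > 0x80:
        num = length & 0x7F
        length = int.from_bytes(data[pos:pos + num], "big")
        pos += num
    return tag, offset, pos, pos + length


def children(data: bytes, start: int, end: int) -> list[tuple[int, int, int, int]]:
    """Split data[start:end] into consecutive DER elements."""
    items, pos = [], start
    while pos < end:
        items.append(element(data, pos))
        pos = items[-1][3]
    return items


def digest_from_rsa_signature(signature: bytes, n: int, e: int) -> bytes:
    """Case 1 (no signedAttrs, RSA with PKCS#1 v1.5 padding): recover the DigestInfo digest."""
    # Assumption: n and e were taken from the signer certificate beforehand.
    k = (n.bit_length() + 7) // 8
    em = pow(int.from_bytes(signature, "big"), e, n).to_bytes(k, "big")
    # Expected encoding: 00 01 FF .. FF 00 || DigestInfo
    if em[:2] != b"\x00\x01":
        raise ValueError("not PKCS#1 v1.5 signature padding")
    sep = em.index(b"\x00", 2)
    if any(b != 0xFF for b in em[2:sep]):
        raise ValueError("malformed padding")
    digest_info = em[sep + 1:]
    _, _, start, end = element(digest_info, 0)  # DigestInfo ::= SEQUENCE
    _algorithm, digest = children(digest_info, start, end)
    return digest_info[digest[2]:digest[3]]


def find_message_digest(signed_attrs: bytes) -> bytes:
    """Case 2 (signedAttrs present): return the value of the messageDigest attribute."""
    for _, _, start, end in children(signed_attrs, 0, len(signed_attrs)):
        attr_type, attr_values = children(signed_attrs, start, end)
        if signed_attrs[attr_type[2]:attr_type[3]] == MESSAGE_DIGEST_OID:
            # attrValues is a SET OF; messageDigest carries a single OCTET STRING.
            value = children(signed_attrs, attr_values[2], attr_values[3])[0]
            return signed_attrs[value[2]:value[3]]
    raise LookupError("no messageDigest attribute found")
```

Either result can then be compared with your own hash of the /ByteRange content, computed with the algorithm named in the SignerInfo digestAlgorithm.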

As already mentioned in the comments, though, I cannot explain how to drill down into this using Python or OpenSSL. You will need a tool that knows these specific ASN.1 structures, or ASN.1 structures in general.
