Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

ios - Swift Calculate MD5 Checksum for Large Files

I'm working on computing MD5 checksums for large video files. I'm currently using this code:

import Foundation
import CommonCrypto

extension NSData {
    func MD5() -> NSString {
        let digestLength = Int(CC_MD5_DIGEST_LENGTH)
        let md5Buffer = UnsafeMutablePointer<CUnsignedChar>.allocate(capacity: digestLength)
        defer { md5Buffer.deallocate() } // free the digest buffer when the method returns

        // Hash the entire contents of the data object in a single call:
        CC_MD5(bytes, CC_LONG(length), md5Buffer)

        // Hex-encode the 16-byte digest:
        let output = NSMutableString(capacity: digestLength * 2)
        for i in 0..<digestLength {
            output.appendFormat("%02x", md5Buffer[i])
        }
        return output
    }
}

But that requires the entire file to be held in memory as an NSData buffer, which is not ideal for large video files. Is there a way in Swift to calculate the MD5 checksum by reading the file as a stream, so that the memory footprint stays minimal?

You can compute the MD5 checksum in chunks, as demonstrated, for example, in "Is there a MD5 library that doesn't require the whole input at the same time?".

Here is a possible implementation in Swift (updated for Swift 5):

import Foundation
import CommonCrypto // CC_MD5_* is deprecated (but still available) as of iOS 13 / macOS 10.15

func md5File(url: URL) -> Data? {

    let bufferSize = 1024 * 1024

    do {
        // Open file for reading:
        let file = try FileHandle(forReadingFrom: url)
        defer {
            file.closeFile()
        }

        // Create and initialize MD5 context:
        var context = CC_MD5_CTX()
        CC_MD5_Init(&context)

        // Read up to `bufferSize` bytes, until EOF is reached, and update MD5 context:
        while autoreleasepool(invoking: {
            let data = file.readData(ofLength: bufferSize)
            if data.count > 0 {
                data.withUnsafeBytes {
                    _ = CC_MD5_Update(&context, $0.baseAddress, numericCast(data.count))
                }
                return true // Continue
            } else {
                return false // End of file
            }
        }) { }

        // Compute the MD5 digest:
        var digest: [UInt8] = Array(repeating: 0, count: Int(CC_MD5_DIGEST_LENGTH))
        _ = CC_MD5_Final(&digest, &context)

        return Data(digest)

    } catch {
        print("Cannot open file:", error.localizedDescription)
        return nil
    }
}

The autorelease pool is needed to release the memory returned by file.readData() after each iteration; without it, the chunks would accumulate and the entire (potentially huge) file would end up in memory. Thanks to Abhi Beckert for noticing that and providing an implementation.
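The read loop in md5File(url:) can also be pulled out into a reusable helper that works with any incremental computation, not just MD5. This is a sketch under stated assumptions, not part of the original answer: forEachChunk(of:chunkSize:_:) is a hypothetical name, and the autoreleasepool stand-in is an assumed shim so the snippet also compiles on platforms without the Objective-C runtime (such as Linux), where there are no autorelease pools to drain.

```swift
import Foundation

#if !canImport(ObjectiveC)
// Assumed stand-in for platforms without the Objective-C runtime (e.g. Linux):
// there are no autorelease pools there, so the closure is simply run directly.
func autoreleasepool<Result>(invoking body: () throws -> Result) rethrows -> Result {
    return try body()
}
#endif

/// Hypothetical helper (not a Foundation API): reads the file at `url` in chunks of
/// at most `chunkSize` bytes and passes each chunk to `body`, so only one chunk has
/// to be held in memory at a time. Throws if the file cannot be opened.
func forEachChunk(of url: URL, chunkSize: Int = 1024 * 1024, _ body: (Data) -> Void) throws {
    let file = try FileHandle(forReadingFrom: url)
    defer { file.closeFile() }

    var reachedEOF = false
    while !reachedEOF {
        // Read and process one chunk inside its own autorelease pool, so the chunk's
        // memory is released before the next read instead of piling up.
        reachedEOF = autoreleasepool {
            let chunk = file.readData(ofLength: chunkSize)
            if chunk.isEmpty {
                return true   // end of file
            }
            body(chunk)
            return false      // keep reading
        }
    }
}
```

Inside the closure you would call CC_MD5_Update(&context, ...) on each chunk exactly as md5File(url:) does, then call CC_MD5_Final once forEachChunk returns.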

If you need the digest as a hex-encoded string, change the return type to String? and replace

return Data(digest)

with

let hexDigest = digest.map { String(format: "%02hhx", $0) }.joined()
return hexDigest
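String(format:) is a Foundation API that parses a C-style format string for every byte. If you would rather stay within the Swift standard library, a minimal sketch of the same lowercase hex encoding could look like this; hexEncodedString(_:) is a hypothetical helper name, not an existing API.

```swift
/// Hypothetical helper: lowercase, zero-padded hex encoding of a byte sequence
/// (for example an MD5 digest stored as [UInt8] or Data).
func hexEncodedString<Bytes: Sequence>(_ bytes: Bytes) -> String where Bytes.Element == UInt8 {
    var result = ""
    for byte in bytes {
        let hex = String(byte, radix: 16)           // "0" ... "ff", lowercase by default
        result += hex.count == 1 ? "0" + hex : hex  // pad single digits to two characters
    }
    return result
}
```

In md5File(url:) this would replace the map/joined line with return hexEncodedString(digest).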

...