Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


audio - How can I detect whether a WAV file has a 44 or 46-byte header?

I've discovered it is dangerous to assume that all PCM WAV audio files have 44 bytes of header data before the samples begin. Though this is common, many applications (ffmpeg, for example) will generate WAVs with a 46-byte header, and ignoring this fact while processing will result in a corrupt and unreadable file. But how can you detect how long the header actually is?

Obviously there is a way to do this, but I searched and found little discussion of it. A LOT of audio projects out there assume 44 (or conversely, 46) depending on the author's own context.


1 Reply


You should be checking all of the header data to see what the actual sizes are. Broadcast Wave Format files will contain an even larger extension subchunk. WAV and AIFF files from Pro Tools have even more extension chunks that are undocumented, as well as data after the audio. If you want to be sure where the sample data begins and ends, you need to actually look for the data chunk ('data' for WAV files and 'SSND' for AIFF).

As a review, all WAV subchunks conform to the following format:

    Subchunk Descriptor (4 bytes, ASCII)
    Subchunk Size (4-byte integer, little endian)
    Subchunk Data (length is Subchunk Size)
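As a rough illustration of that layout, here is a minimal sketch that decodes one 8-byte subchunk header from a byte array. The class name ChunkHeaderSketch, the ChunkHeader holder, and the parse method are hypothetical names made up for this example; only the 4-byte descriptor plus 4-byte little-endian size layout comes from the description above.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class ChunkHeaderSketch {
    /** Hypothetical holder for the two header fields. */
    static final class ChunkHeader {
        final String id;   // 4-character descriptor, e.g. "fmt " or "data"
        final long size;   // unsigned 32-bit size of the data that follows

        ChunkHeader(String id, long size) {
            this.id = id;
            this.size = size;
        }
    }

    /** Decodes the 8-byte subchunk header that starts at 'offset' in 'buf'. */
    static ChunkHeader parse(byte[] buf, int offset) {
        String id = new String(buf, offset, 4, StandardCharsets.US_ASCII);
        int raw = ByteBuffer.wrap(buf, offset + 4, 4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getInt();
        // The size field is unsigned, so widen it to a long.
        return new ChunkHeader(id, Integer.toUnsignedLong(raw));
    }

    public static void main(String[] args) {
        // A 'fmt ' header announcing 18 (0x12) bytes of data.
        byte[] header = { 'f', 'm', 't', ' ', 0x12, 0, 0, 0 };
        ChunkHeader h = parse(header, 0);
        System.out.println("'" + h.id + "' size=" + h.size);
    }
}
```

Widening the size to a long avoids a negative value if a file ever declares a chunk larger than 2 GiB.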

This is very easy to process. All you need to do is read the descriptor; if it's not the one you are looking for, read the data size and skip ahead to the next subchunk. A simple Java routine to do that would look like this:

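import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

//
// Quick note for people who don't know Java well:
// 'in.readFully(...)' throws EOFException when the stream
// ends before the buffer is filled, so catching that
// exception is how the loop detects the end of the file.
//
public static void printWaveDescriptors(File file)
        throws IOException {
    try (DataInputStream in = new DataInputStream(
            new BufferedInputStream(new FileInputStream(file)))) {
        byte[] bytes = new byte[4];

        // Read first 4 bytes.
        // (Should be RIFF descriptor.)
        in.readFully(bytes);
        printDescriptor(bytes);

        // First subchunk will always be at byte 12.
        // (There is no other dependable constant.)
        skipFully(in, 8);

        for (;;) {
            // Read each chunk descriptor.
            in.readFully(bytes);
            printDescriptor(bytes);

            // Read chunk length (little endian, unsigned).
            in.readFully(bytes);
            long length = Integer.toUnsignedLong(
                  Byte.toUnsignedInt(bytes[0])
                | Byte.toUnsignedInt(bytes[1]) << 8
                | Byte.toUnsignedInt(bytes[2]) << 16
                | Byte.toUnsignedInt(bytes[3]) << 24
            );

            // Skip the chunk data, plus the pad byte that
            // RIFF adds after any chunk with an odd length.
            // Next bytes should be another descriptor or EOF.
            skipFully(in, length + (length & 1));
        }
    } catch (EOFException e) {
        // Ran out of bytes: this is the normal way out of the loop.
    }

    System.out.println("End of file.");
}

// Skips exactly 'n' bytes, since a single 'skip' call
// is allowed to skip fewer bytes than requested.
private static void skipFully(DataInputStream in, long n)
        throws IOException {
    while (n > 0) {
        long skipped = in.skip(n);
        if (skipped > 0) {
            n -= skipped;
        } else if (in.read() >= 0) {
            n--;
        } else {
            throw new EOFException();
        }
    }
}

private static void printDescriptor(byte[] bytes) {
    String desc = new String(bytes, StandardCharsets.US_ASCII);
    System.out.println("Found '" + desc + "' descriptor.");
}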

For example, here is the output for a random WAV file I had:

Found 'RIFF' descriptor.
Found 'bext' descriptor.
Found 'fmt ' descriptor.
Found 'minf' descriptor.
Found 'elm1' descriptor.
Found 'data' descriptor.
Found 'regn' descriptor.
Found 'ovwf' descriptor.
Found 'umid' descriptor.
End of file.

Notably, here both 'fmt ' and 'data' legitimately appear in between other chunks, because Microsoft's RIFF specification does not fix the position of subchunks (as far as I know, the only ordering rule for WAVE files is that 'fmt ' must come before 'data'). Even some major audio systems that I know of get this wrong and don't account for it.

So if you want to find a certain chunk, loop through the file checking each descriptor until you find the one you're looking for.
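Applied to the original question, that loop can be sketched as a method that returns the byte offset at which a given chunk's data begins. This is only a sketch: WavChunkFinder and findChunkDataOffset are hypothetical names, and it assumes the file really is RIFF/WAVE (it does not check the 'RIFF' or 'WAVE' descriptors). For a file whose 16-byte 'fmt ' chunk is immediately followed by 'data', the result is 12 + 8 + 16 + 8 = 44; with an 18-byte 'fmt ' chunk it is 46; with extra chunks in front it is whatever the file says it is.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public class WavChunkFinder {
    /**
     * Returns the file offset of the first data byte of the first chunk
     * whose descriptor equals 'wantedId', or -1 if there is no such chunk.
     * Assumes a RIFF/WAVE file; the RIFF header itself is not validated.
     */
    static long findChunkDataOffset(File file, String wantedId)
            throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(file, "r")) {
            byte[] id = new byte[4];
            long fileLength = in.length();
            long pos = 12; // skip 'RIFF', the RIFF size, and 'WAVE'

            while (pos + 8 <= fileLength) {
                in.seek(pos);
                in.readFully(id);
                // readInt() is big endian, so reverse the bytes to get
                // the little-endian size, then treat it as unsigned.
                long size = Integer.toUnsignedLong(
                        Integer.reverseBytes(in.readInt()));

                if (wantedId.equals(
                        new String(id, StandardCharsets.US_ASCII))) {
                    return pos + 8;
                }

                // Move to the next chunk; an odd-sized chunk is
                // followed by one pad byte.
                pos += 8 + size + (size & 1);
            }
            return -1;
        }
    }

    public static void main(String[] args) throws IOException {
        long offset = findChunkDataOffset(new File(args[0]), "data");
        System.out.println(offset < 0
                ? "No 'data' chunk found."
                : "Samples begin at byte " + offset + ".");
    }
}
```

The 4 bytes just before the returned offset hold the chunk's size, so the same walk also tells you where the samples end, which matters for files that carry more chunks after the audio.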

