Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


ios - Gender detection of the speaker from wave data of the audio

I would like to add gender detection to a news video translator app I'm working on, so that the app can switch between a male and a female voice according to who is speaking on screen. I'm not expecting 100% accuracy. I used EZAudio to obtain waveform data for a period of audio and used the average RMS value to set a threshold (cutOff) between male and female. Initially cutOff = 3.3.

    - (void)setInitialVoiceGenderDetectionParameters:(NSArray *)arrayAudioDetails
    {
        // Average RMS value of the first stretch of audio (say 5 seconds)
        float initialMaleAvg = ((ConvertedTextDetails *)[arrayAudioDetails firstObject]).audioAverageRMS;
        // MaleVector adapts the threshold to the particular news clip
        float initialMaleVector = initialMaleAvg * 80;
        // Below 5.3 use the vector itself, above 23 use half of it,
        // otherwise fall back to 5.3
        if (initialMaleVector < 5.3f) {
            cutOff = initialMaleVector;
        } else if (initialMaleVector > 23.0f) {
            cutOff = initialMaleVector / 2;
        } else {
            cutOff = 5.3f;
        }
    }

Initially adjustValue = -0.9 and tanCutOff = 0.45. The values 5.3, 23, cutOff, adjustValue and tanCutOff were obtained through extensive testing. The tangent of the values is used to magnify the differences between them.

    - (BOOL)checkGenderWithPeekRMS:(float)pRMS andAverageRMS:(float)aRMS
    {
        // pRMS is the peak RMS value in the audio snippet and aRMS is the average RMS value
        BOOL male = NO;
        if (tan(pRMS) < tanCutOff)
        {
            if (pRMS / aRMS > cutOff)
            {
                cutOff = cutOff + adjustValue;
                NSLog(@"FEMALE....");
                male = NO;
            }
            else
            {
                NSLog(@"MALE....");
                male = YES;
                cutOff = cutOff - adjustValue;
            }
        }
        else
        {
            NSLog(@"FEMALE.");
            male = NO;
        }

        return male;
    }

The purpose of adjustValue is to recalibrate the threshold each time a news video is translated, since each video has a different noise level. But I know this method is naive. What can I do to create a stable threshold? Or how can I normalise each audio snippet?

Alternative or more efficient ways to determine gender from audio wave data are also welcome.

Edit: Following Nikolay's suggestion, I researched gender recognition using CMU Sphinx. Can anybody suggest how I can extract MFCC features and feed them into a GMM/SVM classifier using OpenEars (CMU Sphinx for the iOS platform)?


1 Reply


Accurate gender identification can be implemented with a GMM classifier trained on MFCC features. You can read about it here:

AGE AND GENDER RECOGNITION FOR TELEPHONE APPLICATIONS BASED ON GMM SUPERVECTORS AND SUPPORT VECTOR MACHINES

To date I am not aware of an open source implementation of this, though many of the components are available in open source speech recognition toolkits such as CMUSphinx.
