speech recognition - Identifying segments when a person is speaking?

Question

Welcome To Ask or Share your Answers For Others

speech recognition - Identifying segments when a person is speaking?

1 Reply

深蓝 · Answer 1 · 2021-10-17T02:51:20+0000

It's possible with the toolkit SHoUT: http://shout-toolkit.sourceforge.net/index.html

It's written in C++ and tested for Linux, but it should also run under Windows or OSX.

The toolkit was a by-product of my PhD research on automatic speech recognition (ASR). Using it for ASR itself is perhaps not that straightforward, but for Speech Activity Detection (SAD) and diarization (finding all speech of one specific person) it is quite easy to use. Here is an example:

Create a headerless pcm audio file of 16KHz, 16bits, little-endian, mono. I use ffmpeg to create the raw files: ffmpeg -i [INPUT_FILE] -vn -acodec pcm_s16le -ar 16000 -ac 1 -f s16le [RAW_FILE] Prefix the headerless data with little endian encoded file size (4 bytes). Be sure the file has .raw extension, as shout_cluster detects file type based on extension.
Perform speech/non-speech segmentation: ./shout_segment -a [RAW_FILE] -ams [SHOUT_SAD_MODEL] -mo [SAD_OUTPUT] The output file will provide you with segments in which someone is speaking (labeled with "SPEECH". Of course, because it is all done automatically, the system might make mistakes..), in which there is sound that is not speech ("SOUND"), or silence ("SILENCE").
Perform diarization: ./shout_cluster -a [RAW_FILE] -mo [DIARIZATION_OUTPUT] -mi [SAD_OUTPUT] Using the output of the shout_segment, it will try to determine how many speakers were active in the recording, label each speaker ("SPK01", "SPK02", etc) and then find all speech segments of each of the speakers.

I hope this will help!

Categories

speech recognition - Identifying segments when a person is speaking?

speech recognition - Identifying segments when a person is speaking?

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags