Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
436 views
in Technique[技术] by (71.8m points)

text to speech - best practice for specifying pronunciation for Android TTS engine?

In general, I'm very impressed with Android's default text to speech engine (i.e., com.svox.pico). As expected, it mispronounces some words (as do I) and it therefore occasionally needs some pronunciation guidance. So I'm wondering about best practices for phonetically spelling out those words that the pico TTS engine mispronounces.

For example, the correct pronunciation of the bird Chachalaca is CHAH-chah-LAH-kah. Here is what the TTS engine produces:

mTts.speak("Chachalaca", TextToSpeech.QUEUE_ADD, null); // output: chuh-KAL-uh-KUH
mTts.speak("CHAH-chah-LAH-kah", TextToSpeech.QUEUE_ADD, null); // output: CHAH-chah-EL-AY-AYCH-dash-kuh
mTts.speak("CHAHchahLAHkah", TextToSpeech.QUEUE_ADD, null); // output: CHA-chah-LAH-ka
mTts.speak("CHAH chah LOCKah", TextToSpeech.QUEUE_ADD, null); // output: CHAH-chah-LAH-kah

Here are my questions.

  • Is there a standard phonetic spelling recognized by the Android TTS engine?
  • If not, are there some general rules for making custom pronunciation spellings that will make the spellings more likely to be correct in future TTS engines/versions?
  • It appears that the Android TTS engine ignores text case. What is the best way to specify emphasis?

By the way, this is what the TTS engine writes to logcat:

V/TtsService( 294): TTS processing: CHAH chah LOCKah
V/TtsService( 294): TtsService.setLanguage(eng, USA, )
I/SVOX Pico Engine( 294): Language already loaded (en-US == en-US)
I/SynthProxy( 294): setting speech rate to 100
I/SynthProxy( 294): setting pitch to 100

[UPDATE]

I tried passing an XML document to TextToSpeech.speak() as follows:

            String text = "<?xml version="1.0"?>" +
                "<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" " +
                    "xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" " +
                    "xsi:schemaLocation="http://www.w3.org/2001/10/synthesis " +
                        "http://www.w3.org/TR/speech-synthesis/synthesis.xsd" " +
                    "xml:lang="en-US">" +

                    "That is a big car! " +
                    "That <emphasis>is</emphasis> a big car! " +
                    "That is a <emphasis>big</emphasis> car! " +
                    "That is a huge bank account! " +
                    "That <emphasis level="strong">is</emphasis> a huge bank account! " +
                    "That is a <emphasis level="strong">huge</emphasis> bank account!" +
                "</speak>";
            mTts.speak(text, TextToSpeech.QUEUE_ADD, null);

As Android Eve suggested, the TTS engine read only the XML body (i.e., the comments about the big car and the huge bank account). I didn't realize the TTS engine was capable of parsing XML documents. However, I did not hear any emphasis in the TTS output.

[UPDATE 2]

I simplified the question to whether or not Android TTS supports Speech Synthesis Markup Language here.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

JW answered my question at the tts-for-android group:

Hi Greg,

The Pico engine recognizes the tag with the XSAMPA alphabet.

There are no easy rules to derive a certain pronunciation from the orthograpy, but you can use intuitive spellings and trial and error. Capitalizing and hyphens will introduce more problems than solving them. Using different spellings and introducing extra word boundaries (spaces) can work.

The emphasis tag and the exclamation mark will not change the synthesis result. Use , , and commands instead.


Some examples of the proper syntax for specifying the pronunciation using the SSML phoneme tag are in these tests of TextToSpeech.

Even with these simple test SSML documents, there are warning messages posted to logcat about the SSML document not being well-formed. So I opened an issue about these seemingly incorrect logcat messages to the Android issue tracker.


The syntax for specifying an x-SAMPA sequence to SVOX pico is

String text = "<speak xml:lang="en-US"> <phoneme alphabet="xsampa" ph="d_ZIn"/>.</speak>";
mTts.speak(text, TextToSpeech.QUEUE_ADD, null); 

Although more examples would be helpful, a good reference for x-SAMPA is at http://en.wikipedia.org/wiki/Xsampa If I compile a couple dozen examples, I'll post them to that Wikipedia page.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...