Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


speech recognition - Real-time translation using Alibaba Cloud

Has anyone tried using the Alibaba Cloud SDKs to build a real-time video call app? When I asked support, they said the video call service is not available on the international Alibaba Cloud, only on the Chinese one. They also mentioned that I could try building it with their SDKs. I'm now asking them which SDKs they meant.

If anyone has experience with the related field or technologies, please help me figure out whether it is worth building this on Alibaba Cloud or going with another cloud service, since Alibaba Cloud does not support multi-cloud setups.

Any help would be much appreciated, thanks!

Related documents from the China-based Alibaba Cloud:

Speech to text from audio data in RTC [Windows]

Speech to text from audio data in RTC [Android]

Real-time speech recognition

Alibaba Cloud Machine Translation

Question from: https://stackoverflow.com/questions/65930024/real-time-translation-using-alibaba-cloud


1 Reply


The good news: there are many potential providers and options for cobbling something together.

The bad news: this problem is not easy, and even the products from the top research and product teams are not very robust.

You can find the list of all self-serve machine translation API providers at modelfront.com/compare. Most of those same providers also offer speech recognition APIs, and speech recognition is also available on many devices.
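As a rough illustration of what gluing a speech recognition API to a machine translation API looks like, here is a minimal sketch. The `Recognizer` and `Translator` types and the `cascade` function are hypothetical names invented for this example, not part of any provider's SDK; a real provider client would have to be wrapped to fit them. It assumes the recognizer is stateful and returns the transcript so far each time it receives an audio chunk.

```python
from typing import Callable, Iterable, Iterator, Optional

# Hypothetical interfaces: wrap whichever provider SDK you choose behind these.
Recognizer = Callable[[bytes], str]  # audio chunk -> transcript so far
Translator = Callable[[str], str]    # source text -> target text


def cascade(
    chunks: Iterable[bytes],
    recognize: Recognizer,
    translate: Translator,
) -> Iterator[str]:
    """Yield a fresh translation each time the running transcript changes."""
    last: Optional[str] = None
    for chunk in chunks:
        transcript = recognize(chunk)
        # Skip empty or unchanged transcripts to avoid redundant API calls.
        if transcript and transcript != last:
            last = transcript
            yield translate(transcript)


def make_fake_recognizer() -> Recognizer:
    """Stand-in for a streaming recognizer: treats each chunk as UTF-8 words."""
    words = []

    def recognize(chunk: bytes) -> str:
        words.extend(chunk.decode("utf-8").split())
        return " ".join(words)

    return recognize


if __name__ == "__main__":
    # str.upper stands in for a real translation call.
    for line in cascade([b"hello", b"", b"world"], make_fake_recognizer(), str.upper):
        print(line)
```

Note that every transcript update triggers a full re-translation here, so both the per-call latency and the quality of each system add up along the chain.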

But, depending on your scenario, you may be better off using a speech-to-speech approach (vs. gluing together multiple systems), and even a local model (vs. an external API), for three reasons: quality, latency, and the interaction of the two. Users don't want to wait for the full sentence, but they also don't like translated text flickering as new words arrive.
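To make the flicker vs. latency trade-off concrete, here is a small sketch of one common stabilization heuristic: hold back the last k tokens of every non-final translation, so the least stable part of the output is never shown. The function names (`mask_k`, `erasure`, `total_erasure`) and the example hypotheses are made up for illustration; this is not a description of how any particular product works.

```python
from typing import List, Sequence


def mask_k(tokens: Sequence[str], k: int, final: bool = False) -> List[str]:
    """Hide the last k tokens of a non-final hypothesis; show everything when final."""
    if final or k <= 0:
        return list(tokens)
    return list(tokens[:-k])


def erasure(prev: Sequence[str], curr: Sequence[str]) -> int:
    """Count the tokens of `prev` that must be deleted to turn it into a prefix of `curr`."""
    common = 0
    for a, b in zip(prev, curr):
        if a != b:
            break
        common += 1
    return len(prev) - common


def total_erasure(hypotheses: Sequence[Sequence[str]], k: int) -> int:
    """Sum the erasure over consecutive displayed outputs, masking all but the last one."""
    last_index = len(hypotheses) - 1
    shown = [mask_k(h, k, final=(i == last_index)) for i, h in enumerate(hypotheses)]
    return sum(erasure(p, c) for p, c in zip(shown, shown[1:]))


if __name__ == "__main__":
    # Successive re-translations as more of the source sentence is heard.
    hyps = [
        ["I", "am"],
        ["I", "was", "going"],
        ["I", "was", "going", "home"],
    ]
    print("no masking:", total_erasure(hyps, k=0))
    print("mask last token:", total_erasure(hyps, k=1))
```

A larger k means less flicker but more delay before the user sees each word, which is exactly the tension described above.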

If you search r/machinetranslation for speech OR simultaneous OR interpreting, you'll find:

  • a launch announcement for "interpreter mode" from Google Assistant

  • a Baidu announcement on a quality improvement

  • two articles from Mattia di Gangi at FBK

  • the flickering paper from Google (Re-translation versus Streaming for Simultaneous Translation)

  • the Translatotron article and paper from Google

  • a landscape survey from Apple

  • the NeurST toolkit GitHub repo from ByteDance (TikTok)

There was a keynote from Baidu Research on this at WMT 2019, and recently a bit more on flickering from Google, but both focussed on their own products, not offerings for external developers.

