DeepSpeech 0.6+

jp_ipvideo · February 10, 2020, 8:15pm

Does Mycroft use DeepSpeech 0.6 or later?
If not, when can we expect it to?
DeepSpeech 0.6 and beyond introduces TensorFlow Lite support, which makes it much more viable on ARM, mobile and embedded devices.

JarbasAl · February 10, 2020, 8:50pm

Accuracy is not good; it's not ready for prime time at all yet.

But it does work in basically real time, so it's only a matter of time until Mozilla releases a better model. The issue now is the data available for training; DeepSpeech itself is getting really good!

gez-mycroft · February 11, 2020, 5:33am

What he said ^^

We are keen to switch over to DeepSpeech by default, we just haven’t got the accuracy high enough yet.

If you are trying out DeepSpeech yourself at home, you can point Mycroft to your local server for STT. All the details are here: https://mycroft-ai.gitbook.io/docs/using-mycroft-ai/customizations/stt-engine#mozilla-deepspeech
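For reference, here is a small sketch of what that change amounts to: an `stt` section in the user-level `mycroft.conf` (usually `~/.mycroft/mycroft.conf`) that selects a DeepSpeech server module and gives it a URI. The key names (`module`, `deepspeech_server`, `uri`) and the default `http://localhost:8080/stt` endpoint are my reading of the linked docs page, so check them against it before relying on this.

```python
import json


def deepspeech_stt_config(uri: str = "http://localhost:8080/stt") -> dict:
    """Build the "stt" section of mycroft.conf pointing at a local DeepSpeech server.

    Assumption: key names follow the linked Mycroft docs; verify before use.
    """
    return {
        "stt": {
            "module": "deepspeech_server",
            "deepspeech_server": {"uri": uri},
        }
    }


if __name__ == "__main__":
    # Print a JSON fragment that can be merged into ~/.mycroft/mycroft.conf.
    print(json.dumps(deepspeech_stt_config(), indent=2))
```

If your server runs on another machine on your network, pass its address as `uri` instead of the localhost default.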

Stephen_O_Sullivan · February 12, 2020, 7:05pm

@gez-mycroft

Thanks for the info, this is good to know. Would the accuracy of DeepSpeech be good enough for keyword/keyphrase detection? What I would like to investigate is whether a subset of frequently used commands (e.g. volume up, volume down, play music, stop playback) could be trained and detected locally, with more complex recognition being sent to the cloud for number crunching.

Is it worth my time going down the DeepSpeech route, or would I be better off waiting until the accuracy of DeepSpeech improves overall?

Regards,

Stephen

gez-mycroft · February 13, 2020, 11:11am

Disclaimer: I am not a machine learning engineer. Others should feel free to chip in if you disagree.

My understanding is that your best bet would be to fine-tune the publicly available model with your own data. This gets it to a usable accuracy for you as an individual, particularly if you are focusing on a smaller set of phrases.
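As a rough illustration of the data side of fine-tuning: DeepSpeech's training scripts read CSV files with the columns `wav_filename`, `wav_filesize` and `transcript`. The sketch below builds such a CSV from your own recordings of each command phrase. The function name and the lower-casing step are my own assumptions; recordings generally need to be 16 kHz mono WAV, and transcripts should only use characters in the model's alphabet.

```python
import csv
import os
from pathlib import Path
from typing import Iterable, Tuple

# Column layout used by DeepSpeech training CSVs.
FIELDS = ["wav_filename", "wav_filesize", "transcript"]


def write_training_csv(samples: Iterable[Tuple[str, str]], csv_path: str) -> int:
    """Write (wav_path, transcript) pairs to a DeepSpeech-style CSV.

    Hypothetical helper; returns the number of rows written.
    """
    rows = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        for wav_path, transcript in samples:
            path = Path(wav_path).resolve()  # absolute paths avoid cwd surprises
            writer.writerow({
                "wav_filename": str(path),
                "wav_filesize": os.path.getsize(path),
                # Assumption: the alphabet is lower-case, so normalise the text.
                "transcript": transcript.strip().lower(),
            })
            rows += 1
    return rows
```

Recording each of your frequent commands several times, in the rooms and with the microphone you actually use, is what lets a small fine-tuning set help.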

The challenge with music playback will be the more unusual terms, such as song, artist and album names.

In terms of sending more complex recognition to the cloud, the question becomes how do you decide what goes to DeepSpeech and what is sent to the cloud?
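One possible answer, sketched under assumptions: run the local engine first, fuzzy-match its transcript against the small command set, and only send the audio to the cloud when nothing matches closely enough. Here `local_stt` and `cloud_stt` are hypothetical callables standing in for DeepSpeech and a cloud service, `difflib` string similarity is a swapped-in heuristic, and the 0.8 cutoff is an arbitrary starting point, not a tuned value.

```python
import difflib
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical set of frequent commands to handle locally.
LOCAL_COMMANDS = ["volume up", "volume down", "play music", "stop playback"]


def match_local_command(transcript: str,
                        commands: Sequence[str] = LOCAL_COMMANDS,
                        cutoff: float = 0.8) -> Optional[str]:
    """Return the closest known command if it is similar enough, else None."""
    normalised = " ".join(transcript.lower().split())
    matches = difflib.get_close_matches(normalised, commands, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def route(audio: bytes,
          local_stt: Callable[[bytes], str],
          cloud_stt: Callable[[bytes], str]) -> Tuple[str, str]:
    """Try the local engine first; fall back to the cloud when the local
    transcript is not close to any known command."""
    command = match_local_command(local_stt(audio))
    if command is not None:
        return "local", command
    return "cloud", cloud_stt(audio)
```

Note that this naturally sends requests containing song, artist or album names to the cloud, since they won't resemble any short command. The trade-off is that every cloud request pays for a local pass first, and a misheard command that happens to be close to another one would be handled locally and wrongly.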