Training Deep Speech: How you can help

Last month we released DeepSpeech support for Mycroft. Many have now tried it and been, well, underwhelmed by the performance. Was this a colossal failure?

Welcome to the Evolution

Short answer: Nope, this isn’t a failure at all. DeepSpeech is behaving exactly as we expected.

The data Mozilla has used to train DeepSpeech so far comes largely from individuals reading text at their computers, with a bit more from speakers at conferences. In both cases, the person in the recordings is speaking carefully and directly into a microphone. So DeepSpeech learned to understand speech, but only the kind of clear speech it was trained on: no slurred words, no yelling from across the room.

Like many interesting problems, the first pass isn’t perfect. But without it there would never be a second, third or fourth pass.

Machine Learning Loops

The exciting thing about this step is that we have created a machine learning loop. We took an imperfect dataset, trained on it to produce a model, and now we are using this model to create more data to place in a better dataset.
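The shape of that loop can be sketched in a few lines. Everything below is a toy illustration, not Mycroft's actual pipeline: the "model" and "human validation" steps are stand-ins just to show how each pass of the cycle feeds a larger dataset into the next round of training.

```python
# Toy sketch of the machine-learning data flywheel described above.
# All names and data are illustrative, not Mycroft's actual pipeline.

def train(dataset):
    """Stand-in for training an STT model; returns a 'model' function."""
    vocab = {utterance for utterance, _ in dataset}
    return lambda audio: audio if audio in vocab else None

def human_validates(audio, guess):
    """Stand-in for the community tagger: accept or correct a transcript."""
    return guess if guess is not None else audio  # a person fixes the misses

dataset = [("hey mycroft", "clean"), ("what time is it", "clean")]
incoming_audio = ["hey mycroft", "set a timer", "what time is it"]

for _ in range(2):  # wash, rinse, repeat
    model = train(dataset)                     # train on what we have
    for audio in incoming_audio:
        transcript = human_validates(audio, model(audio))
        dataset.append((transcript, "real-world"))  # grow the dataset

print(len(dataset))  # the dataset is larger after every pass
```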

In the next few weeks, you will see a new tagger at Home for validating the voice data we have collected. This will create an invaluable real-world voice dataset to include in training: the Mycroft Open Voice Dataset. Mozilla will be granted access to it, completing this machine learning loop.

Wash. Rinse. Repeat. Every time this cycle completes things will be a little bit better.

Let Your Voice be Heard!

The voices in the dataset are exclusively from those who have chosen to Opt-In. At Mycroft we will never use your data without your explicit permission. So far over 2,500 of you have chosen to join us in this effort, and we can’t thank you enough for your trust and contributions.

But there is a secret… by contributing to this dataset you are literally training the technology to recognize your voice. Eventually the tech will evolve to work for virtually everyone, but initially, it will be slightly biased to recognize the voice and pronunciation variants of those in the training data.

If you want to participate, go to Home and check Opt-In under your settings. Joining is easy, and changing your mind later is just as easy. Working together – Mycroft and Mozilla on the technology, you and the Mycroft community on the data – we are creating the foundation of an open AI for Everyone!


I would, but my kids are under 13, so legally I can’t…

I wonder if it would be safe to have separate storage for my samples. Only I (or my wife) could tag them, but I believe that as long as it is only used for training and there are enough other kids in the sample set, it should be okay.

Obviously this would require legal to double-check, and some extra precautions on how the data is saved, but it seems like a useful solution to the problem of how we understand kids if we can't use their data.


If you’re so motivated and have the resources, you can train your own. I basically scripted a local record-and-parse-to-DeepSpeech tool so I could have the kiddo do some training. Not that they wanted to, mind you, after about 20 sentences. Every little bit helps, though?
If you want a good tutorial on training DeepSpeech, check out this one:

(you don’t have to use French, of course)
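For anyone trying the DIY route above: DeepSpeech's training scripts consume CSV manifests with `wav_filename`, `wav_filesize`, and `transcript` columns. Here is a minimal sketch of building one from a folder of home recordings; the `transcripts.txt` tab-separated format is my own convention for this example, not part of DeepSpeech.

```python
# Build a DeepSpeech-style training CSV from a folder of recorded clips.
# DeepSpeech expects rows of wav_filename,wav_filesize,transcript. The
# transcripts.txt format (one "filename<TAB>text" line per clip) is an
# assumption made for this sketch.
import csv
import os

def build_manifest(wav_dir, transcripts_path, out_csv):
    # transcripts.txt maps each wav file name to what was read aloud
    with open(transcripts_path, encoding="utf-8") as f:
        transcripts = dict(
            line.rstrip("\n").split("\t", 1) for line in f if line.strip()
        )

    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["wav_filename", "wav_filesize", "transcript"])
        for name, text in sorted(transcripts.items()):
            path = os.path.join(wav_dir, name)
            # DeepSpeech uses the file size to bucket clips by length
            writer.writerow([path, os.path.getsize(path), text.lower()])
```

Pointing DeepSpeech's `--train_files` flag at the resulting CSV is then all the plumbing the tutorial's training step needs.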
