Yes, humans listen to smart speaker recordings. Can it be done right?

Originally published at:

On April 11, 2019, stories began making the rounds highlighting that Amazon employs thousands of people worldwide to transcribe users’ Alexa queries to help improve the Alexa software. The original Bloomberg article reports “Amazon … doesn’t explicitly say humans are listening to recordings of some conversations picked up by Alexa. ‘We use your requests to Alexa to train our speech recognition and natural language understanding systems,’ the company says in a list of frequently asked questions.”

To us at Mycroft, this wasn’t exactly groundbreaking news. Other companies might not make it easily known that the audio from your interactions is key to improving the software and that humans are still needed to evaluate the recordings. We want you to know that is absolutely the case, be it at Amazon, Google, Apple, or even Mycroft. At Mycroft, we give our Community a clear choice whether to have their data stored and used to improve the software or not. Read on to learn more.

Voice software needs voice data

Voice technology has come a long way, but it isn’t past the point of needing humans in the loop. Speech-to-text (STT) engines like Mozilla DeepSpeech are rapidly improving but are still not totally accurate. Natural language processing (NLP) is able to do more and more every day, but it can’t keep up with context, ambiguity, or multi-part requests the way a human can. Text-to-speech (TTS) engines like Mycroft’s Mimic2 sound more natural than ever but trip over words at times. These issues need human review to improve. It’s fair to presume any voice tech, from full assistants to the speech recognition on your phone, still uses humans in the loop.

So if all assistants require some human intervention to improve, what makes Mycroft different? If you use Mycroft, by default you are not sharing your data and Mycroft is not storing it. On other platforms, users must dig around in the settings in order to exclude themselves from training sets or delete their recordings. These options are often buried and obscure.

As noted above, data and human review are needed to improve these technologies, Mycroft included. To bridge that gap, Mycroft offers users the choice to Opt-In to our datasets. Only once a user has provided explicit permission will their anonymized usage data be shared with Mycroft’s Open Dataset. That data can then be used to improve our Wake Word spotting, STT, NLP, TTS, and overall software. If an Opted-In user decides to Opt-Out, all their audio files, logs, and other information are scrubbed from the datasets.
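As an illustration only, the opt-in rule described above can be sketched in a few lines of Python. This is not Mycroft’s actual code: the `OpenDataset` class, its method names, and the in-memory storage are all hypothetical, and anonymization is left out. The user key exists only so that a contributor’s data can be found and scrubbed later.

```python
from dataclasses import dataclass, field


@dataclass
class OpenDataset:
    """Hypothetical in-memory stand-in for an opt-in dataset (not Mycroft's real code)."""

    opted_in: set = field(default_factory=set)
    samples: dict = field(default_factory=dict)  # user key -> list of samples

    def opt_in(self, user_id):
        """Record explicit permission from a user."""
        self.opted_in.add(user_id)

    def opt_out(self, user_id):
        """Withdraw permission and scrub everything the user contributed."""
        self.opted_in.discard(user_id)
        self.samples.pop(user_id, None)

    def record(self, user_id, sample):
        """Keep a sample only for opted-in users; return whether it was kept."""
        if user_id not in self.opted_in:
            return False  # default behavior: nothing is stored
        self.samples.setdefault(user_id, []).append(sample)
        return True
```

In this sketch, a call to `record` before `opt_in` returns `False` and stores nothing, and `opt_out` removes every sample that user contributed. The point of the design is that storing data is the exception that requires permission, not the default.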

Mycroft takes voice assistant privacy seriously. Mycroft does not store users’ recordings unless given explicit permission.

We think giving our users the choice of whether to have their interactions used for Mycroft’s improvement builds trust in our brand and product. More than 15% of Mycroft’s registered users have raised their hands as contributors to our Open Datasets. Our tagging is done by the Community, and it helps other projects like Mozilla DeepSpeech.

Why data – and privacy – are important

There are caveats, though. Machine learning engines gain accuracy based on how much good data they are trained on. This is why companies like Google and Amazon currently have the upper hand in these technologies: they have massive datasets to train their models on. Additionally, machine learning engines perform especially well on new inputs similar to the data they were trained on. As such, an Opted-In user should see their own experience greatly improve while helping Mycroft improve overall. In the case of other assistants, a new user’s recordings and interactions are fed into the models, improving that user’s experience from the moment they sign up.

Privacy and user agency are among the biggest challenges in the tech world. At Mycroft, we lean towards giving users more choice and control. That’s why we only save and use data from those who choose to Opt-In, and why we are developing other options for privacy-conscious users, like the Personal Backend.

We should also note that it is possible for users to opt out of Alexa using their recordings for these purposes, and to delete recordings individually. But it still seems, according to Bloomberg’s reporting, that Amazon would rather not explicitly tell its users what happens with their recordings, even though it legally covers that option in its Terms of Use.

No one will become the leader in this industry without taking some hits. We’ve said as much before. But putting privacy first and designing the product around the desires of our Community should help Mycroft along. If you agree, don’t forget to share the idea when talking about voice technology.

For another view on the topic, our friends at Voicebot AI put together a post and collected examples of both industry insider and consumer sentiment on the news.