International Open Data Day: Join us to celebrate

Originally published at:

You may not have had a public holiday, or a big house party, but we think this is still one worth celebrating!


What is open data?

The Open Knowledge Foundation defines open data as that which “can be freely used, modified, and shared by anyone for any purpose”.

This can include broad information from websites, structured data sets such as those published by government bodies, or data sets intentionally compiled by communities like

Just as with “open source” software, how you can use that data depends on the license applied by its creator.


Why is that important for Mycroft?

Many of the components that make up Mycroft require large data sets. Large, established companies collect this data from their users, often without the users’ knowledge, via unintelligible End User License Agreements. They also keep the data totally inaccessible to researchers outside their organisation, which gives those companies an incredible advantage in the machine learning world.

Open data sets, on the other hand, make it possible for new companies to innovate in this space, and to do so knowing they are using data that is explicitly in the public domain. Thanks to a range of open speech data sets (such as LJ, Blizzard, or CMU), anyone can train their own voices for use in speech synthesis or Text-To-Speech (TTS) systems. There are similar data sets available for use in speech recognition, or Speech-To-Text (STT) systems.

Mycroft’s Skills also use a whole range of open data each time you make a request:

“Hey Mycroft, what’s the weather forecast?”
has your back.

“Hey Mycroft, how do I make a margarita?”
Let’s check

“Hey Mycroft, what is a recurrent neural network?”
Here’s a very short summary courtesy of Wikipedia.

Every day, people all around the world are creating and improving open data, and making it more widely available. I am deeply appreciative of these contributions to collective human knowledge. It is impossible to predict all the ways that open data will be used. The only way to know is to put it out there and see what happens.


How does Mycroft give back?

Mycroft has benefited greatly from all of this open data, and where appropriate we also like to pay it forward. Our code is open source, and we work closely with other organisations like Mozilla to create the next generation of open data sets.

The benefits of openness must, however, be balanced with the need for privacy and user choice. Our strong stance on this principle is why our company exists, and it is one of the key reasons our users trust Mycroft.

That is why we use an “Opt-In” data collection model. Unless you explicitly give Mycroft permission to use your data, we never keep it around.


How can I help?

Join the Mycroft Open Dataset

The easiest way to help, if you’re happy to share some data for the common good, is to Opt-In to the Open Dataset through your personal settings page. Opting in grants Mycroft permission to retain data, such as samples of what you say to your device. These samples are then de-identified before being added to a data set. Should you choose to Opt-Out at a later date, any data originating from your account will be deleted.
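The Opt-In flow described above can be pictured with a small sketch. This is not Mycroft’s actual implementation: the class `OpenDatasetStore`, its method names, and the use of a salted hash as a stand-in for de-identification are all assumptions made purely for illustration.

```python
import hashlib
import secrets


class OpenDatasetStore:
    """Hypothetical in-memory store illustrating an opt-in retention policy."""

    def __init__(self):
        # Random salt so stored keys cannot be matched to account IDs by guessing.
        self._salt = secrets.token_bytes(16)
        self.opted_in = set()   # account IDs that have given permission
        self.samples = {}       # pseudonymous key -> list of retained samples

    def _pseudonym(self, account_id):
        # Assumption: a salted hash stands in for whatever de-identification is really used.
        return hashlib.sha256(self._salt + account_id.encode("utf-8")).hexdigest()

    def opt_in(self, account_id):
        self.opted_in.add(account_id)

    def record(self, account_id, utterance):
        """Retain a sample only if the account has opted in; report whether it was kept."""
        if account_id not in self.opted_in:
            return False  # no permission, so nothing is stored
        self.samples.setdefault(self._pseudonym(account_id), []).append(utterance)
        return True

    def opt_out(self, account_id):
        """Withdraw permission and delete everything that originated from the account."""
        self.opted_in.discard(account_id)
        self.samples.pop(self._pseudonym(account_id), None)

    def sample_count(self):
        return sum(len(items) for items in self.samples.values())


store = OpenDatasetStore()
store.record("alice", "what's the weather forecast")   # ignored: not opted in
store.opt_in("alice")
store.record("alice", "how do I make a margarita")     # retained under a pseudonym
store.opt_out("alice")                                  # retained sample is deleted
```

The point of the sketch is the ordering of the checks: permission is tested before anything is written, and opting out removes both the permission and the stored samples.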

Translate vocabulary

The Mycroft community have contributed over 80,000 translations so far, which amounts to an average of 3,600 each week. These translations are available under the Apache 2.0 license, just like our code. If you are bilingual, please help us bring voice-interactive technology to more corners of the globe by joining our translate platform.

Donate your voice, or validate the voices of others

Common Voice is Mozilla’s speech recognition initiative. It aims to provide a speech-to-text engine that is open and accessible to everyone. Providing recordings of your voice adds to the diversity of their data, whilst validating the recordings of others improves its accuracy. Together, these two contributions enable better training and ultimately a solution that works better for us all. This of course aligns with our goals at Mycroft, so we work closely with Mozilla on their voice projects. Visit Common Voice and see how you can help out anytime you have a few spare minutes.

Beyond open data, there are other ways to contribute.

Join the community in our Forums or Chat and help guide the future direction of Mycroft AI.