The Mycroft Roadmap and Open Source

ryanleesipes · December 2, 2015, 6:20am

Hey guys, I think there are a lot of misconceptions about whether various parts of Mycroft are going to be open source, as well as about what we are planning to tackle. It is my intention to clear this up in this late-night post and throughout the coming weeks, as I think I’ve done a poor job communicating what is going on.

There are various challenges in what we are trying to do with the Mycroft project. Some of the technology we already have, such as the Adapt Intent Parser, which allows Mycroft to determine what you mean, and the ability to recognize when you address the unit with “Hey Mycroft” or another wakeword. Mycroft processes the wakeword locally and determines on the device that you are talking to it.

The harder challenges are accurate speech-to-text (STT) and quality text-to-speech (TTS) voices. Before the Kickstarter, we recognized that giant companies with millions of dollars in resources were having a hard time with these tasks, and therefore we opted to use APIs like Wit.ai and Google TTS & STT in order to make this work.

As time has gone on, we’ve decided to tackle these technologies ourselves and pursue them in the open as open source initiatives. This fits with our ideals and goals and, with the community’s help, appears to be achievable thanks to the expertise @seanfitz and @jdorleans bring to the team.

We are planning to train an open speech-to-text model via our OpenSTT project (sign up for the newsletter over at http://OpenSTT.org). We are currently developing some core tools to allow the community to help us train the model, and we are looking for groups to partner with us on this effort. Expect to see an early implementation of this over the course of the month (this is an estimate).

Another project that we expect to be out in the open soon is the Adapt Intent Parser. Adapt is currently being prepared to be open sourced (documentation and the like). It can be used in many different projects and has uses outside of those involving audible speech, since, as the name suggests, it parses text in order to determine intent.
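
For anyone wondering what "parsing text to determine intent" looks like in practice, here is a tiny toy sketch in Python. To be clear, this is not Adapt's code or API; the `INTENTS` table and the `determine_intent` function are made-up names for illustration only, and a real parser does far more than match keywords.

```python
import re

# Hypothetical vocabulary (an assumption for this sketch, not Adapt's format):
# each intent name maps to the keywords that must all appear in the text.
INTENTS = {
    "weather": {"weather"},
    "set_alarm": {"set", "alarm"},
    "play_music": {"play", "music"},
}


def determine_intent(text):
    """Return the name of the best-matching intent, or None if nothing matches.

    An intent matches when all of its required keywords occur in the text.
    If several intents match, the one with the most keywords wins.
    """
    words = set(re.findall(r"[a-z']+", text.lower()))
    best = None
    for name, required in INTENTS.items():
        if required <= words:
            if best is None or len(required) > len(INTENTS[best]):
                best = name
    return best


if __name__ == "__main__":
    for phrase in ("What's the weather like today?",
                   "Please set an alarm for seven",
                   "Hello there"):
        print(phrase, "->", determine_intent(phrase))
```

Note that the function only ever sees text, so the same approach works whether the words came from a speech-to-text engine or were typed into a chat window.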

On the TTS front, we are currently exploring how to create voices for the open source MaryTTS. We would like to use it, but we want to ensure that whatever solution we choose can produce high quality, understandable voices and enable community members to make custom ones as well. Having said that, we are currently making our own voice using MaryTTS and documenting the process. If we are successful and pleased with the results, we will move forward with MaryTTS to solve this problem and contribute improvements back upstream.

Finally, the core Mycroft platform ties all these technologies together. If you hear us talk about using other services to achieve STT or TTS, it is because we need something to use during development of core while we develop these technologies. Because it ties together all the pieces, Mycroft core is slated for release to the community, completely open source, in April. We would obviously love to get it out sooner, but we are trying to ensure high quality code and documentation so that contributing is easy and enjoyable.

Hopefully this long post answers the questions surrounding the project, its various parts, and timelines. We’ve evolved since beginning the project, as we’ve identified some methods for completing our goal of an open source AI that can be used by everyone. But there are stages to completing that ambitious goal, and I want to ensure fans and backers have a clear idea of why we are not able to dump all the source on the web on day one. This is not an easy task to tackle, and we will need the help of the community (i.e. you) in order to achieve it.

Thanks for reading this and feel free to post questions and feedback below!