Hey @Autonomouse, as I outlined in this post, getting something that “recognizes speech” isn’t really that difficult. The main speech recognition loop uses pocketsphinx to determine whether an utterance contains a Mycroft “wake phrase”, and only then sends the audio off-device for full ASR.

Accuracy is a different story. To be extensible (in terms of number of skills), we really need a dictation recognizer. There are claims that pocketsphinx can do this, as well as Sphinx-4, but I haven’t personally had any success with either. We realistically need >= 85% accuracy (a magic number, really just shorthand for “high”) to make a usable system.
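To make the gating idea concrete, here’s a minimal sketch of that loop with both recognizers stubbed out — the function names are illustrative, not Mycroft’s actual API. In practice the local spotter would be pocketsphinx running a keyword search, and the remote step would be the full cloud dictation recognizer:

```python
# Sketch of the wake-phrase gating pattern: a cheap local keyword
# spotter checks every utterance, and only when it fires do we pay
# the cost (and privacy hit) of sending audio off-device for ASR.
# Both recognizers are stubs here; names are hypothetical.

WAKE_PHRASE = "hey mycroft"

def local_keyword_spot(utterance: str) -> bool:
    """Stand-in for a pocketsphinx keyword search on raw audio."""
    return WAKE_PHRASE in utterance.lower()

def remote_asr(utterance: str) -> str:
    """Stand-in for the full off-device dictation recognizer."""
    return utterance  # pretend the transcription is perfect

def handle(utterance: str):
    if not local_keyword_spot(utterance):
        return None  # audio never leaves the device
    return remote_asr(utterance)

print(handle("hey mycroft what time is it"))  # hey mycroft what time is it
print(handle("just background chatter"))      # None
```

The point of the pattern is that the local model only has to be good at one phrase, so its accuracy requirements are far lower than the 85%+ we’d need from a general dictation recognizer.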
We’ve opened up about some of the tools we’re using (specifically the SpeechRecognition client library). If you want to poke around with the available tools, we’ll gladly consume your successes.
I would start here:
http://www.speech.cs.cmu.edu/sphinx/dictator/