After a few days of hacking about I haven’t managed to get a wake-up from Mycroft. I confess I am trying to run it on an old Raspberry Pi B+ on wheezy, so it is probably my fault. I thought the following experiences might be useful though. I am not particularly skilled but perhaps this helps someone. I have managed to get it all to run with no errors but didn’t get past wake-up. I also tried running Debian in VirtualBox on my Mac (so I could learn for the Pi), but couldn’t get audio to work (not your problem).
USB Mic
The usual RPi issues with having a USB mic and using the internal speaker jack. There are many posts on this, but I found that I had to do two things: 1) modify /etc/modprobe.d/alsa-base.conf to set the index of the mic to zero, e.g.
options snd-usb-audio index=0
options snd_bcm2835 index=1
but this alone didn’t work; 2) another post led me to create ~/.asoundrc, as described in this post: http://raspberrypi.stackexchange.com/questions/37177/best-way-to-setup-usb-mic-as-system-default-on-raspbian-jessie
Suggestion: why not have the device index in the config file for the STT and TTS? It looks as though it is already in the code, and this would be much easier than people having to hack around with changing the default index etc.
Once I had that working and had verified it with aplay and arecord, I ran ./start.sh audiotest, which worked fine, but running Mycroft would not get the wake-up word recognised. I could not get the audio accuracy script to run and produce anything (so I need to go back to that). Perhaps it is my Brit accent.
Pocketsphinx returning None
The problem was finding out whether pyaudio was actually capturing any sound. I hacked about with mic.py, especially wait_until_wake_word(), and established by looking at the energy variable that my mic was working. Messing about still more shows that no matter what I do, transcribe() in local_recognizer.py returns None instead of some kind of object from pocketsphinx.
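For anyone wanting to repeat the "is the mic actually capturing sound" check outside Mycroft, here is a minimal sketch of an energy measurement. It assumes 16-bit little-endian mono PCM and uses a plain root-mean-square level; the function name `rms_energy` and the synthetic sample data are made up for illustration, and the energy calculation inside mic.py may well differ.

```python
import math
import struct


def rms_energy(chunk: bytes) -> float:
    """Root-mean-square level of a chunk of 16-bit little-endian mono PCM."""
    count = len(chunk) // 2  # two bytes per sample
    if count == 0:
        return 0.0
    samples = struct.unpack("<%dh" % count, chunk[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


# Synthetic chunks standing in for microphone reads (hypothetical data).
silence = struct.pack("<4h", 0, 0, 0, 0)
speech = struct.pack("<4h", 1000, -1000, 1000, -1000)

print(rms_energy(silence))  # 0.0
print(rms_energy(speech))   # 1000.0
```

If the value stays near zero while you talk, the capture device is probably the wrong one; if it moves but nothing is transcribed, the problem is more likely further down the chain.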
Personal opinion
I have kinda run out of energy for now. Mycroft looks kinda clever with the possible multi-room type stuff, but does it justify being called an AI? I would have thought that ML on STT will be great, but actually parsing instructions using regex once you have the text transcribed doesn’t look very AI to me (but I could be completely wrong on how it works). When you can get a bunch of free calls from people like api.ai and recast.ai, and when verbal instructions are so short (30 seconds), I do hope I have missed something and you can correct me by pointing out the bits in Mycroft I am missing. Having previously played around with things like NLTK, I know that tokenising text and then having hard-coded rules to handle the tokens isn’t a bad approach, but a more intelligent approach as in your promo video looks great.
I completely understand why saying “AI” bugs you. It is such an overused term today, and it is also by its nature a moving target. For example if you were around 50 years ago you would have been amazed by a machine that could correctly multiply 10 digit numbers! Wow, that mechanical brain is brilliant!!!
Jump forward some years. Calculators are cheap – you can get one for only $100. No need to spaz, that is, like, totally mung. But have a computer that can beat most people at chess. Unbelievable! The AI in Wargames is real!!!
Jump forward even more years. Computer opponents exist in every children’s game, no big deal – it’s literal child’s play, right? But wow, I can talk to my phone and it can transcribe my voice almost perfectly and speak back to me sounding only slightly like a fairly cute robot (although still a robot). Still, amazing AI, right? Or is it already just expected?
What we here in Mycroft are doing today is laying a whole lot of groundwork. Current “AI” techniques require a lot of data – the “big” in “big data” – and we need to start with procedural mechanisms to capture that data. So we are walking right now, but getting ready to jog.
For example, the existing Skill framework – as you mentioned – is fairly rigid. You might have to define some regexes to allow you to convert “turn on the lights”, for example, to an Intent of type “LightControl” with a verb of “on”. But once we start collecting enough conversational data we can start to see that “flip on the light” or “it’s dark in here, brighten it up!” are often said (and fail to work) right before somebody says “turn on the light”. That is where the machine learning can kick in to start doing some adaptive intent parsing that categorizes all of them as the same thing.
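To make the rigidity concrete, here is a rough sketch of regex-based intent parsing. The pattern, the `parse_intent` helper and the dictionary layout are all hypothetical and are not taken from the Mycroft Skill framework; they only illustrate why “turn on the lights” matches while a paraphrase falls through.

```python
import re
from typing import Optional

# Hypothetical pattern for illustration; not from any real Mycroft skill.
LIGHT_PATTERN = re.compile(
    r"\b(?:turn|switch)\s+(?P<verb>on|off)\s+(?:the\s+)?lights?\b",
    re.IGNORECASE,
)


def parse_intent(utterance: str) -> Optional[dict]:
    """Return a LightControl intent if the utterance matches, else None."""
    match = LIGHT_PATTERN.search(utterance)
    if match is None:
        return None
    return {"intent": "LightControl", "verb": match.group("verb").lower()}


print(parse_intent("turn on the lights"))  # {'intent': 'LightControl', 'verb': 'on'}
print(parse_intent("flip on the light"))   # None
```

The second call returning None is exactly the kind of failure that collected conversational data could later be used to map onto the same intent.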
Wow, that is amazing!!! That is certainly AI…at least until next week when it is completely expected.
Why Steve, honoured to have a response from the CTO! First up, I totally agree with everything you say, and I hope my previous post was not seen as critical. I really, really hope Mycroft succeeds as a product people can buy and also as a set of projects. The regex stuff is totally appropriate for the specific intents, so no critique there at all. From a work perspective I am fascinated with ML, text mining etc. As my handle denotes, I am not trained and therefore do not have the skills to dig into core ML. What I am interested in is computed and analysable meaning from text. That is pretty easy to do with a 140-char tweet; it’s when you get into web chat conversations, messaging (e.g. FB Msgr), email etc. that sentences become quite complex and long. I have played about with some of the .ai tools like api.ai (now bought by Google) and recast.ai, which are really interesting in that they not only give you an intent but can also return quite a rich metadata packet with classification of the sentence type, entities etc. If you couple that with text mining stuff, one could also get a detailed breakdown of language elements (for me, subject-verb-object is very meaningful in simple phrases). From a business perspective, bots are still very fixed-function and simplistic; I am hoping to see leaps and bounds in that field for automated or semi-automated customer assistance.
You might want to check out some of the .ai’s for your early devs, e.g. wrap a skill around recast.ai (I only mention that one because you get 15K API calls/month for free). I can imagine some kind of training conversation, i.e. when something is not understood, you can tell Mycroft which skill it belongs to.
From a Mycroft perspective, don’t forget the market for the elderly or disabled; you could do wonders with skills like emergency help, calling family members etc.
Anyway back to trying to get pocket sphinx to return something … Thanks for producing mycroft and for a noob like me it is highly educational.
When I tried to set options snd-usb-audio index=0, the USB mic always disappeared when I checked with cat /proc/asound/cards.
Anyway, I don’t think it’s because of your Brit accent, since I’m very bad at English myself.
Once you can get your USB mic to be card 0, it should work.
Oh, one more thing: after you get the USB mic to be card 0, run ./mycroft.sh start
If it’s not working, check screen mycroft.voice and then press Ctrl+C when you see the line waiting for the wake-up word. It will work.
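Checking which card index the USB mic ended up on can be scripted. The sketch below parses text in the layout that /proc/asound/cards normally uses; the sample text is invented for illustration, and treating the driver name "USB-Audio" as the marker for a USB mic is an assumption that should hold for typical USB audio devices but is not guaranteed for every one.

```python
import re

# One card entry looks like: " 1 [Device         ]: USB-Audio - USB PnP Sound Device"
CARD_LINE = re.compile(r"^\s*(\d+)\s+\[([^\]]*)\]:\s*(\S+)\s+-\s+(.*)$")

# Invented sample in the usual /proc/asound/cards layout (hypothetical devices).
SAMPLE = """\
 0 [ALSA           ]: bcm2835 - bcm2835 ALSA
                      bcm2835 ALSA
 1 [Device         ]: USB-Audio - USB PnP Sound Device
                      USB PnP Sound Device at usb-example, full speed
"""


def list_cards(text):
    """Return one dict per sound card found in the given text."""
    cards = []
    for line in text.splitlines():
        match = CARD_LINE.match(line)
        if match:
            cards.append({
                "index": int(match.group(1)),
                "id": match.group(2).strip(),
                "driver": match.group(3),
                "name": match.group(4).strip(),
            })
    return cards


def usb_mic_index(text):
    """Index of the first card using the USB-Audio driver, or None."""
    for card in list_cards(text):
        if card["driver"] == "USB-Audio":
            return card["index"]
    return None


print(usb_mic_index(SAMPLE))  # 1
```

On a real Pi the text would come from reading /proc/asound/cards instead of SAMPLE. A result of None after changing alsa-base.conf would match the "mic disappears" symptom described above.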
Hey, thanks. I have those settings and have checked my hardware with aplay and arecord and also run the test script for audio, so recording is working OK. I think it is something to do with pocketsphinx. Thanks for your kind post.
I am at the same place. audio_test works fine, so the mic and speakers work on my Pi, but I can’t get it to recognize the wake word. I tried smaller wake words and played with 1e-10 … 1e-110.
I hacked around putting in debug output.
It’s a shame to scrap Mycroft since it seems real cool and flexible/expandable, but when stuck on the wake word, there is no other choice.
It would be nice if there was a debug tool, just for the wake word.
show me what mycroft is hearing… maybe adjust thresholds until the wake is recognized, then show me what to set…
such a nice project otherwise. very frustrated.
Jay, there is an issue I’m working on right now that is causing problems with starting up on generic Raspberry Pis. I expect to push the fixes into the master of mycroft-core tonight and get a build pushed out in the next couple of days. At that time I’m also going to post a generic Raspbian Jessie Lite image with Mycroft preinstalled on it.
Sorry about the hassle. Our focus recently was on preparing for the Mycroft Mark 1 unit release and this snuck in under the radar.
Good to hear, Steve. I’ve also been having this issue on a generic Raspberry Pi but did manage to get Mycroft working on an Ubuntu VM, so it was doubly confusing not getting it to work on the Pi.
I’m struggling with the same as the others in this thread.
Everything seems to be working correctly but Mycroft does not pick up on the wake word no matter what I try.
Is the preinstalled Jessie image mentioned earlier available already? I would be really interested to see if that works and compare it to my current installation.
Any update on this? I’m hitting the same wall here with the Pi 3 Jessie Pixel. I’ve tried multiple USB and 3.5mm speakers and microphones with all of them failing. The USB microphone and speakers are defaulted both in the GUI and in /usr/share/alsa/alsa.conf and work outside of Mycroft. The Mycroft voice service works and starts but the audiotest fails because the rate is 16000 Hz instead of the 44100 Hz the configs are set to.
Any help here, even if it’s just a point in the right direction, would be greatly appreciated.
I tried a PS3 Eye as the USB microphone and a 3.5mm speaker and that setup works. I also switched the rate back to 16000 Hz from 44100 Hz. Both the audiotest and wake word work now. I think this has to do with what USB microphone you use.
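A rate mismatch like the one above can be spotted by recording a short WAV file with the mic and comparing its header against the configured rate. This is only a sketch using Python's standard wave module; `wav_sample_rate`, `rate_matches` and `make_silent_wav` are made-up helper names, and the in-memory silent recording stands in for a real capture.

```python
import io
import wave


def wav_sample_rate(source) -> int:
    """Sample rate in Hz from a WAV file path or file-like object."""
    with wave.open(source, "rb") as wav:
        return wav.getframerate()


def rate_matches(source, expected_hz: int) -> bool:
    """True if the recording's rate equals the rate the config expects."""
    return wav_sample_rate(source) == expected_hz


def make_silent_wav(rate_hz: int, seconds: float = 0.1) -> io.BytesIO:
    """Build a short silent 16-bit mono WAV in memory (stand-in for a recording)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00\x00" * int(rate_hz * seconds))
    buffer.seek(0)
    return buffer


print(wav_sample_rate(make_silent_wav(16000)))      # 16000
print(rate_matches(make_silent_wav(16000), 44100))  # False
```

With a real recording, pass the file path instead of the in-memory buffer.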
I have a beta image working pretty solidly now, but want to get it hooked in to our new backend once that is released this week before I declare it “1.0”. You can play with the beta if you like, but my lawyer tells me – IT IS PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED … LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE…
With help from steve.penrod, we found last night that this was the fix for my issue.
On the command line, enter “alsamixer”
Then hit F6, and you should be able to see your microphone.
Select your mic and lower the level. If you are presented with a message such as “No settings for your selection”, hit F5 and you should see an entry for CAPTURE. Mine was raised extremely high (89) and I lowered it to 50.
After that reboot your RPi. You should then see on the view_log, Mycroft recording your speech and an answer to your question in the log, followed by the vocal response from Mycroft.
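One plausible reason a very high capture level hurts recognition is clipping, where samples hit the limits of the 16-bit range. That is an assumption, not something confirmed in this thread. Under that assumption, a rough check on recorded audio could look like the sketch below; `clipped_fraction` and the sample data are hypothetical, and the input is assumed to be 16-bit little-endian mono PCM.

```python
import struct

FULL_SCALE = 32767  # largest positive value of a signed 16-bit sample


def clipped_fraction(chunk: bytes, margin: int = 0) -> float:
    """Fraction of 16-bit samples at or within `margin` of full scale."""
    count = len(chunk) // 2  # two bytes per sample
    if count == 0:
        return 0.0
    samples = struct.unpack("<%dh" % count, chunk[: count * 2])
    limit = FULL_SCALE - margin
    clipped = sum(1 for s in samples if abs(s) >= limit)
    return clipped / count


# Hypothetical data: two samples pinned at the rails, two quiet ones.
too_hot = struct.pack("<4h", 32767, -32768, 100, -100)
print(clipped_fraction(too_hot))  # 0.5
```

If more than a tiny fraction of samples is clipped while speaking normally, lowering the CAPTURE level in alsamixer as described above is worth trying.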
I have also noticed that the wake word needs to be “Hey Mycroft” or else there is no response. Now that I have it working, I’m going to test out different wake words today to verify what is/is not allowed.
I’ve downloaded and installed version 0.8 on a generic Raspberry Pi 3 (I tried it on a Pi 2 as well and got the same outcome). I plugged in my USB microphone and that’s working just fine: the wake-up word activates it to start listening, and if I’m close to the USB microphone it successfully captures the audio data. About half the time it successfully parses the data, and I see in the log windows that it’s responding with an answer (I got it to tell the time, give the weather and a couple of other things).
One more thing, I did do an apt-get update and apt-get upgrade and there was about 100MB of updates applied to the 0.8 image.
However, there are some issues (from my perspective at least) as follows:
I can’t get the audio playback to work at all when it’s plugged into the Pi 2 or Pi 3 (I’ve tried multiple speaker systems). I see that there might be some config settings I’ll need to tweak, but it would be helpful to have a definitive answer on that.
Should we really be using a multi-directional microphone to make it work more reliably when I’m not near the USB microphone?
Even though I configured “metric” settings when I created my account in the backend, it ignores that setting when outputting the results on screen (in the logs).
It thinks I’m in Kansas, USA and gives weather responses for that area. I’m actually in Auckland, New Zealand (very far away from Kansas). Do we need to change the location settings in the image that has been provided, or is there an XML or .conf file we need to tweak to change the location?
Overall, I’m quite intrigued by this technology and look forward to what’s coming next with it.
I’d be interested in contributing financially to it. But would like to understand the product roadmap a little better.
I don’t think this is Mycroft related; you might check alsamixer to see if the correct output is selected and turned up.
A multi-directional mic might indeed help. On the Mark 1 unit we are using a single simple condenser mic with an auto-gain circuit to adjust volume levels, which seems to pick up the room well.
I’m pretty sure the metric setting won’t affect your logging; actually, I’m not 100% sure what that setting does.
Thanks for your prompt response and the answers to my questions; it’s very much appreciated. One further thought on this: it seems that a number of Raspberry Pi users are having similar issues, and perhaps some work could be done on the hardware autodetection to improve things during the initial boot-up. I know the potential array of hardware that could be connected is very large, but not having the correct device ID is a fairly fundamental issue that only reasonably skilful Linux admins will be able to solve, which limits who can use it to start with. Your thoughts?
I’ll do the config tweaks you are suggesting and expect that I’ll be good to go!
I might look into a multi-directional microphone as well though to see if it’s better (it could just be that my NZ accent is a little tough for the system to decode).
I have some thoughts about how your technology could be used (for a particular purpose) and would love to speak more about it and explore those ideas.