[Resolved] Custom wake-word model and constant activation

sparkyvision · May 11, 2021, 1:59pm

Good morning.

I am having just the damnedest time getting my custom wake word model to activate properly. Join me, as we go on a journey of frustration.

Modeling machine: MacBook Pro running Ubuntu 18.04 through VirtualBox
Mycroft: Picroft RP4 with a working USB mic

I’ve been working very hard on creating a custom wake word for a while now. It’s “computer”. The model in the user repository doesn’t work very well, so I thought I’d make my own. I’ve recorded ~300 samples of the wake word, and added something in the neighborhood of ~51,000 not-wake-words to the model. I’ve been retraining from scratch every time I add new data, as per Mr. Eltocino’s suggestions. I’ve also spoken extensively with him about some of these issues, and he’s been incredibly helpful. But this precise issue is just baffling me.

The model, once created, was excellent. It activates perfectly on my modeling machine using precise-listen, doesn’t false activate, etc. Testing showed that it was as close to perfect as I could get, at least without live deployment when it would almost certainly need a few tweaks. However, once the .pb was generated, and uploaded to the Picroft, it wouldn’t activate, ever. Not once. It just sits there, like a bump on a log, stubbornly refusing to listen to the wake word. If manually activated, it would clearly hear and nearly perfectly transcribe my speech, so I know the USB mic is working, and I know it’s capturing correctly. This issue has been referenced before, without resolution. After a LOT of digging and talking to Mr. Eltocino along with this support post, I believed that I had figured out that I needed to be using precise 0.3.0 on the Pi. So, I downloaded it, erased the existing folder and uploaded the new precise-engine, and rebooted. (I also get the error about .params, but I’ve been told multiple times that it’s an error that doesn’t mean anything, so I’m ignoring it for now.)

So, this definitely solves the issue of not activating. Because now, it activates constantly. In a silent (and by silent I mean, maybe there’s a fridge running in the other room, but that’s it) room, it will activate every five seconds, perhaps less. Again, when it’s recording, it understands my speech perfectly.

I have messed with just about every setting possible. I’ve tried messing around with sensitivity and trigger_level, to no avail. I’ve turned on wake word saving, let it run in a room with the TV going and my toddler babbling, and uploaded the 300-some wake-word activations it saved to /tmp, and then re-modeled, in case it’s a case of the background noise. It’s definitely not the background noise. At this point, I’m like 90% sure this isn’t a model issue. There’s a disconnect between how precise evaluates the data, and how Mycroft is calling precise in the background. Something is broken somewhere.

There is one (possibly?) relevant error that I’m seeing in the startup logs in the CLI:

 16:41:45.968 | INFO     |  2762 | mycroft.client.speech.listener:create_wake_word_recognizer:328 | Creating wake word engine
 16:41:45.978 | INFO     |  2762 | mycroft.client.speech.listener:create_wake_word_recognizer:351 | Using hotword entry for computer
 16:41:45.980 | WARNING  |  2762 | mycroft.client.speech.listener:create_wake_word_recognizer:353 | Phonemes are missing falling back to listeners configuration
 16:41:45.983 | WARNING  |  2762 | mycroft.client.speech.listener:create_wake_word_recognizer:357 | Threshold is missing falling back to listeners configuration
 16:41:45.988 | INFO     |  2762 | mycroft.client.speech.hotword_factory:load_module:467 | Loading "computer" wake word via precise
 16:41:47.119 | ERROR    |  2762 | mycroft.client.speech.hotword_factory:initialize:492 | Could not create hotword. Falling back to default.
Traceback (most recent call last):
  File "/home/pi/mycroft-core/mycroft/client/speech/hotword_factory.py", line 480, in initialize
    instance = clazz(hotword, config, lang=lang)
  File "/home/pi/mycroft-core/mycroft/client/speech/hotword_factory.py", line 228, in __init__
    self.runner.start()
  File "/home/pi/mycroft-core/.venv/lib/python3.7/site-packages/precise_runner/runner.py", line 159, in start
    self.engine.start()
  File "/home/pi/mycroft-core/.venv/lib/python3.7/site-packages/precise_runner/runner.py", line 53, in start
    self.proc = Popen(self.exe_args, stdin=PIPE, stdout=PIPE)
  File "/usr/lib/python3.7/subprocess.py", line 775, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.7/subprocess.py", line 1522, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
PermissionError: [Errno 13] Permission denied: '/home/pi/.mycroft/precise/precise-engine/precise-engine'
 16:41:47.125 | INFO     |  2762 | mycroft.client.speech.hotword_factory:load_module:467 | Loading "computer" wake word via pocketsphinx
 16:41:47.230 | INFO     |  2762 | mycroft.client.speech.listener:create_wakeup_recognizer:365 | creating stand up word engine
 16:41:47.232 | INFO     |  2762 | mycroft.client.speech.hotword_factory:load_module:467 | Loading "wake up" wake word via pocketsphinx
 16:41:47.312 | INFO     |  2728 | mycroft.skills.msm_wrapper:create_msm:111 | Releasing MSM instantiation lock.
 16:41:47.314 | INFO     |  2728 | mycroft.skills.skill_updater:_log_next_download_time:265 | Next scheduled skill update: 2021-05-11 17:40:17.920100
 16:41:47.317 | INFO     |  2728 | mycroft.skills.skill_loader:load:185 | ATTEMPTING TO LOAD SKILL: mycroft-pairing.mycroftai
 16:41:47.339 | INFO     |  2762 | __main__:on_ready:179 | Speech client is ready.

I’m happy to upload any other logs, anything anybody wants. My entire model sample set, whatever. This problem is confounding me, and I think I have enough high-quality data in my model sample set that it’s not that.

Any help would be sincerely and truly appreciated. I just don’t know where to look in the logs or debug information to figure out what’s going on. Once I have this figured out, I will be extremely happy to write a detailed step-by-step instructional thing for anyone else who wants to train their own model.

[edit 1] Spelling, missing words, minor clarifications.

[edit 2] To rule out a microphone problem, I went back and listened to the data it saved from the saved wake-word activations. They’re just tiny snippets of room noise. No static, pops, or clicks or any other odd audio artifacts.

[edit 3] Logs. I swear, I’m trying to provide anything helpful.

[edit 4] Here are my model files in case anybody wants to give it a whirl.

j1nx · May 11, 2021, 6:03pm

That log line tells you that you are NOT using your precise model (probably because of the error) but pocketsphinx instead, which explains all the false activations.

I can’t really help you debug it, but I would start by fixing the permission error. Follow the path, directory by directory, to see where it needs some chown changes.
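The directory-by-directory check can be sketched in Python roughly as follows. This is a minimal sketch, not anything shipped with Mycroft: the function name `describe_path_permissions` is made up for illustration, and the engine path is simply the one from the traceback above. It reports the mode string, owning uid, and whether the current user can execute (or, for a directory, search) each component.

```python
import os
import stat
from pathlib import Path


def describe_path_permissions(target):
    """Return (path, mode string, owner uid, usable) for every component of target.

    "usable" means the current user has execute permission on a file,
    or search permission on a directory.
    """
    target = Path(target)
    # Order the components from the filesystem root down to the target itself.
    components = list(reversed(target.parents)) + [target]
    report = []
    for part in components:
        info = part.stat()
        mode = stat.filemode(info.st_mode)
        usable = os.access(part, os.X_OK)
        report.append((str(part), mode, info.st_uid, usable))
    return report


if __name__ == "__main__":
    # Path taken from the PermissionError in the log; adjust for your install.
    engine = "/home/pi/.mycroft/precise/precise-engine/precise-engine"
    if os.path.exists(engine):
        for path, mode, uid, usable in describe_path_permissions(engine):
            flag = "ok" if usable else "NOT EXECUTABLE/SEARCHABLE"
            print(f"{mode} uid={uid} {path} [{flag}]")
```

Any line flagged as not executable or searchable is a candidate for the Errno 13 that `Popen` raised.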

sparkyvision · May 11, 2021, 6:08pm

I noticed that as I was going along too. Do you know what user needs to own that? Or would chmod 775 be acceptable? It looks like the user “pi” owns everything already…

sparkyvision · May 11, 2021, 6:31pm

So, I had to chmod 777 to get that error to go away. Now that I’ve done that, the output looks like this:

 19:21:49.116 | INFO     |  8472 | mycroft.client.speech.listener:create_wake_word_recognizer:328 | Creating wake word engine
 19:21:49.121 | INFO     |  8472 | mycroft.client.speech.listener:create_wake_word_recognizer:351 | Using hotword entry for computer
 19:21:49.127 | WARNING  |  8472 | mycroft.client.speech.listener:create_wake_word_recognizer:353 | Phonemes are missing falling back to listeners configuration
 19:21:49.129 | WARNING  |  8472 | mycroft.client.speech.listener:create_wake_word_recognizer:357 | Threshold is missing falling back to listeners configuration
 19:21:49.134 | INFO     |  8472 | mycroft.client.speech.hotword_factory:load_module:467 | Loading "computer" wake word via precise
 19:21:49.241 | INFO     |  8399 | __main__:_get_pairing_status:90 | Device is paired
 19:21:49.244 | INFO     |  8399 | __main__:_update_system_clock:102 | Updating the system clock via NTP...
 19:21:50.324 | INFO     |  8472 | mycroft.client.speech.listener:create_wakeup_recognizer:365 | creating stand up word engine
 19:21:50.326 | INFO     |  8472 | mycroft.client.speech.hotword_factory:load_module:467 | Loading "wake up" wake word via pocketsphinx
 19:21:50.434 | INFO     |  8472 | __main__:on_ready:179 | Speech client is ready.

I’m not sure if this means it’s loading. However, now that I’ve solved that file permissions error, we’re back to the wake word never being acknowledged.
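One way to check which engine actually ended up loaded is to scan the startup log for the `Loading "..." wake word via ...` lines and for the fallback error. The sketch below assumes the log format shown in this thread; the function name `engines_in_use` is hypothetical. If precise fails and pocketsphinx is loaded afterwards for the same wake word, the later line wins, which matches what happens at runtime.

```python
import re

# Matches lines such as: Loading "computer" wake word via precise
LOAD_RE = re.compile(r'Loading "(?P<word>[^"]+)" wake word via (?P<engine>\w+)')
FALLBACK_MARKER = "Could not create hotword. Falling back to default."


def engines_in_use(log_lines):
    """Return ({wake word: last engine loaded for it}, whether a fallback was logged)."""
    loaded = {}
    fell_back = False
    for line in log_lines:
        if FALLBACK_MARKER in line:
            fell_back = True
        match = LOAD_RE.search(line)
        if match:
            # A later "Loading" line for the same word overrides the earlier one.
            loaded[match.group("word")] = match.group("engine")
    return loaded, fell_back


if __name__ == "__main__":
    sample = [
        'hotword_factory:load_module:467 | Loading "computer" wake word via precise',
        'hotword_factory:load_module:467 | Loading "wake up" wake word via pocketsphinx',
    ]
    print(engines_in_use(sample))
```

With the first log in this thread, "computer" would map to pocketsphinx with the fallback flag set; with the log just above, it maps to precise and no fallback is reported.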

sparkyvision · May 11, 2021, 6:53pm

Okay…so. It seems that it might have been solved. precise-engine needs to have full read / write / modify permissions. Or, it might need something more restrictive, but I don’t know what that is. Anyway.
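On the "something more restrictive" question: an Errno 13 from `Popen` on the binary itself typically means the file is missing its execute bit (or a parent directory is missing search permission), so adding execute permission alone may be enough, without opening up write access the way 777 does. A minimal sketch, assuming that is the cause here; the helper name `add_execute_bit` is made up for illustration:

```python
import os
import stat


def add_execute_bit(path):
    """Add execute permission for user, group, and others; leave all other bits alone.

    Returns the resulting permission bits (for example 0o755 for a file that was 0o644).
    """
    current = stat.S_IMODE(os.stat(path).st_mode)
    wanted = current | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    if wanted != current:
        os.chmod(path, wanted)
    return wanted


if __name__ == "__main__":
    # Path taken from the traceback earlier in the thread; adjust for your install.
    engine = "/home/pi/.mycroft/precise/precise-engine/precise-engine"
    if os.path.exists(engine):
        print(oct(add_execute_bit(engine)))
```

This is the Python equivalent of adding execute permission with chmod, and it keeps the file non-writable for group and others.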

Once the permissions were set, it seems that I had settings for sensitivity and trigger_level set too high (I was trying to see if setting them higher would cut down on the false activations.) and once those were adjusted, things are starting to work a bit better.
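For anyone following along, the two settings live in the hotword entry of the Mycroft configuration. The sketch below builds such an entry in Python and prints it as JSON. The model path is a placeholder, and the values 0.5 and 3 are only what I understand the defaults to be, so treat them as assumptions. As far as the precise-runner documentation describes it, `sensitivity` runs from 0.0 to 1.0, and `trigger_level` is the number of chunk activations needed before the wake word fires, so a higher `trigger_level` would plausibly add lag.

```python
import json


def hotword_entry(model_path, sensitivity=0.5, trigger_level=3):
    """Build a hotword config entry for a local precise model (values are assumptions)."""
    return {
        "module": "precise",
        "local_model_file": model_path,
        "sensitivity": sensitivity,      # 0.0 to 1.0
        "trigger_level": trigger_level,  # chunk activations needed to trigger
    }


# Hypothetical model location; point this at wherever the .pb was uploaded.
config = {
    "listener": {"wake_word": "computer"},
    "hotwords": {
        "computer": hotword_entry("/home/pi/.mycroft/precise/computer.pb"),
    },
}

if __name__ == "__main__":
    print(json.dumps(config, indent=2))
```

Changing one value at a time and restarting the speech client makes it easier to tell which setting is responsible for a change in behavior.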

The activation is quite a bit slower than “Hey Mycroft” (less than a second of lag, but noticeably laggier) but I might be able to mess around with things to figure it out. So, I suppose we can mark this as solved, and I’ll get started on a write-up for anyone who comes after. Unless someone has any tips about the minor lagginess.

[edit] Adjusting trigger_level and sensitivity seems to have made it less laggy, so I’ll keep adjusting those.

j1nx · May 11, 2021, 9:04pm

If you debug the listener delay, make sure you use the mycroft-cli-client and watch the log output to see when the wake word is recognized, rather than going by the listener sound, as there might also be a lag to tweak in playing that wav file.

baconator · May 11, 2021, 9:11pm

I swapped the start listening sound for a silent, .05s long wav file instead. Has worked well for me.
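A silent replacement file like that can be generated with Python's standard `wave` module. This is a minimal sketch: the function name, the 16 kHz mono 16-bit format, and the output filename are my own choices for illustration, not something Mycroft requires.

```python
import wave


def write_silent_wav(path, seconds=0.05, sample_rate=16000):
    """Write a mono, 16-bit PCM wav file containing only silence; return the frame count."""
    n_frames = round(seconds * sample_rate)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * n_frames)  # zero-valued samples are silence
    return n_frames


if __name__ == "__main__":
    # Hypothetical output name; point the start-listening sound setting at this file.
    write_silent_wav("silent_start_listening.wav")
```

The resulting file can then be referenced wherever the start-listening sound is configured.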

MoffKalast · May 11, 2021, 9:51pm

Oh hey does that allow you to basically have it listen to a full sentence like “hey mycroft what’s the time” and have it actually respond?

That’s been a major gripe of mine for a while, since you can’t speak to it normally and have to wait like a caveman. I can’t see why it wouldn’t be doable by just caching the last minute (or so) of recordings and then just sampling the recorded data directly after the wake word has been recognised, but this workaround could work too I suppose.
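The caching idea above is essentially a ring buffer of recent audio chunks. The sketch below shows only the buffering part, using `collections.deque`; it is not how Mycroft's listener is implemented, and the class name, chunk size, and sample rate are assumptions chosen for illustration.

```python
from collections import deque


class AudioRingBuffer:
    """Keep only the most recent audio chunks, discarding the oldest automatically."""

    def __init__(self, seconds, sample_rate=16000, chunk_samples=1024):
        # Number of whole chunks needed to cover the requested duration (at least one).
        max_chunks = max(1, int(seconds * sample_rate) // chunk_samples)
        self.chunks = deque(maxlen=max_chunks)

    def push(self, chunk):
        """Append a chunk of raw audio bytes; the oldest chunk drops off when full."""
        self.chunks.append(chunk)

    def snapshot(self):
        """Return the buffered audio, oldest first, as one bytes object."""
        return b"".join(self.chunks)


if __name__ == "__main__":
    buf = AudioRingBuffer(seconds=60)
    buf.push(b"\x00\x00" * 1024)
    print(len(buf.snapshot()))
```

A listener built this way would keep pushing chunks while waiting for the wake word, then take a snapshot at detection time and keep recording, so speech that immediately follows the wake word is not lost.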

baconator · May 11, 2021, 10:05pm

it’s more like a good comma-length pause? This is also on a decent x86 cpu machine.

sparkyvision · May 12, 2021, 12:58am

Because I feel like making a new topic is gonna be silly, I’ll ask for anyone who sees: acknowledge.mp3 isn’t playing when Mycroft understands a command. The relevant lines in the configs point to the right files, and I can manually play the mp3 with mpg123 from a command line. Anyone else seeing this behavior?