burns
February 22, 2024, 3:28pm
Whoa! This is pretty awful. I first said “what time is it” and that came out OK. Then I said “set daytime mode”, slowly and distinctly. It should have been caught by the Hubitat skill, but it was never transcribed correctly:
2024-02-22 10:25:10.316 - voice - ovos_dinkum_listener.voice_loop.hotwords:found:261 - DEBUG - Detected wake_word: hey_neon
2024-02-22 10:25:10.322 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_detect_ww:434 - DEBUG - Wake word detected=hey_neon
2024-02-22 10:25:10.325 - voice - ovos_dinkum_listener.service:_hotword_audio:542 - DEBUG - Handling listen sound: snd/start_listening.wav
2024-02-22 10:25:10.328 - voice - ovos_dinkum_listener.service:_hotword_audio:561 - DEBUG - Emitting hotword event: recognizer_loop:wakeword
2024-02-22 10:25:10.330 - voice - ovos_dinkum_listener.service:_record_begin:433 - DEBUG - Record begin
2024-02-22 10:25:12.752 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:222 - INFO - Wakeword detected
2024-02-22 10:25:12.755 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:246 - DEBUG - waiting for speech
2024-02-22 10:25:12.800 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:246 - DEBUG - waiting for speech
2024-02-22 10:25:12.811 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:246 - DEBUG - waiting for speech
2024-02-22 10:25:12.838 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:249 - DEBUG - recording speech
2024-02-22 10:25:12.857 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:249 - DEBUG - recording speech
2024-02-22 10:25:12.893 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:249 - DEBUG - recording speech
2024-02-22 10:25:12.909 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:249 - DEBUG - recording speech
2024-02-22 10:25:12.936 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:249 - DEBUG - recording speech
2024-02-22 10:25:12.946 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:249 - DEBUG - recording speech
2024-02-22 10:25:12.970 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:249 - DEBUG - recording speech
2024-02-22 10:25:12.980 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:249 - DEBUG - recording speech
2024-02-22 10:25:13.012 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:252 - INFO - speech finished
2024-02-22 10:25:13.270 - voice - neon_stt_plugin_google_cloud_streaming:handle_audio_stream:128 - DEBUG - alternatives {
transcript: "in"
confidence: 0.279931456
}
alternatives {
transcript: "Inn"
confidence: 0.279931456
}
alternatives {
transcript: "here"
confidence: 0.140002295
}
is_final: true
result_end_time {
seconds: 1
nanos: 260000000
}
language_code: "en-us"
2024-02-22 10:25:13.291 - voice - neon_stt_plugin_google_cloud_streaming:handle_audio_stream:134 - DEBUG - ['in', 'Inn', 'here']
2024-02-22 10:25:13.294 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_after_cmd:668 - DEBUG - transformers metadata: {'client_name': 'ovos_dinkum_listener', 'source': 'audio', 'destination': ['skills'], 'timing': {'transform_audio': 0.0002658367156982422}, 'transcription': 'in'}
2024-02-22 10:25:13.296 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_after_cmd:669 - INFO - transcribed: in
2024-02-22 10:25:13.300 - voice - ovos_dinkum_listener.service:_stt_text:573 - DEBUG - Record end
2024-02-22 10:25:13.306 - voice - ovos_dinkum_listener.service:_stt_text:580 - DEBUG - STT: in
2024-02-22 10:25:13.313 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_after_cmd:698 - DEBUG - reset VAD
Of course, given that it came out as a single word, we can now see why it did not respond. But wow, it’s not even close!
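To pull the STT result out of a long voice.log without scrolling through every "recording speech" line, a small filter can help. This is a minimal sketch that assumes the two line formats shown above (the handle_audio_stream list of alternatives and the "transcribed:" line); summarize_stt is a hypothetical helper, not part of OVOS or Neon.

```python
import ast
import re

# "transcribed: ..." INFO line from the listener (format as seen in the log above)
TRANSCRIBED_RE = re.compile(
    r"^(\S+ \S+) - voice - .*_after_cmd:\d+ - INFO - transcribed: (.*)$")
# DEBUG line where the Google streaming STT plugin logs its alternatives as a list
ALTERNATIVES_RE = re.compile(
    r"^(\S+ \S+) - voice - .*handle_audio_stream:\d+ - DEBUG - (\[.*\])$")


def summarize_stt(lines):
    """Return (timestamp, transcript, alternatives) tuples from voice.log lines.

    Assumes (hypothetically) that an alternatives line, when present,
    comes before its matching "transcribed:" line.
    """
    results = []
    pending_alternatives = []
    for line in lines:
        line = line.rstrip("\n")
        alt = ALTERNATIVES_RE.match(line)
        if alt:
            # The list is logged as a Python repr, so literal_eval parses it safely
            pending_alternatives = ast.literal_eval(alt.group(2))
            continue
        tx = TRANSCRIBED_RE.match(line)
        if tx:
            results.append((tx.group(1), tx.group(2), pending_alternatives))
            pending_alternatives = []
    return results
```

Run over a file with summarize_stt(open("voice.log")). For the second log in this thread, the entry for "the union" would have an empty alternatives list, because that request went through the fallback STT plugin and no alternatives line was logged for it.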
burns
February 22, 2024, 4:12pm
Here is another one. Again, I was close to the MkII and spoke clearly and slowly, saying “Check for updates”. The result in voice.log:
2024-02-22 11:10:11.553 - voice - ovos_dinkum_listener.voice_loop.hotwords:found:261 - DEBUG - Detected wake_word: hey_neon
2024-02-22 11:10:11.558 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_detect_ww:434 - DEBUG - Wake word detected=hey_neon
2024-02-22 11:10:11.562 - voice - ovos_dinkum_listener.service:_hotword_audio:542 - DEBUG - Handling listen sound: snd/start_listening.wav
2024-02-22 11:10:11.572 - voice - ovos_dinkum_listener.service:_hotword_audio:561 - DEBUG - Emitting hotword event: recognizer_loop:wakeword
2024-02-22 11:10:11.576 - voice - ovos_dinkum_listener.service:_record_begin:433 - DEBUG - Record begin
2024-02-22 11:10:14.832 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:222 - INFO - Wakeword detected
2024-02-22 11:10:14.833 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:246 - DEBUG - waiting for speech
2024-02-22 11:10:14.842 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:246 - DEBUG - waiting for speech
2024-02-22 11:10:14.848 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:246 - DEBUG - waiting for speech
2024-02-22 11:10:14.855 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:249 - DEBUG - recording speech
2024-02-22 11:10:14.864 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:249 - DEBUG - recording speech
2024-02-22 11:10:14.870 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:249 - DEBUG - recording speech
2024-02-22 11:10:14.878 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:249 - DEBUG - recording speech
2024-02-22 11:10:14.885 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:249 - DEBUG - recording speech
2024-02-22 11:10:14.890 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:249 - DEBUG - recording speech
2024-02-22 11:10:14.895 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:249 - DEBUG - recording speech
2024-02-22 11:10:14.907 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:249 - DEBUG - recording speech
2024-02-22 11:10:14.914 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:252 - INFO - speech finished
2024-02-22 11:10:14.919 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_get_tx:635 - INFO - Attempting fallback STT plugin
2024-02-22 11:10:15.635 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_after_cmd:668 - DEBUG - transformers metadata: {'client_name': 'ovos_dinkum_listener', 'source': 'audio', 'destination': ['skills'], 'timing': {'transform_audio': 3.2901763916015625e-05}, 'transcription': 'the union'}
2024-02-22 11:10:15.643 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_after_cmd:669 - INFO - transcribed: the union
2024-02-22 11:10:15.646 - voice - ovos_dinkum_listener.service:_stt_text:573 - DEBUG - Record end
2024-02-22 11:10:15.650 - voice - ovos_dinkum_listener.service:_stt_text:580 - DEBUG - STT: the union
2024-02-22 11:10:15.657 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_after_cmd:698 - DEBUG - reset VAD
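To compare attempts, it can help to measure how long each recording window lasted. A minimal sketch, assuming the timestamp prefix and the "Record begin" / "Record end" messages shown in the logs above; record_durations is a hypothetical helper name.

```python
from datetime import datetime

# Timestamp prefix used by voice.log, e.g. "2024-02-22 11:10:11.576"
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def log_timestamp(line):
    """Parse the leading 23-character timestamp of a voice.log line."""
    return datetime.strptime(line[:23], LOG_TIME_FORMAT)


def record_durations(lines):
    """Yield seconds between each "Record begin" and the next "Record end"."""
    begin = None
    for line in lines:
        text = line.rstrip()
        if text.endswith("Record begin"):
            begin = log_timestamp(text)
        elif text.endswith("Record end") and begin is not None:
            yield (log_timestamp(text) - begin).total_seconds()
            begin = None
```

Applied to the two logs in this thread, it gives about 2.97 s for “set daytime mode” and 4.07 s for “Check for updates”.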
burns
February 22, 2024, 4:49pm
Is it possible that the volume on the MkII microphone has to be higher? When it fails, it often sounds faint or even skips syllables on playback.
I don’t think there’s a way to change the microphone volume, but you could verify they’re set to 100% with pactl list sources | grep Volume:
(venv) neon@neon:~$ pactl list sources | grep Volume
Volume: front-left: 65536 / 100% / 0.00 dB, front-right: 65536 / 100% / 0.00 dB
Base Volume: 65536 / 100% / 0.00 dB
Volume: front-left: 65536 / 100% / 0.00 dB, front-right: 65536 / 100% / 0.00 dB
Base Volume: 65536 / 100% / 0.00 dB
Volume: front-left: 65536 / 100% / 0.00 dB, front-right: 65536 / 100% / 0.00 dB
Base Volume: 65536 / 100% / 0.00 dB
Volume: front-left: 65536 / 100% / 0.00 dB, front-right: 65536 / 100% / 0.00 dB
Base Volume: 65536 / 100% / 0.00 dB
Volume: mono: 65536 / 100% / 0.00 dB
Base Volume: 65536 / 100% / 0.00 dB
Volume: mono: 65536 / 100% / 0.00 dB
Base Volume: 65536 / 100% / 0.00 dB
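Rather than eyeballing every line, the same check can be scripted. This is a minimal sketch, assuming each "Volume:" line lists all of its channels on one line, as in the output above; volumes_below and read_pactl_sources are hypothetical helper names, and the second one needs PulseAudio's pactl on the PATH.

```python
import re
import subprocess

# Captures the percentage field of each channel, e.g. "100" from "65536 / 100% / 0.00 dB"
PERCENT_RE = re.compile(r"(\d+)%")


def read_pactl_sources():
    """Return the raw text of `pactl list sources` (same command as above, minus grep)."""
    return subprocess.run(["pactl", "list", "sources"],
                          capture_output=True, text=True, check=True).stdout


def volumes_below(pactl_output, target=100):
    """Return "Volume:" lines (not "Base Volume:") with any channel below target percent."""
    low = []
    for line in pactl_output.splitlines():
        stripped = line.strip()
        if not stripped.startswith("Volume:"):
            continue  # skips "Base Volume:" and every other field
        percents = [int(p) for p in PERCENT_RE.findall(stripped)]
        if any(p < target for p in percents):
            low.append(stripped)
    return low
```

On the device, volumes_below(read_pactl_sources()) returning an empty list means every source is at 100%, which matches the output above.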
burns
February 22, 2024, 5:53pm
You’re right. Mine is exactly the same as yours.
How “slowly and distinctly” are you speaking? What immediately comes to mind for me is that, because of how STT works, it isn’t like a human listener, for whom speaking much, much slower and enunciating with emphasis helps. There is a “sweet spot” where you are speaking normally but clearly. In my experience and understanding it’s more like a bell curve: the center is closest to the training data. At one end, speaking very quickly and slurring words together is hard for the system to understand; at the other, speaking too slowly, with unusually enunciated words, is also difficult for it to transcribe accurately. Speaking much, much slower can even make it think you’ve ended your sentence before your words are finished.
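That last point follows from how end-of-speech detection typically works: recording stops once the listener has heard a certain amount of continuous silence after some speech. The sketch below is a simplified illustration of that idea, not the actual dinkum listener or its VAD. It assumes audio has already been split into fixed-length frames labeled speech or silence, and the 30 ms frame size and 600 ms silence limit are made-up values.

```python
def find_speech_end(frames, frame_ms=30, silence_limit_ms=600):
    """Return the index of the frame where recording would stop, or None.

    frames: sequence of booleans, True if a (hypothetical) VAD judged the
    frame to contain speech. Recording stops after silence_limit_ms of
    consecutive silence that follows at least one speech frame.
    """
    heard_speech = False
    silent_ms = 0
    for i, is_speech in enumerate(frames):
        if is_speech:
            heard_speech = True
            silent_ms = 0  # any speech resets the silence counter
        elif heard_speech:
            silent_ms += frame_ms
            if silent_ms >= silence_limit_ms:
                return i
    return None


# Two "words" of 10 frames each, separated by a short or a long pause
normal = [True] * 10 + [False] * 5 + [True] * 10 + [False] * 25
slow = [True] * 10 + [False] * 25 + [True] * 10 + [False] * 25
```

With these assumed numbers, the 150 ms gap in normal is tolerated and recording ends at frame 44, after both words. The 750 ms pause in slow ends the recording at frame 29, before the second word (frames 35 to 44) is ever captured.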
The other error I’ve experienced is that if I wait significantly past the listening “boop” sound, it will time out without recording a response from me. The best approach right now, for general usage, is to say “Hey Neon / Mycroft what time is it?” without any pause between the wakeword and the command. It is set to go back and catch all the speech once it recognizes that you said a wakeword, so there is no need to wait for the “boop” except when Neon itself is prompting you with a query.
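“Going back to catch the speech” is commonly done by keeping a rolling buffer of recent audio, so the words spoken right after the wakeword are already captured when detection fires. This is a minimal sketch of that general idea, not the dinkum listener’s actual code; PreRollRecorder, its chunk counts, and the lookback_chunks value a detector would report are all hypothetical.

```python
from collections import deque


class PreRollRecorder:
    """Keep recent audio chunks so speech right after the wakeword is not lost."""

    def __init__(self, preroll_chunks=20):
        # Oldest chunks are dropped automatically once maxlen is reached
        self.buffer = deque(maxlen=preroll_chunks)
        self.recording = None

    def feed(self, chunk):
        """Route each incoming chunk to the active recording, or else to the buffer."""
        if self.recording is not None:
            self.recording.append(chunk)
        else:
            self.buffer.append(chunk)

    def on_wake_word(self, lookback_chunks):
        """Start recording, seeded with the last lookback_chunks buffered chunks."""
        buffered = list(self.buffer)
        self.recording = buffered[-lookback_chunks:] if lookback_chunks > 0 else []
        self.buffer.clear()

    def finish(self):
        """Return the recorded chunks and go back to buffering."""
        audio = self.recording or []
        self.recording = None
        return audio
```

For example, if detection fires only after "hey", "neon", "what", "time" have been fed, on_wake_word(2) keeps "what" and "time", so saying the command without a pause still yields the whole request once "is" and "it" arrive.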
Hoping this helps.
Clary