Help to understand response time


For several weeks I have been trying to figure out how to improve the response time for a decoded answer.
I use the CLI to determine how fast Mycroft detects the wake word, how fast it decodes my voice to text (STT), and how fast I get an answer.

The first two I have optimized and they are quite snappy.
But the last one I don’t understand. I can see the answer in the CLI and still need to wait 12 seconds until I hear the same answer over my speakers.

Now my question: when I see the yellow answer in the CLI, is it already processed? That is, has the TTS already happened, or does Mycroft only then send the answer text to the server, wait until it gets the MP3 back, and then play it?


The 3rd step should be split into two steps: how fast you get an answer and how fast it speaks the answer. It sounds like the answer piece is pretty quick, which is also my observation, but the TTS piece is running slower than you’re expecting.

Which TTS engine are you using?

I use the standard TTS. How can I see when the “mp3” from the server arrives back to Mycroft and when it gets played?

EDIT: Do I need to set the log level to DEBUG?

Tail the audio log; it should give a reasonably accurate approximation of when events occur.
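If reading the timestamps by eye gets tedious, the gap between two log events can be computed with a few lines of Python. This is only a sketch: it assumes each log line starts with a timestamp like "2020-01-01 12:00:00.000", and the marker strings are placeholders, so substitute text that really appears in your own audio log.

```python
from datetime import datetime

# Assumption: every log line starts with "YYYY-MM-DD HH:MM:SS.mmm".
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
TIMESTAMP_LENGTH = len("2020-01-01 12:00:00.000")


def parse_timestamp(line):
    """Return the leading timestamp of a log line, or None if it has none."""
    try:
        return datetime.strptime(line[:TIMESTAMP_LENGTH], TIMESTAMP_FORMAT)
    except ValueError:
        return None


def seconds_between(lines, start_marker, end_marker):
    """Seconds from the first line containing start_marker to the next
    later line containing end_marker. Returns None if either is missing."""
    start = None
    for line in lines:
        stamp = parse_timestamp(line)
        if stamp is None:
            continue  # skip lines without a timestamp (e.g. tracebacks)
        if start is None:
            if start_marker in line:
                start = stamp
        elif end_marker in line:
            return (stamp - start).total_seconds()
    return None
```

To use it, open the audio log (its location depends on your install), pass the file object as `lines`, and pick two marker strings: one that shows up when the TTS result arrives and one that shows up when playback starts.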

Thanks a lot, that helped tremendously!
I have now figured out that the playback took around 10 seconds to start.
I changed the “play_mp3_cmdline” setting to mplayer and got it down to about 2-3 seconds.

Have a look at your pulseaudio settings. It is probably resampling.

Really? Hmm, where are those settings? As far as I know I have not set up any resampling. Maybe I should, but I don’t know to what bitrate etc.

Arch linux wiki to the rescue (as always);

I am not saying this is your problem, but since the TTS response is fairly OK and the MP3 is there, and it just takes seconds to actually play it, the PulseAudio settings are most likely the place to look…

Thanks, and no, it’s not PulseAudio itself. It is mplayer, which takes around 2 seconds to start up before it begins playing the audio file.
I’m still looking for another player that can stream over pulse, but I haven’t found one that is faster. Only my squeezeboxserver plays mp3 etc. almost instantly and has only the PulseAudio delay, so around 40 ms.

Do other people have similar experience with pulse or am I the only one?

This is my mycroft.conf

Default mpg123 works perfectly.

OK, thanks. What do you mean by perfectly? How quickly do you hear the audio after you see the processed answer in the CLI (the yellow one)?
And do you use pulse over TCP or RTP?

To be honest, never checked through the logs. But basically straight after the thinking visual is stopped.

For instance. Don’t look at the weather request as that is always a bit delayed. Have a look at the date and time request after that for instance.

But the Audio is directly played through the HDMI to the TV?

Yes indeed. You have something different?

I did a lot of Linux sound architecture research over the year, aiming to make MycroftOS feel snappy at the user level. I think/hope I can help you out with all the stuff I learned.

Yes, I run a Mycroft instance as an LXC container on Proxmox. In the living room there is an RPi connected to a soundbar over S/PDIF, acting as a PulseAudio sink. Other containers like Kodi or squeezeboxserver also use the PulseAudio sink in the living room.

So wav runs over paplay and mp3 over mplayer. The latency added by PulseAudio is only 40 ms; however, as already described, mplayer and paplay take some time to launch (~2 seconds) and therefore cause an unnecessary delay…

"play_wav_cmdline": "paplay -d alsa_output.platform-soc_sound.iec958-stereo %1"
"play_mp3_cmdline": "mplayer -ao pulse::alsa_output.platform-soc_sound.iec958-stereo %1"
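To compare players outside of Mycroft, the launch overhead can be measured directly. A minimal sketch, assuming you have a short test clip of known length: `time_command` is a helper name made up for this example, and it reports the wall time of the whole command, so the start-up overhead is roughly that time minus the clip’s duration.

```python
import subprocess
import time


def time_command(argv):
    """Run argv to completion and return (elapsed_seconds, return_code).

    The elapsed time covers process start-up plus the full playback,
    not just the delay before the first sound.
    """
    started = time.perf_counter()
    completed = subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    elapsed = time.perf_counter() - started
    return elapsed, completed.returncode
```

For example, call it once with the paplay command and once with the mplayer command from the config lines above, replacing %1 with the path to the same test file each time, and compare the two results.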

Ah, right! Now I am connecting the dots. It’s you from that other thread🤦‍♂️

Ok, reading back a bit. Will be back when I have another idea (or not)

Oh boy I found the problem.
Let me explain. Currently I’m developing a new skill, and since I can’t always talk to Mycroft, I use the CLI to enter the phrases that get processed.
I use the MobaXterm SSH client to connect to the Mycroft instance, and by default an X11 server is activated once you connect to the system. So when mplayer plays the wav or mp3 from Mycroft, it first tries to open a window through X11 and play the audio there, until it detects that this is not possible (since it is an audio file rather than a video and does not need a window to be shown), and only then starts playing it.
This delay is around 2 seconds. Yesterday I used my phone to open the Mycroft CLI, and it immediately played the mp3 or wav file generated by Mycroft. This helped me narrow down the problem, and now with the X11 server deactivated it runs as quickly as possible.
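One way to check whether the X11 connection is the culprit without changing the SSH client settings is to launch the player with DISPLAY removed from its environment. This is an untested assumption on top of what was found in this thread (here only deactivating the X11 server was confirmed to work), and the function names are made up for the sketch.

```python
import os
import subprocess


def env_without_display(env=None):
    """Return a copy of env (default: the current environment) without DISPLAY."""
    cleaned = dict(os.environ if env is None else env)
    cleaned.pop("DISPLAY", None)
    return cleaned


def run_without_display(argv):
    """Run argv with DISPLAY unset, so the child sees no X11 server."""
    return subprocess.run(
        argv,
        env=env_without_display(),
        capture_output=True,
        text=True,
    )
```

If the player starts without the ~2 second delay when launched this way from the X11-enabled SSH session, that points to the X11 connection rather than the player itself.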


Glad you found it, because that sounds like one of those problems you lose your hair over and never find the solution to😎

Can you try changing the mplayer command to something like “mplayer -novideo”?

No, that did not help. I need to deactivate the X11 server in the SSH client.
