Prefetch and cache wav files?

Using the mimic3-server and the /api/tts endpoint, is there any way to generate WAV files before they are needed and have them stored in the cache for later use (e.g., within the next hour)?

I guess I’m looking for a way to call /api/tts to generate the wav, but not have it play until a later request accesses it from the cache.

That’s exactly what the Assistant (generally) does. Here you can see how OVOS does it in our Mimic3 plugins. The Mimic3 API returns a WAV file, we write it to disk, and then the Assistant plays it: these are separate operations.
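A minimal sketch of the same two-step pattern (synthesize and write now, play later) against a mimic3-server. It assumes the server listens on its usual default, http://localhost:59125, and that GET /api/tts accepts `text` and `voice` query parameters and returns WAV bytes. The function names and the injectable `fetch` parameter are illustrative, not part of any plugin API.

```python
import urllib.parse
import urllib.request
from pathlib import Path
from typing import Callable

# Assumption: a local mimic3-server on its usual default port.
DEFAULT_SERVER = "http://localhost:59125"


def build_tts_url(text: str, voice: str, server: str = DEFAULT_SERVER) -> str:
    """Build a GET URL for /api/tts with 'text' and 'voice' as query parameters."""
    query = urllib.parse.urlencode({"text": text, "voice": voice})
    return f"{server}/api/tts?{query}"


def http_fetch(url: str) -> bytes:
    """Download the WAV bytes returned by the server."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def prefetch_wav(text: str, voice: str, out_path: Path,
                 fetch: Callable[[str], bytes] = http_fetch) -> Path:
    """Step 1: synthesize and write the WAV now. Nothing is played here."""
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_bytes(fetch(build_tts_url(text, voice)))
    return out_path
```

Step 2 (playback) is then just opening the saved file with whatever player the device uses, whenever it is actually needed. Passing `fetch` as a parameter keeps the network call swappable, e.g. for testing without a running server.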

That said, we are encouraging people to migrate to Piper (GitHub: rhasspy/piper), a fast, local neural text-to-speech system.

Piper’s developer was Mimic3’s developer and Mycroft’s last lead dev. Mimic3 is abandonware. The OVOS community has some public Mimic3 instances up for the time being, so Neon and OVOS devices running those plugins will keep working, but nobody is working on Mimic itself.

Legacy mycroft-core had a feature that stored WAV files generated by Mimic2 for spoken phrases; I don’t know whether it made it into Neon/OVOS, though.

I’m going to try modifying

so that if audioTarget is “none” and no-cache is false, it will not play the audio but will just cache it for later use.
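The proposed change boils down to a small dispatch decision. The sketch below is hypothetical: it assumes the existing audioTarget values are "client" (send the WAV back) and "server" (play it on the server), and treats "none" as the proposed new value. `Action` and `choose_action` are illustrative names, not mimic3-server code.

```python
from enum import Enum


class Action(Enum):
    PLAY = "play"              # synthesize (or read from cache) and play on the server
    RETURN = "return"          # send the WAV back to the requesting client
    CACHE_ONLY = "cache_only"  # synthesize into the cache only, play nothing


def choose_action(audio_target: str, no_cache: bool) -> Action:
    """Hypothetical dispatch for the proposed audioTarget="none" behaviour."""
    if audio_target == "none":
        # Caching is the only point of a "none" request, so no-cache makes it a no-op.
        if no_cache:
            raise ValueError('audioTarget "none" combined with no-cache does nothing')
        return Action.CACHE_ONLY
    if audio_target == "server":
        return Action.PLAY
    # Assumed default: anything else behaves like "client".
    return Action.RETURN
```

Rejecting "none" plus no-cache up front avoids silently doing synthesis work whose result is thrown away.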

Yep, it does.


Mycroft had a hardcoded path for Mimic2 only; in OVOS we generalized that to every plugin, so there are two caches:

  • runtime cache: every utterance is saved there, so repeated speech is served from the cache; it is cleared on reboot or when disk space runs low
  • permanent cache: these files are never deleted, but they are not auto-generated either; classic core shipped Mimic2 samples for default dialogs here (things that need to be spoken before Selene is available, such as pairing and Wi-Fi setup)
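The two-tier behaviour described in the list can be sketched as below. This is an illustration under assumptions, not the OVOS implementation: it assumes both caches store files named `<hash>.wav`, checks the permanent cache first, and prunes only the runtime cache (oldest files first) when free space drops below a threshold. All function names are hypothetical.

```python
import shutil
from pathlib import Path
from typing import Optional


def find_cached_wav(sentence_hash: str, permanent_dir: Path,
                    runtime_dir: Path) -> Optional[Path]:
    """Return a cached WAV, checking the permanent cache before the runtime cache."""
    for cache_dir in (permanent_dir, runtime_dir):
        candidate = cache_dir / f"{sentence_hash}.wav"
        if candidate.is_file():
            return candidate
    return None


def prune_runtime_cache(runtime_dir: Path, min_free_bytes: int) -> int:
    """Delete the oldest runtime WAVs until free space reaches min_free_bytes.

    The permanent cache is never touched. Returns the number of files removed.
    """
    if not runtime_dir.is_dir():
        return 0
    removed = 0
    oldest_first = sorted(runtime_dir.glob("*.wav"), key=lambda p: p.stat().st_mtime)
    for wav in oldest_first:
        if shutil.disk_usage(runtime_dir).free >= min_free_bytes:
            break
        wav.unlink()
        removed += 1
    return removed
```

Checking the permanent cache first means curated samples (such as the pairing dialogs) always win over anything regenerated at runtime.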

We also introduced a config flag, “persist_cache”, that saves every new utterance to the permanent cache; it is meant to be enabled only temporarily, while generating that cache.
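Here is a sketch of what a flag like “persist_cache” changes at write time: the runtime cache always gets the file, and the permanent cache gets a copy only while the flag is on. The `<hash>.wav` naming, directory arguments and function names are assumptions for illustration, not the OVOS code.

```python
from pathlib import Path
from typing import List


def cache_targets(sentence_hash: str, runtime_dir: Path, permanent_dir: Path,
                  persist_cache: bool = False) -> List[Path]:
    """Where a newly synthesized WAV should be written."""
    name = f"{sentence_hash}.wav"
    targets = [runtime_dir / name]  # always cached for this session
    if persist_cache:
        targets.append(permanent_dir / name)  # kept across reboots
    return targets


def store_wav(wav_bytes: bytes, sentence_hash: str, runtime_dir: Path,
              permanent_dir: Path, persist_cache: bool = False) -> List[Path]:
    """Write the WAV to every target directory, creating directories as needed."""
    targets = cache_targets(sentence_hash, runtime_dir, permanent_dir, persist_cache)
    for target in targets:
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(wav_bytes)
    return targets
```

Turning the flag off again after a generation run stops the permanent cache from growing with one-off utterances.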


Do you know the algorithm used to generate the names of the WAV files? I’m creating the files (prefetching ahead of when they’re needed), but then I don’t know how to retrieve them based on the text and voice (and any other needed parameters).

It’s just an MD5 hash.
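A minimal sketch of such a name, assuming the spoken text is UTF-8 encoded and hashed exactly as given (no lowercasing or whitespace stripping). If the engine normalizes the text before hashing, the same normalization has to be applied here or the names will not match. The helper name is illustrative.

```python
import hashlib


def hash_sentence(sentence: str) -> str:
    """MD5 hex digest of the UTF-8 encoded sentence, used as the WAV file name stem."""
    return hashlib.md5(sentence.encode("utf-8")).hexdigest()


print(hash_sentence("hello world"))  # 5eb63bbbe01eeed093cb22bb8f5acdc3
```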

The path is built based on the TTS engine being used.
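To retrieve a prefetched file from the text alone, the full path has to be rebuilt the same way. The layout below, `<cache_root>/<tts_name>/[<voice>/]<md5>.wav`, is a hypothetical example, not the actual OVOS directory scheme. It shows one way to keep different voices from colliding when the hash covers only the text: put the voice in the directory name (with "/" replaced so a voice key like "en_US/vctk_low" stays a single folder).

```python
import hashlib
from pathlib import Path
from typing import Optional


def cached_wav_path(cache_root: Path, tts_name: str, sentence: str,
                    voice: Optional[str] = None) -> Path:
    """Hypothetical layout: <cache_root>/<tts_name>/[<voice>/]<md5(sentence)>.wav"""
    digest = hashlib.md5(sentence.encode("utf-8")).hexdigest()
    folder = cache_root / tts_name
    if voice:
        folder = folder / voice.replace("/", "_")  # one directory per voice
    return folder / f"{digest}.wav"
```

The same function serves both sides: the prefetcher writes to `cached_wav_path(...)`, and a later request checks whether that path exists before synthesizing again.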

The TTS cache object can be found here.