Prefetch and cache wav files?

webacad · October 1, 2023, 3:16am

Using the mimic3-server and the /api/tts endpoint, is there any way to generate wav files before they are needed and have them stored in the cache for future (like in the next hour) use?

I guess I’m looking for a way to call /api/tts to generate the wav, but not have it play until a later request accesses it from the cache.

ChanceNCounter · October 1, 2023, 5:30am

That’s exactly what the Assistant (generally) does. Here you can see how OVOS does it in our Mimic3 plugins. The Mimic3 API returns a wav file, we write it, then the Assistant plays it. Separate operations.

That said, we are encouraging people to migrate to Piper (rhasspy/piper on GitHub), a fast, local neural text-to-speech system.

Piper’s developer was Mimic3’s developer and Mycroft’s last lead dev. Mimic3 is abandonware. The OVOS community has some public Mimic3 instances up for the time being, so Neon and OVOS devices running those plugins will keep working, but nobody is working on Mimic itself.
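
The write-then-play separation described above can be sketched on the client side. This is a minimal illustration, not the OVOS plugin code: the server address (Mimic 3's default port 59125 is assumed), the `text` and `voice` query parameters, the cache directory, and the helper names `http_fetch` and `prefetch` are all assumptions, and the file naming is arbitrary.

```python
import hashlib
import urllib.parse
import urllib.request
from pathlib import Path
from typing import Callable

# Assumed server address; adjust to wherever mimic3-server is listening.
TTS_URL = "http://localhost:59125/api/tts"


def http_fetch(text: str, voice: str) -> bytes:
    """Request WAV bytes from the server (audioTarget defaults to "client")."""
    query = urllib.parse.urlencode({"text": text, "voice": voice})
    with urllib.request.urlopen(f"{TTS_URL}?{query}") as response:
        return response.read()


def prefetch(text: str, voice: str, cache_dir: Path,
             fetch: Callable[[str, str], bytes] = http_fetch) -> Path:
    """Return the cached WAV path for (text, voice), synthesizing only if missing."""
    # Arbitrary local naming scheme: hash of voice and text together.
    key = hashlib.md5(f"{voice}\n{text}".encode("utf-8")).hexdigest()
    wav_path = cache_dir / f"{key}.wav"
    if not wav_path.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        wav_path.write_bytes(fetch(text, voice))
    return wav_path
```

Calling `prefetch(...)` ahead of time writes the file; calling it again later with the same text and voice returns the same path without contacting the server, and playback can then be handled by whatever player you prefer.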

Dominik · October 1, 2023, 6:01pm

Legacy mycroft-core had a feature that stored WAV-files generated by Mimic2 for spoken phrases - don’t know if it made it into Neon/OVOS, though.

webacad · October 1, 2023, 6:21pm

I’m going to try modifying

github.com

MycroftAI/mimic3/blob/master/mimic3_http/app.py#L219


      
        text = text[: args.max_text_length]

    # Cache settings
    no_cache_str = request.args.get("noCache", "")
    no_cache = _to_bool(no_cache_str)

    wav_bytes = await text_to_wav(
        TextToWavParams(text=text, **tts_args), no_cache=no_cache
    )

    audio_target = request.args.get("audioTarget", "client").strip().lower()
    if audio_target == "client":
        return Response(wav_bytes, mimetype="audio/wav")

    # Play audio on server
    play_cmd = shlex.split(args.play_program)
    subprocess.run(play_cmd, input=wav_bytes, check=True)

    return "OK"


@app.route("/api/voices", methods=["GET"])

so that if audioTarget is “none” and noCache is false, it will not play the audio, but only cache it for later use.
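
The change described here amounts to one extra branch before the playback step. Below is a minimal sketch of that branching as a standalone function, assuming a hypothetical `"none"` target (the excerpt above only distinguishes `"client"` from everything else). `dispatch_audio` and its `play` callback are illustrative names, not part of Mimic 3.

```python
from typing import Callable, Optional


def dispatch_audio(audio_target: str, wav_bytes: bytes,
                   play: Callable[[bytes], None]) -> Optional[bytes]:
    """Decide what to do with synthesized audio.

    Returns the WAV bytes for "client", otherwise None (standing in for "OK").
    """
    target = audio_target.strip().lower()
    if target == "client":
        return wav_bytes  # send the audio back to the caller
    if target == "none":
        return None  # hypothetical: synthesis already ran, so skip playback
    play(wav_bytes)  # any other value: play on the server
    return None
```

Whether the WAV ends up in the server cache still depends on noCache being false when `text_to_wav` is called. Note also that in the excerpt above the default `client` target already returns the WAV without playing it on the server.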

goldyfruit · October 1, 2023, 10:01pm

Yep, it does.

JarbasAl · October 1, 2023, 10:13pm

Mycroft had a hardcoded path for Mimic2 only. In OVOS we generalized that for every plugin, so there are two caches:

- Runtime cache: every utterance is saved there, so repeated speech is cached. It is deleted on reboot or when disk space runs low.
- Permanent cache: these files are never deleted, but they are not auto-generated either. Classic core shipped Mimic2 samples here for the default dialogs (things that need to be spoken before Selene was available, such as pairing and Wi-Fi setup).

We also introduced a config flag, “persist_cache”, that saves any new utterance to the permanent cache. It is meant to be enabled only temporarily, for generating that cache.
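
The two-tier behaviour described above can be illustrated with a toy model. This is a sketch, not the OVOS implementation: the class `TwoTierCache`, its method names, the `.wav` file naming, the lookup order, and the choice to write to exactly one of the two directories are all assumptions. Only the ideas (a runtime cache for every utterance, and a permanent cache written when `persist_cache` is enabled) come from the description.

```python
import hashlib
from pathlib import Path
from typing import Optional


class TwoTierCache:
    """Toy model of a runtime cache plus an opt-in permanent cache."""

    def __init__(self, runtime_dir: Path, permanent_dir: Path,
                 persist_cache: bool = False):
        self.runtime_dir = runtime_dir
        self.permanent_dir = permanent_dir
        self.persist_cache = persist_cache

    @staticmethod
    def _name(sentence: str) -> str:
        digest = hashlib.md5(sentence.encode("utf-8", "ignore")).hexdigest()
        return f"{digest}.wav"

    def lookup(self, sentence: str) -> Optional[Path]:
        """Return a cached file if present (permanent checked first; an assumption)."""
        for folder in (self.permanent_dir, self.runtime_dir):
            candidate = folder / self._name(sentence)
            if candidate.exists():
                return candidate
        return None

    def store(self, sentence: str, wav_bytes: bytes) -> Path:
        """Write to the permanent cache when persist_cache is set, else to runtime."""
        folder = self.permanent_dir if self.persist_cache else self.runtime_dir
        folder.mkdir(parents=True, exist_ok=True)
        path = folder / self._name(sentence)
        path.write_bytes(wav_bytes)
        return path
```

With `persist_cache=False`, stored utterances land only in the runtime directory, which is the one expected to be wiped on reboot. Enabling the flag for a one-off generation run fills the permanent directory instead.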

webacad · October 18, 2023, 2:48pm

Do you know the algorithm used to generate the names of the wav files? I’m creating the files (prefetching ahead of when they’re needed), but then I don’t know how to retrieve them based on the text and voice (and any other needed parameters).

JarbasAl · October 18, 2023, 3:10pm

It’s just an MD5 hash of the sentence:

github.com

OpenVoiceOS/ovos-plugin-manager/blob/dev/ovos_plugin_manager/utils/tts_cache.py#L14


      
import hashlib  # used by hash_sentence below; not shown in the original excerpt
from os.path import join, isdir
import shutil
from pathlib import Path
from stat import S_ISREG, ST_MTIME, ST_MODE, ST_SIZE

from ovos_config.locations import get_xdg_cache_save_path
from ovos_utils.file_utils import get_cache_directory as get_tmp_cache_dir
from ovos_utils.log import LOG


def hash_sentence(sentence: str):
    """Convert the sentence into a hash value used for the file name

    Args:
        sentence: The sentence to be cached
    """
    encoded_sentence = sentence.encode("utf-8", "ignore")
    sentence_hash = hashlib.md5(encoded_sentence).hexdigest()
    return sentence_hash
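
A quick check of what this function produces. The snippet restates `hash_sentence` so it runs on its own. Because the hash covers the exact string, any difference in case, punctuation or whitespace yields a different file name, so a prefetcher has to send exactly the text that will be spoken later.

```python
import hashlib


def hash_sentence(sentence: str) -> str:
    """MD5 hex digest of the UTF-8 encoded sentence, as in the excerpt above."""
    return hashlib.md5(sentence.encode("utf-8", "ignore")).hexdigest()


print(hash_sentence("hello world"))  # 5eb63bbbe01eeed093cb22bb8f5acdc3
print(hash_sentence("Hello world") == hash_sentence("hello world"))  # False
```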

The cache path is built from the TTS plugin name, the voice and the language:

github.com

OpenVoiceOS/ovos-plugin-manager/blob/dev/ovos_plugin_manager/templates/tts.py#L254C1-L262C35


      
    def get_cache(self, voice=None, lang=None):
        lang = lang or self.lang
        voice = voice or self.voice or "default"
        tts_id = join(self.tts_name, voice, lang)
        if tts_id not in self.caches:
            self.caches[tts_id] = TextToSpeechCache(
                self.config, tts_id, self.audio_ext
            )
        return self.caches[tts_id]

The TTS cache object itself can be found here:

github.com

OpenVoiceOS/ovos-plugin-manager/blob/dev/ovos_plugin_manager/utils/tts_cache.py#L220


      
        except Exception:
            LOG.error(f"Failed to write {self.name} to cache")

    def exists(self):
        return self.path.exists()

    def __str__(self):
        return str(self.path)


class TextToSpeechCache:
    """Class for all persistent and temporary caching operations."""

    def __init__(self, tts_config, tts_name, audio_file_type):
        self.config = tts_config
        self.tts_name = tts_name
        self.audio_file_type = audio_file_type

        persistent_cache = self.config.get("preloaded_cache") or \
                           join(get_xdg_cache_save_path(), tts_name)
        tmp_cache = get_tmp_cache_dir(f"tts/{tts_name}")
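
Putting the three excerpts together, the location of a cached file could be predicted roughly as follows. This is a sketch under assumptions: the cache root is passed in rather than resolved through `get_xdg_cache_save_path()` or `get_tmp_cache_dir()`, the file name is assumed to be the hash plus the audio extension, and `expected_cache_path` and the example names are hypothetical. Compare the result against an actual cache directory before relying on it.

```python
import hashlib
from pathlib import PurePosixPath


def expected_cache_path(cache_root: str, tts_name: str, voice: str,
                        lang: str, sentence: str,
                        audio_ext: str = "wav") -> PurePosixPath:
    """Guess where a cached utterance would live for a given TTS, voice and language."""
    # Mirrors tts_id = join(self.tts_name, voice, lang) from get_cache().
    tts_id = PurePosixPath(tts_name) / (voice or "default") / lang
    # Mirrors hash_sentence(); the "<hash>.<ext>" file name is an assumption.
    sentence_hash = hashlib.md5(sentence.encode("utf-8", "ignore")).hexdigest()
    return PurePosixPath(cache_root) / tts_id / f"{sentence_hash}.{audio_ext}"


# Hypothetical plugin name and cache root, for illustration only.
print(expected_cache_path("/tmp/cache", "example-tts", "", "en-us", "hello world"))
```

A voice name that itself contains a slash simply adds another directory level under the plugin name.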