Fine Tuning Mimic 3

Hey everyone,

I’m currently working on a project that involves speech synthesis using the Mycroft Mimic 3 TTS package in Python. I’ve successfully set it up and got it working, but now I want to fine-tune it to improve the speech quality. I’m reaching out to this community to seek advice on how to proceed.
Specifically, I’d like to enhance intonation, pitch, and speed.

If you have experience training Mimic 3, or recommendations for datasets, libraries, or tools, I’d greatly appreciate your input. Any practical examples, tutorials, or code snippets would also be helpful.

Thanks in advance for your support!

To my knowledge, the tools and instructions for training and fine-tuning Mimic 3 models have not been released to the public. Mimic 3 is based on VITS, with some changes and improvements. The former Mycroft employee who created Mimic 3 is also the creator of the Rhasspy voice assistant, so you may want to have a look at Piper, which is a kind of unofficial successor to Mimic 3: GitHub - rhasspy/piper: A fast, local neural text to speech system


Thank you for your help! I will definitely take a look at Piper. Hopefully I can hack something together in Colab and start training. It’s a shame the model isn’t public, as it is very good and has a permissive license! So much for “open source.” I scoured the GitHub repo and couldn’t find any Python code that directly involves the model.

If you’re interested in Piper, this video tutorial on my YouTube channel might be helpful :wink: