Mimic 3 TTS - speech metadata

Hello :wave:,

Does Mimic 3 support returning metadata about the generated speech? I would like to get timestamps for where each word starts and ends in the output audio. I cannot find anything about this in the documentation.

If not, are there any plans to add support for it?

Thanks! :slight_smile: