A layer (skill or ?) after the TTS .wav generation and before output

DrStein99 · February 12, 2022, 8:35pm

I want to perform additional processing on the .wav file after it is generated by whichever TTS engine is selected. Ideally, my application would take the temporary .wav file after TTS generation, process it, and overwrite it before it is picked up and played through the speaker output.

Is there a skill, module, or other existing feature I can explore for this?

baconator · February 13, 2022, 1:28am

If you’re only looking at audio manipulation, and not additional API calls and the like, you might want to customize a TTS plugin for your usage.

iointerrupt · February 13, 2022, 3:22am

You can modify the play_mp3_cmdline setting in mycroft.conf to process the output file via a bash script before the audio is played. As an example, the default value, I believe, is:

{
  "play_mp3_cmdline": "mpg123 %1"
}

can be modified with something like:

{
  "play_mp3_cmdline": "/bin/bash /home/mycroft/myscripts/process_before_play.sh %1"
}

with a script like:

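#!/bin/bash

# process_audio stands in for your own processing command.
# Use ">" (not ">>") so a leftover file is overwritten, never appended to.
process_audio "$1" > /tmp/myrandom_file.mp3

# Play the processed audio
mpg123 -q "/tmp/myrandom_file.mp3"

# Delete the post-processed audio file after playback
rm "/tmp/myrandom_file.mp3"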

At the end of your script, you just tack on the mpg123 command so that the altered audio file plays through the speaker or whatever output you use. You can do the same with the play_wav_cmdline and play_ogg_cmdline settings.
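Since the original question was about .wav output, here is a minimal sketch of the same wrapper idea for play_wav_cmdline. It assumes sox and paplay are installed; the pitch effect is only an arbitrary example, and the function names process_audio, play_audio, and process_and_play are hypothetical. It writes to a unique temp directory rather than a fixed filename, and falls back to the unprocessed file if processing fails.

```bash
#!/bin/bash
# Sketch of a play_wav_cmdline wrapper (assumes sox and paplay are installed).

# Hypothetical processing step: lower the pitch by 300 cents with sox.
# Swap in whatever processing you actually need.
process_audio() {
    sox "$1" "$2" pitch -300
}

# Assumed player for .wav files; use your own player here.
play_audio() {
    paplay "$1"
}

process_and_play() {
    local input="$1" tmpdir output status
    # A unique temp directory avoids clashes between overlapping utterances.
    tmpdir="$(mktemp -d)" || return 1
    output="$tmpdir/processed.wav"
    if process_audio "$input" "$output"; then
        play_audio "$output"
    else
        # Processing failed: play the original so the speech is not lost.
        play_audio "$input"
    fi
    status=$?
    # Remove the processed copy after playback.
    rm -rf "$tmpdir"
    return "$status"
}

# Run only when a file argument is given (the %1 substituted in mycroft.conf).
if [ "$#" -gt 0 ]; then
    process_and_play "$1"
fi
```

To try it, you would point play_wav_cmdline at this script in the same way as the play_mp3_cmdline example above, with %1 as the file argument.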

gez-mycroft · February 14, 2022, 5:31am

So simple, but I’d never thought of that - nice one!

Gabriel · February 14, 2022, 4:50pm

I like the idea.

Extending on that… The other way around? I mean, after the TTS processes the text, can I get the resulting text and store it? I’m thinking about auto-tagging that can be used for future learning material.

Interesting

DrStein99 · February 14, 2022, 6:32pm

I was using my system to play sound effects, music, etc…

Does the “play_mp3_cmdline” line in mycroft.conf handle ALL mp3 playback, or just the voice?

iointerrupt · February 15, 2022, 1:04am

From what I can tell, anything that’s processed by the TTS engine is temporarily cached in (in my case) “/tmp/mycroft/cache/tts/GoogleTTS/” because I am using Google’s TTS. This is probably dynamic based on the TTS you are using. An audio streaming skill likely places its files in a different location if it caches them for streaming. You can modify your script to only process the audio if the base directory is ‘/tmp/mycroft/cache/tts/’, and otherwise just play the file without processing. Here’s some pseudo-code:

[ "$(dirname "$1")" != "/tmp/mycroft/cache/tts/GoogleTTS" ] && { mpg123 -q "$1"; exit; }

You can copy the temporary TTS file to another location and then play it within the script.
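As a rough sketch of that directory check in a form that works for any TTS engine's subdirectory, the following matches the cache root and everything beneath it. The cache path is taken from the post above and may differ on your system; the function name is_tts_file and the TTS_CACHE_DIR variable are hypothetical.

```bash
#!/bin/bash
# Assumption: TTS output is cached under this directory (path from the post
# above); override TTS_CACHE_DIR if your install differs.
TTS_CACHE_DIR="${TTS_CACHE_DIR:-/tmp/mycroft/cache/tts}"

# Succeeds only when the file sits in the TTS cache or one of its subdirectories.
is_tts_file() {
    local dir
    dir="$(dirname "$1")"
    case "$dir" in
        "$TTS_CACHE_DIR"|"$TTS_CACHE_DIR"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Run only when a file argument is given (the %1 substituted in mycroft.conf).
if [ "$#" -gt 0 ]; then
    if ! is_tts_file "$1"; then
        # Music, sound effects, and so on: play untouched and stop here.
        exec mpg123 -q "$1"
    fi
    # TTS file: the processing and playback steps would go here.
    echo "would process: $1"
fi
```

Matching on the cache root instead of a single engine directory means the check keeps working if you switch TTS engines, as long as they cache under the same root.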

Edit: fixed typo in pseudo code