Encoding issue with mimic3-server (Latin-1 vs UTF-8)

Hi!

I have an encoding issue with mimic3-server:

$ mimic3 --remote --voice 'en_UK/apope_low' "I don’t speak English" | aplay --quiet
Reading text from stdin...
Traceback (most recent call last):
  File "mimic3.py", line 40, in <module>
  File "mimic3_tts/__main__.py", line 129, in main
  File "mimic3_tts/__main__.py", line 450, in process_lines
  File "mimic3_tts/__main__.py", line 397, in process_line
  File "mimic3_tts/__main__.py", line 587, in get_remote_wav_bytes
  File "requests/api.py", line 115, in post
  File "requests/api.py", line 59, in request
  File "requests/sessions.py", line 587, in request
  File "requests/sessions.py", line 701, in send
  File "requests/adapters.py", line 489, in send
  File "urllib3/connectionpool.py", line 703, in urlopen
  File "urllib3/connectionpool.py", line 398, in _make_request
  File "urllib3/connection.py", line 239, in request
  File "http/client.py", line 1255, in request
  File "http/client.py", line 1300, in _send_request
  File "http/client.py", line 164, in _encode
UnicodeEncodeError: 'latin-1' codec can't encode character '\u2019' in position 5: Body ('’') is not valid Latin-1. Use body.encode('utf-8') if you want to send it encoded in UTF-8.
[582387] Failed to execute script 'mimic3' due to unhandled exception!
aplay: read_header:2931: erreur de lecture

However, there is no issue with mimic3:

$ mimic3 --voice 'en_UK/apope_low' "I don’t speak English" | aplay --quiet
Reading text from stdin...
INFO:mimic3_tts.tts:Loaded voice from /usr/share/mycroft/mimic3/voices/en_UK/apope_low

The error message states: "Use body.encode('utf-8') if you want to send it encoded in UTF-8." but I don’t know how to do this. I simply run the server with the command:

$ mimic3-server --num-threads 6

I couldn’t find an option to tell the server that the input is UTF-8 encoded. Here are the versions of mimic3 and mimic3-server:

$ mimic3 --version
0.2.3
$ mimic3-server --version
0.1.1

Here are my locale and system:

$ env | grep LANG
LANG=fr_FR.utf8
GDM_LANG=fr_FR.utf8
$ lsb_release -a
LSB Version:    n/a
Distributor ID: Manjaro-ARM
Description:    Manjaro ARM Linux
Release:        23.02
Codename:       n/a

Okay, I’ll give a tip to get around the issue.

echo "I don’t speak English" | iconv -f UTF-8 -t ISO-8859-1//TRANSLIT | mimic3 --remote --voice 'en_UK/apope_low' | aplay --quiet

This converts UTF-8 strings to ISO-8859-1 (i.e. Latin-1) while attempting to transliterate characters that have no Latin-1 equivalent, like "’".
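
For anyone who wants the same effect without iconv, here is a rough Python sketch of that transliteration step. The `PUNCTUATION` table and the `to_latin1` name are my own illustration, not part of mimic3, and the table only covers a few typographic characters, whereas iconv's //TRANSLIT handles many more.

```python
# Hypothetical mapping of common typographic characters to ASCII stand-ins.
PUNCTUATION = str.maketrans({
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark (the one in "don’t")
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
})


def to_latin1(text: str) -> bytes:
    """Transliterate known punctuation, then encode as Latin-1.

    Anything still outside Latin-1 becomes '?' instead of raising an error.
    """
    return text.translate(PUNCTUATION).encode("latin-1", errors="replace")


print(to_latin1("I don\u2019t speak English"))
```

Characters that Latin-1 already supports, such as "é", pass through unchanged.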

I think this is a bug, because mimic3-server should accept UTF-8 encoding, as mimic3 does without problem.
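
To illustrate what the traceback is saying: Python's http.client encodes a str request body as Latin-1, which fails on the "’" character (U+2019), whereas a body that is already bytes is sent unchanged. Below is a minimal sketch of the difference, using only the standard library. The helper names are hypothetical and this is not the actual mimic3 code.

```python
def latin1_encodable(text: str) -> bool:
    """Return True if text survives the implicit Latin-1 encoding of a str body."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


def to_utf8_body(text: str) -> bytes:
    """Encode explicitly, so the HTTP layer receives bytes and has nothing to guess."""
    return text.encode("utf-8")


text = "I don\u2019t speak English"

# The typographic apostrophe is outside Latin-1, hence the UnicodeEncodeError.
print(latin1_encodable(text))  # False

# Encoding to UTF-8 up front always succeeds for valid text.
print(to_utf8_body(text))
```

Presumably the client-side fix would be to pass the encoded bytes as the `data` argument of `requests.post` in `get_remote_wav_bytes`, but I haven't checked the mimic3 source to confirm that.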

Hi @regivanx

Mimic3 is no longer maintained, since Mycroft AI shut down.

We are currently recommending that most self-hosters move to Piper from Rhasspy, whose lead dev was Mycroft’s lead dev when Mimic3 was written and which can (perhaps consequently) be regarded as a spiritual successor to Mimic.

OVOS devices configured to use public Mimic3 servers are currently hitting a handful of instances maintained by our community - including Goldyfruit there - as a courtesy. There’s no sunset date, but they can’t possibly stay up forever.
