Short sound to indicate error

As a Skill developer, if there is an error, you currently tend to speak some dialog, e.g.:

“something went wrong”
“I didn’t understand which movie you meant”
“I can’t connect to X please check your settings”

Sometimes, though, you don’t need a long response; you just want a way to tell the user that the request didn’t succeed. Well, there’s an interesting PR to add an error sound. Currently it only covers an error sound, but it could very easily extend to other common sounds, e.g. success, email sent, turned on, etc.

So I was wondering how Skill authors would prefer to use this. The existing suggestions are:

  • speak_dialog(dialog, data, is_error)
  • report_error(dialog, data)
  • report_outcome(type="error", dialog, data)
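To make the three shapes easier to compare, here is a rough sketch of how each might look from a Skill author's side. `SketchSkill` is a hypothetical stand-in and not the real MycroftSkill class; it only records what would be played or spoken. Making `dialog` optional in the second and third variants is an assumption added here to show the "sound only" case.

```python
class SketchSkill:
    """Hypothetical stand-in for a Skill base class (not the real Mycroft API)."""

    def __init__(self):
        self.events = []  # records what would be played or spoken, in order

    def _play_sound(self, kind):
        self.events.append(("sound", kind))

    def _speak(self, dialog, data):
        self.events.append(("speak", dialog, data or {}))

    # Option 1: an extra flag on the existing call
    def speak_dialog(self, dialog, data=None, is_error=False):
        if is_error:
            self._play_sound("error")
        self._speak(dialog, data)

    # Option 2: a dedicated method for errors
    def report_error(self, dialog=None, data=None):
        self._play_sound("error")
        if dialog is not None:
            self._speak(dialog, data)

    # Option 3: a generic outcome reporter ("error", "success", ...)
    def report_outcome(self, type, dialog=None, data=None):
        self._play_sound(type)
        if dialog is not None:
            self._speak(dialog, data)


skill = SketchSkill()
skill.speak_dialog("cant.connect", {"service": "X"}, is_error=True)
skill.report_error()  # sound only, no dialog
skill.report_outcome("error", "cant.connect", {"service": "X"})
```

The first option keeps a single entry point, while the third generalises most easily to other sounds such as success or email sent.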

If you have deeper thoughts, please comment here or join the discussion on this PR.

More flexibility is always good in such a case, to cover warnings or other types of meta information in a generic way. This probably requires some sort of additional logging backend, so that log messages are optionally not only written to a file but also spoken, with an identifying sound or prefix for each log level.

Related additional config options could be:

  • acoustic_log_level=NONE (default) CRITICAL ERROR WARNING …
  • acoustic_log_dialog=true|false
    When false, play notification sound only which identifies the log level, but do not speak log text.

For logs below warning level this probably doesn’t make sense, as Mycroft-wide logs appear faster than it is able to speak them. But building a generic backend on top of the existing logging is probably better, after all, than adding a new, independent set of functions?
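A minimal sketch of such a backend, built on Python's standard `logging.Handler`. The class name `AcousticLogHandler` and the `play_sound` and `speak` callables are assumptions for illustration; they stand in for whatever Mycroft would use to play a sound file or call TTS. The two constructor options mirror the config options suggested above.

```python
import logging


class AcousticLogHandler(logging.Handler):
    """Hypothetical handler that turns log records into sounds and speech."""

    def __init__(self, play_sound, speak,
                 acoustic_log_level=None, acoustic_log_dialog=False):
        # None mirrors acoustic_log_level=NONE: a threshold above CRITICAL,
        # so no standard record ever reaches emit().
        if acoustic_log_level is None:
            acoustic_log_level = logging.CRITICAL + 1
        super().__init__(level=acoustic_log_level)
        self.play_sound = play_sound
        self.speak = speak
        self.acoustic_log_dialog = acoustic_log_dialog

    def emit(self, record):
        # One identifying sound per log level, e.g. "ERROR" or "WARNING".
        self.play_sound(record.levelname)
        if self.acoustic_log_dialog:
            # getMessage() returns only the formatted message text; any
            # traceback stays in record.exc_info and is never spoken.
            self.speak(record.getMessage())


played, spoken = [], []
log = logging.getLogger("acoustic_demo")
log.setLevel(logging.DEBUG)
log.propagate = False
log.addHandler(AcousticLogHandler(played.append, spoken.append,
                                  acoustic_log_level=logging.ERROR,
                                  acoustic_log_dialog=True))
log.warning("disk almost full")          # below the threshold: silent
log.error("cannot reach %s", "Spotify")  # sound plus spoken message
```

Because the handler attaches to the existing logging system, Skills would keep calling the normal log functions, and the acoustic part stays a deployment-level setting.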

Ah, but if it’s disabled by default, doesn’t that somewhat defeat the idea of letting app developers force the acoustic output? :thinking:
Then again, something informative like “I didn’t understand which movie you meant” is probably better handled gracefully as a regular answer. Errors are best reserved for unforeseen issues, especially internal code errors caused by kernel- or hardware-level problems, access issues and the like, rather than an invalid question or an unreachable remote resource.

Another issue: those Python error traces can be very long and impossible to reasonably “speak”, so they would need to be excluded anyway… okay, not as simple on second thought.

Both as a Skill Developer and as a potential User, my first impulse is to make the presence or absence of a semaphorical sound (semaphone? phonosemaphore?) a user-configurable option. I.e., the device configuration page would offer switches or radio controls to enable or disable (or perhaps conditionally enable) that behavior, per device. So I think the most consistent option would actually be a mix of the first and third options: an optional semantic tag (which later on could take a few values other than error), but attached to the already established speak_dialog method (i.e., speak_dialog(dialog, data, type="error")).

Purely as a sci-fi fan user: I think it would be really cool to have not only that option to enable event audio signals (acknowledge, error, long-running-task-complete, and of course already existing “listening” sound and time-out-expired sound), but also to have “audio themes”. Maybe one or two classic-but-copyright-free themes are included, but tech-tinkerer users could upload some replacement Star Trek Enterprise LCARS sound effects or your own DIY effects.
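One way the theme idea could work is a simple lookup with a fallback: try the user's chosen theme first, then a bundled default, and stay silent if the user switched event sounds off. This is only a sketch; the directory layout (`<themes_root>/<theme>/<event>.wav`), the function name `resolve_sound`, and the `enabled` switch are all assumptions, not anything Mycroft currently defines.

```python
from pathlib import Path

DEFAULT_THEME = "default"  # assumed name of the bundled theme


def resolve_sound(event, theme, themes_root, enabled=True):
    """Return the sound file for an event, or None if nothing should play.

    Looks in the selected theme first and falls back to the default theme,
    so a partial theme (say, only an "error" sound) still works.
    """
    if not enabled:  # per-device user setting
        return None
    root = Path(themes_root)
    for name in (theme, DEFAULT_THEME):
        candidate = root / name / f"{event}.wav"
        if candidate.is_file():
            return candidate
    return None
```

A call such as `resolve_sound("error", "lcars", "/path/to/themes")` would then pick the themed error sound if it exists and the default one otherwise.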

Final thought: at some future time, if Mycroft were to have expanded accessibility features, the idea of a semaphore could be abstracted a little to offer visual surrogates for the audio sounds. For example: a deaf user might enable (on a Mark II or better) a visual signal where the whole screen, or maybe just a thick border, flashes red twice, perhaps along with the LED ring. That setting could just be side by side with the audible signal settings on


I’m scared people will think this isn’t helpful at all, so uh…I can delete this if it doesn’t contribute to the discussion, because in terms of programmer ability, I’m what you call a “non-programmer”. (Though, I’m trying to learn some Python so I can write some skills…wish me luck.)

I agree with both posters here that flexibility is the way to go. (And I’m fully down with @jrwarwick’s idea to use LCARS sound effects, because I basically want to live on The Enterprise. (E)) Specifically with regard to errors, especially ones that use the fallback response (which I personally do not care for, I’d rather hear a short sound than the fallback) the sort of option I’d find most useful as a user would be a simple error sound. In the case of the fallback skill, that’s all that would be necessary. Then further information could be requested from the system, with something like “What’s the problem?”, “Why not?” or “Clarify” or “Explain”.

Standards for skills to try and turn errors into user-friendly explanations could potentially fit into that sort of framework going forward. For example, I imagine the following conversation.

“Hey Mycroft, play “Shake It Off” by Taylor Swift.”
(sad beep)
“Hey Mycroft, Clarify?”
“Unable to connect to Spotify. Other internet connections are unaffected.”
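The conversation above could be sketched roughly like this: the failing Skill stores a user-friendly explanation and answers with only a short sound, and a later “Clarify” request reads the stored explanation back. The class and method names are hypothetical, and the returned strings simply stand in for playing a sound or speaking.

```python
class ClarifiableErrors:
    """Hypothetical helper that remembers the most recent error explanation."""

    def __init__(self):
        self.last_error = None

    def fail(self, explanation):
        # Remember the friendly explanation, but respond with a sound only.
        self.last_error = explanation
        return "(sad beep)"

    def clarify(self):
        # Triggered by "Clarify", "Explain", "What's the problem?", etc.
        if self.last_error is None:
            return "Nothing has gone wrong recently."
        return self.last_error


errors = ClarifiableErrors()
first = errors.fail("Unable to connect to Spotify. "
                    "Other internet connections are unaffected.")
second = errors.clarify()
```

Here `first` is the short beep and `second` is the full explanation, which the user only hears when they ask for it.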


Love the ideas, the concept of theming would be a very cool personalisation feature - in the perfect world this would include LCARS GUI screens, TTS voice, sound package, and modified dialog for at least the system default stuff.

Also great to be thinking about users with different interfacing needs!


I love it, Gez! Full theme packages. It seems obvious now that you have described it. I’m seeing Pip Boy from Fallout, a few 'Trek themes (of course), StarCraft Adjutant, Cortana (Halo), MCP (Tron), Jarvis (Marvel movies), HAL9000, GLaDOS (Half-Life), etc. Each with a couple of coordinated just-for-fun inside-joke skills (a la “beam me up”) appropriate to the fictional universe the theme represents.

To make this FR a little more universal, I’d suggest adding support for the SSML <audio> tag in Mimic3. This would allow including arbitrary pre-recorded audio within the TTS messages and could be used for prepending “alert” type sound snippets to TTS messages.
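As an illustration of what a Skill could send if that tag were supported, here is a small sketch that prepends an alert sound to a message using the standard SSML `<audio src="...">` element. The helper name `with_alert_sound` and the file path are made up for the example, and whether Mimic3 would accept this markup is exactly the open feature request, so treat it as an assumption.

```python
from xml.sax.saxutils import escape, quoteattr


def with_alert_sound(text, audio_src):
    """Wrap text in SSML, preceded by a pre-recorded audio snippet."""
    # quoteattr() adds the surrounding quotes and escapes the attribute value;
    # escape() protects characters like & and < in the spoken text.
    return f"<speak><audio src={quoteattr(audio_src)}/>{escape(text)}</speak>"


ssml = with_alert_sound("Unable to connect to Spotify.", "sounds/error.wav")
# ssml == '<speak><audio src="sounds/error.wav"/>Unable to connect to Spotify.</speak>'
```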

I’d love to switch to Mycroft/Mimic3 for my Home Assistant installation, but this is actually the one feature that keeps me from doing it. I currently still use PicoTTS, which allows me to do just that (add Enterprise-like sounds to messages) using SSML, plus some other features (like changing volume and language). The generated audio is then played on a bunch of synchronized “audio alert boxes” (Pi Zeros within a 3W USB speaker module), using Logitech Media Server.

Here is an example of my bilingual (German/English) Home Assistant “say_greeting” script, using a bunch of variables previously set and some randomness. It looks rather complicated, but you’ll get the idea:

say_greeting:
  alias: 'Say greeting'
  description: 'Speak a greeting message via TTS'
  fields:
    entity_id:
      description: 'Entity Id of the media player to use'
      example: 'media_player.signalpi1'
    audiofile:
      description: 'Name of introductory audio file to play. MUST be WAV, pcm-s16le, mono, 16000 Hz.'
      example: 'picotts-beep.wav'
    message:
      description: 'The name of the person to welcome (can be a template)'
      example: "Matthias"
    language:
      description: 'The language to speak in (defaults to tts: setting)'
      example: 'en-GB'
  sequence:
    # only alert if "Audio Alerts" is on
    - condition: state
      entity_id: 'input_boolean.audio_alerts'
      state: 'on'
    # play text message
    - service: tts.picotts_say
      data:
        entity_id: "{{ entity_id|default(states('var.audio_alerts_media_player'),true) }}"
        language: "{{ language | default(states('input_select.audio_language'), true) }}"
        message: >
          {% set language = language | default(states('input_select.audio_language'), true) %}
          {% set lang = language[0:2] %}
          {# empty string: no intro sound; a file name: play it; not set: pick a random sound #}
          {% if audiofile == '' %}
          {% elif audiofile %}
            <play file="{{ states('var.audio_alerts_base_path') }}/{{ lang }}/{{ audiofile }}"/>
          {% else %}
            <play file="{{ states('var.audio_alerts_base_path') }}/{{ lang }}/picotts-{{ ['beep','transporter','door']|random }}.wav"/>
          {% endif %}
          <volume level="60">
          {% if lang == 'de' %}
            {% set greeting = [
              "Willkommen auf der Enterprais, {{ message }}!",
              "Hallo {{ message }}, schön dich zu sehen!",
              "Hai, {{ message }}!",
              "Schön, dass du wieder hier bist, {{ message }}!",
              "Ach, {{ message }}, schon WIEDER nur du!",
              "Wer stört mich jetzt schon wieder? Ach, {{ message }}, du bist's.",
              ] | random | replace('{{ message }}', message) | replace('Matthias',['Käpten','Schäffe','Meister', 'Matze']|random)
            %}
            {{ greeting }}
          {% else %}
            {% set greeting = [
              "Welcome aboard the starship Enterprise, {{ message }}!",
              "Hey {{ message }}, good to see you!",
              "Hi, {{ message }}!",
              "Great to see you are back, {{ message }}!",
              "Aww, {{ message }}, it's only you! AGAIN.",
              "Who is it? Ah, {{ message }}, come in!",
              ] | random | replace('{{ message }}', message) | replace('Matthias',['Captain','Boss','Master','Matt']|random)
            %}
            {{ greeting }}
          {% endif %}
          </volume>

This is actually one of my simpler scripts—would just love to be able to use Mimic3 for that (using the official <audio> tag instead of PicoTTS’s <play>)!

The above script gets triggered in Home Assistant a few minutes after I arrive back home, and—depending on the language setting—greets me with a random greeting, prepended by a Star Trek TNG “beep”, “transporter”, or “door opening” sound. :wink:

Using the <audio> (or, with PicoTTS, <play>) tag has the advantage that the TTS engine will create only one WAV file that an external media player can easily play—including whatever audio you might have included.

One might have to store audio in a format compatible with the TTS-generated output (16kHz mono for PicoTTS, probably 22kHz mono for Mimic3), but that shouldn’t be a problem.
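A quick way to verify that a sound file matches the TTS output format before embedding it is Python's standard `wave` module. The function name `wav_matches` is made up for this sketch; the defaults reflect the PicoTTS requirement mentioned in the script above (16-bit PCM, mono, 16000 Hz), and the exact rate Mimic3 needs would have to be passed in once it is known.

```python
import wave


def wav_matches(path, sample_rate=16000, channels=1, sample_width=2):
    """Return True if the WAV file has the expected rate, channels and width.

    sample_width is in bytes, so 2 means 16-bit samples (pcm-s16le).
    """
    with wave.open(str(path), "rb") as wav:
        return (wav.getframerate() == sample_rate
                and wav.getnchannels() == channels
                and wav.getsampwidth() == sample_width)
```

Files that fail the check would need to be resampled or downmixed with an audio tool of your choice before use.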
