YouTube Audio Skill - testing and feedback

mcdruid · September 11, 2019, 11:53pm

Mycroft skill to play audio from YouTube, using the Common Play Framework.

This is heavily based on the excellent I Heart Radio and Tunein skills by johnbartkiw.

It uses Mycroft’s VlcService for playback, and pafy / youtube-dl to fetch media details.

It’s still a bit of a mess, but the basics work.

How to install YouTube Audio Skill

msm install https://gitlab.com/mcdruid/mycroft-youtube-audio

YouTube Audio Skill connects to … YouTube, without any credentials.

How to test YouTube Audio Skill

No configuration needed. Simply say something like

“Play Hendrix (on|with|using) youtube”
“Play Midnight in Harlem”
“Play Clapton Crossroads”
“Stop”

Where to direct feedback

Happy to receive feedback here, or as issues / PRs on gitlab.

gez-mycroft · September 12, 2019, 1:26am

That’s awesome

Great to use existing resources too, we’re all standing on the shoulders of giants!

malevolent · October 14, 2019, 9:58am

Working great here. Love this skill.

I would like to give you some ideas:

Pause/Resume option
Go to minute X (FF/Rewind feature)
Search for more titles (e.g. “play chillout music” always returns Balearic Chill Out Vibes Compilation 2 + Balearic Summertime 2). When it finishes, or when told “Another like this”, it would play the next result
“Play random XXXXX” would return any result, not just the first one, so it most probably won’t play the same song if you search by style, like above.
Use the Mark I mouth screen to show information like song name and duration/time remaining
Use the Mark II screen to show information like song name, duration/time remaining and thumbnail/video
Playlist support
(with playlist support): Next/Previous song
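
The “play random” and “another like this” ideas above come down to how one entry is picked from the search results. A minimal sketch, assuming the search returns a plain list of titles; `pick_result`, `next_result` and the third title are made up for illustration and are not part of the skill:

```python
import random

# Hypothetical helpers: `results` stands in for whatever list of titles the
# YouTube search returns. None of these names come from the skill itself.
def pick_result(results, randomize=False, rng=random):
    """Return the first result, or a random one when randomize is True."""
    if not results:
        return None
    return rng.choice(results) if randomize else results[0]


def next_result(results, current):
    """Return the result after `current`, wrapping around at the end."""
    index = results.index(current)
    return results[(index + 1) % len(results)]


results = [
    "Balearic Chill Out Vibes Compilation 2",
    "Balearic Summertime 2",
    "Another chillout mix",  # placeholder title, not from the thread
]
print(pick_result(results))                  # always the first title
print(pick_result(results, randomize=True))  # any of the three titles
print(next_result(results, results[0]))      # the second title
```

Keeping the whole result list around, rather than only the first hit, is what would make both requests possible.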

malevolent · October 14, 2019, 5:38pm

I did notice I cannot speak to Mark 1 when playing, so I cannot say “Hey Mycroft, stop”

Dominik · October 14, 2019, 5:51pm

Unfortunately the Mark-1 does not support echo cancellation, so it will not work…

…unless you turn the Mark-1’s volume low and shout loud enough (preferably directly in front of the faceplate where the microphone is) - then it will work…

malevolent · October 14, 2019, 6:02pm

Heheh, in that case I would rather prefer to push its “stop” button

I will test the skill on my desktop computer, but I guess my cheap USB mic won’t support echo cancellation either…

EDITED: surprisingly, it works on my desktop

j1nx · November 16, 2019, 11:35am

Tested this Skill over the last few days, now that my system is finally set up to work as intended (volume control, AEC, mute on wake word, ducking of audio), on an RPi 3B running my own MycroftOS.

What works / My findings

Almost always finds what I want to play
Sometimes takes a bit to find it, but that is most likely because of the search itself and the number of results returned.
“Hey Mycroft” during playback nicely mutes the playback (although slightly delayed)
The stop command when the playback is muted stops the playback (although because of the small delay mentioned above and the rather short period of muted sound, it doesn’t feel “snappy”. Sometimes I am just too early or too late)
While playing something from YouTube, I can give the command to play something else. Playback continues and is nicely ducked while TTS speaks the “one moment” feedback, then it stops the playback and searches for what I just asked.

Feedback / bugs / issues (not necessarily this skill, could well be the CPS system or both)

It doesn’t always finish the TTS feedback (example: “Give me a moment to check for that”) after I give the play command. However, I believe this is a CPS bug. Depending on how long the CPS query takes, it can either finish that sentence or not. It appears that, as soon as CPS figures out this skill can handle the command and forwards the command to this skill, playback of the TTS sentence is cut off if it wasn’t finished yet.
While playing something from YouTube, if I give another play command during playback, as said above, it first ducks the audio stream while speaking the TTS “Give me a moment to check for that”, then figures out I want something from YouTube again, forwards the request to this same skill, stops playback, searches and plays the new request. All fine, that is what I want: stop the playback and play something else. However, if I play something from YouTube and during playback say “stream radio station bla bla” using the TuneIn radio skill from @johnbart, CPS figures out that I want to play that radio channel, forwards the request to that skill and then it starts playback as well. That means I have two things playing at the same time, as the previous playback from this skill is not stopped when the TuneIn playback starts. Not sure if this is caused by this skill, CPS or the TuneIn skill; however, doing it the other way around works. If I stream a radio station and during playback ask to play something from YouTube, the radio gets stopped before the YouTube playback starts.

Feature requests / Nice to haves

While playing something from YouTube, when I ask to play something else from YouTube the previous stream gets stopped at the point CPS forwards the new request to this skill, leaving a (long) period of silence before the next playback starts. Perhaps the stopping of the previous stream could be initiated after the search, just before it plays the newly found result. This way my playback continues until the new YouTube search is ready. (Again, not sure if this is a Skill or CPS feature request.)
Had some other “nice to haves” but can’t remember them all anymore. Will report them as soon as my brain finds them back…
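
The first request above is a reordering of two steps: search first, then stop. A rough sketch of the difference, assuming nothing about the skill’s real code; `Player`, `search` and both `switch_*` functions are hypothetical stand-ins:

```python
# Hypothetical sketch: Player and search() are stand-ins, not the skill's code.

class Player:
    def __init__(self):
        self.current = None
        self.log = []  # ordered record of what happened

    def play(self, title):
        self.current = title
        self.log.append(f"play {title}")

    def stop(self):
        self.log.append(f"stop {self.current}")
        self.current = None


def search(query, log):
    # Stand-in for the slow YouTube lookup.
    log.append(f"search {query}")
    return f"result for {query}"


def switch_stop_first(player, query):
    player.stop()                      # silence starts here...
    title = search(query, player.log)  # ...and lasts for the whole search
    player.play(title)


def switch_search_first(player, query):
    title = search(query, player.log)  # old stream keeps playing meanwhile
    player.stop()
    player.play(title)


a = Player()
a.play("old")
switch_stop_first(a, "new")

b = Player()
b.play("old")
switch_search_first(b, "new")

print(a.log)  # ['play old', 'stop old', 'search new', 'play result for new']
print(b.log)  # ['play old', 'search new', 'stop old', 'play result for new']
```

In the second ordering the gap between “stop” and “play” no longer includes the search time, which is the silence described above. Whether this reordering belongs in the skill or in CPS is left open, as in the post.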

Perhaps a bit too much text for a feedback post, but I took some time to properly test this skill because it works so nicely to play anything without the need for any credentials and stuff.

Used this skill to test all the muting, ducking and AEC stuff. And it worked great!! I really like this skill and, to be honest, strongly believe it should be installed by default on any Mycroft instance, because I think any voice assistant should be able to play some music for free out of the box. That is what this skill provides.

Keep up the great work.

@forslund Sorry, I am tagging you as you can say more on the internals of CPS. Perhaps, you can pick something out of this as well to improve upon CPS.

gez-mycroft · November 16, 2019, 4:56pm

Thanks for the detailed feedback j1nx.

We will have to do some more testing with switching between music Skills to see what might be happening there as it would be great to get this in the official Marketplace!

j1nx · November 16, 2019, 4:59pm

Great, just to confirm. I think it has to do with this skill, because if I play something via another skill and then play via another or this skill, two streams will run, as this skill doesn’t stop the previous playback where others do.

Might be something that slipped through when playing stuff via ALSA, as that system hardly allows simultaneous access to the sound hardware. With PulseAudio that is more than possible.

j1nx · November 17, 2019, 6:24pm

I think I figured out why two streams get started at the same time. This skill explicitly uses VLC as its audio service while the other doesn’t and therefore just uses the default audio service.

Because of that, we spawn two different audio playback programs, and that is no problem for PulseAudio. Meaning two streams play at the same time.

So I believe this is a CPS bug as the stop command given to CPS should stop ALL audio backends instead of only the one being used or the default.

I think…

forslund · November 18, 2019, 5:09pm

Hmm, this skill imports the VLC service into the skill which means it’s not running through the audioservice and won’t react to the mycroft.audio.service.stop message and won’t actually report that there’s anything playing. If instead it’d use the Audioservice() interface I believe it would be stopped.

That said, CPS_start sends a mycroft.stop message to stop anything playing audio, so the stop in this skill should trigger when starting another playback…

Gonna see if I can repeat the issue here
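
The distinction forslund draws, between a player that runs through the audio service and one imported straight into a skill, can be illustrated with a toy model. This is only a sketch under stated assumptions: `MessageBus`, `AudioService` and `Player` are hypothetical stand-ins written for this example, not Mycroft classes. Only the message name `mycroft.audio.service.stop` comes from the post.

```python
# Toy model only: MessageBus, AudioService and Player are hypothetical
# stand-ins, not Mycroft classes. Only the message name is from the thread.
from collections import defaultdict


class MessageBus:
    """Minimal publish/subscribe bus."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def on(self, msg_type, handler):
        self._handlers[msg_type].append(handler)

    def emit(self, msg_type):
        for handler in list(self._handlers[msg_type]):
            handler()


class Player:
    def __init__(self, name):
        self.name = name
        self.playing = False

    def play(self):
        self.playing = True

    def stop(self):
        self.playing = False


class AudioService:
    """Owns its backends and stops them when asked over the bus."""

    def __init__(self, bus):
        self.backends = []
        bus.on("mycroft.audio.service.stop", self.stop_all)

    def register(self, player):
        self.backends.append(player)

    def stop_all(self):
        for backend in self.backends:
            backend.stop()


bus = MessageBus()
service = AudioService(bus)

radio = Player("radio via audio service")
service.register(radio)

youtube = Player("VLC imported directly by the skill")  # never registered

radio.play()
youtube.play()
bus.emit("mycroft.audio.service.stop")

print(radio.playing)    # False: the service stopped its own backend
print(youtube.playing)  # True: nothing told this player to stop
```

In this model the directly created player keeps running because nobody who hears the stop message knows it exists, which matches the explanation given in the post.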

j1nx · November 18, 2019, 5:18pm

I can start a stream with the TuneIn skill.
Then start a second stream with this skill.
Both streams play at the same time.

(Or the other way around, forgot a bit)

forslund · November 18, 2019, 5:27pm

Hmm, for me it always stops the previous stream so far…

Could it be a pulseaudio related thing?

I recall when using the module-role-cork plugin to mute it actually pauses the player and then unpauses when the stream is unmuted…

j1nx · November 18, 2019, 5:38pm

Yeah, for sure it is PulseAudio. I have the exact same settings as you guys use for the Mark 2 and am forcing all other stuff that doesn’t obey my wishes to use PulseAudio.

Will have access to my laptop a bit later today and will post my configs.

forslund · November 18, 2019, 6:01pm

Thanks, I’ll see if I can set it up on my laptop as well… Been a while since I used the pulse ducking on this device

j1nx · November 18, 2019, 6:14pm

OK, just confirmed it.

“Hey mycroft, play rise against hero of war from youtube”

“>> Just a second”
“>> Now playing Rise Against - Hero of war ( Official Video ) from youtube”

“Hey mycroft, stream radionl”

“>> Playing streamin station RADIONL”

After the “Hey Mycroft” while playing the YouTube stream, the listener beep got played, the stream got muted, and then continued at a low volume for the TTS output of “Playing streaming station RADIONL”. Then the volume of YouTube gets restored and the radio station also gets played. Two songs at the same time.

Giving the stop command, stops them both.

Here are the default configs for my system;

asound.conf

# Use PulseAudio by default
pcm.!default {
  type pulse
  fallback "sysdefault"
  hint {
    show on
    description "Default ALSA Output (currently PulseAudio Sound Server)"
  }
}

ctl.!default {
  type pulse
  fallback "sysdefault"
}

daemon.conf

# This file is part of PulseAudio.
#
# PulseAudio is free software; you can redistribute it and/or modify
# it under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# PulseAudio is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public License
# along with PulseAudio; if not, see <http://www.gnu.org/licenses/>.

## Configuration file for the PulseAudio daemon. See pulse-daemon.conf(5) for
## more information. Default values are commented out. Use either ; or # for
## commenting.

; daemonize = no
; fail = yes
; allow-module-loading = yes
; allow-exit = yes
; use-pid-file = yes
; system-instance = no
; local-server-type = user
; enable-shm = yes
; enable-memfd = yes
; shm-size-bytes = 0 # setting this 0 will use the system-default, usually 64 MiB
; lock-memory = no
; cpu-limit = no

; high-priority = yes
; nice-level = -11

; realtime-scheduling = yes
; realtime-priority = 5

; exit-idle-time = 20
; scache-idle-time = 20

; dl-search-path = (depends on architecture)

; load-default-script-file = yes
; default-script-file = /etc/pulse/default.pa

; log-target = auto
; log-level = notice
; log-meta = no
; log-time = no
; log-backtrace = 0

; resample-method = speex-float-1
; enable-remixing = yes
; enable-lfe-remixing = no
; lfe-crossover-freq = 0

; flat-volumes = yes

; rlimit-fsize = -1
; rlimit-data = -1
; rlimit-stack = -1
; rlimit-core = -1
; rlimit-as = -1
; rlimit-rss = -1
; rlimit-nproc = -1
; rlimit-nofile = 256
; rlimit-memlock = -1
; rlimit-locks = -1
; rlimit-sigpending = -1
; rlimit-msgqueue = -1
; rlimit-nice = 31
; rlimit-rtprio = 9
; rlimit-rttime = 200000

; default-sample-format = s16le
; default-sample-rate = 96000
; alternate-sample-rate = 48000
; default-sample-channels = 4
; default-channel-map = front-left,front-right

; default-fragments = 4
; default-fragment-size-msec = 25

; enable-deferred-volume = yes
; deferred-volume-safety-margin-usec = 8000
; deferred-volume-extra-delay-usec = 0

# MycroftOS Audio Settings
resample-method = ffmpeg
default-sample-format = s24le
default-sample-rate = 48000
alternate-sample-rate = 44100
default-sample-channels = 4

system.pa (I am running pulseaudio systemwide)

#!/usr/bin/pulseaudio -nF
#
# This file is part of PulseAudio.
#
# PulseAudio is free software; you can redistribute it and/or modify it
# under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# PulseAudio is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public License
# along with PulseAudio; if not, see <http://www.gnu.org/licenses/>.

# This startup script is used only if PulseAudio is started per-user
# (i.e. not in system mode)

.fail

### Automatically restore the volume of streams and devices
load-module module-device-restore
load-module module-stream-restore
load-module module-card-restore

### Automatically augment property information from .desktop files
### stored in /usr/share/application
load-module module-augment-properties

### Should be after module-*-restore but before module-*-detect
load-module module-switch-on-port-available

### Load audio drivers statically
### (it's probably better to not load these drivers manually, but instead
### use module-udev-detect -- see below -- for doing this automatically)
#load-module module-alsa-sink device="hw:1,0" channels=8 rate=48000 format=s32le
#load-module module-alsa-source device="hw:1,0" channels=8 rate=48000 format=s32le
#load-module module-oss device="/dev/dsp" sink_name=output source_name=input
#load-module module-oss-mmap device="/dev/dsp" sink_name=output source_name=input
#load-module module-null-sink
#load-module module-pipe-sink

### Automatically load driver modules depending on the hardware available
.ifexists module-udev-detect.so
load-module module-udev-detect
#channels=8 rate=48000 format=s32le
.else
### Use the static hardware detection module (for systems that lack udev support)
load-module module-detect
.endif

### Automatically connect sink and source if JACK server is present
.ifexists module-jackdbus-detect.so
.nofail
load-module module-jackdbus-detect channels=2
.fail
.endif

### Automatically load driver modules for Bluetooth hardware
.ifexists module-bluetooth-policy.so
load-module module-bluetooth-policy
.endif

.ifexists module-bluetooth-discover.so
load-module module-bluetooth-discover
.endif

### Load several protocols
.ifexists module-esound-protocol-unix.so
load-module module-esound-protocol-unix
.endif
load-module module-native-protocol-unix auth-anonymous=1

### Network access (may be configured with paprefs, so leave this commented
### here if you plan to use paprefs)
#load-module module-esound-protocol-tcp
load-module module-native-protocol-tcp auth-ip-acl=127.0.0.1;192.168.0.0/16;172.16.0.0/12;10.0.0.0/8 auth-anonymous=1
load-module module-zeroconf-publish

### Load the RTP receiver module (also configured via paprefs, see above)
#load-module module-rtp-recv

### Load the RTP sender module (also configured via paprefs, see above)
#load-module module-null-sink sink_name=rtp format=s16be channels=2 rate=44100 sink_properties="device.description='RTP Multicast Sink'"
#load-module module-rtp-send source=rtp.monitor

### Load additional modules from GConf settings. This can be configured with the paprefs tool.
### Please keep in mind that the modules configured by paprefs might conflict with manually
### loaded modules.
.ifexists module-gconf.so
.nofail
load-module module-gconf
.fail
.endif

### Automatically restore the default sink/source when changed by the user
### during runtime
### NOTE: This should be loaded as early as possible so that subsequent modules
### that look up the default sink/source get the right value
load-module module-default-device-restore

### Automatically move streams to the default sink if the sink they are
### connected to dies, similar for sources
load-module module-rescue-streams

### Make sure we always have a sink around, even if it is a null sink.
load-module module-always-sink

### Honour intended role device property
load-module module-intended-roles

### Automatically suspend sinks/sources that become idle for too long
load-module module-suspend-on-idle

### If autoexit on idle is enabled we want to make sure we only quit
### when no local session needs us anymore.
.ifexists module-console-kit.so
load-module module-console-kit
.endif
.ifexists module-systemd-login.so
load-module module-systemd-login
.endif

### Enable positioned event sounds
load-module module-position-event-sounds

### Cork music/video streams when a phone stream is active
load-module module-role-cork

### Modules to allow autoloading of filters (such as echo cancellation)
### on demand. module-filter-heuristics tries to determine what filters
### make sense, and module-filter-apply does the heavy-lifting of
### loading modules and rerouting streams.
load-module module-filter-heuristics
load-module module-filter-apply

### Make some devices default
#set-default-sink output
#set-default-source input
#set-default-source alsa_input.platform-soc_sound.seeed-source
#set-default-sink alsa_output.platform-soc_sound.seeed-sink

### MycroftOS Audio Settings
unload-module module-suspend-on-idle
unload-module module-role-cork
load-module module-role-ducking

### Enable Echo/Noise-Cancellation
load-module module-echo-cancel aec_method=webrtc source_name=echoCancel_source sink_name=echoCancel_sink
set-default-source echoCancel_source
set-default-sink echoCancel_sink

/etc/mycroft/mycroft.conf

{
  "play_wav_cmdline": "paplay %1",
  "play_mp3_cmdline": "mpg123 %1",
  "ipc_path": "/ramdisk/mycroft/ipc/",
  "enclosure": {
    "platform": "MycroftOS",
    "platform_build": 1
  },
  "listener": {
    "mute_during_output": false
  },
  "tts": {
    "module": "mimic2",
    "mimic2": {
      "lang": "en-us",
      "url": "https://mimic-api.mycroft.ai/synthesize?text=",
      "preloaded_cache": "/opt/mycroft/preloaded_cache/Mimic2"
    },
    "pulse_duck": true
  },
  "skills": {
    "priority_skills": ["mycroft-pairing", "mycroft-volume"]
  },
  "log_level": "INFO"
}

/home/mycroft/.mycroft.conf

{
  "max_allowed_core_version": 19.8
}

j1nx · November 18, 2019, 6:42pm

Forgot to mention that at this moment I have patched the volume skill to add my “MycroftOS” enclosure tag to the ALSA_PLATFORMS, as I have not yet started coding that part. But this should not matter, because removing the enclosure tag from my config makes it an unknown tag, which behaves the same.

forslund · November 18, 2019, 7:52pm

Ok that I can repeat.

It’s the “stream X” command in the TuneIn radio skill that doesn’t emit a mycroft.stop message like a normal common play skill would (e.g. “play jack radio”).
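
The cause forslund identifies can be shown with a small toy model: a start path that emits `mycroft.stop` first silences whatever was playing, while a start path that skips it leaves the old stream running. Everything here except the `mycroft.stop` message name is hypothetical and written for illustration, including `Bus`, `Skill` and both `start_*` methods.

```python
# Toy model: all class and method names are hypothetical. Only the
# "mycroft.stop" message name is taken from the thread.

class Bus:
    def __init__(self):
        self.handlers = {}

    def on(self, msg_type, handler):
        self.handlers.setdefault(msg_type, []).append(handler)

    def emit(self, msg_type):
        for handler in self.handlers.get(msg_type, []):
            handler()


class Skill:
    def __init__(self, name, bus):
        self.name = name
        self.bus = bus
        self.playing = False
        bus.on("mycroft.stop", self.stop)  # every skill honours a global stop

    def stop(self):
        self.playing = False

    def start_via_common_play(self):
        # Common play path: ask everything to stop, then start playing.
        self.bus.emit("mycroft.stop")
        self.playing = True

    def start_via_direct_intent(self):
        # Direct intent path ("stream X"): no stop message is sent.
        self.playing = True


def run(start_radio):
    """Play YouTube first, then start the radio with the given start path."""
    bus = Bus()
    youtube = Skill("youtube", bus)
    radio = Skill("radio", bus)
    youtube.start_via_common_play()
    start_radio(radio)
    return youtube.playing, radio.playing


print(run(Skill.start_via_common_play))    # (False, True): old stream stopped
print(run(Skill.start_via_direct_intent))  # (True, True): both keep playing
```

The second result is the “two songs at the same time” case reported earlier, and it also fits the observation that a plain stop command afterwards halts both streams.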

j1nx · November 18, 2019, 8:02pm

Ok great! That’s why I said, not sure who to blame.

Then I will move that feedback over to that skill’s thread.

Thanks and sorry for bothering you😊

forslund · November 18, 2019, 8:09pm

No worries, glad we could get to the bottom of this

I do have some todos on the audioservice (especially VLC backend) so it’s good I got a kick in the right direction to get me going.