Multiple Speaker/Person Recognition

fah · June 7, 2024, 7:53am

Hi,

I experimented with Mycroft a few years back and want to start over, mainly because I want to run it locally and offline. I vaguely remember that recognizing multiple speakers (persons) was not yet possible. I believe this was partly because the hardware was not capable enough (which should no longer be an issue with an RPi 5) and partly because the software for it had not been developed yet.
My question is: if I run OVOS/Neon (or an alternative) locally, is it possible to have the device recognize different persons by their voice and react differently to each?
If there are other systems that can run locally on an RPi and support both multiple-speaker recognition and skill development, I'm happy to hear suggestions, as I'm still catching up on what has happened over the last few years.

Thanks in advance

JarbasAl · June 7, 2024, 2:31pm

I am working on something like this. It is not ready yet, but I can leave a sneak peek in this thread.

I implemented a lot of things in OVOS to allow this sort of functionality; in particular, we introduced a new type of plugin that can inject metadata:

https://openvoiceos.github.io/ovos-technical-manual/speech_service/
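To illustrate the general idea of plugins that inject metadata, here is a minimal plain-Python sketch: each plugin inspects the incoming audio and returns extra keys that get merged into a shared context, which later stages (e.g. skills) could read. The plugin signature and the names `loudness_plugin`, `speaker_plugin` and `run_pipeline` are hypothetical and are not the real ovos-plugin-manager interface; see the linked manual for the actual plugin types.

```python
from typing import Callable, Dict, List

# Hypothetical plugin signature (NOT the ovos-plugin-manager API):
# takes raw audio bytes plus the current context and returns
# extra metadata to merge into that context.
MetadataPlugin = Callable[[bytes, Dict], Dict]


def loudness_plugin(audio: bytes, context: Dict) -> Dict:
    # Toy example: mean byte value as a crude "loudness" hint.
    if not audio:
        return {"loudness": 0.0}
    return {"loudness": sum(audio) / len(audio)}


def speaker_plugin(audio: bytes, context: Dict) -> Dict:
    # Placeholder: a real plugin would run a speaker-recognition model here.
    return {"speaker": "unknown"}


def run_pipeline(audio: bytes, plugins: List[MetadataPlugin]) -> Dict:
    """Run each plugin in order, merging its output into a shared context."""
    context: Dict = {}
    for plugin in plugins:
        context.update(plugin(audio, context))
    return context


print(run_pipeline(b"\x01\x03", [loudness_plugin, speaker_plugin]))
```

Because every plugin receives the context built so far, a later plugin could, for example, only attempt speaker recognition when an earlier one flagged the audio as loud enough.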

We also introduced the new concept of a Session, which means some data can now be set per request instead of being hardcoded in mycroft.conf. This will allow things like per-user preferences:

https://openvoiceos.github.io/ovos-technical-manual/session_skills/
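A rough sketch of why per-request data enables user preferences, assuming a session simply carries overrides that take precedence over global defaults. `RequestSession` and `GLOBAL_CONFIG` are hypothetical names for illustration, not the actual OVOS Session class described in the manual.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# Stand-in for values that would otherwise be hardcoded in mycroft.conf.
GLOBAL_CONFIG: Dict[str, Any] = {"lang": "en-us", "units": "metric", "tts_voice": "default"}


@dataclass
class RequestSession:
    """Hypothetical per-request session; not the real OVOS Session API."""
    session_id: str
    overrides: Dict[str, Any] = field(default_factory=dict)

    def get(self, key: str) -> Any:
        # A per-request value wins; otherwise fall back to the global config.
        return self.overrides.get(key, GLOBAL_CONFIG.get(key))


alice = RequestSession("alice", {"units": "imperial"})
bob = RequestSession("bob")
print(alice.get("units"), bob.get("units"))
```

Once a speaker has been recognized, a request could be tagged with that user's session so the same skill answers in imperial units for one person and metric for another, without touching the global config.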

Finally, the actual user recognition is a work in progress and not usable yet, but here is some code
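As a generic illustration of one common approach (not the work-in-progress OVOS code), speaker identification can be done by comparing a voice embedding of the incoming utterance against embeddings enrolled for each user and picking the closest match above a threshold. This sketch assumes some speaker-embedding model already turns audio into fixed-length vectors; `SpeakerRegistry` and the default threshold of 0.75 are hypothetical choices.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class SpeakerRegistry:
    """Hypothetical enrollment store mapping user names to voice embeddings."""

    def __init__(self, threshold: float = 0.75) -> None:
        self.threshold = threshold
        self.enrolled: Dict[str, List[float]] = {}

    def enroll(self, name: str, embedding: Sequence[float]) -> None:
        # Store one reference embedding per user (e.g. averaged over samples).
        self.enrolled[name] = list(embedding)

    def identify(self, embedding: Sequence[float]) -> Optional[str]:
        best_name: Optional[str] = None
        best_score = -1.0
        for name, ref in self.enrolled.items():
            score = cosine_similarity(embedding, ref)
            if score > best_score:
                best_name, best_score = name, score
        # Below the threshold, treat the speaker as unknown.
        return best_name if best_score >= self.threshold else None


reg = SpeakerRegistry(threshold=0.9)
reg.enroll("alice", [1.0, 0.0, 0.0])
reg.enroll("bob", [0.0, 1.0, 0.0])
print(reg.identify([0.9, 0.1, 0.0]))
```

The threshold trades false accepts against false rejects; returning `None` lets the assistant fall back to default (non-personalized) behaviour for guests.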

mike99mac · June 8, 2024, 11:55am

This is another “WIP”. What about ovos-media replacing ovos-audio? Is that re-architecture dead?

JarbasAl · June 8, 2024, 6:26pm

No, ovos-media is still coming, and part of it has already made it into ovos-core 0.0.8.

ovos-media is on the roadmap for the OVOS 0.1.0 release; we are currently in QA for 0.0.8.