Girlfriend can't wake Mycroft

I’ve had excellent success with training a custom model. (And shameless plug here for some guides I wrote to help you do that!) The biggest hurdle was convincing her to sit down and say the wake-word over and over again, which she does not really see the point of.
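For what it’s worth, once you’ve recorded samples with precise-collect and trained a model with precise-train, you can sanity-check it outside of Mycroft with the precise_runner Python package. Here’s a minimal sketch; the engine path and model filename are placeholders for wherever those live on your system:

```python
from time import sleep

from precise_runner import PreciseEngine, PreciseRunner

# Placeholder paths: point these at your precise-engine binary and
# the model file produced by precise-train.
engine = PreciseEngine('precise-engine/precise-engine', 'hey-mycroft.pb')

# on_activation fires every time the model thinks it heard the wake word,
# so have your partner try it and watch how often it actually triggers.
runner = PreciseRunner(engine, on_activation=lambda: print('Wake word detected!'))
runner.start()

# Keep the script alive while the runner listens in the background.
while True:
    sleep(10)
```

If it triggers reliably for you but not for her, the usual fix is adding more of her recordings to the wake-word folder and retraining.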

Thing is, systems like Alexa or Siri have enormous userbases constantly feeding them data from across the voice spectrum, whereas projects like this one tend to be inherently biased toward the userbase that creates them, at least when it comes to human-machine interaction. That makes sense and isn’t a moral judgement (bias happens in all systems), but I’ve wondered how we could encourage a large number of people from across the voice and accent spectrum to contribute samples. It’s a large undertaking…and it depends on what you’re trying to train the model to do.

I don’t understand the internals of how Precise works, so I don’t know whether it would be better, say, to train one model to hear a single wake word across a variety of accents and voices, or if there’s a point (and there almost certainly is) where the variation in pronunciation starts to make the model less, well…precise. (Pun intended.)

I’m sort of babbling here, and maybe the distinction between pitch and pronunciation doesn’t matter so much, but all this is to say: at the moment, training a custom model may be the right way to go for you.
