I was poking around the voice tech world (as I do whenever there’s a cool Mycroft announcement) and I decided to check in on VocalID - an old partner who had some hand in the original Mimic release.
Looks like they got acquired by Veritone just this month. Veritone provide an enterprise-facing AI platform with many applications. I can’t remember if I considered Veritone a Mycroft competitor back when I was researching enterprise-facing voice AI tech.
VocalID does great work for disabled people, so I'm happy if this is a way for them to expand their reach. I wonder when Mycroft and their team were last in touch.
I guess it would be hard to integrate, as I think it is now all closed source and would carry a cost of use.
I think VocalID are a really good example though, as they have a specific area of expertise where they have developed voice tech IP that has been well received through peer review and is pretty much cutting-edge stuff.
Hoovering up permissively licensed research repos, where the only dev work is refactoring and rebranding, is never really going to compete. If you listen to the voice quality of Mimic3, it's never going to win any awards for being cutting edge, although it was an incremental step up from what Mycroft offered previously.
This is a really good example of what's wrong with the core. Spreading a small dev team very thin to offer relatively mediocre, in-house-branded versions of every stage of a smart assistant is probably a much worse idea than concentrating on a flexible core: a single framework that can incorporate many best-of-breed applications and link them all into a single smart assistant.
I haven’t tested Mimic3, but listening to the samples I guess the quality is limited by the target platform, which I would guess is meant to run well with relatively low load on, say, a Pi3/Pi4.
That is another thing VocalID seem to have got right: when it comes to extremely capable devices able to run far more complex models, mobile phone platforms such as Android and iOS have a monopoly, and those are the devices VocalID cater for.
The power available in a sub-$400 mobile phone nowadays is truly astounding. With a CPU, NPU and GPU on board, the days of disabled people sounding like Stephen Hawking are well over, and availability is a click away from a ‘play’ store.
Hopefully nobody is going to market a Pi3/Pi4, or software developed to run on one, as a disability enablement device, as what is available at not much more cost is the minimum the disabled deserve.
The same goes for the community: for top-end TTS, where you are prepared to invest in more than just a Pi3/4, Mimic3, or at least the models we have so far, is relatively ‘old tech’ in terms of quality.
It's why I think the only dev work should be integrating best-of-breed products, so that a far greater range of choice can be offered, along with access to the enhanced apps, services, support and communities they provide.
Yeah interesting - thanks for posting it!
Sounds like they’re more of a competitor with Coqui, who seem to be aiming for the voice cloning market. There’s a whole lot of voice cloning startups at the moment, but unfortunately many of them don’t seem to be thinking about the ethical implications of copying someone’s voice with small amounts of data. It’s very often done without their consent, which is not only problematic from a personal perspective but also very likely to get them sued by the various media personalities and characters they are emulating.
VocalID do have a strong track record and from my understanding provide a much more personalized voice.
I also hope we don’t see people without experience in disability services trying to market products to that community. It’s so much more than being able to run a TTS service on some hardware. There is a lot of deep collaborative work that goes into designing those augmentative and alternative communication (AAC) systems - this is actually what my partner does for a living!