Our project background:
We are working on an exciting education project that aims to create an interactive way of learning: an inspirational edutainment package that introduces students to the technologies that will dominate their future, including immersive technology and gamification. The front end integrates with a back end of conversational AI platforms, with curriculum models, general-information bots (like a wiki bot), and some entertainment "desserts".
Coming from a struggling upbringing, it is important for us to make sure such a service is available to all. So, the plan is to have a cloud/hybrid version that spares schools the complexities of system setup where internet connection is not an issue, but also a fully offline version for remote locations where internet is a luxury or even non-existent. Our front end is developed accordingly, and we want our back end to work the same way.
Diving into Mycroft:
We went far with the development of our platform, but then we came across Mycroft! Our first impression is that it may take our project to a different level. It is obviously a mature solution: well structured, developed by a versatile team of experts, and with a huge community behind it. (We, on the other hand, are two people, only one of whom can code, working part-time, and we are more experienced in gamification and immersive technologies than in what the back end requires… there are levels to this.)
But let's start with the major questions so we know if Mycroft is compatible with the needs of an education platform:
- Will we be able to host the whole platform on our cloud with no strings attached to any third party?
That's important for many reasons, for example:
a) Heavy customizations: We like the structure of Mycroft. Out of the box, it is perfect for what it is meant to be: a personal assistant. For education, as I'll present below, we will have to subject it to heavy customizations, from skills to core to permissions and flow. For example, a user can install a new skill via a simple voice command. Great for a personal assistant! But if we open that up to students at schools without any control, imagine the circus (and the issues of installing skills unsuitable for certain ages); see the configuration sketch after this list. Over time, our back end will keep the Mycroft DNA but will gradually shape up to become a different system.
b) The Edmodo crisis: At some point, we will work with major partners like ministries of education to develop curriculum models, create special pairing policies (for example, a system only accessible to verified schools/students, or maybe no pairing at all), etc. After the Edmodo education platform shut down earlier this year, leaving 90 million users in the air, major education players have a big concern about investing the time and effort to adopt a platform whose destiny they do not fully control. Many education sectors are yet to recover from the huge loss of the retired Edmodo.
c) Training modules will be a more exclusive matter, supervised by people with special permissions. We will even have multiple clones of the system for different scenarios. For example, a simple question like "define chemical reaction" should produce different responses depending on the academic level/field of the user/student asking the question; the answer should perfectly match that student's curriculum (see the skill sketch below).
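On the skill-installation point in (a): from what we have read, mycroft.conf supports a blacklisted_skills list, so a blunt first step might be to blacklist the installer skill itself. A minimal sketch of what we imagine the override would look like (the skill folder name is our guess and may differ by install; we understand overrides go in /etc/mycroft/mycroft.conf or the user-level config):

```json
{
  "skills": {
    "blacklisted_skills": ["mycroft-installer.mycroftai"]
  }
}
```

On the per-curriculum point in (c), here is a rough sketch of how we picture a curriculum-aware skill: the answer is keyed off an academic level carried in the message context. The DEFINITIONS table and the academic_level context key are our own inventions for illustration, not Mycroft conventions:

```python
from mycroft import MycroftSkill, intent_file_handler

# Toy lookup table; in practice this would be a per-curriculum model or service.
DEFINITIONS = {
    "chemical reaction": {
        "primary": "A chemical reaction is when substances change into new substances.",
        "secondary": "A chemical reaction rearranges atoms by breaking and forming bonds.",
    },
}

class CurriculumDefineSkill(MycroftSkill):
    @intent_file_handler("define.intent")  # e.g. an intent file containing "define {term}"
    def handle_define(self, message):
        term = message.data.get("term", "").lower()
        # 'academic_level' is a context key our front end would inject.
        level = message.context.get("academic_level", "primary")
        answer = DEFINITIONS.get(term, {}).get(level)
        self.speak(answer or "I do not have a definition for that yet.")

def create_skill():
    return CurriculumDefineSkill()
```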
- Once we reach a point where we need the hardware device, is it easy to configure your devices (like the Mark II) to work with our cloud instances?
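For context on this question: we noticed that Mycroft devices read their backend endpoint from the "server" section of mycroft.conf, and we understand the Selene backend itself is open source, so we imagine re-pointing a device would look something like the snippet below. The URL is a placeholder, and we may be missing pairing details:

```json
{
  "server": {
    "url": "https://backend.our-cloud.example",
    "version": "v1",
    "update": true,
    "metrics": false
  }
}
```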
- Can a standard VPS run the Mycroft AI platform, or does it require special cloud solutions?
- Integration and communication:
a) Assuming we use a local STT (Vosk, for example) and a local TTS (Mimic 3, for example) packaged within our desktop front-end application, is there some sort of API, or equivalent, that makes it possible for our application to send a text utterance to our cloud Mycroft server (bypassing the cloud Mycroft STT) and get the response back from the cloud as text (bypassing the Mimic 3 TTS that exists on the cloud instance)? Can we also send the targeted skill with the utterance to make sure we get the response from that exact skill/model? For example, a "what is a chemical bond?" utterance would give one answer using the wiki skill, and another using a trained conversational AI model that we prepared for a certain academic level. If so, it would be great to get directions to the Mycroft API that handles that (see the first sketch below).
b) If we create a web application (we can most likely upload it into the same server directory as Mycroft if that makes things easier), will we be able to integrate that web app with the Mycroft system so that once a user opens the front-end web application, Mycroft starts listening, converts speech to text, deals with the utterance, and sends back a WAV file response (instead of speaking it) for the front end to play (see the second sketch below)?
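For (a), the closest thing we have found is the messagebus: injecting a recognizer_loop:utterance message appears to skip STT entirely, and the skill's answer comes back as a "speak" message before any TTS happens. A minimal sketch using the official mycroft-bus-client package (the host name is a placeholder; we are not sure targeting a specific skill is supported this way, so a custom message type handled by our own skill may be the route for that):

```python
from mycroft_bus_client import MessageBusClient, Message

# Connect to the messagebus of the cloud Mycroft instance (default port 8181).
bus = MessageBusClient(host="our-cloud-host.example", port=8181)
bus.run_in_thread()

# Inject a text utterance directly, bypassing STT, and wait for the skill's
# textual answer (the "speak" message), so no TTS is needed on our side.
reply = bus.wait_for_response(
    Message(
        "recognizer_loop:utterance",
        data={"utterances": ["what is a chemical bond"]},
        context={"academic_level": "secondary"},  # our own custom context key
    ),
    reply_type="speak",
    timeout=10,
)
if reply is not None:
    print(reply.data["utterance"])  # plain-text answer from whichever skill matched
```

One thing that worries us here: the messagebus is unauthenticated by default, so exposing it from a cloud host would presumably need a gateway in front of it.

And for (b), here is how we currently picture the web-app flow, as a small relay on the same server: the browser sends text (from our local STT), the relay forwards it to the messagebus, and the "speak" reply is rendered to a WAV file via Mimic 3's HTTP server (its /api/tts endpoint, default port 59125) and returned for the front end to play. Hosts and endpoints are placeholders, and we assume mimic3-server runs alongside Mycroft:

```python
from io import BytesIO

import requests
from flask import Flask, request, send_file
from mycroft_bus_client import MessageBusClient, Message

app = Flask(__name__)
bus = MessageBusClient(host="localhost", port=8181)  # same box as mycroft-core
bus.run_in_thread()

@app.post("/ask")
def ask():
    text = request.json["utterance"]
    # Forward the utterance to Mycroft and wait for its textual answer.
    reply = bus.wait_for_response(
        Message("recognizer_loop:utterance", data={"utterances": [text]}),
        reply_type="speak",
        timeout=10,
    )
    if reply is None:
        return {"error": "no response from Mycroft"}, 504
    # Render the answer to audio with Mimic 3's HTTP API and return the WAV.
    wav = requests.get(
        "http://localhost:59125/api/tts",
        params={"text": reply.data["utterance"]},
        timeout=30,
    )
    return send_file(BytesIO(wav.content), mimetype="audio/wav")
```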
This is shamefully long already, so I will stop here. We are just careful not to move away from what we have done already to something that looks far more amazing in Mycroft, only to find ourselves in a rabbit hole before discovering that this amazing system is hard to tailor to our requirements. Answers to the questions above will help us a lot in deciding.