Core consists of a set of components that interact with each other via asynchronous messaging. The messages are used to control logic and data flow across various system components. This proposal is an attempt to define a generic framework that can be used consistently throughout the system to facilitate interactions within the same component and between different components.
Mycroft AI is requesting feedback from the community regarding this proposal. Please comment in this topic rather than in the document.
I'll read it over a few times to be sure I've absorbed it, but, on first read-through, this seems largely compatible with HiveMind's thinking re: how to accomplish the same goals between instances of Mycroft and compatible devices.
A couple of notes:
HiveMind proposes an Action Registry, where Actions are similar in purpose to Events. The Registry will house a single, 128-bit GUID per conceptual Action, and a whole boatload of canonical, JSON-serializable names. This way, the ecosystem can share uniform messages, and interoperability is baked in.
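To make the Registry idea concrete, here is a minimal sketch of one GUID per conceptual Action with several canonical names attached. Everything in it is an assumption for illustration: the `ActionRegistry` class, its `register`/`resolve`/`to_json` methods, and the sample names `light.on` and `lights.turn_on` do not come from HiveMind.

```python
import json
import uuid


class ActionRegistry:
    """Hypothetical registry: one 128-bit GUID per Action, many names."""

    def __init__(self):
        self._names_by_guid = {}  # GUID -> set of canonical names
        self._guid_by_name = {}   # canonical name -> GUID

    def register(self, *names):
        """Create a new Action with a fresh GUID and attach the given names."""
        if not names:
            raise ValueError("an Action needs at least one name")
        for name in names:
            if name in self._guid_by_name:
                raise ValueError(f"name already registered: {name}")
        guid = uuid.uuid4()  # a UUID is a 128-bit identifier
        self._names_by_guid[guid] = set(names)
        for name in names:
            self._guid_by_name[name] = guid
        return guid

    def resolve(self, name):
        """Return the GUID registered for a name, or None if unknown."""
        return self._guid_by_name.get(name)

    def to_json(self):
        """Serialize as {guid string: sorted list of names}."""
        return json.dumps(
            {str(g): sorted(n) for g, n in self._names_by_guid.items()},
            sort_keys=True,
        )


# Two names (hypothetical) that mean the same conceptual Action.
registry = ActionRegistry()
light_on = registry.register("light.on", "lights.turn_on")
```

Both names resolve to the same GUID, which is what would let two devices with different vocabularies agree on what a message means.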
HiveMind is already in the process of isolating bus clients, so each component has its own socket. The node running the bus is the arbiter of whether and to whom a message should be forwarded. This adds a pub/sub element to the bus model, but it allows for granular permissions, unifies (from Mycroft's and HiveMind's perspectives) the distributed and local experiences, and closes what I consider our biggest security hole.
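As a rough sketch of the arbiter idea, the bus node below forwards a message only when the sender has been granted permission to emit that message type, and only to clients subscribed to it. The `ArbiterBus` class, the client ids, and the `speak` message type are hypothetical; the actual HiveMind permission model may look quite different.

```python
from collections import defaultdict


class ArbiterBus:
    """Hypothetical bus node deciding whether and to whom to forward."""

    def __init__(self):
        # message type -> list of (client_id, callback)
        self._subscribers = defaultdict(list)
        # client_id -> set of message types that client may emit
        self._allowed = defaultdict(set)

    def allow(self, client_id, msg_type):
        """Grant a client permission to emit one message type."""
        self._allowed[client_id].add(msg_type)

    def subscribe(self, client_id, msg_type, callback):
        """Register a client's callback for one message type."""
        self._subscribers[msg_type].append((client_id, callback))

    def emit(self, sender_id, msg_type, data):
        """Forward to subscribers if permitted; return the delivery count."""
        if msg_type not in self._allowed[sender_id]:
            return 0  # sender lacks permission, so nothing is forwarded
        delivered = 0
        for client_id, callback in self._subscribers[msg_type]:
            if client_id == sender_id:
                continue  # do not echo a message back to its sender
            callback(msg_type, data)
            delivered += 1
        return delivered


received = []
bus = ArbiterBus()
bus.allow("skill.weather", "speak")
bus.subscribe("audio", "speak", lambda t, d: received.append((t, d)))

ok = bus.emit("skill.weather", "speak", {"utterance": "hi"})
blocked = bus.emit("skill.rogue", "speak", {"utterance": "x"})
```

Here the permitted skill reaches the audio client, while the unregistered one is silently dropped at the bus.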
I'm particularly interested in whether you think the Registry model could work, as the HiveMind side of it is, straightforwardly, Mycroft-native IoT. If we jog in the same direction on that front, we can put Skill developers in an interesting place as soon as 2022.
Nice document! The first question that came to mind is "How significantly does this change the current architecture?"
After reading it, I searched for "server" and got only one hit:
Sync Skill Settings Activity
Synchronize skill settings between the device and the server.
What is "the server"? mycroft.ai?
Please allow me to extrapolate with an example, perhaps only tangentially related. Say I'm visiting a big city and want to plan dinner. So I ask: "Hey Mycroft, where can I make reservations at 6PM within five miles of my hotel?"
Mycroft is a personal assistant so he/she/it would know what hotel I'm staying in. Let's say there were a Voice Registry System (VRS) where restaurants can enroll their hours, menus, cuisine, seating, delivery options, etc. The VRS would return, say, a dozen records. Mycroft could reply, "There are 12 restaurants with seven different cuisines. What cuisine are you interested in?"
I know this is a bit of a stretch, but there is a group I'm working with in the Open Voice Network (OVON), trying to design such a system. Think of it as DNS on steroids.
Would such a VRS fit into this Component/Activity/Event Framework?
The term "server" refers to the server-side software that supports account.mycroft.ai. It is where a user can change skill settings, pair a device, etc.
To answer your question about how significantly this changes the architecture, I would say "not very". The events are basically the message bus events that exist today. The activities are more of a logical construct than a physical one for defining a unit of work. Mycroft Core already defines several services. The biggest change in the proposal is a better-defined methodology for naming and emitting events.
In your example, the logic to do the restaurant search and reservations would be in a skill, which is a component. The skill would have activities and events defined that provide structure to the work being done within the skill.
I like the proposed conventions and structure; also nice to have a documented vocabulary we can use to be precise in what we're talking about.
Couple clarifying questions here:
is an activity implicitly thread-bound (i.e. it is expected that the same thread will start/end an activity)?
I don't think it's explicitly stated in the doc; I'm assuming an event will be implemented as a Message object, with context preserved as it currently is when forwarded/replied to
And a few proposals:
In the Listener Service example, could Detect Dialog Activity be abstracted to generic audio processing? I'm thinking in terms of this implementation in neon_speech. I think there are use cases other than dialog detection that are useful for accepting/rejecting audio (volume threshold, SNR, speaker recognition)
Somewhere between STT and Intent services, consider building in a pre-parsing pipeline to do things like expand contractions, substitute pronouns, or perform other generic string processing. See implementation in NeonCore. This moves extra text parsing to plugins; base implementation could do nothing, but it adds flexibility for integrators
In Intent Service, I would propose accounting for other intent parsers (not saying I expect another parser now, just that the architecture should account for it)
And some thoughts more to the point of implementation than any piece of this spec. I would recommend maintaining each service independently (like how neon_speech above is extracted from core).
If communications follow these standards, there is no reason to expect that the services all run out of the same environment
In my experience, PRs are easier to manage when they are guaranteed to be scoped to one module (also forces a change to comply with communications specs rather than modifying them)
From a dev perspective, it is easier to read through a module's commit history than the full core commit history if I'm troubleshooting an issue (since tracking down the originating module/service is the easy part)
I would say yes. But keep in mind that activities are not necessarily an object, like a thread or method, but a set of logic bound within start/end events. That being said, I can see code bounding an activity in a method call or similar since it signifies a unit of work.
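One way to picture an activity as logic bounded by start/end events is a context manager that emits an event on entry and another on exit. This is only a sketch: the `emit` stand-in, the `.started`/`.ended` suffixes, and the event names are assumptions, not the naming convention from the proposal.

```python
from contextlib import contextmanager

emitted = []  # stand-in for the message bus, records event names in order


def emit(event_name):
    """Pretend to put an event on the bus."""
    emitted.append(event_name)


@contextmanager
def activity(name):
    """Bound a unit of work with start and end events (hypothetical names)."""
    emit(f"{name}.started")
    try:
        yield
    finally:
        # The end event is emitted even if the work raises an exception.
        emit(f"{name}.ended")


with activity("skill.settings.sync"):
    emit("skill.settings.downloaded")
```

The activity itself is not an object that does work; it only marks where a unit of work begins and ends, which matches the description above.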
You are right that events are just Message objects. It wasn't included in the document because it is an implementation detail. I don't want to speculate about how we will deal with context in the future, but for now it will remain unchanged.
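For readers unfamiliar with the idea of context surviving a forward or reply, here is a self-contained stand-in. `SketchMessage` is deliberately not the Mycroft Message class, and the `source`/`destination` swap on reply, the message types, and the context keys are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SketchMessage:
    """Stand-in for a bus message; not the actual Mycroft class."""
    msg_type: str
    data: dict = field(default_factory=dict)
    context: dict = field(default_factory=dict)

    def forward(self, msg_type, data=None):
        """New message of another type carrying a copy of the same context."""
        return SketchMessage(msg_type, data or {}, dict(self.context))

    def reply(self, msg_type, data=None):
        """Like forward, but swap source and destination if both are set."""
        context = dict(self.context)
        if "source" in context and "destination" in context:
            context["source"], context["destination"] = (
                context["destination"],
                context["source"],
            )
        return SketchMessage(msg_type, data or {}, context)


original = SketchMessage(
    "example.utterance",
    {"utterances": ["hello"]},
    {"session": "abc", "source": "mic", "destination": "skills"},
)
forwarded = original.forward("example.utterance.parsed")
replied = original.reply("example.utterance.handled")
```

The session identifier travels with every derived message, so a handler several hops away can still tell which interaction it belongs to.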
If I am reading this correctly, it sounds like a different activity rather than an abstraction of Detect Dialog. It sounds like the audio playback example where there are Play MP3 and Play WAV Activities.
A good idea. Sounds like a new feature. The document was just an example of how Core in its current form could be broken out. The intent is that other services and activities not mentioned in the document can always be added using the same framework.
I was just mentioning in a recent dev sync meeting that Intent Parsers should use the plugin system so I agree with you completely.
Somewhere down the road is a project that will address your thoughts about independent services. Thanks for your suggestions. We will probably not do this in the first phase but may consider it in subsequent phases.
Thanks for the clarification on those few questions. I agree that many of the specifics I pointed out are beyond the initial scope here; mostly I wanted to highlight some areas where plugins or other extensions would logically fit in (and where we could consolidate some efforts via plugins).
I think of Detect Dialog as a particular audio processing activity; my proposal is to allow for other activities in addition to (or in place of) this one.
Main point here is that the framework needs to account for something between speech and skills modules to properly support this functionality. Initially, this could just be a stubbed handler that forwards events, but establishes the flow of speech (or cli/text) → optional text parsing → skills.
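A minimal sketch of that flow, assuming parsers are plain functions chained in order: the default pipeline is a pass-through stub, and a contraction expander can be plugged in. The function names and the tiny `CONTRACTIONS` table are hypothetical and are not taken from NeonCore.

```python
import re

# Illustrative table only; a real plugin would carry a fuller list.
CONTRACTIONS = {"what's": "what is", "i'm": "i am", "don't": "do not"}

_CONTRACTION_RE = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in CONTRACTIONS) + r")\b",
    re.IGNORECASE,
)


def passthrough(utterance):
    """Stub parser: forwards the text unchanged."""
    return utterance


def expand_contractions(utterance):
    """Replace known contractions with their expanded (lowercase) form."""
    return _CONTRACTION_RE.sub(
        lambda match: CONTRACTIONS[match.group(0).lower()], utterance
    )


def run_pipeline(utterance, parsers=(passthrough,)):
    """Apply each parser in order between STT output and intent handling."""
    for parser in parsers:
        utterance = parser(utterance)
    return utterance


unchanged = run_pipeline("what's the weather")
expanded = run_pipeline("what's the weather", [expand_contractions])
```

With the default stub the text reaches the skills untouched, so integrators who do not need parsing pay nothing for the extra stage.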
Since the message bus handles events and actions, I was thinking we could borrow concepts/ideas from OSS projects, such as Rasa (rasa.core.actions.action).
From the skills' point of view, I would like to see a pragmatic toolbox of methods allowing developers to interact with events, especially to get those needed to control the conversation flow (last user message, last bot message, last action xyz).
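As one possible shape for such helpers, the sketch below keeps a bounded log of events and lets a skill ask for the most recent one of a given kind. The `ConversationLog` class and the `user`/`bot`/`action` kinds are invented for illustration; nothing here is an existing Mycroft or Rasa API.

```python
from collections import deque


class ConversationLog:
    """Hypothetical helper giving skills access to recent events."""

    def __init__(self, maxlen=50):
        # Oldest events are discarded once maxlen is reached.
        self._events = deque(maxlen=maxlen)

    def record(self, kind, payload):
        """Store an event, e.g. kind='user', 'bot', or 'action'."""
        self._events.append((kind, payload))

    def last(self, kind):
        """Return the payload of the most recent event of this kind, or None."""
        for event_kind, payload in reversed(self._events):
            if event_kind == kind:
                return payload
        return None


log = ConversationLog()
log.record("user", "book a table for two")
log.record("action", "restaurant.search")
log.record("bot", "Which cuisine would you like?")
log.record("user", "italian")
```

A skill could then call `log.last("bot")` to see which question the user is answering before deciding how to interpret "italian".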