Adapt - First Open Source Intent Parser Released by Mycroft A.I

Hey guys, we released some of the secret sauce today - the Adapt Intent Parser!

You can find it here:

Here is the accompanying video:


Great, I will read through it tonight. I only had a quick look for now, but I am wondering: is there one single * file, or does it work with a bunch of them? Where will they live?

I assume we can add our own, as that seems to be a lot of what this project is all about, but will there be a core of immutable ones or is everything open for editing? Will you be providing a platform to showcase and share useful ‘intents’, or is that what GitHub is for?

Good work, by the way :slight_smile:

Had a quick look into the examples provided with Adapt. Great work!!!
I have similar questions to the ones @Autonomouse already asked, plus a few more.

Will writing an intent parser be a manual process, or can it be automated using data from a database or Excel sheet?

Who decides which data is ‘required’ and which is ‘optional’ for an intent parser?
Do we design the Mycroft intent handler first and then the intent parser?

Will the Mycroft intent handler use a second intent parser if the first intent parser provides too little data?
For ex: “Play Song” (Artist is optional here)

Is there any plan to handle ambiguous results? (Otherwise the user needs to keep track of the required keywords used by the intent parsers.)
See the example below: I want to listen to a song, but instead I am getting the Tokyo weather :smile:
“play tokyo weather masturi song”
“intent_type”: “WeatherIntent”,
“WeatherKeyword”: “weather”,
“Location”: “tokyo”,
“confidence”: 0.3440860215053763,
“target”: null

How will the same intent parser handle phrases like "Set First floor Office Temperature to 20"
and “Switch On Ground floor Kitchen Table Light” and "Switch Off Second floor Bathroom Heater" ?
So many required keywords !!
[Set/On/Off, Ground/First/Second Floor, Kitchen/Office/Bathroom, Heater/Cooler/Ceiling Light/Table Light]

Above all, the best results from the intent parser depend on the WER (word error rate) of the STT engine.

Great work btw, I am now going to start using it !!!

Hey Guys, thanks for the excitement!

To answer both of your questions, I think: Mycroft proper has a concept of skills, or “things I can do”, and the invocation of those skills is an intent. The intent parser will be defined by the skill developer. There will be some skills/intents that are part of Mycroft proper, and there will be some platform-level intents (telling Mycroft to go to sleep, for example). The rest of the intents will be defined by developers, and Mycroft will make a best effort to determine the appropriate intent for a given query and route it back to the skill that registered the intent parser.

As for public-domain or out-of-box intents, I’ve been thinking about that a bit today. Right now, the samples are all we’ve given to the world, and we expect people to create their own and populate them with data as they see fit. I imagine some datasets will be dynamically created/maintained at runtime (think of the list of a user’s Pandora stations), and others will be static (like a list of US cities, or the top 100 business names). Some of these I can imagine Mycroft maintaining for the public, though your mileage will vary based on the skill/application you are developing.

If there’s interest, I think we could start corralling a set of public intents/vocabulary as part of adapt-intent-* projects, release/manage them independently, and allow for end developers to have an experience like the following:

pip install adapt-parser adapt-intents-music
adapt-vocab-downloader music en-US

and then start developing against a populated intent parser.

Thoughts on this?


To answer specifically your question about the multiple intents:

I think you’re using the multi_intent_parser example here, but I’m not sure, and I’m having trouble following your question.

The examples are very contrived. In reality, your engine will be populated with a music catalog as well if you’re hoping for higher accuracy, which will help Adapt to disambiguate between the weather intent and the music intent. I’m not familiar with the song you’re referencing, and I would guess that you have not registered it as vocabulary in your engine. If this is a contrived song name, I’d ask you to be more fair, but frankly song titles are extremely ambiguous :smile:

Since Adapt is a known-entity parser (and not a statistical parser), without any indication that “tokyo weather masturi” is related to music, the weather intent would be tied with the music intent, and while Adapt is deterministic, the winning result will be rather arbitrary. Adapt does also support returning multiple parse results, at which point your application could bubble up a response of “Which of these things did you mean?”
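
A minimal sketch of that "which did you mean?" step, assuming parse results arrive as dicts with the `intent_type` and `confidence` keys seen in the output quoted earlier in the thread. The helper names and the tie tolerance are hypothetical and not part of Adapt:

```python
def top_candidates(parses, tolerance=0.05):
    """Return every parse whose confidence is within `tolerance` of the best."""
    if not parses:
        return []
    ranked = sorted(parses, key=lambda p: p["confidence"], reverse=True)
    best = ranked[0]["confidence"]
    return [p for p in ranked if best - p["confidence"] <= tolerance]


def respond(parses):
    """Run a clear winner, or ask the user to choose between tied intents."""
    candidates = top_candidates(parses)
    if not candidates:
        return "Sorry, I did not understand that."
    if len(candidates) == 1:
        return "Running " + candidates[0]["intent_type"]
    options = " or ".join(p["intent_type"] for p in candidates)
    return "Which of these things did you mean: " + options + "?"
```

With a weather parse and a music parse both at 0.34, `respond` asks the question instead of picking one arbitrarily.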

Sorry for any confusion, but the examples were intended to be instructional, and not drop in parsers for their domains.

Take this example.

Home Automation Intent Parser.

Automation Phrases used will be like below:

  1. “Mycroft Set First floor Office Temperature at Twenty”
  2. “Mycroft Switch On Ground floor Kitchen Table Light”
  3. “Mycroft Switch Off Second floor Bathroom Heater”

Automation Intent Parser:

action_keyword = [“Set”, “On”, “Off”] --> Require
floor_keyword = [“Ground”, “First”, “Second”] --> Require
room_keyword = [“Kitchen”, “Office”, “Bathroom”] --> Require
device_keyword = [“Heater”, “Cooler”, “Ceiling”, “Table”] --> Require

Q: How will the intent parser handle “Temperature at Twenty”?
The other two phrases (2 and 3) look OK, as their entities are known.
Q: How will the Automation Skill get the number Twenty in the JSON before setting the office temperature?

Twenty is not a known entity. Do we need to register another keyword with the intent parser?
temperature_keyword = [“one”, “two”…“fifty five”] --> Require / Optional

If we make temperature_keyword optional and the Automation Intent Parser detects no temperature value in the phrase, it will pass null to the skill in the JSON for the temperature value.
Suppose the Automation Skill checks the temperature value (which is null) and asks again for the temperature,
and the user replies “set it at Twenty”.
In this case, do we need another Temperature Intent Parser to parse the temperature value?

Take another example :

A Voice Calculator

“Calculate Five Thousand Divide by Twenty Five”

Calculator Intent Parser
calci_keyword = [“Calculate”, “Calci”] --> Require
Operation_keyword = [“Multiply”, “Divide”, “Plus”, “Minus”] --> Require

What about “Five Thousand” and “Twenty Five”?
How will the intent parser deal with numbers?

Stock Exchange Intent Parser

“BUY Thousand Share of APPLE Limit two hundred point five”
“SELL Twenty Lot of ORACLE Market”

What about “Thousand” and “two hundred point five”?
How will the intent parser deal with numbers?

Adapt doesn’t yet deal with numbers! At least, not in any helpful way. There’s an open task in our JIRA instance about making datetimes a first-class citizen, and numbers are an obvious additional case. Right now (with the exception of the tokenizer), Adapt doesn’t require any localization, and converting from phrases to numerals (and vice versa) is something that will vary from language to language. I don’t have a good answer for you right now as to what the correct direction for this is.
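
To illustrate why this conversion is language-specific, here is a minimal English-only sketch that handles the phrases raised above (“Five Thousand”, “Twenty Five”, “two hundred point five”). It is not part of Adapt; the function name and word tables are assumptions made for the example, and every other language would need its own tables and rules:

```python
UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}
SCALES = {"thousand": 1_000, "million": 1_000_000}


def words_to_number(phrase):
    """Convert an English number phrase to an int (or a float after 'point')."""
    words = phrase.lower().replace("-", " ").split()
    if "point" in words:
        split_at = words.index("point")
        whole, fraction = words[:split_at], words[split_at + 1:]
    else:
        whole, fraction = words, []

    total = current = 0
    for word in whole:
        if word == "and":
            continue
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current = max(current, 1) * 100
        elif word in SCALES:
            # A bare "thousand" is read as "one thousand".
            total += max(current, 1) * SCALES[word]
            current = 0
        else:
            raise ValueError(f"not a number word: {word!r}")
    value = total + current

    if fraction:
        # Digits after "point" are read one at a time: "point two five" -> .25
        digits = []
        for word in fraction:
            if UNITS.get(word, 10) > 9:
                raise ValueError(f"not a single digit: {word!r}")
            digits.append(str(UNITS[word]))
        value = float(f"{value}.{''.join(digits)}")
    return value
```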

In the short term:
You can specify a list of reasonable temperatures that are specific to your skill (twenty is ok, one hundred is death)
You can specify a regex entity that extracts numbers (this matches digits only, not spelled-out numbers):
"(?P<Temperature>\d+) degrees"

Since (at the moment) this is primarily a command and control interface, the vocabulary sets for these skills shouldn’t be particularly large.
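
A named-group pattern of this kind can be checked with Python's `re` module before registering it as a regex entity. This is only a sketch: the helper name is made up for the example, and it assumes the speech recognizer emits digits (“20 degrees”) rather than words (“twenty degrees”):

```python
import re

# Named group syntax is (?P<name>...); the group name becomes the entity name.
TEMPERATURE = re.compile(r"(?P<Temperature>\d+) degrees")


def extract_temperature(utterance):
    """Return the temperature as an int, or None if no digits were spoken."""
    match = TEMPERATURE.search(utterance)
    return int(match.group("Temperature")) if match else None
```

For "set office temperature to 20 degrees" this yields the integer 20, while an utterance with a spelled-out number yields None and would need the vocabulary-list approach instead.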

Good Work, but What are the differences between and ?
Why not use and improve what already exists?


Aside from the technical differences, which I likely couldn’t shed much light on, just quickly having a look at the code on GitHub, I’m unable to find where or how they handle parsing of the requests. Do you know if their intent parser or voice request parser is open source? My first impression is that they provide SDKs for various programming languages, but their implementation isn’t actually open source.

The parser includes code that prefers wider parses (covering more of the utterance) over narrower ones. So if “tokyo weather matsuri” is known as a song title, it should recognize it. Unfortunately, as best I can make out, it cannot work out that this is a song title just because it appears after the word “play” and in front of the word “song”.
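
The "prefer the wider parse" idea can be sketched in a few lines. This is an illustration of the principle only, not Adapt's actual scoring code; the candidate structure with a `matched` list of recognized phrases is an assumption made for the example:

```python
def coverage(utterance, matched_phrases):
    """Fraction of the utterance's tokens covered by the matched phrases."""
    tokens = utterance.lower().split()
    covered = set()
    for phrase in matched_phrases:
        words = phrase.lower().split()
        width = len(words)
        for start in range(len(tokens) - width + 1):
            if tokens[start:start + width] == words:
                covered.update(range(start, start + width))
    return len(covered) / len(tokens) if tokens else 0.0


def pick_widest(utterance, candidates):
    """Choose the candidate parse that explains the most of the utterance."""
    return max(candidates, key=lambda c: coverage(utterance, c["matched"]))
```

If the song title is registered as vocabulary, a music parse matching “play” plus the three-word title covers four of five tokens and beats a weather parse that only matches “tokyo” and “weather”.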

@Raidptn Adapt is open source and not reliant on a 3rd party service. Even if a free service is offered to open source projects, you are still beholden to that service being online and available for the lifetime of any project you base on it.

This is a correct interpretation of the code. The “Adapt-y” way of implementing this is to have an index of song titles that you’d expect to recognize. A good implementation would be to get song titles from the user’s media library. A less good implementation would be to go to freebase and get a list of all songs ever written. I do not recommend the latter; people have written a lot of songs :slightly_smiling:

I tried that on my own attempt at doing what Mycroft is attempting. The trouble is that the people who provide track data are not very cooperative. One of my tracks has a title tag value of

 "Gimme! Gimme! Gimme! (A Man Af"

And my library has over 3,500 tracks in it (not all ABBA songs). So I had to add an additional table relating a special “pronounced title” string to each track, and populate that with algorithmically simplified title strings. I removed anything after a left paren and converted all punctuation to spaces. Then I was faced with the pronunciation dictionary not having “gimme” in it.
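
The simplification described here (drop everything from the first left paren, turn punctuation into spaces) might look roughly like this. The function name is hypothetical, and the lowercasing and whitespace collapsing are extra assumptions that were not part of the description:

```python
import re


def pronounceable_title(raw_title):
    """Reduce a raw title tag to a string a recognizer could plausibly hear."""
    # Keep only the part before the first left paren, if there is one.
    before_paren = raw_title.split("(", 1)[0]
    # Replace punctuation (anything that is not a letter, digit or space).
    no_punctuation = re.sub(r"[^\w\s]|_", " ", before_paren)
    # Assumption: lowercase and collapse runs of whitespace.
    return " ".join(no_punctuation.lower().split())
```

Applied to the truncated tag above, this produces "gimme gimme gimme".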

Yup, these are the problems! I’ve done the same thing on past projects; music is a particularly dirty data set. The lack of “gimme” in your pronunciation dictionary is something that should be resolved with a high-quality dictation speech recognizer. There’s then the issue of using, for example, the English recognizer and trying to recognize the names of German songs.

Long story short, this stuff is hard. But hard is fun!

A way around this is not to try to select individual songs by voice command. Instead, define nicely named playlists for various purposes: “Mycroft, play my Christmas album.”

Another approach is to implement a “fuzzy match” algorithm that tries to match the text output by the recognizer against known category strings using something like a Soundex hash. This requires using specified grammars with wildcards like “Play the song [WORD…] .” or using “Hamming distance” matching. A general “sounds most like which of these” match filter might be useful to several behaviors.
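
As a rough sketch of such a "most like which of these" filter, the example below uses the standard library's `difflib` similarity ratio. Note that this is a stand-in: it compares spelling, not sound, so it is neither Soundex nor Hamming distance, and the function name and cutoff are assumptions:

```python
import difflib


def best_match(heard, known_names, cutoff=0.6):
    """Return the known name closest to the recognized text, or None."""
    by_lowercase = {name.lower(): name for name in known_names}
    hits = difflib.get_close_matches(
        heard.lower(), list(by_lowercase), n=1, cutoff=cutoff)
    return by_lowercase[hits[0]] if hits else None
```

A slightly misrecognized "my christmas albums" would still resolve to a playlist named "My Christmas Album", while unrelated text returns None so the skill can ask again.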

My library includes songs with titles like “川の流れのように”. And Japanese has phonemes that do not exist in English, and vice-versa.

Possible solution:
For this you could go through the music library and try to detect which languages are present (language detection needs to be rerun only when the music library is modified), or obtain the list of languages some other way.
Then whenever the user asks to play a song, the spoken song title could be transcribed with all of the language recognizers in the list of languages the music library contains. The transcription with the highest confidence is then chosen, and that is the title that is searched for.
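
A minimal sketch of that selection step, assuming each recognizer is a callable that takes the audio and returns a `(text, confidence)` pair. That interface is hypothetical and does not correspond to any particular speech engine:

```python
def transcribe_best(audio, recognizers):
    """Run every language's recognizer and keep the most confident result.

    `recognizers` maps a language code to a callable returning
    (text, confidence). Returns (language, text, confidence).
    """
    if not recognizers:
        raise ValueError("no recognizers configured")
    results = []
    for language, recognize in recognizers.items():
        text, confidence = recognize(audio)
        results.append((language, text, confidence))
    # Keep the transcription whose confidence score is highest.
    return max(results, key=lambda result: result[2])
```

One caveat with this design: confidence scores from different recognizers are not necessarily comparable, so some calibration may be needed in practice.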

This would also be nice for people whose first language is not English, but who have English music in their music library.

also: “99 Luftballons” ftw

What about a JSON skill parser?

I think we could write most skills with just a few JSON files:
JSON files for entities, one for regexes, and some properties.

That way we could easily write a skill without touching any Python.

Edit : maybe not a good idea…

Hi, having fun with Adapt.

Does it support UTF-8 well? I tried to put an “é” in a keyword and I get a weird result in the JSON output:

“Delay”: “10 minutes”,
“intent_type”: “timerIntent”,
“confidence”: 0.4827586206896552,
“timerKeyword”: “pr\u00e9viens”, ---- instead of “préviens”
“target”: null
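
On the escaping itself: `\u00e9` is the standard JSON escape for “é”, so the value still decodes to “préviens”. Assuming the result is printed with Python's `json.dumps` (an assumption about the script used here), passing `ensure_ascii=False` keeps the character readable:

```python
import json

result = {"timerKeyword": "préviens", "Delay": "10 minutes"}

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(result)

# ensure_ascii=False writes the characters themselves.
readable = json.dumps(result, ensure_ascii=False)

# Both strings decode to exactly the same data.
same_data = json.loads(escaped) == json.loads(readable)
```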

Another question: is there a French tokenizer? For words such as “j’ajoute”, I would like “ajoute” to be a word (a keyword, actually), but it doesn’t work =)

I think a French tokenizer would be pretty similar to the English one except for this apostrophe rule (which has exceptions, such as words like “aujourd’hui”).
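
A rough sketch of that apostrophe rule with an exception list. This is not Adapt's tokenizer; the function name and the exception set are assumptions, and punctuation handling is left out:

```python
# Words that contain an apostrophe but must stay in one piece.
KEEP_WHOLE = {"aujourd'hui"}


def tokenize_fr(text):
    """Split on whitespace, then split elided forms such as j'ajoute."""
    tokens = []
    # Treat the typographic apostrophe like the plain one.
    for word in text.replace("\u2019", "'").split():
        if "'" not in word or word.lower() in KEEP_WHOLE:
            tokens.append(word)
            continue
        head, tail = word.split("'", 1)
        # Keep the apostrophe with the elided part: "j'" + "ajoute".
        tokens.extend(part for part in (head + "'", tail) if part not in ("", "'"))
    return tokens
```

With this split, “ajoute” becomes its own token and can be matched as a keyword, while “aujourd’hui” is left intact.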

Other question :slightly_smiling:
I tried registering this regex entity :


But it doesn’t work as it should, any idea ?