I’m writing a skill that digs through my browser bookmarks and pulls out relevant matches to a spoken phrase. So for example if I say “Search my bookmarks for chicken” I want it to find all the recipes I have stored for chicken.
I thought that the smart thing to use here was mycroft.util.parse.fuzzy_match(), but it’s not doing what I expected.
In a list of bookmarks with titles like Chicken Kiev, Chicken Soup with Garlic and Sour Cream, and Chicken Parm Lasagna, the most relevant according to this function is Arch Linux (score: 0.5). That soup recipe has a score of 0.26!
Now I know that I could just do a search for the keyword, but that’d have its own problems: “Chicken Parm” wouldn’t match “Chicken Parmesan”, for example.
What’s the “right” way to do this?
Never mind, I think I’ve found what I need. The fuzzywuzzy package seems to do the job:
from fuzzywuzzy import process

choices = (
    "Arch Linux",
    "Chicken Parmesean",
    "Gradma's Chicken Soup with Garlic and potato - somerecipesite.com",
    "This is almost chiken soup",
)
process.extract("chicken", choices=choices)
Result:
[('Chicken Parmesean', 90),
 ('This is almost chiken soup', 77),
 ("Gradma's Chicken Soup with Garlic and potato - somerecipesite.com", 60),
 ('Arch Linux', 47)]
Yeah, fuzzywuzzy is a great choice for fuzzy matching. The implementation in Mycroft is a poor-man’s version. I think the main reason it’s not used in mycroft-core was licensing issues.
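To illustrate why a simple matcher can rank short queries so poorly against long titles, here is a rough sketch using only the standard library. It assumes the Mycroft function behaves like a whole-string difflib.SequenceMatcher ratio, which is an assumption on my part and not checked against mycroft-core. The helper names whole_string_ratio and best_window_ratio are made up for this example, and the windowed scorer is only a crude stand-in for the partial matching that fuzzywuzzy does, not its actual algorithm.

```python
from difflib import SequenceMatcher


def whole_string_ratio(query, title):
    """Similarity of the two complete strings, in the range [0, 1].

    A short query against a long title scores low here, because every
    unmatched character in the title counts against the ratio.
    """
    return SequenceMatcher(None, query.lower(), title.lower()).ratio()


def best_window_ratio(query, title):
    """Best similarity between the query and any run of consecutive
    title words that has the same number of words as the query."""
    query_words = query.lower().split()
    title_words = title.lower().split()
    size = len(query_words)
    if size == 0 or not title_words:
        return 0.0
    normalized_query = " ".join(query_words)
    if len(title_words) <= size:
        # Title is no longer than the query: compare the whole strings.
        return SequenceMatcher(None, normalized_query, " ".join(title_words)).ratio()
    windows = (
        " ".join(title_words[start:start + size])
        for start in range(len(title_words) - size + 1)
    )
    return max(SequenceMatcher(None, normalized_query, w).ratio() for w in windows)


titles = [
    "Arch Linux",
    "Chicken Kiev",
    "Chicken Soup with Garlic and Sour Cream",
    "Chicken Parm Lasagna",
]

# Rank the bookmark titles by the windowed score, best match first.
ranked = sorted(titles, key=lambda t: best_window_ratio("chicken", t), reverse=True)
for title in ranked:
    print(f"{best_window_ratio('chicken', title):.2f}  "
          f"{whole_string_ratio('chicken', title):.2f}  {title}")
```

The first column is the windowed score and the second is the whole-string ratio. The long soup title is penalized heavily by the whole-string ratio even though it contains the query word exactly, while the windowed score ignores the rest of the title.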
Fuzzywuzzy is GPL-2. What’s Mycroft using that won’t play nice with that?
It’s under Apache v2.
I’m not a lawyer so I don’t know how / if it would work but that’s the reason I remember from way back.
You may try RapidFuzz, which produces matching results similar to FuzzyWuzzy’s but comes with an MIT licence…
We ran into this license issue with Rhasspy as well (and FuzzyWuzzy was notoriously slow), which is why we created RapidFuzz.