Mycroft Translate - our language learnings so far

Originally published at: http://mycroft.ai/blog/language-learnings/

As you may have read on our blog in earlier posts, making Mycroft available in languages other than English is hard. Our release of Mycroft Translate - a platform to crowdsource translations of Skill dialog and vocabulary - has attracted the support and efforts of over 400 people who have contributed translations in over two dozen languages. We’re incredibly grateful to our Community for the enthusiasm you’ve all demonstrated in this large endeavor - thank you! Mycroft Translate is only one piece of the languages puzzle - you can read more about all the different components that need to be available before a language is available for the whole Mycroft Voice Stack.

We’d like to take the opportunity to provide you with an update on how Mycroft Translate is going, our language learnings, and the continuous improvement we’ve enacted based on Community Feedback.

How is Mycroft Translate going?

The initial statistics for Mycroft Translate are really encouraging - the progress that's been made in just a few short weeks is fantastic!
  • Registered users: 403
  • Languages available for translation: 42
  • Languages with some progress made on translation: 27
  • Phrases available for translation: 458,709
  • Phrases translated: 91,593
The most complete languages are:
  • Deutsch (German)
  • русский язык (Russian)
  • español (Spanish)
  • français (French)
  • svenska (Swedish)
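
The headline figures above can be turned into completion rates with a little arithmetic. The sketch below uses only the numbers quoted in this post; the variable names and the `percent` helper are ours, for illustration.

```python
# Figures copied from the statistics quoted in this post.
languages_available = 42
languages_with_progress = 27
phrases_available = 458_709
phrases_translated = 91_593


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


phrase_completion = percent(phrases_translated, phrases_available)
language_coverage = percent(languages_with_progress, languages_available)

print(f"Phrases translated: {phrase_completion}%")
print(f"Languages with some progress: {language_coverage}%")
```

In other words, roughly a fifth of all available phrases have been translated, and just under two-thirds of the available languages have seen at least some progress.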

How could we make Mycroft Translate better?

As part of our dedication to continuous improvement, we implemented a feedback mechanism for new users of Mycroft Translate so that we could identify gaps and prioritize actions to address them. As always, we're committed to sharing information such as this openly, so below you will find both summary data and machine-readable data for verification. To date, we've had 83 responses from emails sent to 299 registered users - a response rate of about 28%, which gives us a reasonable sample to work from.

How do people find out about Mycroft Translate?

As expected, most people found out about Mycroft Translate through the blog. Our survey wasn't so great at capturing a significant category - newsletters - which we'll adjust from here on in.

[Chart: how respondents found out about Mycroft Translate]

Download the machine readable data for this question (7.1Kb)

How much exposure had people had to Mycroft before assisting with translation efforts?

This one was a real surprise for us: most of the people who were assisting with translation efforts had little or no exposure to Mycroft. This highlighted how important it is to provide people of all abilities - technical and non-technical - with pathways to contribute to the Mycroft ecosystem.

Download the machine readable data for this question (6.3Kb)

What languages are people interested in translating?

The answers to this question were pretty much what we had expected: very strong support for Romance and other European languages, and very little demand for Middle Eastern, African or East Asian languages. This question did highlight that we didn't have any South Asian languages available.

Download the machine readable data for this question (7.2Kb)

Did we provide the language that people were interested in translating?

Our initial offering of languages for translation was chosen deliberately, based on existing knowledge of our Community and anticipated demand. We pretty much got this right, but missed smaller cohorts of language speakers. Several additional languages were requested: South Asian languages; additional European languages, particularly Slavic languages; and an Indigenous Australian language.

[Chart: whether the language respondents wanted to translate was available]

Download the machine readable data for this question (7Kb)

Freeform feedback

The freeform feedback identified several issues: some that we could act on straight away, and others that would require more planning.
  • Context - the most frequent feedback we got was around context. Translations are difficult to provide without the right context. For example, the word "higher" has different meanings depending on whether it is in the context of turning the volume higher, marking an item as higher in a queue, or asking which of two mountains is higher. The same problem arises in other languages.
  • Gender - many languages, including the Romance languages, assign a grammatical gender to nouns. For example, a "bridge" is masculine in French ("le pont") and feminine in German ("die Brücke"). Gender becomes even more complicated in languages like Portuguese, where expressions such as "thank you" change depending on the gender of the speaker - "obrigado" if you are male, and "obrigada" if you are female. Languages like Welsh and Russian have an added level of complexity, where whole phrases change depending on the genders involved.
  • Formality - many languages - but especially those from cultures which are highly stratified - have different levels of formality in language, and this can change depending on the seniority of the participants.
  • Instructions - feedback indicated that we needed better instructions on Mycroft Translate.
  • User experience - the design of the site is awkward and difficult to use.
Download the machine readable data for this question (10Kb)

Improvement actions

Some of the issues identified are easier to resolve than others.

Additional languages

We've recently added several new languages to the Mycroft Translate platform based on the information from the survey, including:
  • Russian
  • Slovenian
  • Korean
  • Yolngu Matha
  • Bulgarian
  • Marathi
  • Telugu

Context

Context is very difficult to ascertain without installing Mycroft on a Device, installing the Skill which is being translated, and running through all the possible vocab and dialog files. For those who are technical and have some exposure to Mycroft, this may be possible, but our survey indicated that most people undertaking translations have little or no exposure to Mycroft. We need to put more planning into how we address context. One of the ideas we've discussed internally is to have a web-based version of Mycroft. This is a significant amount of development for an already-stretched team, so not something we can deliver in the short term.

Gender and formality

We're exploring programmatic approaches to gender and formality that we could implement in mycroft-core and in home.mycroft.ai account settings. These would help to tailor the translation to both the gender of the speaker and the "personality" that Mycroft is exhibiting. There are dependencies here on Persona, a part of the Mycroft platform still in development, that will allow adjustments to how Mycroft expresses personality - think cheeky, or serious, or bubbly.
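
To make the idea concrete, here is a minimal sketch of how a phrase variant might be chosen from the speaker's gender and a formality level, falling back to a default when no specific variant exists. This is not how mycroft-core works today; the `select_variant` function, the variant table and the setting names are all hypothetical, and the default chosen for an unknown gender is just a placeholder.

```python
from typing import Dict, Optional, Tuple

# Hypothetical variant table: (speaker_gender, formality) -> phrase.
# None acts as a wildcard meaning "any value".
Variants = Dict[Tuple[Optional[str], Optional[str]], str]

THANK_YOU_PT: Variants = {
    ("male", None): "obrigado",
    ("female", None): "obrigada",
    (None, None): "obrigado",  # placeholder default when gender is unknown
}


def select_variant(
    variants: Variants,
    speaker_gender: Optional[str] = None,
    formality: Optional[str] = None,
) -> str:
    """Return the most specific matching variant, falling back to wildcards."""
    candidates = [
        (speaker_gender, formality),  # exact match on both settings
        (speaker_gender, None),       # gender only
        (None, formality),            # formality only
        (None, None),                 # catch-all default
    ]
    for key in candidates:
        if key in variants:
            return variants[key]
    raise KeyError("no default variant defined")


print(select_variant(THANK_YOU_PT, speaker_gender="female"))
```

A translator would then supply one entry per variant rather than a single string, and a language with no gender or formality distinctions would only need the catch-all entry.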

Instructions

We've put together some handy instructions for Mycroft Translate - as always, your feedback on them is welcome at translate-admin@mycroft.ai.

User Experience

The Mycroft Translate platform is based on an open source product called Pootle, which is built with Django, a Python web framework. We're currently tinkering with the visual layout and user interface of the Mycroft Translate platform in a development environment. Eventually, we want to integrate it into home.mycroft.ai so that you can use single sign-on, and have a less fragmented experience across the Mycroft ecosystem.

Other key developments

Automating the import process

We've recently automated the import process so that new Skills are added to Mycroft Translate soon after they are pulled into the mycroft-skills repo. This works for both new Skills, and Skills which are being updated. The delay between a new Skill being added to mycroft-skills, and being added to Mycroft Translate, should be less than an hour.

Automating the export process

If you are a Skill Author, you will likely see PRs being generated automatically with translations in them. We've already tested this with some default Skills, and languages that have more than 50% of their vocab and dialog files translated - and are pretty pleased with the results.
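
The 50% rule mentioned above amounts to a simple filter. The sketch below is an illustration only, not our actual export code: the function name, the shape of the `progress` data and the counts are made up.

```python
from typing import Dict, List, Tuple

EXPORT_THRESHOLD = 0.5  # "more than 50%" translated, as described above


def languages_ready_for_export(progress: Dict[str, Tuple[int, int]]) -> List[str]:
    """Return language codes whose translated fraction exceeds the threshold.

    `progress` maps a language code to (translated_count, total_count).
    """
    ready = []
    for lang, (translated, total) in sorted(progress.items()):
        # Guard against empty Skills, then require strictly more than 50%.
        if total > 0 and translated / total > EXPORT_THRESHOLD:
            ready.append(lang)
    return ready


# Made-up counts for illustration only.
example = {"de-de": (80, 100), "sv-se": (50, 100), "ko-kr": (0, 100)}
print(languages_ready_for_export(example))
```

With these made-up counts only "de-de" qualifies, since exactly 50% does not count as "more than 50%".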

What's next for Languages?

We have plenty planned for languages! Our Languages Roadmap makes this a lot clearer, but our next key activities in this space are:
  • Sourcing additional language recordings to build voices in other languages (the TTS layer), using the still-in-development Mimic Recording Studio. This tranche of work requires that we also build out a 'corpus' of prompts for the Mimic Recording Studio in the target language. The prompts need to be natural-sounding spoken phrases. Know of a good source of phrases? Let us know at translate-admin@mycroft.ai.
  • Continuing to partner with Mozilla around their DeepSpeech Speech to Text (STT) product, and working with them to make this available in other languages, and to improve the accuracy of the available languages.