Mark II Update - October 2020

gez-mycroft · October 30, 2020, 3:03am

Originally published at: https://mycroft.ai/blog/mark-ii-update-october-2020/

TL;DR: A new revision of the SJ201 and SJ202 boards has been produced, and the first of these are on their way to our dev team.

What’s changed

The big change in rev3 is that XMOS updated their firmware to enable I2S output. This allows us to use the XMOS chip as a single sound card that handles both recording and audio output over I2S.

As the current amplifier takes an analog input, we added an I2S to Line Out converter chip to bridge the gap until we switch to an amplifier with a direct I2S input. That switch, to the TAS5806 IC, is planned for rev4. It will reduce parts, complexity, and cost, with the added benefit of better sound quality.

Front of SJ201 v0.67c PCB

Back of SJ201 v0.67c PCB

Added in rev3:

XMOS I2S output to UDA1334A (I2S to Line out IC)
ATtiny404
- Controls LEDs
- Tested control via I2C from the XMOS over USB: works
- Wrote firmware for the ATtiny
- In rev4, the ATtiny will let us remove the power-on reset sequencing bits needed for the XMOS
Changed the position of the mounting holes
4-pin connector to the SJ202 instead of a USB connector. This allows a much simpler and sturdier SJ202

Removed in rev3:

USB Sound card CM108B
- Removing it reduced complexity, and it didn’t work as expected
2 physical indentations that are no longer needed

SJ201r3 Overview

Attempts to speed up production

In an attempt to speed up the turnaround time for rev3, we selected a local PCB house to produce this run of boards. They were more expensive but were, in theory, going to be quicker than having boards shipped from overseas. Unfortunately, that did not work out as planned.

Thankfully, we also ordered a small batch of single-sided boards from our previous supplier in China as a backup. These ended up arriving first. However, because they are single-sided, many parts had to be added manually.

We’ll keep looking for ways to speed up the process however we can.

Enclosure code

Now that the hardware has firmed up, our software team has been able to get into the enclosure code. This connects Mycroft with all of the low-level hardware systems so that you can do all the things you’d expect from a smart speaker, like changing the volume, muting the microphone, or getting visual feedback from the LEDs on top of the device.

This code will continue to be iterated on over time, but its core foundations have been written.
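To give a rough idea of what an enclosure layer like this does, here is a minimal sketch in Python. It is not the actual Mycroft enclosure code: the class names, method names, and LED count are all hypothetical, and the fake implementation only stores state in memory rather than talking to real hardware.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple

Color = Tuple[int, int, int]  # (red, green, blue), each 0-255


class Enclosure(ABC):
    """Hypothetical interface between the assistant and the device hardware."""

    @abstractmethod
    def set_volume(self, level: float) -> None:
        """Set speaker volume, where 0.0 is silent and 1.0 is maximum."""

    @abstractmethod
    def set_mic_muted(self, muted: bool) -> None:
        """Mute or unmute the microphone."""

    @abstractmethod
    def fill_leds(self, color: Color) -> None:
        """Set every LED to the same color."""


class FakeEnclosure(Enclosure):
    """In-memory stand-in, useful for testing without a board attached."""

    def __init__(self, led_count: int = 12) -> None:  # LED count is an assumption
        self.volume = 0.5
        self.mic_muted = False
        self.leds: List[Color] = [(0, 0, 0)] * led_count

    def set_volume(self, level: float) -> None:
        # Clamp so callers cannot push the volume out of range.
        self.volume = max(0.0, min(1.0, level))

    def set_mic_muted(self, muted: bool) -> None:
        self.mic_muted = bool(muted)

    def fill_leds(self, color: Color) -> None:
        self.leds = [color] * len(self.leds)
```

A real implementation would replace the bodies of these methods with calls to the board's audio and LED hardware. Keeping the interface separate means the rest of the software can be tested against the fake.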

Mechanical Design

We have completed the first round of 3D printed housings for the enclosure. These need to be tested for acoustic and thermal properties before we can share them with the community and distribute them to our internal team. The intention is to keep the 3D printed design as close to the final injection molded design as possible. This will allow anyone who buys a board-only dev kit in the future to print their own enclosure. We aren’t sharing images yet because we are exploring options to protect the Mycroft brand from potential misuse in the future. However, it will still be open source hardware, and we will share CAD data.

Until we can further test the new 3D printed housings, we are continuing to use our simple laser cut enclosures (SJ230) to approximate the experience of the final assembled product.

Precise Tagger

Another piece of work our team has been focused on is bringing our Precise data tagger back online. Whilst this is not Mark II specific, it is a critical piece of our technology that is necessary for the Mark II to be usable by the majority of people.

Our current wake word detection is frankly terrible for anyone who isn’t an English-speaking male, most likely from the US. For women and children in particular, the current model is not accurate enough. This primarily comes down to data. We have an enormous amount of wake word data from men in the US, which means that the wake word detection works great for that group of people.

To address this, we have been looking at ways to better balance our training data. We’ve had some promising early results with our existing data, but the biggest limitation is that it has only been categorized as either a wake word, or not a wake word. We don’t know whether the speaker sounds young or old, masculine or feminine, what type of accent they have, or whether there is other background noise. Balancing these characteristics should significantly improve our wake word detection for the broader population.
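One common way to balance data like this is to weight each sample inversely to the size of its group, so that every group contributes equally during training. The sketch below illustrates that general idea only. It is not how Precise is trained, and the function name and group labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def balanced_weights(groups: Sequence[Hashable]) -> List[float]:
    """Return one weight per sample so that each group has equal total weight.

    `groups` holds a label per sample, for example a tuple such as
    ("feminine", "adult"). The returned weights sum to 1.0.
    """
    counts = Counter(groups)
    n_groups = len(counts)
    # Each group gets a total share of 1 / n_groups, split evenly
    # among its members.
    return [1.0 / (n_groups * counts[g]) for g in groups]
```

For example, with three samples tagged "masculine" and one tagged "feminine", the single "feminine" sample receives a weight of 0.5 and each "masculine" sample receives one sixth. These weights could then be passed to something like `random.choices(samples, weights=weights, k=n)` to draw a more balanced training set.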

Which brings me back to the Precise Tagger: this is a web interface that allows anyone in our open source Community to help make wake word detection more accurate. It presents a short audio sample and asks you to answer a few simple questions based on how the speaker sounds and other noise that may be present.

The first prototype of the new Precise Tagger is being code reviewed, and we hope to release it to the Community for testing shortly.