Mycroft’s Take on the Voight-Kampff Test

Originally published at:

No, we don’t foresee Mycroft software running in replicants any time soon. But it is a fun name for a new test suite we have been working hard at over the past couple of months.

User experience is important to us. It suffers when a change is made to one part of the software that inadvertently causes breakage elsewhere in the stack. These issues can be hard to find using manual interaction with a device.

Our goal with this test suite is to build an automated way to ensure that a user’s interactions with the voice assistant software always meet expectations.

Voight Kampff Integration Test Framework

We chose Behave, a Python Behavior-Driven Development (BDD) framework, for this project. The primary benefit of this framework over a unit testing framework like pytest is that it focuses on behaviours and expectations using a very English-like representation of the test conditions. This provides clear tests and acceptance criteria that anyone in our team and the broader Community can understand and contribute to.

Voight Kampff tests are split into Features. Each Feature may have one or more Scenarios, and each Scenario will have multiple Steps.

Take a quick example: a Weather Skill. A Feature of this Skill might be the ability to report the current weather conditions. A Scenario within that Feature is that a user asks for the current weather. The Steps to test this Scenario might then be:

  • Given an English speaking user
  • When the user says “tell me the weather”
  • Then “my-weather-skill” should reply with “Right now, it’s overcast clouds and 32 degrees.”
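In Behave, a Scenario like this lives in a plain-text feature file using the Given/When/Then keywords shown above. A hypothetical sketch of how that might look (the file layout and Skill name here are illustrative, not necessarily Mycroft's exact VK conventions):

```gherkin
Feature: current-weather
  Report the current weather conditions.

  Scenario: User asks for the current weather
    Given an English speaking user
    When the user says "tell me the weather"
    Then "my-weather-skill" should reply with "Right now, it's overcast clouds and 32 degrees."
```

Each of these step sentences is matched against a step definition (a small Python function) that performs the action or assertion, which is what keeps the feature file itself readable by non-programmers.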

As you can see, this is quite simple to understand, and it enables anyone to participate in running and expanding our test coverage.

How do I get started?

The first iteration of Voight Kampff will be included in our next release of Mycroft-core (v20.2.2). If you’re on the dev branch, you will already have access to it.

This is all part of our testing roadmap, which we’ll outline in a future post. Currently, Voight Kampff injects messages representing the output of the speech-to-text stage into the system, then inspects the messages that would be fed into the text-to-speech stage. Future iterations will expand the scope to include the speech-to-text and text-to-speech engines themselves, with the aim of creating complete end-to-end tests.
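To make the inject-and-inspect idea concrete, here is a minimal, self-contained sketch of the pattern. The toy message bus and stand-in Skill below are hypothetical simplifications for illustration, not the actual VK implementation; only the message type names ("recognizer_loop:utterance", "speak") follow Mycroft's conventions.

```python
# Toy illustration of the inject-and-inspect testing pattern:
# inject a message as if STT produced it, then capture the message
# that would be handed to TTS. This bus is a hypothetical stand-in
# for Mycroft's real messagebus.
from collections import defaultdict


class ToyMessageBus:
    """A tiny in-process pub/sub bus."""

    def __init__(self):
        self.handlers = defaultdict(list)

    def on(self, msg_type, handler):
        self.handlers[msg_type].append(handler)

    def emit(self, msg_type, data):
        for handler in self.handlers[msg_type]:
            handler(data)


bus = ToyMessageBus()


# A stand-in weather Skill: listens for utterances, replies via "speak".
def weather_skill(data):
    if "weather" in data["utterances"][0]:
        bus.emit("speak", {"utterance": "Right now, it's overcast clouds and 32 degrees."})


bus.on("recognizer_loop:utterance", weather_skill)

# Test harness: capture what would be fed into text-to-speech...
captured = []
bus.on("speak", lambda data: captured.append(data["utterance"]))

# ...then inject an utterance as if speech-to-text had just produced it.
bus.emit("recognizer_loop:utterance", {"utterances": ["tell me the weather"]})

print(captured[0])  # Right now, it's overcast clouds and 32 degrees.
```

The test can then simply assert on `captured`, with no microphone or speaker involved, which is what makes this style of test fast and deterministic.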

If you want to learn more about Voight Kampff, including how to use these tests for your own Skills, check out our expanding documentation.



Looking nice! I have a few questions though: how is this different from the testing now, and will this replace the current testing?

Longer term, the current system will likely be assimilated by Voight Kampff. It does some things that VK can’t yet handle, like mocking, and we’ll re-use those existing components where it makes sense to do so.

The biggest difference I think is the shift in framing and simplifying the test structure. One aim of this is to make it so that everyone can contribute to improving our test coverage. It helps us build a common language between all members of the community regardless of their technical ability. We also want to push ourselves and others to prioritize adding a test when we discover something that isn’t working as expected.

To that end, it’s much easier to write tests in VK. We’re already running 10x the number of tests on the core Skills than we were with the previous system. It’s not that the old format can’t do this; it’s just onerous and doesn’t get done. It’s also harder to see what is and isn’t being tested unless you really know the Skill. VK makes the tests much easier to write and to read, so you can quickly see what’s happening. Making it easier, we hope, will aid adoption.

Another important area we’ve been focusing on is how this fits into our CI/CD pipeline. The old system tests a new Skill against the default Mycroft Skills. VK is more flexible, letting us test against any range of Skills from the Marketplace.

Finally, this sets us up to expand VK to cover greater degrees of integration. Currently it injects utterances and catches responses on the messagebus, but with the structure of Step files it’s much easier to modify these to incorporate the STT and TTS components, i.e. feeding in an audio utterance and observing the audio response.
