Editor’s Note: Skill testing is one of the most important things you can do to build high-quality voice experiences. Today we welcome a community expert in testing tools for voice—John Kelvie, co-founder and CEO of Bespoken—to share some best practices.
Testing Alexa skills can be a bit daunting if you are just getting started. Many people get stuck repeating the same commands and interactions with their Echo devices over and over. Stefania Sharp (Senior Solutions Architect, Amazon Alexa) and I recently presented at Alexa Live to show voice developers that there is a better, easier way, one that keeps you focused on the fun stuff while still building high-quality experiences that delight customers. In this post, I recap what we covered during the session. You can also watch the full 45-minute session below.
All code is guilty until proven innocent. Put another way, code that has not been tested should be assumed not to work. Our years of experience have not made us perfect coders, but they have taught us the value of thorough testing, and Murphy's Law is always there to remind us of it. That is why we strive to make testing easy and fast, so you can concentrate on the fun stuff, and why our approach spans every stage of the software development lifecycle.
Overall, the benefits of automated testing are:
Unit test scripts are a great way to find and quickly fix issues during development. Because they run locally on your laptop, they also help you build voice apps faster.
To start, install Bespoken Tools by running npm install -g bespoken-tools at your command prompt. The tests are easy to write. You can see the full example we created during our session here, or jump to 11:21 in the video. Here is a quick snippet:
```yaml
---
- test: Run through a quiz with one player
- LaunchRequest: Welcome to the Classic Movie Musts Quiz!
- QuizIntent: OK. The category is people. here is your question. Who was it that said "Too much of a good thing is wonderful"?
- AnswerIntent PEOPLE="Mae West": Your question is Houdini was played by who in the 1953 movie about the famous magician?
- AnswerIntent PEOPLE="David Copperfield": The correct answer is Tony Curtis. Your question is Who did Steve Martin marry in 1986 and divorce in 1994?
- AnswerIntent PEOPLE="Victoria Tennant": "*"
```
The left-hand side of each step (before the colon) is the request sent to the skill. This is either a LaunchRequest or an intent name with any slot values, such as PEOPLE="Mae West". The right-hand side is the expected response, and the "*" on the last line is a wildcard that accepts any response. It’s as simple as that!
Unit tests work even better when combined with a debugger. Stepping through a test is one of the most effective ways to find and fix issues in your Alexa skill. Check out this guide, which walks you through debugging a unit test using the example we built for our Alexa Live session.
End-to-end (E2E) testing verifies the entire system as a whole, from the front end to external services.
The core components of Bespoken’s E2E testing and monitoring are Virtual Devices. A Virtual Device works like a physical device, such as an Amazon Echo, with the important difference that it exists only as software. Virtual Devices let you interact with Alexa just by typing.
The text interactions written by the developer are converted into audio using a text-to-speech (TTS) service such as Amazon Polly, and that audio is sent to the Alexa Voice Service. Alexa's spoken response is then converted back into text and compared with the expected response the developer defined in the test. If they match, the test passes; otherwise, it fails.
Here is an example interaction:
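Written in Bespoken's YAML test format, the same list-of-steps format as the unit test snippet above, the interaction could look like the sketch below. "my skill" is a placeholder for your skill's invocation name, and the test title is illustrative.

```yaml
---
- test: Start a game by voice
# Left: what is spoken to Alexa ("my skill" is a placeholder invocation name).
# Right: the reply we expect back.
- ask my skill to play a game: here is your first question
```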
“Ask my skill to play a game” is what we say to Alexa, and “here is your first question” is what we expect to get back in reply. E2E tests look very similar to unit tests, but they use real utterances instead of intent names, and they can do much more, covering even the most complex scenarios. To get started writing end-to-end tests, just go here, or jump to 26:41 in the video to see a live example.
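To make the contrast with unit tests concrete, here is a sketch of how the quiz unit test might be rewritten as an end-to-end test. The invocation name "classic movie musts" and every spoken phrasing on the left are hypothetical. The expected replies are shortened to phrases from the unit test, on the assumptions that a partial match is enough, that differences in case and punctuation from the speech-to-text step are tolerated, and that the skill asks its questions in the same order as in the unit test.

```yaml
---
- test: Run through a quiz with one player (end to end)
# Hypothetical utterances: real speech replaces intent names and slot syntax.
- open classic movie musts: welcome to the classic movie musts quiz
- start the quiz: here is your question
- mae west: your question is
- david copperfield: the correct answer is tony curtis
# Wildcard: accept whatever Alexa says at the end of the quiz.
- victoria tennant: "*"
```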
Continuous testing ensures that a skill keeps working once it is live. Bespoken’s Continuous Testing (also known as Monitoring) runs your end-to-end tests at a regular interval to check that your production skills are working correctly. If you want to try it out, learn how to set it up here.
The monitor runs your tests twice an hour while everything is working, and if there is an issue, you are notified by email right away. It’s a great way to stay confident that your skill is working well, all the time.
Usability Performance Testing (UPT) makes sure that the AI components of your skill are working correctly. The main goal is to identify issues with the speech recognition and natural language understanding (NLU) behavior of Alexa and your skill, spot problems with how your customers are being understood, and fix them.
UPT works by sending a large set of utterances to your skill. These utterances can be recorded or generated with text-to-speech (for example, with Amazon Polly), and they thoroughly exercise your interaction model. We then provide detailed findings on where the problems are and how to correct them.
This testing covers scenarios such as how well accents and background noise are handled and how idiomatic phrasings are understood. It also covers more mundane but very common issues like typos and sound-alike phrases. It’s a turnkey solution, and you can get started with our UPT tool by filling out this form.
We know testing is essential for building great voice experiences, and we hope our Alexa Live session showed you that it is not as daunting as it may seem, whatever your pain point or the type of testing you need. Feel free to reach out to us at jpk@bespoken.io or @sharpstef on Twitter.