Editor’s Note: Skill testing is one of the most important things you can do to build high-quality voice experiences. Today we welcome a community expert in testing tools for voice—John Kelvie, founder and CEO of Bespoken—to share some best practices.
Developing for Alexa can be a lot of fun. There are so many opportunities to create innovative user experiences. The cutting edge is constantly evolving. And the reachable audience is immense, and always expanding.
When building skills, it is incredibly important to build high-quality experiences for users. These users will not come back if a skill does not open or fails quietly halfway through. And we may not be aware of any problems until a user writes a one-star review. This is not the ideal way to identify and fix bugs; there must be a better one.
And there is: testing and automation. They help us deliver reliable skills and a consistently great experience for our users. This blog will outline how to do this at a high level and also offer some practical steps to implement it.
We like to think of testing and automation as a layered cake: manual testing at the base, automated unit testing (with code coverage and continuous integration) above it, end-to-end testing (with continuous deployment) above that, and ongoing monitoring as the icing on top. Looks delicious, right? We see each layer as essential to having a high-quality, well-running skill. We’ll go through each part one-by-one below.
The goal of manual testing is to see if the basics of a skill are working. It is going to simultaneously test the code, interaction model, and user experience.
It is not necessarily going to test any of these to incredible depth, but it will at least identify any glaring issues. And, luckily, there are a variety of tools to help with this. For testing using voice, we can use just about any Echo device. If you don't have one handy, you can use virtual solutions like echosim.io or Reverb.ai.
It’s also often convenient to test without having to actually speak. This is useful in a crowded area where it can be difficult to be heard, or in an office-type environment where you do not want to disrupt others. For that, the Bespoken Utter and Speak commands can be great, as well as the Alexa Test Simulator, which provides a nice online interface for interacting with Alexa skills via text, voice, and visual display. There are also the Alexa Skill Management API (SMAPI) and the Alexa Skills Kit Command Line Interface (ASK CLI), with new capabilities to manage and test Alexa skills.
Depending on what we are trying to test, a combination of these tools is useful. The important part is to employ manual testing to prove out the essential aspects of the skill, especially the user experience and interaction model, as there is almost always some need for course correction from an initial voice design.
Automated unit testing is the next essential aspect to testing and automation. The goal of unit testing is to see if the skill code is working correctly. To that end, we write unit tests to verify each intent, and each major piece of functionality.
There are many popular frameworks for unit testing. For Node.js/JavaScript projects, Mocha and Chai are two of the most popular. Mocha is a unit test framework (it actually runs the unit tests), while Chai is an assertion framework (it provides functions for comparing actual and expected results).
A unit test script for a skill is typically written with Mocha and Chai alongside Bespoken’s Virtual Alexa component, a tool that simulates the Alexa Voice Service and creates the JSON requests that are sent to our skill. There are several Alexa unit testing frameworks available; the key things to keep in mind when choosing one are how faithfully it emulates the Alexa Voice Service and how cleanly it fits into standard tooling such as Mocha and code coverage reporters.
Once we’ve written the unit tests, it’s relatively easy to get code coverage set up, and that makes our investment in unit tests all the more valuable.
For the same project, here is an example of a code coverage report on the core skill handler in AWS Lambda:
The lines of code highlighted in red are the ones that are not being tested. The lines in green are being tested, and the numbers next to them show how many times each line ran during the tests (1x, 8x, etc.). Running a line of code does not guarantee it works perfectly, but a line of code that is never tested is certain to be unverified, and is more likely to cause issues. Code coverage is the starting point, not the ending point, but it’s a great map for finding problem areas in the code.
Additional note: This example uses Istanbul/NYC, but there are many other code coverage tools out there. And there are many hosted tools available as well, which will allow us to see visual reports on what is happening with our code, now and over time.
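For instance, with Istanbul/NYC the whole setup can live in a couple of npm scripts; the package versions and the 90% line threshold below are just illustrative:

```json
{
  "scripts": {
    "test": "mocha",
    "coverage": "nyc --reporter=text --reporter=html mocha",
    "coverage:check": "nyc check-coverage --lines 90"
  },
  "devDependencies": {
    "chai": "^4.2.0",
    "mocha": "^5.2.0",
    "nyc": "^13.1.0"
  }
}
```

Running `npm run coverage` then produces both a terminal summary and the HTML report shown above.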
The third pillar of automated unit testing is continuous integration. Continuous integration (CI) works by tying into the source code repository and automatically running the unit tests whenever updates are made to the code. There are many popular CI tools, such as Travis CI, CircleCI, and Jenkins, so you can take your pick. They are typically easy to set up, and will “intuit” things about the project (such as which programming language it uses and how to run its tests). This auto-configuration makes them even easier to work with.
The continuous integration service is responsible for bringing our source code, unit tests and code coverage together, including alerting the developer when there are issues. These issues can manifest with either tests failing or new code that is insufficiently tested. And once configured, there are a ton of helpful tools out there to perform ongoing checks to ensure code quality.
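As an illustration, a minimal Travis CI configuration for a Node.js skill is only a few lines; the Node version and the optional coverage step are assumptions to adjust for your own project:

```yaml
# .travis.yml -- run the unit tests (and coverage) on every push
language: node_js
node_js:
  - "8"
script:
  - npm test
after_success:
  # optional: report coverage to a hosted service
  - npm run coverage
```

With this in place, every push to the repository triggers the tests, and the CI service flags the commit if they fail.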
End-to-end testing is another important part of the equation. Counterintuitively, it is almost the opposite of unit testing: it emphasizes testing the system as a whole rather than testing pieces in isolation.
We can perform end-to-end testing using scripts written with components such as Bespoken’s Virtual Device SDK. The virtual devices allow for interacting with Alexa programmatically, so that you can perform full-cycle Alexa testing without speaking. The Skill Management API (SMAPI) and the skill testing API in particular can also be very useful for automating end-to-end tests.
Once constructed, the end-to-end tests exercise everything at once: the interaction model, the endpoint configuration, and the skill code itself. Rather than the unit test’s narrow focus on the correctness of our code, we are looking at the behavior of the skill as a whole. The end-to-end tests should ideally detect problems everywhere from an S3 misconfiguration to a non-performant Lambda.
End-to-end tests should be wrapped into a continuous deployment (CD) process. Continuous deployment is the practice of automating deployments to execution environments, and it is important to code quality because every deployment then happens the same repeatable way, and each one can be gated on our tests passing.
It is in combining end-to-end testing with CD that automation becomes particularly powerful. With it, we can move code between environments, with thorough checks at every stage, and have confidence that our system is working. And if for any reason it is not, we are going to know right away.
The services used for continuous deployment are frequently the same as for continuous integration, but the way they are used is usually quite different. A typical flow checks out the latest code, runs the unit tests, deploys the skill to a development environment with the ASK CLI, runs the end-to-end tests against that environment, and only then promotes the build to production.
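Sketched as a CI-service configuration, a pipeline like this might look as follows; the stage names, npm scripts, and ASK CLI profiles are illustrative placeholders to adapt to your project:

```yaml
# Illustrative deployment pipeline -- adapt stage names and profiles
jobs:
  include:
    - stage: unit-test
      script: npm test
    - stage: deploy-dev
      script: ask deploy --profile dev   # push skill + Lambda to a dev stage
    - stage: end-to-end
      script: npm run test:e2e           # run e2e scripts against dev
    - stage: deploy-prod
      if: branch = master
      script: ask deploy --profile prod
```

Each stage runs only if the previous one succeeds, so broken code never reaches production.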
We can further enhance the process by automating rollbacks when there are issues, automatically notifying the QA team to perform manual QC processes, and so on. But these pieces are the essential parts, and thanks to the ASK CLI, it is relatively easy to implement all of this automation today. Take a look at the Bespoken Guess The Price skill to learn how it can be done.
The last topic we are going to cover is testing on an ongoing basis. We’ve gone through all this effort to set up a fully automated and testable Alexa skill—why not continue to leverage it once the skill is published? After all, we do not just want to make sure the skill is working while we are developing it; we also want to make sure that, once our skill is in the wild, it is still working great.
There are a variety of monitoring tools and approaches available. Some of the CI/CD pieces mentioned in the previous section can work for this, and if effective unit tests and end-to-end scripts are already in place, we can leverage them for ongoing testing. Take a look here to see how we do it at Bespoken. Tools like Loggly, DataDog, and New Relic can also be extremely helpful for keeping an eye on skills, as can Amazon CloudWatch, which allows for setting alarms and notifications around a host of conditions.
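As one example, a CloudWatch alarm on the skill’s Lambda errors can be declared in a few lines of CloudFormation; the function name and SNS topic below are placeholders for your own resources:

```yaml
Resources:
  SkillErrorAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmDescription: Alert when the skill's Lambda reports any errors
      Namespace: AWS/Lambda
      MetricName: Errors
      Dimensions:
        - Name: FunctionName
          Value: my-skill-function   # placeholder -- your Lambda's name
      Statistic: Sum
      Period: 300
      EvaluationPeriods: 1
      Threshold: 1
      ComparisonOperator: GreaterThanOrEqualToThreshold
      AlarmActions:
        - !Ref AlertsTopic           # placeholder SNS topic for notifications
```

With this in place, a single failed invocation in production triggers a notification, rather than waiting for a one-star review.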
We’ve covered a lot of ground in this blog. Hopefully, you’ve come away understanding how to test a skill manually, write unit tests with code coverage and continuous integration, run end-to-end tests as part of continuous deployment, and keep monitoring a skill once it is live.
While this post should provide an overview, we hope this only whets your appetite for a deeper dive on testing and automation. Reach out via email with questions or to learn more.
Developers with a published Alexa skill can apply to receive a $100 AWS promotional credit and can also receive an additional $100 per month in AWS promotional credits if they incur AWS usage charges for their skill, making it free for developers to build and host most Alexa skills. Learn more and apply.