Since its public launch in June, the Alexa Skills Kit (ASK) has helped developers around the globe quickly add new voice experiences to Alexa-enabled devices like the Amazon Echo and Amazon Fire TV. I’ve had the pleasure of meeting many of you and have seen first-hand a skill go from an idea to running on a device in only a few hours. Most of us start out with simple “telling” style skills that give customers a quick reference point, like “Alexa, tell me a random fact.” Then we quickly want to create more complex skills that are more engaging for end users. This post combines best practices from my own experience, our Alexa voice design experts, and countless others (including many Echo customers) who have given feedback and input on the types of interactions that work best.
While you should plan to monitor and tune your own voice interactions based on the ways that users interact with your skill, the tips in this post will hopefully improve the usability of your skill from the start. We hope they will give you a more engaged set of customers and help grow your own voice experiences moving forward.
Consistent user feedback has shown Alexa is a delight to use because:
As you look to create your own skills, you should ensure all three of these core user experiences are met.
The hierarchy of utility for Alexa differs greatly from a visual user interface. In fact, the two are often in opposition. Browsing is great for a visual interface, but challenging for a voice user interface. When choosing the use case for your skill, try to aim for the upper end of the voice utility spectrum.
As you look to expand the capabilities of your skill, you should try to move in the direction of higher utility. For example, while you may start out providing a quick lookup of information, giving the user the ability to then search on topics of their own interest (or even automatically learning the types of lookups a user does most) will provide a much better experience and lead to better interactions with your user.
Voice user interfaces work well when they are focused and give quick responses. We suggest you start with a primary use case that both communicates your business case and is a clear winner for a voice user interface. Do one thing well, then add capabilities that allow the skill to get smarter over time. This follows the current model we have with Alexa: she is learning new things over time, including your skills!
For example: Let’s say you want to build a skill for traffic. First, identify your primary use case. Although, in theory, you could give traffic information for the distance between any two locations, it would be highly complex to create a skill that could successfully recognize any location. So let’s identify the primary use case for traffic: from your home to your office. If we first gather the user’s home and office locations, then the next time they ask for traffic, we can give them the information they seek quickly and easily.
Here is an example of how a user would interact with this skill:
First Time Launched:
User: Alexa, launch Travel Buddy
Alexa: Hi, I’m Travel Buddy. I can easily tell you about your daily commute. Let’s get you set up. Where are you starting from?
User: Philadelphia
Alexa: Ok, and where are you going?
User: Boston
Alexa: Great, now whenever you ask, I can tell you about the commute from Philadelphia to Boston. The current drive time is five hours and twenty-three minutes. There is an accident on I-95 near Hartford.
Future Launches:
User: Alexa, launch Travel Buddy
Alexa: Your commute is currently five hours and two minutes.
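The flow above can be sketched as a launch handler that branches on whether one-time setup is complete. This is a minimal sketch in Python using plain return strings; the function names, the `user_profile` dict shape, and the injected `get_drive_time` callable are all hypothetical, not the actual Travel Buddy implementation:

```python
def handle_launch(user_profile, get_drive_time):
    """Return the spoken response for a skill launch.

    user_profile:   dict with optional 'origin' and 'destination' keys,
                    persisted between sessions (e.g. in a database).
    get_drive_time: callable(origin, destination) returning a drive-time
                    string for the stored route.
    """
    origin = user_profile.get("origin")
    destination = user_profile.get("destination")

    if origin is None:
        # First launch: begin the one-time setup dialog.
        return ("Hi, I'm Travel Buddy. I can easily tell you about your "
                "daily commute. Let's get you set up. "
                "Where are you starting from?")
    if destination is None:
        # Setup is half done: ask only for the missing piece.
        return "Ok, and where are you going?"

    # Future launches: answer immediately with the stored route.
    return f"Your commute is currently {get_drive_time(origin, destination)}."
```

Because the origin and destination are gathered once and stored, every later launch skips straight to the answer, which is what makes the repeat experience fast.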
Using your Alexa skill should not require users to think about what to say or to remember exactly how to say it. They should be able to converse with Alexa just as they would with another human. All they need is a rough idea of what Alexa can do (e.g. playing music, setting a timer, etc.), and they just ask her to do it. This is the real value of a voice interface, but this value can quickly erode in a skill that forces users to interact in unnatural ways. Here are a few examples of what a poor interaction might look like:
Poor: Alexa, ask [davefacts] for a fact when the fact is of type davefact.
Better: Alexa, ask [dave] for a [fact].
Poor: Alexa, ask [transportation service alerts] for the [current status] of [the monorail A].
Better: Alexa, ask [trafficbuddy] about [monorail A].
Poor: Alexa, ask [developerinfo] for a [developer info].
Better: Alexa, ask [developerinfo].
You should try to remove artificial skill syntax and make interactions within your skill as natural as possible. Allowing your users to make simple requests without having to think about the format those requests should be in will create a much better experience.
In the core Alexa experience, most requests are understood and acted on. Your skill should provide the same experience: the end user should not see numerous attempts to invoke your skill fail.
Currently, the biggest contributor to requests for your skill not being consistently understood is a lack of sample utterances in your interaction model. For example, when a user tries to invoke a skill in a one-shot utterance, the skill will often not recognize what they are saying and either error out or give a response in error. When this happens, it’s unclear whether the problem was with your skill or with the core functionality of Alexa, which makes for a poor experience. When skills do not work as consistently and reliably as the core Alexa experience, users will become frustrated.
In the past, the major reason for a lack of sample utterances was the sheer amount of manual work required to generate them. A common skill may have as many as 100 utterances or more for any given intent. With the introduction of Custom Slot Types and new Built-In Intents, the need to generate numerous sample utterances has been greatly reduced, and in some cases eliminated entirely! For more information about designing good intents and sample utterances, check out the Alexa Skills Kit Voice Design Handbook.
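To make this concrete, here is a sketch of what a sample utterances file with a custom slot type might look like for the hypothetical commute skill from earlier. The intent names (`GetCommuteIntent`, `GetStatusIntent`), the `TransitLine` slot, and its values are all illustrative, not from a real skill:

```
GetCommuteIntent what's my commute
GetCommuteIntent how is my commute looking
GetCommuteIntent how long will it take to get to work
GetCommuteIntent tell me about traffic

GetStatusIntent about {TransitLine}
GetStatusIntent what's the status of {TransitLine}
GetStatusIntent how is {TransitLine} running
```

With a custom slot type (say, `TRANSIT_LINE` with values like `monorail A`, `monorail B`, and `red line`) bound to the `{TransitLine}` slot, a handful of utterance patterns covers every line the skill supports, instead of one hand-written utterance per line per phrasing.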
An Alexa skill should provide adequate error handling for unexpected or unsupported utterances. A user should never be exposed directly to a skill’s error handling. Instead, Alexa should respond with a request for more information from the user, or simply say that she is unable to do the current task. When an error does occur, it should be clear to the user what went wrong and where it occurred. The combination of wake word + invocation phrase + intent allows for many combinations of errors, and it’s difficult for the user to know which part failed. Things can get especially confusing if a skill terminates the interaction without any indication of what went wrong (Did Alexa not hear you? Is the skill broken? Did you say the wrong invocation phrase?). The natural thing for the user to do is try again, but it becomes tedious to attempt to remember how to re-invoke the skill, only to potentially fall into the same trap, since you don’t know what went wrong the last time.
Since Alexa will not be doing any client-side checking of the slot values sent with your intents, you should check for missing values and value types server-side within your service. If you find any missing information, you should respond to the Alexa service with a reprompt whose OutputSpeech object contains a clear message about what was not understood and what information the user needs to provide.
For more info on best practices for designing an effective voice experience, check out the Voice Design Best Practices document. If you’re interested in learning more about designing voice interactions, check out the following:
-Dave (@TheDaveDev)