Design an Engaging Voice User Interface

Welcome to Module 2 of the beginner workshop on building an engaging Alexa skill. In this module, you'll learn how to design a voice user interface (VUI) for your skill.

Time required: 15 to 30 minutes

What you'll learn:

How users interact with Alexa
Voice design concepts: utterances, intents, slots, interaction model, and situational design
Characteristics of a well-designed VUI
Key challenges of voice design

How users interact with Alexa

To create a VUI for your skill, you need to understand key voice design concepts.


Voice design concepts: utterances, intents, and slots

A voice interaction model uses the following design concepts.

Utterances and Intents

Wake word: The wake word tells Alexa to start listening to your commands.

Launch word: A launch word is a transitional action word that signals to Alexa that a skill invocation will likely follow. Examples of launch words include "tell", "ask", "open", "launch", and "use".

Invocation name: To start interacting with a skill, a user says the skill's invocation name. For example, to use the Daily Horoscope skill, the user could say, "Alexa, open my daily horoscope."

Utterance: An utterance is a user's spoken request. These spoken requests can invoke a skill, provide inputs for a skill, confirm an action for Alexa, and so on. Consider the many ways your users could form their requests.

Prompt: A string of text that you have Alexa speak to the user to ask for information. You include the prompt text in your response to a user's request.

Intent: An intent represents an action that fulfills a user's spoken request. Intents can optionally have arguments called slots.

Slot value: Slots are the variable parts of an intent, and slot values are the input values a user provides in a spoken request, such as a date or a city name. Your skill uses these values to fulfill the intent.

In the following example, the user opens a travel skill and gives input information: the travel date of Friday. This value is a slot value for a slot in a defined intent, which Alexa passes on to the Lambda function for skill code processing.
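To make the hand-off concrete, here is a minimal sketch of the Lambda side, written in plain Python without the ASK SDK. It assumes the raw JSON request and response shapes Alexa uses (`request.type`, `request.intent.slots`, `response.outputSpeech`); the intent name `PlanMyTripIntent`, the slot name `toCity`, and the helper functions are hypothetical.

```python
# Plain-Python sketch of skill code, assuming Alexa's raw request/response JSON.
# PlanMyTripIntent, toCity, and the helpers below are hypothetical names.


def speak(text, end_session=False):
    """Wrap text in an Alexa response envelope with plain-text output speech."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def handle_plan_trip(intent):
    """Read the toCity slot value, if the user said one, and respond."""
    slot = intent.get("slots", {}).get("toCity", {})
    city = slot.get("value")  # absent when the user did not name a city
    if not city:
        return speak("Where would you like to go?")
    return speak(f"Okay, planning a trip to {city}.", end_session=True)


INTENT_HANDLERS = {"PlanMyTripIntent": handle_plan_trip}


def lambda_handler(event, context):
    """Route each request by its type and, for intents, by intent name."""
    request = event["request"]
    if request["type"] == "LaunchRequest":
        return speak("Welcome to the travel planner. Where would you like to go?")
    if request["type"] == "IntentRequest":
        handler = INTENT_HANDLERS.get(request["intent"]["name"])
        if handler is not None:
            return handler(request["intent"])
    if request["type"] == "SessionEndedRequest":
        return {"version": "1.0", "response": {}}  # no speech after the session ends
    return speak("Sorry, I can't help with that yet.")
```

Calling `lambda_handler` with an `IntentRequest` whose `toCity` slot value is "Portland" returns a response that speaks "Okay, planning a trip to Portland." and ends the session.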

Each slot has a slot type. The travel date slot uses Amazon's built-in AMAZON.DATE type to convert words that indicate dates (such as "today" and "next Friday") into a date format, while both the from city and to city slots use the built-in AMAZON.US_CITY slot type.

If you extended this skill to ask the user what activities they plan to do on the trip, you might add a custom LIST_OF_ACTIVITIES slot type to reference a list of activities, such as hiking, shopping, skiing, and so on.
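The slots for this travel example might be declared as follows. This is a sketch that assumes the slot and custom-type layout used in the interaction model JSON; the slot names and the `undefined_types` helper are hypothetical. The helper flags any slot whose type is neither built in (prefixed with `AMAZON.`) nor defined as a custom type.

```python
# Hypothetical slots for the travel intent. AMAZON.DATE and AMAZON.US_CITY are
# built-in slot types; LIST_OF_ACTIVITIES is a custom type you define yourself.
SLOTS = [
    {"name": "travelDate", "type": "AMAZON.DATE"},
    {"name": "fromCity", "type": "AMAZON.US_CITY"},
    {"name": "toCity", "type": "AMAZON.US_CITY"},
    {"name": "activity", "type": "LIST_OF_ACTIVITIES"},
]

# A custom slot type lists the values users are likely to say.
CUSTOM_TYPES = [
    {
        "name": "LIST_OF_ACTIVITIES",
        "values": [{"name": {"value": v}} for v in ("hiking", "shopping", "skiing")],
    }
]


def undefined_types(slots, custom_types):
    """Return slot types that are neither built in (AMAZON.*) nor defined as custom types."""
    defined = {t["name"] for t in custom_types}
    return sorted(
        {
            s["type"]
            for s in slots
            if not s["type"].startswith("AMAZON.") and s["type"] not in defined
        }
    )


print(undefined_types(SLOTS, CUSTOM_TYPES))  # []
print(undefined_types(SLOTS, []))            # ['LIST_OF_ACTIVITIES']
```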

Practice: How to identify slots for an intent

Review the utterances in the following table, and note the words or phrases that represent variable information. These will become the intent's slots.

Utterance	Maps to
"I am going on a trip Friday."	`TRAVEL_DATE`
"I want to visit Portland."	`TO_CITY`
"I want to travel from Seattle to Portland next Friday."	`FROM_CITY`, `TO_CITY`, and `TRAVEL_DATE`
"I'm driving to Portland to go hiking."	`MODE_OF_TRAVEL`, `TO_CITY`, and `ACTIVITIES`

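One way to check your answers to the practice table is to rewrite each utterance as a sample-utterance template, with `{SLOT}` marking each piece of variable information, and then collect the slot names. This is a hypothetical sketch: the templates are illustrative rewrites of the table rows, and `slots_in` and `intent_slots` are not part of any Alexa tool.

```python
import re

# Practice-table utterances rewritten as templates; {SLOT} marks variable information.
SAMPLES = [
    "I am going on a trip {TRAVEL_DATE}",
    "I want to visit {TO_CITY}",
    "I want to travel from {FROM_CITY} to {TO_CITY} {TRAVEL_DATE}",
    "I'm {MODE_OF_TRAVEL} to {TO_CITY} to go {ACTIVITIES}",
]

SLOT_PATTERN = re.compile(r"\{(\w+)\}")


def slots_in(sample):
    """Return the slot names referenced by one sample utterance, in order."""
    return SLOT_PATTERN.findall(sample)


def intent_slots(samples):
    """Collect every slot used across the samples: the slots the intent must declare."""
    found = []
    for sample in samples:
        for name in slots_in(sample):
            if name not in found:
                found.append(name)
    return found


print(intent_slots(SAMPLES))
# ['TRAVEL_DATE', 'TO_CITY', 'FROM_CITY', 'MODE_OF_TRAVEL', 'ACTIVITIES']
```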
Advanced voice design tip: If your skill is complex and has a lot of back-and-forth conversation (multi-turn conversation), create a dialog model for the skill. A dialog model is a structure that identifies the steps of a multi-turn conversation between your skill and the user. The dialog model uses these steps to collect all the information Alexa needs to fulfill each intent, which simplifies the code you need to write to ask the user for information.
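The slot-filling behavior that a dialog model automates can be sketched in plain Python. This is not the Alexa dialog model API; it is a hypothetical illustration, with made-up prompts, of the idea: keep prompting for required slots until all are filled, and accept several values in a single turn.

```python
# Hypothetical required slots and the prompt Alexa speaks when each is missing.
REQUIRED_SLOTS = {
    "FROM_CITY": "Where are you leaving from?",
    "TO_CITY": "Where are you going?",
    "TRAVEL_DATE": "When do you want to travel?",
}


def next_prompt(filled):
    """Return the prompt for the first missing required slot, or None when all are filled."""
    for slot, prompt in REQUIRED_SLOTS.items():
        if not filled.get(slot):
            return prompt
    return None


def run_dialog(turns):
    """Merge slot values from each user turn and return what Alexa says after each turn."""
    filled = {}
    replies = []
    for values in turns:
        filled.update(values)  # one turn may fill one slot or several
        prompt = next_prompt(filled)
        if prompt is None:
            replies.append(
                f"Got it: {filled['FROM_CITY']} to {filled['TO_CITY']}, {filled['TRAVEL_DATE']}."
            )
            break
        replies.append(prompt)
    return replies


# One slot per turn, then everything in a single turn.
print(run_dialog([{"TO_CITY": "Portland"}, {"FROM_CITY": "Seattle"}, {"TRAVEL_DATE": "next Friday"}]))
print(run_dialog([{"FROM_CITY": "Seattle", "TO_CITY": "Portland", "TRAVEL_DATE": "next Friday"}]))
```

Because each turn merges whatever values the user gave, the same code handles a user who answers one question at a time and a user who says everything at once.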

Key Concepts: Voice interaction model and situational design

Voice interaction model

Now that you know the basic voice design concepts, it's easier to understand what a voice interaction model is. An interaction model is the combination of utterances, intents, and slots that you identify for your skill.

To create an interaction model, define the requests (intents) and the words users say to make them (sample utterances). Your Lambda skill code then determines how your skill handles each intent. Define the intents and utterances on paper first, and then iterate on them to cover as many of the ways a user might interact with the skill as possible.

Then, go to the Alexa developer console and create the intents, utterances, and slots. The console generates a JSON representation of your interaction model. You can also write the interaction model in JSON yourself with any JSON tool, and then copy and paste the model into the developer console.
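If you write the model yourself, a small script can assemble it. This sketch assumes the `interactionModel.languageModel` layout (with `invocationName`, `intents`, and `types`) that the console exports; the invocation name, intent name, and `build_model` helper are hypothetical. It also assumes that models commonly include built-in intents such as AMAZON.CancelIntent, AMAZON.HelpIntent, and AMAZON.StopIntent; check the console for the exact set your skill needs.

```python
import json

# Built-in intents that exported models commonly include (assumed list; check the console).
BUILT_IN_INTENTS = ["AMAZON.CancelIntent", "AMAZON.HelpIntent", "AMAZON.StopIntent"]


def build_model(invocation_name, custom_intents, custom_types=()):
    """Assemble an interaction model in the JSON layout the console uses (assumed)."""
    intents = list(custom_intents)
    declared = {intent["name"] for intent in intents}
    for name in BUILT_IN_INTENTS:
        if name not in declared:
            intents.append({"name": name, "samples": []})
    return {
        "interactionModel": {
            "languageModel": {
                "invocationName": invocation_name,
                "intents": intents,
                "types": list(custom_types),
            }
        }
    }


trip_intent = {
    "name": "PlanMyTripIntent",  # hypothetical intent name
    "slots": [
        {"name": "toCity", "type": "AMAZON.US_CITY"},
        {"name": "travelDate", "type": "AMAZON.DATE"},
    ],
    "samples": ["I want to visit {toCity}", "I am going on a trip {travelDate}"],
}

model = build_model("travel planner", [trip_intent])  # hypothetical invocation name
print(json.dumps(model, indent=2))  # paste the output into the console's JSON Editor
```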

Voice design

To create an engaging skill experience, design your skill to mimic human conversation. Before you write a single line of code, think carefully about how your users will interact with your skill. If you skip this step, you will likely end up with a poorly designed skill that frustrates your users.

Visual design

The first Alexa devices had a microphone and a speaker. Now devices with screens are a rapidly growing segment of Alexa use. Devices with screens include Amazon's Echo Show models, Fire TV models, Fire tablets, the Alexa for PC app on Windows, and many other computing, communications, and media devices with Alexa Built-in.

Visuals can be as simple as some complementary text, a graphic, an interactive form, or an animation. Generally, design your skill so that a user can interact with it without seeing or touching anything. However, when you add complementary graphics and touch controls (especially for lists), you can make the overall experience with your skill more engaging.

In fact, Amazon studies show that user engagement with skills increases significantly when you include visuals. On average, multimodal skills based on Alexa Presentation Language (APL) have more than three times the number of monthly active users, compared to voice-only skills on multimodal devices. Skills that have APL video have nearly double (1.8x) the user engagement of voice-only skills on multimodal devices.

Situational design

Situational Design is a voice-first method for designing a VUI. You start with a simple dialog, which keeps the focus on the conversation. Each interaction between your user and the skill is a turn, and each turn has a situation that represents its context. For example, when a user interacts with the skill for the first time, some data is still unknown. After the skill stores that information, it can use it the next time the user interacts with the skill.

In the following example, a skill for a celebrity birthday guessing game asks the user to guess a birthday.

Practice: The following script shows how the skill named Cake Time gets a user started with the game. Later it tells the user how many birthdays they guessed correctly.


You represent each turn as a card that contains the user's utterance, the situation, and Alexa's response. Combine these cards to form a storyboard that shows how the user progresses through the skill over time. Storyboards are conversational; flowcharts are not.
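A storyboard maps naturally onto a simple data structure: one record per card, kept in conversation order. The sketch below is hypothetical; the card wording, the "[name]" placeholder, and the stored score are illustrative, not the actual Cake Time script.

```python
from dataclasses import dataclass


@dataclass
class Card:
    """One turn of a storyboard: what the user says, the context, and Alexa's reply."""
    utterance: str
    situation: str
    response: str


# Illustrative storyboard for Cake Time; the wording and score are hypothetical.
STORYBOARD = [
    Card(
        utterance="Alexa, open Cake Time.",
        situation="First visit. No guesses or score stored yet.",
        response="Welcome to Cake Time! I'll name a celebrity, and you guess their birthday. Ready?",
    ),
    Card(
        utterance="Yes.",
        situation="Game started. Score is 0.",
        response="Here's your first celebrity: [name]. When do you think their birthday is?",
    ),
    Card(
        utterance="Alexa, open Cake Time.",
        situation="Return visit. Last score stored: 3 correct.",
        response="Welcome back! Last time you guessed 3 birthdays correctly. Want to play again?",
    ),
]


def print_storyboard(cards):
    """Print the cards in order so the conversation reads top to bottom."""
    for turn, card in enumerate(cards, start=1):
        print(f"Turn {turn}")
        print(f"  User:      {card.utterance}")
        print(f"  Situation: {card.situation}")
        print(f"  Alexa:     {card.response}")


print_storyboard(STORYBOARD)
```

Note that the first and third cards share the same utterance; only the situation differs, which is why Alexa's response differs.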


Characteristics of a well-designed VUI

Uses natural forms of communication

When a user talks with your skill's VUI, the user shouldn't have to learn a new language or remember the rules. The VUI should conform to the user's paradigm, not the other way around.

Navigates through information easily

Your skill's VUI should offer an easy way to cut through layers of information hierarchy by providing the user with voice commands to find important information.

Creates an eyes- and hands-free experience

Your skill's VUI should allow a user to perform tasks while their eyes and hands are busy.

Creates a shared experience

Your skill's VUI should let users collaborate, contribute, or play together through natural conversation. For example, a family can play a game together on an Alexa-enabled device.

Key challenges of voice design

Humans developed sophisticated spoken communication over thousands of years. The devices we interact with, however, have had a much shorter time to learn how to talk with us. Designing a VUI involves inherent challenges, including handling context switches or ambiguity in the conversation, discovering the user's intent, and not knowing the user's current state or mood. For a good user experience, plan for these challenges when you build your skill.

The following videos show a few examples of how things could go wrong if you don't carefully design a VUI for your skill.

In this example, the user provides all the needed information at once, but the skill was designed to process only one piece of information at a time. A better-designed VUI would have anticipated that users might provide information either way: one piece at a time or all at once.

In this example, Alexa fails to recognize that she already has the answer she needs from the context. Again, the VUI fails to infer information from the situation and rigidly insists on an answer to one specific question, which can frustrate users.

The two examples show that it's important that you design the VUI to be as similar as possible to a natural conversation between two people. A good VUI can dramatically increase the ease of use and user satisfaction for any given skill.

Five best practices for voice design

To design an engaging VUI for a skill, you must write natural dialog, involve the user throughout the skill, and stay true to Alexa's personality. Consider these five design best practices to help you design an engaging VUI:

1. Stay close to Alexa's persona

Alexa's personality is friendly, upbeat, and helpful. She's honest about anything blocking her way, but she's also fun and personable, and she can make small talk without being obtrusive or inappropriate.

Try to keep the tone of your skill's VUI as close to Alexa's persona as possible. One way to do this is to keep the VUI natural and conversational.

Slightly vary Alexa's responses for common situations, such as saying "thank you" or "sorry". Engaging the user with questions is also a good technique for a well-designed VUI.

Do

Alexa: That's not quite right. One more try. What year was the Bill of Rights signed?
User: 1986
Alexa: Shoot. That wasn't it. The correct answer was 1791.

Don't

Alexa: That's not quite right. One more try. What year was the Bill of Rights signed?
User: 1986
Alexa: That's not correct. Let's move on.
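Varying responses can be as simple as keeping a few phrasings for the same situation and picking one at random, while avoiding an immediate repeat. This is a hypothetical sketch; the variant wording and the `pick_variant` helper are assumptions, not part of the Alexa Skills Kit.

```python
import random

# Hypothetical variants of Alexa's wrong-answer response; {answer} is filled in at runtime.
WRONG_ANSWER_VARIANTS = [
    "Shoot. That wasn't it. The correct answer was {answer}.",
    "Sorry, that's not it. The answer was {answer}.",
    "Not quite. It was {answer}.",
]


def pick_variant(variants, last=None, rng=random):
    """Choose a random variant, skipping the one used last time when another exists."""
    choices = [v for v in variants if v != last] or list(variants)
    return rng.choice(choices)


rng = random.Random(7)  # seeded only so the run is repeatable
last = None
for _ in range(3):
    last = pick_variant(WRONG_ANSWER_VARIANTS, last, rng)
    print(last.format(answer=1791))
```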

Engage the user with questions and avoid ending questions with "yes or no?" The following is an example.

Do

Alexa: Do you want to keep shopping?

Don't

Alexa: Do you want to keep shopping: Yes or no?

2. Write for the ear, then for the eye

The way we speak is far less formal than the way we write. Therefore, it's important to write Alexa's prompts to the user in a conversational tone.

A prompt that sounds good when you say it yourself might still sound odd when Alexa speaks it with text-to-speech (TTS).

Make sure to listen to the prompts on your test device, and then iterate on the prompts based on how they sound.

Then you can create a complementary visual experience for devices with screens that reinforces and enhances the verbal interaction.

The following example shows how to keep your VUI informal.

Do

Alexa: Getting your playlist.

Don't

Alexa: Acquiring your playlist.

If there are more than two options, present the user with the options and ask which they would like. The following is an example.

Do

Alexa: I can tell you a story, recite a rhyme, or sing a song. Which would you like?

Don't

Alexa: Do you want me to tell you a story, recite a rhyme, or sing you a song?

The user's response to Alexa's "Don't" question depends on the way Alexa speaks the question. Depending on pitch, the user could interpret Alexa's question as "Which would you like?" or "Do you want any of these?" If the user interprets the question the second way, they might say "Yes" or "No", answering a different question than the one you intended.
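If your skill builds option prompts from data, a helper can enforce the "Do" pattern: state the options, then ask an open "Which would you like?" question rather than a question the user can answer with "Yes" or "No". The `offer_choices` name and the "I can ..." phrasing are assumptions modeled on the "Do" example.

```python
def offer_choices(options):
    """State the options, then ask an open 'which' question instead of a yes/no one."""
    if len(options) < 2:
        raise ValueError("offer at least two options")
    if len(options) == 2:
        listed = f"{options[0]} or {options[1]}"
    else:
        listed = ", ".join(options[:-1]) + ", or " + options[-1]
    return f"I can {listed}. Which would you like?"


print(offer_choices(["tell you a story", "recite a rhyme", "sing a song"]))
# I can tell you a story, recite a rhyme, or sing a song. Which would you like?
```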

3. Be contextually relevant

List options in order from most to least contextually relevant to make it easier for the user to understand. Avoid giving the user options in an order that changes the subject of the conversation, and then returns to it again. This approach helps the user understand and verbalize their choices better without spending time and energy figuring out what's most relevant to them. The following example shows how to list options from most to least contextually relevant.

Do

Alexa: That show plays again tomorrow at 9 PM. I can tell you when a new episode is playing, when another show is playing, or you can do something else. Which would you like?

Don't

Alexa: That show plays again tomorrow at 9 PM. You can find out when another show is playing, find out when a new episode of this show is playing, or do something else. What would you like to do?
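When options come from data, one way to keep them in contextual order is to score each option's relevance to the current topic and sort on that score. The scores below are hypothetical, chosen for the context "the user just asked about a show"; how your skill assigns relevance is a design decision this sketch does not prescribe.

```python
# Hypothetical relevance scores for the context "the user just asked about a show".
# Higher scores stay closer to the current topic.
RELEVANCE = {
    "when a new episode is playing": 3,
    "when another show is playing": 2,
    "do something else": 1,
}


def order_by_relevance(options, relevance):
    """Sort options from most to least relevant; ties keep their original order."""
    return sorted(options, key=lambda option: relevance.get(option, 0), reverse=True)


options = ["when another show is playing", "when a new episode is playing", "do something else"]
print(order_by_relevance(options, RELEVANCE))
# ['when a new episode is playing', 'when another show is playing', 'do something else']
```

Python's `sorted` is stable even with `reverse=True`, so options with equal scores keep the order you wrote them in.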

4. Be brief

Reduce the number of steps to complete a task wherever possible to keep the conversation brief. Simplify messages to their essence. The following example shows how to reduce steps and simplify the message.

Do

Alexa: Ready to start the game?

Don't

Alexa: All right then, are you ready to get started on a new game?

5. Write for engagement to increase retention

You should build Alexa skills to last and grow with the user over time. Your skill should provide a delightful user experience, whether it's the first time a user invokes the skill or the 100th.

Design the skill to phase out basic information that users won't need after they gain experience over time. Give fresh dialog to repeat users so that the skill doesn't become tiresome or repetitive. The following example shows how to write content for customer engagement.

Do

First use:
Alexa: Thanks for subscribing to Imaginary Radio. You can listen to a live game by saying a team name, like Seattle Seahawks, location, like New York, or league, like NFL. You can also ask me for a music station or genre. What would you like to listen to?

Return use:
Alexa: Welcome back to Imaginary Radio. Want to keep listening to the Kids Jam station?

Don't

First use:
Alexa: Thanks for subscribing to Imaginary Radio. What do you want to listen to?

Return use:
Alexa: Welcome back. What do you want to listen to?
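Choosing between first-use and return-use dialog depends on what the skill remembers about the user. In this sketch, a plain dict stands in for stored per-user attributes (in a real skill, these would live in a persistence layer such as a database keyed by user ID). The attribute names `visit_count` and `last_station`, and both functions, are hypothetical.

```python
def welcome_prompt(attributes):
    """Pick a welcome based on hypothetical stored attributes for this user."""
    visits = attributes.get("visit_count", 0)
    last_station = attributes.get("last_station")
    if visits == 0:
        # First use: full orientation to what the skill can do.
        return (
            "Thanks for subscribing to Imaginary Radio. You can listen to a live game "
            "by saying a team name, like Seattle Seahawks, location, like New York, "
            "or league, like NFL. You can also ask me for a music station or genre. "
            "What would you like to listen to?"
        )
    if last_station:
        # Return use: skip the basics and pick up where the user left off.
        return f"Welcome back to Imaginary Radio. Want to keep listening to the {last_station} station?"
    return "Welcome back to Imaginary Radio. What would you like to listen to?"


def record_visit(attributes, station=None):
    """Return updated attributes after a session; saving them is not shown."""
    updated = dict(attributes)
    updated["visit_count"] = updated.get("visit_count", 0) + 1
    if station:
        updated["last_station"] = station
    return updated


attrs = {}
print(welcome_prompt(attrs))
attrs = record_visit(attrs, station="Kids Jam")
print(welcome_prompt(attrs))
```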

Wrap-Up

Now that you've learned how to design an engaging VUI, you can bookmark this page as a reference. In Module 3, you'll learn how to create a skill named "Cake Time."
