Build an Engaging Alexa Skill

Training Course

Module 2: Design an Engaging Voice User Interface

Welcome to module 2 of our introductory course on building an engaging Alexa skill. In this module, we'll discuss how to design a voice user interface for your skill.
Time required: 15 - 30 minutes
What you’ll learn:

  • How users interact with Alexa
  • Voice design concepts: utterances, intents, slots, interaction models, and situational design
  • Characteristics of a well-designed voice user interface (VUI)
  • Key challenges of voice design

How Users Interact With Alexa

A user wakes an Alexa-enabled device with the wake word (“Alexa”) and asks a question or makes a request. For Alexa-enabled devices with a screen, a user can also touch the screen to interact with Alexa.

Voice Design Concepts: Utterances, Intents, and Slots

To create a voice user interface for your skill, you need to understand key voice design concepts.

Wake word: The wake word tells Alexa to start listening to your commands.

Launch word: A launch word is a transitional action word that signals Alexa that a skill invocation will likely follow. Sample launch words include tell, ask, open, launch, and use.

Invocation name: To begin interacting with a skill, a user says the skill's invocation name. For example, to use the Daily Horoscope skill, the user could say, "Alexa, read my daily horoscope."

Utterance: Simply put, an utterance is a user's spoken request. These spoken requests can invoke a skill, provide inputs for a skill, confirm an action for Alexa, and so on. Consider the many ways a user could form their request.

Prompt: A string of text that should be spoken to the customer to ask for information. You include the prompt text in your response to a customer's request. 

Intent: An intent represents an action that fulfills a user's spoken request. Intents can optionally have arguments called slots.

Slot: A slot is an input value provided in a user's spoken request. Slot values help Alexa figure out the user's intent.

For example, suppose the user says they are going on a trip on Friday. The travel date of Friday is an input value that fills a slot of the intent, and Alexa passes it on to your skill code in AWS Lambda for processing.

Slots can be defined with different types. The travel date slot in this example uses Amazon's built-in AMAZON.DATE type to convert words that indicate dates (such as "today" and "next Friday") into a date format, while the fromCity and toCity slots both use the built-in AMAZON.US_CITY slot type.

If you extended this skill to ask the user what activities they plan to do on the trip, you might add a custom LIST_OF_ACTIVITIES slot type to reference a list of activities such as hiking, shopping, skiing, and so on.
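To make this concrete, here is a minimal sketch of what such an intent and its slots might look like in the interaction model JSON. The intent name (PlanMyTripIntent), invocation name, slot names, and sample values are illustrative only and are not part of this course's skill; the developer console can generate this JSON for you, as described later in this module.

```json
{
  "interactionModel": {
    "languageModel": {
      "invocationName": "plan my trip",
      "intents": [
        {
          "name": "PlanMyTripIntent",
          "slots": [
            { "name": "travelDate", "type": "AMAZON.DATE" },
            { "name": "fromCity", "type": "AMAZON.US_CITY" },
            { "name": "toCity", "type": "AMAZON.US_CITY" },
            { "name": "activity", "type": "LIST_OF_ACTIVITIES" }
          ],
          "samples": [
            "i am going on a trip {travelDate}",
            "i want to visit {toCity}",
            "i want to travel from {fromCity} to {toCity} {travelDate}"
          ]
        }
      ],
      "types": [
        {
          "name": "LIST_OF_ACTIVITIES",
          "values": [
            { "name": { "value": "hiking" } },
            { "name": { "value": "shopping" } },
            { "name": { "value": "skiing" } }
          ]
        }
      ]
    }
  }
}
```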

Practice: How to identify slots for an intent

Look at the utterances in the table, and note the words or phrases that represent variable information. These will become the intent's slots.

 

Utterance | Maps to
"I am going on a trip Friday." | TRAVEL_DATE
"I want to visit Portland." | TO_CITY
"I want to travel from Seattle to Portland next Friday." | FROM_CITY, TO_CITY, and TRAVEL_DATE
"I'm driving to Portland to go hiking." | MODE_OF_TRAVEL, TO_CITY, and ACTIVITIES


Advanced voice design tip: If your skill is complex and has a lot of back-and-forth conversation (multi-turn conversation), create a dialog model for the skill. A dialog model is a structure that identifies the steps of a multi-turn conversation between your skill and the user to collect all the information needed to fulfill each intent. This simplifies the code you need to write to ask the user for information.
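A dialog model is defined in the same interaction model JSON, alongside the languageModel object (omitted here). The following minimal sketch shows what marking a slot as required might look like; the intent name, slot name, prompt ID, and prompt wording are illustrative only, and the developer console can generate all of this for you.

```json
{
  "interactionModel": {
    "dialog": {
      "intents": [
        {
          "name": "PlanMyTripIntent",
          "slots": [
            {
              "name": "travelDate",
              "type": "AMAZON.DATE",
              "elicitationRequired": true,
              "confirmationRequired": false,
              "prompts": { "elicitation": "Elicit.Slot.travelDate" }
            }
          ]
        }
      ]
    },
    "prompts": [
      {
        "id": "Elicit.Slot.travelDate",
        "variations": [
          { "type": "PlainText", "value": "What day do you want to travel?" }
        ]
      }
    ]
  }
}
```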

Key Concepts: Interaction Model and Situational Design

Interaction model

Now that you know what the components of a skill are, it is easier to understand what an interaction model is. An interaction model is simply a combination of utterances, intents, and slots that you identify for your skill.

To create an interaction model, define the requests your skill can handle (intents) and the words and phrases users might say to make those requests (sample utterances). Your Lambda skill code then determines how your skill handles each intent. You can start by defining the intents and utterances on paper and iterating on them to cover as many ways as possible that users might interact with the skill.

Then, go to the Alexa developer console and create the intents, utterances, and slots. The console generates the JSON for your interaction model. You can also write the interaction model JSON yourself in any editor and then copy and paste it into the developer console.
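For illustration, here is a minimal sketch of how Lambda skill code might handle one intent from the illustrative interaction model shown earlier, using the ASK SDK v2 for Node.js (ask-sdk-core). The intent and slot names are assumptions carried over from that example, not part of this course's skill.

```javascript
// Sketch only: PlanMyTripIntent, toCity, and travelDate are illustrative names.
const Alexa = require('ask-sdk-core');

const PlanMyTripIntentHandler = {
  canHandle(handlerInput) {
    // Handle only IntentRequests for this specific intent.
    return Alexa.getRequestType(handlerInput.requestEnvelope) === 'IntentRequest'
      && Alexa.getIntentName(handlerInput.requestEnvelope) === 'PlanMyTripIntent';
  },
  handle(handlerInput) {
    // Read the slot values Alexa extracted from the user's utterance.
    const toCity = Alexa.getSlotValue(handlerInput.requestEnvelope, 'toCity');
    const travelDate = Alexa.getSlotValue(handlerInput.requestEnvelope, 'travelDate');

    const speakOutput = `Okay, I have you traveling to ${toCity} on ${travelDate}.`;
    return handlerInput.responseBuilder
      .speak(speakOutput)
      .getResponse();
  }
};

// The Lambda entry point wires the handlers together.
exports.handler = Alexa.SkillBuilders.custom()
  .addRequestHandlers(PlanMyTripIntentHandler)
  .lambda();
```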

Voice design

A major part of the experience is designing your skill to mimic human conversation well. Before you write a single line of code, think through how your customers will interact with your skill. Skipping this step usually results in a poorly designed skill that won't work well for your users.

While it may be tempting to use a flow chart to represent how a conversation may branch, don't! Flow charts are not conversational. They are complicated, impossible to read, and tend to lead to an inferior experience not unlike a phone tree. No one likes calling customer support and diving into a phone tree, so let's avoid that. Instead of flow charts, you should use situational design.

Situational Design

Situational Design is a voice-first method for designing a voice user interface. You start with a simple dialog, which helps keep the focus on the conversation. Each interaction between your customer and the skill represents a turn, and each turn has a situation that represents the context. If it's the customer's first time interacting with the skill, there is a set of data that is not yet known. Once the skill has stored that information, it can use it the next time the user interacts with the skill.

With Situational Design, you start with the conversation and work backwards to your solution. In the example below, the situation is that the user's birthday is unknown, so the skill needs to ask for it.

Practice: The script below shows how the skill “Cake Walk” asks the user for their birthday and remembers it. Later, it will be able to tell them the number of days until their next birthday and to wish them Happy Birthday on their birthday.

Each turn can be represented as a card that contains the user utterance, the situation, and Alexa's response. Combine these cards to form a storyboard that shows how the user will progress through the skill over time. Storyboards are conversational; flow charts are not.
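To show how a situation can drive the skill's response, here is a rough sketch of a launch handler that branches on whether the birthday is already known, using the ASK SDK v2 for Node.js. It is a simplification for illustration, not the actual Cake Walk code you will build later; a real skill would keep the birthday in persistent storage rather than session attributes, and the wording is made up here.

```javascript
const Alexa = require('ask-sdk-core');

const LaunchRequestHandler = {
  canHandle(handlerInput) {
    return Alexa.getRequestType(handlerInput.requestEnvelope) === 'LaunchRequest';
  },
  handle(handlerInput) {
    // Session attributes keep this sketch simple; the real skill would use
    // persistent attributes so the birthday survives between sessions.
    const attributes = handlerInput.attributesManager.getSessionAttributes();

    if (!attributes.birthday) {
      // Situation: the birthday is unknown, so ask for it.
      return handlerInput.responseBuilder
        .speak('Hello! This is Cake Walk. When is your birthday?')
        .reprompt('When is your birthday?')
        .getResponse();
    }

    // Situation: the birthday is known, so the skill can use it.
    return handlerInput.responseBuilder
      .speak('Welcome back! I can count down the days until your birthday.')
      .getResponse();
  }
};
```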

Characteristics of a Well-Designed Voice User Interface

Uses natural forms of communication

When talking with a machine, a user should not be required to learn a new language or remember the rules. A machine should conform to the user's paradigm, not the other way around.

Navigates through information easily

Your skill’s VUI should offer an easy way to cut through layers of information hierarchy by using voice commands to find important information.

Creates an eyes- and hands-free experience

Voice interfaces should allow a user to perform tasks while their eyes and hands are occupied.

Creates a shared experience

Voice experiences let users collaborate, contribute, or play together through natural conversation. For example, a family could play a game together on an Alexa-enabled device.

Key Challenges of Voice Design

Humans have been learning, evolving, and defining language and norms for communication for thousands of years. However, the machines we interact with have had a much shorter time frame to learn how to talk with us. There are inherent challenges with voice interfaces, including context switching or ambiguity in the conversation, discovering intent, and being unaware of the user's current state or mood. For a good user experience, you should plan for these challenges when developing your skill.

The following videos show a few examples of how things could go wrong if you don’t carefully design a VUI for your skill.

In this example, the user provides all the needed information at once, but the skill is unable to parse it. This doesn't mean that Alexa is unable to comprehend what the user says, but rather that the skill's VUI is not designed to infer information from the natural way a person may speak.

In this example, Alexa fails to recognize that she already has the answer she needs from context. Again, the VUI design fails to infer information from the context of the situation and instead rigidly insists on getting an answer to a specific question. This can be quite frustrating for a user.

The two examples show it is important to design the VUI to be as similar as possible to a natural conversation that might take place between two human beings. A good VUI dramatically increases the ease of use and user satisfaction for any given skill.

Five Best Practices for Voice Design

Designing a good voice user interface for a skill involves writing natural dialog, engaging the user throughout the skill, and staying true to Alexa's personality. Consider these five design best practices to help you design an engaging VUI:

1. Stay close to Alexa's persona

Alexa's personality is friendly, upbeat, and helpful. She's honest about anything blocking her way but also fun, personable, and able to make small talk without being obtrusive or inappropriate.

Try to keep the tone of your skill’s VUI as close to Alexa’s persona as possible. One way to do this is by keeping the VUI natural and conversational.

Slightly vary Alexa's responses for common phrases like "thank you" and "sorry" so they don't sound identical every time. Engaging the user with questions is also a good technique for a well-designed VUI.
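One simple way to build in this variety is to keep a short list of phrasings and pick one at random on each turn. The following Node.js sketch is illustrative only; the constant and function names are not part of any Alexa API.

```javascript
// A few interchangeable ways for the skill to say "thank you".
const THANK_YOU_VARIATIONS = [
  'Thanks!',
  'Thank you so much.',
  'Great, thanks for that.'
];

// Pick a random variation each time the skill responds.
function randomPhrase(variations) {
  const index = Math.floor(Math.random() * variations.length);
  return variations[index];
}

// Inside a handler, the chosen phrase can lead into a follow-up question:
// const speakOutput = `${randomPhrase(THANK_YOU_VARIATIONS)} What would you like to do next?`;
```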

Alexa should be helpful by providing the correct answer. The following is an example:

Do

Alexa: That's not quite right. One more try. What year was the Bill of Rights signed? 

User: 1986 

Alexa: Shoot. That wasn't it. The correct answer was 1791.

Don't

Alexa: That's not quite right. One more try. What year was the Bill of Rights signed? 

User: 1986 

Alexa: That's not correct. Let's move on.

Engage the user with questions and avoid ending questions with "yes or no?" The following is an example.

Do

Alexa: Do you want to keep shopping?

Don't

Alexa: Do you want to keep shopping: Yes or no?

2. Write for the ear, not the eye

The way we speak is far less formal than the way we write. Therefore, it's important to write Alexa’s prompts to the user in a conversational tone.

No matter how good a prompt sounds when you say it, it may sound odd in text-to-speech (TTS).

It is important to listen to the prompts on your test device and then iterate on the prompts based on how they sound.

Keep your VUI informal. The following is an example.

Do

Alexa: Getting your playlist.

Don't

Alexa: Acquiring your playlist.

If there are more than two options, present the user with the options and ask which they would like. The following is an example.

Do

Alexa: I can tell you a story, recite a rhyme, or sing a song. Which would you like?

Don't

Alexa: Do you want me to tell you a story, recite a rhyme, or sing you a song?

3. Be contextually relevant

List options in order from most to least contextually relevant to make it easier for the user to understand. Avoid giving the user options in an order that changes the subject of the conversation, then returns to it again. This helps the user understand and verbalize their choices better without spending mental time and energy figuring out what's most relevant to them. The following is an example.

Do

Alexa: That show plays again tomorrow at 9 PM. I can tell you when a new episode is playing, when another show is playing, or you can do something else. Which would you like?

Don't

Alexa: That show plays again tomorrow at 9 PM. You can find out when another show is playing, find out when a new episode of this show is playing, or do something else. What would you like to do?

4. Be brief

Reduce the number of steps to complete a task wherever possible to keep the conversation brief. Simplify messages to their essence wherever possible. The following is an example.

Do

Alexa: Ready to start the game?

Don't

Alexa: All right then, are you ready to get started on a new game?

5. Write for engagement to increase retention

Alexa skills should be built to last and grow with the user over time. Your skill should provide a delightful user experience, whether it's the first time a user invokes the skill or the 100th.

Design the skill to phase out information that experienced users will learn over time. Give fresh dialog to repeat users so the skill doesn't become tiresome or repetitive.

Do

First use:

Alexa: Thanks for subscribing to Imaginary Radio. You can listen to a live game by saying a team name, like Seattle Seahawks, location, like New York, or league, like NFL. You can also ask me for a music station or genre. What would you like to listen to?

Return use:

Alexa: Welcome back to Imaginary Radio. Want to keep listening to the Kids Jam station?

Don't

First use:

Alexa: Thanks for subscribing to ABC Radio. What do you want to listen to?

Return use:

Alexa: Welcome back. What do you want to listen to?
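One common way to implement this first-use versus return-use behavior is to store a small piece of state between sessions. The following Node.js sketch assumes a persistence adapter (for example, the S3 or DynamoDB adapter) has already been configured on the skill builder; the attribute name and the wording are illustrative only.

```javascript
const Alexa = require('ask-sdk-core');

const LaunchRequestHandler = {
  canHandle(handlerInput) {
    return Alexa.getRequestType(handlerInput.requestEnvelope) === 'LaunchRequest';
  },
  async handle(handlerInput) {
    const attributesManager = handlerInput.attributesManager;
    // Requires a persistence adapter to be configured on the skill builder.
    const attributes = await attributesManager.getPersistentAttributes() || {};
    const visitCount = attributes.visitCount || 0;

    // Give the full onboarding prompt only on first use; keep return visits brief.
    const speakOutput = visitCount === 0
      ? 'Thanks for subscribing to Imaginary Radio. You can listen to a live game by saying a team name, location, or league, or ask me for a music station or genre. What would you like to listen to?'
      : 'Welcome back to Imaginary Radio. What would you like to listen to?';

    // Remember that this user has visited before.
    attributes.visitCount = visitCount + 1;
    attributesManager.setPersistentAttributes(attributes);
    await attributesManager.savePersistentAttributes();

    return handlerInput.responseBuilder
      .speak(speakOutput)
      .reprompt('What would you like to listen to?')
      .getResponse();
  }
};
```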

Great job! Bookmark this page and jump to the next module, where you will learn how to create your first skill, named “Cake Walk”.

The next four steps of this course are in Node.js. If you would like to follow along in Python instead, please visit our GitHub repository, which contains the same steps in Python.