Fundamentals of Voice Experience Design

A voice experience is a two-way interaction between a user and a system, based on the fundamentals of human conversation. For people to have an effective conversation, the speakers need a shared mental model and must communicate their goals, questions, and intentions. The more your voice experience draws on the conventions of human conversation, the less you have to teach users how to use the system.

When you design a voice experience, your role as the designer is to map out these conversations: explore the user's needs and journey, and storyboard the full experience. Whether you're a lone skill builder in the design phase or a designer working with a team of developers, iterating on these designs helps you create a compelling experience that users will love.

Voice experience design process

The process of designing voice experiences includes several phases: conceptualizing your voice-experience idea, drawing out your users' journey, designing the personality of your skill, creating storyboards, prototyping your designs, and testing your designs and iterating to make improvements.

Conceptualize your idea

Users are drawn to voice experiences because of the strengths of voice user interfaces (VUIs):

  • Ease of use– Users are naturally familiar with voice as a medium of interaction with other people. They already know how to talk to each other and don't need to be taught basic interaction paradigms or elements such as forming sentences, asking for information, or answering basic questions.
  • Avoids the need for navigation– Voice offers a quick way to cut through layers of information hierarchy. With a combination of voice commands, users can get what they want efficiently. In a well-designed voice interface, the ways users can access content are numerous, flexible, and responsive.
  • Adapts to users’ needs– Because their information architecture is invisible, well-designed voice interfaces can adapt more flexibly to specific users and use cases, as well as to new product features.
  • Enables multitasking– Far-field voice interfaces let users perform tasks while their eyes and hands are occupied.


In addition, users look to skills to add value to their everyday lives. Ask yourself the following questions when you conceptualize an Alexa skill:

  • What is the value of the Alexa skill you’re making to a user?
  • If a user were to write a testimonial about their experience using the skill, what would you like them to say?
  • What problem is the skill solving?

Draw user journeys

A user journey consists of the interactions a user has with your brand's products and the goals they complete through them.

Let’s look more closely at what that means.

Interactions are how a user steps through a product. On a website, interactions are clicks on hyperlinks. On a mobile device, they're taps and swipes. With Alexa, they're user utterances.

Goals are what your user is attempting to achieve along the way with your product. Most of the time, a user is trying to achieve these goals in a simple, quick, frictionless way.

When you draw out a user journey, it should follow this format.

Journey Template

Don’t treat these user journeys as flow diagrams. A journey always moves from one goal to the next and never branches. The focus is not on every possible path a user might take, but on the ideal paths that lead the user toward their goals.

Design your skill's personality

Alexa skills are by nature two-way dialogs: a user speaks and your skill responds, and both work together to achieve a goal. Now it’s time to turn to the skill’s side of the dialog. Before you start writing dialog, however, step back and form a clear picture of your skill’s personality. This personality combines your skill’s spoken and visual identity, and it should feel like one consistent voice to the user.

To build out this personality, follow these steps:

  1. Write testimonials– Write out a few testimonials that a user would write about your skill’s personality and highlight the important words in those testimonials.
  2. Narrow down the personality traits– Narrow down those important words. Choose three words that you believe are the most essential for the persona.
  3. Write a short description of your personality– Write down a short description of your persona, giving it a name and using the three words chosen.
  4. Write some sample dialogs– Write down three different quick dialogs between a user and your persona.


To help get you started, download the following quick reference Sketch file.

Create storyboards

When you design a skill, there’s a lot to consider from both the user's and Alexa's perspectives.
From the user's side, design for the following elements:

  • User journey– Where has the user come from? Where are they going next? What are their goals along the way? What is the context in which they’re interacting?
  • Device type– Does the user have a Fire TV, an Echo Show, or an Echo Dot?
  • User utterance– What is the user saying to Alexa?
  • Touch input– Has a user tapped or swiped on a screen?


From Alexa's side, design for the following responses:

  • Audio response– What is Alexa saying? Are there earcons or background music?
  • Visual response– What is Alexa displaying on the screen?


To combine all these elements into a single artifact, use a design method called storyboarding. A storyboard is a design artifact made up of a user journey, screens, and scripts. Storyboards help you organize, and convey to others, how users will interact with the skill.

The following example is a storyboard for a skill that lets someone order a cake from a bakery.

Storyboard

User journey

Customer Journey

The user journey section of the storyboard gives the context of all the user's goals, from the start of their journey to the end. The storyboard walks through this journey, and each frame focuses on a specific goal. In this example, the user's goal is to find a cake to order.

Screen

The screen section of the storyboard shows an example of what appears on the screen. When you first start on your screens, don’t worry about laying them out perfectly; focus instead on the main content.

Screens

Script

The script is where you write out both the user utterances and Alexa’s responses. Writing these scripts might seem intimidating at first. Use the user’s goal (finding a cake) as your guide when you write the dialog, and keep it as simple and straightforward as possible.

Script

To get started with storyboarding, see the Sketch files, which include storyboard templates you can fill in for your own skill types.
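If your team keeps design artifacts alongside code, one storyboard frame (a single user goal, its main screen content, and its script) can also be captured as plain data. The following is a minimal sketch under that assumption; the `StoryboardFrame` and `ScriptLine` classes are hypothetical, and the bakery dialog is illustrative rather than taken from the example storyboard.

```python
from dataclasses import dataclass, field


@dataclass
class ScriptLine:
    speaker: str  # "User" or "Alexa"
    text: str


@dataclass
class StoryboardFrame:
    """One storyboard frame: a single user goal, its screen, and its script."""

    goal: str
    screen_content: list[str]  # main content only; layout comes later
    script: list[ScriptLine] = field(default_factory=list)

    def render_script(self) -> str:
        """Return the script as 'Speaker: text' lines for review."""
        return "\n".join(f"{line.speaker}: {line.text}" for line in self.script)


# Hypothetical frame for the bakery example; the dialog is illustrative only.
frame = StoryboardFrame(
    goal="Find a cake to order",
    screen_content=["Chocolate cake", "Carrot cake", "Lemon cake"],
    script=[
        ScriptLine("User", "Alexa, open the bakery."),
        ScriptLine("Alexa", "Welcome! We have chocolate, carrot, and lemon cake. Which one sounds good?"),
        ScriptLine("User", "Chocolate."),
    ],
)

print(frame.render_script())
```

Because a user journey never branches, the full journey in this representation would simply be an ordered list of frames, one per goal.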

Prototype your designs

A prototype is useful for communicating your design to stakeholders, as well as for putting your concepts in front of users for feedback. After you’ve created a few storyboards, use them as a guide to start prototyping your design.

The following tools can simplify this process.

Adobe XD toolkit

The Adobe XD UI Kit makes it easier to design APL screens in Adobe XD. You can link screens together with interactions such as tap, touch, and voice triggers, and as the prototype moves between screens, Adobe XD plays back speech and shows the visual display. You can preview the prototype in Adobe XD or on your Echo Show device. Say, “Alexa, open Adobe XD” to try your prototype on your own device.

Sketch toolkit

The Alexa Design System Sketch toolkit includes libraries and templates for designing multimodal skills built with the Alexa Presentation Language (APL). These libraries and templates represent the code-backed Alexa styles and Alexa layout packages. The responsive templates and components automatically adapt to different viewport profiles. Amazon updates the toolkit with every major release of APL, so you always have the latest tools for your design.

The downloadable toolkit includes the following features:

  • Alexa Design System library plugin– Sketch libraries are collections of components and styles, such as list items, icons, and colors, that help you build layouts and user experiences. This library includes Alexa Responsive Layouts that support device viewport profiles and are built with components and styles from the library.
  • Alexa Design System templates– This file includes full-screen Alexa Responsive Layouts for supported viewport profiles that are built using library components and styles.
  • Amazon typefaces– The templates and library use Amazon Ember Display and Bookerly typefaces, which are also available to download.

Amazon Polly audio files

Spoken words aren't the same as written text. As described in the article about designing your skill’s persona, spoken words can differ in tone, pitch, rate of speech, and stress. A voice can be soothing or startling. With the Amazon Polly Text-to-Speech tool, you can listen to what you’ve written in your scripts and download example Alexa responses. The tool also lets you use Speech Synthesis Markup Language (SSML) to add pauses and other effects to your speech output.
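As a minimal sketch, assuming you assemble SSML in Python before pasting it into the Amazon Polly console (or sending it to Polly's SynthesizeSpeech API with the text type set to `ssml`), the following hypothetical `build_ssml` helper joins script lines with a pause between each. The `<speak>`, `<break>`, and `<prosody>` tags are standard SSML that Polly supports; the default pause length and rate are illustrative, not recommendations.

```python
from xml.sax.saxutils import escape


def build_ssml(sentences: list[str], pause_ms: int = 500, rate: str = "medium") -> str:
    """Join script sentences into one SSML document with a pause between each.

    The defaults for pause_ms and rate are illustrative only.
    """
    pause = f'<break time="{pause_ms}ms"/>'
    # Escape &, <, and > so script text can't break the SSML markup.
    body = pause.join(escape(s) for s in sentences)
    return f'<speak><prosody rate="{rate}">{body}</prosody></speak>'


ssml = build_ssml(["Your cake is ready.", "Pick it up by 5 & enjoy!"])
print(ssml)
```

Escaping matters because a raw `&` or `<` in a script line would make the SSML invalid.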

Test your designs and iterate

After you’ve built a prototype, put it in front of real people, ideally users who have no involvement in the product. If your budget allows, services such as usertesting.com (http://usertesting.com/) can recruit users for you. If your budget is limited, enlist family and friends. When it comes to testing, some feedback is always better than none.

User feedback can help you answer the following questions:

  • Have you designed the right user journeys? - Are users completing the goals you imagined? Are they completing them with the interactions you imagined?
  • Have you designed the right skill personality? - Do users find the skill’s personality relatable? Does it cause more friction than help?
  • Do users understand your visual language? - Do you have the right information on the screen for users to make key decisions? Do users understand what they can tap or swipe?
  • Have you created the right prompts? - Do users understand what they’re capable of doing with your skill? Are users going down paths you didn’t foresee?

Testing early gives you confidence that you’re making the right decisions before you start coding a solution.

Best practices

Keep the following best practices in mind while you design your voice experiences.

Write for the ear, not the eye

Prompts for Alexa are heard, not read, so it's important to write them for spoken conversation. Throw out what you learned in school: sentence fragments, contractions, and ending sentences with a preposition are all acceptable if they sound natural in spoken dialog.

Be direct & unambiguous

Use unambiguous, direct, and clear language. Direct language signals to users that the skill’s personality is cooperating with them. It shows up as simple, elegant sentence structures that are crisp and easy to parse and understand.

Match a variety of utterances

Don’t assume that users say the exact phrase that you anticipate for an intent. While the user might say, "Plan a trip," they also might say, "Plan a vacation to Hawaii." To make sure your skill can respond to a variety of user utterances, provide a wide range of sample utterances: the sentences, phrases, and words that users are likely to say.
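A minimal sketch, assuming the skill's interaction model is written as JSON, where each intent lists its sample utterances under `samples` and a `{slot}` placeholder captures the variable part of a phrase, such as a destination. The intent name `PlanTripIntent`, the slot `destination`, the custom slot type `DestinationType`, and the `undeclared_slots` check are all hypothetical; a complete model would also define the custom slot type's values in its `types` list.

```python
import json
import re

# Hypothetical intent for the trip-planning example. Only the shape
# ("name", "slots", "samples") follows the Alexa interaction model; the
# intent name, slot name, and custom slot type are illustrative.
plan_trip_intent = {
    "name": "PlanTripIntent",
    "slots": [{"name": "destination", "type": "DestinationType"}],
    "samples": [
        "plan a trip",
        "plan a vacation",
        "plan a trip to {destination}",
        "plan a vacation to {destination}",
        "I want to go to {destination}",
        "help me book a trip to {destination}",
    ],
}


def undeclared_slots(intent: dict) -> set[str]:
    """Return slot names used as {placeholders} in samples but not declared."""
    declared = {slot["name"] for slot in intent.get("slots", [])}
    referenced = {
        name
        for sample in intent["samples"]
        for name in re.findall(r"\{(\w+)\}", sample)
    }
    return referenced - declared


# An empty set means every placeholder maps to a declared slot.
print(undeclared_slots(plan_trip_intent))
print(json.dumps(plan_trip_intent, indent=2))
```

Listing both the bare phrases and the slotted variants lets the skill handle "Plan a trip" as well as "Plan a vacation to Hawaii."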

Handle errors gracefully

Avoid error messages that only say Alexa didn't hear or understand the user, such as “I didn't hear you.” A response like this leads users to repeat the same phrase that caused the error. Instead, include more helpful information and make your directions as explicit as possible.
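A minimal sketch, assuming a skill backend that returns the standard Alexa response JSON (`outputSpeech`, `reprompt`, `shouldEndSession`). Rather than a bare error, the hypothetical `helpful_reprompt` helper restates the question and lists what the user can say; the bakery options are illustrative.

```python
def spoken_list(options: list[str]) -> str:
    """Join options the way a person would say them, such as "a, b, or c"."""
    if not options:
        raise ValueError("need at least one option")
    if len(options) == 1:
        return options[0]
    if len(options) == 2:
        return f"{options[0]} or {options[1]}"
    return ", ".join(options[:-1]) + f", or {options[-1]}"


def helpful_reprompt(question: str, options: list[str]) -> dict:
    """Build an Alexa response that restates the question and lists valid
    answers, rather than a bare "I didn't hear you" message."""
    text = f"{question} You can say {spoken_list(options)}."
    speech = {"type": "PlainText", "text": text}
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": speech,
            # Played if the user doesn't respond, so it carries the same guidance.
            "reprompt": {"outputSpeech": speech},
            "shouldEndSession": False,
        },
    }


# Illustrative bakery options, not taken from a real skill.
response = helpful_reprompt("Which cake would you like?", ["chocolate", "carrot", "lemon"])
print(response["response"]["outputSpeech"]["text"])
```

Keeping the session open with `shouldEndSession` set to `False` gives the user a chance to answer with one of the listed options.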