Amazon Alexa Voice Design Guide

What Users Say

Making sure Alexa understands what people are saying

Human conversation is about exchanging meaning in ways that make sense in the current situation. Expressing and extracting meaning is not as simple as it may seem, and you’ll need to design conversations between Alexa and your customers carefully and intentionally. A great voice experience allows for the many ways people might express meaning and intent.

A conversational UI consists of turns: a person says something, and Alexa responds. This is a new form of interaction for many people, so learn the ways in which users participate in the conversation so that you can design for them.

For more information about intents and utterances, see this video.

Finish designing before you build

Think about what you want the person using your skill to experience and feel. Once you’ve identified the purpose of your skill, written scripts, and laid out flows, move on to designing intents and utterances.

Identify intents

Intents represent what users can ask your skill to do. Your skill might help plan a trip, get a status, tell a joke, or attack a monster—these are intents. For guidance on determining intents for your skill, see the design process.

Avoid assuming that people will say precisely the words that you anticipate for an intent. While the user might say “plan a trip,” they could just as easily say “plan a vacation to Hawaii.” To make sure your skill performs well for people, provide a wide range of sentences, phrases, and words people are likely to say.

The following are examples of utterances corresponding to the intent PlanMyTripIntent:

  • “I’d like to take a trip”
  • “Let’s start planning a trip”
  • “Plan a trip”
  • “I need a vacation”
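An intent and its sample utterances can also be written down as data. The following is a minimal sketch, expressed as a Python dictionary that mirrors the general shape of an Alexa interaction model (intents with `name`, `slots`, and `samples`). The invocation name `trip planner` and the helper `samples_for` are hypothetical, so check field names against the current schema before relying on them.

```python
# Sketch of a language model with one custom intent.
# The invocation name and the helper below are illustrative assumptions.
language_model = {
    "invocationName": "trip planner",  # hypothetical
    "intents": [
        {
            "name": "PlanMyTripIntent",
            "slots": [],
            "samples": [
                "i would like to take a trip",
                "let's start planning a trip",
                "plan a trip",
                "i need a vacation",
            ],
        }
    ],
}


def samples_for(model: dict, intent_name: str) -> list[str]:
    """Return the sample utterances registered for an intent."""
    for intent in model["intents"]:
        if intent["name"] == intent_name:
            return list(intent["samples"])
    return []


print(samples_for(language_model, "PlanMyTripIntent"))
```

Keeping samples in one structure like this makes it easier to review coverage and add variations over time.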

Example of flow

Use the built-in intents

Every Alexa skill needs to include the ability to cancel, stop, and offer help. For these and other common intents like repeat, play, and next, use the built-in intent library. Built-in intents are already configured so that Alexa can recognize the corresponding utterances. For example, with the help intent, you don’t need to specify the ways a person might ask for help. You can also extend built-in intents if your skill needs to react to additional things people might say.
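A quick check can catch a skill that is missing one of the required built-in intents. This is a minimal sketch: `AMAZON.CancelIntent`, `AMAZON.StopIntent`, and `AMAZON.HelpIntent` are the standard built-in names, while the helper `missing_built_ins` and the example skill are hypothetical.

```python
# Built-in intents that the guide says every skill must support.
REQUIRED_BUILT_INS = {
    "AMAZON.CancelIntent",
    "AMAZON.StopIntent",
    "AMAZON.HelpIntent",
}


def missing_built_ins(intent_names: list[str]) -> list[str]:
    """Return the required built-in intents absent from a skill, sorted by name."""
    return sorted(REQUIRED_BUILT_INS - set(intent_names))


# Hypothetical skill that forgot the help intent.
skill_intents = ["PlanMyTripIntent", "AMAZON.CancelIntent", "AMAZON.StopIntent"]
print(missing_built_ins(skill_intents))  # ['AMAZON.HelpIntent']
```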

Identify utterances

An utterance is what a person says to Alexa. Utterances are made up of keyword commands, natural speech sounds like filler words, and slots for information that varies. One of the most important aspects of designing a voice experience is defining the range of what people might say.

To help ensure a good experience, provide examples from complete commands all the way through incomplete and ambiguous fragments. To make sure you have coverage, include subtle variations and even mispronunciations. For example, include “arrangement” and “bouquet” when talking about flowers even though they have similar meanings.

One-shots: A one-shot utterance is given all at once and provides everything needed to fulfill an intent. One-shots can be used to start a skill or within a skill.

Example

I’m leaving from Seattle next Friday to go hiking in Portland.

Partial information: Users frequently make a request that includes only some of the variable information the intent needs. Alexa then collects the rest over multiple turns.

Example

User: I’d like to go windsurfing near Rooster Rock.

Alexa: When would you like to go?
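The multi-turn behavior described above can be sketched as a lookup for the first piece of information that is still missing. This is an illustration, not how Alexa's dialog management is implemented: the slot names `fromCity` and `activity`, the prompt wording, and the helper `next_prompt` are assumptions.

```python
from typing import Optional

# Slots the trip intent needs, in the order Alexa should ask for them.
# Slot names and prompts are illustrative assumptions.
PROMPTS = {
    "toCity": "Where would you like to go?",
    "travelDate": "When would you like to go?",
    "fromCity": "Which city are you leaving from?",
    "activity": "What would you like to do there?",
}


def next_prompt(filled: dict) -> Optional[str]:
    """Return the prompt for the first unfilled slot, or None when all are filled."""
    for slot, prompt in PROMPTS.items():
        if not filled.get(slot):
            return prompt
    return None


# The user gave a destination and an activity, but no date.
print(next_prompt({"toCity": "Rooster Rock", "activity": "windsurfing"}))
# When would you like to go?
```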

Handle over-answering

Occasionally, users offer more than one answer even when Alexa requests only one. If Alexa prompts for a departure date, the user may answer by providing the date and the departure city. The user might even provide other information that is needed like arrival city and activity, and not provide the date that Alexa requested.

Handling this situation well is important for conversational design. Learn more in the Dialog Interface Reference and the Plan My Trip tutorial.

Example

Alexa: When would you like to go?

User: I’m leaving Seattle next Friday to go to New York.
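One way to think about over-answering is to accept every usable value in the reply, not only the one that was requested. The sketch below is independent of any Alexa SDK; the slot names and the helpers `merge_answer` and `still_missing` are hypothetical.

```python
# Slots the trip intent needs (names are illustrative assumptions).
REQUIRED_SLOTS = ("fromCity", "toCity", "travelDate", "activity")


def merge_answer(collected: dict, provided: dict) -> dict:
    """Keep every non-empty slot the user supplied, not only the one asked for."""
    merged = dict(collected)
    for slot, value in provided.items():
        if slot in REQUIRED_SLOTS and value:
            merged[slot] = value
    return merged


def still_missing(collected: dict) -> list[str]:
    """List the required slots that have no value yet."""
    return [slot for slot in REQUIRED_SLOTS if not collected.get(slot)]


# Alexa asked only for travelDate; the user also gave both cities.
answer = {"travelDate": "next Friday", "fromCity": "Seattle", "toCity": "New York"}
state = merge_answer({}, answer)
print(still_missing(state))  # ['activity']
```

Because the extra values are kept, the next prompt only needs to ask for the activity.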

Handle corrections from the user

Sometimes people make corrections when they know that Alexa got something wrong or when they change their minds. For example, a user might say something like “no,” or “I said,” followed by a valid utterance. Be prepared to handle these properly.

Example

Alexa: That sounds like a fun trip. You’re going windsurfing in Portland next Friday, leaving from Seattle. Should I book it?

User: No, I’m going to Rooster Rock.

Alexa: OK, got it. You’re going to Rooster Rock next Friday to go windsurfing, leaving from Seattle. Ready to book?
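A correction usually consists of a marker such as “no” or “I said” followed by a valid utterance. As a rough sketch, the marker can be separated from the rest so that the remainder goes through normal handling. The marker list and the helper `split_correction` are assumptions, and a real skill would rely on its interaction model instead of string matching.

```python
import re

# Phrases that often introduce a correction; this list is an assumption, not exhaustive.
CORRECTION_PREFIX = re.compile(r"^\s*(no|i said|actually)\b[\s,]*", re.IGNORECASE)


def split_correction(utterance: str) -> tuple[bool, str]:
    """Return (is_correction, utterance without the leading correction marker)."""
    match = CORRECTION_PREFIX.match(utterance)
    if match is None:
        return False, utterance.strip()
    return True, utterance[match.end():].strip()


print(split_correction("No, I'm going to Rooster Rock."))
# (True, "I'm going to Rooster Rock.")
```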

Cover a wide variety of utterances

To make sure your skill performs well, a good benchmark is 30 or more utterances per intent, even for simpler intents. You don’t need 100% coverage, but more examples are better. Also, plan to continue adding utterances over time to improve skill performance.
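The 30-utterance benchmark is easy to check automatically. This is a minimal sketch that assumes intents are stored as dictionaries with `name` and `samples` keys; the helper `under_covered` is hypothetical. Built-in intents are skipped because Alexa already recognizes their utterances.

```python
MIN_SAMPLES = 30  # benchmark suggested in this guide


def under_covered(intents: list[dict], minimum: int = MIN_SAMPLES) -> dict[str, int]:
    """Map each custom intent below the benchmark to its count of distinct samples."""
    report = {}
    for intent in intents:
        if intent["name"].startswith("AMAZON."):
            continue  # built-in intents come with their own utterances
        distinct = {s.strip().lower() for s in intent.get("samples", [])}
        if len(distinct) < minimum:
            report[intent["name"]] = len(distinct)
    return report


intents = [
    {"name": "PlanMyTripIntent", "samples": ["plan a trip", "Plan a trip", "i need a vacation"]},
    {"name": "AMAZON.HelpIntent", "samples": []},
]
print(under_covered(intents))  # {'PlanMyTripIntent': 2}
```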

Tips for creating varied utterances

Let’s say that the user said “I’d like to plan a trip.” Alexa then needs to gather the departure city, arrival city, travel date, and activity. This is a good opportunity to ask a family member or a friend to role-play so that you can simulate the conversation.

One-shot variants:
Think about ways that the user might say all of the slots in one utterance.

Example

I want to go scuba diving in Aruba next Friday.

I need a ticket from Seattle to Aruba next Friday.

Partial information variants:
Think about common ways that people might give you only a couple of pieces of information. This is an important place to focus because it’s unlikely that people will say everything you need in one shot.

Example

I want to go scuba diving.

I need a ticket from Seattle to Aruba.

I’d like to go to Aruba.

Identify slots

Slots allow people to specify variable parts of an utterance, such as a city or a date. Slots are common in task- and information-focused skills. Design how the slots show up in utterances, and then choose slot values from the built-in catalog or provide your own slot values.

In the following example utterances, {toCity} and {travelDate} are slots:

  • “I’d like to go to {toCity}”
  • “book a trip for {travelDate}”
  • “plan a vacation to {toCity}”
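To see how slots appear in sample utterances, it can help to extract the slot names and preview a filled-in sentence. The sketch below uses only Python's `re` module; the helpers `slot_names` and `fill` are hypothetical and are not part of any Alexa tooling.

```python
import re

# Matches a slot placeholder such as {toCity}.
SLOT_PATTERN = re.compile(r"\{(\w+)\}")

samples = [
    "i would like to go to {toCity}",
    "book a trip for {travelDate}",
    "plan a vacation to {toCity}",
]


def slot_names(utterances: list[str]) -> list[str]:
    """Collect the distinct slot names used in sample utterances, in first-seen order."""
    seen = []
    for utterance in utterances:
        for name in SLOT_PATTERN.findall(utterance):
            if name not in seen:
                seen.append(name)
    return seen


def fill(utterance: str, values: dict) -> str:
    """Substitute known slot values into a sample; unknown slots are left as-is."""
    return SLOT_PATTERN.sub(lambda m: values.get(m.group(1), m.group(0)), utterance)


print(slot_names(samples))  # ['toCity', 'travelDate']
print(fill(samples[0], {"toCity": "Aruba"}))  # i would like to go to Aruba
```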

Using built-in slot values

Use built-in slot values whenever possible to help save time and improve accuracy. As appropriate for your skill, you can also extend some of the built-in values. For example, for a local region, you might extend AMAZON.Cities to include all of the local cities and towns. For more information, see the slot values you can extend.

Review slot values closely

While it might be easy to find, copy, and paste a list of words to populate slot values, make sure to review and edit the content. Incorrect slot values create errors in the skill logic and disrupt the user experience. Watch for the following:

  • Duplicate slot values. Make sure to eliminate duplicate values.
  • Words not related to the slot. Avoid including words that are unrelated to the slot.
  • Misspellings or incorrect punctuation marks. For values that include an apostrophe, for example “child’s play,” make sure to use a straight apostrophe and not the curly apostrophe commonly inserted by text-editing software. See supported punctuation.
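Some of these checks can be automated before slot values are uploaded. The following is a minimal sketch, assuming slot values are kept as a plain list of strings; the helper `clean_slot_values` is hypothetical. It replaces curly apostrophes with straight ones, trims whitespace, and drops duplicates that differ only in case. Unrelated words and misspellings still need a manual review.

```python
def clean_slot_values(values: list[str]) -> list[str]:
    """Normalize apostrophes, trim whitespace, and drop case-insensitive duplicates."""
    cleaned = []
    seen = set()
    for value in values:
        # Replace right and left curly apostrophes with a straight apostrophe.
        normalized = value.replace("\u2019", "'").replace("\u2018", "'").strip()
        key = normalized.lower()
        if normalized and key not in seen:
            seen.add(key)
            cleaned.append(normalized)
    return cleaned


raw = ["child\u2019s play", "Child's play", " bouquet ", "bouquet", ""]
print(clean_slot_values(raw))  # ["child's play", 'bouquet']
```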