About Voice Interaction Models


Every Alexa skill has a voice interaction model that defines the words and phrases that users can say to Alexa to make the skill do what they want. Alexa supports two types of interaction models:

  • Pre-built voice interaction model – Alexa defines the set of utterances for each skill type for you.
  • Custom voice interaction model – You define the phrases or utterances that users can say to interact with your skill.

These voice interaction models are available in the Alexa Skills Kit (ASK). For details about ASK, see What is the Alexa Skills Kit?

Pre-built voice interaction models

The pre-built voice interaction model gives you a set of predefined utterances that users say to interact with your skill. For example, to control cloud-connected devices, the user simply says, "Turn on the lights," or "Turn off the television." The skill accepts the turn on and turn off requests and responds when it has satisfied the request.

ASK offers pre-built voice interaction models for different skill types, such as the smart home and music pre-built models. For a complete list of skill types that use pre-built voice interaction models, see Index of Skill Types.

Use the pre-built voice interaction model

When you choose the pre-built voice interaction model, ASK defines the utterances and requests, called intents, for you. You just need to code your skill to respond to the predefined intents. To develop a skill with the pre-built voice interaction model, you design:

  • The name Alexa uses to identify your skill, if applicable. The user speaks this name, called the invocation name, when initiating a conversation with your skill.

  • The skill logic to fulfill the predefined intents.

Examples of pre-built voice interaction models

The functionality you want to implement determines the pre-built voice interaction model that you use. The following sections show utterances for two different pre-built voice interaction models.

Smart home skill interaction model

The following example shows a predefined interaction for turning on lights. Smart home skills don't require an invocation name.

User: Alexa, turn on the porch light.

Alexa: OK.

Based on the pre-built voice interaction model for smart home skills, Alexa recognizes the following:

  • The phrase turn on is part of the smart home pre-built voice interaction model.
  • The phrase porch light identifies a particular device, or group of devices, that the user configured and named in the Alexa app.

Alexa sends a TurnOn request, called a directive in the smart home model, to the skill that controls the devices identified as porch light. The skill turns on the specified lights by communicating with the devices over the internet, and then returns a response indicating whether the request succeeded. Based on the response, Alexa gives the user an indication, such as a sound or speech, that the request succeeded.
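In JSON terms, a TurnOn directive for the porch light might look like the following simplified sketch. The endpoint and message IDs are placeholders, and a real directive carries additional header and authorization fields:

```python
# Simplified sketch of a smart home TurnOn directive.
# IDs in angle brackets are placeholders; a real message includes
# more header fields and a bearer-token scope.
turn_on_directive = {
    "directive": {
        "header": {
            "namespace": "Alexa.PowerController",
            "name": "TurnOn",
            "payloadVersion": "3",
            "messageId": "<message-id>"
        },
        "endpoint": {
            "endpointId": "<porch-light-endpoint-id>"
        },
        "payload": {}
    }
}

header = turn_on_directive["directive"]["header"]
print(header["namespace"], header["name"])
```

The `endpointId` is how the skill maps the request back to the device the user named "porch light" in the Alexa app.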

Music and radio skill interaction model

The following example shows a predefined interaction for playing a song.

User: Alexa, play Poker Face by Lady Gaga on My Radio Player.

Alexa: OK, playing Poker Face by Lady Gaga.

Based on the pre-built voice interaction model for music, radio, and podcast skills, Alexa recognizes the following:

  • My Radio Player is the invocation name that identifies the skill to invoke.
  • The word play is part of the music, radio, and podcast pre-built voice interaction model.
  • The phrases Poker Face and Lady Gaga identify a song title and artist.

Alexa composes a GetPlayableContent request to the My Radio Player skill. The skill sends back the audio content for "Poker Face" by Lady Gaga.
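A GetPlayableContent request is likewise a JSON message. The sketch below is illustrative only: it assumes the Alexa.Media.Search namespace, and the payload field names are simplified placeholders rather than the full music skill API schema:

```python
# Illustrative sketch of a GetPlayableContent request to a music skill.
# Field names in the payload are simplified placeholders, not the
# complete music skill API schema.
get_playable_content = {
    "header": {
        "namespace": "Alexa.Media.Search",
        "name": "GetPlayableContent",
        "messageId": "<message-id>",
        "payloadVersion": "1.0"
    },
    "payload": {
        "selectionCriteria": {
            "attributes": [
                {"type": "TRACK", "entityId": "<poker-face-id>"},
                {"type": "ARTIST", "entityId": "<lady-gaga-id>"}
            ]
        }
    }
}
```

The skill resolves the track and artist attributes to playable audio content and returns it in its response.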

Custom voice interaction models

The custom voice interaction model gives you the most control over the user experience. You design the set of words and phrases for every action your skill can perform to deliver an entirely custom voice experience. Custom skills give you flexibility and control over the skill design and code. You can integrate voice, visuals, and touch interactions into custom skills. The Alexa Skills Kit includes an Alexa Design Guide for best practices to follow when you design your voice interaction model and a library of built-in intents for common actions.

Design a custom voice interaction

When you design a custom voice interaction, you define:

  • The intents the skill can handle. The intents represent actions, such as planning a trip, that users can do with your skill. Each intent invokes specific skill functionality. Intents can have arguments, called slots, that collect variable values that your skill needs to fulfill the user's request.

  • The utterances users say to invoke the intents. For example, the user might say, "Plan a trip to Hawaii." The mapping of utterances to intents forms the voice interaction model for your skill. There can be a many-to-one mapping of utterances to intents because the model must define every way a user might communicate the same request to your skill.

  • The name Alexa uses to identify your skill. The user speaks the invocation name when initiating a conversation with your skill.

  • (Optional) The visual elements and touch interactions for Alexa-enabled devices with a screen.

  • The skill logic to fulfill your custom intents.

Examples of custom voice interaction models

The following sections show examples of custom voice interaction models, the user interaction with Alexa, and the subsequent request to the skill.

Custom travel planning skill interaction

An example custom voice interaction model for a travel planning skill might be defined as follows:

  • Plan My Trip as the invocation name that identifies the skill.
  • A NewTrip intent with a toCity slot, a travelDate slot, and an activity slot.
  • Slot values defined by AMAZON.US_CITY for the toCity slot.
  • Slot values defined by AMAZON.DATE for the travelDate slot.
  • Slot values for the activity slot, which include biking, camping, hiking, ….
  • The phrases plan a trip and I want to go to that map to the NewTrip intent.
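The bullets above could be expressed in the JSON shape that custom interaction models use. This is a sketch: the sample utterances are abbreviated, and the custom slot type name ACTIVITY_TYPE is an assumed name for illustration:

```python
# Sketch of the Plan My Trip custom interaction model in the JSON
# shape used by custom skills. ACTIVITY_TYPE is an assumed custom
# slot type name; sample utterances are abbreviated.
plan_my_trip_model = {
    "interactionModel": {
        "languageModel": {
            "invocationName": "plan my trip",
            "intents": [
                {
                    "name": "NewTrip",
                    "slots": [
                        {"name": "toCity", "type": "AMAZON.US_CITY"},
                        {"name": "travelDate", "type": "AMAZON.DATE"},
                        {"name": "activity", "type": "ACTIVITY_TYPE"}
                    ],
                    "samples": [
                        "plan a trip to {toCity} on {travelDate}",
                        "i want to go to {toCity}"
                    ]
                }
            ],
            "types": [
                {
                    "name": "ACTIVITY_TYPE",
                    "values": [
                        {"name": {"value": "biking"}},
                        {"name": {"value": "camping"}},
                        {"name": {"value": "hiking"}}
                    ]
                }
            ]
        }
    }
}
```

Note that sample utterances reference slots by name in curly braces, so one sample covers every value a slot can take.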

The user interaction with the Plan My Trip skill might go like this.

User: Alexa, ask Plan My Trip to plan a trip to Moab on Friday.

Plan My Trip: Moab, nice! What activity would you like to do?
User: Biking.

Plan My Trip: All set! You leave on Friday to go biking in Moab. Enjoy!

Based on the custom interaction model and the user interaction, Alexa recognizes the following:

  • The phrase Plan My Trip is the skill to invoke.
  • The phrase plan a trip corresponds to the NewTrip intent.
  • The word Moab fills the toCity slot.
  • The word Friday is the value for the travelDate slot.
  • The word biking is the value for the activity slot.

Alexa composes an IntentRequest to the Plan My Trip skill with the NewTrip intent and the collected slot values. The skill processes the request and responds with a confirmation.
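The IntentRequest that carries the NewTrip intent and its slot values might look like this simplified sketch. Session and context fields are omitted, and the travelDate placeholder stands in for the ISO date that AMAZON.DATE resolves "Friday" to:

```python
# Simplified sketch of the IntentRequest sent to the Plan My Trip
# skill. Session and context fields are omitted; <iso-date> stands
# in for the resolved date.
intent_request = {
    "request": {
        "type": "IntentRequest",
        "requestId": "<request-id>",
        "intent": {
            "name": "NewTrip",
            "slots": {
                "toCity": {"name": "toCity", "value": "Moab"},
                "travelDate": {"name": "travelDate", "value": "<iso-date>"},
                "activity": {"name": "activity", "value": "biking"}
            }
        }
    }
}

slots = intent_request["request"]["intent"]["slots"]
print({name: slot["value"] for name, slot in slots.items()})
```

The skill reads the slot values out of the intent object to build its confirmation response.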

Custom game skill interaction

In the following example, a custom voice interaction model for a game skill is defined as follows:

  • Quick Trivia as the invocation name that identifies the skill.
  • A PlayGame intent with a category slot.
  • Slot values for the category slot, which include movies, geography, food, ….
  • The phrase yes that maps to the AMAZON.YesIntent built-in intent.

In this user interaction, the skill drives the conversation.

User: Alexa, open Quick Trivia.

Quick Trivia: Welcome to Quick Trivia! Are you ready to play?
User: Yes.

Quick Trivia: OK, please choose the category you want to play. You can say a category or select a category on the screen.
User: Movies.

Quick Trivia: Who directed….

Based on the custom interaction model and the user interaction, Alexa recognizes the following:

  • Quick Trivia is the skill to invoke.
  • Category is the slot the skill needs to play the game.
  • The word movies is the slot value that fills the category slot.

When the user invokes the Quick Trivia skill with no specific intents, Alexa sends a LaunchRequest to the skill. The skill replies with the speech for Alexa to say to prompt the user for the category. Then, Alexa composes an IntentRequest with the PlayGame intent and the collected slot value for the category slot. The skill replies with the question and a list of possible answers for Alexa to say or display on an Alexa-enabled device with a screen. The game interaction continues in the same manner.
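The launch-then-intent flow described above could be dispatched by skill code along these lines. This is a plain-Python sketch, not the ASK SDK; the handle_request function and the response shape are illustrative:

```python
# Skeletal sketch of request dispatch for a trivia skill.
# Plain Python, not the ASK SDK; names and response shape are
# illustrative only.
def handle_request(request):
    if request["type"] == "LaunchRequest":
        # No intent yet: greet the user and prompt for readiness.
        return {"speech": "Welcome to Quick Trivia! Are you ready to play?"}
    if request["type"] == "IntentRequest":
        intent = request["intent"]
        if intent["name"] == "AMAZON.YesIntent":
            return {"speech": "OK, please choose the category you want to play."}
        if intent["name"] == "PlayGame":
            category = intent["slots"]["category"]["value"]
            return {"speech": f"Starting a {category} round. Who directed..."}
    # Fallback for anything the model didn't match.
    return {"speech": "Sorry, I didn't get that."}

print(handle_request({"type": "LaunchRequest"})["speech"])
```

In a deployed skill, the equivalent branching is usually done by registering one handler per request type or intent rather than a single function.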



Last updated: Jan 26, 2024