About Alexa Conversations

Alexa Conversations (Beta) is a deep learning–based approach to dialog management that enables you to create natural, human-like voice experiences on Alexa. Alexa Conversations helps skills respond to a wide range of phrases and unexpected conversational flows, and gives skills the conversational memory to sustain long, two-way interactions between Alexa and the user.

You provide sample dialogs in the dialog editor, and then annotate the sample dialogs with dialog acts, utterance sets, and responses that contain audio and visual elements. You also specify when to invoke APIs and which arguments to pass, so that the dialog manager can gather the information it needs before it calls your skill code. During the conversation, your skill responds to Alexa to fulfill the user request. You can continuously improve the experience by updating the sample dialogs and debugging with the testing tools, all without refactoring your logic.

You can create a skill that uses Alexa Conversations to manage the entire skill experience, or you can extend an existing skill with Alexa Conversations. For example, your skill can use your existing code to handle simple interactions, and then delegate dialog management to Alexa Conversations for tasks that involve extended two-way conversations with the user.

Why Alexa Conversations?

Alexa Conversations helps users have natural conversations with Alexa. Alexa Conversations uses AI to bridge the gap between the experiences that you can build manually and the vast range of possible conversations. You provide sample dialogs that show your expected interactions, along with templates for the APIs that you want Alexa to call, and Alexa's AI extrapolates the spectrum of phrasing variations and dialog paths. Instead of identifying and coding every possible way users might engage with your skill, you let Alexa's AI create the permutations and handle dialog state management, context carry-over, and corrections for you.

Alexa Conversations is especially useful for use cases where the conversation can take a number of unanticipated paths as the user naturally talks to the skill, such as when a user chooses a movie, orders food, or makes a reservation. For example, when ordering a pizza, a user might do the following:

  • Answer more than one question at once ("Medium, two toppings.")
  • Ask questions and expect Alexa to track previously provided information ("How many people does that feed?")
  • List values ("Pepperoni and green pepper.")
  • Make a correction ("Make that a large.")

Through machine learning, Alexa Conversations can handle the complexity and wide variation in these types of conversations. Instead of configuring hard-coded conversation paths, you provide ideal user experiences as dialogs. Alexa's AI extrapolates additional conversational paths and learns to handle a wide range of unexpected dialogs, alternative pathways, and nonlinear user flows. Alexa Conversations tracks the dialog context and produces a natural conversational experience that gathers the information your skill requires to complete a task. Only then does Alexa Conversations call your skill code. You can retrain your model at any time to fill gaps or handle new inputs, and then re-certify your skill.

Alexa Conversations features

Alexa Conversations performs the following functions for your skill:

  • State management – Selects and renders Alexa speech prompts to guide the user to the next state.
  • Dialog variations – Asks the user follow-up questions to gather missing information.
  • User-driven corrections – Handles the user changing their mind.
  • Context carry-over – Updates an option without needing the user to repeat the other options.
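To make context carry-over and user-driven corrections concrete, the following minimal sketch models only their effect on the information collected for the pizza example. It is a toy, not how Alexa Conversations works internally (that behavior is learned from your dialogs, and you don't write it yourself). The argument names `size`, `toppingCount`, and `toppings` are hypothetical.

```python
from typing import Dict, List, Union

# Hypothetical argument values for the pizza example; a list holds multiple toppings.
Value = Union[str, int, List[str]]


def update_state(state: Dict[str, Value], new_info: Dict[str, Value]) -> Dict[str, Value]:
    """Merge newly provided arguments into the existing dialog state.

    Arguments the user doesn't mention are carried over unchanged (context
    carry-over); arguments the user gives a new value replace the old value
    (a correction).
    """
    merged = dict(state)  # copy so earlier turns remain inspectable
    merged.update(new_info)
    return merged


# "Medium, two toppings." -> the user answers more than one question at once
state = update_state({}, {"size": "medium", "toppingCount": 2})
# "Pepperoni and green pepper." -> the user lists values
state = update_state(state, {"toppings": ["pepperoni", "green pepper"]})
# "Make that a large." -> the user corrects one option without repeating the others
state = update_state(state, {"size": "large"})

print(state)
# {'size': 'large', 'toppingCount': 2, 'toppings': ['pepperoni', 'green pepper']}
```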

Should I build my skill with Alexa Conversations?

Consider using Alexa Conversations if:

  • Your skill is goal-based, such as for booking transportation, buying tickets, providing recommendations, or ordering food.
  • Your skill has open-ended, two-way interactions with the user and requires collecting several complex data points to accomplish the user goal.
  • You can't manage all potential user interactions and states in your skill code to create a flexible, natural experience for users.
  • You don't want to write code to manage the state for all user interactions.

Alternatively, you can build your skill using intent-based dialog management. For details, see Create the Interaction Model for Your Skill and Define the Dialog to Collect and Confirm Required Information. Use intent-based dialog management if:

  • Your skill requires a pre-determined dialog path and specific workflow the user is expected to follow.
  • You want to maintain complete control over turn-by-turn state management within your skill code.

How you build an Alexa Conversations skill

When you build an Alexa Conversations skill, you create the following components, which teach Alexa Conversations how to interact with your users.

Alexa Conversations elements.
Component Description

Dialogs

Dialogs are sample conversations between the user and Alexa.

You write user turns and Alexa turns in simple text, and then annotate the dialogs to show which parts of the dialogs represent Alexa responses, utterance sets, API calls, and slot types. You write and annotate dialogs entirely in the developer console. For details, see Write Dialogs for Alexa Conversations.

Utterance sets

Utterance sets are sample variations in how a user might say a response or request.

In a pizza-ordering skill, a user might ask for available toppings by saying, "What toppings do you have?", "What can I add?", and so on. You might use those lines as sample utterances for a RequestToppingList utterance set. For details, see Add Utterance Sets for Alexa Conversations.

API definitions

API definitions represent requests that your skill handles and the corresponding responses that your skill returns to Alexa Conversations.

You define an API for every request that your skill code handles. When you define an API, you specify which arguments pass into and out of the API. As the user interacts with your skill, Alexa Conversations predicts the correct API based on the current dialog context. For details, see Define APIs for Alexa Conversations.

Responses

Responses include audio and visual elements that Alexa uses to respond to the user.

You specify audio and visual responses as Alexa Presentation Language for Audio and Alexa Presentation Language (APL), respectively. You can also optionally pass arguments into responses. For details, see Define Responses from Alexa for Alexa Conversations.

Slot types

All variables that pass between user utterances, Alexa responses, and APIs must have a slot type. As with intent-based interaction models, slot types define how Alexa recognizes, handles, and passes data between components. For details, see Use Slot Types in Alexa Conversations.

Dialog variables

Dialog variables are instances of a slot type. Their values come from the user or from an API response, and you use them for dialog state, business logic, or response content.

Dialog acts

Dialog acts are tags that indicate the purpose of each turn in a dialog, describing what is happening at that point in the conversation. Dialog acts train the conversational AI.
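The API definitions and slot types described above fit together as typed signatures: each API argument and return value names a slot type. The following sketch models that relationship as plain Python data and flags any argument without a known slot type. It is an illustration only; you define APIs and slot types in the developer console, not with this code. `OrderPizza`, `PizzaSize`, and `OrderConfirmation` are hypothetical names, and `AMAZON.NUMBER` stands in for a built-in slot type.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

# Hypothetical slot types for a pizza skill.
KNOWN_SLOT_TYPES: Set[str] = {"AMAZON.NUMBER", "PizzaSize", "OrderConfirmation"}


@dataclass
class ApiDefinition:
    name: str
    arguments: Dict[str, str]  # argument name -> slot type (values passed into the API)
    return_type: str           # slot type of the value passed back to Alexa Conversations

    def untyped(self, known: Set[str]) -> List[str]:
        """Return the arguments (and '<return>') whose slot type isn't defined."""
        problems = [arg for arg, slot_type in self.arguments.items() if slot_type not in known]
        if self.return_type not in known:
            problems.append("<return>")
        return problems


order_pizza = ApiDefinition(
    name="OrderPizza",
    arguments={"size": "PizzaSize", "toppingCount": "AMAZON.NUMBER"},
    return_type="OrderConfirmation",
)
print(order_pizza.untyped(KNOWN_SLOT_TYPES))  # []
```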

Introduction to dialog acts

A key task in Alexa Conversations skill development is to label each turn of your sample conversations with a dialog act. Dialog acts represent the purpose of the utterance. For a full list of dialog acts, see Dialog Act Reference for Alexa Conversations. The following example shows the dialog act associated with each turn of a dialog for a weather skill.

User: What's the weather? (Dialog act: Invoke APIs)
Alexa: What city? (Dialog act: Request Args)
User: Seattle. (Dialog act: Inform Args)
Alexa: What date? (Dialog act: Request Args)
User: Today. (Dialog act: Inform Args)
Alexa: Are you sure you want the weather for Seattle today? (Dialog act: Confirm API)
User: Yes. (Dialog act: Affirm)
Alexa: The weather in Seattle for today is 70 degrees. (Dialog act: API Success)

Keep in mind that if Alexa doesn't have all the required information, the action that a dialog act represents might not happen right away. For example, the user request "I want to order a pizza" carries the Invoke APIs dialog act: the user wants your skill code to run the API that places a pizza order. However, Alexa doesn't yet have all the information, such as the size and toppings, that your API needs to fulfill the request. Alexa therefore asks the user for the required information in a flexible, natural-sounding way, asking as many times as necessary to collect each piece of information that your API needs. Only then does Alexa invoke your skill code. The Alexa Conversations AI controls the rest of the conversation.
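The following sketch reproduces the weather dialog above as a simple "ask until every required argument is present, confirm, then invoke" loop, emitting the same sequence of dialog acts. It is a deliberately simplified, rule-based stand-in: the real dialog manager is a learned model that handles far more variation. It also assumes the user always affirms the confirmation. The argument names `city` and `date`, and the stubbed 70-degree forecast, are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Turn = Tuple[str, str, str]  # (speaker, text, dialog act)


def run_dialog(
    request: str,
    required: List[str],
    provided: Dict[str, str],
    answer: Callable[[str], str],
    api: Callable[[Dict[str, str]], str],
) -> List[Turn]:
    """Request each missing argument, confirm, then call the API exactly once."""
    turns: List[Turn] = [("User", request, "Invoke APIs")]
    args = dict(provided)  # arguments the user already gave in the first utterance
    for name in required:
        if name not in args:
            turns.append(("Alexa", f"What {name}?", "Request Args"))
            args[name] = answer(name)
            turns.append(("User", args[name], "Inform Args"))
    turns.append(("Alexa", "Are you sure?", "Confirm API"))
    turns.append(("User", "Yes.", "Affirm"))  # assumes the user always affirms
    turns.append(("Alexa", api(args), "API Success"))  # skill code runs only here
    return turns


def get_weather(args: Dict[str, str]) -> str:
    # Stubbed API result; a real skill would look up the forecast.
    return f"The weather in {args['city']} for {args['date']} is 70 degrees."


answers = {"city": "Seattle", "date": "today"}
turns = run_dialog("What's the weather?", ["city", "date"], {}, lambda n: answers[n], get_weather)
for speaker, text, act in turns:
    print(f"{speaker}: {text} ({act})")
```

If the first utterance already includes the city ("What's the weather in Seattle?"), passing `{"city": "Seattle"}` as `provided` skips that Request Args/Inform Args pair.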

The flow of dialog acts within a dialog must meet certain guidelines. For the supported dialog act flows, see Dialog Act Guidelines for Alexa Conversations. For details about all dialog acts, see Dialog Act Reference for Alexa Conversations.

Requests from Alexa Conversations to your skill

At run time, Alexa Conversations uses artificial intelligence, based on the dialog model, to manage the conversation with the user. Alexa calls your skill endpoint only when the user has provided all the information that the API needs to fulfill the request. You can host your skill endpoint on AWS Lambda or on your own web server. When Alexa does call your skill, the JSON requests and responses are similar to the format described in the Request and Response JSON Reference for custom skills.

The request to your skill is similar to an intent request, but is of type Dialog.API.Invoked. The response from the skill includes a status and a return value. These values contain the data that Alexa Conversations uses to select and populate the response template and inform subsequent dialog turns and API calls. For details about the request and response format for Alexa Conversations, see Request and Response Reference for Alexa Conversations. For details about how your skill handles calls from Alexa Conversations, see Handle API Calls for Alexa Conversations.
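The following minimal sketch shows the shape of a skill-side handler that dispatches a Dialog.API.Invoked request to the matching API and returns the result. It uses plain dictionaries rather than an SDK. The field names `apiRequest`, `name`, `arguments`, and `apiResponse` are assumptions based on our reading of the Alexa Conversations request format; check them, and how the status is reported, against the Request and Response Reference for Alexa Conversations. `GetWeather`, `cityName`, and `highTemperature` are hypothetical.

```python
import json
from typing import Any, Callable, Dict

Args = Dict[str, Any]


def get_weather(args: Args) -> Args:
    # Hypothetical API: a real skill would call a weather service here.
    return {"cityName": args["cityName"], "date": args["date"], "highTemperature": 70}


# Map each API name defined in the console to the code that fulfills it.
API_HANDLERS: Dict[str, Callable[[Args], Args]] = {"GetWeather": get_weather}


def handle_request(envelope: Dict[str, Any]) -> Dict[str, Any]:
    request = envelope["request"]
    if request.get("type") != "Dialog.API.Invoked":
        raise ValueError(f"unexpected request type: {request.get('type')}")
    api_request = request["apiRequest"]  # assumed field name
    handler = API_HANDLERS[api_request["name"]]
    result = handler(api_request.get("arguments", {}))
    # The return value feeds the response template and later dialog turns.
    return {"version": "1.0", "response": {"apiResponse": result}}  # assumed field name


sample = {
    "version": "1.0",
    "request": {
        "type": "Dialog.API.Invoked",
        "apiRequest": {"name": "GetWeather", "arguments": {"cityName": "Seattle", "date": "today"}},
    },
}
print(json.dumps(handle_request(sample), indent=2))
```

A lookup table keyed by API name keeps each API's fulfillment logic separate, mirroring how you define one API per request that your skill code handles.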

Adding Alexa Conversations to an existing skill

You can have Alexa Conversations handle all or part of the dialog management for an existing skill. To switch from intent-based dialog management to Alexa Conversations (or vice versa), your skill code sends a Dialog.DelegateRequest directive. Depending on how you configure the directive, delegation either switches back automatically after the next turn or stays in effect until the skill explicitly sends another Dialog.DelegateRequest directive. In either case, you can save session attributes, such as the dialog state, when you hand off delegation. For details, see Steps to Add Alexa Conversations to an Existing Skill and Hand off Dialog Management to and from Alexa Conversations.
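As an illustration of the handoff described above, the following sketch builds a skill response that delegates to Alexa Conversations and saves the current dialog state in session attributes. The `sessionAttributes` and `response.directives` fields follow the general custom-skill response format. The directive's inner fields (`target` value `AMAZON.Conversations`, `period.until` values `NEXT_TURN` and `EXPLICIT_RETURN`, and the `Dialog.InputRequest` updated request) reflect our understanding of the directive and should be verified against Hand off Dialog Management to and from Alexa Conversations. The input name `orderPizza` and the `dialogState` attribute are hypothetical.

```python
import json
from typing import Any, Dict, Optional

# Assumed values for how long delegation lasts.
PERIODS = {"NEXT_TURN", "EXPLICIT_RETURN"}


def delegate_to_conversations(
    input_name: str, until: str = "EXPLICIT_RETURN", slots: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Build a Dialog.DelegateRequest directive that hands off to Alexa Conversations."""
    if until not in PERIODS:
        raise ValueError(f"unsupported period: {until}")
    return {
        "type": "Dialog.DelegateRequest",
        "target": "AMAZON.Conversations",
        "period": {"until": until},  # NEXT_TURN switches back automatically after one turn
        "updatedRequest": {
            "type": "Dialog.InputRequest",
            "input": {"name": input_name, "slots": slots or {}},
        },
    }


def build_response(directive: Dict[str, Any], session_attributes: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap the directive in a skill response that also persists session attributes."""
    return {
        "version": "1.0",
        "sessionAttributes": session_attributes,
        "response": {"directives": [directive]},
    }


response = build_response(
    delegate_to_conversations("orderPizza", until="NEXT_TURN"),
    {"dialogState": {"size": "large"}},
)
print(json.dumps(response, indent=2))
```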