About Alexa Conversations
Alexa Conversations is offered as a beta and may change as we receive feedback and iterate on the feature. Alexa Conversations currently supports
Alexa Conversations (Beta) is a deep learning–based approach to dialog management that enables you to create natural, human-like voice experiences on Alexa. Alexa Conversations helps skills respond to a wide range of phrases and unexpected conversational flows, and gives skills the conversational memory to sustain long, two-way interactions between Alexa and the user.
You provide sample dialogs in the dialog editor, and then annotate the sample dialogs with dialog acts, utterance sets, and responses that contain audio and visual elements. You also specify when to invoke APIs and which arguments to use so the dialog manager can gather the information to trigger your skill code. During the course of the conversation, your skill responds to Alexa to fulfill the user request. You can continuously improve your experience by updating the sample dialogs and debugging with the updated testing tools, all without refactoring your logic.
You can create a skill that uses Alexa Conversations to manage the entire skill experience, or you can extend an existing skill with Alexa Conversations. For example, your skill can use your existing code to handle simple interactions. Then, your skill can delegate dialog management to Alexa Conversations for tasks that involve many two-way conversations with the user.
- Why Alexa Conversations?
- Alexa Conversations features
- Should I build my skill with Alexa Conversations?
- How you build an Alexa Conversations skill
- Introduction to dialog acts
- Requests from Alexa Conversations to your skill
- Adding Alexa Conversations to an existing skill
- Related topics
Why Alexa Conversations?
Alexa Conversations helps users experience natural conversations with Alexa. Alexa Conversations uses AI to bridge the gap between experiences that you can build manually and the vast range of possible conversations. You provide sample dialogs that show your expected interactions and templates for the APIs that you need called, and Alexa's AI extrapolates the spectrum of phrasing variations and dialog paths. Instead of identifying and coding every possible way users might engage your skill, you let Alexa's AI create the permutations and handle dialog state management, context carry-over, and corrections for you.
Alexa Conversations is especially useful for use cases where the conversation can take a number of unanticipated paths as the user naturally talks to the skill, such as when a user chooses a movie, orders food, or makes a reservation. For example, when ordering a pizza, a user might do the following:
- Answer more than one question at once ("Medium, two toppings.")
- Ask questions and expect Alexa to track previously provided information ("How many people does that feed?")
- List values ("Pepperoni and green pepper.")
- Make a correction ("Make that a large.")
Through machine learning, Alexa Conversations can handle the complexity and wide variation in these types of conversations. Instead of configuring hard-coded conversation paths, you provide ideal user experiences as dialogs. Alexa's AI extrapolates additional conversational paths and learns to handle a wide range of unexpected dialogs, different pathways, and nonlinear user flows. Alexa Conversations monitors the dialog context, producing a natural conversational experience that gathers the information that your skill requires to complete a task. Only then does Alexa Conversations call your skill code. You can retrain your model to fill gaps or handle new inputs, and recertify your skill at any time.
Alexa Conversations features
Alexa Conversations performs the following functions for your skill:
- State management – Selects and renders Alexa speech prompts to guide the user to the next state.
User: I'd like to order a pizza for delivery.
Alexa: Ok. What size pizza?
Alexa: What toppings?
User: Green peppers and olives.
Alexa: Would you like to order anything else?
Alexa: Ok. Your pizza will arrive in about 25 minutes.
- Dialog variations – Asks the user follow-up questions to gather missing information.
User: I want the weather.
Alexa: What date and city?
Alexa: What date?
Alexa: Ok. Tomorrow in Seattle, you can expect a high of 71 degrees and a low of 57 degrees.
- User-driven corrections – Handles the user changing their mind.
User: I want a medium family dog.
Alexa: Do you prefer high energy or low energy dogs?
User: High energy.
Alexa: What about a border collie?
User: How about a small dog?
Alexa: In that case, I recommend a Jack Russell Terrier. Would you like me to search local shelters and rescue groups for a Jack Russell Terrier?
- Context carry-over – Updates an option without needing the user to repeat the other options.
Alexa: That's a medium pizza with green peppers, olives, and light sauce for delivery, right?
User: I'd like two of those.
Alexa: No problem. Two medium pizzas with green peppers, olives, and light sauce for delivery, right?
For details on how Alexa Conversations performs these functions, see How Alexa Conversations Works.
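To make the context carry-over and correction behaviors in the list above more concrete, the following is a minimal Python sketch of the underlying idea: one field changes while everything else the user already said is kept. The `PizzaOrder` class and `apply_update` function are hypothetical names used only for illustration. Alexa Conversations performs this tracking for you inside its dialog manager, so your skill doesn't need code like this.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PizzaOrder:
    """Hypothetical snapshot of what the user has said so far."""
    size: str
    toppings: tuple
    sauce: str
    method: str
    quantity: int = 1


def apply_update(order, **changes):
    """Return a new order with only the named fields changed."""
    return replace(order, **changes)


# "That's a medium pizza with green peppers, olives, and light sauce for delivery, right?"
order = PizzaOrder(
    size="medium",
    toppings=("green peppers", "olives"),
    sauce="light",
    method="delivery",
)

# "I'd like two of those." Only the quantity changes.
two_pizzas = apply_update(order, quantity=2)

# "Make that a large." A correction replaces one value and keeps the rest.
large_pizzas = apply_update(two_pizzas, size="large")
```

Each update produces a new snapshot, so the earlier state stays available if the user changes their mind again.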
Should I build my skill with Alexa Conversations?
Consider using Alexa Conversations if:
- You're a published voice developer who has experience with Node.js or Python, and you're familiar with the basic constructs of artificial intelligence and machine learning.
- Your skill is goal-based, such as for booking transportation, buying tickets, providing recommendations, or ordering food.
- Your skill has open-ended, two-way interactions with the user and requires collecting several complex data points to accomplish the user goal.
- You can't manage all potential user interactions and states in your skill code to create a flexible, natural experience for users.
- You don't want to write code to manage the state for all user interactions.
Alternatively, you can build your skill using intent-based dialog management. For details, see Create the Interaction Model for Your Skill and Define the Dialog to Collect and Confirm Required Information. Use intent-based dialog management if:
- Your skill requires a pre-determined dialog path and specific workflow the user is expected to follow.
- You want to maintain complete control over turn-by-turn state management within your skill code.
How you build an Alexa Conversations skill
When you build an Alexa Conversations skill, you create the following components, which train Alexa Conversations to interact with your users:
- Dialogs are sample conversations between the user and Alexa.
- Utterance sets are sample variations in how a user might say a response or request.
- API definitions represent requests that your skill handles and the corresponding responses that your skill returns to Alexa Conversations.
- Responses include audio and visual elements that Alexa uses to respond to the user.
- Slot types define how Alexa recognizes, handles, and passes data between components, as with intent-based interaction models. All variables that pass between user utterances, Alexa responses, and APIs must have a slot type. For details, see Use Slot Types in Alexa Conversations.
- Dialog variables are instances of a slot type, provided by the user or an API response and used for dialog state, business logic, or response content.
- Dialog acts are tags that indicate the purpose of each interaction in a dialog to describe what is happening at a specific point in a conversation. Dialog acts train the conversational AI.
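As a rough illustration of how API definitions, slot types, and dialog variables relate, the following Python sketch models an API definition whose arguments each name a slot type, and checks whether a set of dialog variables satisfies it. The `ApiDefinition` class, the `unsatisfied_arguments` function, the `GetWeather` API, its argument names, and the `WeatherResult` return type are all hypothetical. This sketch isn't the format that the dialog editor or the skill package uses.

```python
from dataclasses import dataclass


@dataclass
class ApiDefinition:
    """Hypothetical model: an API whose arguments each have a slot type."""
    name: str
    arguments: dict  # argument name -> slot type name
    returns: str     # slot type name of the return value


def unsatisfied_arguments(api, dialog_variables):
    """List arguments that have no dialog variable of the expected slot type.

    dialog_variables maps a variable name to the slot type it is an
    instance of.
    """
    return [
        arg
        for arg, slot_type in api.arguments.items()
        if dialog_variables.get(arg) != slot_type
    ]


GET_WEATHER = ApiDefinition(
    name="GetWeather",
    arguments={"cityName": "AMAZON.City", "date": "AMAZON.DATE"},
    returns="WeatherResult",
)

# The user has given a city but not a date yet.
still_needed = unsatisfied_arguments(GET_WEATHER, {"cityName": "AMAZON.City"})
```

Here `still_needed` contains only `date`, which corresponds to the argument that Alexa would still have to request from the user.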
Introduction to dialog acts
A key task in Alexa Conversations skill development is to label each turn of your sample conversations with a dialog act. Dialog acts represent the purpose of the utterance. For a full list of dialog acts, see Dialog Act Reference for Alexa Conversations. The following example shows the dialog act associated with the turns of a dialog for a weather skill.
User: What's the weather? (Dialog act: Invoke APIs)
Alexa: What city? (Dialog act: Request Args)
User: Seattle. (Dialog act: Inform Args)
Alexa: What date? (Dialog act: Request Args)
User: Today. (Dialog act: Inform Args)
Alexa: Are you sure you want the weather for Seattle today? (Dialog act: Confirm API)
User: Yes. (Dialog act: Affirm)
Alexa: The weather in Seattle for today is 70 degrees. (Dialog act: API Success)
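One way to picture the annotated dialog above is as a list of turns, each tagged with its dialog act. The following Python sketch is illustrative only. The `DialogAct` enum, the `Turn` tuple, and the `speakers_alternate` check are hypothetical names, and you annotate real dialogs in the dialog editor, not in code like this.

```python
from enum import Enum
from typing import NamedTuple


class DialogAct(Enum):
    INVOKE_APIS = "Invoke APIs"
    REQUEST_ARGS = "Request Args"
    INFORM_ARGS = "Inform Args"
    CONFIRM_API = "Confirm API"
    AFFIRM = "Affirm"
    API_SUCCESS = "API Success"


class Turn(NamedTuple):
    speaker: str
    text: str
    act: DialogAct


WEATHER_DIALOG = [
    Turn("User", "What's the weather?", DialogAct.INVOKE_APIS),
    Turn("Alexa", "What city?", DialogAct.REQUEST_ARGS),
    Turn("User", "Seattle.", DialogAct.INFORM_ARGS),
    Turn("Alexa", "What date?", DialogAct.REQUEST_ARGS),
    Turn("User", "Today.", DialogAct.INFORM_ARGS),
    Turn("Alexa", "Are you sure you want the weather for Seattle today?",
         DialogAct.CONFIRM_API),
    Turn("User", "Yes.", DialogAct.AFFIRM),
    Turn("Alexa", "The weather in Seattle for today is 70 degrees.",
         DialogAct.API_SUCCESS),
]


def speakers_alternate(dialog):
    """Check that the dialog starts with the user and speakers take turns."""
    expected = ("User", "Alexa")
    return all(turn.speaker == expected[i % 2] for i, turn in enumerate(dialog))


# Group the acts by speaker to see which acts each side uses in this dialog.
acts_by_speaker = {}
for turn in WEATHER_DIALOG:
    acts_by_speaker.setdefault(turn.speaker, set()).add(turn.act)
```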
Keep in mind that if Alexa doesn't have all the required information, the dialog act might not happen right away. For example, the dialog act associated with a user's request, "I want to order a pizza," is to order a pizza (that is, to invoke an API in your skill code that places a pizza order). However, Alexa doesn't have all the information — such as the size and toppings — that your API needs to fulfill the request. Alexa therefore asks the user for the required information in a flexible, natural-sounding way. Alexa asks as many times as necessary for the user to provide the pieces of information that your API needs. Only then does Alexa invoke your skill code. Alexa Conversations AI controls the rest of the conversation.
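The gather-then-invoke behavior described above can be sketched as a simple loop. This is a deliberately simplified, rule-based stand-in for what the Alexa Conversations model does, and every name in it (`missing_arguments`, `run_dialog`, `place_order`, and the argument names) is hypothetical. The point is only that the API runs once, after every required argument is known.

```python
def missing_arguments(required, collected):
    """Return the required argument names that the user hasn't provided."""
    return [name for name in required if name not in collected]


def run_dialog(required, user_answers, invoke_api):
    """Ask for missing arguments turn by turn, then invoke the API once.

    user_answers is a list of dicts, one per user turn, holding whatever
    arguments that turn supplied (possibly none, possibly several).
    Returns the prompts Alexa spoke and the API result (None if the
    dialog ended before all arguments were collected).
    """
    collected = {}
    prompts = []
    for answer in user_answers:
        collected.update(answer)
        missing = missing_arguments(required, collected)
        if not missing:
            return prompts, invoke_api(**collected)
        prompts.append("What " + " and ".join(missing) + "?")
    return prompts, None


calls = []


def place_order(size, toppings):
    """Hypothetical skill API; records each call so it can be counted."""
    calls.append((size, tuple(toppings)))
    return f"Ordered a {size} pizza with {', '.join(toppings)}."


prompts, result = run_dialog(
    required=["size", "toppings"],
    user_answers=[
        {},                                        # "I want to order a pizza."
        {"size": "medium"},                        # "Medium."
        {"toppings": ["green peppers", "olives"]}, # "Green peppers and olives."
    ],
    invoke_api=place_order,
)
```

In this run, Alexa prompts twice and `place_order` is called exactly once, on the third user turn. If the user answers both questions in one turn, the loop skips straight to the API call.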
The flow of dialog acts within a dialog must meet certain guidelines. For the supported dialog act flows, see Work with Dialog Acts in Alexa Conversations. For details about all dialog acts, see Dialog Act Reference for Alexa Conversations.
Requests from Alexa Conversations to your skill
During run time, Alexa Conversations uses artificial intelligence, based on the dialog model, to manage the conversation with the user. Alexa calls your skill endpoint only when the user has provided all the information that the API needs to fulfill the request. You can host your skill endpoint on AWS Lambda or your web server. When Alexa does call your skill, the JSON requests and responses are similar to the format described in the Request and Response JSON Reference for custom skills.
The request to your skill is similar to an intent request, but is of type Dialog.API.Invoked. The response from the skill includes a status and a return value. These values contain the data that Alexa Conversations uses to select and populate the response template and inform subsequent dialog turns and API calls. For details about the request and response format for Alexa Conversations, see Request and Response Reference for Alexa Conversations. For details about how your skill handles calls from Alexa Conversations, see Handle API Calls for Alexa Conversations.
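The following is a minimal sketch of a handler for a Dialog.API.Invoked request, written with plain Python dictionaries and no SDK. Only the request type comes from this page. The field names `apiRequest`, `name`, `arguments`, and `apiResponse`, the `GetWeather` API, and the placeholder weather values are assumptions made for illustration, and the sketch models only the return value, so check the Request and Response Reference for Alexa Conversations for the exact format.

```python
def get_weather(city, date):
    """Hypothetical business logic; returns placeholder data."""
    return {"city": city, "date": date, "highTemperature": 71, "lowTemperature": 57}


# Map each API name from the dialog model to the function that fulfills it.
API_HANDLERS = {"GetWeather": get_weather}


def handle_request(event):
    """Route a Dialog.API.Invoked request to the matching API function."""
    request = event["request"]
    if request["type"] != "Dialog.API.Invoked":
        raise ValueError(f"Unsupported request type: {request['type']}")

    api_request = request["apiRequest"]       # assumed field name
    handler = API_HANDLERS[api_request["name"]]
    result = handler(**api_request["arguments"])

    return {
        "version": "1.0",
        "response": {
            "apiResponse": result,            # assumed field name
            "shouldEndSession": False,
        },
    }


sample_event = {
    "version": "1.0",
    "request": {
        "type": "Dialog.API.Invoked",
        "apiRequest": {
            "name": "GetWeather",
            "arguments": {"city": "Seattle", "date": "tomorrow"},
        },
    },
}

reply = handle_request(sample_event)
```

Because Alexa calls the skill only after collecting every argument, the handler can pass the arguments straight to the business logic without prompting for missing values.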
Adding Alexa Conversations to an existing skill
You can have Alexa Conversations handle all or part of the dialog management for an existing skill. To switch from intent-based dialog management to Alexa Conversations (or vice versa), you send a Dialog.DelegateRequest directive from your skill code. Depending on how you configure the directive, you can have delegation automatically switch back after the next turn or only when the skill explicitly sends another Dialog.DelegateRequest directive. In any case, you can save session attributes, such as the dialog state, when you hand off delegation. For details, see Steps to Add Alexa Conversations to an Existing Skill and Hand off Dialog Management to and from Alexa Conversations.
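As a rough sketch of the handoff, the following Python function builds a skill response that carries a Dialog.DelegateRequest directive and preserves session attributes. Only the directive type comes from this page. The `target` and `period` fields, the `AMAZON.Conversations` target value, and the `EXPLICIT_RETURN` and `NEXT_TURN` values are assumptions here, as is the `build_delegate_response` helper name, so confirm them in Hand off Dialog Management to and from Alexa Conversations before relying on them.

```python
# Assumed values for how long the delegation lasts.
ALLOWED_PERIODS = {"EXPLICIT_RETURN", "NEXT_TURN"}


def build_delegate_response(target="AMAZON.Conversations",
                            until="EXPLICIT_RETURN",
                            session_attributes=None):
    """Build a response that hands dialog management to the given target.

    until="NEXT_TURN" models switching back automatically after one turn.
    until="EXPLICIT_RETURN" models keeping the delegation until the skill
    sends another Dialog.DelegateRequest directive.
    """
    if until not in ALLOWED_PERIODS:
        raise ValueError(f"until must be one of {sorted(ALLOWED_PERIODS)}")

    return {
        "version": "1.0",
        # Session attributes, such as the dialog state, survive the handoff.
        "sessionAttributes": dict(session_attributes or {}),
        "response": {
            "directives": [
                {
                    "type": "Dialog.DelegateRequest",
                    "target": target,             # assumed field and value
                    "period": {"until": until},   # assumed field and values
                }
            ]
        },
    }


handoff = build_delegate_response(session_attributes={"orderStarted": True})
```

A skill would typically return a response like this from an existing intent handler at the point where the conversation becomes open-ended.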