
We're Still in Beta...

Alexa Conversations is still in beta, and updates may have been released that are not yet reflected in this tutorial. If you have questions or feedback about Alexa Conversations, click the feedback button in the lower right-hand corner.

Module 1: Voice Design Challenges

To appreciate the benefits of Alexa Conversations, it is important to understand the complexity of designing a conversational user interface that just works.

There are two major types of interaction on Alexa: one-shot and multi-turn.


One-shot Interactions

One-shot interactions involve a single turn, such as setting a timer, checking the weather, or turning on a light. These simple instructions require little to no input. The customer speaks an utterance (voice command) that is mapped to an intent (intended action). The Alexa Service normalizes the request into JSON and passes it to the skill’s backend code. The backend handles the request and sends back a JSON response, which Alexa speaks.
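As a rough illustration of that round trip, here is a minimal sketch in plain Python (no SDK) of a backend that receives the normalized JSON request and returns a JSON response. The intent name `GetWeatherIntent`, the `city` slot, and the canned forecast are hypothetical, and the request shown is a trimmed-down version of what a skill receives; a real request carries more fields.

```python
import json


def handle_request(event: dict) -> dict:
    """Handle a single turn: one request in, one spoken response out."""
    request = event["request"]
    is_weather = (
        request["type"] == "IntentRequest"
        and request["intent"]["name"] == "GetWeatherIntent"  # hypothetical intent
    )
    if is_weather:
        slots = request["intent"].get("slots", {})
        # Fall back to a generic phrase if the customer did not name a city.
        city = slots.get("city", {}).get("value", "your area")
        speech = f"The weather in {city} is sunny."  # canned answer for the sketch
    else:
        speech = "Sorry, I can't help with that."

    # The response body tells Alexa what to say and that the session is over.
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "shouldEndSession": True,
        },
    }


# Trimmed-down example of the JSON the backend would receive.
example_event = {
    "version": "1.0",
    "request": {
        "type": "IntentRequest",
        "intent": {
            "name": "GetWeatherIntent",
            "slots": {"city": {"name": "city", "value": "Seattle"}},
        },
    },
}

print(json.dumps(handle_request(example_event), indent=2))
```

Because the whole exchange fits in one request and one response, the handler needs no memory of earlier turns.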


Multi-turn Interactions  

More complex situations, like booking a flight, ordering food, or giving recommendations, require a lot of input. In these cases, it’s often too much to ask the customer to give all the input in one breath. Instead, it’s more natural to have a multi-turn conversation, where the skill asks follow-up questions until all necessary inputs have been collected. To do this, your skill’s backend code needs to keep track of the conversational context and ask follow-up questions to collect all the required information.

When designing a skill, it’s important to start with a script that represents the multi-turn conversation between the skill and the customer to complete a task. This is known as the happy path.
 

Interactions that follow the prescripted happy path are linear experiences.

Linear

They are linear because they cover only the case where everything goes as planned, with no detours. The image below represents our happy path. Since it happens to be a straight line, we can call this a linear multi-turn dialog.

Figure 1. Happy Path, Linear dialog, no detours

Building an Alexa skill that supports only the happy path dialog is quite simple. You can map a set of utterances to an intent for each input you need to collect and, in your skill code, determine what to prompt for next based upon the inputs you still need. You can even use dialog management to roll all the utterances and prompts into a single intent and pass dialog directives back and forth, leaving it up to the Alexa Service to determine what to prompt for next based upon the inputs.
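The "determine what to prompt for next" step can be sketched in a few lines of plain Python. This is only an illustration of the happy path, not how the Alexa Service or any SDK implements it; the input names `size`, `temperament`, and `energy` and all prompt wording are hypothetical.

```python
# Hypothetical inputs the skill needs, in the order the script asks for them.
REQUIRED_INPUTS = ["size", "temperament", "energy"]

PROMPTS = {
    "size": "What size of dog are you looking for?",
    "temperament": "What temperament should the dog have?",
    "energy": "How energetic should the dog be?",
}


def next_turn(collected: dict) -> str:
    """Prompt for the first missing input, or respond once all are known."""
    for name in REQUIRED_INPUTS:
        if name not in collected:
            return PROMPTS[name]
    return (
        f"Great, I'll look for a {collected['size']}, "
        f"{collected['temperament']} dog with {collected['energy']} energy."
    )


def run_happy_path(answers: list) -> list:
    """Simulate a linear dialog: each answer fills exactly the input just asked for."""
    collected = {}
    transcript = [next_turn(collected)]
    for name, value in zip(REQUIRED_INPUTS, answers):
        collected[name] = value  # happy-path assumption: the answers arrive in order
        transcript.append(next_turn(collected))
    return transcript


for line in run_happy_path(["small", "calm", "low"]):
    print(line)
```

The sketch works only because `run_happy_path` assumes the customer answers exactly the question that was asked, in order, which is the assumption a linear dialog bakes in.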
 

Since there are infinitely many ways a conversation can take place, the likelihood that our customer will follow our prescripted linear happy path is very low.

Nonlinear

Below, you can see that our customer deviated from our happy path. Although the path isn’t as straightforward, the end result is the same: after collecting the required inputs, the skill provides a response.

Figure 2. Non-linear dialog, deviating from the path

Maybe our customer is indecisive, changed their mind, or asked some clarifying questions. These are a few of the ways the actual conversation may deviate. Nonlinear multi-turn conversations are orders of magnitude more difficult to build a solution for.

To complicate matters, your interaction model, where you map utterances to intents, is built and certified before customers interact with it. At runtime, if your customer says something you left out, your skill will not respond correctly.

 


It requires a lot of forethought, experimentation and testing to build out a conversational user interface that:

  • Accurately predicts how the user may deviate

  • Tracks the conversational context

  • Determines what to prompt for next based upon the inputs still needed

  • Gracefully handles input out of sequence

  • Handles user correction

  • Handles confirmations

  • Performs the task

  • Formats raw data into spoken word

As you can see, there’s a lot that you as a voice designer and skill builder must do in order to provide a naturally conversant voice user interface.
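To make a few of those responsibilities concrete, here is a minimal hand-rolled sketch in Python of a dialog tracker that accepts inputs out of sequence, handles corrections, prompts for whatever is still missing, and formats the result as speech. It is not how Alexa Conversations works internally; the input names and all wording are hypothetical, and it assumes the customer's speech has already been resolved into name/value pairs.

```python
from dataclasses import dataclass, field

REQUIRED = ("size", "temperament", "energy")  # hypothetical inputs


@dataclass
class DialogState:
    """Conversational context: everything the customer has told us so far."""
    values: dict = field(default_factory=dict)

    def update(self, heard: dict) -> list:
        """Merge this turn's inputs, in any order; return the names that were corrected."""
        relevant = {k: v for k, v in heard.items() if k in REQUIRED}
        corrected = [
            k for k, v in relevant.items()
            if k in self.values and self.values[k] != v
        ]
        self.values.update(relevant)  # a later value overrides an earlier one
        return corrected

    def missing(self) -> list:
        return [name for name in REQUIRED if name not in self.values]


def respond(state: DialogState, heard: dict) -> str:
    """Acknowledge corrections, then prompt for the next gap or give the result."""
    corrected = state.update(heard)
    prefix = f"Okay, changed {' and '.join(corrected)}. " if corrected else ""
    missing = state.missing()
    if missing:
        return prefix + f"What {missing[0]} would you like?"
    v = state.values
    return prefix + (
        f"I recommend a {v['size']}, {v['temperament']} dog with {v['energy']} energy."
    )


state = DialogState()
print(respond(state, {"energy": "high"}))                         # out of sequence
print(respond(state, {"size": "small", "temperament": "calm"}))  # two inputs at once
print(respond(state, {"size": "large"}))                         # correction
```

Even this toy version covers only three of the items in the list, and it still leaves out confirmations, clarifying questions, and every utterance the interaction model did not anticipate.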
 

Sometimes skill builders scale back their idea when they’ve determined it would take too much effort to fully support all the possible ways the conversation could deviate.

Wouldn’t it be great if there was something that could ease this burden, allowing you to focus on providing the core experience? This is where Alexa Conversations really shines! It’s a new dialog manager that uses artificial intelligence to analyze your happy path, predict how the customer will deviate, and update the model with those deviations. It also keeps track of the conversational context, determines what to prompt for next, handles corrections and confirmations, and formats the data your backend skill code provides based upon a template you provide.

Now that you understand the problem and how Alexa Conversations helps solve it, let’s dive deeper into Alexa Conversations.
 

Continue to Module 2 to learn about Pet Match, the skill you’ll be building, and the fundamentals of Alexa Conversations.