
We're Still in Beta...

Alexa Conversations is still in beta, and updates may have been released that are not yet reflected in this tutorial. If you have questions or feedback about Alexa Conversations, click the feedback button in the lower right-hand corner.

Module 2: Building a Skill Using Alexa Conversations  


Building Pet Match  

Throughout this course you will learn how to use Alexa Conversations to build a multi-turn conversational Alexa skill called Pet Match. Pet Match is a skill that recommends a dog based upon three parameters. To understand how it works, take a look at Pet Match’s happy path script.

Customer: I want a large family dog.

Alexa: Do you want a high-energy or low-energy dog?

Customer: Low.

Alexa: So you want a large low-energy family dog. I recommend a chihuahua.

The happy path for Pet Match guides our customer through the process of filling the three parameters our skill code needs in order to provide a recommendation. Those parameters are size, temperament, and energy.

Note

Did you notice that we are recommending a chihuahua when the user asked for a large dog? No need to fear. By the time we finish annotating our dialog, Alexa Conversations will know how to replace chihuahua with the output it receives from your skill code.

In this case, the customer provided both the size and temperament at the beginning, but what if they only provided size? Or just the temperament? Or maybe they ignore the question entirely and provide the energy. What if they provide all three parameters at the beginning?
 

As you can see, our single happy path is a great place to start, but it is not equipped to handle all the ways the conversation could deviate. Throughout this course you’ll learn how to train Alexa Conversations to handle those deviations for you.

Note

Alexa Conversations doesn’t limit you to one happy path. You can create several different happy paths that represent things like fulfilling one or more tasks, and demonstrating branching paths based upon user input. Throughout this course you will create several happy paths.

Now that you understand what you’ll be building, take a moment to familiarize yourself with the fundamentals of Alexa Conversations.


What is Alexa Conversations?

We are only just beginning to scratch the surface of Alexa Conversations. There is much to learn, so we will start with a high-level description and go deeper throughout the course.
 

Alexa Conversations is an A.I.-driven dialog manager that:

  • Simulates and predicts how the customer will deviate from your happy path.

  • Keeps track of the conversational context.

  • Determines how to respond to customer input.

  • Handles inputs out of sequence and customer corrections.

  • Handles context carry-over so the skill doesn’t forget previously collected information.

  • Tracks the required inputs in order to perform a task.

  • Delivers a JSON request to perform a task to your skill code.

  • Converts the JSON response from your skill code to spoken output.


That’s quite a lot of stuff! These benefits fall into two categories: front-end and back-end. On the front-end, since Alexa Conversations simulates and predicts how the conversation will deviate from your happy path, you don’t have to provide as much training data to support deviations. On the back-end, Alexa Conversations does a lot of the heavy lifting when it comes to tracking the conversational context, handling user corrections and value confirmations, determining what to prompt for next, and surfacing the results of a task to the customer. Since you no longer have to write this code yourself, you are able to focus on receiving input and returning output.
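To make "tracking the required inputs" and "determining what to prompt for next" concrete, here is a minimal Python sketch of the kind of bookkeeping you would otherwise write by hand. It is only an illustration and not how Alexa Conversations is implemented; the names REQUIRED_SLOTS, update_context and next_missing_slot are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical list of the inputs Pet Match needs before it can recommend a dog.
REQUIRED_SLOTS = ("size", "temperament", "energy")


def update_context(collected: Dict[str, str], new_values: Dict[str, str]) -> Dict[str, str]:
    """Merge newly heard values into the context; a repeated slot acts as a correction."""
    merged = dict(collected)
    merged.update(new_values)
    return merged


def next_missing_slot(collected: Dict[str, str]) -> Optional[str]:
    """Return the first required slot without a value, or None when all are filled."""
    for slot in REQUIRED_SLOTS:
        if not collected.get(slot):
            return slot
    return None


# "I want a large family dog." fills two of the three slots.
context = update_context({}, {"size": "large", "temperament": "family"})
print(next_missing_slot(context))  # energy

# "Low." fills the last one, so the task can be performed.
context = update_context(context, {"energy": "low"})
print(next_missing_slot(context))  # None
```

With Alexa Conversations you describe the required inputs once and the dialog manager takes care of this loop, including out-of-order answers and corrections.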

 


How to Train Alexa Conversations

You train Alexa Conversations by providing training data. That training data is composed of five components that you will define at build-time.

 

The five build-time components are:

  • Dialogs

  • Slots

  • Utterance sets

  • Response templates

  • API definitions


Throughout the course, you’ll get hands-on experience defining and building each component.

Dialogs

Your dialogs represent the multi-turn conversations between your skill and the customer in order to perform a task. Each dialog is a happy path. You’ll want to create a dialog for each task or set of tasks your skill will complete for your customer.
 

Here’s the happy path presented earlier in this module as our first dialog.

Customer: I want a large family dog.

Alexa: Do you want a high-energy or low-energy dog?

Customer: Low.

Alexa: So you want a large low-energy family dog. I recommend a chihuahua.

When designing a skill, you’ll start from the conversation and write a happy path dialog. Using that dialog, you’ll work back to a solution through a process called dialog annotation, in which you use the four other build-time components to create your skill. Alexa Conversations will simulate the conversation, predict how the conversation may deviate from the happy path, and supplement the model with these predictions. The result is a far more flexible experience.
 

Since the skill needs the size, temperament, and energy to perform a match, you’ll need to tell Alexa Conversations what parts of the conversation to collect and how to capture the response from our skill code. To do this, you’ll need to define slots. If you’ve built an Alexa skill before, you should already be familiar with them. However, Alexa Conversations has slots that can also represent data returned from your skill code.

Slots 

In short, slots represent the input your skill code needs from the user utterance in order to perform a task. Each slot is backed by a type that helps Alexa’s automatic speech recognition (ASR) figure out how to fill it.
 

To provide a recommendation, Pet Match needs three slots: size, temperament, and energy. Once you’ve created your slots, we can replace the hard-coded values in our dialog with the slots.

Customer: I want a {size} {temperament} dog.

Alexa: Do you want a high-energy or low-energy dog?

Customer: {energy}.

Alexa: So you want a large low-energy family dog. I recommend a chihuahua.

Now that you’ve replaced the values with slots, your dialog can capture the values given by the customer at run-time. For example, if the customer says they want a small low-energy dog, then the value of size will be small and the value of energy will be low. These slots are known as custom slots with values (VCS) because they represent values provided by our customer. But what about the values returned from our skill code?

Alexa Conversations introduces custom slot types with properties (PCS) to define the data passed between our build-time components. They can be singular or compound. Pet Match’s skill code returns the recommendation as a complex JSON object. We can define a compound PCS that represents this object so Alexa Conversations knows how to convert the raw data into speech using a response template. (We’ll cover response templates below.)
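One way to picture the difference between the two kinds of slot types is to model them in code. The Python sketch below is an illustration only: the value lists contain just the values that appear in this module, and the class name RecommendationResult and its four properties mirror the skill response shown later in this module. You define the real types in the skill's build-time configuration, not in Python.

```python
from dataclasses import asdict, dataclass

# Custom slots with values (VCS): each slot accepts one value from a list.
# Only values mentioned in this module are listed; a real skill would have more.
SIZE_VALUES = {"small", "large"}
ENERGY_VALUES = {"low", "high"}
TEMPERAMENT_VALUES = {"family", "guard"}


@dataclass
class RecommendationResult:
    """Compound slot type with properties (PCS): groups several named properties."""
    name: str
    size: str
    energy: str
    temperament: str


result = RecommendationResult(
    name="Great Pyrenees", size="large", energy="low", temperament="family"
)

# Each property can be referenced on its own when building the spoken response.
print(result.name)     # Great Pyrenees
print(asdict(result))  # the same data as a plain dict, ready to serialize as JSON
```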
 

Now that we have a place to collect input and output, we need a way to assist the A.I. when building our model.

Utterance Sets 

The dialog represents one possible way that our customer might respond. What happens if they say something slightly different? When building your model, Alexa Conversations runs simulations to determine how utterances could vary and adds them to your model. Utterance sets enable you to manually specify these additional utterances.

Note

If you've built an intent-based Alexa skill, this is similar to mapping sample utterances to intents. With Alexa Conversations you’re mapping utterances to a line of dialogue instead.
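As a rough mental model, an utterance set is a list of alternative phrasings for one line of dialog, each carrying the same slots. The Python sketch below is an illustration under that assumption; the extra phrasings and the helper names slots_in and shared_slots are hypothetical, and the real utterance sets are defined in the skill's build-time configuration.

```python
from string import Formatter
from typing import FrozenSet, List

# Hypothetical utterance set for the line "I want a {size} {temperament} dog."
INVOKE_WITH_SIZE_AND_TEMPERAMENT: List[str] = [
    "I want a {size} {temperament} dog",
    "find me a {size} {temperament} dog",
    "recommend a {temperament} dog that is {size}",
]


def slots_in(utterance: str) -> FrozenSet[str]:
    """Collect the {slot} names that appear in one utterance."""
    return frozenset(
        field for _, field, _, _ in Formatter().parse(utterance) if field
    )


def shared_slots(utterances: List[str]) -> FrozenSet[str]:
    """Return the slots used by a set, checking every utterance uses the same ones."""
    slot_sets = {slots_in(utterance) for utterance in utterances}
    if len(slot_sets) != 1:
        raise ValueError("utterances in this set do not all use the same slots")
    return slot_sets.pop()


print(sorted(shared_slots(INVOKE_WITH_SIZE_AND_TEMPERAMENT)))  # ['size', 'temperament']
```

Keeping one set per combination of slots makes it easy to see which line of dialog each set belongs to.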

Your utterance sets will contain a type of annotation called dialog acts. A dialog act categorizes the action taking place at each line of dialog. This categorization relates each line to the customer’s intent and helps Alexa Conversations determine how to handle input (what the user says) and output (how Alexa responds). To demonstrate, we’ll add dialog acts to our dialog.

U: I want a large family dog. [invoke]

A: Do you want a high-energy or low-energy dog? [request]

U: Low energy. [inform]

A: So you want a large low-energy family dog. I recommend a chihuahua. [notify success]

When the customer starts off by asking for "a large family dog," their intent is to invoke a recommendation. Since we don’t have all the necessary information, the skill requests the missing information and the customer informs the skill of their choice. Once all required information is gathered, the API is invoked and the skill notifies the customer of a successful API invocation.
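The annotated dialog can also be written down as data, which makes the role of each dialog act easier to see. This Python sketch is only an illustration; the Turn class and the acts_for helper are hypothetical and are not part of Alexa Conversations.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Turn:
    speaker: str     # "U" for the customer, "A" for Alexa
    text: str        # what is said on this line of dialog
    dialog_act: str  # the action this line performs


DIALOG: List[Turn] = [
    Turn("U", "I want a large family dog.", "invoke"),
    Turn("A", "Do you want a high-energy or low-energy dog?", "request"),
    Turn("U", "Low energy.", "inform"),
    Turn("A", "So you want a large low-energy family dog. I recommend a chihuahua.",
         "notify success"),
]


def acts_for(speaker: str) -> List[str]:
    """List the dialog acts used by one side of the conversation, in order."""
    return [turn.dialog_act for turn in DIALOG if turn.speaker == speaker]


print(acts_for("U"))  # ['invoke', 'inform']
print(acts_for("A"))  # ['request', 'notify success']
```

The customer's acts are backed by utterance sets, and Alexa's acts are backed by response templates.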
 

Now that you’ve set up your utterance sets so that Alexa Conversations can predict what the customer will say, you’ll need to provide a set of response templates so Alexa Conversations can respond appropriately based upon the context of the conversation.

Response Templates 

Alexa Conversations uses response templates to respond to the customer. Just like utterance sets, they are associated with dialog acts. This helps Alexa Conversations determine when to use the template.
 

Let’s isolate the lines that our skill responds with from our dialog.

A: Do you want a high-energy or low-energy dog? [request]

A: So you want a large low-energy family dog. I recommend a chihuahua. [notify success]

Since our customer provided the size and temperament slots, Alexa Conversations uses the request template to prompt the customer for the energy slot. Once our customer provides it, Alexa Conversations will determine it has everything it needs and send a JSON request to your skill code.

When your skill code provides a JSON response, Alexa Conversations will use the response template you associated with the notify success dialog act that corresponds to an API definition. In order to respond to a failure, you’ll need to define a response template that is mapped to the notify failure dialog act.

Note

Response templates that request slots aren’t limited to just one slot. To have the skill request the size and temperament slots, we can create another template that requests both and an utterance set that allows the customer to fill both. You’ll learn how to do this in module 4.

Response templates are built with Alexa Presentation Language for Audio (APL-A), a JSON-based document schema that provides a set of easy-to-use components that you can use to build your response without much code. You’ll learn how to use APL-A throughout this course.
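To give a feel for what a notify success template might look like, here is a sketch of an APL-A document written as a Python dict, together with a toy function that fills in its placeholders. Treat the details as assumptions: the version value and the payload key recommendationResult are chosen for illustration, and render is a simplified stand-in for the data binding that APL-A performs, not the real engine.

```python
import re
from typing import Any, Dict

# Sketch of an APL-A document. "version" and the "recommendationResult"
# payload key are assumptions made for this illustration.
NOTIFY_SUCCESS_TEMPLATE: Dict[str, Any] = {
    "type": "APLA",
    "version": "0.8",
    "mainTemplate": {
        "parameters": ["payload"],
        "item": {
            "type": "Speech",
            "contentType": "text",
            "content": (
                "So you want a ${payload.recommendationResult.size} "
                "${payload.recommendationResult.energy}-energy "
                "${payload.recommendationResult.temperament} dog. "
                "I recommend a ${payload.recommendationResult.name}."
            ),
        },
    },
}

_BINDING = re.compile(r"\$\{([A-Za-z0-9_.]+)\}")


def render(content: str, data: Dict[str, Any]) -> str:
    """Toy data binding: replace ${a.b.c} with data["a"]["b"]["c"]."""
    def lookup(match: re.Match) -> str:
        value: Any = data
        for key in match.group(1).split("."):
            value = value[key]
        return str(value)

    return _BINDING.sub(lookup, content)


payload = {
    "recommendationResult": {
        "name": "Great Pyrenees",
        "size": "large",
        "energy": "low",
        "temperament": "family",
    }
}
content = NOTIFY_SUCCESS_TEMPLATE["mainTemplate"]["item"]["content"]
speech = render(content, {"payload": payload})
print(speech)
# So you want a large low-energy family dog. I recommend a Great Pyrenees.
```

Because the spoken text is built from the payload, the same template works for any recommendation your skill code returns.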
 

Now that you’ve set up Alexa Conversations to respond to the customer, you’ll need to define what your skill needs to perform tasks for the customer. To do that, you’ll create an API definition.

API Definitions

You can think of an API definition as a representation of a task. It includes a name, a set of inputs, outputs and the response template(s) necessary to provide a response. At run-time, Alexa Conversations will collect all the required slot values your skill needs using the response templates you defined to request the slot values. Once Alexa Conversations has all the necessary slots, it will send a JSON request to your skill’s backend.
 

Below is an example of the request that gets sent to Pet Match.


{
    ...
    "request": {
        "type": "Dialog.API.Invoked",
        "requestId": "amzn1.echo-api.request.",
        "timestamp": "2020-07-22T07:52:34Z",
        "locale": "en_US",
        "apiRequest": {
            "name": "getRecommendation",
            "arguments": {
                "size": "large",
                "temperament": "guard",
                "energy": "low"
            },
            "slots": {
                "size": {
                    "type": "Simple",
                    "value": "large",
                    "resolutions": {
                        ...
                    }
                },
                "temperament": {
                    "type": "Simple",
                    "value": "guard",
                    "resolutions": {
                        ...
                    }
                },
                "energy": {
                    "type": "Simple",
                    "value": "low",
                    "resolutions": {
                        ...
                    }
                }
            }
        }
    },
    ...
}

Alexa Conversations requests sent to your skill code are identified by a new request type, Dialog.API.Invoked. This new request type contains a new property called apiRequest.

The apiRequest contains all the information our skill code needs to handle the request, including the name of the API, the arguments, and the slots. The slots include any synonyms that you’ve defined with entity resolution.
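In skill code, the first step is usually to check the request type and pull out the API name and arguments. The sketch below does this with plain Python dicts and uses only the fields shown in the example request; the function name parse_api_request is hypothetical, and a real skill would typically receive the request through the Alexa Skills Kit SDK rather than a hand-built dict.

```python
from typing import Any, Dict, Tuple


def parse_api_request(envelope: Dict[str, Any]) -> Tuple[str, Dict[str, str]]:
    """Return (API name, arguments) from a Dialog.API.Invoked request envelope."""
    request = envelope["request"]
    if request["type"] != "Dialog.API.Invoked":
        raise ValueError(f"unexpected request type: {request['type']}")
    api_request = request["apiRequest"]
    return api_request["name"], dict(api_request["arguments"])


# Trimmed-down copy of the example request, keeping only the fields used here.
sample = {
    "request": {
        "type": "Dialog.API.Invoked",
        "apiRequest": {
            "name": "getRecommendation",
            "arguments": {"size": "large", "temperament": "guard", "energy": "low"},
        },
    }
}

api_name, arguments = parse_api_request(sample)
print(api_name)   # getRecommendation
print(arguments)  # {'size': 'large', 'temperament': 'guard', 'energy': 'low'}
```

Checking the name lets one skill serve several API definitions from the same entry point.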

The skill code will take the size, temperament, and energy slots, resolve them to their canonical values, and make a recommendation. It bundles the results into a JSON object based upon the compound PCS we defined, called RecommendationResult, and sends a JSON response to Alexa Conversations.

 

Below is what the response looks like:


{
    "apiResponse": {
        "name": "Great Pyrenees",
        "size": "large",
        "energy": "low",
        "temperament": "family"
    }
}
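A response with this shape could be produced by skill code along the following lines. This is a minimal sketch, not Pet Match's actual implementation: the BREEDS lookup table and the get_recommendation function are hypothetical, and the table holds only the one pairing that appears in this module.

```python
import json
from typing import Dict, Tuple

# Hypothetical lookup keyed by (size, energy, temperament). Only the pairing
# shown in this module is included; a real skill would use a fuller data source.
BREEDS: Dict[Tuple[str, str, str], str] = {
    ("large", "low", "family"): "Great Pyrenees",
}


def get_recommendation(size: str, energy: str, temperament: str) -> Dict[str, Dict[str, str]]:
    """Build the response body for a recommendation, wrapped in apiResponse."""
    name = BREEDS.get((size, energy, temperament))
    if name is None:
        raise LookupError(f"no match for {size}/{energy}/{temperament}")
    return {
        "apiResponse": {
            "name": name,
            "size": size,
            "energy": energy,
            "temperament": temperament,
        }
    }


response = get_recommendation("large", "low", "family")
print(json.dumps(response, indent=4))
```

Raising an error when nothing matches gives the skill a clear point at which to report a failure instead of a success.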

At build-time, you associate a response template with the notify success dialog act for the task. At run-time, when Alexa Conversations receives the response from the skill code, it will insert the data into the template and Pet Match will provide the recommendation. At this point, the skill will give a recommendation based upon the slots rather than the chihuahua we hard-coded.

 

Since our customer requested a large, low-energy family dog, Pet Match recommends a Great Pyrenees, not a chihuahua.


Get Started 

Now that you have a high-level understanding of the Alexa Conversations build-time assets you need to create, it’s time to start building! Take your time to navigate through the step-by-step instructions to build the skill. Each step will provide context and explanations to equip you with the knowledge and understanding necessary at that moment to complete the task. This will help solidify the concepts so you’ll understand the 'how' and the 'why' as you go along.

Continue to Module 3 below to start building Pet Match with Alexa Conversations.