Over the past few months, I've spent a lot of time deep diving into dialog management and sharing what I've learned with the Alexa developer community. I'm often asked, "Is there a way to change the slot and the question that Alexa prompts the user for next based on the previous answer?" The answer is yes!
Dialog management simplifies creating a multi-turn conversational skill. When you mark at least one of your intent's slots as required to fulfill the intent, the Alexa voice service will keep track of the dialog state and the required slots that still need to be collected. When your back end returns the Dialog.Delegate directive, the Alexa voice service will automatically prompt for the next slot. If you want to override what Alexa will prompt for next, you can do so with the Dialog.ElicitSlot directive.
For example, let's say we are making a coffee shop skill and our customer can choose between coffee and tea. When the customer chooses coffee, our skill needs to ask if they want a light, medium, medium-dark, or dark roast. If the customer wants tea, then our skill should ask whether they want black, green, oolong, or white tea. It wouldn't make sense for the skill to ask what type of tea the customer wants when they asked for coffee.
In this technical post, I’ll walk you through how to use dialog management to dynamically elicit slots based on a previous answer from a customer.
For our voice model, we will create an intent called OrderIntent, which will handle the utterances for placing an order at our coffee shop.
When our customer interacts with our skill, they may say the following things to tell us what they want to order.
Start my order
I'll have coffee
I want to drink tea
I want dark coffee
I want tea to drink
I want oolong tea
Green tea sounds great
tea please
Looking through the utterances above, we can identify the three slots that we need to capture what the user orders: drink, coffeeRoast, and teaType.
Now that we've determined our slots, let's update our utterances and replace the values with their slots so we can capture the values in our skill's back-end code.
Start my order
I'll have {drink}
I want to drink {drink}
I want {coffeeRoast} {drink}
I want {drink} to drink
I want {teaType} {drink}
{teaType} {drink} sounds great
{drink} please
The "I want {teaType} {drink}" utterance will enable the customer to give all the information in one breath, while "I'll have {drink}" requires our skill to ask a follow-up question based on the value given for drink. We'll go over how to do that when we talk about the backend.
Our slot values will correspond to the options that are available for purchase from our coffee shop. The table below shows the values of each of our slots.
drink | coffeeRoast | teaType |
---|---|---|
coffee | light | black |
tea | medium | green |
 | medium dark | white |
 | dark | oolong |
Once we've set up our utterances and slots, it's time to activate dialog management so Alexa will handle prompting the user for the new slots. To activate dialog management, we need to set at least one of the slots belonging to the OrderIntent to be required.
Once we mark it required, we need to provide a prompt that Alexa will say when prompting the user to fill the slot and the sample utterances the user might say to fill it. We could technically make all three slots required and manually exit dialog management as soon as we have the minimum slot values we need, which is either drink and coffeeRoast or drink and teaType, or drink && (coffeeRoast || teaType) for short. Instead, we'll mark only drink as required and use the ElicitSlot directive to control what gets prompted for after the user makes a drink selection. Our prompt will be "Which would you like, coffee or tea?" and our sample utterances will be:
I'll have {drink}
I want to drink {drink}
I want {coffeeRoast} {drink}
I want {drink} to drink
I want {teaType} {drink}
{teaType} {drink} sounds great
{drink} please
Notice that the slot utterances are almost identical to the intent utterances. This affords the customer more leeway in case they give us more information than what was asked for.
Note: We didn't define "{drink}" as one of our utterances. We could have done so, but it's optional because Alexa automatically adds it to the model for us.
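Pulled together, the relevant parts of the interaction model might look roughly like the object below. It mirrors the JSON you would see in the developer console's JSON editor (the contents of the top-level interactionModel key), written as a JavaScript object so it can be inspected in code. The invocation name, the slot type names (DrinkType, CoffeeRoastType, TeaType), and the prompt ID are assumptions made for illustration, the sample lists are abbreviated, and the built-in intents are left out.

```javascript
// Sketch of the interaction model as a JavaScript object.
// Assumed names (not from the post): 'coffee shop', DrinkType,
// CoffeeRoastType, TeaType, and the prompt ID 'Elicit.Slot.drink'.
const toValues = (names) => names.map((value) => ({ name: { value } }));

const interactionModel = {
  languageModel: {
    invocationName: 'coffee shop',
    intents: [{
      name: 'OrderIntent',
      slots: [
        {
          name: 'drink',
          type: 'DrinkType',
          // Slot-level utterances the user might say when prompted for drink.
          samples: ["I'll have {drink}", '{teaType} {drink} sounds great', '{drink} please'],
        },
        { name: 'coffeeRoast', type: 'CoffeeRoastType' },
        { name: 'teaType', type: 'TeaType' },
      ],
      // Intent-level utterances (abbreviated).
      samples: ['Start my order', "I'll have {drink}", 'I want {coffeeRoast} {drink}'],
    }],
    types: [
      { name: 'DrinkType', values: toValues(['coffee', 'tea']) },
      { name: 'CoffeeRoastType', values: toValues(['light', 'medium', 'medium dark', 'dark']) },
      { name: 'TeaType', values: toValues(['black', 'green', 'white', 'oolong']) },
    ],
  },
  dialog: {
    intents: [{
      name: 'OrderIntent',
      confirmationRequired: false,
      slots: [
        {
          name: 'drink',
          type: 'DrinkType',
          elicitationRequired: true, // the only required slot
          confirmationRequired: false,
          prompts: { elicitation: 'Elicit.Slot.drink' },
        },
        { name: 'coffeeRoast', type: 'CoffeeRoastType', elicitationRequired: false, confirmationRequired: false },
        { name: 'teaType', type: 'TeaType', elicitationRequired: false, confirmationRequired: false },
      ],
    }],
  },
  prompts: [{
    id: 'Elicit.Slot.drink',
    variations: [{ type: 'PlainText', value: 'Which would you like, coffee or tea?' }],
  }],
};

// List the slots that dialog management will prompt for on its own.
const requiredSlots = interactionModel.dialog.intents[0].slots
  .filter((slot) => slot.elicitationRequired)
  .map((slot) => slot.name);
console.log(requiredSlots); // [ 'drink' ]
```

Because only drink is required, Dialog.Delegate will never ask for coffeeRoast or teaType by itself. Those two are left for the back end to elicit.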
With our front end properly configured, let's take a look at how the back end works so we can elicit our slots based upon the value of drink.
While it might be tempting to use a flow chart to express what slot should be elicited next based upon a previous answer, we must be careful not to turn the skill into a phone tree. Phone trees are rigid and brittle and rarely result in great conversational experiences. Instead, we should use situational design to make our skill react to the user's answer based on the information that we have already collected. That is why I prefer to use expressions rather than flow charts to express what information my skill needs in a given situation. Ultimately, the expression that we are trying to satisfy is drink && (coffeeRoast || teaType). Our skill needs a value for drink and either coffeeRoast or teaType, and which of the two it needs depends on the value of drink, which is our situation.
If the value of drink is coffee then our skill needs a value for coffeeRoast. Likewise, if the drink is tea then our skill needs a value for teaType.
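The situation described above can be sketched as a small function that reports which slot the skill still needs. This is only an illustration: nextSlotToElicit is a hypothetical helper name, not part of the ASK SDK, and it assumes a plain object that maps slot names to the values collected so far.

```javascript
// Hypothetical helper. `slots` is assumed to be a plain object such as
// { drink: 'coffee', coffeeRoast: undefined, teaType: undefined }.
// Returns the name of the slot the skill should ask for next, or null
// when nothing more needs to be asked.
function nextSlotToElicit(slots) {
  // Without a drink we cannot know which follow-up question applies.
  if (!slots.drink) return 'drink';
  // Coffee needs a roast; tea needs a tea type.
  if (slots.drink === 'coffee' && !slots.coffeeRoast) return 'coffeeRoast';
  if (slots.drink === 'tea' && !slots.teaType) return 'teaType';
  return null;
}

console.log(nextSlotToElicit({}));                                 // drink
console.log(nextSlotToElicit({ drink: 'coffee' }));                // coffeeRoast
console.log(nextSlotToElicit({ drink: 'tea', teaType: 'green' })); // null
```

Each branch corresponds to one situation rather than one step in a fixed sequence, so a customer who says "I want oolong tea" up front skips the follow-up question entirely.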
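To elicit our slots based on the value of drink, we will define four handlers. These handlers represent the various situations of our skill and elicit the appropriate slots.
Our handlers are:
StartedInProgressOrderIntentHandler
CoffeeGivenOrderIntentHandler
TeaGivenOrderIntentHandler
CompletedOrderIntentHandler
StartedInProgressOrderIntentHandler is a simple one. Its canHandle function returns true if:
The request type is IntentRequest
The intent name is OrderIntent
The dialogState is not COMPLETED
This will occur when the user says "start my order."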
canHandle(handlerInput) {
  return handlerInput.requestEnvelope.request.type === "IntentRequest"
    && handlerInput.requestEnvelope.request.intent.name === "OrderIntent"
    && handlerInput.requestEnvelope.request.dialogState !== 'COMPLETED';
},
The handle function simply returns Dialog.Delegate so Alexa will automatically prompt for the missing required slot.
handle(handlerInput) {
  return handlerInput.responseBuilder
    .addDelegateDirective()
    .getResponse();
}
Below is the whole function:
const StartedInProgressOrderIntentHandler = {
  canHandle(handlerInput) {
    return handlerInput.requestEnvelope.request.type === "IntentRequest"
      && handlerInput.requestEnvelope.request.intent.name === "OrderIntent"
      && handlerInput.requestEnvelope.request.dialogState !== 'COMPLETED';
  },
  handle(handlerInput) {
    return handlerInput.responseBuilder
      .addDelegateDirective()
      .getResponse();
  }
};
CoffeeGivenOrderIntentHandler is where things become a little more interesting. This handler uses Dialog.ElicitSlot to elicit the coffeeRoast slot when our drink slot is coffee.
The canHandle function will return true if:
The request type is IntentRequest
The intent name is OrderIntent
The drink slot has a value
The value of drink is coffee
The coffeeRoast slot has no value
The last condition is super important. Without it, the skill will never stop eliciting the coffeeRoast slot. Remember the scene in "Dude, Where's My Car?" where they are ordering Chinese food and the drive-thru keeps asking, "And then?" You don't want your skill to pester your customers like that.
canHandle(handlerInput) {
  return handlerInput.requestEnvelope.request.type === "IntentRequest"
    && handlerInput.requestEnvelope.request.intent.name === "OrderIntent"
    && handlerInput.requestEnvelope.request.intent.slots.drink.value
    && handlerInput.requestEnvelope.request.intent.slots.drink.value === 'coffee'
    && !handlerInput.requestEnvelope.request.intent.slots.coffeeRoast.value;
}
Our handle function defines the speech and reprompt just as if it were returning a standard speech response. However, calling addElicitSlotDirective causes Alexa to elicit the given slot next, even if it wasn't marked required in our voice model.
handle(handlerInput) {
  return handlerInput.responseBuilder
    .speak('Which roast would you like light, medium, medium-dark, or dark?')
    .reprompt('Would you like a light, medium, medium-dark, or dark roast?')
    .addElicitSlotDirective('coffeeRoast')
    .getResponse();
}
Zoomed out, the CoffeeGivenOrderIntentHandler appears as it does below:
const CoffeeGivenOrderIntentHandler = {
  canHandle(handlerInput) {
    return handlerInput.requestEnvelope.request.type === "IntentRequest"
      && handlerInput.requestEnvelope.request.intent.name === "OrderIntent"
      && handlerInput.requestEnvelope.request.intent.slots.drink.value
      && handlerInput.requestEnvelope.request.intent.slots.drink.value === 'coffee'
      && !handlerInput.requestEnvelope.request.intent.slots.coffeeRoast.value;
  },
  handle(handlerInput) {
    return handlerInput.responseBuilder
      .speak('Which roast would you like light, medium, medium-dark, or dark?')
      .reprompt('Would you like a light, medium, medium-dark, or dark roast?')
      .addElicitSlotDirective('coffeeRoast')
      .getResponse();
  }
};
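To see why the coffeeRoast check matters, it can help to run the handler against stubbed input for two consecutive turns. The sketch below is an assumption-laden stand-in, not the SDK itself: makeHandlerInput is a hypothetical helper that mimics only the request fields and ResponseBuilder methods this handler touches, and its response object is simplified (the real SDK wraps speech in SSML, for example).

```javascript
// The handler from the post, condensed so this sketch runs on its own.
const CoffeeGivenOrderIntentHandler = {
  canHandle(handlerInput) {
    const request = handlerInput.requestEnvelope.request;
    return request.type === 'IntentRequest'
      && request.intent.name === 'OrderIntent'
      && request.intent.slots.drink.value === 'coffee'
      && !request.intent.slots.coffeeRoast.value;
  },
  handle(handlerInput) {
    return handlerInput.responseBuilder
      .speak('Which roast would you like light, medium, medium-dark, or dark?')
      .reprompt('Would you like a light, medium, medium-dark, or dark roast?')
      .addElicitSlotDirective('coffeeRoast')
      .getResponse();
  },
};

// Hypothetical stand-in for handlerInput (not an SDK function).
function makeHandlerInput(slotValues, dialogState) {
  const response = { directives: [] };
  const responseBuilder = {
    speak(text) { response.outputSpeech = text; return this; },
    reprompt(text) { response.reprompt = text; return this; },
    addElicitSlotDirective(slotToElicit) {
      response.directives.push({ type: 'Dialog.ElicitSlot', slotToElicit });
      return this;
    },
    getResponse() { return response; },
  };
  // Every slot of the intent is present; unfilled slots have no value.
  const slots = {};
  for (const name of ['drink', 'coffeeRoast', 'teaType']) {
    slots[name] = { name, value: slotValues[name] };
  }
  return {
    requestEnvelope: {
      request: { type: 'IntentRequest', dialogState, intent: { name: 'OrderIntent', slots } },
    },
    responseBuilder,
  };
}

// Turn 1: "I'll have coffee" fills drink but not coffeeRoast.
const firstTurn = makeHandlerInput({ drink: 'coffee' }, 'STARTED');
console.log(CoffeeGivenOrderIntentHandler.canHandle(firstTurn)); // true
const firstResponse = CoffeeGivenOrderIntentHandler.handle(firstTurn);
console.log(firstResponse.directives[0].slotToElicit); // coffeeRoast

// Turn 2: the customer answers "medium", so the last condition fails
// and the handler steps aside instead of asking again.
const secondTurn = makeHandlerInput({ drink: 'coffee', coffeeRoast: 'medium' }, 'IN_PROGRESS');
console.log(CoffeeGivenOrderIntentHandler.canHandle(secondTurn)); // false
```

If the coffeeRoast check were removed, the second turn would match again and the skill would keep asking for the roast.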
By now you should have an understanding of how the TeaGivenOrderIntentHandler will be structured. In fact, it is pretty much the same as CoffeeGivenOrderIntentHandler, but we check that drink is tea and that teaType has no value.
canHandle(handlerInput) {
  return handlerInput.requestEnvelope.request.type === "IntentRequest"
    && handlerInput.requestEnvelope.request.intent.name === "OrderIntent"
    && handlerInput.requestEnvelope.request.intent.slots.drink.value
    && handlerInput.requestEnvelope.request.intent.slots.drink.value === 'tea'
    && !handlerInput.requestEnvelope.request.intent.slots.teaType.value;
}
Now that we've identified what we can handle, just like the CoffeeGivenOrderIntentHandler we are going to use the ElicitSlot directive to tell Alexa which slot to elicit next. In this case we will elicit the teaType slot.
handle(handlerInput) {
  return handlerInput.responseBuilder
    .speak("Which would you like black, green, oolong, or white tea?")
    .reprompt("Would you like a black, green, oolong, or white tea?")
    .addElicitSlotDirective('teaType')
    .getResponse();
}
Combining the above pieces, the TeaGivenOrderIntentHandler will appear as:
const TeaGivenOrderIntentHandler = {
  canHandle(handlerInput) {
    return handlerInput.requestEnvelope.request.type === "IntentRequest"
      && handlerInput.requestEnvelope.request.intent.name === "OrderIntent"
      && handlerInput.requestEnvelope.request.intent.slots.drink.value
      && handlerInput.requestEnvelope.request.intent.slots.drink.value === 'tea'
      && !handlerInput.requestEnvelope.request.intent.slots.teaType.value;
  },
  handle(handlerInput) {
    return handlerInput.responseBuilder
      .speak("Which would you like black, green, oolong, or white tea?")
      .reprompt("Would you like a black, green, oolong, or white tea?")
      .addElicitSlotDirective('teaType')
      .getResponse();
  }
};
As you can see, these handlers are very similar. If we were to add another drink like boba and wanted to elicit a bobaType slot when the customer wants boba, we could simply change the drink value check to boba, change the teaType check to test bobaType instead, and update the handle function to elicit the bobaType slot.
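Since the handlers differ only in the drink value, the slot name, and the prompts, one way to avoid copying them is a small factory. This is a sketch rather than anything the SDK provides: makeDrinkGivenHandler is a hypothetical name, the boba prompts are made up, and it assumes the bobaType slot has already been added to OrderIntent in the voice model.

```javascript
// Hypothetical factory (not part of the ASK SDK): builds a "drink given"
// handler for one drink value and the follow-up slot that drink needs.
// Assumes `slotToElicit` is a slot defined on OrderIntent.
function makeDrinkGivenHandler({ drink, slotToElicit, speech, reprompt }) {
  return {
    canHandle(handlerInput) {
      const request = handlerInput.requestEnvelope.request;
      // Only IntentRequests carry an intent, so check the type first.
      if (request.type !== 'IntentRequest' || request.intent.name !== 'OrderIntent') {
        return false;
      }
      const slots = request.intent.slots;
      // Match only while the follow-up slot is still empty.
      return slots.drink.value === drink && !slots[slotToElicit].value;
    },
    handle(handlerInput) {
      return handlerInput.responseBuilder
        .speak(speech)
        .reprompt(reprompt)
        .addElicitSlotDirective(slotToElicit)
        .getResponse();
    },
  };
}

// Made-up prompts for the boba example.
const BobaGivenOrderIntentHandler = makeDrinkGivenHandler({
  drink: 'boba',
  slotToElicit: 'bobaType',
  speech: 'Which boba would you like?',
  reprompt: 'Which type of boba would you like?',
});

// Quick check with a hand-built request: drink is boba, bobaType is empty.
const request = {
  type: 'IntentRequest',
  intent: { name: 'OrderIntent', slots: { drink: { value: 'boba' }, bobaType: {} } },
};
console.log(BobaGivenOrderIntentHandler.canHandle({ requestEnvelope: { request } })); // true
```

The coffee and tea handlers could be produced the same way by passing their own drink value, slot name, and prompts.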
Once we've collected all the necessary slots, we will need to place the customer's drink order. For our sample, we'll simply have Alexa repeat their order back. To do that we need to make a handler that runs when dialog management has completed collecting the slots.
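CompletedOrderIntentHandler's canHandle function will return true when:
The request type is IntentRequest
The intent name is OrderIntent
The dialogState is COMPLETED
Translated into code: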
canHandle(handlerInput) {
  return handlerInput.requestEnvelope.request.type === "IntentRequest"
    && handlerInput.requestEnvelope.request.intent.name === "OrderIntent"
    && handlerInput.requestEnvelope.request.dialogState === "COMPLETED";
}
Once the dialog is complete, we access our slots and build up the response based upon the customer's drink selection.
const drink = handlerInput.requestEnvelope.request.intent.slots.drink.value;
// type is assigned in the branches below, so it is declared with let, not const.
let type;
if (drink === 'coffee') {
  type = handlerInput.requestEnvelope.request.intent.slots.coffeeRoast.value;
} else if (drink === 'tea') {
  type = handlerInput.requestEnvelope.request.intent.slots.teaType.value;
} else {
  type = 'water';
}
The code above will set the type value based upon the user's drink selection. For example, if the user said, "I want coffee" and then answered "medium" for their preferred roast, then type will be set to medium.
Now that we are able to set type based upon the drink value, all we need to do now is build and return a response:
const speechText = `It looks like you want ${type} ${drink}`;
return handlerInput.responseBuilder
  .speak(speechText)
  .getResponse();
Here we are simply building a string to tell the user what they ordered and returning a response with the ResponseBuilder.
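Assembled from the pieces above, the whole CompletedOrderIntentHandler might look like the sketch below. The handler body follows the snippets in this post. The handlerInput object underneath it is a hypothetical stub that imitates only the fields and methods the handler uses, so the example can run without the SDK.

```javascript
const CompletedOrderIntentHandler = {
  canHandle(handlerInput) {
    const request = handlerInput.requestEnvelope.request;
    return request.type === 'IntentRequest'
      && request.intent.name === 'OrderIntent'
      && request.dialogState === 'COMPLETED';
  },
  handle(handlerInput) {
    const slots = handlerInput.requestEnvelope.request.intent.slots;
    const drink = slots.drink.value;
    // Pick the detail slot that matches the drink.
    let type;
    if (drink === 'coffee') {
      type = slots.coffeeRoast.value;
    } else if (drink === 'tea') {
      type = slots.teaType.value;
    } else {
      type = 'water';
    }
    const speechText = `It looks like you want ${type} ${drink}`;
    return handlerInput.responseBuilder
      .speak(speechText)
      .getResponse();
  },
};

// Hypothetical stub of a completed coffee order (not real SDK objects).
const handlerInput = {
  requestEnvelope: {
    request: {
      type: 'IntentRequest',
      dialogState: 'COMPLETED',
      intent: {
        name: 'OrderIntent',
        slots: {
          drink: { name: 'drink', value: 'coffee' },
          coffeeRoast: { name: 'coffeeRoast', value: 'medium' },
          teaType: { name: 'teaType' },
        },
      },
    },
  },
  responseBuilder: {
    speak(text) { this.text = text; return this; },
    getResponse() { return { outputSpeech: this.text }; },
  },
};

console.log(CompletedOrderIntentHandler.canHandle(handlerInput)); // true
console.log(CompletedOrderIntentHandler.handle(handlerInput).outputSpeech);
// It looks like you want medium coffee
```

One thing to keep in mind when registering the four handlers: the SDK uses the first handler whose canHandle returns true, and StartedInProgressOrderIntentHandler matches every OrderIntent request that isn't COMPLETED. CoffeeGivenOrderIntentHandler and TeaGivenOrderIntentHandler therefore need to be registered ahead of it, or they will never get a chance to elicit their slots.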
Following these steps enables you to leverage dialog management to easily change what your skill will prompt for next based upon the value of a slot. Dialog management takes care of keeping the conversation open based upon the required slots, but it allows you to elicit any slot that belongs to the intent with ElicitSlot even if the slot isn't marked required.
Now that you've read through this post, try to think about how you can put these techniques to use in your own skills. Let's continue the discussion online! You can find me on Twitter @SleepyDeveloper.