Conversations don't follow a predetermined script. They ebb and flow across time and space between one or more participants. That shared context, combined with multiple participants, leads conversations to deviate.
When designing the interaction model for your Alexa skill, it may be tempting to map out every possible way the conversation could take place with a flow chart. This approach severely limits how customers can interact with your skill, since flow charts restrict what the customer can say at each turn of the interaction, and it often results in an unnatural and frustrating experience. Consider the following example:
Customer: Alexa, open movie time.
Alexa: Welcome to movie time. I can help you find a movie and book tickets. Where would you like to see a movie, Northgate or Lynwood?
Customer: I want to see Bohemian Rhapsody!
Alexa: I'm sorry. Bohemian Rhapsody is not a movie theatre. If you'd like to search a different way, say "main menu" to return to the main menu.
Customer: Forget this. I'm looking this up on my phone.
See how awkward and annoying that is? The customer already knows what movie they want to see, but the skill is too rigid and requires the customer to jump back to the main menu to start over. When designing your voice experience, avoid locking your customers into a phone tree.
Instead, your interaction model should mirror sailing. With sailing, there are so many variables acting upon a sailboat, like wind, current, and gravity, that it's pretty much impossible to sail in a straight line. Sailors use a method called tacking to constantly course correct as they sail to their destination. Like sailing, participating in conversation requires constant attention to the flow of dialog and adapting to what's happening around you, not just plowing ahead towards a goal.
This post will cover design practices that can help you build a conversational voice experience. We’ll use an example skill called The Foodie to illustrate the concepts.
The Foodie is a conversational skill because it adheres to the 4 Hallmarks of Conversational Skills. The Foodie skill can:
The above hallmarks are very important. Let's take a look at how odd the interaction would be if The Foodie required the customer to provide all the information in one breath:
That was awkward, wasn't it? Luckily, The Foodie follows the 4 Hallmarks of Conversational Skills and was not designed based on flow charts. It's flexible and allows the customer to provide answers to questions in any order they choose. It is forgiving and will respond with follow-up questions until the minimum set of required slots has been collected. If the customer already knows they want a hamburger for lunch, they can simply say so, and the skill will recommend a few restaurants with hamburgers on the menu.
Let's take a look at how we can avoid building a rigid phone-tree menuing system into our skill and enable more conversational dialog.
You'll want to start by writing a happy path script. The happy path script is the ideal conversation that you envision your customer having with your skill. It allows you to focus on defining the core set of features for your skill. You'll want to write multiple happy path scripts to demonstrate how interactions with your skill will change over time and space. For more information about designing your happy path, check out Lesson 5, "Script it out," in our free online course, Designing for Conversation.
Starting from our first-time use happy path, we wrote the following script and determined what information we need to collect from our customer in order to provide a meal and restaurant recommendation.
Customer: Alexa, open the foodie
Alexa: Howdy! Welcome to the Foodie! I'll help you find the right food right now. To make that easier, you can give me permission to access your location, just check the Alexa app. How rude of me. I forgot to ask. What's your name?
Customer: My name is Justin.
Alexa: Hi Justin, nice to meet you. I recommend cuisines based on flavor profiles. You can say things like spicy, savory, greasy, and fresh. What sounds good?
Customer: I want something savory and spicy.
Alexa: Ah! I'd recommend something Korean or Spanish. Which of those sounds better right now?
Customer: Korean sounds great!
Alexa: Great choice! Do you have any food allergies or preferences that I should be aware of? You can say peanuts, shellfish, or I'm a vegetarian.
Customer: I'm allergic to shellfish and I'm pescatarian.
Alexa: I'll remember not to recommend anything with shellfish and that you eat plants and fish. Great, I've found 3 meals: Domi Maeuntang, Mae Un Tang and Daegu Jorim. Which sounds best?
Customer: Mae Un Tang sounds great!
Alexa: Would you like to cook, go out, or eat in?
Customer: I want to get out of the office.
Alexa: Ok, one last thing, so that I can recommend something close by, what's your location?
Customer: Seattle, Washington
Alexa: Great. There are two Korean restaurants nearby serving something similar to Mae Un Tang: Korean Bamboo and One Pot. Which would you like to visit?
Customer: Korean Bamboo.
Alexa: Great! I've sent the address to the Alexa app.
The information The Foodie needs comes from the customer's utterances, so let's take a closer look at those.
I want something **savory and spicy**.
**Korean** sounds great!
I'm allergic to **shellfish** and I'm **pescatarian**.
**Mae Un Tang** sounds great!
I want to **get out of the office**.
**Seattle**, **Washington**.
**Korean Bamboo**.
The items in bold will become the slots The Foodie will use to capture our customer's preferences. Let's take a look at the utterances after we've converted the bold information into slots.
I want something {cuisine}.
{cuisine} sounds great!
I'm allergic to {allergies} and I'm {diet}.
{meal} sounds great!
I want to {diningLocation}.
{city}, {state}.
{restaurantName}.
Notice how the first two utterances both use the {cuisine} slot. We're leveraging entity resolution to map our flavors directly to corresponding cuisines. This way, if the customer specifies spicy and savory, we can present the customer with Korean, Spanish, and Indian, for example. Also note that the welcome prompt didn't list every flavor profile available. We don't want to overwhelm our customer with too many choices. Three is fine; keep it to fewer than five.
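To make the flavor-to-cuisine mapping more concrete, here is a minimal sketch of reading entity resolution results out of a slot. It assumes the slot object follows the shape Alexa sends in an IntentRequest (`resolutions.resolutionsPerAuthority`), and that flavors such as "spicy" are defined as synonyms of several cuisine values. The helper name `getResolvedValues` and the sample slot data are made up for illustration and are not taken from The Foodie's source.

```javascript
// Sketch: collect the canonical values that entity resolution matched for a slot.
// The slot shape mirrors the IntentRequest JSON; the sample data below is invented.
function getResolvedValues(slot) {
  const authorities =
    (slot && slot.resolutions && slot.resolutions.resolutionsPerAuthority) || [];
  const names = [];
  for (const authority of authorities) {
    // Only keep authorities that reported a successful match.
    if (authority.status && authority.status.code === 'ER_SUCCESS_MATCH') {
      for (const entry of authority.values || []) {
        names.push(entry.value.name);
      }
    }
  }
  return names;
}

// Hypothetical slot for "I want something spicy", where "spicy" is assumed
// to be a synonym of both the Korean and Spanish cuisine values.
const cuisineSlot = {
  name: 'cuisine',
  value: 'spicy',
  resolutions: {
    resolutionsPerAuthority: [
      {
        status: { code: 'ER_SUCCESS_MATCH' },
        values: [
          { value: { name: 'Korean', id: 'KOREAN' } },
          { value: { name: 'Spanish', id: 'SPANISH' } },
        ],
      },
    ],
  },
};

console.log(getResolvedValues(cuisineSlot)); // [ 'Korean', 'Spanish' ]
```

When a spoken value matches more than one cuisine, the skill can offer the matches as a short list, as in the "Korean or Spanish" prompt from the script.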
Our happy path demonstrates the ideal straight-line path that our customer follows through our skill. But just like sailing in a boat, it's highly likely that the conversation will deviate. Instead of wind, gravity, and currents pushing and pulling you off course, the open-ended nature of conversation will do the pushing and pulling for you.
Instead of tacking, we can use dialog management to collect information conversationally. Dialog management keeps track of which information is required, the state of the conversation, and which slots have and haven't been collected. In your skill code, you can inspect that state and decide what to do next. For example, if the customer says, "I want a hamburger," the meal slot is filled. At that point, even if other required slots are still empty, you know what the customer wants, so you can look up nearby restaurants that sell hamburgers and make a recommendation.
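The decision described above could be sketched roughly as follows. This is an illustration and not The Foodie's actual code: it assumes the request object has the shape of an Alexa IntentRequest (with `dialogState` and `intent.slots`), uses the `meal` slot name from the utterances above, and returns hypothetical labels (`RECOMMEND_RESTAURANTS`, `RECOMMEND_MEALS`, `DELEGATE`) that a real handler would translate into responses.

```javascript
// Sketch: decide whether to keep collecting slots or recommend right away.
// The return labels are hypothetical; a real skill would build responses here.
function hasValue(slots, name) {
  return Boolean(slots && slots[name] && slots[name].value);
}

function decideNextStep(request) {
  const slots = (request.intent && request.intent.slots) || {};

  // The customer already told us the meal, so the remaining slots can be skipped.
  if (hasValue(slots, 'meal')) {
    return 'RECOMMEND_RESTAURANTS';
  }

  // Every required slot has been collected through the dialog.
  if (request.dialogState === 'COMPLETED') {
    return 'RECOMMEND_MEALS';
  }

  // Otherwise, let Alexa prompt for the next empty required slot.
  return 'DELEGATE';
}

// "I want a hamburger" fills the meal slot on the very first turn.
const hamburgerRequest = {
  type: 'IntentRequest',
  dialogState: 'STARTED',
  intent: { slots: { meal: { name: 'meal', value: 'hamburger' }, cuisine: { name: 'cuisine' } } },
};

console.log(decideNextStep(hamburgerRequest)); // RECOMMEND_RESTAURANTS
```

Checking the meal slot before the dialog state is what lets the skill cut the conversation short as soon as it has enough to act on.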
Once we've vetted our design, it's time to finally start implementing. To build your voice user interaction model, you will need to:
You can find detailed instructions for creating your voice user interface, from step 1 to step 3, in the Designing for Conversation course.
Following the steps above, you will end up with a voice user interaction model that supports dialog management. If you created a skill from scratch, the default behavior is for the skill to automatically delegate. For The Foodie, we want fine-grained control, which will allow us to decide when to stop collecting slots. For example, if our customer were to say, "I want a cheeseburger," we can skip asking for the cuisine slot because we already know what meal they want to eat.
You'll want to turn off automatic delegation by setting the Dialog Delegation Strategy to disable auto delegation.
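For readers who prefer editing the interaction model JSON directly, the fragment below sketches where this setting lives, written as a JavaScript object so it can sit next to the rest of the code. It assumes the `dialog.delegationStrategy` field with the value `SKILL_RESPONSE` (as opposed to `ALWAYS`), which is worth confirming against the current interaction model schema. The intent name `RecommendationIntent`, the slot type `cuisineType`, and the prompt ID are hypothetical placeholders.

```javascript
// Sketch of the dialog section of an interaction model, as a JavaScript object.
// Assumption: 'SKILL_RESPONSE' turns auto delegation off and 'ALWAYS' turns it on.
const interactionModelFragment = {
  interactionModel: {
    dialog: {
      // Skill-wide setting: the skill decides when to return Dialog.Delegate.
      delegationStrategy: 'SKILL_RESPONSE',
      intents: [
        {
          name: 'RecommendationIntent', // hypothetical intent name
          confirmationRequired: false,
          slots: [
            {
              name: 'cuisine',
              type: 'cuisineType', // hypothetical slot type
              elicitationRequired: true, // marks the slot as required
              confirmationRequired: false,
              prompts: { elicitation: 'Elicit.Slot.cuisine' }, // hypothetical prompt ID
            },
          ],
        },
      ],
    },
  },
};

console.log(JSON.stringify(interactionModelFragment, null, 2));
```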
Now that you've turned off automatic delegation, you'll need to update your backend code so that it returns a Dialog.Delegate directive, which tells Alexa to prompt for the next empty required slot. The following code will do so:
return handlerInput.responseBuilder
    .addDelegateDirective()
    .getResponse();
The addDelegateDirective() function adds the Dialog.Delegate directive to the response JSON that our skill sends back to the Alexa service. The Alexa service will then figure out whether or not your intent still has required slots that need to be filled. For example, if our customer said, "I want Japanese food" and our skill's backend returned the Dialog.Delegate directive, Alexa would automatically prompt for the allergies slot, using the prompt for that slot that we defined in our voice user interaction model.
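To show where that return statement might live, here is a sketch of a request handler written in the `canHandle`/`handle` shape used by the ASK SDK for Node.js. The handler name and the intent name `RecommendationIntent` are hypothetical, and the handler reads the request envelope directly so that the snippet stays free of imports. Treat it as one possible arrangement, not The Foodie's actual handler.

```javascript
// Sketch: delegate to Alexa while the dialog for the intent is still in progress.
const InProgressRecommendationHandler = {
  canHandle(handlerInput) {
    const request = handlerInput.requestEnvelope.request;
    return request.type === 'IntentRequest'
      && request.intent.name === 'RecommendationIntent' // hypothetical intent name
      && request.dialogState !== 'COMPLETED';
  },
  handle(handlerInput) {
    // Adds Dialog.Delegate to the response so Alexa prompts for the
    // next empty required slot using the prompts in the interaction model.
    return handlerInput.responseBuilder
      .addDelegateDirective()
      .getResponse();
  },
};
```

In a real skill, this handler would be registered with the skill builder alongside a second handler that takes over once `dialogState` is `COMPLETED`.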
You can find more information about dialog management and how it relates to The Foodie in steps 4 and 5 of the Designing for Conversation course. You can also take a look at The Foodie source code on GitHub.
Dialog management is a great tool for collecting the information your skill needs through conversation. It also improves accuracy, because the Alexa service can focus on the slot the customer is going to fill next. The very nature of conversation is dynamic. If your customer provides only some of the information your skill needs, your skill will be able to follow up with questions until everything it needs has been provided.
Like a sailor who relies on tacking to constantly course correct to sail to their destination, voice designers will want to utilize dialog management to handle the dynamic nature of conversation. Dialog management allows you to course correct when the information customers provide your skill is too little or just right.
Now that you've read through this post, try to think about how you can put these techniques and features to use in your own skills. Let's continue the discussion online! You can find me on Twitter @SleepyDeveloper.