Build for Dialog: Solving the Order Problem When Designing Conversational Alexa Skills : Alexa Blogs


Understanding Designing for Conversation

The Foodie is a conversational skill because it adheres to the 4 Hallmarks of Conversational Skills. The Foodie skill can:

Collect information from the customer through natural back and forth dialog
Solve a complex problem for customers without creating confusion
Confirm understanding with customers to ensure shared context
Remember past interactions to make future ones quicker and simpler

Each of these hallmarks matters. Let's take a look at how odd the interaction would be if The Foodie required the customer to provide all the information in one breath:

That was awkward, wasn't it? Luckily, The Foodie follows the 4 Hallmarks of Conversational Skills and was not designed from flow charts. It's flexible and allows the customer to answer questions in any order they choose. It is forgiving and will respond with follow-up questions until the minimum set of required slots has been collected. If the customer already knows they want a hamburger for lunch, they can simply say so and the skill will recommend a few restaurants with hamburgers on the menu.

Let's take a look at how we can avoid building a rigid phone-tree menu system into our skill and enable more conversational dialog.

Design the Conversation and Identify Slots

You'll want to start by writing a happy path script. The happy path script is the ideal conversation that you envision your customer having with your skill. It allows you to focus on defining the core set of features for your skill. You'll want to write multiple happy path scripts to demonstrate how interactions with your skill will change over time and space. For more information about designing your happy path, check out "Lesson 5: Script it out" in our free online course, Designing for Conversation.

Starting from our first-time use happy path, we wrote the following script and determined what information we need to collect from our customer in order to provide a meal and restaurant recommendation.

Customer: Alexa, open the foodie

Alexa: Howdy! Welcome to the Foodie! I'll help you find the right food right now. To make that easier, you can give me permission to access your location, just check the Alexa app. How rude of me. I forgot to ask. What's your name?

Customer: My name is Justin.

Alexa: Hi Justin, nice to meet you. I recommend cuisines based on flavor profiles. You can say things like spicy, savory, greasy, and fresh. What sounds good?

Customer: I want something savory and spicy.

Alexa: Ah! I'd recommend something Korean or Spanish. Which of those sounds better right now?

Customer: Korean sounds great!

Alexa: Great choice! Do you have any food allergies or preferences that I should be aware of? You can say peanuts, shellfish, or I'm a vegetarian.

Customer: I'm allergic to shellfish and I'm pescatarian.

Alexa: I'll remember not to recommend anything with shellfish and that you eat plants and fish. Great, I've found 3 meals: Domi Maeuntang, Mae Un Tang and Daegu Jorim. Which sounds best?

Customer: Mae Un Tang sounds great!

Alexa: Would you like to cook, go out, or eat in?

Customer: I want to get out of the office.

Alexa: Ok, one last thing, so that I can recommend something close by, what's your location?

Customer: Seattle, Washington

Alexa: Great. There are two Korean restaurants nearby serving something similar to Mae Un Tang: Korean Bamboo and One Pot. Which would you like to visit?

Customer: Korean Bamboo.

Alexa: Great! I've sent the address to the Alexa app.

The information The Foodie needs comes from the customer's utterances, so let's take a closer look at those.

I want something savory and spicy.
Korean sounds great!
I'm allergic to shellfish and I'm pescatarian.
Mae Un Tang sounds great!
I want to get out of the office.
Seattle, Washington.
Korean Bamboo.

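Each utterance carries a piece of information The Foodie needs: the flavors (savory and spicy), the cuisine (Korean), the allergy (shellfish), the diet (pescatarian), the meal (Mae Un Tang), where to eat (get out of the office), the location (Seattle, Washington), and the restaurant (Korean Bamboo). These pieces will become the slots The Foodie uses to capture our customer's preferences. Let's take a look at the utterances after we've converted that information into slots.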

I want something {cuisine}.
{cuisine} sounds great!
I'm allergic to {allergies} and I'm {diet}.
{meal} sounds great!
I want to {diningLocation}.
{city}, {state}.
{restaurantName}.

Notice how the first two utterances both use the {cuisine} slot. We're leveraging entity resolution to map our flavors directly to corresponding cuisines. This way, if the customer asks for spicy and savory, we can offer Korean, Spanish, and Indian, for example. Also note that we didn't list every flavor profile available. We don't want to overwhelm our customer with too many choices. Three is okay, but keep it to fewer than five.
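To make the entity resolution step concrete, here is a minimal sketch of how skill code might read the resolved cuisines from a slot. The `resolutions.resolutionsPerAuthority` layout follows the Alexa request format; the helper name `getResolvedValues`, the slot type name `cuisineType`, and the sample data are assumptions for illustration and are not taken from The Foodie's source.

```javascript
// Sketch: collect the names of all successfully resolved values for a slot.
// The helper name and the sample slot below are hypothetical.
function getResolvedValues(slot) {
  const authorities =
    (slot && slot.resolutions && slot.resolutions.resolutionsPerAuthority) || [];
  const names = [];
  for (const authority of authorities) {
    // Only a successful match carries a `values` array.
    if (authority.status && authority.status.code === 'ER_SUCCESS_MATCH') {
      for (const entry of authority.values || []) {
        names.push(entry.value.name);
      }
    }
  }
  return names;
}

// Hypothetical slot as it might arrive when the customer says "spicy",
// assuming "spicy" is defined as a synonym of more than one cuisine.
const exampleSlot = {
  name: 'cuisine',
  value: 'spicy',
  resolutions: {
    resolutionsPerAuthority: [
      {
        authority: 'amzn1.er-authority.echo-sdk.<skill-id>.cuisineType',
        status: { code: 'ER_SUCCESS_MATCH' },
        values: [
          { value: { name: 'Korean', id: 'KOREAN' } },
          { value: { name: 'Spanish', id: 'SPANISH' } },
        ],
      },
    ],
  },
};

console.log(getResolvedValues(exampleSlot)); // [ 'Korean', 'Spanish' ]
```

When the slot has no resolutions, or the match fails, the helper returns an empty array, so the skill can fall back to asking the question again.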

Use Dialog Management to Add Flexibility to Your Skill

Our happy path demonstrates the ideal straight line path that our customer follows through our skill. But just like sailing in a boat, it's highly likely that the conversation will deviate. Instead of wind, gravity, and currents pushing and pulling you off course, the open-ended nature of conversation will do the pushing and pulling for you.

Instead of tacking, we can use dialog management to collect information conversationally. It keeps track of which information is required, the state of the conversation, and which slots have and haven't been collected. In your skill code, you can inspect that state and decide what to do next. For example, if the customer says, "I want a hamburger," the meal slot is filled. At that point, even if other required slots are still empty, you know what the customer wants, so you can look up nearby restaurants that sell hamburgers and make a recommendation.
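Inspecting the state might look something like the sketch below. The `dialogState` and `intent.slots` fields are part of the Alexa IntentRequest; the function name `summarizeDialog`, the intent name, and the list of required slots are assumptions chosen to match the utterances above.

```javascript
// Sketch: summarize dialog progress from an IntentRequest.
// REQUIRED_SLOTS and the intent name are hypothetical.
const REQUIRED_SLOTS = ['cuisine', 'allergies', 'diet', 'meal'];

function summarizeDialog(request) {
  const slots = (request.intent && request.intent.slots) || {};
  const filled = {};
  const missing = [];
  for (const name of REQUIRED_SLOTS) {
    const value = slots[name] && slots[name].value;
    if (value) {
      filled[name] = value;
    } else {
      missing.push(name);
    }
  }
  return {
    filled,
    missing,
    // Stop collecting once the meal is known or Alexa reports completion.
    canRecommend: Boolean(filled.meal) || request.dialogState === 'COMPLETED',
  };
}

// Hypothetical request after the customer says "I want a hamburger".
const request = {
  type: 'IntentRequest',
  dialogState: 'STARTED',
  intent: {
    name: 'RecommendationIntent',
    slots: {
      cuisine: { name: 'cuisine' },
      allergies: { name: 'allergies' },
      diet: { name: 'diet' },
      meal: { name: 'meal', value: 'hamburger' },
    },
  },
};

const progress = summarizeDialog(request);
console.log(progress.missing);      // [ 'cuisine', 'allergies', 'diet' ]
console.log(progress.canRecommend); // true
```

Keeping this as a plain function over the request makes the "stop early" rule easy to change without touching the handler that builds the response.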

Building the Voice User Interface

Once we've vetted our design, it's time to finally start implementing. To build your voice user interaction model, you will need to:

Create an intent
Define sample utterances
Create slots
Define and assign SlotTypes
Mark the necessary slots as required
Provide prompts for each required slot
Return the Dialog.Delegate directive from your skill's back-end code

You can find detailed instructions for creating your voice user interface in steps 1 to 3 of the Designing for Conversation course.
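As a rough picture of where those steps end up, the sketch below writes the dialog-related parts of an interaction model as a JavaScript object and checks that every required slot has a prompt. The field names (`languageModel`, `dialog`, `prompts`, `elicitationRequired`, `delegationStrategy`) follow the interaction model schema; the intent name, slot type names, prompt ids, and the choice of which slots are required are assumptions, and the custom slot type definitions are omitted for brevity.

```javascript
// Sketch of a partial interaction model. All names below are hypothetical.
const interactionModel = {
  languageModel: {
    invocationName: 'the foodie',
    intents: [
      {
        name: 'RecommendationIntent',
        slots: [
          { name: 'cuisine', type: 'cuisineType' },
          { name: 'allergies', type: 'allergiesType' },
          { name: 'meal', type: 'mealType' },
        ],
        samples: [
          'I want something {cuisine}',
          '{cuisine} sounds great',
          '{meal} sounds great',
        ],
      },
    ],
  },
  dialog: {
    // SKILL_RESPONSE leaves delegation to the skill code (auto delegation off).
    delegationStrategy: 'SKILL_RESPONSE',
    intents: [
      {
        name: 'RecommendationIntent',
        slots: [
          { name: 'cuisine', type: 'cuisineType', elicitationRequired: true,
            prompts: { elicitation: 'Elicit.Slot.cuisine' } },
          { name: 'allergies', type: 'allergiesType', elicitationRequired: true,
            prompts: { elicitation: 'Elicit.Slot.allergies' } },
          { name: 'meal', type: 'mealType', elicitationRequired: false, prompts: {} },
        ],
      },
    ],
  },
  prompts: [
    { id: 'Elicit.Slot.cuisine',
      variations: [{ type: 'PlainText', value: 'What sounds good?' }] },
    { id: 'Elicit.Slot.allergies',
      variations: [{ type: 'PlainText', value: 'Do you have any food allergies?' }] },
  ],
};

// Consistency check: every required slot must reference a defined prompt.
function findMissingPrompts(model) {
  const defined = new Set(model.prompts.map((p) => p.id));
  const missing = [];
  for (const intent of model.dialog.intents) {
    for (const slot of intent.slots) {
      const promptId = (slot.prompts || {}).elicitation;
      if (slot.elicitationRequired && !defined.has(promptId)) {
        missing.push(`${intent.name}.${slot.name}`);
      }
    }
  }
  return missing;
}

console.log(findMissingPrompts(interactionModel)); // []
```

A check like this is optional, but it catches the common mistake of marking a slot as required without giving Alexa anything to say when it is empty.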

Build the Back End with Dialog Management

Following the steps above, you will end up with a voice user interaction model that supports dialog management. If you created a skill from scratch, the default behavior is for the skill to automatically delegate. For The Foodie, we want fine-grained control, which will allow us to decide when to stop collecting slots. For example, if our customer were to say, "I want a cheeseburger," we can skip asking for the cuisine slot because we already know what meal they want to eat.

You'll want to turn off automatic delegation by setting the Dialog Delegation Strategy to disable auto delegation.

Now that you've turned off automatic delegation, you'll need to update your back-end code so it returns the Dialog.Delegate directive, which has Alexa automatically prompt for the next empty required slot. The following code will do so:

return handlerInput.responseBuilder
    .addDelegateDirective()
    .getResponse();

The addDelegateDirective() function adds the Dialog.Delegate directive to the response JSON that our skill sends back to the Alexa service. The Alexa service will then figure out whether or not your intent still has required slots that need to be filled. For example, if our customer said, "I want Japanese food" and our skill's back end returned the Dialog.Delegate directive, our skill would automatically prompt for the allergies slot. Alexa will use the prompt for the allergies slot that we defined in our voice user interaction model.
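Putting the pieces together, a handler with manual delegation could be sketched as follows. The `canHandle`/`handle` shape and the `speak`, `reprompt`, `addDelegateDirective`, and `getResponse` calls follow the ASK SDK for Node.js; the intent name, the `meal` slot check, the `lookUpRestaurants` helper, and the `fakeHandlerInput` stand-in (included only so the sketch runs without the SDK) are assumptions and are not The Foodie's actual code.

```javascript
// Hypothetical lookup; a real skill would query a restaurant data source.
function lookUpRestaurants(meal) {
  return ['Restaurant A', 'Restaurant B'];
}

const RecommendationIntentHandler = {
  canHandle(handlerInput) {
    const request = handlerInput.requestEnvelope.request;
    return request.type === 'IntentRequest'
      && request.intent.name === 'RecommendationIntent';
  },
  handle(handlerInput) {
    const { intent } = handlerInput.requestEnvelope.request;
    const meal = intent.slots && intent.slots.meal && intent.slots.meal.value;

    if (meal) {
      // The meal is known, so stop collecting slots and recommend.
      const names = lookUpRestaurants(meal).join(' and ');
      return handlerInput.responseBuilder
        .speak(`I found ${names} serving ${meal}. Which would you like to visit?`)
        .reprompt('Which would you like to visit?')
        .getResponse();
    }

    // Otherwise let Alexa prompt for the next empty required slot.
    return handlerInput.responseBuilder
      .addDelegateDirective(intent)
      .getResponse();
  },
};

// Minimal stand-in for the SDK's handler input so the sketch runs on its own.
function fakeHandlerInput(request) {
  const response = { directives: [] };
  const responseBuilder = {
    speak(text) { response.speech = text; return this; },
    reprompt(text) { response.reprompt = text; return this; },
    addDelegateDirective(updatedIntent) {
      response.directives.push({ type: 'Dialog.Delegate', updatedIntent });
      return this;
    },
    getResponse() { return response; },
  };
  return { requestEnvelope: { request }, responseBuilder };
}

const input = fakeHandlerInput({
  type: 'IntentRequest',
  dialogState: 'STARTED',
  intent: { name: 'RecommendationIntent', slots: { cuisine: { name: 'cuisine', value: 'Japanese' } } },
});
console.log(RecommendationIntentHandler.handle(input).directives[0].type); // Dialog.Delegate
```

In a real skill, the handler would be registered with the SDK's skill builder and the stand-in would not be needed.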

You can find more information about dialog management and how it relates to The Foodie in steps 4 and 5 of the Designing for Conversation course. You can also take a look at The Foodie source code on GitHub.

Why Dialog Management

Dialog management is a great tool to facilitate collecting the information your skill needs through conversation. It also improves accuracy because the Alexa service has improved focus on which slot the customer is going to fill next. The very nature of conversation is dynamic. If your customer provides some of the necessary information your skill needs, your skill will be able to follow up with questions until all the information that you need has been provided.

Like a sailor who relies on tacking to constantly course correct to sail to their destination, voice designers will want to utilize dialog management to handle the dynamic nature of conversation. Dialog management allows you to course correct when the information customers provide your skill is too little or just right.

Now that you've read through this post, try to think about how you can put these techniques and features to use in your own skills. Let's continue the discussion online! You can find me on Twitter @SleepyDeveloper.
