Getting Started with the Cake Time Course: Designing the Voice User Interface (VUI) for Your Alexa Skill

Justin Jeffress Jun 04, 2019

Editor’s Note: We recently launched a new training course called Cake Time: Build an Engaging Alexa Skill. The course is a self-paced, skill-building course that offers guidance on how to build a high-quality skill from start to finish. In this 4-part blog series, we’re going to dive deep into the coding modules within the course.

Editor’s Note: We have changed the name of the Alexa skill in our beginner tutorial from Cake Walk to Cake Time given the term’s racially insensitive history.

Great voice design is the foundation for an engaging Alexa skill, which is why it is the starting point for the Cake Time course. Before we dive into design, I want you to take a moment and ask yourself the following: “Who leads the interaction between customers and an Alexa skill? Alexa or the Customer?”

Design is an important part of building voice-first user interfaces. Unlike screen-based experiences, a major part of designing the experience is mimicking a human conversational partner. Even simple skills can benefit from a well-thought-out design. Before you write one line of code, you should think through how your customers will interact with your skill. At the very least, your skill should be useful, simple, and sticky. Useful skills provide value to customers. Simple skills surprise and delight customers. By simple, I mean simple to interact with; no one likes calling a customer service phone tree. Sticky skills retain customer interest and inspire customers to keep coming back. As you can see, your customer is a major part of the design. Customers interact with your skill through conversation, so you should always start from the conversation and work back to a solution.

When my colleagues and I came together to bake the Cake Time sample skill (pun intended) for our course, we started with a simple idea and moved immediately into thinking about how we’d converse with the skill. We let the conversation drive the design and the features we would need to implement. We made sure the interaction was simple and that the experience would inspire our customer to keep coming back for more. We started with our happy path, which represents the simplest interaction our customer would have with our skill. Once we finished the design, we wrote a short description of the skill. Here’s what we ended up with:

Cake Time is a skill that celebrates your birthday! Tell it your birthday to have it count down the days. Interact with the skill on your special day to hear a happy birthday message.

The skill is useful, simple to use, and sticky. It gives the user a reason to keep checking in and using the skill day after day. We also thought about how we could monetize the skill. Once our customer shares their date of birth with the skill, we’ll know their astrological sign, so we can offer a daily horoscope for a monthly subscription. For the rest of this post, we’ll focus on how we designed the core voice experience for Cake Time.

To help guide our conversation-focused design process, we threw away flow charts and started with situational design, which is a framework for designing voice-first experiences.

Starting with Situational Design

Situational design is a voice-first method for designing a voice user interface. You start with a simple dialog that helps keep the focus of your skill on the conversation. Each interaction between your customer and the skill represents a turn. Each turn has a situation that represents the context. If it’s the customer’s first time interacting with the skill, there is a set of data that is not yet known. If the customer is in the middle of playing a game, the situation is “game in progress.” The situation affects how the skill will respond to the customer. For example, a shopping skill with an empty shopping cart shouldn’t offer to let the customer check out. Likewise, if the customer asks to check out with an empty cart, the skill’s response should indicate that their cart is empty and assist them with shopping.
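To make the shopping cart example concrete, here is a minimal sketch in plain Python of a response that depends on the situation. The function name and the spoken wording are assumptions made up for illustration; they are not part of the Cake Time course or of any Alexa API.

```python
def checkout_response(cart_items):
    """Pick a reply to "check out" based on the situation (hypothetical helper)."""
    if not cart_items:
        # Situation: cart empty. Say so and steer the customer back to shopping.
        return "Your cart is empty. What would you like to shop for?"
    # Situation: cart has items. Offering checkout makes sense here.
    count = len(cart_items)
    noun = "item" if count == 1 else "items"
    return f"You have {count} {noun} in your cart. Ready to check out?"


print(checkout_response([]))
print(checkout_response(["candles", "cake mix"]))
```

The same utterance gets a different answer in each situation, which is the idea each card in a storyboard captures.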

Now it’s time to revisit the question I asked at the beginning of this post. Who leads the interaction between customers and an Alexa skill? Alexa or the Customer? If you said Alexa, you’re incorrect. From the start of the interaction, your customer leads the conversation. They call out to your skill and it responds based upon the context (or the situation).

Take a look at the script below. Our customer started the conversation and our skill responded. We’ve drawn a blue box around each call and response that represents a turn in the interaction. We’ve also drawn a red box around the situation.

Once you’ve converted your happy path into cards, you can start trying to break your happy path by testing out unexpected utterances. For each unexpected utterance you can create a new card to handle it. This approach is much better than a flow chart because you’re focused on the dialog and not on arrows.

Now that we have a basic understanding of how situational design works, let’s identify our scenarios for Cake Time and walk through the storyboards that represent them. My colleagues and I identified three main scenarios:

Birthday Unknown
Birthday Known, Not Birthday
Birthday Known, Birthday
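
As a rough sketch, the three scenarios can be told apart with two pieces of context: whether a birthday is stored, and today’s date. The `Situation` enum and `classify` function below are hypothetical names used only for illustration, assuming the stored birthday is a Python `date` (or `None` when it is unknown).

```python
from datetime import date
from enum import Enum
from typing import Optional


class Situation(Enum):
    BIRTHDAY_UNKNOWN = "Birthday Unknown"
    KNOWN_NOT_BIRTHDAY = "Birthday Known, Not Birthday"
    KNOWN_BIRTHDAY = "Birthday Known, Birthday"


def classify(birthday: Optional[date], today: date) -> Situation:
    """Map the stored birthday and today's date to one of the three scenarios."""
    if birthday is None:
        return Situation.BIRTHDAY_UNKNOWN
    # Only month and day matter; the birth year is in the past.
    if (birthday.month, birthday.day) == (today.month, today.day):
        return Situation.KNOWN_BIRTHDAY
    return Situation.KNOWN_NOT_BIRTHDAY


print(classify(None, date(2019, 6, 4)).value)
print(classify(date(1981, 7, 12), date(2019, 6, 4)).value)
print(classify(date(1981, 7, 12), date(2019, 7, 12)).value)
```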

Let’s take a deep dive into the storyboards we created for each scenario.

Scenario 1: Birthday Unknown

The first time the customer opens the skill, their birthday is unknown. Our skill needs to ask our customer when they were born. Since we started with the happy path, we assumed our customer provided the month, day, and year they were born in their response. We identified that we would need to save this information somewhere; otherwise we’d have to ask them every time they opened the skill. We also wanted to indicate that we understood the user, so we used an implicit confirmation by repeating back their birthday. The storyboard below represents our happy path.

We’ve accounted for our happy path, and now we need to think about how things could differ. Our customer could just say, “November seventh” leaving out the year. We identified that we need a new card. Our situation is month: known, day: known, year: unknown. Before we create our new card, we’re also going to make some updates to our second card. We have incomplete information and we need to follow up with a question. So instead of using an implicit confirmation, we’ll change our response to give some context to the question we’re about to ask. So we’ll say, “I was born in two thousand fifteen.” This will be our prompt. Our question will be, “What year were you born?” Our new card will represent the customer telling us the year. Now that we know the month, day and year, we can respond with our implicit confirmation.

To support an unexpected utterance, all we had to do was add a new card and shift our implicit confirmation over to the new one. We can also create a storyboard to represent the case where our customer told us only the month. The situation would be month: known, day: unknown, year: unknown. In this case we can reuse our card that asks for the year, add a new card that asks for the day, and insert it between the month card and the year card. By now you can see that this is so much more flexible than a flow chart.
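
One way to picture these cards is as a function from the situation (which parts of the birthday are known) to the next thing the skill says. This is only a sketch under assumptions: `next_prompt` is a hypothetical helper, and every line of wording other than the year prompt and question quoted above is invented for illustration.

```python
from typing import Optional


def next_prompt(month: Optional[str], day: Optional[int], year: Optional[int]) -> str:
    """Return the next response for the current situation (hypothetical helper)."""
    if month is None:
        # Situation: nothing known yet. Wording is illustrative.
        return "When were you born?"
    if day is None:
        # Situation: month known, day unknown. Wording is illustrative.
        return f"What day in {month} were you born?"
    if year is None:
        # Situation: month and day known, year unknown.
        # Prompt gives context, then the question follows.
        return "I was born in two thousand fifteen. What year were you born?"
    # Situation: everything known. Implicit confirmation (wording is illustrative).
    return f"Thanks, I'll remember that you were born {month} {day}, {year}."


print(next_prompt("November", 7, None))
print(next_prompt("November", 7, 2015))
```

Adding support for a new unexpected utterance means adding one more branch, which mirrors adding one more card to the storyboard.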

Our customer could also say, “Today!”, or “Next Tuesday!”, but we decided to leave that up to you to figure out how to solve. Hint: You can use the AMAZON.DATE slot type.

Now that we have storyboards that can handle collecting our customer’s birthday, let’s take a look at how the skill responds when it’s not their birthday.

Scenario 2: Birthday Known, Not Birthday

When we know our customer’s birthday and they open the skill on any day but their birthday, we need to count down the number of days until their next one. In this case we only have one turn, one card.
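
The countdown itself is simple date arithmetic. Here is a minimal sketch using Python’s standard `datetime` module; it is not the course’s implementation. The function name is hypothetical, and treating a February 29 birthday as March 1 in non-leap years is an assumption made for this example.

```python
from datetime import date


def days_until_birthday(birthday: date, today: date) -> int:
    """Days from today until the next occurrence of the birthday's month and day."""

    def occurrence(year: int) -> date:
        try:
            return birthday.replace(year=year)
        except ValueError:
            # February 29 in a non-leap year: assume March 1 (an illustrative choice).
            return date(year, 3, 1)

    next_birthday = occurrence(today.year)
    if next_birthday < today:
        # This year's birthday has already passed, so count to next year's.
        next_birthday = occurrence(today.year + 1)
    return (next_birthday - today).days


print(days_until_birthday(date(1981, 7, 12), date(2019, 6, 4)))  # 38
```

A result of 0 means today is the birthday, which is the situation the third scenario handles.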

Scenario 3: Birthday Known, Birthday

When our customer opens the skill on their birthday, we need to wish them happy birthday. In this case we will only have one turn, one card.

Changing the Birthday

Our skill already supports changing our customer’s birthday. Our customer can say, “Alexa, tell Cake Time I was born July 12, 1981.” and the skill will update their birthday. This is known as a one-shot since it will go straight to our CaptureBirthdayIntent. Some customers may not be aware of how to use a one-shot, so we should think about how to modify the skill so that customers can change their birthday once the skill has been opened. That’s a challenge I’ll leave to you.

Implementing Cake Time

Thanks to situational design, we have created a useful, simple, and sticky skill. We can now start building our skill. We’re going to build it over a series of blog posts, which will allow us to zoom in on specific concepts throughout the skill-building process. We’ll start with creating a simple interaction that just greets the user and says, “Hello! I am Cake Time. That was a piece of cake! Bye!”

If you want to start building right away, you can follow the steps below:

Create an account on developer.amazon.com/alexa
Navigate to the developer console
- Click ‘Create Skill’
- Name your skill ‘Cake Time’
- Select ‘Custom’ and ‘Hosted Skill’
- The infrastructure to support your skill will be provisioned - this will take a few moments.
Skills are composed of a front end and a back end. The front end is where you map spoken phrases (what the user says, which we call “utterances”) into a desired action, which we call an “intent.” The intent is what you want to have happen as a result of what the user said. It’s up to you in your back end to decide how to handle the user’s intent.
The first thing a user will want to do with your skill is open it! Opening your skill, in this case, is their intent. Opening a skill is a special kind of intent known as a LaunchRequest that is built into the experience, so you don’t need to define it in your front end. However, you do have to respond to it in your back end skill code.
The user is likely to say something like “Alexa, open Cake Time!” The built-in launch request will understand this intent. It’s up to us to go to the back end and define how to handle it.
Click Code.
A handler is the Alexa Skills Kit (ASK) Software Development Kit’s (SDK) way of defining a reaction to a user’s request. There are two pieces to a handler: a canHandle function and a handle function. The canHandle function is where you define whether or not the particular handler can deal with the incoming user request. For example, if your skill receives a LaunchRequest, the canHandle function within each handler determines whether or not that handler can service the request. In this case, the user wants to launch the skill. So the canHandle function within the LaunchRequestHandler will ‘raise its hand’ to let the SDK know “I can handle this!” In computer terms, canHandle returns true to confirm that it can do the work.
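The ‘raise its hand’ idea can be modeled in a few lines of plain Python. This is a simplified stand-in, not the ASK SDK: the classes, the `dispatch` function, and the help wording are hypothetical, and requests are reduced to small dictionaries carrying only the fields the example needs.

```python
class LaunchRequestHandler:
    def can_handle(self, request: dict) -> bool:
        # Raise a hand only for launch requests.
        return request.get("type") == "LaunchRequest"

    def handle(self, request: dict) -> str:
        return "Hello! I am Cake Time. That was a piece of cake! Bye!"


class HelpIntentHandler:
    def can_handle(self, request: dict) -> bool:
        return (request.get("type") == "IntentRequest"
                and request.get("intent", {}).get("name") == "AMAZON.HelpIntent")

    def handle(self, request: dict) -> str:
        return "Tell me your birthday and I'll count down the days."  # illustrative wording


def dispatch(request: dict, handlers: list) -> str:
    """Ask each handler in turn; the first one that can handle the request wins."""
    for handler in handlers:
        if handler.can_handle(request):
            return handler.handle(request)
    raise LookupError("No handler could service this request")


handlers = [LaunchRequestHandler(), HelpIntentHandler()]
print(dispatch({"type": "LaunchRequest"}, handlers))
```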
So what do we want to happen after the user launches the Cake Time skill? How should the skill respond? In this case, we want the skill to simply confirm that the user accessed it. Let’s have the skill respond “Hello! I am Cake Time. That was a piece of cake! Bye!”
To make the skill respond, we need to zero in on the handle function within the LaunchRequestHandler. This handle function will use something in the SDK called responseBuilder to compose and return the response to the user when they say “Open Cake Time”. You’ll notice that there are already some lines of code there. Let’s briefly examine each one.
- First, you’ll see a variable called speechText. This is set to contain the string of words the skill should say back to the user when they launch the skill. Right now what it says doesn’t really make sense for our skill. So, let’s change it! For those who aren’t familiar with programming, strings are encapsulated in double or single quotes. So, to change the string, just replace the text within the quotes to say “Hello! I am Cake Time. That was a piece of cake! Bye!”
- Underneath the speech text, you’ll see a line that begins with the word return. We’ll get back to what this means in just a moment. Following the return you’ll see handlerInput.responseBuilder. Remember that we mentioned this earlier? This helpful piece of the SDK will build our response to the user.
- On the next line you’ll see .speak(speechText). Recognize speechText? Calling the .speak() function and passing speechText to it tells responseBuilder to speak the value of speechText back to the user.
- Next, there’s a line that calls .reprompt(). If our skill were going to listen for the user’s response, we would use this. In this case, we want the skill to speak, and then exit. So, we’ll omit this line of code for now. The easy way to do that is to place a double slash // in front of it.
- Last, you’ll see .getResponse(). This converts the responseBuilder’s work into the response that our skill will return. You can think of it like hitting the send button.
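To show how the chained calls fit together, here is a small Python stand-in for the builder pattern described in the bullets above. It is a sketch, not the SDK’s responseBuilder: the `ResponseBuilder` class, its snake_case method names, and the dictionary it produces are simplified assumptions for illustration only.

```python
class ResponseBuilder:
    """Simplified stand-in for the SDK's response builder (hypothetical)."""

    def __init__(self):
        # With no reprompt, the skill speaks and then exits.
        self._response = {"shouldEndSession": True}

    def speak(self, text: str) -> "ResponseBuilder":
        self._response["outputSpeech"] = {"text": text}
        return self  # returning self is what makes chaining possible

    def reprompt(self, text: str) -> "ResponseBuilder":
        # A reprompt means the skill keeps listening for the user's reply.
        self._response["reprompt"] = {"text": text}
        self._response["shouldEndSession"] = False
        return self

    def get_response(self) -> dict:
        # The "send button": hand back the finished response.
        return self._response


speech_text = "Hello! I am Cake Time. That was a piece of cake! Bye!"
response = (
    ResponseBuilder()
    .speak(speech_text)
    # .reprompt(speech_text)  # omitted: we want the skill to speak, then exit
    .get_response()
)
print(response)
```

Each method returns the builder itself, so the calls can be stacked one per line, and commenting out a single line removes that step without touching the rest.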
Now you’ve built the code that will service the skill’s LaunchRequest. Before doing anything else, we need to save our changes and deploy the code. Click Save, then click Deploy and wait for the deployment to finish.
Now we can test! Click on the Test tab. First we’ll need to enable our skill for testing. Click on the drop-down menu and select Development. You have not published your skill to the store, but you can test it from this console in development mode.
- There are two ways to test. You can type to the skill what the user is saying (be careful, spelling matters!) or, you can speak to the skill by clicking and holding on the microphone icon and speaking.
- Our skill, Cake Time, has one intent. That special intent, known as the LaunchRequest, responds to the user when they ask Alexa to open or launch the skill. The user will say, “Alexa, open Cake Time.” Cake Time is the name of our skill, and also the ‘invocation name’. When we named the skill, its invocation name was automatically set to “cake time”. This can be changed from the invocation name settings area of the Build tab, but we’ll leave it as is. Go ahead and try using the skill with “Alexa, open Cake Time”.
- You can see, and hopefully hear, Alexa responding in the way that we programmed!
Congratulations! You’ve laid the foundation for Cake Time. Stay tuned for the next post in this series to learn how to collect the month, day and year slots with auto-delegation.

Conclusion

Situational design is a great way to keep yourself focused on the conversations that your customers have with your skill. The process is more adaptive than a flow chart. It helped us build a solid representation of our skill and we were able to identify some holes in the experience. To get started with situational design, think about a skill that you want to build or one that you built in the past. Start with your happy path and turn each turn into a card, then combine those cards into a storyboard. Then try to break your happy path with unexpected utterances. Create a new card for each unexpected utterance. This will help you identify holes, necessary features, and technology that you may have missed during your initial planning.

Let’s keep in touch. If you have any skill ideas or questions about situational design, please reach out to me on Twitter @SleepyDeveloper.
