Creating an Alexa skill is like cooking a delicious meal. There are many recipes to choose from and several ingredients that go into each one. The Alexa Skill-Building Cookbook on GitHub gives you the recipes and ingredients to build engaging Alexa skills, using short code samples and guidance for adding features to your voice experience. With each installment of the Alexa skill recipe series, we’ll introduce you to a new recipe that can help you improve your voice design and skill engagement. You’re the chef. Let’s get cooking!
Alexa devices with screens, like Echo Show and Echo Spot, have opened up a new world of possibilities for what you can do with voice. These devices let you extend your voice experience to show additional information or support touch interactions, such as picking an item from a list or watching a video. In these cases, a visual user interface can be a great complement to a voice-first experience.
As a skill builder, you can choose whether or not to specifically support a particular interface, such as screen display. For the best customer experience, however, you should plan to create a conditional workflow so that customers who use "headless" devices like Amazon Echo or Echo Plus can have an optimized experience, and so can the customers accessing your skill from Echo Show or Echo Spot. Even if the screen experience is not the focus of your skill, you should still think about how visual components could enhance your skill on devices with screens. For example, it wouldn't make sense for Alexa to say, "select an item on the screen for more information" if your skill is invoked from a headless Echo device.
Not every device has a screen, but customers who use one will quickly come to expect that your skill supports it. In fact, even if you take no steps to support screen display, the cards you provide for the Alexa app are shown on the screen device, if that is what the customer is using. If, however, you want to take full advantage of the options a screen provides, such as selecting an image from a list or playing a video, you must explicitly support screen display in your code. To do that, your skill needs to detect whether the device a request comes from supports display.
The JSON request your skill receives includes all the information you need to determine not only whether the device has a screen, but also whether it supports other interfaces, like AudioPlayer and VideoApp. Note, however, that the request does not distinguish between Echo Show and Echo Spot. Let's look closely at the JSON requests received from a variety of Alexa devices: Echo (no screen) and Echo Show/Echo Spot (screen).
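For reference, here is the relevant portion of the request's `context` node on a device with a screen (structure per the Alexa request format; values trimmed for brevity):

```json
{
  "context": {
    "System": {
      "device": {
        "supportedInterfaces": {
          "AudioPlayer": {},
          "Display": {
            "templateVersion": "1.0",
            "markupVersion": "1.0"
          },
          "VideoApp": {}
        }
      }
    }
  }
}
```

On a headless device such as Echo, the `supportedInterfaces` node contains only `"AudioPlayer": {}`.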
As you can see, the requests differ in the "supportedInterfaces" node, which reflects the interfaces available on the device the request came from. Here's a table showing the interfaces available for Echo, Echo Show, Echo Spot, and Echosim.io.
| Device | AudioPlayer | Display | VideoApp |
| --- | --- | --- | --- |
| Echo Show/Echo Spot | Yes | Yes | Yes |
| Echo/Echo Dot/Echo Plus, Echosim.io | Yes | No | No |
First, for your skill to render on display devices, you need to enable the Display interface for your skill in the developer console, under the Interfaces section.
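If you manage your skill with the ASK CLI instead, the same setting lives in the skill manifest (`skill.json`): enabling the Display interface adds a `RENDER_TEMPLATE` entry to the interfaces list (excerpt below; other manifest fields omitted):

```json
{
  "manifest": {
    "apis": {
      "custom": {
        "interfaces": [
          {
            "type": "RENDER_TEMPLATE"
          }
        ]
      }
    }
  }
}
```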
Now that we understand how to determine whether a device has a display from the JSON request, let's look at a simple helper function you can include in your skill to optimize for display devices.
Step 1: Include this helper function in your skill code to detect whether the device has a display.
As you can see from the JSON above, to determine whether the device supports display, we need to check whether the "Display" node exists within the "supportedInterfaces" node of the request. Here's a helper function that does that for you:
// Returns true if the skill is running on a device with a display (Echo Show or Echo Spot)
function supportsDisplay() {
  var hasDisplay =
    this.event.context &&
    this.event.context.System &&
    this.event.context.System.device &&
    this.event.context.System.device.supportedInterfaces &&
    this.event.context.System.device.supportedInterfaces.Display;
  return !!hasDisplay; // coerce to a boolean, since Display is an object when present
}
You can also see this function in action in the quiz skill template.
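If you prefer something easier to unit test, the same check can be written as a pure function that takes the request envelope as an argument instead of reading `this.event` (a sketch; `supportsDisplayFor` and the trimmed request objects below are illustrative, not part of the SDK):

```javascript
// Pure variant of the helper: takes the raw JSON request instead of using this.event
function supportsDisplayFor(event) {
  return !!(
    event &&
    event.context &&
    event.context.System &&
    event.context.System.device &&
    event.context.System.device.supportedInterfaces &&
    event.context.System.device.supportedInterfaces.Display
  );
}

// Example: a trimmed request from a device with a screen...
var showRequest = {
  context: {
    System: {
      device: {
        supportedInterfaces: { AudioPlayer: {}, Display: {}, VideoApp: {} }
      }
    }
  }
};

// ...and one from a headless Echo
var echoRequest = {
  context: {
    System: {
      device: {
        supportedInterfaces: { AudioPlayer: {} }
      }
    }
  }
};

console.log(supportsDisplayFor(showRequest)); // true
console.log(supportsDisplayFor(echoRequest)); // false
```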
Step 2: Call the helper function from within your intent to check whether the device has a display.
'findMovieByGenreIntent': function () {
  // Check if the device has a display by calling our supportsDisplay helper;
  // .call(this) gives the helper access to the JSON request via this.event
  if (supportsDisplay.call(this)) {
    // device has a display
  } else {
    // device does not have a display
  }
}
Step 3: Respond differently (display vs. no-display).
Generally speaking, customers respond with different utterances and take different actions depending on whether they can see a screen while using your skill. Now that your skill can detect whether a device has a display, your skill service code should account for both types of interaction.
Here's an example where, after detecting that a device has a display, we render a visual response using one of the body templates provided by the Alexa Skills Kit.
const Alexa = require('alexa-sdk');
const makePlainText = Alexa.utils.TextUtils.makePlainText;
const makeRichText = Alexa.utils.TextUtils.makeRichText;
const makeImage = Alexa.utils.ImageUtils.makeImage;

'findMovieByGenreIntent': function () {
  var title = 'Mission Impossible';
  var description = 'Ethan Hunt and his IMF team, along with some familiar allies, race against time after a mission gone wrong.';
  var imageURL = 'https://images-na.ssl-images-amazon.com/images/M/MV5BMTc3NjI2MjU0Nl5BMl5BanBnXkFtZTgwNDk3ODYxMTE@._V1_SY1000_CR0,0,675,1000_AL_.jpg';
  var speechOutput;
  // Check if the device has a display by calling our supportsDisplay helper
  if (supportsDisplay.call(this)) {
    // Device has a display: generate the display directive using a template,
    // along with the speech output
    speechOutput = description;
    const builder = new Alexa.templateBuilders.BodyTemplate1Builder();
    const template = builder.setTitle(title)
      .setBackgroundImage(makeImage(imageURL))
      .setTextContent(makeRichText(description), null, null)
      .build();
    this.response.renderTemplate(template);
  } else {
    // Device does not have a display: respond with speech only
    speechOutput = "Here's your movie: " + title + '. ' + description;
  }
  this.response.speak(speechOutput);
  this.emit(':responseReady');
}
For more recipes, visit the Alexa Skill Building Cookbook on GitHub.
Here are some more resources for designing multimodal skills for devices with screens:
Every month, developers can earn money for eligible skills that drive some of the highest customer engagement. Developers can increase their level of skill engagement and potentially earn more by improving their skill, building more skills, and making their skills available in the US, UK, and Germany. Learn more about our rewards program and start building today. Download our guide or watch our on-demand webinar for tips to build engaging skills.