There are tens of millions of Alexa-enabled devices out there today that allow you to communicate with Alexa across multiple mediums. Amazon just recently launched the new Fire TV Cube, which is the first hands-free 4K Ultra HD streaming media player with Alexa—delivering an all-in-one entertainment experience. Now, Alexa skill developers can create a voice-first experience with a large screen in mind.
Here I’ll walk you through some tips for building an Alexa skill for Fire TV Cube, using core best practices for creating voice-first experiences for devices with screens.
Fire TV Cube supports all published Alexa skills. This means that customers can use the skills you built for any Echo device, including Echo Show and Echo Spot, on Fire TV Cube.
For the most part, you can assume that a customer using Fire TV Cube will interact with your custom skill as they would with any other Echo device with a screen. They will primarily use it for voice interactions, reference the Alexa app to see display cards, and then look toward the screen to view the display templates.
As with any device, consider where and how a customer might be using your skill when crafting visual responses. With Echo Spot, for example, we previously advised making the primary content of each template visible and recognizable from up to five feet away. For the Echo Show display, we advised designing templates that are visible from seven feet away. With Fire TV Cube, we suggest designing for 10 feet away or more.
So, how should you optimize your skill design across devices? The best approach is to keep it simple. Regardless of the visual response, the display should always be second to what Alexa is saying. Rely on your voice experience, and use the screens to complement that experience.
As with any multimodal skill, you can choose which interfaces—voice user interfaces (VUI) or graphical user interfaces (GUI)—you support. Regardless of the interface, you want to ensure you are delivering an experience that works well on any device.
Developing an Alexa skill for a TV could easily be interpreted as creating a skill that relies heavily on visuals, as if they were necessary for a standout skill. This is not the case for voice-first experiences: if you do not incorporate screens into your custom skill, Fire TV Cube will simply display the cards you provide for the Alexa app.
To add displays as a supported interface into your Alexa skill, you need to edit your skill’s manifest. There are two easy ways to do this—programmatically or through the Alexa Developer Console.
In your skill.json, under apis.custom.interfaces, add an entry with type RENDER_TEMPLATE. This declares that your skill supports rendering templates, so a response that includes a render template directive will be interpreted as valid.
"apis": {
"custom": {
"endpoint": {
"sourceDir": "lambda/custom"
},
"interfaces": [
{
"type": "RENDER_TEMPLATE"
}
]
}
},
Once you have added the interface, you need to add the required display intents to your interaction model.
{
  "interactionModel": {
    "languageModel": {
      "invocationName": "my custom skill",
      "intents": [
        {
          "name": "AMAZON.MoreIntent",
          "samples": []
        },
        {
          "name": "AMAZON.NavigateHomeIntent",
          "samples": []
        },
        {
          "name": "AMAZON.NavigateSettingsIntent",
          "samples": []
        },
        {
          "name": "AMAZON.NextIntent",
          "samples": []
        },
        {
          "name": "AMAZON.PageUpIntent",
          "samples": []
        },
        {
          "name": "AMAZON.PageDownIntent",
          "samples": []
        },
        {
          "name": "AMAZON.PreviousIntent",
          "samples": []
        },
        {
          "name": "AMAZON.ScrollRightIntent",
          "samples": []
        },
        {
          "name": "AMAZON.ScrollDownIntent",
          "samples": []
        },
        {
          "name": "AMAZON.ScrollLeftIntent",
          "samples": []
        },
        {
          "name": "AMAZON.ScrollUpIntent",
          "samples": []
        },
        ...
Another way to do this is through the developer console. Navigate to your skill and scroll to Interfaces. Once there, toggle "Display Interface," then save and build your skill. This enables render templates in your skill and automatically adds all of the required intents to your interaction model.
Now that you have enabled render templates in your skill, you need to ensure you incorporate them into your response at the appropriate time. Before adding a template to your response, verify that the current device supports the display interface. To do so, you can use the supportsDisplay() function from the Alexa Skill-Building Cookbook. Within your skill code, use the function to determine whether to include the template in your responseBuilder.
if (supportsDisplay(handlerInput)) {
  // insert render template code here
}
We include this condition because your code needs to handle both cases: devices with a screen and devices without one.
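The check itself is small. Here is an illustrative version of what a helper like the cookbook's supportsDisplay() does (this sketch is not the cookbook's exact code): it inspects the device's supportedInterfaces in the request envelope.

```javascript
// Illustrative sketch of a supportsDisplay() helper: a device supports
// render templates if the supportedInterfaces object in the request
// envelope includes a Display entry.
function supportsDisplay(handlerInput) {
  const hasDisplay =
    handlerInput.requestEnvelope.context &&
    handlerInput.requestEnvelope.context.System &&
    handlerInput.requestEnvelope.context.System.device &&
    handlerInput.requestEnvelope.context.System.device.supportedInterfaces &&
    handlerInput.requestEnvelope.context.System.device.supportedInterfaces.Display;
  return Boolean(hasDisplay);
}
```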
There are two categories of display templates that you can use within your custom skill: Body Templates and List Templates. A great example of how both template types are handled can be seen in the demo display directive. Note that the skill builds with an Echo Show in mind, but the use of Body Templates and List Templates renders similarly across devices.
Remember that each template is displayed alongside a speech response. Currently, you cannot render a template without a user prompting a response. It is bad practice to render a template without Alexa saying something to go with it. Along with that, you can only render one template per response. The templates should help direct the conversation.
At its core, a display template is just a JSON object. Each template has fields you can specify, such as title, backgroundImage, backButton, etc. Regardless of which template you are using, at minimum you need to: check that the device supports display, build the template object with its required fields, and add the template to your response as a render template directive.
From there, you can specify various fields to build a complementary display. Here is an example of rendering ListTemplate2 in the Alexa Skills Kit (ASK) Software Development Kit (SDK) for Node.js.
if (supportsDisplay(handlerInput)) {
  handlerInput.responseBuilder.addRenderTemplateDirective({
    type: 'ListTemplate2',
    backButton: 'VISIBLE',
    backgroundImage: {
      sources: [{ url: 'https://background-image-src.com/image.jpg' }]
    },
    title: 'This is my title',
    listItems: [
      {
        image: { sources: [{ url: 'https://search-image-src.com/image1.jpg' }] },
        textContent: { primaryText: { type: 'PlainText', text: 'This is my list item 1' } }
      },
      {
        image: { sources: [{ url: 'https://search-image-src.com/image2.jpg' }] },
        textContent: { primaryText: { type: 'PlainText', text: 'This is my list item 2' } }
      },
      {
        image: { sources: [{ url: 'https://search-image-src.com/image3.jpg' }] },
        textContent: { primaryText: { type: 'PlainText', text: 'This is my list item 3' } }
      }
    ]
  });
}
Templates can become large. To reduce the size of your skill code file, consider decoupling your template code from your skill code and hosting it or organizing it into a separate file. You can use data-binding logic to incorporate speech text into your template. Doing so allows for quick editing, reusing your templates across multiple skills, and more readable code.
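As a sketch of that decoupling (the file and function names here are hypothetical), a separate module can build the template object from plain data, which your handler then passes to addRenderTemplateDirective:

```javascript
// templates.js (hypothetical module name): builds a ListTemplate2
// object from plain data so handlers stay free of layout details.
function buildListTemplate(title, items) {
  return {
    type: 'ListTemplate2',
    backButton: 'VISIBLE',
    title: title,
    listItems: items.map(function (item) {
      return {
        image: { sources: [{ url: item.imageUrl }] },
        textContent: {
          primaryText: { type: 'PlainText', text: item.text }
        }
      };
    })
  };
}

module.exports = { buildListTemplate };
```

A handler can then stay as short as `responseBuilder.addRenderTemplateDirective(buildListTemplate(title, items))`, and the same module can be reused across skills.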
Remember that Fire TV Cube is not touch-enabled; the customer will be using a remote control. The touch components on the display templates are translated into selectable elements. Ensure that your templates intuitively handle both touch and click. See the Display Interface Reference for more information.
To accommodate clicks within your template, all you have to do is add a token attribute to whatever you want to be selectable. Following our previous example, here is each list item with a token attribute, making the image + text together selectable.
listItems: [
  {
    token: 'listItem1',
    image: { sources: [{ url: 'https://search-image-src.com/image1.jpg' }] },
    textContent: { primaryText: { type: 'PlainText', text: 'This is my list item 1' } }
  },
  {
    token: 'listItem2',
    image: { sources: [{ url: 'https://search-image-src.com/image2.jpg' }] },
    textContent: { primaryText: { type: 'PlainText', text: 'This is my list item 2' } }
  },
  {
    token: 'listItem3',
    image: { sources: [{ url: 'https://search-image-src.com/image3.jpg' }] },
    textContent: { primaryText: { type: 'PlainText', text: 'This is my list item 3' } }
  }
]
Every touchable or clickable item should lead to a response that is already incorporated into your voice interaction. In other words, every event from one of these items should be discoverable via voice. For example, when a customer clicks on a list item, the display might show them more information about that particular entry. Another way to navigate to it could be by the customer saying, “Tell me more about list item one.”
To handle the touch event fired from selecting one of these items, you need to incorporate the events in the canHandle of the appropriate intent. Doing so is a simple Boolean statement. Here is an example using the ASK SDK for Node.js:
canHandle(handlerInput) {
  const request = handlerInput.requestEnvelope.request;
  const hasBeenClicked = request.type === 'Display.ElementSelected'
    && request.token === 'listItem1';
  return hasBeenClicked
    || (request.type === 'IntentRequest'
      && request.intent.name === 'MoreInfoIntent');
}
You can have multiple tokens navigate to the same intent, and adjust the response according to whatever token was selected. Logically, tokens can be used alongside slot logic. For example, if a customer says, “Tell me more about list item one,” then “list item one” could be the slot value for the MoreInfoIntent. You could use the same slot evaluation logic to evaluate what to do when the token has been clicked.
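One way to share that logic is a small helper that normalizes both input paths to the same key. The intent name, slot name, and token scheme below are illustrative, not part of any Alexa API:

```javascript
// Normalize a touch event token and a spoken slot value to one key,
// so a single handler can respond to both input paths.
// Assumes a MoreInfoIntent with a numeric "listItem" slot and tokens
// of the form 'listItem1', 'listItem2', ... (all illustrative names).
function resolveSelectedItem(request) {
  if (request.type === 'Display.ElementSelected') {
    return request.token; // e.g. 'listItem1'
  }
  if (request.type === 'IntentRequest'
      && request.intent.slots
      && request.intent.slots.listItem
      && request.intent.slots.listItem.value) {
    return 'listItem' + request.intent.slots.listItem.value;
  }
  return null;
}
```

The handle() function can then branch on the resolved key alone, regardless of whether the customer clicked or spoke.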
Particularly with Fire TV Cube, customers will often want to watch or listen to something as part of the conversation. Video and audio directives are easy to incorporate into your skill and to navigate to from your templates. Both can be initiated via voice or click.
To use the directives, you will also need to incorporate the Video Player or Audio Player interfaces into your skill manifest. Indicating these interfaces is done similarly to the Render Template interface.
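For example, a manifest that supports render templates plus video and audio playback would list all three interface types under apis.custom.interfaces (include only the ones your skill actually uses):

```json
"interfaces": [
  { "type": "RENDER_TEMPLATE" },
  { "type": "VIDEO_APP" },
  { "type": "AUDIO_PLAYER" }
]
```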
Both of these directives also have required, built-in intents. These are AMAZON.PauseIntent and AMAZON.ResumeIntent.
Beyond that, the implementation of both directives is very similar. You can reference the Alexa Audio Player skill sample to see an implementation; the sample demonstrates how to use single or multiple streams in your skill.
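As a sketch, a VideoApp.Launch directive is just another object in your response's directives array. The URL below is a placeholder, and note that a response carrying this directive should not set shouldEndSession:

```javascript
// Build a VideoApp.Launch directive. videoItem.source must be an
// HTTPS URL the device can stream; metadata is optional display text.
function buildVideoAppLaunchDirective(sourceUrl, title) {
  return {
    type: 'VideoApp.Launch',
    videoItem: {
      source: sourceUrl,
      metadata: { title: title }
    }
  };
}
```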
When building a custom skill with display for Fire TV Cube, remember to build a voice-first experience with templates that create harmony with what Alexa is saying. Cater your experience to a large audience and a wide set of Alexa-enabled devices. And remember to keep it simple.
Check out these resources to find out more information on Fire TV Cube and how to build multimodal skills: