A visual user interface can be a great complement to a voice-first user experience. Echo Show enables you to extend your voice experience to show more information or enable touch interactions for things like picking from a list and watching videos. In this blog post, we’ll walk you through how to build an Alexa skill using the core developer capabilities of Echo Show.
All published Alexa skills will automatically be available on Echo Show. Skills will display any skill cards you currently return in your response objects. This is the same as how customers see your Alexa skill cards on the Alexa app, Fire TV, and Fire Tablet devices today. If no skill card is included in your skill’s response, a default template shows your skill icon and skill name on Echo Show. To improve the experience of your skill for Echo Show, you should update it to make use of the new display templates. See the updated voice and graphic guide for more information.
Custom skills designed for Echo Show must take these new display and touch interactions into account.
When you develop a skill, you can choose whether to support specific interfaces, such as the Display interface for screen display. You will want to create the best possible experience for your customer, regardless of which type of device they have. Even if the screen experience is not the focus of your skill, you should still consider what type of experience you are creating on those devices. The good news is that even if you take no steps to support screen display, the cards you provide for the Alexa app will be displayed on a screen device (Fire Tablet, Fire TV, Echo Show, and AVS-enabled devices that support display cards). If you want to take full advantage of the options provided by Echo Show, such as the ability to select a particular image from a list or play a video, then you must specifically support the Render Template directive in your code.
The best way to support both voice-only and multimodal device scenarios is to have your skill check for a device’s supported interfaces and then render the appropriate content. In general, the customer will respond to a skill in different ways depending on whether they are using Echo Show or Echo. Your skill can detect the device’s capabilities by parsing the value of event.context.System.device.supportedInterfaces in the Alexa request. In the following JSON-formatted sample request, supportedInterfaces includes AudioPlayer, Display, and VideoApp; any interface not listed there is not supported by the device.
Your skill service code needs to respond conditionally to both cases: for example, including a Display.RenderTemplate directive for an Echo Show device, but omitting it for a screen-less Echo device.
Below is an example request showing that a display is supported (notice the Display entry under event.context.System.device.supportedInterfaces).
{
  "context": {
    "System": {
      "device": {
        "supportedInterfaces": {
          "AudioPlayer": {},
          "Display": {},
          "VideoApp": {}
        }
      }
    }
  }
}
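As a minimal sketch of this check (not part of the Alexa Skills Kit SDK; the RequestEnvelope type and supportsDisplay helper are names chosen for illustration):

// Minimal sketch: detect whether the requesting device supports the
// Display interface. Only the property path mirrors the request above;
// the type and function names are illustrative.
interface RequestEnvelope {
  context?: {
    System?: {
      device?: {
        supportedInterfaces?: Record<string, unknown>;
      };
    };
  };
}

function supportsDisplay(event: RequestEnvelope): boolean {
  const interfaces = event.context?.System?.device?.supportedInterfaces;
  return interfaces !== undefined && "Display" in interfaces;
}

You can then branch your response on the result, including Display.RenderTemplate only when supportsDisplay(event) returns true.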
The Alexa Skills Kit provides two categories of display templates for Echo Show: body templates and list templates. Each category includes several specifically defined templates.
These templates differ from each other in the size, number, and positioning of the text and images, but each template has a prescribed structure you can work within. When you, as the skill developer, construct a response that includes a display template, you specify the template name, text, and images, as well as markup such as font formatting and sizes, so you have latitude to provide the user experience you want.
Echo Show screen interactions are created by using these new templates with the Display interface. To set up your skill:
1. On the Skill Information page for your (new or existing) skill in the Amazon Developer Portal, select Yes for Render Template. Note that you can also select Yes for Audio Player and Video App support, if you want those to be part of your skill. Display.RenderTemplate is the directive used to display content on Echo Show.
2. On the Interaction Model page, you can choose whether to use Skill Builder, or the default page, for building your interaction model.
3. In the service code that you write to implement your skill, implement the built-in intents as desired. Include the Display.RenderTemplate directive in your skill responses to display content on screen as appropriate, just as you would include other directives, as shown in the examples below.
To display interactive graphical content in your skill, you must use display templates. These templates are constructed to give you flexibility. For each template, the strings for the text or image fields may be empty or null; however, list templates must include at least one list item.
Each body template adheres to the following general interface:
{
"type": string,
"token": string
}
Where the type is the name of the template (such as BodyTemplate1) and the token is a name that you choose for the view.
Each list template adheres to the following general interface:
{
"type": string,
"token": string,
"listItems": [ ]
}
Where the type is the name of the template (such as ListTemplate1), the token is a name that you choose for the view, and listItems contains the items to display.
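The examples that follow use body templates. For a list template, the directive might look like the sketch below, written as a TypeScript object literal. The item fields inside listItems (token, textContent) are an assumption based on the Display Interface Reference, and all token and text values are illustrative:

// Illustrative sketch of a ListTemplate1 directive. Treat the exact
// listItems field layout as an assumption; see the Display Interface
// Reference for the authoritative structure.
const listDirective = {
  type: "Display.RenderTemplate",
  template: {
    type: "ListTemplate1",
    token: "CheeseListView", // hypothetical view name
    title: "Popular Cheeses",
    listItems: [
      {
        token: "item_1",
        textContent: {
          primaryText: { type: "PlainText", text: "Parmigiano Reggiano" },
        },
      },
      {
        token: "item_2",
        textContent: {
          primaryText: { type: "PlainText", text: "Roquefort" },
        },
      },
    ],
  },
};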
The template attribute identifies the template to be used, as well as all of the corresponding data to be used when rendering it. Here is the form for an object that contains a Display.RenderTemplate directive. The type property has the value of the template name, such as BodyTemplate1 in this example.
The other template properties will differ depending on the template type value.
{
  "directives": [
    {
      "type": "Display.RenderTemplate",
      "template": {
        "type": "BodyTemplate1",
        "token": "CheeseFactView",
        "backButton": "HIDDEN",
        "backgroundImage": {
          "sources": [
            {
              "url": "https://www.example.com/background-image1.png"
            }
          ]
        },
        "title": "Did You Know?",
        "textContent": {
          "primaryText": {
            "type": "RichText",
            "text": "The world’s stinkiest cheese is from Northern France"
          }
        }
      }
    }
  ]
}
For context, see Display and Hint directives for an example response body that includes multiple directives.
Here is another example using BodyTemplate2 that will display the title “Parmigiano Reggiano”, the skill icon at the upper right, and an image at the right, with the image scaled, if needed, to the appropriate size for this template. The back button, title, background image, and hint text are optional.
"directives": [
{
"type": 'Display.RenderTemplate',
"template": {
"type": "BodyTemplate2",
"token": "CheeseDetailView",
"backButton": "HIDDEN",
"backgroundImage": https://www.example.com/background-image1.png,
"title": "Parmigiano Reggiano",
"image": https://www.example.com/parmigiano-reggiano.png,
"textContent": {
"text"="Parmigiano Reggiano
Country of origin: Italy
Parmesan cheese is made from unpasteurized cow’s milk. It has a hard, gritty texture, and is fruity and nutty in taste.",
"type" = "RichText"
}
}
} ,
{
"type": "Hint",
"hint": {
"type": "PlainText",
"text": "search for blue cheese"
}
}
]
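Putting the pieces together, a handler can attach these directives only when the device supports a display. Here is a minimal sketch, reusing the RequestEnvelope type and supportsDisplay helper from the earlier sketch; buildCheeseResponse is a hypothetical name:

// Minimal sketch: include Display.RenderTemplate only for devices that
// support the Display interface; voice-only devices get speech alone.
function buildCheeseResponse(event: RequestEnvelope): object {
  const response: Record<string, unknown> = {
    outputSpeech: {
      type: "PlainText",
      text: "Parmesan cheese is made from unpasteurized cow's milk.",
    },
    shouldEndSession: true,
  };

  if (supportsDisplay(event)) {
    response.directives = [
      {
        type: "Display.RenderTemplate",
        template: {
          type: "BodyTemplate2",
          token: "CheeseDetailView",
          title: "Parmigiano Reggiano",
          textContent: {
            primaryText: { type: "PlainText", text: "Country of origin: Italy" },
          },
        },
      },
    ];
  }

  return { version: "1.0", response };
}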
For more details on the new display interface templates, please check out our Display Interface Reference.
The new VideoApp interface provides the VideoApp.Launch directive for streaming native video files. Your skill can send the VideoApp.Launch directive to start video playback, either in response to a voice request or when a user taps an action link on the template.
To use the VideoApp.Launch directive for video playback, you must configure your skill to support the VideoApp interface (on the Skill Information page, select Yes for Video App, as described above).
Note: When your skill is not in an active session but is playing video, or was the most recent skill to play video, utterances such as “stop” send your skill an AMAZON.PauseIntent instead of an AMAZON.StopIntent.
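Because of this behavior, skills that play video typically route AMAZON.PauseIntent and AMAZON.StopIntent to the same handler. A minimal sketch (the handleStopOrPause name is ours):

// Minimal sketch: "stop" arrives as AMAZON.PauseIntent during video
// playback (see the note above), so treat both intents the same way.
function handleStopOrPause(intentName: string): object | null {
  if (intentName === "AMAZON.PauseIntent" || intentName === "AMAZON.StopIntent") {
    return {
      version: "1.0",
      response: {
        outputSpeech: { type: "PlainText", text: "Goodbye." },
        shouldEndSession: true,
      },
    };
  }
  return null; // not a stop/pause intent; defer to other handlers
}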
The VideoApp interface provides the VideoApp.Launch directive, which sends Alexa a command to stream the video file identified by the specified videoItem field. The source for videoItem must be a native video file and only one video item at a time may be supplied.
When including a directive in your skill service response, set the type property to the directive you want to send. Here is an example of a full response object returned in response to a LaunchRequest or IntentRequest.
In this example, one native-format video will be played.
{
  "version": "1.0",
  "response": {
    "outputSpeech": null,
    "card": null,
    "directives": [
      {
        "type": "VideoApp.Launch",
        "videoItem": {
          "source": "https://www.example.com/video/sample-video-1.mp4",
          "metadata": {
            "title": "Title for Sample Video",
            "subtitle": "Secondary Title for Sample Video"
          }
        }
      }
    ],
    "reprompt": null
  },
  "sessionAttributes": null
}
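As with the display check earlier, you would typically guard this directive on the device’s capabilities. A minimal sketch, reusing the illustrative RequestEnvelope type; supportsVideoApp and buildVideoResponse are hypothetical names:

// Minimal sketch: send VideoApp.Launch only when the device lists
// VideoApp among its supported interfaces.
function supportsVideoApp(event: RequestEnvelope): boolean {
  const interfaces = event.context?.System?.device?.supportedInterfaces;
  return interfaces !== undefined && "VideoApp" in interfaces;
}

function buildVideoResponse(event: RequestEnvelope): object {
  if (!supportsVideoApp(event)) {
    // Fall back to a voice-only answer on screen-less devices.
    return {
      version: "1.0",
      response: {
        outputSpeech: {
          type: "PlainText",
          text: "Sorry, I can't play video on this device.",
        },
        shouldEndSession: true,
      },
    };
  }
  return {
    version: "1.0",
    response: {
      directives: [
        {
          type: "VideoApp.Launch",
          videoItem: {
            source: "https://www.example.com/video/sample-video-1.mp4",
            metadata: {
              title: "Title for Sample Video",
              subtitle: "Secondary Title for Sample Video",
            },
          },
        },
      ],
    },
  };
}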
For more details on VideoApp controls and playback, check out our VideoApp Interface Reference.
Built-in intents allow you to add common functionality to skills without the need for complex interaction models. For Echo Show, all of the standard built-in intents are available, but we have added additional built-in intents as well.
These include built-in intents that are handled on the skill’s behalf, as well as built-in intents that are forwarded to the skill and must be managed by the skill developer (such as navigating to a particular template).
[Table: built-in intents for Echo Show and their common utterances, grouped by whether the skill developer handles the intent (Yes) or Alexa handles it on the skill’s behalf (No).]
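To illustrate the case where the skill developer handles the intent, here is a minimal sketch of routing one of the Echo Show built-in intents (AMAZON.MoreIntent); the routing and response-builder names are ours:

// Minimal sketch: route a built-in display intent that the skill must
// handle itself. Only the intent name comes from the ASK built-ins.
function buildMoreResultsResponse(): object {
  // Stub: a real skill would return a Display.RenderTemplate directive
  // showing the next page of results here.
  return { version: "1.0", response: { shouldEndSession: false } };
}

function routeIntent(intentName: string): object {
  switch (intentName) {
    case "AMAZON.MoreIntent":
      return buildMoreResultsResponse();
    default:
      return {
        version: "1.0",
        response: {
          outputSpeech: { type: "PlainText", text: "Sorry, I didn't get that." },
          shouldEndSession: false,
        },
      };
  }
}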
For more details on these new intents, check out the built-in intents for Echo Show documentation.
You may also want to check out these Alexa Skills Kit resources: