Step 2: Enable your Video Skill on a Multimodal Device and Test (VSK MM)

Now that you have a video skill created, it's time to connect it to your multimodal device. With Fire TV apps, you ran your app on a Fire TV device, connecting through adb. However, multimodal devices don't use apps, nor can you connect through adb. Instead, you'll run your video skill by publishing a web player, supplying your video skill with the URI for your web player, and then associating your skill with your multimodal device. The multimodal device invokes your web player (from your server) to display the requested media.

Step 2.1: Set Up a Sample Web Player

Although you'll build a web player later in the integration process (specifically, in Step 4: Build Your Web Player), for now, download a sample web player project to get started and observe the Alexa directives sent to your Lambda. The sample web player includes initiation of video content, basic playback controls, and integration with the Alexa Video JavaScript Library.

To set up the sample web player:

  1. Download the sample_web_player.zip and unzip the files.
  2. If you don't already have Node.js and npm on your system, install them.
  3. From the command line, browse to the sample web player directory.
  4. Run the following:

    npm install -g gulp http-server
    
  5. Then run:

    npm install
    gulp
    
  6. Upload the web player onto a web server.

    For this sample setup, you can also use the following web player URL: https://devportal-reference-docs.s3-us-west-1.amazonaws.com/video-skills-kit/sample-web-player/sample_web_player/index.html. This URL simply provides the default web player code in an S3 bucket that is configured as a static website.

    As desired, update the Web Player URI in the video skill configuration that you performed in Step 1.4: Finish Setting Up Your Video Skill in Step 1.

Step 2.2: Activate the Video Skill on Your Multimodal Device

Now that you've set up a video skill and Lambda function, it's time to test your skill and observe the directives Alexa sends to your Lambda function. For this testing, you will need an actual multimodal device, such as an Echo Show.

Step 2.3: Enable Your Skill on an Alexa Device

To test your skill on an Alexa-enabled device:

  1. Set up and register your multimodal device using the same developer account you used to create your video skill. Note the following:

    • If your device is already registered to another user, do a factory reset and register it with your developer account. (Or you can see Deregister a Device to deregister it through amazon.com.)
    • You will need the Alexa smartphone app to complete the device setup. If you don't already have this app, download it and sign in with your developer account.
    • Set up your multimodal device (e.g., Echo Show) with your developer account. See Set Up Your Echo Device with a Screen for details.
    • Make sure both your Alexa smartphone app and your multimodal device are on the same Wi-Fi network.
  2. Open the Alexa smartphone app on your phone and sign in with the same developer account.
  3. Tap Devices in the bottom navigation bar and then select All Devices. Confirm that your Echo device appears in your app.

    Your list of devices in the Alexa smartphone app

    The smartphone app can be a bit slow, so if the app doesn't seem to respond immediately when you press a button, be patient. This slowness will be addressed in an upcoming release.

  4. From the app's home screen, tap the menu button (upper-left corner), then tap Settings, then tap TV & Video. Scroll down the list and identify any skills that say "Enabled" below them. If a skill says "Enabled," click on that skill and then click Disable Skill, so that no TV & Video skills are enabled. (Otherwise, the other skills could interfere with your testing.)

    Disabling video skills

    Then click the back arrow at the top to return to the list of TV & Video skills.

  5. Scroll down the list and look for the video skill you created earlier. It will appear at the bottom. Click the plus button next to your video skill.

    Video skills in the Alexa app
  6. After clicking your skill to view its details, click Link Your Alexa Device.

    A new page opens up in your browser that indicates your skill has been successfully linked.

    Your skill has been successfully linked
  7. Close this notification window by clicking the X in the upper-left corner of your app. When you close the window, the Alexa app shows the skills linked to your device.

    Your list of linked devices

    (You can click Save if you want — it's not necessary.) If you click the back button to go out to the TV & Video skills, you'll see the word "Enabled" next to your skill:

    Your enabled skill. In this example, the name of the enabled skill is "Hawaii Echo".

    You're done with configurations in the Alexa smartphone app. You can click the home button or just close the app.

Step 2.4: Test Your Skill

  1. Now that your video skill is associated with your device, turn to your multimodal device and say, "Alexa, go to Video Home."

    The logo image asset for your video skill should appear:

    Your video skill logo appears on Video Home
  2. Tap on your video skill logo with your finger. This initiates an Alexa GetDisplayableItems directive to get browsable items from the catalog associated with your video skill.

    Alexa returns a result from your catalog. For example, if you're using the catalog "hawaii_us" (a sample catalog with Creative Commons assets), the device would show the following:

    A thumbnail from your catalog

    Alexa is now sending directives to your Lambda, and you can see this interaction in CloudWatch.

  3. Say to your multimodal device, "Alexa, play the movie Big Buck Bunny."

    The video should play on the device. This content URL is hard-coded into the response's playbackContextToken for the sample Lambda.

    Big Buck Bunny playing on Echo Show
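    The playbackContextToken in the sample Lambda is a stringified JSON object rather than a bare URL. A minimal sketch of how such a token might be assembled (the structure matches the sample response shown later in this step; the helper name here is our own):

```javascript
// Hypothetical sketch: building a playbackContextToken like the one the
// sample Lambda hard-codes. It is a JSON object serialized into a string,
// so the web player can later JSON.parse it to recover the stream URL.
function buildPlaybackContextToken(streamUrl, title) {
    return JSON.stringify({ streamUrl, title });
}

const token = buildPlaybackContextToken(
    "http://commondatastorage.googleapis.com/gtv-videos-bucket/sample/BigBuckBunny.mp4",
    "Big Buck Bunny"
);
```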

Step 2.5: View the Directives Sent to CloudWatch

When you say phrases to your Alexa-enabled device, these phrases get processed and converted into directives. Directives are information blocks (formatted in JSON) sent from Alexa to your Lambda for processing.

You can view the directives that Alexa sends to your Lambda function through CloudWatch. CloudWatch is an AWS service that collects monitoring and operational data for Lambda (and other services) in the form of logs, metrics, and events. Each time your Lambda function receives a directive from Alexa, you can view the directive and other logs in CloudWatch.

To view the directives received by your Lambda function in CloudWatch:

  1. You can navigate to CloudWatch in two ways:

    • Option 1: From your Lambda function in AWS, click the Monitoring tab. Then click the View logs in CloudWatch button. This takes you directly to the logs for the Lambda function you were viewing.

      Viewing CloudWatch logs for your Lambda function
    • Option 2: Click the Services menu in AWS's top navigation and select CloudWatch. Click Logs in the left sidebar. By default, a log group appears named after your Lambda function. Select the log group for your Lambda function.

  2. Select the latest Log Stream for your logs.

  3. Look for a log message that says Interaction starts. This is the start of the logs for an event that contains a request from Alexa and a response from your Lambda. The sample Lambda code prefaces each event with this message, so it appears multiple times.

    CloudWatch logs
  4. Expand a log message that contains Alexa Request: GetPlayableItems.

    There are actually multiple interactions going on here: some GetDisplayableItems directives and some GetPlayableItems directives. First, when you press your video skill's logo, this triggers a GetDisplayableItems directive from Alexa, without any specifics about the requested media. Your Lambda responds with the items you want the multimodal device to show. The items are displayed within a browse template provided by multimodal devices.

    Next, when you say, "Alexa, play the movie Big Buck Bunny," this triggers a GetPlayableItems directive from Alexa. This request contains a payload with an entities block identifying the video the user requested (value: Big Buck Bunny). Try to locate this particular log with the value Big Buck Bunny in CloudWatch.

    Alexa Directive: GetPlayableItems

    {
        "directive": {
            "profile": null,
            "payload": {
                "minResultLimit": 1,
                "entities": [
                    {
                        "externalIds": null,
                        "type": "MediaType",
                        "value": "MOVIE",
                        "entityMetadata": null,
                        "mergedGroupId": 0
                    },
                    {
                        "externalIds": {
                            "hawaii_us": "tt1254207"
                        },
                        "type": "Video",
                        "value": "Big Buck Bunny",
                        "entityMetadata": null,
                        "mergedGroupId": 1
                    }
                ],
                "timeWindow": null,
                "locale": "en-US",
                "contentType": null,
                "maxResultLimit": 40
            },
            "endpoint": {
                "cookie": {},
                "endpointId": "ALEXA_VOICE_SERVICE_EXTERNAL_MEDIA_PLAYER_VIDEO_PROVIDER",
                "scope": {
                    "token": "1bdaa2eb-4aa3-d0dc-fb10-7a5513981cf8",
                    "type": "BearerToken"
                }
            },
            "header": {
                "payloadVersion": "3",
                "messageId": "8dd96f67-9b5e-4db4-803b-2f6d71a4a62e",
                "namespace": "Alexa.VideoContentProvider",
                "name": "GetPlayableItems",
                "correlationToken": null
            }
        }
    }
    

    When users make requests to play movies, Alexa interprets this as a Quick Play scenario and first sends a GetPlayableItems directive. The payload in this directive contains the value that Alexa thinks the user is asking for: "value": "Big Buck Bunny".

    At this point, normally your Lambda code might perform lookups or other queries to figure out what media the user is really asking for. Alexa has just done the job of parsing the user's utterances into a structured payload for you to handle. If the user had said something more general, such as "Avengers," you might have to clarify which Avengers movie the user really wants. Or if the user had said, "Play Mozart in the Jungle," you would need to clarify which season and episode.
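    To make this concrete, here is a sketch (our own helper, not the sample Lambda's actual code) of pulling the requested title out of the directive payload shown above:

```javascript
// Hypothetical helper: extract the title the user asked for from a
// GetPlayableItems directive. The "Video" entity carries the utterance value.
function getRequestedTitle(directive) {
    const entities = (directive.payload && directive.payload.entities) || [];
    const video = entities.find((entity) => entity.type === "Video");
    return video ? video.value : null;
}
```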

    Your Lambda returns a GetPlayableItemsResponse to Alexa containing all titles in your catalog that match the user's request. (In the sample Lambda, the response is pre-defined rather than dynamically queried through some backend service.) Expand the Lambda Response: GetPlayableItemsResponse log message (below the Alexa Request: GetPlayableItems) to see the mediaIdentifier for the requested media:

    Lambda Response: GetPlayableItemsResponse

    {
        "event": {
            "header": {
                "correlationToken": "dFMb0z+PgpgdDmluhJ1LddFvSqZ/jCc8ptlAKulUj90jSqg==",
                "messageId": "5f0a0546-caad-416f-a617-80cf083a05cd",
                "name": "GetPlayableItemsResponse",
                "namespace": "Alexa.VideoContentProvider",
                "payloadVersion": "3"
            },
            "payload": {
                "nextToken": "fvkjbr20dvjbkwOpqStr",
                "mediaItems": [
                    {
                        "mediaIdentifier": {
                            "id": "tt1254207"
                        }
                    }
                ]
            }
        }
    }
    

    In this case, there's just one mediaIdentifier, but other scenarios might return more matching titles. At this point, Alexa could ask the user which title the user wants to play.
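    If your catalog lookup returned several matches, the mediaItems array would simply contain several entries. A sketch of shaping that response (this helper is our own, not part of the sample; in real code the messageId would be a freshly generated UUID):

```javascript
// Hypothetical sketch: shaping a GetPlayableItemsResponse from a list of
// matched catalog IDs. The sample Lambda returns a pre-defined response
// instead of building one dynamically.
function buildGetPlayableItemsResponse(correlationToken, messageId, matchedIds) {
    return {
        event: {
            header: {
                correlationToken,
                messageId,  // generate a fresh UUID per response in real code
                name: "GetPlayableItemsResponse",
                namespace: "Alexa.VideoContentProvider",
                payloadVersion: "3"
            },
            payload: {
                mediaItems: matchedIds.map((id) => ({ mediaIdentifier: { id } }))
            }
        }
    };
}
```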

    After Alexa knows the right media to play, it sends another request to your Lambda: a GetPlayableItemsMetadata directive. This directive asks for the metadata (runtime, the user's viewing progress, and other external metadata) for the media returned in your Lambda's GetPlayableItemsResponse:

    Alexa Directive: GetPlayableItemsMetadata

    {
    "directive": {
        "profile": null,
        "payload": {
            "locale": "en-US",
            "mediaIdentifier": {
                "id": "tt1254207"
            }
        },
        "endpoint": {
            "endpointId": "ALEXA_VOICE_SERVICE_EXTERNAL_MEDIA_PLAYER_VIDEO_PROVIDER",
            "cookie": {},
            "scope": {
                "token": "1bdaa2eb-4aa3-d0dc-fb10-7a5513981cf8",
                "type": "BearerToken"
            }
        },
        "header": {
            "payloadVersion": "3",
            "messageId": "3b4041f4-6114-4034-b295-e3afd2df8e19",
            "namespace": "Alexa.VideoContentProvider",
            "name": "GetPlayableItemsMetadata",
            "correlationToken": null
        }
      }
    }
    

    Your Lambda then performs lookups to gather metadata about this specific video and returns the information in a GetPlayableItemsMetadataResponse:

    Lambda Response: GetPlayableItemsMetadataResponse

    {
        "event": {
            "header": {
                "correlationToken": "dFMb0z+PgpgdDmluhJ1LddFvSqZ/jCc8ptlAKulUj90jSqg==",
                "messageId": "38ce5b22-eeff-40b8-a84f-979446f9b27e",
                "name": "GetPlayableItemsMetadataResponse",
                "namespace": "Alexa.VideoContentProvider",
                "payloadVersion": "3"
            },
            "payload": {
                "searchResults": [
                    {
                        "name": "Big Buck Bunny",
                        "contentType": "ON_DEMAND",
                        "series": {
                            "seasonNumber": "1",
                            "episodeNumber": "1",
                            "seriesName": "Blender Foundation Videos",
                            "episodeName": "Pilot"
                        },
                        "playbackContextToken": "{\"streamUrl\": \"http://commondatastorage.googleapis.com/gtv-videos-bucket/sample/BigBuckBunny.mp4\", \"title\": \"Big Buck Bunny\"}",
                        "parentalControl": {
                            "pinControl": "REQUIRED"
                        },
                        "absoluteViewingPositionMilliseconds": 1232340
                    }
                ]
            }
        }
    }
    

    Notice the playbackContextToken value returned here. This value, a stringified JSON object containing both a streamUrl and a title, carries the URL needed to play the video. The multimodal device sends this playbackContextToken to your web player, and your web player begins playing the video.
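    On the web player side, decoding the token back into a stream URL could look like this (a sketch under the assumption that the token is the JSON string shown above; the actual sample player's handling may differ):

```javascript
// Hypothetical sketch: the web player parses the playbackContextToken it
// receives from the device to recover the stream URL and title.
function parsePlaybackContextToken(token) {
    const context = JSON.parse(token);
    return { streamUrl: context.streamUrl, title: context.title };
}
```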

    Why are there two calls — GetPlayableItems followed by GetPlayableItemsMetadata?

    You might be wondering why Alexa makes two calls to get the content the user requests. As an analogy, consider a customer who walks into a fast-food restaurant and says to the worker at the counter, "I'd like a hamburger." The worker recognizes that the customer wants a hamburger, but there are multiple matches for hamburgers on the restaurant's menu. So the worker responds, "We have several types of hamburgers. Would you like a Deluxe hamburger, a Basic hamburger, or our Daily Special hamburger?"

    The customer clarifies that he would like a "Deluxe Hamburger." So the worker returns information about the Deluxe Hamburger (price, included items, condiments, etc.) to fulfill the order.

    Natural speech is a constant back-and-forth of requests and responses to clarify intent. It makes sense that with multimodal devices and users uttering requests for media, there would also be multiple interactions with requests and responses to clarify the media the user actually wants.

    As you can see, the interactions on multimodal devices are quite different from those in the Fire TV app integration. In the Fire TV app integration, your Lambda function sent a brief success message back to Alexa but nothing more. However, with the multimodal device implementation, after your backend services retrieve the right information, you do actually send a detailed response back to Alexa. There are multiple calls back and forth with requests and responses.

Next Steps

Now that you've explored some initial workflows in sending events to your Lambda, let's dive a bit deeper for a fuller understanding of what's going on in the sample Lambda code. Go on to Step 3: Understand the Alexa Directives and Expected Responses.