Step 2: Enable your Video Skill on a Multimodal Device and Test (VSK MM)

Now that you have a video skill created, it's time to connect it to your multimodal device. With Fire TV apps, you ran your app on a Fire TV device, connecting through adb. However, multimodal devices don't use apps, nor can you connect through adb. Instead, you'll run your video skill by publishing a web player, supplying your video skill with the URI for your web player, and then associating your skill with your multimodal device. The multimodal device invokes your web player (from your server) to display the requested media.

Step 2.1: Set Up a Sample Web Player

Although you'll build a web player later in the integration process (specifically, in Step 4: Build Your Web Player), for now, download a sample web player project to get started and observe the Alexa directives sent to your Lambda. The sample web player includes initiation of video content, basic playback controls, and integration with the Alexa Video JavaScript Library.

To set up the sample web player:

  1. Download the sample_web_player.zip and unzip the files.
  2. If you don't already have Node.js and npm on your system, install them.
  3. From the command line, browse to the sample web player directory.
  4. Run the following:

    npm install -g gulp http-server
    
  5. Then run:

    npm install
    gulp
    
  6. Upload the web player onto a web server.

    For this sample setup, you can also use the following web player URL: https://devportal-reference-docs.s3-us-west-1.amazonaws.com/video-skills-kit/sample-web-player/sample_web_player/index.html. This URL simply provides the default web player code in an S3 bucket that is configured as a static website.

    As desired, update the Web Player URI in the video skill configuration that you performed in Step 1.4: Finish Setting Up Your Video Skill in Step 1.

Step 2.2: Activate the Video Skill on Your Multimodal Device

Now that you've set up a video skill and Lambda function, it's time to test your skill and observe the directives Alexa sends to your Lambda function. For this testing, you will need an actual multimodal device, such as an Echo Show.

Step 2.3: Enable Your Skill on an Alexa Device

To test your skill on an Alexa-enabled device:

  1. Set up and register your multimodal device using the same developer account you used to create your video skill. Note the following:

    • If your device is already registered to another user, do a factory reset and register it with your developer account. (Or you can see Deregister a Device to deregister it through amazon.com.)
    • You will need the Alexa smartphone app to complete the device setup. If you don't already have this app, download it and sign in with your developer account.
    • Set up your multimodal device (e.g., Echo Show) with your developer account. See Set Up Your Echo Device with a Screen for details.
    • Make sure both your Alexa smartphone app and your multimodal device are on the same Wi-Fi network.
  2. Open the Alexa smartphone app on your phone and sign in with the same developer account.
  3. Tap Devices in the bottom navigation bar and then select All Devices. Confirm that your Echo device appears in your app.

    Your list of devices in the Alexa smartphone app

    The smartphone app can be a bit slow, so if the app doesn't seem to respond immediately when you press a button, be patient. This slowness will be addressed in an upcoming release.

  4. From the app's home screen, tap the menu button (upper-left corner), then tap Settings, then tap TV & Video. Scroll down the list and identify any skills that say "Enabled" below them. If a skill says "Enabled," click on that skill and then click Disable Skill, so that no TV & Video skills are enabled. (Otherwise, the other skills could interfere with your testing.)

    Disabling video skills

    Then click the back arrow at the top to return to the list of TV & Video skills.

  5. Scroll down the list and look for the video skill you created earlier. It will appear at the bottom. Click the plus button next to your video skill.

    Video skills in the Alexa app
  6. After clicking your skill to view its details, click Link Your Alexa Device.

    A new page opens up in your browser that indicates your skill has been successfully linked.

    Your skill has been successfully linked
  7. Close this notification window by clicking the X in the upper-left corner of your app. When you close the window, the Alexa app shows the skills linked to your device.

    Your list of linked devices

    (You can click Save if you want — it's not necessary.) If you click the back button to go out to the TV & Video skills, you'll see the word "Enabled" next to your skill:

    Your enabled skill. In this example, the name of the enabled skill is "Hawaii Echo".

    You're done with configurations in the Alexa smartphone app. You can click the home button or just close the app.

Step 2.4: Test Your Skill

  1. Now that your video skill is associated with your device, turn to your multimodal device and say, "Alexa, go to Video Home."

    The logo image asset for your video skill should appear:

    Your video skill logo appears on Video Home
  2. Tap on your video skill logo with your finger. This initiates an Alexa GetDisplayableItems directive to get browsable items from the catalog associated with your video skill.

    Alexa returns a result from your catalog. For example, if you're using the catalog "hawaii_us" (a sample catalog with Creative Commons assets), the device would show the following:

    A thumbnail from your catalog

    Alexa is now sending directives to your Lambda, and you can see this interaction in CloudWatch.

  3. Say to your multimodal device, "Alexa, play the movie Big Buck Bunny."

    The video should play on the device. This content URL is hard-coded into the response's playbackContextToken for the sample Lambda.

    Big Buck Bunny playing on Echo Show
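    The playbackContextToken in the sample Lambda is a stringified JSON object rather than a bare URL. A minimal sketch of how such a token might be assembled (the structure matches the sample response shown later in this step; the helper name here is our own):

```javascript
// Hypothetical sketch: building a playbackContextToken like the one the
// sample Lambda hard-codes. It is a JSON object serialized into a string,
// so the web player can later JSON.parse it to recover the stream URL.
function buildPlaybackContextToken(streamUrl, title) {
    return JSON.stringify({ streamUrl, title });
}

const token = buildPlaybackContextToken(
    "http://commondatastorage.googleapis.com/gtv-videos-bucket/sample/BigBuckBunny.mp4",
    "Big Buck Bunny"
);
```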

Step 2.5: View the Directives Sent to CloudWatch

When you say phrases to your Alexa-enabled device, these phrases get processed and converted into directives. Directives are information blocks (formatted in JSON) sent from Alexa to your Lambda for processing.

You can view the directives that Alexa sends to your Lambda function through CloudWatch. CloudWatch is an AWS service that collects monitoring and operational data for Lambda (and other services) in the form of logs, metrics, and events. Each time your Lambda function receives a directive from Alexa, you can view the directive and other logs in CloudWatch.

To view the directives received by your Lambda function in CloudWatch:

  1. You can navigate to CloudWatch in two ways:

    • Option 1: From your Lambda function in AWS, click the Monitoring tab. Then click the View logs in CloudWatch button. This takes you directly to the logs for the Lambda function you were viewing.

      Viewing CloudWatch logs for your Lambda function
    • Option 2: Click the Services menu in AWS's top navigation and select CloudWatch. Click Logs in the left sidebar. By default, a log group appears named after your Lambda function. Select the log group for your Lambda function.

  2. Select the latest Log Stream for your logs.

  3. Look for a log message that says Interaction starts. This is the start of the logs for an event that contains a request from Alexa and a response from your Lambda. The sample Lambda code prefaces each event with this message, so it appears multiple times.

    CloudWatch logs
  4. Expand a log message that contains Alexa Request: GetPlayableItems.

    There are actually multiple interactions going on here: some GetDisplayableItems directives and some GetPlayableItems directives. First, when you press your video skill's logo, this triggers a GetDisplayableItems directive from Alexa, without any specifics about the requested media. Your Lambda responds with the items you want the multimodal device to show. The items are displayed within a browse template provided by multimodal devices.

    Next, when you say, "Alexa, play the movie Big Buck Bunny," this triggers a GetPlayableItems directive from Alexa. This request contains a payload with an entities block identifying the video the user requested (value: Big Buck Bunny). Try to locate this particular log with the value Big Buck Bunny in CloudWatch.

    Alexa Directive: GetPlayableItems

    {
        "directive": {
            "profile": null,
            "payload": {
                "minResultLimit": 1,
                "entities": [
                    {
                        "externalIds": null,
                        "type": "MediaType",
                        "value": "MOVIE",
                        "entityMetadata": null,
                        "mergedGroupId": 0
                    },
                    {
                        "externalIds": {
                            "hawaii_us": "tt1254207"
                        },
                        "type": "Video",
                        "value": "Big Buck Bunny",
                        "entityMetadata": null,
                        "mergedGroupId": 1
                    }
                ],
                "timeWindow": null,
                "locale": "en-US",
                "contentType": null,
                "maxResultLimit": 40
            },
            "endpoint": {
                "cookie": {},
                "endpointId": "ALEXA_VOICE_SERVICE_EXTERNAL_MEDIA_PLAYER_VIDEO_PROVIDER",
                "scope": {
                    "token": "1bdaa2eb-4aa3-d0dc-fb10-7a5513981cf8",
                    "type": "BearerToken"
                }
            },
            "header": {
                "payloadVersion": "3",
                "messageId": "8dd96f67-9b5e-4db4-803b-2f6d71a4a62e",
                "namespace": "Alexa.VideoContentProvider",
                "name": "GetPlayableItems",
                "correlationToken": null
            }
        }
    }
    

    When users make requests to play movies, Alexa interprets this as a Quick Play scenario and first sends a GetPlayableItems directive. The payload in this directive contains the value that Alexa thinks the user is asking for: "value": "Big Buck Bunny".

    At this point, normally your Lambda code might perform lookups or other queries to figure out what media the user is really asking for. Alexa has just done the job of parsing the user's utterances into a structured payload for you to handle. If the user had said something more general, such as "Avengers," you might have to clarify which Avengers movie the user really wants. Or if the user had said, "Play Mozart in the Jungle," you would need to clarify which season and episode.
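    To make this concrete, here is a sketch (our own helper, not the sample Lambda's actual code) of pulling the requested title out of the directive payload shown above:

```javascript
// Hypothetical helper: extract the title the user asked for from a
// GetPlayableItems directive. The "Video" entity carries the utterance value.
function getRequestedTitle(directive) {
    const entities = (directive.payload && directive.payload.entities) || [];
    const video = entities.find((entity) => entity.type === "Video");
    return video ? video.value : null;
}
```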

    Your Lambda returns a GetPlayableItemsResponse to Alexa containing all titles in your catalog that match the user's request. (In the sample Lambda, the response is pre-defined rather than dynamically queried through some backend service.) Expand the Lambda Response: GetPlayableItemsResponse log message (below the Alexa Request: GetPlayableItems) to see the mediaIdentifier for the requested media:

    Lambda Response: GetPlayableItemsResponse

    {
        "event": {
            "header": {
                "correlationToken": "dFMb0z+PgpgdDmluhJ1LddFvSqZ/jCc8ptlAKulUj90jSqg==",
                "messageId": "5f0a0546-caad-416f-a617-80cf083a05cd",
                "name": "GetPlayableItemsResponse",
                "namespace": "Alexa.VideoContentProvider",
                "payloadVersion": "3"
            },
            "payload": {
                "nextToken": "fvkjbr20dvjbkwOpqStr",
                "mediaItems": [
                    {
                        "mediaIdentifier": {
                            "id": "tt1254207"
                        }
                    }
                ]
            }
        }
    }
    

    In this case, there's just one mediaIdentifier, but other scenarios might return more matching titles. At this point, Alexa could ask the user which title the user wants to play.
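    If your catalog lookup returned several matches, the mediaItems array would simply contain several entries. A sketch of shaping that response (this helper is our own, not part of the sample; in real code the messageId would be a freshly generated UUID):

```javascript
// Hypothetical sketch: shaping a GetPlayableItemsResponse from a list of
// matched catalog IDs. The sample Lambda returns a pre-defined response
// instead of building one dynamically.
function buildGetPlayableItemsResponse(correlationToken, messageId, matchedIds) {
    return {
        event: {
            header: {
                correlationToken,
                messageId,  // generate a fresh UUID per response in real code
                name: "GetPlayableItemsResponse",
                namespace: "Alexa.VideoContentProvider",
                payloadVersion: "3"
            },
            payload: {
                mediaItems: matchedIds.map((id) => ({ mediaIdentifier: { id } }))
            }
        }
    };
}
```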

    After Alexa knows the right media to play, it sends another request to your Lambda: a GetPlayableItemsMetadata directive. This directive asks for the metadata (runtime, the user's viewing progress, and other external metadata) for the media returned in your Lambda's GetPlayableItemsResponse:

    Alexa Directive: GetPlayableItemsMetadata

    {
    "directive": {
        "profile": null,
        "payload": {
            "locale": "en-US",
            "mediaIdentifier": {
                "id": "tt1254207"
            }
        },
        "endpoint": {
            "endpointId": "ALEXA_VOICE_SERVICE_EXTERNAL_MEDIA_PLAYER_VIDEO_PROVIDER",
            "cookie": {},
            "scope": {
                "token": "1bdaa2eb-4aa3-d0dc-fb10-7a5513981cf8",
                "type": "BearerToken"
            }
        },
        "header": {
            "payloadVersion": "3",
            "messageId": "3b4041f4-6114-4034-b295-e3afd2df8e19",
            "namespace": "Alexa.VideoContentProvider",
            "name": "GetPlayableItemsMetadata",
            "correlationToken": null
        }
      }
    }
    

    Your Lambda then performs lookups to gather metadata about this specific video and returns the information in a GetPlayableItemsMetadataResponse:

    Lambda Response: GetPlayableItemsMetadataResponse

    {
        "event": {
            "header": {
                "correlationToken": "dFMb0z+PgpgdDmluhJ1LddFvSqZ/jCc8ptlAKulUj90jSqg==",
                "messageId": "38ce5b22-eeff-40b8-a84f-979446f9b27e",
                "name": "GetPlayableItemsMetadataResponse",
                "namespace": "Alexa.VideoContentProvider",
                "payloadVersion": "3"
            },
            "payload": {
                "searchResults": [
                    {
                        "name": "Big Buck Bunny",
                        "contentType": "ON_DEMAND",
                        "series": {
                            "seasonNumber": "1",
                            "episodeNumber": "1",
                            "seriesName": "Blender Foundation Videos",
                            "episodeName": "Pilot"
                        },
                        "playbackContextToken": "{\"streamUrl\": \"http://commondatastorage.googleapis.com/gtv-videos-bucket/sample/BigBuckBunny.mp4\", \"title\": \"Big Buck Bunny\"}",
                        "parentalControl": {
                            "pinControl": "REQUIRED"
                        },
                        "absoluteViewingPositionMilliseconds": 1232340
                    }
                ]
            }
        }
    }
    

    Notice the playbackContextToken value returned here. This value, a stringified JSON object containing both a streamUrl and a title, carries the URL needed to play the video. The multimodal device sends this playbackContextToken to your web player, and your web player begins playing the video.
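    On the web player side, decoding the token back into a stream URL could look like this (a sketch under the assumption that the token is the JSON string shown above; the actual sample player's handling may differ):

```javascript
// Hypothetical sketch: the web player parses the playbackContextToken it
// receives from the device to recover the stream URL and title.
function parsePlaybackContextToken(token) {
    const context = JSON.parse(token);
    return { streamUrl: context.streamUrl, title: context.title };
}
```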

    Why are there two calls — GetPlayableItems followed by GetPlayableItemsMetadata?

    You might be wondering why Alexa makes two calls to get the content the user requests. As an analogy, consider a customer who walks into a fast-food restaurant and says to the worker at the counter, "I'd like a hamburger." The worker recognizes that the customer wants a hamburger, but there are multiple matches for hamburgers on the restaurant's menu. So the worker responds, "We have several types of hamburgers. Would you like a Deluxe hamburger, a Basic hamburger, or our Daily Special hamburger?"

    The customer clarifies that he would like a "Deluxe Hamburger." So the worker returns information about the Deluxe Hamburger (price, included items, condiments, etc.) to fulfill the order.

    Natural speech is a constant back-and-forth of requests and responses to clarify intent. It makes sense that with multimodal devices and users uttering requests for media, there would also be multiple interactions with requests and responses to clarify the media the user actually wants.

    As you can see, the interactions on multimodal devices are quite different from those in the Fire TV app integration. In the Fire TV app integration, your Lambda function sent a brief success message back to Alexa but nothing more. However, with the multimodal device implementation, after your backend services retrieve the right information, you do actually send a detailed response back to Alexa. There are multiple calls back and forth with requests and responses.

Next Steps

Now that you've explored some initial workflows in sending events to your Lambda, let's dive a bit deeper for a fuller understanding of what's going on in the sample Lambda code. Go on to Step 3: Understand the Alexa Directives and Expected Responses.