Step 4: Understand How Your Web Player Gets the Media Playback URL (VSK Echo Show)

Integrate the VSK into a Multimodal Device

From your GetPlayableItemsMetadataResponse, Amazon passes the playbackContextToken to your web player. The logic in your web player can convert this playbackContextToken to a media playback URL that your web player loads.

Overview to the Media Playback URL and Media Player
Workflow Detail for the Media Playback URL
Handlers and Functions Used
Code Walkthrough
playbackContextToken versus accessToken
Next Steps

Overview to the Media Playback URL and Media Player

One detail that might not be apparent from the previous section (Step 3: Understand the Alexa Directives and Lambda Responses) is how the multimodal device gets the playback URL for the media, since your Lambda doesn't pass this playback URL back to Alexa. (With the Fire TV app implementation of video skills, your Lambda pushes content URLs directly to your app through ADM. But because multimodal devices don't have apps, the process here is a bit different.)

Basically, here's how it works. In your Lambda's GetPlayableItemsMetadataResponse, your Lambda passes a playbackContextToken back to Alexa. The playbackContextToken contains a value of your choosing to describe the media. Amazon passes the playbackContextToken value to your web player through handlers in the Alexa JavaScript Library (alexa.js).

Your web player's code can then convert the playbackContextToken value to a media playback URL. This way you keep your media playback URLs entirely private, known only to you. The following sections will go through this workflow and code in more detail.

Workflow Detail for the Media Playback URL

After your Lambda receives a GetPlayableItems directive from Alexa, your Lambda responds with a GetPlayableItemsResponse that contains the media matching the user's request. After any disambiguation with the user around multiple media titles, Alexa then sends a GetPlayableItemsMetadata directive to get more details about the chosen media. Your Lambda then responds with a GetPlayableItemsMetadataResponse that contains an identifier for the media. This identifier is specified in the playbackContextToken property as follows:

Lambda Response: GetPlayableItemsMetadataResponse

{
    "event": {
        "header": {
            "correlationToken": "dFMb0z+PgpgdDmluhJ1LddFvSqZ/jCc8ptlAKulUj90jSqg==",
            "messageId": "38ce5b22-eeff-40b8-a84f-979446f9b27e",
            "name": "GetPlayableItemsMetadataResponse",
            "namespace": "Alexa.VideoContentProvider",
            "payloadVersion": "3"
        },
        "payload": {
            "searchResults": [
                {
                    "name": "Big Buck Bunny from the Blender Foundation",
                    "contentType": "ON_DEMAND",
                    "series": {
                        "seasonNumber": "1",
                        "episodeNumber": "1",
                        "seriesName": "Creative Commons Videos",
                        "episodeName": "Pilot"
                    },
                    "playbackContextToken": "{\"streamUrl\": \"http://commondatastorage.googleapis.com/gtv-videos-bucket/sample/BigBuckBunny.mp4\", \"title\": \"Big Buck Bunny\"}",
                    "parentalControl": {
                        "pinControl": "REQUIRED"
                    },
                    "absoluteViewingPositionMilliseconds": 1232340
                }
            ]
        }
    }
}

This playbackContextToken contains an identifier, defined entirely by you, that you would like Amazon to send to your web player. Your web player will convert this identifier into a media playback URL.

In the sample Lambda and web player provided in earlier steps, the playbackContextToken is a JSON object converted to a string. The sample web player code expects the playbackContextToken to be a JSON object with streamUrl and title parameters. However, you can pass whatever value you want here, as long as your web player parses that value correctly.
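As a minimal sketch of the sample's approach, the Lambda side might build the token by stringifying an object, and the web player would reverse that with JSON.parse. The function name buildPlaybackContextToken and the example URL and title are hypothetical and are not part of the sample code.

```javascript
// Hypothetical helper: serialize the fields the sample web player expects
// (streamUrl and title) into a string suitable for playbackContextToken.
function buildPlaybackContextToken(streamUrl, title) {
  return JSON.stringify({ streamUrl, title });
}

// Placeholder values, not real media.
const exampleToken = buildPlaybackContextToken(
  'https://example.com/video.mp4',
  'Example Video'
);

// The web player later receives this string and parses it back into an object.
const exampleContent = JSON.parse(exampleToken);
console.log(exampleContent.streamUrl);
console.log(exampleContent.title);
```

Keeping the token a plain string means the structure inside it is yours to define, as long as the Lambda and the web player agree on it.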

The multimodal device invokes your web player (which resides on your server) and passes it the playbackContextToken. At this point your web player code could convert the value of the playbackContextToken to your actual media playback URL.

In the sample Lambda code, rather than making playbackContextToken an identifier that would need to be converted by the media player into a media playback URL, the sample Lambda just includes the media playback URL directly as the value for streamUrl.

Handlers and Functions Used

If you'll later be customizing the playbackContextToken in your own web player, particularly in Step 5: Build Your Web Player, it's important to understand what handlers and functions are used with the playbackContextToken.

The sample web player incorporates the Alexa JavaScript Library (alexa.js) (a required library for all web players integrating video skills on multimodal devices). The Alexa JavaScript library contains some event handlers that you must implement with your web player. These handlers are how Alexa communicates information to your web player. The handlers are defined in the readyCallback function in alexa.js:

Handlers in alexa.js

function readyCallback(controller) {
    var Event = AlexaWebPlayerController.Event;
    // Computed property names ([...]) are needed to use the Event constants as keys.
    var handlers = {
        [Event.LOAD_CONTENT]: function handleLoad(params) {},
        [Event.PAUSE]: function handlePause() {},
        [Event.RESUME]: function handleResume() {},
        [Event.SET_SEEK_POSITION]: function handleSetPos(positionInMilliseconds) {},
        [Event.ADJUST_SEEK_POSITION]: function handleAdjustPos(offsetInMilliseconds) {},
        [Event.NEXT]: function handleNext() {},
        [Event.PREVIOUS]: function handlePrevious() {},
        [Event.CLOSED_CAPTIONS_STATE_CHANGE]: function handleCCState(state) {},
        [Event.PREPARE_FOR_CLOSE]: function handlePrepareForClose() {},
        [Event.ACCESS_TOKEN_CHANGE]: function handleAccessToken(accessToken) {}
    };
    controller.on(handlers);
}

When you implement these handlers in your web player, Alexa can send you a LOAD_CONTENT event, a PAUSE event, a RESUME event, and so on. Your web player needs to act on the information in these handlers. LOAD_CONTENT is the handler that Alexa uses to pass information about the media playback URL.

It's important to note that you own the web player component of the implementation. Each video skill references a web player URI that the skill invokes to play video. Although multimodal devices have a basic Chromium web browser, Alexa doesn't provide anything beyond it. As such, in order to receive communication from Alexa, your web player must incorporate JavaScript.

When you initialize the Alexa JavaScript Library in your web player (through AlexaWebPlayerController.initialize), Alexa can send you a LOAD_CONTENT event, which your handleLoad(params) handler receives, to load content based on the user's request. In this LOAD_CONTENT call, Alexa can pass four parameters (params) as defined in Register Event Handlers: params.contentUri, params.accessToken, params.offsetInMilliseconds, and params.autoplay. The params.contentUri parameter contains the playbackContextToken.
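To make the shape of params concrete, here is a minimal sketch of reading the four parameters defensively inside a LOAD_CONTENT handler. The function name readLoadParams and the default values (no access token, an offset of 0, autoplay off) are assumptions for illustration only; Register Event Handlers describes what Alexa actually sends.

```javascript
// Hypothetical helper: pull the four LOAD_CONTENT parameters out of params,
// applying illustrative defaults when a value is undefined.
function readLoadParams(params) {
  const {
    contentUri,
    accessToken = null,
    offsetInMilliseconds = 0,
    autoplay = false,
  } = params;

  // contentUri carries the playbackContextToken, so nothing can be loaded without it.
  if (typeof contentUri !== 'string' || contentUri.length === 0) {
    throw new Error('LOAD_CONTENT params did not include a contentUri');
  }
  return { contentUri, accessToken, offsetInMilliseconds, autoplay };
}
```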

Your JavaScript has the responsibility of handling these parameters from LOAD_CONTENT and performing whatever tasks are needed to play the content in your player. If you aren't including the playback URL directly in the playbackContextToken, you will likely need to convert the value by consulting with your backend service (where these identifiers are presumably mapped to the playback URLs). Then you would tell your JavaScript to load the playback URL into your web player. (This step becomes unnecessary if your Lambda passes the playback URL directly in the playbackContextToken.)
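A minimal sketch of that conversion, assuming the playbackContextToken carries an opaque media identifier rather than a URL. The Map below only stands in for your backend service (a real player would more likely make an asynchronous request), and the names resolvePlaybackUrl and mediaCatalog, the identifiers, and the URLs are all hypothetical.

```javascript
// Stand-in for a backend that maps your media identifiers to playback URLs.
const mediaCatalog = new Map([
  ['movie-001', 'https://example.com/streams/movie-001.m3u8'],
  ['movie-002', 'https://example.com/streams/movie-002.mp4'],
]);

// Hypothetical helper: convert the identifier received in contentUri
// into the URL the player should load.
function resolvePlaybackUrl(contentUri, catalog) {
  const url = catalog.get(contentUri);
  if (url === undefined) {
    throw new Error(`No playback URL found for identifier: ${contentUri}`);
  }
  return url;
}

console.log(resolvePlaybackUrl('movie-001', mediaCatalog));
```

With this arrangement the playback URLs never appear in the Lambda response; only the identifier does.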

Code Walkthrough

Let's get even more specific with the code in the sample web player and how the playbackContextToken gets passed. The sample web player contains several JavaScript files in the src folder:

├── src
│   ├── alexa.js
│   ├── main.js
│   ├── mediaPlayer.js
│   ├── ui.js
│   └── util
│       └── logger.js

The alexa.js file contains a loadContentHandler function:

function loadContentHandler(playbackParams) {
  function loadContent(resolve, reject) {
    try {
      const content = JSON.parse(playbackParams.contentUri);
      mediaPlayer.load(playbackParams);
      ui.setVideoTitle(content.title);
      resolve();
    } catch (err) {
      reject(err);
    }
  }
  ...
}

This function parses playbackParams.contentUri as JSON and stores the result in content. It also calls mediaPlayer.load (the web player's function) and passes in playbackParams. Open mediaPlayer.js and look for the load function to see how playbackParams gets passed into this function:

function load(playbackParams) {
  const { video } = self;
  const { contentUri, offsetInMilliseconds, autoplay } = playbackParams;
  const source = document.createElement('source');
  const { streamUrl, title } = JSON.parse(contentUri);

  // Set the video content based on contentUri being a stream URL
  source.setAttribute('src', streamUrl);

  // Set type based on extension loosely found in URL
  if (contentUri.indexOf('.mp4') >= 0) {
    source.setAttribute('type', 'video/mp4');
  } else if (contentUri.indexOf('.m3u8') >= 0) {
    source.setAttribute('type', 'application/x-mpegURL');
  } else {
    logger.debug(`Unhandled video type for url: ${streamUrl}`);
    throw new Error({
      errorType: alexa.ErrorType.INVALID_CONTENT_URI,
    });
  }
...
}
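Note that the extension check above searches the entire contentUri string, which in the sample is the stringified object. As a hedged alternative sketch, the check could look only at the path of streamUrl, so that a title or query string containing .mp4 doesn't influence the result. The function name guessVideoType is hypothetical; the two MIME types are the ones the sample uses.

```javascript
// Hypothetical helper: choose a MIME type from the file extension in the URL path.
// Returns null for unrecognized extensions. new URL() throws if streamUrl
// is not an absolute URL.
function guessVideoType(streamUrl) {
  const path = new URL(streamUrl).pathname.toLowerCase();
  if (path.endsWith('.mp4')) return 'video/mp4';
  if (path.endsWith('.m3u8')) return 'application/x-mpegURL';
  return null;
}
```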

The line const { contentUri, offsetInMilliseconds, autoplay } = playbackParams; destructures playbackParams into three separate variables: contentUri, offsetInMilliseconds, and autoplay.

The line const { streamUrl, title } = JSON.parse(contentUri); extracts two values (streamUrl and title) from contentUri. (This is why, in the sample Lambda and web player code, playbackContextToken has to be a stringified object.) All that remains is to load streamUrl into the player.

Although the sample Lambda implements contentUri this way, you can implement it differently in your own code. For example, it could simply be a string (rather than a stringified object).
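As one possible sketch of such a variation, the web player could accept either a stringified object or a bare URL string. The function name parseContentUri is hypothetical, and the fallback behavior is an assumption rather than something the sample implements.

```javascript
// Hypothetical helper: accept contentUri as either a stringified
// { streamUrl, title } object or a plain URL string.
function parseContentUri(contentUri) {
  try {
    const parsed = JSON.parse(contentUri);
    if (parsed !== null && typeof parsed === 'object' &&
        typeof parsed.streamUrl === 'string') {
      return { streamUrl: parsed.streamUrl, title: parsed.title || '' };
    }
  } catch (err) {
    // Not JSON: fall through and treat the value as a plain URL string.
  }
  return { streamUrl: contentUri, title: '' };
}
```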

In mediaPlayer.js, the load function then either starts playback with video.play (when autoplay is true) or loads the media with video.load:

if (autoplay) {
  video.play();
} else {
  video.load();
}

The src attribute used with the play method is set in the following line:

source.setAttribute('src', streamUrl);

The source element is attached to the video element, so that the play or load method picks up the new source:

video.innerHTML = '';
video.appendChild(source);

Note that this code belongs to your own web player, so your implementation will likely differ from this sample player. The main work you do in customizing the web player code is in alexa.js, because that is where you handle all the events from Alexa.

playbackContextToken versus accessToken

In alexa.js, you'll see that another handler takes a parameter called accessToken:

Event.ACCESS_TOKEN_CHANGE: function handleAccessToken(accessToken) {}

If your service requires the user be authenticated to stream content, an accessToken is included as well. The accessToken authorizes the user to view the content with your service. In contrast, as explained at length here, the playbackContextToken (and contentUri) is how you communicate the media playback URL. The playbackContextToken does not include any authentication information.
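To illustrate the separation, here is a minimal sketch in which the accessToken is tracked on its own and attached only when the player makes requests to your service, while the playback URL continues to come from the playbackContextToken. The createAccessTokenStore name and the use of an Authorization: Bearer header are assumptions; how your service expects to receive the token is up to you.

```javascript
// Hypothetical helper: keep the most recent accessToken separate from
// the playback information carried in contentUri.
function createAccessTokenStore() {
  let current = null;
  return {
    // Call from your LOAD_CONTENT and ACCESS_TOKEN_CHANGE handlers.
    update(accessToken) {
      current = accessToken || null;
    },
    // Headers for requests to your own service (Bearer scheme is an assumption).
    authHeaders() {
      return current ? { Authorization: `Bearer ${current}` } : {};
    },
  };
}

const tokenStore = createAccessTokenStore();
tokenStore.update('example-access-token');
console.log(tokenStore.authHeaders());
```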

Next Steps

Go on to Step 5: Build Your Web Player.