Video Skills for Fire TV Apps Overview

Video skills for Fire TV apps allow customers to use natural language commands to search for your app's content, launch your app, control media playback, change the channel, and more. For example, your video skill can enable customers to say phrases like "Play Bosch from [app name]" or "Play Bosch," and your app will play the media.

To implement a video skill for your Fire TV app, you primarily implement the Alexa Video Skill API from the Alexa Skills Kit. However, the integration into Android-based streaming media apps for Fire TV involves additional APIs and services, including Amazon Device Messaging (ADM), AWS Lambda, Amazon S3, AWS IAM, Login with Amazon, CloudWatch, Node.js, the Alexa Client Library, and more. Incorporating a video skill for your Fire TV app gives customers the richest voice experience with your content, driving up engagement with and discovery of your content.

Capabilities Provided with Video Skills for Fire TV Apps

Integrating a video skill for your Fire TV app gives customers the following capabilities:

  • App launching: When a customer asks to play or search for specific content, Alexa automatically launches the correct Fire TV app. When customers say "Alexa, open <app name>," they are directed to the app’s homepage. The video skill automatically enables the Alexa Video Skill API to launch the app.
  • Quick play: Customers can ask Alexa to play video by saying, "Alexa, play <show name>" or "Alexa, play <show name> on <app name>." Alexa routes the user to the correct app with that content, and Fire TV begins playback automatically (rather than just going to the detail page).
  • Search: Customers can ask Alexa to perform universal searches for content by saying "Alexa, find <show name>." Searches like this, which don't limit the scope to an app, are called "universal searches," since they look for the content across all catalog-integrated Fire TV apps. Searches that limit their scope to a specific app are called "local searches." Customers can also perform local searches by saying "Alexa, find <show name> on <app name>" or "Alexa, find <genre> on <app name>."
  • Transport Controls: Customers can control playback via voice through utterances such as "Alexa, fast forward", "Alexa, fast forward 5 minutes", "Alexa, next", "Alexa, previous", as well as rewind, pause, resume, and stop.
  • Channel Change: For apps that offer live TV functionality, customers can switch between channels through utterances such as "Alexa, tune to <app name>".

A frequent interaction might be as follows:

A customer asks Alexa to play a specific TV show
Through the video skill enabled in your app, Alexa finds and plays the content.

Note that you need only implement the logic for capabilities that are available in your app. For example, if channels aren't available in your app, you wouldn't need to implement Channel Change behaviors with the video skill.

Overall, voice capabilities make it easier for customers to discover and play your content. Apps with movies and TV shows work especially well with video skills. Including voice interactivity with your app encourages customers to engage more frequently with your content.

Incorporating a video skill into your Fire TV app not only increases engagement with your app; voice interactions are also becoming a standard expectation on more and more devices. (For more background on the ways voice interactions increase app usage by simplifying the experience, see Getting into the Voice Mindset from the AWS Training and Certification library.)

Prerequisite: Catalog Integration

To incorporate a video skill for your Fire TV app, your app must be catalog-integrated. Catalog integration refers to the process of describing your app's media according to Amazon's Catalog Data Format (CDF), which is an XML schema, and regularly uploading your catalog into an S3 bucket following the processes described in catalog documentation.

Catalog integration is restricted to apps that have long-form movies or episodic TV shows that are significant enough to be integrated in and matched to IMDb, Amazon Video, or Gracenote. If your catalog consists of content that might not be included in these sources, reach out to your Amazon Business contact for guidance.

If you don't qualify for catalog integration, then you cannot implement a video skill for your Fire TV app. However, you can still incorporate some voice interactivity with your app through two related technologies: transport controls, and in-app scrolling and selection.

Note that if you implement the video skill for your Fire TV app, the skill's capabilities will automatically include transport controls. In-app scrolling and selection, which is not included in the video skill, is turned on manually by Amazon and might already be activated for your app. (In short, when you implement the Alexa video skill, you don't need to worry about any other voice integration efforts. The video skill provides the deepest level of voice enablement.)

Supported Countries

Video skills for Fire TV apps are not supported in every country. If you live in a country where video skills aren't supported, you cannot create a video skill for your Fire TV app.

Additionally, the AWS regions you must use for your Lambda function are strictly enforced rather than optional. For example, if you're in the UK, you must use the EU (Ireland) region in AWS for your Lambda function.

For a detailed list of countries and support, see Supported Countries for Video Skills on Devices. See also AWS Regions and Video Skills in that same topic.

What You'll Need

You will need the following to create the video skill for your Fire TV app:

You will also configure a variety of services within the Appstore Developer Console, the Alexa Developer Console, and AWS. These services include IAM, Lambda, CloudWatch, Security Profile, and more.

High-level Workflow

At a high level, to integrate a video skill for your Fire TV app, you first create a video skill in the Alexa Developer Console and associate it with a Lambda function on AWS. When users interact with your app through voice, Alexa voice services in the cloud convert the user's commands into JSON objects, called directives.

Your video skill (using the Video Skill API) receives these directives from Alexa and sends them to your Lambda function. Your Lambda function inspects the request and takes any necessary actions in your app (such as returning results or initiating playback). The Lambda function uses Amazon Device Messaging (ADM), a push notification service, to communicate with your app.

Detailed Workflow

The previous section showed how video skills work at a high level. Now let's walk through the video skill workflow with more detail and granularity. The following diagram shows the video skill workflow on Fire TV:

Video Skill Diagram and Workflow

Alexa-enabled device listens for natural language commands

On Fire TV, Alexa listens for natural language commands from users. Supported utterances (as they're called) include search, play, app launch, channel change, and transport control commands. The Alexa-enabled device sends these commands to Alexa in the cloud.

Alexa processes phrases and generates directives

In the cloud, Alexa processes the user's utterances using automatic speech recognition and converts the speech to text. Alexa also processes the commands with natural language understanding to recognize the intent of the text. (As an app developer, you get all of this language processing and interpretation for free.)

Directives are passed to Lambda through the Video Skill API

The output from Alexa in the cloud, which handles the parsing and interpretation of the user's utterances/commands, is a "directive." A directive is a set of data and instructions, expressed as a JSON object, that provides direction on how to respond to the user's utterances. For example, when a user says "Play Bosch," Alexa converts this into a "Play directive" that has a specific JSON structure, like this:

{
  "directive": {
    "payload": {
      "entities": [
        {
          "type": "Video",
          "uri": "entity:\/\/provider\/program\/amzn1.p11cat.merged-video.858df979-c070-5533-9b1f-ecae15e9f139",
          "value": "Bosch",
          "externalIds": {
            "avc_vending_de": "amzn1.dv.gti.e0a9f6b7-ca7e-dc0c-c80e-f5801c580da8",
            "ENTITY_ID": "amzn1.p11cat.merged-video.858df979-c070-5533-9b1f-ecae15e9f139",
            "avc_vending_us": "amzn1.dv.gti.56a9f78c-4cfe-36f0-663d-9104c6dd6595",
            "asin_row_na": "B01M32CYV3",
            "asin_row_fe": "B01MCYRKGY",
            "avc_vending_jp": "amzn1.dv.gti.fea9f575-39fd-7a77-622b-a400f9b511f8",
            "asin_us": "B00S45ZDVE",
            "avc_vending": "amzn1.dv.gti.8cac011f-78c3-114b-b3f8-246a48f23ec0",
            "asin_roe_eu": "B01MCYRQHG",
            "imdb": "tt3502248",
            "ontv": "SH018737530000",
            "asin_gb": "B00IGQC64I",
            "asin_row_eu": "B01MDRHYR2",
            "asin_jp": "B014QF5HMU",
            "avc_vending_gb": "amzn1.dv.gti.10a9f690-1c9c-8c4e-5f67-2007ea0c5ceb",
            "tms": "SH018737530000",
            "cravetv": "m32254",
            "asin_de": "B00ZWBWZXW",
            "gti": "amzn1.dv.gti.10a9f690-1c9c-8c4e-5f67-2007ea0c5ceb",
            "ontv_de": "SH026719310000"
          }
        },
        {
          "type": "Video",
          "uri": "entity:\/\/provider\/program\/amzn1.p11cat.merged-video.a63315af-c728-56bc-90bb-0b8cbdcdad86",
          "value": "Bosch",
          "externalIds": {
            "ENTITY_ID": "amzn1.p11cat.merged-video.a63315af-c728-56bc-90bb-0b8cbdcdad86",
            "imdb": "tt2773036"
          }
        },
        {
          "type": "Video",
          "uri": "entity:\/\/provider\/program\/amzn1.p11cat.merged-video.d9ceb2e4-4802-557d-9461-24e19a438aad",
          "value": "Bosch",
          "externalIds": {
            "gvd": "GN2EAWZBASRC1PJ",
            "ENTITY_ID": "amzn1.p11cat.merged-video.d9ceb2e4-4802-557d-9461-24e19a438aad"
          }
        },
        {
          "type": "Video",
          "uri": "entity:\/\/provider\/program\/amzn1.p11cat.merged-video.75f0a242-6a4c-5912-97be-a06a3a0d5e05",
          "value": "Bosch",
          "externalIds": {
            "ENTITY_ID": "amzn1.p11cat.merged-video.75f0a242-6a4c-5912-97be-a06a3a0d5e05",
            "tms": "SH018739470000",
            "ontv_gb": "SH018739470000"
          }
        },
        {
          "type": "Video",
          "uri": "entity:\/\/provider\/program\/amzn1.p11cat.merged-video.ca646a58-8e1d-55f3-acc7-30245f8ac202",
          "value": "Bosch",
          "externalIds": {
            "gvd": "GN794XKHCHGKGJZ",
            "ENTITY_ID": "amzn1.p11cat.merged-video.ca646a58-8e1d-55f3-acc7-30245f8ac202"
          }
        }
      ]
    },
    "header": {
      "payloadVersion": "3",
      "messageId": "20d83e2d-6b20-4590-92ee-f252c2f607a9",
      "namespace": "Alexa.RemoteVideoPlayer",
      "name": "SearchAndPlay",
      "correlationToken": "cf662810-879e-4d8f-afb7-7488b778cd35"
    },
    "endpoint": {
      "cookie": {

      },
      "endpointId": "VSKTV",
      "scope": {
        "token": "452d41e7-8e4a-71ed-e8fb-dd31b126bf2e",
        "type": "BearerToken"
      }
    }
  }
}
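
To make the structure concrete, here is a minimal Node.js sketch that pulls the routing fields and the candidate titles out of a directive shaped like the one above. The function name `summarizeDirective` and the trimmed `sampleDirective` object are illustrative assumptions, not part of the Video Skill API; only the field names are taken from the directive shown.

```javascript
// Hypothetical helper: summarizes a directive shaped like the example above.
function summarizeDirective(event) {
  const { header, payload, endpoint } = event.directive;
  const entities = (payload && payload.entities) || [];
  return {
    namespace: header.namespace,
    name: header.name,
    endpointId: endpoint ? endpoint.endpointId : undefined,
    // Each entity is one catalog match for the spoken title.
    titles: entities.map((entity) => entity.value),
    // Collect IMDb IDs where present; not every entity carries one.
    imdbIds: entities
      .map((entity) => (entity.externalIds || {}).imdb)
      .filter((id) => id !== undefined),
  };
}

// Trimmed copy of the sample directive (two of the five entities).
const sampleDirective = {
  directive: {
    header: {
      namespace: "Alexa.RemoteVideoPlayer",
      name: "SearchAndPlay",
      payloadVersion: "3",
    },
    endpoint: { endpointId: "VSKTV", cookie: {} },
    payload: {
      entities: [
        { type: "Video", value: "Bosch", externalIds: { imdb: "tt3502248" } },
        { type: "Video", value: "Bosch", externalIds: { gvd: "GN2EAWZBASRC1PJ" } },
      ],
    },
  },
};

console.log(summarizeDirective(sampleDirective));
```

Note that a single utterance can produce several entities with the same `value`, so your logic typically has to pick the entity whose external IDs match something in your own catalog.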

The following table lists the kinds of directives you can handle with your Lambda function:

Directive Type | API Reference
Play directives | Alexa.RemoteVideoPlayer Interface (see the SearchAndPlay section)
Search directives | Alexa.RemoteVideoPlayer Interface (see the SearchAndDisplayResults section)
Transport control directives | Alexa.PlaybackController Interface; Alexa.SeekController
Channel Navigation directives | Alexa.ChannelController Interface
Launch directives | Alexa.Launcher
Recording directives | Alexa.RecordController; Alexa.VideoRecorder

You can read more details about each directive in Interpreting and Reacting to Directives. (This will give you a sense of what you must handle when you integrate a video skill for your Fire TV app.)

Alexa sends these directives to your AWS Lambda function through the Video Skill API.
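
One way to organize a Lambda function around the directive types in the table above is a lookup keyed on the namespace in the directive's header. The sketch below is only an assumption about how you might structure this: the category labels and the `classifyDirective` name are hypothetical, while the namespaces and the two RemoteVideoPlayer directive names are the ones listed in the table.

```javascript
// Namespaces from the table above, mapped to illustrative category labels.
const DIRECTIVE_CATEGORIES = {
  "Alexa.RemoteVideoPlayer": "play-or-search",
  "Alexa.PlaybackController": "transport-control",
  "Alexa.SeekController": "transport-control",
  "Alexa.ChannelController": "channel-navigation",
  "Alexa.Launcher": "launch",
  "Alexa.RecordController": "recording",
  "Alexa.VideoRecorder": "recording",
};

// Hypothetical classifier: returns a category label for an incoming directive.
function classifyDirective(event) {
  const { namespace, name } = event.directive.header;
  const known = Object.prototype.hasOwnProperty.call(DIRECTIVE_CATEGORIES, namespace);
  const category = known ? DIRECTIVE_CATEGORIES[namespace] : "unsupported";

  // RemoteVideoPlayer carries both play and search; the directive name tells them apart.
  if (category === "play-or-search") {
    if (name === "SearchAndPlay") return "play";
    if (name === "SearchAndDisplayResults") return "search";
    return "unsupported";
  }
  return category;
}
```

Returning an explicit "unsupported" label makes it easy to ignore directive types your app has no use for, such as channel navigation in an app without live TV.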

Lambda processes the directives

Lambda is an AWS service that runs code in the cloud without requiring you to have a server to host the code (serverless computing). Your Lambda function processes these directives delivered by the Video Skill API and then responds with a brief status message.
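
As a rough illustration of the "brief status message," the sketch below builds a response that echoes the correlation token from the incoming directive. The exact response schema is defined in the Video Skill API reference; the shape used here follows the general `Alexa` / `Response` event convention of Alexa's version 3 payloads and is an assumption you should check against that reference. The function name `buildStatusResponse` is hypothetical, and the caller is expected to supply a unique `messageId`.

```javascript
// Hypothetical helper: builds a status response for a received directive.
// The response shape is an assumption; verify it against the API reference.
function buildStatusResponse(event, messageId) {
  const { header, endpoint } = event.directive;
  return {
    event: {
      header: {
        namespace: "Alexa",
        name: "Response",
        payloadVersion: header.payloadVersion,
        messageId, // must be unique per response; generated by the caller
        correlationToken: header.correlationToken, // ties the response to the directive
      },
      endpoint: { endpointId: endpoint.endpointId },
      payload: {},
    },
  };
}
```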

Your Lambda function can use a variety of programming languages, but the sample Lambda code in this documentation uses Node.js. You are responsible for programming the logic in your Lambda function. In other words, Amazon delivers the directive to your Lambda function, and it is your responsibility to figure out how to incorporate the needed actions in your app.

You have flexibility in the way you process the directives coming to your Lambda. For example:

  • You can code your Lambda function to handle the directives entirely within the Lambda function itself. Your Lambda function might need to query backend databases or other services to look up needed information. After the Lambda function performs the needed action or retrieves the right information, it can send the instruction to your app.
  • Your Lambda function can send the directive directly to your app, and your app can handle the processing of the directive. For example, your app can query backend services to look up needed information before performing some action.
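
The two approaches can be contrasted in a small sketch. Everything here is hypothetical: the `CATALOG` lookup table, the content ID, the function names, and the message shapes are stand-ins for whatever your backend and app actually use.

```javascript
// Hypothetical stand-in for a backend lookup keyed by IMDb ID.
const CATALOG = { tt3502248: "content-001" };

// Approach 1: resolve the directive in Lambda and send the app a ready-made instruction.
function resolveInLambda(event) {
  const entities = (event.directive.payload || {}).entities || [];
  for (const entity of entities) {
    const imdb = (entity.externalIds || {}).imdb;
    if (imdb !== undefined && Object.prototype.hasOwnProperty.call(CATALOG, imdb)) {
      return { action: "play", contentId: CATALOG[imdb] };
    }
  }
  // None of the entities matched anything in the catalog.
  return { action: "notFound" };
}

// Approach 2: forward the directive untouched and let the app do the lookup.
function forwardToApp(event) {
  return { action: "handleDirective", directive: event.directive };
}
```

The first approach keeps the message sent to the app small and the app logic simple; the second keeps the Lambda function thin but moves the lookup work, and the full directive, onto the device.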

Lambda sends instructions to your app through ADM

After your Lambda function processes the incoming directive from Alexa, your Lambda function sends instructions to your Fire TV app through push notifications using Amazon Device Messaging (ADM). Your Fire TV app will integrate the Alexa Client Library, a Java library that assists with voice-enablement for your app. The library helps Alexa prioritize your app when it is active by providing this context to Alexa. The library also helps with authenticating the app with Alexa for automatic skill enablement.

Alexa will ensure that your ADM Registration ID is included in directives sent to your Lambda function. (If you prefer to use a different push notification architecture and cloud service, you can — see Alternatives to ADM.)
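
As a rough sketch of the Lambda side of this step, the helper below builds the JSON body of an ADM message that carries an instruction to the app. It is based on the ADM message format as we understand it (a `data` object of string key/value pairs plus an optional `expiresAfter` in seconds); confirm the details, and the request needed to actually send the message, against the ADM documentation. The function name `buildAdmMessageBody` and the `instruction` key are hypothetical.

```javascript
// Hypothetical helper: builds the JSON body for an ADM message.
// Sending it (authentication, HTTP request) is left to the caller.
function buildAdmMessageBody(instruction, expiresAfterSeconds) {
  return {
    // ADM data entries are string key/value pairs, so the instruction is serialized.
    data: { instruction: JSON.stringify(instruction) },
    // How long ADM should keep trying to deliver the message, in seconds.
    expiresAfter: expiresAfterSeconds,
  };
}

// Example: an instruction the app could parse when the message arrives.
const exampleBody = buildAdmMessageBody({ action: "play", title: "Bosch" }, 60);
console.log(JSON.stringify(exampleBody));
```

A short expiry is a reasonable choice for voice commands, since an instruction such as "play" is of little use if it reaches the device long after the customer spoke.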

Your app acts on the incoming message from ADM

Your app receives the instruction and performs the desired action for the user. For example, the action in your app might be to present a movie title to the customer. Again, your Lambda function can perform the processing of the directive, or your app can handle it.

Estimated Development Time

It can take anywhere from several weeks to several months to fully integrate video skills for your Fire TV app. Assuming that your content is already catalog-integrated, the bulk of the development work for the video skill involves creating logic to handle the incoming directives from your Lambda function.

The process for integrating a video skill for your Fire TV app is broken out into a series of steps. See Integration Steps in "Process Overview for Creating Video Skills on Fire TV" for details.

You can complete the initial integration steps (Step 1: Create Your Video Skill and Lambda Function and Step 2: Enable your Video Skill on an Echo Device and Test) in about two hours. These steps allow you to see the directives sent from Alexa to your Lambda function in the cloud, and seeing the directives will give you a better sense of the scope of the implementation.

Naming Conventions: "Video Skills" versus "Video Skills Kit"

You might regularly encounter the term "Video Skills Kit" or "VSK." Video Skills Kit (VSK) and video skills refer to the same thing. "Video Skills Kit" appeared in communications and caught on with early adoption partners. However, the Alexa Skills Kit supports many different types of skills: flash briefing skills, smart home skills, music skills, meetings skills, cooking skills, video skills, and more. (Each skill within the Alexa Skills Kit isn't its own kit.)

Video skills for Fire TV apps refers to implementing voice-interactivity through the Video Skill API with your Fire TV app. Though you will still see "Video Skills Kit" or VSK used here and there in code, sample apps, user interfaces, or other places, to reduce confusion we are phasing out this term in favor of video skills for Fire TV apps.

Other Implementations for Video Skills

In addition to creating video skills for Fire TV apps, you can also create video skills for multimodal devices such as Echo Show. Since the same developers who build Fire TV apps would also likely build integrations with multimodal devices, we've consolidated the documentation here. Multimodal devices such as Echo Show (and Echo Show mode on third-party devices) use an "app-less" framework that leverages your same Amazon catalog integration along with your existing HTML5 web player for playback. Multimodal devices also provide some existing templates for rendering browse and search results on the device. For more details, see Video Skills for Multimodal Devices Overview.

If you're a device manufacturer building set-top boxes, consoles, and smart TVs (living room entertainment devices), you can build video skills directly into these devices to allow customers to launch apps, navigate channels, and more. The video skill implementation with devices involves leveraging the off-the-shelf Gracenote catalog for live TV and video-on-demand content, building a Lambda to support play, search, navigate, and record functionality, ensuring your device software is sending state information, handling cloud-to-device communication, and more. Documentation for device manufacturers implementing Alexa video skills appears within the main Alexa Skills Kit navigation. For more information, see Understand the Video Skill API.

Next Steps

To get started creating a video skill for your Fire TV app, go to Process Overview for Creating Video Skills for Fire TV Apps.