Video Skills Kit (VSK) for Fire TV Apps Overview

The Video Skills Kit (VSK) for Fire TV apps allows customers to use natural language commands to search for your app's content, launch your app, control media playback, change the channel, and more. For example, the VSK can enable customers to say phrases like "Play Bosch from [app name]" or "Play Bosch," and your app will play the media.

To implement the VSK for your Fire TV app, you primarily implement the Alexa Video Skill API from the Alexa Skills Kit. However, the integration into Android-based streaming media apps for Fire TV involves additional APIs and services, including Amazon Device Messaging (ADM), AWS Lambda, AWS IAM, Login with Amazon, CloudWatch, Node.js, the Alexa Client Library, and more. Incorporating the VSK for your Fire TV app gives customers the richest voice experience with your content, driving up engagement with and discovery of your content.

Capabilities Provided by the VSK for Fire TV Apps

Integrating the VSK with your Fire TV app gives customers the following capabilities:

  • App launching: When a customer asks to play or search for specific content, Alexa automatically launches the correct Fire TV app. When customers say "Alexa, open <app name>," they are directed to the app’s homepage. The video skill automatically enables the Alexa Video Skill API to launch the app.
  • Quick play: Customers can ask Alexa to play video by saying "Alexa, play <show name>" or "Alexa, play <show name> on <app name>." Alexa routes the customer to the correct app with that content, and Fire TV begins playback automatically (rather than just going to the detail page).
  • Search: Customers can ask Alexa to perform universal searches for content by saying "Alexa, find <show name>." Searches like this, which don't limit the scope to an app, are called "universal searches," since they look for the content across all catalog-integrated Fire TV apps. Searches that limit their scope to a specific app are called "local searches." Customers can also perform local searches by saying "Alexa, find <show name> on <app name>" or "Alexa, find <genre> on <app name>."
  • Transport Controls: Customers can control playback via voice through utterances such as "Alexa, fast forward", "Alexa, fast forward 5 minutes", "Alexa, next", "Alexa, previous", as well as rewind, pause, resume, and stop.
  • Channel Change: For apps that offer live TV functionality, customers can switch between channels through utterances such as "Alexa, tune to <app name>".

A frequent interaction might be as follows:

A customer asks Alexa to play a specific TV show.
Through the VSK integration with your Fire TV app, Alexa finds and plays the content.

Note that you need only implement the logic for capabilities that are available in your app. For example, if channels aren't available in your app, you wouldn't need to implement Channel Change behaviors with the VSK.

Overall, voice capabilities make it easier for customers to discover and play your content. Apps with movies and TV shows work especially well with video skills. Including voice interactivity with your app encourages customers to engage more frequently with your content.

Not only does incorporating the VSK into your Fire TV app increase engagement with your app, but voice interactions are also becoming a standard expectation on more and more devices. For more background on the ways voice interactions increase app usage by simplifying the experience, see Getting into the Voice Mindset from the AWS Training and Certification library.

Prerequisite: Catalog Integration

To incorporate the VSK for Fire TV, your app must be catalog-integrated. Catalog integration refers to the process of describing your app's media according to Amazon's Catalog Data Format (CDF), which is an XML schema, and regularly uploading your catalog into an S3 bucket following the processes described in

Catalog integration is restricted to apps that have long-form movies or episodic TV shows that are significant enough to be integrated in and matched to IMDb, Amazon Video, or Gracenote. If your catalog consists of content that might not be included in these sources, reach out to your Amazon Business contact for guidance.

If you don't qualify for catalog integration, then you cannot implement the VSK with your Fire TV app. However, you can still incorporate some voice interactivity with your app through two related technologies:

Note that if you implement the video skill for your Fire TV app, the skill's capabilities will automatically include transport controls. In-app scrolling and selection is available by default with the VSK but can be further customized through the KeypadController API. (In short, when you implement the Alexa video skill, you don't need to worry about any other voice integration efforts. The video skill provides the deepest level of voice enablement.)

Expectations in Handling Directives

As you develop your VSK integration, you should understand what directives your Fire TV app will receive and how you're expected to react to them. A "directive" is a set of data and instructions, expressed in JSON, sent from Alexa to a video skill. For example, the directive might be to search for a TV show or play a movie. The video skill sends these directives to your Lambda function, where you're expected to process them and carry out the corresponding actions in your app. You can read more details about each directive in Step 7: Interpreting and Reacting to Directives.

Supported Countries

VSK for Fire TV is not supported in every country. If you live in a country where video skills aren't supported, you cannot integrate the VSK.

Additionally, the AWS regions you must use for your Lambda function are strictly enforced rather than optional. For example, if you're in the UK, you must use the EU (Ireland) region in AWS for your Lambda function.

For a detailed list of countries and support, see Supported Countries for Video Skills on Devices. See also AWS Regions and Video Skills in that same topic.

What You'll Need

You will need the following to integrate the VSK with your Fire TV app:

You will also configure a variety of services within the Appstore Developer Console, the Alexa Developer Console, and AWS. These services include IAM, Lambda, CloudWatch, Security Profile, and more.

High-level Workflow

At a high level, to integrate the VSK for your Fire TV app, you first create a video skill in the Alexa Developer Console and associate it with a Lambda function on AWS. When users interact with your app through voice, Alexa voice services in the cloud convert the user's commands into JSON objects, called directives.

The Video Skill API sends these directives from Alexa to your Lambda function. Your Lambda function inspects the request and takes any necessary actions in your app (such as returning results or initiating playback). The Lambda function uses Amazon Device Messaging (ADM), a push notification service, to communicate with your app.

Detailed Workflow

The previous section showed how the VSK works at a high level. Now let's walk through the video skill workflow with more detail and granularity. The following diagram shows the video skill workflow on Fire TV:

Video skill diagram and workflow for Fire TV apps

Alexa-enabled device listens for natural language commands

On Fire TV, Alexa listens for natural language commands from users. Supported utterances (as they're called) include search, play, app launch, channel change, and transport control commands. The Alexa-enabled device sends these commands to Alexa in the cloud.

Alexa processes phrases and generates directives

In the cloud, Alexa processes the user's utterances using automatic speech recognition and converts the speech to text. Alexa also processes the commands with natural language understanding to recognize the intent of the text. (As an app developer, you get all of this language processing and interpretation for free.)

Directives are passed to Lambda through the Video Skill API

The output from Alexa in the cloud, which handles the parsing and interpretation of the user's utterances/commands, is a "directive." A directive is a set of data and instructions, expressed as a JSON object, that provides direction on how to respond to the user's utterances. For example, when a user says "Play Bosch," Alexa converts this into a "Play directive" that has a specific JSON structure, like this:

{
  "directive": {
    "payload": {
      "entities": [
        {
          "type": "Video",
          "uri": "entity:\/\/provider\/program\/amzn1.p11cat.merged-video.858df979-c070-5533-9b1f-ecae15e9f139",
          "value": "Bosch",
          "externalIds": {
            "avc_vending_de": "amzn1.dv.gti.e0a9f6b7-ca7e-dc0c-c80e-f5801c580da8",
            "ENTITY_ID": "amzn1.p11cat.merged-video.858df979-c070-5533-9b1f-ecae15e9f139",
            "avc_vending_us": "amzn1.dv.gti.56a9f78c-4cfe-36f0-663d-9104c6dd6595",
            "asin_row_na": "B01M32CYV3",
            "asin_row_fe": "B01MCYRKGY",
            "avc_vending_jp": "amzn1.dv.gti.fea9f575-39fd-7a77-622b-a400f9b511f8",
            "asin_us": "B00S45ZDVE",
            "avc_vending": "amzn1.dv.gti.8cac011f-78c3-114b-b3f8-246a48f23ec0",
            "asin_roe_eu": "B01MCYRQHG",
            "imdb": "tt3502248",
            "ontv": "SH018737530000",
            "asin_gb": "B00IGQC64I",
            "asin_row_eu": "B01MDRHYR2",
            "asin_jp": "B014QF5HMU",
            "avc_vending_gb": "amzn1.dv.gti.10a9f690-1c9c-8c4e-5f67-2007ea0c5ceb",
            "tms": "SH018737530000",
            "cravetv": "m32254",
            "asin_de": "B00ZWBWZXW",
            "gti": "amzn1.dv.gti.10a9f690-1c9c-8c4e-5f67-2007ea0c5ceb",
            "ontv_de": "SH026719310000"
          }
        },
        {
          "type": "Video",
          "uri": "entity:\/\/provider\/program\/amzn1.p11cat.merged-video.a63315af-c728-56bc-90bb-0b8cbdcdad86",
          "value": "Bosch",
          "externalIds": {
            "ENTITY_ID": "amzn1.p11cat.merged-video.a63315af-c728-56bc-90bb-0b8cbdcdad86",
            "imdb": "tt2773036"
          }
        },
        {
          "type": "Video",
          "uri": "entity:\/\/provider\/program\/amzn1.p11cat.merged-video.d9ceb2e4-4802-557d-9461-24e19a438aad",
          "value": "Bosch",
          "externalIds": {
            "gvd": "GN2EAWZBASRC1PJ",
            "ENTITY_ID": "amzn1.p11cat.merged-video.d9ceb2e4-4802-557d-9461-24e19a438aad"
          }
        },
        {
          "type": "Video",
          "uri": "entity:\/\/provider\/program\/amzn1.p11cat.merged-video.75f0a242-6a4c-5912-97be-a06a3a0d5e05",
          "value": "Bosch",
          "externalIds": {
            "ENTITY_ID": "amzn1.p11cat.merged-video.75f0a242-6a4c-5912-97be-a06a3a0d5e05",
            "tms": "SH018739470000",
            "ontv_gb": "SH018739470000"
          }
        },
        {
          "type": "Video",
          "uri": "entity:\/\/provider\/program\/amzn1.p11cat.merged-video.ca646a58-8e1d-55f3-acc7-30245f8ac202",
          "value": "Bosch",
          "externalIds": {
            "gvd": "GN794XKHCHGKGJZ",
            "ENTITY_ID": "amzn1.p11cat.merged-video.ca646a58-8e1d-55f3-acc7-30245f8ac202"
          }
        }
      ]
    },
    "header": {
      "payloadVersion": "3",
      "messageId": "20d83e2d-6b20-4590-92ee-f252c2f607a9",
      "namespace": "Alexa.RemoteVideoPlayer",
      "name": "SearchAndPlay",
      "correlationToken": "cf662810-879e-4d8f-afb7-7488b778cd35"
    },
    "endpoint": {
      "cookie": {},
      "endpointId": "VSKTV",
      "scope": {
        "token": "452d41e7-8e4a-71ed-e8fb-dd31b126bf2e",
        "type": "BearerToken"
      }
    }
  }
}

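As a rough illustration of how a Lambda function might read a directive shaped like the sample above, the following Node.js sketch pulls out the header name plus the title and IMDb ID of each entity. The function name summarizeDirective and the trimmed sampleEvent are hypothetical; only the field names are taken from the sample directive.

```javascript
// Hypothetical helper: collect the parts of a directive a handler usually needs.
function summarizeDirective(event) {
  const { header, payload = {} } = event.directive;
  const entities = (payload.entities || []).map((entity) => ({
    type: entity.type,
    value: entity.value,
    // Not every entity carries every external ID, so fall back to null.
    imdb: (entity.externalIds && entity.externalIds.imdb) || null,
  }));
  return { namespace: header.namespace, name: header.name, entities };
}

// Trimmed copy of the sample directive (two entities, most external IDs omitted).
const sampleEvent = {
  directive: {
    header: {
      namespace: "Alexa.RemoteVideoPlayer",
      name: "SearchAndPlay",
      payloadVersion: "3",
    },
    payload: {
      entities: [
        { type: "Video", value: "Bosch", externalIds: { imdb: "tt3502248" } },
        { type: "Video", value: "Bosch", externalIds: { gvd: "GN2EAWZBASRC1PJ" } },
      ],
    },
  },
};

const summary = summarizeDirective(sampleEvent);
console.log(summary.name, summary.entities.map((entity) => entity.imdb));
```

Because the same title can map to several entities, a handler would typically compare each entity's external IDs against its own catalog rather than rely on the title alone.
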
The following table lists the kinds of directives you can handle with your Lambda function:

Directive | Description
RemoteVideoPlayer interface directives (SearchAndPlay) | Sent when users ask Alexa to play specific video content.
RemoteVideoPlayer interface directives (SearchAndDisplayResults) | Sent when users ask Alexa to search for video content.
PlaybackController interface directives | Sent when users request to play, stop, and navigate playback for video content.
SeekController interface directives | Sent when users request to fast-forward (or skip) or rewind to a specific duration.
ChannelController interface directives | Sent when users request to change the channel.
RecordController interface directives | Sent when users request to start or stop recordings.
VideoRecorder interface directives | Sent when users request to search, cancel, or delete recordings.
KeypadController interface directives | Sent when users request to scroll right or left, page up or down, or select the item in focus.

(This table gives you a sense of what you're expected to handle when you integrate the VSK with your Fire TV app.)

Alexa sends these directives to your AWS Lambda function through the Video Skill API.
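One way a Lambda function might sort the directive kinds listed above is to route on the namespace and name in the directive header. The sketch below is illustrative only: the handlers map, the routeDirective function, and the returned action objects are hypothetical, and the "Alexa.ChannelController" / "ChangeChannel" pair is an assumed example of a directive this particular app does not handle.

```javascript
// Hypothetical handlers keyed by "<namespace>/<name>" from the directive header.
// Only the capabilities your app actually offers need an entry.
const handlers = new Map([
  ["Alexa.RemoteVideoPlayer/SearchAndPlay", (directive) => ({
    action: "play",
    titles: directive.payload.entities.map((entity) => entity.value),
  })],
  ["Alexa.RemoteVideoPlayer/SearchAndDisplayResults", (directive) => ({
    action: "search",
    titles: directive.payload.entities.map((entity) => entity.value),
  })],
]);

function routeDirective(event) {
  const { namespace, name } = event.directive.header;
  const key = `${namespace}/${name}`;
  const handler = handlers.get(key);
  // Directives for capabilities the app does not offer fall through here.
  return handler ? handler(event.directive) : { action: "unsupported", key };
}

const playResult = routeDirective({
  directive: {
    header: { namespace: "Alexa.RemoteVideoPlayer", name: "SearchAndPlay" },
    payload: { entities: [{ type: "Video", value: "Bosch" }] },
  },
});

// Assumed namespace/name for a channel change; this app has no handler for it.
const channelResult = routeDirective({
  directive: {
    header: { namespace: "Alexa.ChannelController", name: "ChangeChannel" },
    payload: {},
  },
});

console.log(playResult, channelResult);
```

A lookup table like this keeps unsupported capabilities explicit, which fits the guidance that you implement logic only for the capabilities your app offers.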

Lambda processes the directives

Lambda is an AWS service that runs code in the cloud without requiring you to have a server to host the code (serverless computing). Your Lambda function processes these directives delivered by the Video Skill API and then responds with a brief status message.
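To make the "brief status message" more concrete, here is a minimal Node.js sketch of a Lambda-style handler that returns a response after receiving a directive. The response shape (an "event" object that echoes the directive's correlationToken and endpoint) is an assumption modeled on the directive header shown earlier; confirm the exact schema in the Video Skill API reference. The buildResponse name is hypothetical.

```javascript
const { randomUUID } = require("crypto");

// Assumed response shape: echo the correlationToken and endpoint from the directive.
// Check the Video Skill API reference for the authoritative schema.
function buildResponse(event) {
  const { header, endpoint } = event.directive;
  return {
    event: {
      header: {
        namespace: "Alexa",
        name: "Response",
        messageId: randomUUID(), // fresh ID for this response message
        correlationToken: header.correlationToken,
        payloadVersion: header.payloadVersion,
      },
      endpoint,
      payload: {},
    },
  };
}

// Minimal Lambda-style entry point.
async function handler(event) {
  // ...process the directive and notify the app here...
  return buildResponse(event);
}

const exampleResponse = buildResponse({
  directive: {
    header: { correlationToken: "cf662810-879e-4d8f-afb7-7488b778cd35", payloadVersion: "3" },
    endpoint: { endpointId: "VSKTV" },
  },
});
console.log(JSON.stringify(exampleResponse, null, 2));
```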

Your Lambda function can use a variety of programming languages, but the sample Lambda code in this documentation uses Node.js. You are responsible for programming the logic in your Lambda function. In other words, Amazon delivers the directive to your Lambda function, and it is your responsibility to figure out how to incorporate the needed actions in your app.

You have flexibility in the way you process the directives coming to your Lambda. For example:

  • You can code your Lambda function to handle the directives entirely within the Lambda function itself. Your Lambda function might need to query backend databases or other services to do lookups to get needed information. After the Lambda performs the needed action or retrieves the right information, Lambda can send the instruction to your app.
  • Your Lambda function can send the directive directly to your app, and your app can handle the processing of the directive. For example, your app can perform queries or other services to do lookups to get needed information before performing some action, and so on.
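The two approaches above can be sketched side by side. In this hedged example, resolveInLambda does the lookup in the cloud and produces a compact instruction, while forwardToApp passes the directive through for the app to interpret. The in-memory catalog, the content ID "bosch-series-1", and the instruction types are all hypothetical stand-ins for your own backend and message format.

```javascript
// Hypothetical stand-in for a backend lookup (IMDb ID -> your own content ID).
const catalog = new Map([["tt3502248", "bosch-series-1"]]);

// Option 1: resolve the content in Lambda and send the app a compact instruction.
function resolveInLambda(directive) {
  for (const entity of directive.payload.entities || []) {
    const imdb = entity.externalIds && entity.externalIds.imdb;
    if (imdb && catalog.has(imdb)) {
      return { type: "PLAY_CONTENT", contentId: catalog.get(imdb) };
    }
  }
  return { type: "NO_MATCH" };
}

// Option 2: forward the directive untouched and let the app do the lookup.
function forwardToApp(directive) {
  return { type: "RAW_DIRECTIVE", directive };
}

const sampleDirective = {
  header: { namespace: "Alexa.RemoteVideoPlayer", name: "SearchAndPlay" },
  payload: {
    entities: [{ type: "Video", value: "Bosch", externalIds: { imdb: "tt3502248" } }],
  },
};

console.log(resolveInLambda(sampleDirective));
console.log(forwardToApp(sampleDirective).type);
```

Resolving in Lambda keeps the message sent to the device small, whereas forwarding keeps the Lambda thin and concentrates the logic in the app.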

Lambda sends instructions to your app through ADM

After your Lambda function processes the incoming directive from Alexa, your Lambda function sends instructions to your Fire TV app through push notifications using Amazon Device Messaging (ADM). Your Fire TV app will integrate the Alexa Client Library, a Java library that assists with voice-enablement for your app. The library helps Alexa prioritize your app when it is active by providing this context to Alexa. The library also helps with authenticating the app with Alexa for automatic skill enablement.

Alexa will ensure that your ADM Registration ID is included in directives sent to your Lambda function. (If you prefer to use a different push notification architecture and cloud service, you can. See Alternatives to ADM.)
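As a sketch of what the Lambda-to-app hop might look like, the following Node.js function assembles (but does not send) an HTTP request for ADM. The endpoint URL and headers reflect ADM's REST interface as commonly documented and should be verified against the ADM documentation. The buildAdmRequest name, the "instruction" data key, and the placeholder values are hypothetical. This overview does not say which directive field holds the registration ID, so it is passed in as a parameter.

```javascript
// Assumption: ADM endpoint and headers per the ADM REST documentation; verify before use.
function buildAdmRequest(registrationId, accessToken, instruction) {
  return {
    method: "POST",
    url: `https://api.amazon.com/messaging/registrations/${registrationId}/messages`,
    headers: {
      Authorization: `Bearer ${accessToken}`,
      "Content-Type": "application/json",
      Accept: "application/json",
      "X-Amzn-Type-Version": "com.amazon.device.messaging.ADMMessage@1.0",
      "X-Amzn-Accept-Type": "com.amazon.device.messaging.ADMSendResult@1.0",
    },
    body: JSON.stringify({
      // ADM data values are strings, so the instruction is serialized to JSON text.
      data: { instruction: JSON.stringify(instruction) },
      // Short expiry (in seconds): a stale playback command is not useful.
      expiresAfter: 60,
    }),
  };
}

const admRequest = buildAdmRequest(
  "EXAMPLE_REGISTRATION_ID", // placeholder, not a real registration ID
  "EXAMPLE_ACCESS_TOKEN", // placeholder, normally obtained from Login with Amazon
  { type: "PLAY_CONTENT", contentId: "bosch-series-1" } // hypothetical instruction
);
console.log(admRequest.url);
```

The returned object can be handed to whichever HTTP client your Lambda function uses. Keeping the instruction compact matters because push messages have a limited payload size.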

Your app acts on the incoming message from ADM

Your app receives the instruction and performs the desired result for the user. For example, the action in your app might be to present a movie title to the customer. Again, your Lambda function can perform the processing of the directive, or your app can handle it.

Estimated Development Time

It can take anywhere from several weeks to several months to fully integrate the VSK for your Fire TV app. Assuming that your content is already catalog-integrated, the bulk of the development work for the VSK involves creating logic to handle the incoming directives from your Lambda function.

The process for integrating the VSK for your Fire TV app is broken out into a series of steps. See Integration Steps in "Process Overview for Implementing the VSK for Fire TV" for details.

You can complete the initial integration steps (Step 1: Create Your Video Skill and Lambda Function and Step 2: Enable your Video Skill on an Echo Device and Test) in about two hours. These steps allow you to see the directives sent from Alexa to your Lambda function in the cloud, which will give you a better sense of the scope of the implementation.

Other Implementations for the VSK

In addition to implementing the VSK for your Fire TV app, you can also implement the VSK with multimodal devices such as Echo Show. Multimodal devices such as Echo Show (and Echo Show mode on third-party devices) use an "app-less" framework that leverages your same Amazon catalog integration along with your existing HTML5 Web player for playback. Multimodal devices also provide some existing templates for rendering browse/search on device. For more details, see Video Skills Kit for Multimodal Devices Overview.

If you're a device manufacturer building set-top boxes, consoles, and smart TVs (living room entertainment devices), you can implement the VSK directly into these devices to allow customers to launch apps, navigate channels, and more. The VSK implementation with devices involves leveraging the off-the-shelf Gracenote catalog for live TV and video-on-demand content, building a Lambda to support play, search, navigate, and record functionality, ensuring your device software is sending state information, handling cloud-to-device communication, and more.

VSK versus Custom Skills with Screen Displays

The VSK is intended for video providers whose catalog content is often in IMDb (or for device manufacturers making their devices voice interactive). The implementation involves handling directives from Alexa with your own Fire TV app or video service, so that you can support requests such as “Alexa, play Interstellar.”

In contrast, if you just want to provide accompanying visuals for your Alexa skill (e.g., some images, short video clips, or text displayed on a screen), you create a custom skill (rather than the VSK) and render the visual experiences on display templates using the Alexa Presentation Language (APL). For example, you might want to show text or images related to a quiz skill on an Echo Show screen. If that’s what you’re trying to build (instead of the more involved interactive voice experience with your video content that leverages the Video Skill API), then see Create Skills for Alexa-Enabled Devices with a Screen. The implementation process for custom skills with screen displays is simpler and does not require extensive developer expertise.


Alexa introduces many new terms that might be unfamiliar. You can find definitions in the Glossary.

Next Steps

To get started implementing the VSK for your Fire TV app, go to Process Overview for Creating Video Skills for Fire TV Apps.