
Video Skills Kit (VSK) for Fire TV Apps Overview

The Video Skills Kit (VSK) for Fire TV apps allows customers to use natural language commands to search for your app's content, launch your app, control media playback, change the channel, and more. For example, when you implement the VSK in your app, customers can say phrases like "Play Bosch from [app name]" or "Play Bosch," and your app will play the media.

To implement the VSK for your Fire TV app, you primarily implement the Alexa Video Skill API. However, the integration also involves additional APIs and services, including Amazon Device Messaging (ADM), AWS Lambda, AWS IAM, Login with Amazon, Node.js, the Alexa Client Library, and more. Incorporating the VSK for your Fire TV app gives customers the richest voice experience with your app, increasing engagement with and discovery of your content.

Capabilities Provided by the VSK for Fire TV Apps

Integrating the VSK with your Fire TV app gives customers the following capabilities:

  • App launching: When a customer asks to play or search for specific content, Alexa launches the correct Fire TV app. When customers say "Alexa, open <app name>," they are directed to the app's homepage. The video skill automatically enables the Alexa Video Skill API to launch the app.
  • Quick play: Customers can ask Alexa to play video by saying, "Alexa, play <show name>" or "Alexa, play <show name> on <app name>." Alexa routes the user to the correct app with that content, and Fire TV begins playback automatically (rather than just going to the detail page).
  • Search: Customers can ask Alexa to perform universal searches for content by saying "Alexa, find <show name>." Searches that don't limit the scope to an app are called "universal searches" because they look for the content across all catalog-integrated Fire TV apps. Searches that limit their scope to a specific app are called "local searches." A local search might limit this scope with the phrase, "Alexa, find <show name> on <app name>" or "Alexa, find <genre> on <app name>."
  • Transport Controls: Customers can control playback by saying "Alexa, fast forward," "Alexa, fast forward 5 minutes," "Alexa, next," "Alexa, previous," as well as "Alexa, rewind," "Alexa, pause," "Alexa, resume," and "Alexa, stop."
  • Channel Change: For apps that offer live TV functionality, customers can switch between channels by saying "Alexa, tune to <app name>."

A frequent interaction might be as follows:

A customer asks Alexa to play a specific TV show
Through the VSK integration with your Fire TV app, Alexa finds and plays the content

Note that you need only implement the logic for capabilities that are available in your app. For example, if channels aren't available in your app, you wouldn't need to implement channel change behaviors with the VSK.

Overall, voice capabilities make it easier for customers to discover and play your content. Apps with movies and TV shows work especially well with video skills. Including voice interactivity with your app encourages customers to engage more frequently with your content.

Not only does incorporating the VSK into your Fire TV app increase engagement, but voice interactions are also becoming a standard expectation for many devices, especially streaming media players. For more background on the ways voice interactions increase app usage by simplifying the experience, see Getting into the Voice Mindset from the AWS Training and Certification library.

Prerequisite: Catalog Integration

To incorporate the VSK for Fire TV, your app must be catalog-integrated. Catalog integration refers to the process of describing your app's media according to Amazon's Catalog Data Format (CDF), which is an XML schema, and regularly uploading your catalog into an S3 bucket.

Catalog integration is restricted to apps that have long-form movies or episodic TV shows significant enough to be integrated in and matched to IMDb, Amazon Video, or Gracenote. If your catalog consists of content that might not be included in these sources, reach out to your Amazon business contact for guidance.

If you don't qualify for catalog integration, you cannot implement the VSK with your Fire TV app. However, you can still incorporate some voice interactivity with your app through two related technologies: MediaSession (for transport controls) and in-app voice scrolling and selection.

Note that if you implement the video skill for your Fire TV app, you don't need to worry about implementing any other voice integration technologies. The VSK's capabilities will automatically include the same transport control functionality as with MediaSession and in-app voice scrolling and selection.

Expectations in Handling Directives

As you develop your VSK integration, you should understand what directives your Fire TV app will receive and how you're expected to react to them. A "directive" is a set of data and instructions, expressed in JSON, sent from Alexa to your Lambda function. For example, the directive might be to search for a TV show or play a movie.

The video skill sends these directives to your Lambda function, and there you're expected to process the directive with logic either in your Lambda function or app. You can read more details about each directive in the API Reference, specifically in Step 9: Interpret and React to Alexa Directives.
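As a rough illustration of what "processing a directive" can look like, the following Node.js sketch routes a directive by the namespace and name in its header. The header fields follow the sample directive in this document; the routing table, the handler bodies, and the returned strings are hypothetical placeholders, not part of the Video Skill API.

```javascript
// Hypothetical routing table keyed by "<namespace>.<name>" from the directive header.
// The handler bodies are placeholders for your own search and playback logic.
const directiveHandlers = {
  "Alexa.RemoteVideoPlayer.SearchAndPlay": (directive) =>
    `play:${directive.payload.entities[0].value}`,
  "Alexa.RemoteVideoPlayer.SearchAndDisplayResults": (directive) =>
    `search:${directive.payload.entities[0].value}`,
};

// Picks a handler for the directive; unknown directives are reported, not thrown.
function routeDirective(request) {
  const { namespace, name } = request.directive.header;
  const handler = directiveHandlers[`${namespace}.${name}`];
  if (!handler) {
    return `unhandled:${namespace}.${name}`;
  }
  return handler(request.directive);
}

// A trimmed-down SearchAndPlay directive used only for this example.
const sampleRequest = {
  directive: {
    header: { namespace: "Alexa.RemoteVideoPlayer", name: "SearchAndPlay" },
    payload: { entities: [{ type: "Video", value: "Big Buck Bunny" }] },
  },
};

console.log(routeDirective(sampleRequest)); // play:Big Buck Bunny
```

In a real Lambda function, each handler would perform the lookup or forward the instruction to your app rather than return a string.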

Supported Countries

VSK for Fire TV is not supported in every country. If you're developing an app for a country where video skills aren't yet supported, you cannot integrate the VSK into your app. Also, if you live in a VSK-unsupported country but are developing an app for a VSK-supported country, you will need to be added to an allow list to see the necessary video skill configuration options in the Alexa console. Reach out to your Amazon representative for details.

Additionally, note that you must create your Lambda in a specific AWS region depending on your location. This region is strictly enforced rather than optional. For example, if you're in the UK, you must use the EU (Ireland) region in AWS for your Lambda function. For a detailed list of supported countries and required AWS regions, see Supported Countries for Video Skills on Devices.

What You'll Need

You will need the following to integrate the VSK with your Fire TV app:

You will also configure a variety of services within the Appstore Developer Console, the Alexa Developer Console, and AWS. These services include IAM, Lambda, CloudWatch, a Login with Amazon security profile, and more.

High-level Workflow

At a high level, to integrate the VSK for your Fire TV app, you first create a video skill in the Alexa Developer Console and then associate it with a Lambda function on AWS. When users interact with your app through voice, Alexa voice services in the cloud convert the user's commands into JSON objects, called directives.

Your video skill sends these directives to your Lambda function. Your Lambda function inspects the request, performs any needed lookups or other processing, and then sends the needed information to your app. The Lambda function uses Amazon Device Messaging (ADM), a push notification service, to communicate with your app. A security profile associated with your app authorizes the communication between your app and ADM.

After receiving the communication from your Lambda function, your app then shows a list of results or begins playback of the requested media.

Detailed Workflow

The previous section explained how the VSK works at a high level. Now let's walk through the video skill workflow with more detail and granularity. The following diagram shows the video skill workflow on Fire TV:

VSK workflow for Fire TV apps

Alexa-enabled device listens for natural language commands

On Fire TV, Alexa listens for natural language commands from users. Supported utterances (phrases that Alexa understands) include search, play, app launch, channel change, and transport control commands. The Alexa-enabled device sends these commands to Alexa in the cloud.

Alexa processes phrases and generates directives

In the cloud, Alexa processes the user's utterances using automatic speech recognition and converts the speech to text. Alexa also processes the commands with natural language understanding to recognize the intent of the text. (As an app developer, you get all of this language processing and interpretation for free.)

Directives are passed to Lambda through the Video Skill API

The output from Alexa in the cloud, which handles the parsing and interpretation of the user's utterances, is a directive. A directive is a set of data and instructions, expressed as a JSON object. For example, when a user says "Watch Big Buck Bunny," Alexa converts this utterance into a SearchAndPlay directive that has a specific JSON structure, like this:

    {
        "directive": {
            "payload": {
                "entities": [
                    {
                        "type": "Video",
                        "uri": "entity://provider/program/amzn1.p11cat.merged-video.8a42b984-28c2-5c09-bd24-8d924e004d3f",
                        "value": "Big Buck Bunny",
                        "externalIds": {
                            "hawaii_us": "tt1254207",
                            "ENTITY_ID": "amzn1.p11cat.merged-video.8a42b984-28c2-5c09-bd24-8d924e004d3f",
                            "imdb": "tt1254207",
                            "tms": "MV006850300000"
                        }
                    },
                    {
                        "type": "Video",
                        "uri": "entity://provider/program/amzn1.p11cat.merged-video.4176eed9-eb18-546a-b934-314f50abe8db",
                        "value": "Big Buck Bunny",
                        "externalIds": {
                            "ENTITY_ID": "amzn1.p11cat.merged-video.4176eed9-eb18-546a-b934-314f50abe8db"
                        }
                    },
                    {
                        "type": "Video",
                        "uri": "entity://provider/program/amzn1.p11cat.merged-video.1ef9c397-544b-5632-a0d3-9b6439113616",
                        "value": "Big Buck Bunny",
                        "externalIds": {
                            "ENTITY_ID": "amzn1.p11cat.merged-video.1ef9c397-544b-5632-a0d3-9b6439113616",
                            "tms": "12631647"
                        }
                    },
                    {
                        "type": "Video",
                        "uri": "entity://provider/program/amzn1.p11cat.merged-video.5cdb7c5c-8771-55cd-b552-215e131223f1",
                        "value": "Big Buck Bunny",
                        "externalIds": {
                            "ENTITY_ID": "amzn1.p11cat.merged-video.5cdb7c5c-8771-55cd-b552-215e131223f1"
                        }
                    },
                    {
                        "type": "Video",
                        "uri": "entity://provider/program/amzn1.p11cat.merged-video.3cabe805-968e-5001-813c-f46b5b1069d7",
                        "value": "Big Buck Bunny",
                        "externalIds": {
                            "ENTITY_ID": "amzn1.p11cat.merged-video.3cabe805-968e-5001-813c-f46b5b1069d7",
                            "tms": "SH023726740000"
                        }
                    }
                ]
            },
            "header": {
                "payloadVersion": "3",
                "messageId": "72dfff1c-17df-44c5-acbb-19f491e87609",
                "namespace": "Alexa.RemoteVideoPlayer",
                "name": "SearchAndPlay",
                "correlationToken": "1bb6264b-e248-4087-901e-30c3462082b7"
            },
            "endpoint": {
                "endpointId": "1736bf8bd3091561##amzn1-ask-skill-4c43ae24-ee76-4a78-a189-cc06b64d1be8##development##com-fireappbuilder-android-football-streamz",
                "cookie": {
                    "VSKClientVersion": "1.4.5",
                    "deviceType": "A2LWARUGJLBYEW",
                    "appPackageName": "",
                    "deviceId": "G070L809716314UB",
                    "appName": "Sample Alexa VSK Fire TV App",
                    "applicationInstanceId": "amzn1.adm-registration.v3.Y29tLmFtYXpvbi5EZXZpY2VNZXNzYWdpbmcuUmVnaXN0cmF0aW9uSWRFbmNyeXB0aW9uS2V5ITEhYUtudnpaOU1xYlV5aU04NElIdU80a3FYa29lVVFDbE5oa2QzM3FhL3hPUzFMaTNmOXBhTkZPeTVaUmFYK3RaU01Cc2Q4b0U0ZzVkOVdhZDR0TVIyb2UxMitUd3dwL0ZEaGpKMkN3bXJhUnUvNThOa0VCRmg1TzYrVmxGN1ZadWlwWmpZZnhEeU1USW1NY1d2MGZYZVMyVDRjVVZSdGtrMWJoQ1FNWEoyQlpRbVNBUmM1V2R5dG5TWUhJZHNwNHg3TzM0MExwQzh4NlhtZlpJY2lpZS9IZktpM0xDYkNFUHlWUTJYU2ZJdVZXNGk3T0c2T0xpWDVlTkl3YXVVZjAyd3JTWGpHVGJrMHRNYU5DcHQ4NGhBVVphQnBHR3dCclVDWGFjcURrUWhMWnd2WjZtZEhqcjNOSkhHd0RDNWt3UGdDWlVRZXRQUkVNVnNqNldlNWtMZ3A0VDdubUZ6SklSeStiSjkxMFEveVh3PSFINFJjeWd2djlCQ2Q1c3NocEptVU5RPT0"
                },
                "scope": {
                    "token": null,
                    "type": "BearerToken"
                }
            }
        }
    }

In this case, the video skill's catalog is called "hawaii_us." The first item in the entities array contains a reference to this media.
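To make the role of the catalog name concrete, here is a minimal Node.js sketch that looks up the ID for one catalog in a directive's entities array. The function name findCatalogId is hypothetical, and the payload is a trimmed copy of the sample directive above; how you resolve the ID to playable media is up to your own backend.

```javascript
// Returns the ID stored under `catalogName` in the first entity that has one,
// or null when no entity references that catalog.
function findCatalogId(directive, catalogName) {
  const entities = (directive.payload && directive.payload.entities) || [];
  for (const entity of entities) {
    const ids = entity.externalIds || {};
    if (Object.prototype.hasOwnProperty.call(ids, catalogName)) {
      return ids[catalogName];
    }
  }
  return null;
}

// Trimmed version of the SearchAndPlay payload from the sample directive.
const trimmedDirective = {
  payload: {
    entities: [
      {
        type: "Video",
        value: "Big Buck Bunny",
        externalIds: { hawaii_us: "tt1254207", imdb: "tt1254207" },
      },
      {
        type: "Video",
        value: "Big Buck Bunny",
        externalIds: {
          ENTITY_ID: "amzn1.p11cat.merged-video.4176eed9-eb18-546a-b934-314f50abe8db",
        },
      },
    ],
  },
};

console.log(findCatalogId(trimmedDirective, "hawaii_us")); // tt1254207
console.log(findCatalogId(trimmedDirective, "tms")); // null
```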

The following table lists the kinds of directives you can handle with your Lambda function:

| Directive | Description |
|---|---|
| RemoteVideoPlayer - SearchAndPlay | Sent when users ask Alexa to play specific video content. |
| RemoteVideoPlayer - SearchAndDisplayResults | Sent when users ask Alexa to search for video content. |
| PlaybackController | Sent when users request to play, stop, and navigate playback for video content. |
| SeekController | Sent when users request to fast-forward (or skip) or rewind to a specific duration. |
| ChannelController | Sent when users request to change the channel. |
| RecordController | Sent when users request to start or stop recordings. |
| VideoRecorder | Sent when users request to search, cancel, or delete recordings. |
| KeypadController | Sent when users request to scroll right or left, page up or down, or select the item in focus. |

In your Lambda and app, you declare a list of the capabilities you support in your response to the Discover directive. You also list the capabilities in your app when you integrate the Alexa Client Library.

The capabilities you declare determine which directives Alexa will send to your Lambda. For example, if you don't declare ChannelController capabilities, Alexa won't send your Lambda any ChannelController directives.
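The relationship between declared capabilities and delivered directives might be sketched as follows in Node.js. This is a simplified illustration, not the documented response format: the "Alexa." prefix on interface names other than Alexa.RemoteVideoPlayer, the version value, and the exact response fields are assumptions that should be checked against the API Reference. The function names are hypothetical.

```javascript
// Interfaces this hypothetical app supports. ChannelController is deliberately
// omitted because the example app has no live channels.
// Assumption: interface names carry an "Alexa." prefix, as in the sample directive's namespace.
const supportedInterfaces = [
  "Alexa.RemoteVideoPlayer",
  "Alexa.PlaybackController",
  "Alexa.SeekController",
];

// Builds a simplified Discover response. Field names and the version value
// are placeholders for illustration; verify them against the API Reference.
function buildDiscoverResponse(messageId, endpointId, interfaces) {
  return {
    event: {
      header: {
        namespace: "Alexa.Discovery",
        name: "Discover.Response",
        payloadVersion: "3",
        messageId,
      },
      payload: {
        endpoints: [
          {
            endpointId,
            capabilities: interfaces.map((name) => ({
              type: "AlexaInterface",
              interface: name,
              version: "1.0", // placeholder version
            })),
          },
        ],
      },
    },
  };
}

// Mirrors the rule in the text: undeclared capabilities receive no directives.
function wouldReceive(namespace, interfaces) {
  return interfaces.includes(namespace);
}

const discoverResponse = buildDiscoverResponse("example-message-id", "example-endpoint", supportedInterfaces);
console.log(discoverResponse.event.payload.endpoints[0].capabilities.length); // 3
console.log(wouldReceive("Alexa.ChannelController", supportedInterfaces)); // false
```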

Lambda processes the directives

Lambda is an AWS service that runs code in the cloud without requiring you to have a server to host the code (serverless computing). Your Lambda function processes these directives delivered by the Video Skill API and then responds with a brief status message.
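A brief status response could be assembled along these lines. This Node.js sketch assumes the response echoes the directive's correlationToken and endpoint in a generic "Alexa" / "Response" event; that shape is an assumption for illustration, and the function name buildStatusResponse is hypothetical. The API Reference defines the actual response expected for each directive.

```javascript
// Builds a minimal status response for a processed directive.
// Assumption: the response reuses the directive's correlationToken and endpoint.
function buildStatusResponse(request, messageId) {
  const { header, endpoint } = request.directive;
  return {
    event: {
      header: {
        namespace: "Alexa",
        name: "Response",
        payloadVersion: header.payloadVersion,
        messageId, // must be unique per response; generated by the caller
        correlationToken: header.correlationToken,
      },
      endpoint: {
        endpointId: endpoint.endpointId,
        scope: endpoint.scope,
      },
      payload: {},
    },
  };
}

// Trimmed directive with made-up identifiers, used only for this example.
const incomingRequest = {
  directive: {
    header: {
      namespace: "Alexa.RemoteVideoPlayer",
      name: "SearchAndPlay",
      payloadVersion: "3",
      correlationToken: "example-correlation-token",
    },
    endpoint: {
      endpointId: "example-endpoint",
      scope: { type: "BearerToken", token: null },
    },
    payload: {},
  },
};

const statusResponse = buildStatusResponse(incomingRequest, "example-response-id");
console.log(statusResponse.event.header.correlationToken); // example-correlation-token
```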

Your Lambda function can use a variety of programming languages, but the sample Lambda code in this documentation uses Node.js. You are responsible for programming the logic in your Lambda function. In other words, Amazon delivers the directive to your Lambda function, and it is your responsibility to figure out how to incorporate the needed actions in your app.

You have flexibility in the way you process the directives coming to your Lambda. You can process the directives in two main ways:

  • You can code your Lambda function to handle the directives entirely within the Lambda function itself. Your Lambda function might need to query backend databases or other services to do lookups to get needed information. After the Lambda performs the needed action or retrieves the right information, Lambda can send the instruction to your app.
  • Your Lambda function can pass along the directive directly to your app, and your app can handle the processing of the directive. For example, your app can perform queries or other services to do lookups to get needed information before performing some action, and so on.
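The two approaches above can be contrasted in a short Node.js sketch. Everything here is hypothetical: the message fields (action, contentId, directive), the function names, and the in-memory catalog stand in for whatever message format and backend lookup your own app uses.

```javascript
// Stand-in for a backend lookup; a real function might query a database or service.
const exampleCatalog = { "Big Buck Bunny": "tt1254207" };
const lookupContentId = (title) => exampleCatalog[title] || null;

// Approach 1: resolve the request in Lambda and send the app a compact instruction.
function resolveInLambda(directive, lookup) {
  const title = directive.payload.entities[0].value;
  return { action: "PLAY", contentId: lookup(title) };
}

// Approach 2: forward the directive unchanged and let the app do the processing.
function forwardToApp(directive) {
  return { action: "HANDLE_DIRECTIVE", directive: JSON.stringify(directive) };
}

// Trimmed directive used only for this example.
const playDirective = {
  header: { namespace: "Alexa.RemoteVideoPlayer", name: "SearchAndPlay" },
  payload: { entities: [{ type: "Video", value: "Big Buck Bunny" }] },
};

console.log(resolveInLambda(playDirective, lookupContentId));
console.log(forwardToApp(playDirective).action);
```

The first approach keeps the app simple and sends small messages; the second keeps the Lambda function thin but requires the app to understand the directive format.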

Lambda sends instruction to your app through ADM

After your Lambda function processes the incoming directive from Alexa, your Lambda function sends instructions to your Fire TV app through push notifications using Amazon Device Messaging (ADM). Your Fire TV app will incorporate the Alexa Client Library, a Java library that assists with voice-enablement for your app. The library helps Alexa prioritize your app when it is active by providing this context to Alexa. The library also helps with authenticating the app with Alexa for automatic skill enablement.

Alexa will ensure that your ADM Registration ID is included in directives sent to your Lambda function. (If you prefer to use a different push notification architecture and cloud service, you can — see Alternatives to ADM.)
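As a sketch of this step, the following Node.js function builds (but does not send) an HTTP request to ADM. It assumes the registration ID is the applicationInstanceId value in the directive's endpoint cookie, as the sample directive suggests, and that an access token has already been obtained with your security profile credentials. The endpoint URL and headers reflect the ADM send-message API as understood here and should be verified against the ADM documentation; the function name and the message contents are hypothetical.

```javascript
// Builds an HTTP request description for ADM's send-message endpoint.
// Assumptions: the registration ID lives in endpoint.cookie.applicationInstanceId,
// and accessToken was obtained beforehand with your security profile credentials.
function buildAdmRequest(directive, accessToken, data) {
  const registrationId = directive.endpoint.cookie.applicationInstanceId;
  return {
    method: "POST",
    url: `https://api.amazon.com/messaging/registrations/${registrationId}/messages`,
    headers: {
      Authorization: `Bearer ${accessToken}`,
      "Content-Type": "application/json",
      Accept: "application/json",
      "X-Amzn-Type-Version": "com.amazon.device.messaging.ADMMessage@1.0",
      "X-Amzn-Accept-Type": "com.amazon.device.messaging.ADMSendResult@1.0",
    },
    // Values inside "data" are sent as strings; expiresAfter is in seconds.
    body: JSON.stringify({ data, expiresAfter: 60 }),
  };
}

// Example with placeholder values in place of a real registration ID and token.
const admRequest = buildAdmRequest(
  { endpoint: { cookie: { applicationInstanceId: "amzn1.adm-registration.v3.EXAMPLE" } } },
  "example-access-token",
  { directive: JSON.stringify({ name: "SearchAndPlay" }) }
);

console.log(admRequest.url);
```

Returning a plain request object keeps the sketch independent of any particular HTTP client; a real Lambda function would pass these values to the client of its choice.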

Your app acts on the incoming message from ADM

Your app receives the instruction and performs the desired action for the user. For example, the action in your app might be to present a movie title to the customer. Again, your Lambda function can perform the processing of the directive directly, or your app can handle it.

Estimated Development Time

It can take anywhere from several weeks to several months to fully integrate the VSK for your Fire TV app. Assuming that your content is already catalog-integrated, the bulk of the development work for the VSK involves creating logic to handle the incoming directives that your Lambda function receives.

The process for integrating the VSK for your Fire TV app is divided into a series of steps. See Integration Steps in "Process Overview for Implementing the VSK for Fire TV" for details.

You will first complete an integration using a sample app (which already has the VSK integrated) and sample Lambda function. This initial integration will take about half a day (depending on your familiarity with Android, AWS, and the Amazon Developer Console). This simple integration will allow you to see the directives sent from Alexa to your Lambda function in the cloud. Seeing the directives will give you a better sense of the scope of the implementation.

After you walk through the integration with the sample app, you'll need to perform the same integration steps with your real app.

Other Implementations for the VSK

In addition to implementing the VSK for your Fire TV app, you can also implement the VSK with multimodal devices such as Echo Show. Multimodal devices use an "app-less" framework that leverages your same Amazon catalog integration along with your existing HTML5 web player for playback. Multimodal devices also provide some existing templates for rendering browse and search pages on the device. For more details, see Video Skills Kit for Multimodal Devices Overview.

If you're a device manufacturer building set-top boxes, consoles, and smart TVs (called "living room entertainment devices"), you can implement the VSK directly into these devices to allow customers to launch apps, navigate channels, and more. The VSK implementation with devices involves leveraging the off-the-shelf Gracenote catalog for live TV and video-on-demand content, building a Lambda to support play, search, and navigation functionality, ensuring your device software is sending state information, handling cloud-to-device communication, and more. For more details, see Video Skills for Living Room Entertainment Devices.

VSK versus Custom Skills with Screen Displays

The VSK is intended for video providers whose catalog content is often in IMDb (or similar per the locale) or for device manufacturers making their devices voice interactive. The implementation involves handling directives from Alexa with your own Fire TV app or video service so that you can support requests such as “Alexa, play Interstellar.”

In contrast, if you just want to provide accompanying visuals for your Alexa skill (for example, some images, short video clips, or text displayed on a screen), you create a custom skill (rather than a video skill) and render the visual experiences with display templates using the Alexa Presentation Language (APL). For example, you might want to show text or images related to a quiz skill on an Echo Show screen. If that's what you're trying to build (instead of the more involved interactive voice experience with your video content that leverages the VSK), see Create Skills for Alexa-Enabled Devices with a Screen. The implementation process for custom skills with screen displays is simpler and does not require extensive developer expertise.


Alexa introduces many new terms that might be unfamiliar. You can find definitions of terms in the Glossary.

Next Steps

To get started implementing the VSK for your Fire TV app, go to Process Overview for Creating Video Skills for Fire TV Apps.