Overview for Voice-enabling Your App and Content
With the release of Fire TV Cube, customers can interact with their TVs in a hands-free way (an interaction referred to as "far-field control"). They can ask Alexa to play content, search for content, control playback, and change channels on their Fire TV using voice. Even without Fire TV Cube, users who link an Alexa-enabled device (such as an Echo or Echo Dot) with Fire TV can interact with content and control playback in a similar hands-free way. Alternatively, customers with the Alexa Voice Remote can interact with their TV by pressing the voice button.
To support voice interactions on Fire TV, it's becoming increasingly important that you voice-enable your apps. There are several techniques for voice-enabling your app: Video Skills Kit, Media Session API, and In-App Voice Scrolling and Selection.
Integrating with Alexa introduces some terms that might be unfamiliar. The following glossary defines some of these terms.
- Alexa Skill
- A capability or ability of Alexa. Alexa provides a set of built-in skills (such as playing music), and developers can use the Alexa Skills Kit to give Alexa new skills. A skill includes both the code (in the form of a cloud-based service) and the configuration provided on the developer console.
- Alexa Voice Remote
- A remote control for Fire TV that offers a voice button. Interacting with this voice-enabled remote (even if you're far away from your TV) is still considered "near field" control because you're near the microphone array of the remote control.
- far field
- Hands-free voice interactions at a distance from the device.
- Fire TV Cube
- The first Fire TV device offering a hands-free TV experience (far field control). Without a remote, you can use your voice to access, launch, and control content, turn on your TV and AV devices, switch inputs, adjust the volume, search for content, and more from a distance. See Device Specifications for Fire TV for details.
- Amazon Catalog
- An index of the media on Fire TV. Integrating your media catalog with Amazon allows your content to be discovered and launched from Amazon devices, such as from the Fire TV home screen when users search for it (either through voice or text search). See Getting Started with Catalog Ingestion for details.
- AWS Lambda
- An AWS compute service that runs your code in response to events and automatically manages the compute resources for you. This lets you run code (referred to as a Lambda function) in the cloud without managing servers. The code for a smart home skill must be hosted as a Lambda function. For a custom skill, hosting the service as a Lambda function is optional. AWS Lambda is a service offered by Amazon Web Services.
- AWS Lambda function
- The code uploaded to AWS Lambda. Lambda supports coding in Node.js, Java, Python, or C#. A smart home skill must be implemented as a Lambda function. You can also choose to use a Lambda function for the service for a custom skill.
- local search
- A search for content within a specific catalog-integrated app on Fire TV.
- near field
- Voice interactions that are near or close to the device.
- transport controls
- Playback controls while watching media (Play, Pause, Stop, Rewind, Fast-forward, etc.). Also called "media controls."
- universal search
- A search for content across all catalog-integrated apps on Fire TV. You can initiate a universal search using voice or text. All voice searches by default are universal searches. Any search using the search button within the Fire TV UI (rather than using the search provided within a specific app) is also a universal search.
- utterance
- The words the user says to Alexa to convey what they want to do, or to provide a response to a question Alexa asks.
- Video Skills Kit
- A toolkit of files (such as sample apps and sample Lambda code) that you use when implementing the Video Skill API. See Video Skill API.
- Video Skill API
- A set of APIs that enable the far-field control of video devices and streaming services using an Alexa-enabled device. See Video Skill API for details.
Video Skills Kit
The bundle of Lambda files and other code to integrate the Alexa Video Skill API is referred to as the "Video Skills Kit," or VSK. The VSK integrates the Alexa Video Skill API, which provides the deepest integration of your app and its content with Fire TV. Integrating the Video Skills Kit gives users the following capabilities:
- App launching: When a customer asks to play or search for specific content, Alexa automatically launches the correct Fire TV app. When customers say "Alexa, open <app name>," they are directed to the app’s homepage. The Video Skill API automatically enables the Alexa video skill to launch the app.
- Quick play: Customers can ask Alexa to play video by saying "Alexa, play <show name>" or "Alexa, play <show name> on <app name>." Alexa routes the user to the correct app with that content, and Fire TV begins playback automatically (rather than just going to the detail page).
- Search: Customers can ask Alexa to search for content by saying "Alexa, find <show name>." Searches that don't limit the scope to an app are called "universal searches" — they look for the content across all catalog-integrated Fire TV apps. Customers can also limit their search within the scope of an app, saying "Alexa, find <show name> on <app name>" or "Alexa, find dramas on <app name>." Searches that include an app are called "local searches." When customers specify the app they’d like to use, Fire TV takes them to the search functionality within the app.
- Transport Controls: Customers can control playback via voice through utterances such as "Alexa, fast forward", "Alexa, fast forward 5 minutes", "Alexa, next", "Alexa, previous", as well as rewind, pause, resume, and stop.
- Channel Change: For apps that offer live TV functionality, customers can switch between channels through utterances such as "Alexa, tune to <channel name>."
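The capabilities above can be pictured as a routing step inside the skill's Lambda function. The following is a minimal sketch in Python (one of the languages Lambda supports), not the Video Skills Kit sample code. It assumes each request arrives as an Alexa directive whose header carries a `namespace` and a `name`. The interface names in the table are taken from Alexa's publicly documented video interfaces and are assumptions as far as your integration is concerned. `CAPABILITY_BY_DIRECTIVE` and `classify_directive` are hypothetical names introduced for illustration.

```python
# Hypothetical routing table: (directive namespace, directive name) -> capability.
# The interface names are assumptions based on Alexa's public video interfaces;
# confirm them against the Video Skill API documentation you are given.
CAPABILITY_BY_DIRECTIVE = {
    ("Alexa.RemoteVideoPlayer", "SearchAndPlay"): "quick play",
    ("Alexa.RemoteVideoPlayer", "SearchAndDisplayResults"): "search",
    ("Alexa.PlaybackController", "Play"): "transport controls",
    ("Alexa.PlaybackController", "Pause"): "transport controls",
    ("Alexa.PlaybackController", "Stop"): "transport controls",
    ("Alexa.PlaybackController", "FastForward"): "transport controls",
    ("Alexa.PlaybackController", "Rewind"): "transport controls",
    ("Alexa.PlaybackController", "Next"): "transport controls",
    ("Alexa.PlaybackController", "Previous"): "transport controls",
    ("Alexa.SeekController", "AdjustSeekPosition"): "transport controls",
    ("Alexa.ChannelController", "ChangeChannel"): "channel change",
}


def classify_directive(event):
    """Return the capability a directive belongs to, or "unsupported"."""
    header = event.get("directive", {}).get("header", {})
    key = (header.get("namespace"), header.get("name"))
    return CAPABILITY_BY_DIRECTIVE.get(key, "unsupported")


def lambda_handler(event, context):
    """Lambda entry point for this sketch.

    A real skill would forward the directive to the Fire TV app and return
    an Alexa response; this sketch only reports the classification.
    """
    return {"capability": classify_directive(event)}
```

App launching is absent from the table because, as noted above, the Video Skill API enables it automatically.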
By integrating the Video Skill API, you reduce the friction customers experience in getting to your content. Customers can discover and play your content more easily, which encourages them to watch more.
Note that even though your content becomes universally available on Fire TV, you still control what content is played and what is displayed in search results.
Integrating the Video Skills Kit involves the following:
- Creating an Alexa video skill
- Creating a new AWS Lambda function
- Including the Alexa Video Skill Client Library in your Fire TV app
- Integrating your catalog into the Amazon Catalog (if you have video-on-demand content)
Because the documentation is still evolving, some of it currently resides in a password-protected space rather than openly online. Your solution architect can evaluate whether Video Skill API is right for you and, if so, give you access to the documentation and sample code.
Completing the Video Skill API integration can take approximately one month for development work and several weeks for certification, in addition to the time needed to integrate your video-on-demand content into the Amazon Catalog. Reach out to Amazon to see if you qualify for Video Skills Kit for Fire TV.
For details on getting your video on Echo Show, see Understand Video Skill Integration for Echo Show.
Media Session API
If you don't have the bandwidth or resources to implement the Video Skill API, or your planned implementation is some months into the future, you can voice-enable the media playback controls in your app using the Media Session API. Media Session is an Android API that allows streaming applications to receive media commands, and it's the recommended best practice for handling events from remote controls, Bluetooth, ADB, the Fire TV companion app, and more.
Integrating Media Session allows customers to say commands such as "Play," "Pause," and "Rewind" during media playback. These commands work on both near-field and far-field devices. Media Session doesn't support the more advanced voice controls described in the Video Skill API section, such as launching apps and searching for content through voice. Media Session integration only voice-enables the playback controls.
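To illustrate the callback style of command handling described above, here is a language-neutral model written in Python. It is not Android code and does not use the Media Session API. The handler names loosely mirror the `onPlay`, `onPause`, and `onStop` methods of Android's `MediaSession.Callback`, while `PlayerModel`, `dispatch`, and the 10-second skip interval are hypothetical choices made for this sketch.

```python
class PlayerModel:
    """Toy stand-in for a media player that reacts to transport commands."""

    def __init__(self, duration_s):
        self.duration_s = duration_s
        self.position_s = 0
        self.playing = False

    def on_play(self):
        self.playing = True

    def on_pause(self):
        self.playing = False

    def on_stop(self):
        self.playing = False
        self.position_s = 0

    def on_seek_by(self, delta_s):
        # Clamp the new position to the bounds of the media.
        self.position_s = max(0, min(self.duration_s, self.position_s + delta_s))


def dispatch(player, command, skip_s=10):
    """Route a spoken transport command to a handler.

    Returns True if the command was handled, False otherwise.
    The skip interval is an assumption, not a Fire TV default.
    """
    handlers = {
        "play": player.on_play,
        "pause": player.on_pause,
        "stop": player.on_stop,
        "rewind": lambda: player.on_seek_by(-skip_s),
        "fast forward": lambda: player.on_seek_by(skip_s),
    }
    handler = handlers.get(command.strip().lower())
    if handler is None:
        return False
    handler()
    return True
```

A command outside the transport set, such as a search request, returns `False` here, which reflects the limit described above: Media Session covers playback controls only.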
If you've already implemented Media Session in your app (most developers have), there's little to no extra work to voice-enable Media Session. You just add a special Alexa permission to your app manifest. Full details are available here: Voice-enabled Transport Controls through the Android Media Session API.
If you're new to Media Session and don't yet have it integrated into your app, see this tutorial: Implementing Voice Control with the Media Session API on Amazon Fire TV.
In-app Scrolling and Selection
Fire TV Cube allows users to perform scrolling and selection using common Alexa phrases. In-app voice scrolling and selection works by mapping voice commands to D-pad navigation events. D-pad refers to the remote control's directional keypad, which is used to scroll right, left, up, and down. Alexa converts the voice commands into D-pad navigation events that are sent to the app.
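The voice-to-D-pad mapping can be sketched as a simple lookup. The example below is a Python illustration of the idea, not Amazon's implementation. The key code values match the D-pad constants in Android's `KeyEvent` class. The phrases in `PHRASE_TO_KEYCODE` and the `to_dpad_event` helper are hypothetical, since the exact phrases Alexa recognizes are not listed here.

```python
# D-pad key codes as defined in android.view.KeyEvent.
KEYCODE_DPAD_UP = 19
KEYCODE_DPAD_DOWN = 20
KEYCODE_DPAD_LEFT = 21
KEYCODE_DPAD_RIGHT = 22
KEYCODE_DPAD_CENTER = 23

# Hypothetical phrases; the phrases Alexa actually supports may differ.
PHRASE_TO_KEYCODE = {
    "scroll up": KEYCODE_DPAD_UP,
    "scroll down": KEYCODE_DPAD_DOWN,
    "scroll left": KEYCODE_DPAD_LEFT,
    "scroll right": KEYCODE_DPAD_RIGHT,
    "select": KEYCODE_DPAD_CENTER,
}


def to_dpad_event(utterance):
    """Map a spoken phrase to a D-pad key code, or None if unrecognized."""
    phrase = utterance.strip().lower()
    # Drop an optional leading wake word such as "Alexa,".
    if phrase.startswith("alexa"):
        phrase = phrase[len("alexa"):].lstrip(" ,")
    # Ignore trailing punctuation.
    phrase = phrase.rstrip(".!?")
    return PHRASE_TO_KEYCODE.get(phrase)
```

Because the app receives ordinary D-pad events, an app that already handles D-pad navigation correctly needs no voice-specific handling for this feature.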
In-app scrolling and selection is a feature that Amazon manually activates on the back-end for apps, after ensuring that the app will support the commands. Amazon is gradually increasing the number of apps with scrolling and selection enabled. For more details, see In-App Voice Scrolling and Selection.