Voice-enabled Transport Controls through the Android Media Session API
Some apps on Amazon Fire TV support voice-enabled playback controls. For example, when customers watch a movie on Amazon Video, they can say "Alexa, play" to start the media, "Alexa, pause" to pause the media, or "Alexa, fast-forward" to jump ahead in the media's timeline. You can enable these voice commands and more in your app by integrating the Android Media Session API.
- Video Tutorial
- Requirements for Voice-enabled Media Playback
- About Android Media Session
- Supported Voice Commands
- Media Session Integration vs Video Skill API
- Integrating with Media Session for Voice-enabled Media Playback
- Enhanced Experience for Voice-Enabled Playback Controls using Media Session (Optional)
Video Tutorial
The following video provides an overview and tutorial for integrating the Media Session API on Fire TV:
The video includes a demo of the Media Session API in action (a user saying transport control utterances such as "Fast-forward" and "Rewind" to a Fire TV), as well as detailed code examples for integrating the Media Session API from scratch. You can also view the video slides here. The code examples and steps closely follow the end-to-end walkthrough in this accompanying blog post.
In the sections below, the documentation is abbreviated and assumes that you have already implemented the Media Session API in your app. If you haven't, see the more thorough code examples in the video and blog post.
Requirements for Voice-enabled Media Playback
To integrate Android Media Session into your app and enable voice commands, you must have a media app for Fire TV or Fire tablets that supports API level 21 or higher.
Note that the documentation here provides instructions for Java-based Android apps, not for web apps. You can integrate Media Session APIs with web apps (through an Android wrapper such as Cordova), but the instructions for doing so are outside the scope of this documentation.
Also, this documentation assumes you are an Android developer working in code. If you need an easier way to voice-enable media playback in your app, you can instead explore Fire App Builder, which includes Media Session integration by default. (Fire App Builder is a framework for building streaming media apps.)
About Android Media Session
Android provides an API called Media Session that allows applications to receive media commands. Alexa on Fire TV uses the Media Session API to send commands to local media applications. If an application implements Media Session, events from remotes, keyboards, headsets, adb, and so on are also delivered to it through the Media Session API.
Integrating Media Session into your Fire TV app gives your app a number of advantages:
- Media Session provides robust voice-enabled playback controls for your media.
- Media Session allows developers to handle events from remotes, keyboards, headsets, adb, and so on for both Fire OS 5 and Fire OS 6 in a single application. (Other APIs require different implementations for Fire OS 5 and Fire OS 6.)
- If you integrate Media Session into your Fire TV app and add the appropriate permissions, your app will get support for voice commands. Your app will support both near-field voice commands (speaking through the microphone on your remote) and far-field voice commands (speaking to an Alexa-enabled device, such as an Echo, that is paired with Fire TV).
- Media Session is the recommended implementation for voice-enabling playback controls (Play, Pause, Fast Forward, Rewind, Previous, Next, Restart) on all Fire OS devices. (Fire OS is used on Fire TV and Fire tablets.)
For more background on Media Session, see these topics in the Android documentation:
Supported Voice Commands
The following voice commands are supported by Media Session on Fire TV devices.

| Command Type | Sample Voice Command | Description |
|---|---|---|
| Play | "Alexa, play" | Plays/Resumes the media. |
| Pause | "Alexa, pause" | Pauses the media. |
| Fast-forward | "Alexa, fast-forward 30 seconds" | Fast-forwards the media by the given duration. If no duration is provided, the default is 10 seconds. |
| Rewind | "Alexa, rewind 30 seconds" | Rewinds the media by the given duration. If no duration is provided, the default is 10 seconds. |
| Next | "Alexa, next" | Triggers the next command and plays the next media in the playlist. |
| Previous | "Alexa, previous" | Triggers the previous command and plays the previous media in the playlist. |
| Restart | "Alexa, restart" | Seeks back to the beginning of the media. |
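The fast-forward, rewind, and restart rows imply simple position arithmetic. Fire TV performs this calculation itself and passes your app only the resulting position through the seek callback described under "Set the Media Session Callback Actions". The sketch below is therefore purely illustrative. It models how a relative command, including the 10-second default, might resolve to an absolute position in milliseconds. The class and method names are hypothetical. The sketch assumes the starting point is the position your app last reported in its PlaybackState, and it applies the documented rule that negative times default to 0.

```java
import java.time.Duration;

/** Hypothetical, illustrative model of how a relative voice seek resolves to an absolute position. */
final class RelativeSeek {
    /** Default step when the user omits a duration ("Alexa, fast-forward"). */
    static final long DEFAULT_STEP_MS = Duration.ofSeconds(10).toMillis();

    private RelativeSeek() {}

    /**
     * @param currentMs playback position the app last reported, in ms (assumption)
     * @param stepMs    spoken duration in ms, or null when no duration was given
     * @param forward   true for fast-forward, false for rewind
     * @return absolute target position in ms; negative results default to 0
     */
    static long resolve(long currentMs, Long stepMs, boolean forward) {
        long step = (stepMs == null) ? DEFAULT_STEP_MS : stepMs;
        long target = forward ? currentMs + step : currentMs - step;
        return Math.max(0L, target);
    }

    /** "Alexa, restart" always resolves to the beginning of the media. */
    static long restart() {
        return 0L;
    }
}
```

For example, rewinding 30 seconds from 20 seconds into the media resolves to 0, which is the same position a restart produces.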
Media Session Integration vs Video Skill API
A deeper voice integration with your app is possible through the Video Skill API. In addition to transport controls (Play, Pause, Fast-forward, etc.) for media playback, the Video Skill API lets users perform voice searches for your app's media from anywhere in Fire TV, not just inside your app. However, Video Skill API integration requires your media to be in the Amazon Catalog. You can learn more in Voice-enabling Your App and Content.
Transitioning to the Video Skill API later won't require you to remove the Media Session integration steps you follow here. When both Media Session and the Video Skill API are integrated into the same app, Fire TV uses the Media Session code for transport control commands and the Video Skill API for content searches and channel changes.
If your app needs playback control use cases that Media Session doesn't cover, use the Video Skill API for playback controls through SeekController directives. In that case, the Media Session code is bypassed for playback controls. (Documentation for integrating the Video Skill API with Fire TV is forthcoming.)
Integrating with Media Session for Voice-enabled Media Playback
The following details highlight important aspects of Media Session integration for voice-enabled media playback functionality in your Fire TV app.
Add The Media Session Permission
To enable voice command and control in your application, add the permission com.amazon.permission.media.session.voicecommandcontrol to your application manifest:
```xml
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    android:versionCode="2147483647"
    android:versionName="1.0">
    ...
    <uses-permission android:name="com.amazon.permission.media.session.voicecommandcontrol" />
    ...
</manifest>
```
Set the Playback Capabilities
For Alexa to know which commands your app's media session supports, the app must declare a set of supported actions through the Media Session. The standard supported actions are listed in Android's PlaybackState documentation.
The actions supported by Alexa for voice control are listed below.

| Action | Description |
|---|---|
| ACTION_PLAY_PAUSE | Includes both the ACTION_PLAY and ACTION_PAUSE actions. TIP: For simplicity, consider using this single action instead of declaring ACTION_PLAY and ACTION_PAUSE separately. |
| ACTION_SEEK_TO | Jumps to a specific time in playback. If the time is negative, it defaults to 0. |
| ACTION_SKIP_TO_NEXT | Skips to the next "episode" or "track". |
| ACTION_SKIP_TO_PREVIOUS | Skips to the previous "episode" or "track". |
If an action isn't provided and a user triggers an unsupported command (such as "Jump to the last episode"), Alexa responds with something like, "Sorry, that command is not supported."
Note that there is no separate voice action for STOP. This is because Alexa treats STOP and PAUSE as the same command. Whenever the user issues a STOP command, Alexa sends a command to pause the media. To support STOP through non-voice transport controls (such as a remote or keyboard), you should still implement the onStop() method in the Media Session callback.
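To make the capability list concrete, the sketch below builds the bitmask an app would pass to PlaybackState.Builder.setActions(long). So that it runs on a plain JVM, it copies the values of the relevant android.media.session.PlaybackState constants into local fields. Check them against the Android reference before relying on them. In a real app you would reference PlaybackState.ACTION_PLAY_PAUSE and the other constants directly. The class and method names here are hypothetical.

```java
/** Hypothetical helper; constant values mirror android.media.session.PlaybackState. */
final class VoiceActions {
    static final long ACTION_STOP = 1L << 0;
    static final long ACTION_SKIP_TO_PREVIOUS = 1L << 4;
    static final long ACTION_SKIP_TO_NEXT = 1L << 5;
    static final long ACTION_SEEK_TO = 1L << 8;
    static final long ACTION_PLAY_PAUSE = 1L << 9;

    private VoiceActions() {}

    /** Voice-controllable actions, plus STOP for non-voice transport controls. */
    static long supportedActions() {
        // On Android, the mask would be published roughly like this:
        //   session.setPlaybackState(new PlaybackState.Builder()
        //       .setActions(mask)
        //       .setState(PlaybackState.STATE_PLAYING, positionMs, 1.0f)
        //       .build());
        return ACTION_PLAY_PAUSE
                | ACTION_SEEK_TO
                | ACTION_SKIP_TO_NEXT
                | ACTION_SKIP_TO_PREVIOUS
                | ACTION_STOP;
    }

    /** True if every bit of {@code action} is set in {@code mask}. */
    static boolean supports(long mask, long action) {
        return (mask & action) == action;
    }
}
```

Declaring ACTION_PLAY_PAUSE on its own follows the tip in the table above. ACTION_STOP is included only because onStop() should still be handled for non-voice controls.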
Set the Media Session Callback Actions
The following Media Session callback commands are triggered when a voice command initiates one of the following actions.
Note that these are the callbacks used by Alexa voice commands. Implementing additional callbacks for all other applicable use cases (Rewind, Fast Forward, etc.) is highly encouraged.
| Action | MediaSession.Callback Signature | Description | Supported Voice Command |
|---|---|---|---|
| Play | onPlay() | Resume playback. | "Alexa, play" |
| Pause | onPause() | Pause playback. | "Alexa, pause" |
| Skip to Next | onSkipToNext() | Skip to the next unit of media. This could be a TV episode or song. | "Alexa, next" |
| Skip to Previous | onSkipToPrevious() | Skip to the previous unit of media. | "Alexa, previous" |
| Seek | onSeekTo(long pos) | Seek to the given time (in ms). If users say a relative time, the target position is calculated and passed in this callback. Absolute time is not supported. For example, "Alexa, fast-forward to 2 hours and 15 minutes" will not set the playback position to 2 hours 15 minutes. For restart events, a seek to 0 is passed in the callback. | "Alexa, rewind 5 minutes"; "Alexa, fast-forward 5 minutes"; "Alexa, fast-forward" (default is 10 seconds); "Alexa, rewind" (default is 10 seconds); "Alexa, restart" |
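The following plain-Java sketch mirrors the shape of the MediaSession.Callback overrides that Alexa triggers: onPlay(), onPause(), onSkipToNext(), onSkipToPrevious(), and onSeekTo(long). Mirroring them this way lets the logic be exercised outside Android. In an app, these would be overrides in a subclass of android.media.session.MediaSession.Callback that drive your actual player. Here, the playlist, the single shared duration, and the position fields are hypothetical stand-ins. Clamping the seek target to the media's bounds is a defensive choice on the app side, not something the documentation requires.

```java
import java.util.List;

/** Hypothetical player model whose methods mirror MediaSession.Callback overrides. */
final class VoiceCallbackSketch {
    private final List<String> playlist;
    private final long durationMs; // one duration for every item, for simplicity
    private int index = 0;
    private long positionMs = 0;
    private boolean playing = false;

    VoiceCallbackSketch(List<String> playlist, long durationMs) {
        this.playlist = playlist;
        this.durationMs = durationMs;
    }

    // "Alexa, play"
    void onPlay() { playing = true; }

    // "Alexa, pause" (and "Alexa, stop", which Alexa delivers as a pause)
    void onPause() { playing = false; }

    // "Alexa, next": advance only if another item exists
    void onSkipToNext() {
        if (index < playlist.size() - 1) {
            index++;
            positionMs = 0;
        }
    }

    // "Alexa, previous": go back only if an earlier item exists
    void onSkipToPrevious() {
        if (index > 0) {
            index--;
            positionMs = 0;
        }
    }

    // Fast-forward, rewind, and restart all arrive as an absolute position in ms.
    void onSeekTo(long pos) {
        positionMs = Math.min(Math.max(0L, pos), durationMs);
        // A real app would also publish an updated PlaybackState with the new position.
    }

    String current() { return playlist.get(index); }
    long positionMs() { return positionMs; }
    boolean isPlaying() { return playing; }
}
```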
Enhanced Experience for Voice-Enabled Playback Controls using Media Session (Optional)
The Android Media Session documentation recommends that apps honor media session commands only when they have audio focus. As a result, when media is playing and the user says "Alexa, pause", the media resumes for a second after the voice chrome dismisses, then receives the media session pause command and pauses. This produces a blip (a brief moment of playback) whenever the playback state changes via voice.
The Fire TV Media Session voice command implementation lets you opt in and match the experience of the internal Fire TV applications, which do not exhibit the blip. The following details describe how to achieve this.
Add The Opt-In Metadata
To opt in to the Fire TV media session voice command implementation, add the com.amazon.voice.supports_background_media_session meta-data element, with a value of true, inside the <application> element of your manifest:
```xml
<manifest>
    ...
    <application>
        ...
        <meta-data
            android:name="com.amazon.voice.supports_background_media_session"
            android:value="true" />
        ...
    </application>
    ...
</manifest>
```
Update App to Honor Media Session Commands Before Getting AudioFocus
After you opt-in to the Fire TV media session voice command implementation, the media session commands will be sent to your app before the voice chrome (the blue line that shows up at the top of the screen when you say "Alexa") loses audio focus. The app will need to handle the media session commands received before it gets audio focus back.
For example, suppose the app's internal state was "playing media" before the voice chrome was invoked, and the user says "Alexa, pause." The pause command is sent to the app, and the app needs to change its internal state to "paused." After the app regains audio focus, it should query its internal state. Because the media is already paused, the app should not resume playback.
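The pattern in this example amounts to tracking the user's intended play state separately from audio focus. The sketch below is a plain-Java model of that idea. Media session commands update the intended state immediately, even while the voice chrome holds audio focus. Regaining focus resumes output only if the intended state is still "playing." The class, field, and method names are hypothetical. On Android, onFocusLost() and onFocusGained() would be driven from an AudioManager.OnAudioFocusChangeListener, for instance on AUDIOFOCUS_LOSS_TRANSIENT and AUDIOFOCUS_GAIN.

```java
/** Hypothetical model: the intended play state is tracked separately from audio focus. */
final class FocusAwarePlayer {
    private boolean wantsToPlay = false;  // what the user last asked for
    private boolean hasFocus = true;      // whether the app currently holds audio focus
    private boolean outputActive = false; // whether media is actually rendering

    // Media session commands may arrive while the voice chrome holds focus.
    void onPlayCommand() {
        wantsToPlay = true;
        updateOutput();
    }

    void onPauseCommand() {
        wantsToPlay = false;
        updateOutput();
    }

    void onFocusLost() {
        hasFocus = false;
        updateOutput();
    }

    void onFocusGained() {
        hasFocus = true;
        // Consult the internal state instead of blindly resuming playback.
        updateOutput();
    }

    private void updateOutput() {
        outputActive = hasFocus && wantsToPlay;
    }

    boolean isOutputActive() { return outputActive; }
}
```

Because the pause command changes wantsToPlay before focus returns, regaining focus after "Alexa, pause" leaves the output stopped, so no blip occurs.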