Optimization to Enable Voice Interactivity with Media Playback through Media Session

Some apps on Amazon Fire TV support voice-enabled playback controls. For example, when customers watch a movie on Amazon Video, they can say "Alexa, play" to start the media, "Alexa, pause" to pause the media, or "Alexa, fast-forward" to jump ahead in the media's timeline. You can enable these voice commands and more in your app by integrating the Android Media Session API.

Requirements for Voice-enabled Media Playback

To integrate Android Media Session into your app and enable voice commands, you must have a media app for Fire TV or Fire tablets that supports API level 21 or higher.

Note that the documentation here provides instructions for Java-based Android apps, not for web apps. You can integrate Media Session APIs with web apps (through an Android wrapper such as Cordova), but the instructions for doing so are outside the scope of this documentation.

Also, this documentation assumes you are an Android developer working in code. If you need an easier way to voice-enable media playback in your app, you can instead explore Fire App Builder, which includes Media Session integration by default. (Fire App Builder is a framework for building streaming media apps.)

About Android Media Session

Android provides an API called Media Session that allows applications to receive media commands. Alexa on Fire TV uses the Media Session API to send commands to local media applications. Events from remotes, keyboards, headsets, adb, and so on are also sent to the application through the Media Session API if the application implements it.

Integrating Media Session into your Fire TV app gives your app a number of advantages:

  • Media Session provides robust voice-enabled playback controls for your media.
  • Media Session allows developers to handle events from remotes, keyboards, headsets, adb, etc., for both Fire OS 5 and Fire OS 6 in a single application. (Other APIs require different implementations for Fire OS 5 and Fire OS 6.)
  • If you integrate MediaSession into your Fire TV app and add the appropriate permissions, your app will get support for voice commands. Your app will support both near-field voice commands (speaking through the microphone on your remote) and far-field voice commands (speaking to an Alexa-enabled device, such as an Echo, that is paired with Fire TV).
  • MediaSession is the recommended implementation for voice-enabling playback controls (Play, Pause, Fast Forward, Rewind, Previous, Next, Restart) on all Fire OS devices. (Fire OS is used on Fire TV and Fire tablets.)

For more background on MediaSession, see the MediaSession topics in the Android documentation.

Supported Voice Commands

The following voice commands are supported by MediaSession on Fire TV devices.

| Command Type | Sample Voice Command | Description |
|---|---|---|
| Play | "Alexa, play"; "Alexa, resume" | Plays or resumes the media. |
| Pause | "Alexa, pause"; "Alexa, stop" | Pauses the media. |
| Fast-forward | "Alexa, fast-forward 30 seconds"; "Alexa, fast-forward" | Fast-forwards the media by the given duration. If no duration is provided, the default is 10 seconds. |
| Rewind | "Alexa, rewind 30 seconds"; "Alexa, rewind" | Rewinds the media by the given duration. If no duration is provided, the default is 10 seconds. |
| Next | "Alexa, next" | Triggers the next command and plays the next media in the playlist. |
| Previous | "Alexa, previous" | Triggers the previous command and plays the previous media in the playlist. |
| Restart | "Alexa, restart" | Seeks back to the beginning of the media. |

MediaSession Integration vs Video Skill API

A more in-depth voice integration with your app is possible with Video Skill API integration. The Video Skill API not only provides transport controls (Play, Pause, Fast-forward, etc.) for media playback but also lets users perform voice searches for your app's media from anywhere in Fire TV (not just inside your app). Video Skill API integration does require your media to be in the Fire TV catalog. You can learn more in Voice-enabling Your App and Content.

Transitioning to the Video Skill API later won't require you to remove the MediaSession integration steps you follow here. When both MediaSession and the Video Skill API are integrated into the same app, Fire TV will use the MediaSession code for transport control commands and the Video Skill API commands for content searches and channel changes.

If your app requires playback control use cases that MediaSession does not cover, use the Video Skill API for playback controls through the PlaybackController and SeekController directives. In this case, the MediaSession code is bypassed for playback controls. (Documentation for integrating the Video Skill API with Fire TV is forthcoming.)

Integrating with MediaSession for Voice-enabled Media Playback

Refer to the Android MediaSession documentation as the primary source of information on how to build video apps and audio apps using MediaSession.

The following details highlight important aspects of MediaSession integration for voice-enabled media playback functionality in your Fire TV app.

Add the MediaSession Permission

To enable voice command and control to work with your application, add the permission com.amazon.permission.media.session.voicecommandcontrol to your application manifest:

<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    android:versionCode="2147483647"
    android:versionName="1.0">
    ...
    <uses-permission android:name="com.amazon.permission.media.session.voicecommandcontrol" />
    ...
</manifest>

Set the Playback Capabilities

For Alexa to know which commands your app's media session supports, your app must declare a set of supported actions through the MediaSession. The standard actions are listed in Android's PlaybackState documentation.

The actions supported by Alexa for voice control are listed below.

| Action | Description |
|---|---|
| ACTION_PLAY_PAUSE | Includes both the ACTION_PLAY and ACTION_PAUSE actions, which are described in the following rows. TIP: For simplicity, consider using this single action instead of the separate ACTION_PLAY and ACTION_PAUSE actions. (However, it's not required, and both approaches will work.) |
| ACTION_PLAY | Resume playback. |
| ACTION_PAUSE | Pause playback. |
| ACTION_SEEK_TO | Jump to a specific time in playback. If the time is negative, it defaults to 0. |
| ACTION_SKIP_TO_NEXT | Skip to the next "episode" or "track". |
| ACTION_SKIP_TO_PREVIOUS | Skip to the previous "episode" or "track". |

If an action isn't provided, and a user triggers an unsupported command (such as "Jump to the last episode"), Alexa responds with something like, "Sorry, that command is not supported."
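As a rough illustration of declaring capabilities, the sketch below builds a bitmask of supported actions in plain Java. The constant values are copied from android.media.session.PlaybackState so that the example runs without the Android SDK; the class name SupportedActionsSketch and the helper methods buildActions and supports are hypothetical. In an actual app, you would use the PlaybackState constants directly and pass the mask to PlaybackState.Builder.setActions(long) before calling MediaSession.setPlaybackState.

```java
/** Hypothetical sketch: composing the supported-actions bitmask for a media session. */
public class SupportedActionsSketch {
    // Values mirror the constants in android.media.session.PlaybackState.
    public static final long ACTION_PAUSE = 1L << 1;
    public static final long ACTION_PLAY = 1L << 2;
    public static final long ACTION_SKIP_TO_PREVIOUS = 1L << 4;
    public static final long ACTION_SKIP_TO_NEXT = 1L << 5;
    public static final long ACTION_SEEK_TO = 1L << 8;
    public static final long ACTION_PLAY_PAUSE = 1L << 9;

    /** Declares skip actions only when the app actually has a playlist (an assumption of this sketch). */
    public static long buildActions(boolean hasPlaylist) {
        long actions = ACTION_PLAY | ACTION_PAUSE | ACTION_PLAY_PAUSE | ACTION_SEEK_TO;
        if (hasPlaylist) {
            actions |= ACTION_SKIP_TO_NEXT | ACTION_SKIP_TO_PREVIOUS;
        }
        return actions;
    }

    /** Returns true if the given action bit is set in the mask. */
    public static boolean supports(long actions, long action) {
        return (actions & action) != 0;
    }

    public static void main(String[] args) {
        long singleVideo = buildActions(false);
        System.out.println(supports(singleVideo, ACTION_SEEK_TO));      // true
        System.out.println(supports(singleVideo, ACTION_SKIP_TO_NEXT)); // false
    }
}
```

Declaring only the actions the app can actually honor keeps the declared capabilities consistent with the callbacks the app implements.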

Set the Media Session Callback Actions

The following MediaSession callbacks are triggered when a voice command initiates the corresponding action.

Note that these are the callbacks used by Alexa voice commands, but implementing additional callbacks for all other applicable use cases (Rewind, Fast Forward, etc.) is highly encouraged.

| Action | MediaSession.Callback Signature | Description | Supported Voice Command |
|---|---|---|---|
| Play | onPlay() | Resume playback. | "Alexa, play"; "Alexa, resume" |
| Pause | onPause() | Pause playback. | "Alexa, pause"; "Alexa, stop" |
| Skip to Next | onSkipToNext() | Skip to the next unit of media. This could be a TV episode or song. | "Alexa, next" |
| Skip to Previous | onSkipToPrevious() | Skip to the previous unit of media. | "Alexa, previous" |
| Seek To | onSeekTo(long pos) | Seek to the given time (in ms). If the user gives a relative time, the target position is calculated and passed in this callback. Note that absolute time is not supported. (For example, "Alexa, fast-forward to 2 hours and 15 minutes" will not set the current playback position to 2 hours 15 minutes.) For restart events, a seek to 0 is passed in the callback. | "Alexa, rewind 5 minutes"; "Alexa, fast-forward 5 minutes"; "Alexa, fast-forward" (default is 10 seconds); "Alexa, rewind" (default is 10 seconds); "Alexa, restart" |

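The callback behavior described above can be sketched in plain Java. This is a minimal model under stated assumptions, not Android code: the class CallbackSketch, its playlist of durations, and its getters are hypothetical, and only the method names mirror MediaSession.Callback. In an actual app, these methods would be overrides in a MediaSession.Callback subclass registered through MediaSession.setCallback. Clamping the seek target to the media duration is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a MediaSession.Callback subclass; not Android code. */
public class CallbackSketch {
    private final List<Long> durationsMs; // hypothetical playlist: duration of each item in ms
    private int index = 0;                // current playlist item
    private long positionMs = 0L;         // current playback position in ms
    private boolean playing = false;

    public CallbackSketch(List<Long> durationsMs) {
        this.durationsMs = durationsMs;
    }

    // "Alexa, play" / "Alexa, resume"
    public void onPlay() { playing = true; }

    // "Alexa, pause" / "Alexa, stop"
    public void onPause() { playing = false; }

    // Fast-forward, rewind, and restart all arrive here with a target position in ms.
    public void onSeekTo(long pos) {
        long duration = durationsMs.get(index);
        // Negative targets default to 0; the upper clamp is an assumption of this sketch.
        positionMs = Math.max(0L, Math.min(pos, duration));
    }

    // "Alexa, next": move forward if another item exists.
    public void onSkipToNext() {
        if (index < durationsMs.size() - 1) { index++; positionMs = 0L; }
    }

    // "Alexa, previous": move back if an earlier item exists.
    public void onSkipToPrevious() {
        if (index > 0) { index--; positionMs = 0L; }
    }

    public boolean isPlaying() { return playing; }
    public int getIndex() { return index; }
    public long getPositionMs() { return positionMs; }

    public static void main(String[] args) {
        CallbackSketch player = new CallbackSketch(Arrays.asList(60_000L, 120_000L));
        player.onPlay();
        player.onSeekTo(-5_000L); // a rewind that would land before the start
        System.out.println(player.getPositionMs()); // 0
        player.onSkipToNext();
        System.out.println(player.getIndex());      // 1
    }
}
```

Because restart is delivered as a seek to 0, the sketch needs no separate restart handler.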
Enhanced Experience for Voice-Enabled Playback Controls using Media Session (Optional)

The Android Media Session documentation recommends that apps honor media session commands only when they have audio focus. As a result, when media is playing and the user says "Alexa, pause", the media resumes for a second after the voice chrome is dismissed, and only then receives the media session pause command and pauses. Users see this as a blip: a momentary burst of playback whenever the playback state changes via voice.

The Fire TV Media Session voice command implementation allows you to opt in and match the experience of the internal Fire TV applications, which do not exhibit the blip. The following sections describe how to achieve this.

Add the Opt-In Permission

To opt in to the Fire TV media session voice command implementation, add the com.amazon.voice.supports_background_media_session meta-data element, with its value set to true, inside the <application> element of your application manifest:

<manifest>
    ...
    <application>
        ...
        <meta-data android:name="com.amazon.voice.supports_background_media_session" android:value="true" />
        ...
    </application>
    ...
</manifest>

Update App to Honor MediaSession Commands Before Getting AudioFocus

After you opt in to the Fire TV media session voice command implementation, media session commands are sent to your app before the voice chrome (the blue line that appears at the top of the screen when you say "Alexa") loses audio focus. Your app must therefore handle media session commands that arrive before it gets audio focus back.

For example, suppose the app's internal state is "playing media" when the voice chrome is invoked and the user says "Alexa, pause". The pause command is sent to the app, and the app must change its internal state to "paused." After the app regains audio focus, it should check the internal state and, because it is already paused, not resume the media.
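The state handling in this example can be modeled with a small plain-Java sketch. The class FocusAwarePlaybackSketch and all of its method names other than onPlay and onPause are hypothetical. In an actual app, the focus events would typically arrive through AudioManager.OnAudioFocusChangeListener, and the play and pause commands through MediaSession.Callback; this sketch only shows the decision logic, assuming the app tracks the user's last requested state separately from audio focus.

```java
/** Hypothetical sketch: honoring media session commands that arrive before audio focus returns. */
public class FocusAwarePlaybackSketch {
    private boolean shouldBePlaying = false; // internal state: what the user last asked for
    private boolean hasAudioFocus = true;    // whether the app currently holds audio focus

    // Media session commands update the internal state even without audio focus.
    public void onPlay() { shouldBePlaying = true; }
    public void onPause() { shouldBePlaying = false; }

    // Called when the voice chrome takes audio focus away from the app.
    public void onAudioFocusLost() { hasAudioFocus = false; }

    // Called when the voice chrome is dismissed and focus returns to the app.
    public void onAudioFocusGained() { hasAudioFocus = true; }

    /** Media runs only if the user wants playback and the app holds audio focus. */
    public boolean isMediaRunning() {
        return shouldBePlaying && hasAudioFocus;
    }

    public static void main(String[] args) {
        FocusAwarePlaybackSketch app = new FocusAwarePlaybackSketch();
        app.onPlay();              // media is playing
        app.onAudioFocusLost();    // user says "Alexa"; voice chrome appears
        app.onPause();             // "Alexa, pause" arrives before focus returns
        app.onAudioFocusGained();  // voice chrome is dismissed
        System.out.println(app.isMediaRunning()); // false
    }
}
```

If no command arrives while the voice chrome is showing, the internal state is still "playing", so the same check resumes playback when focus returns.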