Stream Long-Form Audio with AudioPlayer

You can add long-form audio, such as podcasts, news stories, and live streams, to a custom skill and monitor its playback by using the AudioPlayer interface. You provide a URL to the audio stream, and Alexa plays the audio to the user. Along with the audio, you can provide a background image to show on Alexa-enabled devices with a screen. You can send audio directives to play and stop the audio. Alexa can also provide your skill with information about the playback state, such as when playback starts and stops, when the track is nearly complete, or when the user pauses the audio.

Complete the following steps to add the AudioPlayer interface to your skill.

For other audio options for custom skills, see Add Audio to a Custom Skill.

Example utterances for long-form audio

An ideal audio skill should play error-free and uninterrupted audio files. The skill should fulfill the customer request and play content relevant to the skill description. The following example shows a high-quality experience with an audio skill.

User: Alexa, open My Radio Player.

My Radio Player: Welcome to My Radio Player. To listen to live music, say, "Play live music". To view all available playlists, say, "View playlists".
User: Play live music.

The skill plays live music.

Playback control

When your skill sends a Play directive to begin playback, Alexa plays the audio stream at the specified URL. During audio streaming, users can control playback without the skill invocation name. In the response that includes the Play directive, set the shouldEndSession flag to true to end the session. If you set this flag to false, Alexa sends the stream to the device for playback, and then immediately pauses the stream to listen for the user's response.
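
As a minimal sketch of the response described above, assuming a skill that builds the raw response JSON itself rather than using an SDK, the following Python function returns a response with a Play directive and ends the session. The function name, the example URL, and the token value are hypothetical placeholders.

```python
import json


def build_play_response(url, token, offset_ms=0):
    """Return a skill response that starts playback and ends the session."""
    return {
        "version": "1.0",
        "response": {
            "directives": [
                {
                    "type": "AudioPlayer.Play",
                    # REPLACE_ALL plays this stream immediately and replaces any queued streams.
                    "playBehavior": "REPLACE_ALL",
                    "audioItem": {
                        "stream": {
                            "url": url,
                            "token": token,
                            "offsetInMilliseconds": offset_ms,
                        }
                    },
                }
            ],
            # End the session so Alexa doesn't pause the stream to listen for a reply.
            "shouldEndSession": True,
        },
    }


# Hypothetical stream URL and token, for illustration only.
response = build_play_response("https://example.com/live.mp3", "live-stream-1")
print(json.dumps(response, indent=2))
```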

Your skill should persist information about the audio stream and the skill session, and use the context object included in each request to get details, such as the applicationId and userId. Amazon recommends that your skill persist attributes related to the skill session, such as the audio stream file and userId.
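
One way to approach this is sketched below, under the assumption that the skill receives the request as a parsed JSON dictionary. The function reads applicationId and userId from the context object and saves the current stream token and offset. The in-memory dictionary PLAYBACK_STORE and the function name are hypothetical; a real skill would likely use a database.

```python
# In-memory stand-in for a persistent store such as a database (assumption).
PLAYBACK_STORE = {}


def save_playback_state(request_envelope):
    """Persist the stream token and offset for the user who sent this request."""
    context = request_envelope["context"]
    user_id = context["System"]["user"]["userId"]
    application_id = context["System"]["application"]["applicationId"]
    # The AudioPlayer entry in the context reports the current token and offset.
    player = context.get("AudioPlayer", {})
    PLAYBACK_STORE[user_id] = {
        "applicationId": application_id,
        "token": player.get("token"),
        "offsetInMilliseconds": player.get("offsetInMilliseconds", 0),
    }
    return PLAYBACK_STORE[user_id]
```

Keying the store by userId lets a later request, which arrives in a new session, look up what the user was last listening to.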

If the skill session ends during audio streaming, Alexa remembers that your skill started the audio stream and sends voice and tap playback requests to your skill. However, if the user does one of the following actions, Alexa no longer remembers that your skill played the previous audio stream and the user must use the skill name again:

  • Invokes audio playback with a different skill.
  • Invokes another service that streams audio, such as the built-in music service or a Flash Briefing.
  • Reboots the device.

The following example, for a custom skill called "My Radio Player," defines a PlayLatestEpisode intent mapped to the sample utterance "play the latest episode."

User: Alexa, ask My Radio Player to play the latest episode.

Alexa opens a new skill session and sends the PlayLatestEpisode intent to the My Radio Player skill.
My Radio Player sends a Play directive. The skill session closes and audio begins playing.

User: Alexa, next. (No invocation name used.)

Alexa opens a new skill session and sends the My Radio Player skill AMAZON.NextIntent.
My Radio Player takes the appropriate action for "next" and closes the skill session.

User: Alexa, pause. (Again, no invocation name.)

Alexa opens a new skill session and sends the AMAZON.PauseIntent to the skill.
My Radio Player sends a Stop directive, and then closes the skill session. Alexa stops the audio streaming.

At this point the audio isn't playing and there is no current session. However, the Alexa service continues to track "My Radio Player" as the last skill that streamed audio. As long as the device remains on and the user doesn't use any other audio streaming skills or services, the next example can take place at a later time without the skill name invocation.

User: Alexa, resume. (No invocation name used.)

Alexa opens a new skill session and sends the AMAZON.ResumeIntent to the My Radio Player skill.
My Radio Player determines the previously playing track and sends a new Play directive to restart playback.
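
The pause and resume turns in this exchange could be handled along the following lines. This is a sketch only: it assumes the skill has already saved the stream URL, token, and offset for the user, and the SAVED_STATE dictionary, its contents, and the function name are hypothetical.

```python
# Hypothetical persisted state keyed by userId; a real skill would use a database.
SAVED_STATE = {
    "user-1": {
        "url": "https://example.com/episode-1.mp3",
        "token": "episode-1",
        "offsetInMilliseconds": 42000,
    }
}


def handle_pause_or_resume(intent_name, user_id):
    """Return a Stop directive for pause, or a Play directive that resumes playback."""
    if intent_name == "AMAZON.PauseIntent":
        directives = [{"type": "AudioPlayer.Stop"}]
    elif intent_name == "AMAZON.ResumeIntent":
        state = SAVED_STATE[user_id]
        directives = [
            {
                "type": "AudioPlayer.Play",
                "playBehavior": "REPLACE_ALL",
                "audioItem": {
                    "stream": {
                        "url": state["url"],
                        "token": state["token"],
                        # Restart where the user paused rather than at the beginning.
                        "offsetInMilliseconds": state["offsetInMilliseconds"],
                    }
                },
            }
        ]
    else:
        raise ValueError(f"Unexpected intent: {intent_name}")
    return {
        "version": "1.0",
        "response": {"directives": directives, "shouldEndSession": True},
    }
```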

Audio player on Alexa-enabled devices with a screen

By default, during audio streaming, Alexa-enabled devices with a screen show an audio player with a plain background and the skill name. You can customize the screen by including album art, a background image, track title, and subtitle metadata with the Play directive. For both the default and custom backgrounds, when the user taps the screen, the screen shows tap controls, such as next, previous, and pause.
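
As an illustration, the audioItem in a Play directive can carry this metadata next to the stream. The sketch below assumes the skill builds the directive JSON by hand; the function name is hypothetical.

```python
def build_audio_item(url, token, title, subtitle, art_url, background_url):
    """Return an audioItem with display metadata for devices with a screen."""
    return {
        "stream": {"url": url, "token": token, "offsetInMilliseconds": 0},
        "metadata": {
            "title": title,
            "subtitle": subtitle,
            # Album art and background image are each given as a list of sources.
            "art": {"sources": [{"url": art_url}]},
            "backgroundImage": {"sources": [{"url": background_url}]},
        },
    }
```

The returned dictionary would be used as the value of the audioItem property in an AudioPlayer.Play directive.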


Prerequisites

To use the AudioPlayer interface, your custom skill must meet the following prerequisites.

Audio stream URL requirements

To use the AudioPlayer interface, your audio stream URL must meet the following requirements:

  • You must host the audio file at an Internet-accessible HTTPS endpoint on port 443.
  • The web server must present a valid and trusted SSL certificate. Self-signed certificates aren't allowed. Content hosting services, such as Amazon S3, provide valid and trusted SSL certificates.
  • If the stream is a playlist container that references additional streams, you must host each stream within the playlist at an Internet-accessible HTTPS endpoint on port 443 with a valid and trusted SSL certificate.
  • Your audio file must be in one of the following formats: AAC/MP4, MP3, PLS, M3U/M3U8, HLS.
  • Your audio stream must support bit rates of 16–384 kbps (kilobits per second).
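
A quick pre-flight check for the first requirement might look like the following sketch. It covers only the URL scheme and port; it doesn't verify the SSL certificate, the audio format, or the bit rate. The function name is hypothetical.

```python
from urllib.parse import urlparse


def check_stream_url(url):
    """Return a list of problems found; an empty list means these checks passed."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme != "https":
        problems.append("URL must use HTTPS")
    # HTTPS defaults to port 443 when the URL doesn't name a port.
    port = parsed.port if parsed.port is not None else 443
    if port != 443:
        problems.append("URL must use port 443")
    return problems
```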

Image requirements and recommendations

To customize the background on Alexa-enabled devices with a screen, your image must meet the following requirements and recommendations:

  • You must host the image at an Internet-accessible HTTPS endpoint and the image must be available 24 hours a day seven days a week.
  • The image must be in JPEG or PNG format, with the appropriate file extensions.
  • (Recommended) For best results, make sure that images are transparent. Images with a transparent background work well on a wide range of shapes and sizes.
  • The image size must be at least the minimum recommended size. If you provide a smaller image, the device must scale the image up, which can make the image appear blurry.
  • The image size must not exceed 3 MB. If you send multiple images in a response, the combined image size must not exceed 3 MB.
  • (Recommended) Keep image sizes small to reduce latency and provide a better customer experience.
  • (Recommended) For best results, use a square or rectangle image. If the image isn't square, it might display with extra black space on the device. The Echo Spot crops the image to a circle shape.
  • Apply a 70 percent opacity black layer for optimal contrast between the image and text.
  • (Recommended) Use background images with slight patterns or gradients to provide a consistent, high-quality appearance.
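
Some of these requirements can be checked before you send a response. The sketch below checks the HTTPS scheme, the file extension, and the combined 3 MB limit. It assumes you already know each image's size in bytes and treats 1 MB as 1,048,576 bytes; the function and constant names are hypothetical.

```python
from urllib.parse import urlparse

MAX_TOTAL_BYTES = 3 * 1024 * 1024  # 3 MB, counted as 1,048,576 bytes each (assumption)
ALLOWED_EXTENSIONS = (".jpg", ".jpeg", ".png")


def check_images(images):
    """Check a list of (url, size_in_bytes) pairs and return any problems found."""
    problems = []
    for url, _size in images:
        parsed = urlparse(url)
        if parsed.scheme != "https":
            problems.append(f"{url}: must be hosted at an HTTPS endpoint")
        if not parsed.path.lower().endswith(ALLOWED_EXTENSIONS):
            problems.append(f"{url}: must be a JPEG or PNG file")
    if sum(size for _url, size in images) > MAX_TOTAL_BYTES:
        problems.append("combined image size exceeds 3 MB")
    return problems
```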

Steps to add long-form audio to your skill

Complete the following steps to add the AudioPlayer interface to your custom skill.

  1. Enable the AudioPlayer interface.
  2. Implement AudioPlayer directives and requests.
  3. Implement intents for audio playback.
  4. Support audio on Alexa-enabled devices with a screen.

Step 1: Enable the audio player interface

You configure your skill to indicate that it implements the AudioPlayer interface.

To enable the audio player interface in the Alexa developer console

  1. Sign in to the Alexa developer console.
  2. From the skill list, locate your custom skill, and then, in the dropdown under ACTIONS, select Edit.
  3. In the left pane, click CUSTOM, and then click Interfaces.
  4. To enable the AudioPlayer interface, toggle the Audio Player option, and then click Save Interfaces.
    The console adds the required built-in intents for audio playback to your interaction model.
  5. To rebuild your custom interaction model, on the Build page, click Build Model.

Step 2: Implement AudioPlayer directives and requests

Implement the following AudioPlayer interfaces in your custom skill to start and stop long-form audio streaming.

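Include the following directives in a response to Alexa:

  • AudioPlayer.Play – Tells Alexa to start streaming the audio at the specified URL.
  • AudioPlayer.Stop – Tells Alexa to stop the current audio stream.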

Handle the following requests that Alexa sends to report playback status of the audio stream:

  • AudioPlayer.PlaybackStarted – Sent to your skill when Alexa starts the audio stream specified in a Play directive. This request lets your skill verify that playback began successfully.
  • AudioPlayer.PlaybackFinished – Alexa notifies your skill when the stream comes to an end on its own.
  • AudioPlayer.PlaybackStopped – Sent when Alexa stops playing an audio stream in response to a voice request or an AudioPlayer directive.
  • AudioPlayer.PlaybackNearlyFinished – Alexa notifies your skill when the currently playing stream is nearly complete and the device is ready to receive a new stream.
  • AudioPlayer.PlaybackFailed – Alexa notifies your skill when an error occurs while Alexa attempts to play a stream.
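
The following sketch shows one possible way to dispatch on these request types, assuming the skill works with the parsed request JSON directly. It queues a follow-up stream when the current one is nearly finished and logs playback errors. The function name and the next_stream argument are hypothetical.

```python
import logging

logger = logging.getLogger("audio_skill")


def handle_audio_player_request(request, next_stream=None):
    """Return a response for an AudioPlayer playback request."""
    request_type = request["type"]
    directives = []
    if request_type == "AudioPlayer.PlaybackNearlyFinished" and next_stream:
        directives.append(
            {
                "type": "AudioPlayer.Play",
                # ENQUEUE adds the stream after the one that is currently playing.
                "playBehavior": "ENQUEUE",
                "audioItem": {
                    "stream": {
                        "url": next_stream["url"],
                        "token": next_stream["token"],
                        # Must match the token of the stream that is playing now.
                        "expectedPreviousToken": request["token"],
                        "offsetInMilliseconds": 0,
                    }
                },
            }
        )
    elif request_type == "AudioPlayer.PlaybackFailed":
        error = request.get("error", {})
        logger.error("Playback failed: %s %s", error.get("type"), error.get("message"))
    # Other request types need no directive in this sketch.
    return {"version": "1.0", "response": {"directives": directives}}
```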

Step 3: Implement intents for audio playback

In your skill code, implement the following required built-in intents to pause and resume audio:

  • AMAZON.PauseIntent
  • AMAZON.ResumeIntent

In addition, Amazon recommends that you implement the following built-in intents for playback control:

  • AMAZON.CancelIntent
  • AMAZON.LoopOffIntent
  • AMAZON.LoopOnIntent
  • AMAZON.NextIntent
  • AMAZON.PreviousIntent
  • AMAZON.RepeatIntent
  • AMAZON.ShuffleOffIntent
  • AMAZON.ShuffleOnIntent
  • AMAZON.StartOverIntent

If your skill is playing audio, or was playing audio most recently, Alexa sends these intents to your skill. Your skill code should handle the intents without error. If any of these intents don't apply to your skill, handle the intent in a graceful way. For example, for a podcast skill, on receipt of the AMAZON.ShuffleOnIntent intent, your skill might return, "I can't shuffle a podcast." Or, version 1.0 of a music skill that doesn't support playlists and shuffling might return, "Sorry, I can't shuffle music yet."
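
As a sketch of graceful handling for a podcast skill, the following function returns a short spoken reply for intents the skill doesn't support. The set of unsupported intents, the wording for the loop intents, and the function name are assumptions for illustration.

```python
# Intents this hypothetical podcast skill doesn't support, with the reply for each.
UNSUPPORTED_INTENTS = {
    "AMAZON.ShuffleOnIntent": "I can't shuffle a podcast.",
    "AMAZON.ShuffleOffIntent": "I can't shuffle a podcast.",
    "AMAZON.LoopOnIntent": "I can't loop a podcast.",
    "AMAZON.LoopOffIntent": "I can't loop a podcast.",
}


def handle_unsupported_intent(intent_name):
    """Return a spoken reply for an unsupported intent, or None if it is supported."""
    text = UNSUPPORTED_INTENTS.get(intent_name)
    if text is None:
        return None
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": True,
        },
    }
```

Returning None for any other intent lets the caller fall through to the skill's regular intent handlers.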

You need an Echo device to test the playback requests from Alexa. The Alexa simulator doesn't render audio playback, but the Skill I/O section of the simulator shows the AudioPlayer directives sent from your skill. For details, see Test your skill with the simulator.

Step 4: Support audio on Alexa-enabled devices with a screen

If the user touches the device screen while your skill is streaming audio, an Alexa-enabled device with a screen shows audio tap controls for a short time. These controls provide access to next, previous, pause, and play actions. Implement skill code to handle these actions appropriately.

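In response to the next, previous, and play actions, Alexa sends your skill one of the following PlaybackController requests:

  • PlaybackController.NextCommandIssued – Sent when the user taps the next control.
  • PlaybackController.PreviousCommandIssued – Sent when the user taps the previous control.
  • PlaybackController.PlayCommandIssued – Sent when the user taps the play control.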

When the user taps the pause control, Alexa stops playback, but doesn't send a request to your skill. However, your skill should still handle PlaybackController.PauseCommandIssued, because other devices, such as hardware remotes, do send this request.
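
A handler for these requests might map each request type to an AudioPlayer directive, as in the sketch below. It assumes a hypothetical playlist held as a list of dictionaries with url and token keys, and it wraps around at either end of the playlist, which is a design choice for this example rather than required behavior.

```python
def handle_playback_controller(request_type, playlist, current_index):
    """Map a PlaybackController request type to an AudioPlayer directive."""
    if request_type == "PlaybackController.PauseCommandIssued":
        directives = [{"type": "AudioPlayer.Stop"}]
    else:
        steps = {
            "PlaybackController.NextCommandIssued": 1,
            "PlaybackController.PreviousCommandIssued": -1,
            "PlaybackController.PlayCommandIssued": 0,
        }
        # Wrap around so "next" on the last track returns to the first track.
        index = (current_index + steps[request_type]) % len(playlist)
        track = playlist[index]
        directives = [
            {
                "type": "AudioPlayer.Play",
                "playBehavior": "REPLACE_ALL",
                "audioItem": {
                    "stream": {
                        "url": track["url"],
                        "token": track["token"],
                        "offsetInMilliseconds": 0,
                    }
                },
            }
        ]
    # The response carries only directives, with no speech.
    return {"version": "1.0", "response": {"directives": directives}}
```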

Sample code

To get started, review the following audio player skill code samples on GitHub:

Last updated: Jan 26, 2024