AudioPlayer Interface Reference


The AudioPlayer interface provides directives and requests for streaming audio and monitoring playback progression. Your skill can send directives to start and stop the playback. The Alexa service sends your skill AudioPlayer requests to give you information about the playback state, such as when the track is nearly finished, or when playback starts and stops. Alexa also sends PlaybackController requests in response to hardware buttons, such as on a remote control or the next and previous tap controls on Alexa-enabled devices with a screen.

For more details about the audio player, see Stream Long-Form Audio with AudioPlayer.

Directives and requests

The AudioPlayer interface includes the following directives and request types. Include the directives in your response to Alexa to start and stop an audio stream. Alexa sends the requests to notify your skill about changes to the playback state.

Interface Description Type

AudioPlayer.Play

Requests Alexa to stream the specified audio file.

Directive

AudioPlayer.Stop

Requests Alexa to stop the current audio stream.

Directive

AudioPlayer.ClearQueue

Requests Alexa to clear the queue of all audio streams.

Directive

AudioPlayer.PlaybackStarted

Notifies your skill that Alexa started the audio stream specified in a Play directive. This request lets your skill verify that playback began successfully.

Request

AudioPlayer.PlaybackFinished

Notifies your skill when the stream comes to an end on its own.

Request

AudioPlayer.PlaybackStopped

Sent when Alexa stops playing an audio stream in response to a voice request or an AudioPlayer directive.

Request

AudioPlayer.PlaybackNearlyFinished

Notifies your skill when the current stream is nearly complete and the device is ready to receive a new stream.

Request

AudioPlayer.PlaybackFailed

Notifies your skill when an error occurs while attempting to play a stream.

Request

Play directive

Sends Alexa a request to stream the audio file identified by the specified audioItem. Use the playBehavior parameter to indicate whether to play the stream immediately or to add the stream to the queue. Include the Play directive in the directives array in your response to Alexa.

When you send a Play directive, set the shouldEndSession flag in the response object to true to end the session. If you set this flag to false, Alexa sends the stream to the device for playback, and then pauses the stream to listen for the user's response.

Example directive response

The following example shows a directive entry in your response. For the full response format, see Response Format.

{
  "type": "AudioPlayer.Play",
  "playBehavior": "valid playBehavior value such as ENQUEUE",
  "audioItem": {
    "stream": {
      "url": "https://cdn.example.com/url-of-the-stream-to-play",
      "token": "opaque token representing this stream",
      "expectedPreviousToken": "opaque token representing the previous stream",
      "offsetInMilliseconds": 0,
      "captionData": {
        "content": "WEBVTT\n\n00:00.000 --> 00:02.107\n<00:00.006>My <00:00.0192>Audio <00:01.232>Captions.\n",
        "type": "WEBVTT"
      }
    },
    "metadata": {
      "title": "title of the track to display",
      "subtitle": "subtitle of the track to display",
      "art": {
        "sources": [
          {
            "url": "https://cdn.example.com/url-of-the-album-art-image.png"
          }
        ]
      },
      "backgroundImage": {
        "sources": [
          {
            "url": "https://cdn.example.com/url-of-the-background-image.png"
          }
        ]
      }
    }
  }
}
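As a rough illustration of how the directive sits inside a full response, the following Python sketch builds a response body with a single Play directive and sets shouldEndSession to true. The helper name build_play_response and its arguments are hypothetical; only the JSON property names, the REPLACE_ALL value, and the 1024-character token limit come from this reference.

```python
MAX_TOKEN_LENGTH = 1024  # maximum token size given in the parameter table


def build_play_response(url, token, offset_ms=0):
    """Return a response dict that starts playback immediately.

    Hypothetical helper: the function name and arguments are assumptions.
    """
    if len(token) > MAX_TOKEN_LENGTH:
        raise ValueError("token exceeds 1024 characters")
    directive = {
        "type": "AudioPlayer.Play",
        "playBehavior": "REPLACE_ALL",
        "audioItem": {
            "stream": {
                "url": url,
                "token": token,
                "offsetInMilliseconds": offset_ms,
            }
        },
    }
    return {
        "version": "1.0",
        "response": {
            "directives": [directive],
            # End the session so Alexa doesn't pause the stream to listen.
            "shouldEndSession": True,
        },
    }


response = build_play_response("https://cdn.example.com/track-1.mp3", "track-1")
```

The sketch omits expectedPreviousToken because that property is allowed only when playBehavior is ENQUEUE.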

Directive parameters

Parameter Description Type Required

type

Set to AudioPlayer.Play.

String

Yes

playBehavior

Describes playback behavior. Accepted values:

  • REPLACE_ALL: Immediately begin playback of the specified stream, and replace current and enqueued streams.
  • ENQUEUE: Add the specified stream to the end of the current queue. This request doesn't impact the currently playing stream.
  • REPLACE_ENQUEUED: Replace all streams in the queue. This request doesn't impact the currently playing stream.

String

Yes

audioItem

Contains an object providing information about the audio stream to play.

Object

Yes

audioItem.stream

Contains an object representing the audio stream to play.

Object

Yes

audioItem.stream.url

Identifies the location of audio content at a remote HTTPS location on port 443. For more details, see Audio stream URL requirements.

String

Yes

audioItem.stream.token

Opaque token that identifies the audio stream.
Maximum size: 1024 characters. Token should not contain Undefined Customer Data (UCD).

Use the token in the following cases:

String

Yes

audioItem.stream.expectedPreviousToken

An opaque token that represents the expected previous stream. This should match the value of audioItem.stream.token for the previous stream.

This property is required and allowed only when the playBehavior is ENQUEUE. This is used to prevent potential race conditions if requests to progress through a playlist and change tracks occur at the same time. For details, see Playlist Progression with ENQUEUE.

String

Yes (when playBehavior is ENQUEUE)

audioItem.stream.offsetInMilliseconds

The timestamp in the stream from which Alexa should begin playback. Set to 0 to start playing the stream from the beginning. Set to any other value to start playback from that associated point in the stream.

Long

Yes

audioItem.stream.captionData

An object with two fields, content and type. Use these fields to provide captions for the associated audio attachment on any compatible device with a screen. A captionData object doesn't exist until it's provided with content and type. Devices assert support for AudioPlayer version 1.1 or later through the Capabilities API.

Object

No

audioItem.stream.captionData.type

The format of the string in the content field.
Supported formats: WEBVTT

String

No

audioItem.stream.captionData.content

The time-encoded caption text.
Supported formats: WEBVTT

String

No

audioItem.metadata

Information about the audio displayed on the Alexa-enabled device with a screen. The information isn't shown in the Alexa app.

If you don't include this object, Alexa shows the skill name on a gray background by default.

This entire object is optional. However, if you do include audioItem.metadata, provide all four metadata properties (title, subtitle, art, and backgroundImage).

Associate each new metadata item with a different audioItem.stream.token.

For more details, see Guidelines for images for Alexa-enabled devices with a screen.

The metadata displays on devices with screens regardless of whether you include the Display interface.

Object

No

audioItem.metadata.title

The title text to display. This is typically used for the audio track title.

String

No

audioItem.metadata.subtitle

Subtitle text to display, such as a category or an artist name.

String

No

audioItem.metadata.art

An Image object representing the album art to display. This object uses the same format as images used in the Display interface templates.

On the Echo Show or Fire TV Cube, this is the smaller square image. On the Echo Spot, this is cropped into the circle shape and displayed as the background.

For best results, follow the image guidelines and specifications.

Image object

No

audioItem.metadata.backgroundImage

An Image object representing the background image to display. This object uses the same format as images used in the Display interface templates.

On the Echo Show or Fire TV Cube, backgroundImage is the full background image. The image isn't shown on the Echo Spot.

For best results, follow the image guidelines and specifications.

Image object

No

Playlist progression with ENQUEUE

The audioItem.stream.expectedPreviousToken property is required if playBehavior is ENQUEUE to handle situations in which requests to progress through a playlist and change tracks happen at the same time. The value of audioItem.stream.expectedPreviousToken should match the audioItem.stream.token property provided with the previous stream.

For example:

  1. The skill is streaming track 2 in a playlist of several tracks.
  2. The user says "Alexa, go back," which sends an AMAZON.PreviousIntent.
  3. At about the same time, track 2 is nearly finished, so Alexa sends a PlaybackNearlyFinished request.
  4. The skill handles the AMAZON.PreviousIntent first and sends a new Play directive with track 1. This track begins playing. The already-sent PlaybackNearlyFinished request is now outdated, since it assumed that track 2 was playing.
  5. The skill handles the now-outdated PlaybackNearlyFinished request and sends a Play directive with track 3, since this is the next track after the originally playing track 2. This request includes expectedPreviousToken set to track 2.
  6. The expectedPreviousToken provided in the directive doesn't match the token for the actively playing stream, so the device ignores this directive.
  7. As track 1 finishes, Alexa sends a PlaybackNearlyFinished request. The skill responds with a Play directive for track 2. This track begins playing once track 1 finishes.

If this check weren't in place, the directive sent in step 5 would put track 3 on the queue, which would cause the audio to skip from track 1 to track 3 when track 1 finishes.
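The scenario above can be modeled in a few lines of Python. This is a simplified sketch of the comparison described in step 6, not the device's actual implementation; the helper names make_enqueue_directive and device_accepts, the track tokens, and the URLs are all hypothetical.

```python
def make_enqueue_directive(url, token, expected_previous_token):
    """Build a Play directive that queues a stream behind the given track."""
    return {
        "type": "AudioPlayer.Play",
        "playBehavior": "ENQUEUE",
        "audioItem": {
            "stream": {
                "url": url,
                "token": token,
                "expectedPreviousToken": expected_previous_token,
                "offsetInMilliseconds": 0,
            }
        },
    }


def device_accepts(directive, playing_token):
    """Simplified model of step 6, not the device's real implementation."""
    if directive["playBehavior"] != "ENQUEUE":
        return True  # expectedPreviousToken applies only to ENQUEUE
    expected = directive["audioItem"]["stream"]["expectedPreviousToken"]
    return expected == playing_token


# Track 1 is playing after the user said "Alexa, go back."
stale = make_enqueue_directive("https://cdn.example.com/track-3.mp3", "track-3", "track-2")
fresh = make_enqueue_directive("https://cdn.example.com/track-2.mp3", "track-2", "track-1")
stale_accepted = device_accepts(stale, playing_token="track-1")  # False: ignored
fresh_accepted = device_accepts(fresh, playing_token="track-1")  # True: queued
```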

Guidelines for images for Alexa-enabled devices with a screen

If you provide images in the audioItem.metadata.art and audioItem.metadata.backgroundImage properties, note the following guidelines:

  • When you send a track with new metadata, be sure to also change the audioItem.stream.token property for the track.
  • Your image must meet the requirements for an audio image. For more details, see Image requirements and recommendations.
  • For the audioItem.metadata.art, use a square image for the best results. If the image isn't square, it's displayed with extra black space on the device. Note that the image is cropped to a circle shape on the Echo Spot.
  • The Image object lets you provide multiple image URLs in the sources array. As with the Display interface, the device selects the image with the highest resolution to display.
  • The following properties for a particular image source on the Image object aren't used when displaying the background image and album art for audio and can be left out of the object:
    • contentDescription
    • size
    • widthPixels
    • heightPixels
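A minimal sketch of a metadata object that follows these guidelines is shown below. The helper name build_metadata and its arguments are hypothetical. It supplies all four metadata properties and leaves out the image-source properties that aren't used for audio.

```python
def build_metadata(title, subtitle, art_url, background_url):
    """Return an audioItem.metadata object with all four properties set.

    Hypothetical helper. Each image carries a single source with only a URL,
    because contentDescription, size, widthPixels, and heightPixels are unused.
    """
    return {
        "title": title,
        "subtitle": subtitle,
        "art": {"sources": [{"url": art_url}]},
        "backgroundImage": {"sources": [{"url": background_url}]},
    }


metadata = build_metadata(
    "Episode 12",
    "Example Podcast",
    "https://cdn.example.com/art-square.png",
    "https://cdn.example.com/background.png",
)
```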

Stop directive

Stops the current audio playback. Include the directive in the directives array in your response.

Example directive response

The following example shows a directive entry in your response. For the full response format, see Response Format.

{
  "type": "AudioPlayer.Stop"
}

Directive parameters

Parameter Description Type Required
type Set to AudioPlayer.Stop String Yes

ClearQueue directive

Clears the audio playback queue. You can set this directive to clear the queue without stopping the currently playing stream, or clear the queue and stop any currently playing stream. Include the directive in the directives array in your response.

Example directive response

The following example shows a directive entry in your response. For the full response format, see Response Format.

{
  "type": "AudioPlayer.ClearQueue",
  "clearBehavior" : "valid clearBehavior value such as CLEAR_ALL"
}

Directive parameters

Parameter Description Type Required

type

Set to AudioPlayer.ClearQueue.

String

Yes

clearBehavior

Describes the clear queue behavior. Accepted values:

  • CLEAR_ENQUEUED: Clears the queue and continues to play the currently playing stream.
  • CLEAR_ALL: Clears the entire playback queue and stops the currently playing stream (if applicable).

String

Yes
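As an illustration, a skill might guard against typos in clearBehavior before sending the directive. The helper name build_clear_queue is hypothetical; the two accepted values come from the list above.

```python
CLEAR_BEHAVIORS = {"CLEAR_ENQUEUED", "CLEAR_ALL"}


def build_clear_queue(clear_behavior):
    """Return a ClearQueue directive, rejecting unknown clearBehavior values."""
    if clear_behavior not in CLEAR_BEHAVIORS:
        raise ValueError(f"unsupported clearBehavior: {clear_behavior}")
    return {"type": "AudioPlayer.ClearQueue", "clearBehavior": clear_behavior}


# Empty the queue but let the current stream keep playing.
directive = build_clear_queue("CLEAR_ENQUEUED")
```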

PlaybackStarted request

Sent when Alexa begins playing the audio stream previously sent in a Play directive. This request lets your skill verify that playback started successfully. Also, Alexa sends this request to notify your skill when Alexa resumes playback after pausing it for a voice request.

Example request

{
  "version": "1.0",
  "context": {
    "System": {
      "application": {},
      "user": {},
      "device": {}
    }
  },
  "request": {
    "type": "AudioPlayer.PlaybackStarted",
    "requestId": "unique.id.for.the.request",
    "timestamp": "timestamp of request in format: 2018-04-11T15:15:25Z",
    "token": "token representing the currently playing stream",
    "offsetInMilliseconds": 0,
    "locale": "a locale code such as en-US"
  }
}

Request parameters

Parameter Description Type
type AudioPlayer.PlaybackStarted String
requestId Represents a unique identifier for the specific request. String
timestamp Provides the date and time when Alexa sent the request as an ISO 8601 formatted string. Used to verify the request when hosting your skill as a web service. String
token An opaque token that represents the audio stream. You provide this token when sending the Play directive. String
offsetInMilliseconds Identifies a track's offset in milliseconds when the PlaybackStarted request is sent. Long
locale A string indicating the user's locale. For example: en-US. See supported locale codes. String

For the full request format, see Request Format.

Response

Your skill can respond to PlaybackStarted with a Stop or ClearQueue directive.

The response cannot include:

  • Any standard properties such as outputSpeech, card, or reprompt.
  • Any other AudioPlayer directives.
  • Any directives from other interfaces, such as a Dialog directive.
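One way to catch mistakes early is to check a response against these rules before returning it. The sketch below is an assumption about how such a check could look, not part of any Alexa SDK. The function name find_violations is hypothetical, and its argument is the inner response object.

```python
ALLOWED_DIRECTIVES = {"AudioPlayer.Stop", "AudioPlayer.ClearQueue"}
FORBIDDEN_PROPERTIES = ("outputSpeech", "card", "reprompt")


def find_violations(response):
    """List the rule violations in a response to PlaybackStarted."""
    problems = [
        f"property not allowed: {name}"
        for name in FORBIDDEN_PROPERTIES
        if name in response
    ]
    for directive in response.get("directives", []):
        if directive.get("type") not in ALLOWED_DIRECTIVES:
            problems.append(f"directive not allowed: {directive.get('type')}")
    return problems


ok = find_violations({"directives": [{"type": "AudioPlayer.Stop"}]})
bad = find_violations({"outputSpeech": {}, "directives": [{"type": "AudioPlayer.Play"}]})
```

Here ok is an empty list, while bad reports both the outputSpeech property and the Play directive.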

PlaybackFinished request

Sent when the stream Alexa is playing comes to an end on its own. If your skill explicitly stops the playback with the Stop directive, Alexa sends PlaybackStopped instead of PlaybackFinished.

Example request

{
  "version": "1.0",
  "context": {
    "System": {
      "application": {},
      "user": {},
      "device": {}
    }
  },
  "request": {
    "type": "AudioPlayer.PlaybackFinished",
    "requestId": "unique.id.for.the.request",
    "timestamp": "timestamp of request in format: 2018-04-11T15:15:25Z",
    "token": "token representing the currently playing stream",
    "offsetInMilliseconds": 0,
    "locale": "a locale code such as en-US"
  }
}

Request parameters

Parameter Description Type
type AudioPlayer.PlaybackFinished String
requestId Represents a unique identifier for the specific request. String
timestamp Provides the date and time when Alexa sent the request as an ISO 8601 formatted string. Used to verify the request when hosting your skill as a web service. String
token An opaque token that represents the audio stream. You provide this token when sending the Play directive. String
offsetInMilliseconds Identifies a track's offset in milliseconds when the PlaybackFinished request is sent. Long
locale A string indicating the user's locale. For example: en-US. See supported locale codes. String

Response

Your skill can respond to PlaybackFinished with a Stop or ClearQueue directive.

The response cannot include:

  • Any standard properties such as outputSpeech, card, or reprompt.
  • Any other AudioPlayer directives.
  • Any directives from other interfaces, such as a Dialog directive.

PlaybackStopped request

Sent when Alexa stops playing an audio stream in response to one of the following AudioPlayer directives:

  • Stop
  • Play with a playBehavior of REPLACE_ALL.
  • ClearQueue with a clearBehavior of CLEAR_ALL.

This request is also sent if the user makes a voice request to Alexa, since this temporarily pauses the playback. In this case, playback resumes automatically once the voice interaction is complete. If playback stops because the audio stream comes to an end on its own, Alexa sends PlaybackFinished instead of PlaybackStopped.

Example request

{
  "version": "1.0",
  "context": {
    "System": {
      "application": {},
      "user": {},
      "device": {}
    }
  },
  "request": {
    "type": "AudioPlayer.PlaybackStopped",
    "requestId": "unique.id.for.the.request",
    "timestamp": "timestamp of request in format: 2018-04-11T15:15:25Z",
    "token": "token representing the currently playing stream",
    "offsetInMilliseconds": 0,
    "locale": "a locale code such as en-US"
  }
}

Request parameters

Parameter Description Type
type AudioPlayer.PlaybackStopped String
requestId Represents a unique identifier for the specific request. String
timestamp Provides the date and time when Alexa sent the request as an ISO 8601 formatted string. Used to verify the request when hosting your skill as a web service. String
token An opaque token that represents the audio stream. You provide this token when sending the Play directive. String
offsetInMilliseconds Identifies a track's offset in milliseconds when the PlaybackStopped request is sent. Long
locale A string indicating the user's locale. For example: en-US. See supported locale codes. String

Response

Your skill can't return a response to PlaybackStopped.
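Although no response is allowed, the request still carries useful state. One possible use, sketched below under the assumption that the skill wants to resume a stream later, is to record the reported offset for each token. The in-memory dictionary and both function names are hypothetical; a real skill would need persistent storage.

```python
saved_offsets = {}  # hypothetical in-memory store keyed by stream token


def on_playback_stopped(request):
    """Remember where the stream stopped. Returns nothing: no response is allowed."""
    saved_offsets[request["token"]] = request["offsetInMilliseconds"]


def resume_stream(url, token):
    """Build the audioItem.stream object for a Play directive that resumes the track."""
    return {
        "url": url,
        "token": token,
        # Fall back to the beginning when no offset was recorded.
        "offsetInMilliseconds": saved_offsets.get(token, 0),
    }


on_playback_stopped({
    "type": "AudioPlayer.PlaybackStopped",
    "token": "track-1",
    "offsetInMilliseconds": 42000,
})
stream = resume_stream("https://cdn.example.com/track-1.mp3", "track-1")
```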

PlaybackNearlyFinished request

Sent when the device is ready to add the next stream to the queue.

To progress through a playlist of audio streams, respond to this request with a Play directive for the next stream and set playBehavior to ENQUEUE or REPLACE_ENQUEUED. This adds the new stream to the queue without stopping the current playback. Alexa begins streaming the new audio item once the currently playing track finishes.

Example request

{
  "version": "1.0",
  "context": {
    "System": {
      "application": {},
      "user": {},
      "device": {}
    }
  },
  "request": {
    "type": "AudioPlayer.PlaybackNearlyFinished",
    "requestId": "unique.id.for.the.request",
    "timestamp": "timestamp of request in format: 2018-04-11T15:15:25Z",
    "token": "token representing the currently playing stream",
    "offsetInMilliseconds": 0,
    "locale": "a locale code such as en-US"
  }
}

Request parameters

Parameter Description Type
type AudioPlayer.PlaybackNearlyFinished String
requestId Represents a unique identifier for the specific request. String
timestamp Provides the date and time when Alexa sent the request as an ISO 8601 formatted string. Used to verify the request when hosting your skill as a web service. String
token An opaque token that represents the audio stream that is currently playing. You provide this token when sending the Play directive. String
offsetInMilliseconds Identifies a track's offset in milliseconds when the PlaybackNearlyFinished request is sent. Long
locale A string indicating the user's locale. For example: en-US. See supported locale codes. String

Response

Your skill can respond to PlaybackNearlyFinished with any AudioPlayer directive.

The response cannot include:

  • Any standard properties such as outputSpeech, card, or reprompt.
  • Any directives from other interfaces, such as a Dialog directive.
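A handler for this request might look like the following sketch, which enqueues the next track and sets expectedPreviousToken to the token reported in the request. The playlist structure, the function name, and the choice to return an empty response object at the end of the playlist are assumptions made for illustration.

```python
# Hypothetical playlist: (token, url) pairs in playback order.
PLAYLIST = [
    ("track-1", "https://cdn.example.com/track-1.mp3"),
    ("track-2", "https://cdn.example.com/track-2.mp3"),
]


def on_playback_nearly_finished(request, playlist=PLAYLIST):
    """Return a response that enqueues the track after the one now playing."""
    tokens = [token for token, _ in playlist]
    current = request["token"]
    response = {"version": "1.0", "response": {}}
    if current not in tokens or tokens.index(current) + 1 == len(tokens):
        return response  # unknown token or end of playlist: enqueue nothing
    next_token, next_url = playlist[tokens.index(current) + 1]
    response["response"]["directives"] = [{
        "type": "AudioPlayer.Play",
        "playBehavior": "ENQUEUE",
        "audioItem": {
            "stream": {
                "url": next_url,
                "token": next_token,
                # Must match the token of the stream that is playing now.
                "expectedPreviousToken": current,
                "offsetInMilliseconds": 0,
            }
        },
    }]
    return response
```

The response carries no outputSpeech, card, or reprompt, in line with the restrictions above.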

PlaybackFailed request

Sent when Alexa encounters an error when attempting to play a stream.

This request type includes two token properties – one as a property of the request object, and one as a property of the currentPlaybackState object. The request.token property represents the stream that failed to play. The currentPlaybackState.token property can be different if Alexa is playing a stream and the error occurs when attempting to buffer the next stream on the queue. In this case, currentPlaybackState.token represents the stream that was successfully playing.

Example request

{
  "version": "1.0",
  "context": {
    "System": {
      "application": {},
      "user": {},
      "device": {}
    }
  },
  "request": {
    "type": "AudioPlayer.PlaybackFailed",
    "requestId": "unique.id.for.the.request",
    "timestamp": "timestamp of request in format: 2018-04-11T15:15:25Z",
    "token": "token representing the stream that failed to play",
    "offsetInMilliseconds": 0,
    "locale": "a locale code such as en-US",
    "error": {
      "type": "error code",
      "message": "description of the error that occurred"
    },
    "currentPlaybackState": {
      "token": "token representing stream playing when error occurred",
      "offsetInMilliseconds": 0,
      "playerActivity": "player state when error occurred, such as PLAYING"
    }
  }
}

Request parameters

Parameter Description Type
type AudioPlayer.PlaybackFailed String
requestId Represents a unique identifier for the specific request. String
timestamp Provides the date and time when Alexa sent the request as an ISO 8601 formatted string. Used to verify the request when hosting your skill as a web service. String
token An opaque token provided by the Play directive that represents the stream that failed to play. String
locale A string indicating the user's locale. For example: en-US. See supported locale codes. String
error Contains an object with error information Object
error.type Identifies the specific type of error. For details about each error type, see Playback errors. String
error.message A description of the error the device has encountered. String
currentPlaybackState Contains an object providing details about the playback activity occurring at the time of the error. Object
currentPlaybackState.token An opaque token that represents the audio stream currently playing when the error occurred. Note that this may be different from the value of the request.token property. String
currentPlaybackState.offsetInMilliseconds Identifies a track's offset in milliseconds when the error occurred. Long
currentPlaybackState.playerActivity Identifies the player state when the error occurred: PLAYING, PAUSED, FINISHED, BUFFER_UNDERRUN, or IDLE. String

Playback errors

Error Type Description
MEDIA_ERROR_UNKNOWN An unknown error occurred.
MEDIA_ERROR_INVALID_REQUEST The request is malformed, unauthorized, forbidden, or not found.
MEDIA_ERROR_SERVICE_UNAVAILABLE Alexa was unable to reach the URL for the stream.
MEDIA_ERROR_INTERNAL_SERVER_ERROR Alexa accepted the request, but was unable to process the request as expected.
MEDIA_ERROR_INTERNAL_DEVICE_ERROR There was an internal error on the device.

Response

Your skill can respond to PlaybackFailed with any AudioPlayer directive.

The response cannot include:

  • Any standard properties such as outputSpeech, card, or reprompt.
  • Any directives from other interfaces, such as a Dialog directive.
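Because the request carries two tokens, a handler usually needs to tell the failed stream apart from the stream that was playing. The sketch below shows one way to do that. The function name describe_failure and the keys of the returned dictionary are hypothetical.

```python
def describe_failure(request):
    """Summarize a PlaybackFailed request object for logging or recovery."""
    failed_token = request["token"]
    playing_token = request["currentPlaybackState"]["token"]
    return {
        "failed_token": failed_token,
        "error_type": request["error"]["type"],
        # True when the error hit a stream other than the one playing,
        # for example while buffering the next stream on the queue.
        "other_stream_was_playing": failed_token != playing_token,
    }


summary = describe_failure({
    "type": "AudioPlayer.PlaybackFailed",
    "token": "track-2",
    "error": {"type": "MEDIA_ERROR_SERVICE_UNAVAILABLE", "message": "unreachable"},
    "currentPlaybackState": {
        "token": "track-1",
        "offsetInMilliseconds": 180000,
        "playerActivity": "PLAYING",
    },
})
```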

System.ExceptionEncountered request

If a response to an AudioPlayer request causes an error, Alexa sends your skill a System.ExceptionEncountered request. Alexa ignores any directives included in the response to this request.

Example request

{
  "type": "System.ExceptionEncountered",
  "requestId": "unique.id.for.the.request",
  "timestamp": "timestamp of request in format: 2018-04-11T15:15:25Z",
  "locale": "a locale code such as en-US",
  "error": {
    "type": "error code such as INVALID_RESPONSE",
    "message": "description of the error that occurred"
  },
  "cause": {
    "requestId": "unique identifier for the request that caused the error"
  }
}

Request parameters

Parameter Description Type
type System.ExceptionEncountered String
requestId Represents a unique identifier for the specific request. String
timestamp Provides the date and time when Alexa sent the request as an ISO 8601 formatted string. Used to verify the request when hosting your skill as a web service. String
locale A string indicating the user's locale. For example: en-US. See supported locale codes. String
error Contains an object with error information. Object
error.type Identifies the specific type of error (INVALID_RESPONSE, DEVICE_COMMUNICATION_ERROR, INTERNAL_ERROR). String
error.message A description of the error the device has encountered. String
cause.requestId The requestId for the request that caused the error. String

Response

Your skill can't return a response to a System.ExceptionEncountered request.
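Since no response is possible, the practical use of this request is diagnostics. The following sketch formats the request object into a single log line that ties the error back to the request that caused it. The function name summarize_exception and the line format are assumptions.

```python
def summarize_exception(request):
    """Format a System.ExceptionEncountered request object as one log line."""
    error = request["error"]
    cause_id = request["cause"]["requestId"]
    return f"{error['type']}: {error['message']} (caused by request {cause_id})"


line = summarize_exception({
    "type": "System.ExceptionEncountered",
    "error": {"type": "INVALID_RESPONSE", "message": "bad directive"},
    "cause": {"requestId": "req-1"},
})
```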


Last updated: Jan 19, 2024