AudioPlayer Overview

The Alexa Voice Service (AVS) is composed of interfaces that correspond to foundational client-side (or product) functionality, like audio playback, volume control, or text-to-speech (TTS). Typically, these interfaces have a one-to-many relationship with built-in Alexa capabilities and third-party skills developed using the Alexa Skills Kit (ASK). For example, Amazon Music, Flash Briefing, Audible, TuneIn, and audio streaming via Alexa skills all rely on the AudioPlayer Interface to manage, control, and report on streaming audio content.

AVS sends directives to your client instructing it to take action (for example, to play a stream), and expects events to be returned in a specific order as these actions are performed. It’s important that you implement the AudioPlayer Interface correctly to ensure that all streaming services that leverage AudioPlayer work as designed, and that you prepare your product to pass media certification. This page provides conceptual information, definitions, and sequence diagrams to help you as you develop, integrate, test, and troubleshoot.

A Simple Example

Let’s start with a simple example to illustrate the expected interaction between your client and AVS. Imagine that you’re in the kitchen cooking a pasta dinner – hands full, water boiling – and rather than reach for your phone to play some music, you say, “Alexa, play some music.” Here’s what happens under the hood.

A Recognize event, including a binary audio attachment (captured speech), is sent to AVS. The captured audio is processed and translated by Alexa into a series of directives (and potentially corresponding audio attachments), which are then sent to your client instructing it to take action.

In this scenario, your client receives two directives. The first, a Speak directive, instructs your client to play back Alexa speech, for example, “Shuffling your music.” The second, a Play directive, instructs your client to start playback of your music.

Before acting on the Play directive, AVS expects your client to handle the Speak directive and send a series of events to AVS. In this case, a SpeechStarted event is sent when your client starts playback of Alexa speech, and a SpeechFinished event is sent when playback of Alexa speech finishes. At this point, your client begins playback of the stream included in the Play directive.

When playback begins, your client sends a series of lifecycle events to AVS:

  • PlaybackStarted is sent when playback begins. The offsetInMilliseconds sent to AVS should match the offset provided in the Play directive.
  • PlaybackNearlyFinished is sent when your client is ready to buffer/download the next stream in your playback queue. Many implementations send this event shortly after PlaybackStarted to start buffering and reduce lag between playback of streams.
  • ProgressReportDelayElapsed is sent to AVS if progressReportDelayInMilliseconds is present in the Play directive.
  • ProgressReportIntervalElapsed is sent to AVS if progressReportIntervalInMilliseconds is present in the Play directive.
  • PlaybackFinished is sent when your client finishes playing a stream.
  • PlaybackStopped is sent if/when your client receives a Stop directive, and stops playback.

These events notify Alexa that playback has started, request the next stream, and provide progress reporting information to AVS and music service providers.
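
For example, the PlaybackStarted event carries the token of the active stream and the current playback offset. Below is a minimal sketch of building that event in Java with Jackson; the buildPlaybackStarted helper is an illustrative stand-in, and a real client would serialize this node and send it to AVS on the events channel.

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;
import java.util.UUID;

public class PlaybackStartedExample {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Builds a PlaybackStarted event. The token and offsetInMilliseconds
    // must echo the values from the Play directive that started playback.
    static ObjectNode buildPlaybackStarted(String token, long offsetInMilliseconds) {
        ObjectNode header = MAPPER.createObjectNode()
                .put("namespace", "AudioPlayer")
                .put("name", "PlaybackStarted")
                .put("messageId", UUID.randomUUID().toString());

        ObjectNode payload = MAPPER.createObjectNode()
                .put("token", token)
                .put("offsetInMilliseconds", offsetInMilliseconds);

        ObjectNode event = MAPPER.createObjectNode();
        event.set("header", header);
        event.set("payload", payload);

        ObjectNode root = MAPPER.createObjectNode();
        root.set("event", event);
        return root;
    }
}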

In the following sections we’ll cover these events and when you must send them. The most important thing for now is that you’re finishing up that Bolognese sauce while smooth jazz fills your kitchen.

AudioPlayer Directives

The AudioPlayer Interface exposes three directives that are used to control audio playback.

  • Play: Instructs your client to begin playback of audio originating from the cloud. In addition to providing a URI or audio attachment, each Play directive includes information like playBehavior, offsetInMilliseconds, streamFormat, expiryTime, and progressReport, which tell your client which lifecycle events must be sent to Alexa.
  • Stop: Instructs your client to stop playback of an audio stream. Your client may receive this directive as the result of a voice request or physical control (see PlaybackController).
  • ClearQueue: Instructs your client to clear the current playback queue. The ClearQueue directive has two behaviors: CLEAR_ENQUEUED, which clears the queue and continues to play the currently playing stream; and CLEAR_ALL, which clears the entire playback queue and stops the currently playing stream (if applicable).

Your client should be designed to handle all properties provided by the API and should not break when unexpected fields/properties are encountered. For example, if you are using Jackson (a JSON parser), FAIL_ON_UNKNOWN_PROPERTIES should be set to false.
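
With Jackson’s ObjectMapper, that setting looks like this (a minimal sketch; the parse helper is illustrative):

import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.ObjectMapper;

public class DirectiveParser {
    // Ignore fields AVS may add in the future instead of throwing
    // UnrecognizedPropertyException during deserialization.
    private final ObjectMapper mapper = new ObjectMapper()
            .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);

    public <T> T parse(String json, Class<T> type) throws java.io.IOException {
        return mapper.readValue(json, type);
    }
}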

Recommended Media Support

Play directives will provide audio in a variety of formats, containers and bitrates. See Recommended Media Support for codecs, containers, streaming formats, and playlists that your product should support to provide a familiar Alexa experience to your customers.

Local Playback Queue

Creating and managing your client’s playback queue is key to ensuring that media services associated with the AudioPlayer Interface work as designed; a sketch of this logic follows the list below. Your client’s playback queue must:

  • Have the ability to handle multiple Play directives.
  • Use the playBehavior in the payload of each Play directive to adjust or maintain your client’s queue.
  • Match the active stream’s token with the expectedPreviousToken of the stream being added to the queue. Note: If the tokens don’t match, the stream should be ignored. However, if no expectedPreviousToken is returned, the stream should be added to the queue.
  • Clear the queue whenever a ClearQueue directive is received.
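
Here is a minimal sketch of that queue logic, assuming a hypothetical Stream type that mirrors the token and expectedPreviousToken fields of the Play directive payload; the actual playback calls are noted in comments rather than implemented:

import java.util.ArrayDeque;
import java.util.Deque;

public class PlaybackQueue {
    // Hypothetical stand-in for the stream object parsed from a Play directive.
    static class Stream {
        final String token;
        final String expectedPreviousToken; // null when not provided
        Stream(String token, String expectedPreviousToken) {
            this.token = token;
            this.expectedPreviousToken = expectedPreviousToken;
        }
    }

    private final Deque<Stream> queue = new ArrayDeque<>();
    private Stream activeStream; // currently playing stream, if any

    // Adjust the queue according to the Play directive's playBehavior.
    public synchronized void handlePlay(String playBehavior, Stream stream) {
        switch (playBehavior) {
            case "REPLACE_ALL":
                queue.clear();
                activeStream = stream;
                // startPlayback(stream) would begin playback here.
                break;
            case "REPLACE_ENQUEUED":
                queue.clear();          // the currently playing stream is untouched
                enqueue(stream);
                break;
            case "ENQUEUE":
                enqueue(stream);
                break;
        }
    }

    private void enqueue(Stream stream) {
        // If expectedPreviousToken is present, it must match the active
        // stream's token; otherwise the stream is ignored.
        if (stream.expectedPreviousToken != null && activeStream != null
                && !stream.expectedPreviousToken.equals(activeStream.token)) {
            return;
        }
        queue.addLast(stream);
    }

    // Adjust the queue according to a ClearQueue directive.
    public synchronized void handleClearQueue(String clearBehavior) {
        queue.clear();
        if ("CLEAR_ALL".equals(clearBehavior)) {
            activeStream = null;
            // stopPlayback() would stop the current stream here.
        }
    }
}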

Dissecting a Play Directive

Let’s return to our simple example. You’ll remember that after you asked Alexa to play music, a Play directive was returned instructing your client to start playing an audio stream (or binary audio attachment). The directive’s payload supplies important information, like the stream URL, when the stream URL expires, the expected playback behavior, and progress reporting requirements. In this section we’re going to dissect that Play directive.

The payload will look like this:

{
  "directive": {
    "header": {
      "namespace": "AudioPlayer",
      "name": "Play",
      "messageId": "42941f13-90ed-4d9e-8159-xxxxxxxx",
      "dialogRequestId": "req:a345fgh598383xxx"
    },
    "payload": {
      "playBehavior": "REPLACE_ALL",
      "audioItem": {
        "audioItemId": "test1.as-ct.v1.XYZ-ABCDE-FGHIJ#ACRI#url#ACRI#0f6bcd24-f621-555a-822c-1111111:1",
        "stream": {
          "url": "https://opml.radiotime.com/Tune.ashx?serial=SAMPLE&formats=aac,mp3&partnerId=SAMPLE",
          "streamFormat": "AUDIO_MPEG",
          "offsetInMilliseconds": 0,
          "expiryTime": "2016-09-13T18:22:49+0000",
          "progressReport": {
            "progressReportDelayInMilliseconds": 15000,
            "progressReportIntervalInMilliseconds": 900000
          },
          "token": "test1.as-ct.v1.XYZ-ABCDE-FGHIJ#ACRI#url#ACRI#0f6bcd24-f621-555a-822c-1111111:1"
        }
      }
    }
  }
}

  • The first thing we encounter is playBehavior, which provides information about how this Play directive impacts your local playback queue. Three behaviors are supported:
    • REPLACE_ALL: Instructs your client to immediately begin playback of the stream included in the payload and replace any enqueued streams in your local playback queue.
    • ENQUEUE: Instructs your client to add the stream contained in the Play directive to the end of your current playback queue.
    • REPLACE_ENQUEUED: Instructs your client to replace all streams in your local playback queue. This does not impact the currently playing stream.

    In the sample above, playBehavior is set to REPLACE_ALL. As such, your client must clear its local playback queue and immediately start playback of the audio stream included in the payload.

  • Next is the audioItem object, which includes audioItemId and stream.
    • audioItemId: an opaque token that identifies the audio stream.
    • stream: an object that provides specific information about the audio stream, including:
      • url: identifies the location of the audio content. If the audio content is a binary audio attachment, the value will be a unique identifier for the content formatted with the following prefix: cid:.
      • streamFormat: identifies the format of the audio stream.
      • offsetInMilliseconds: identifies the offset from which your client is expected to start playback of the audio stream.
      • expiryTime: a timestamp for when the stream will become invalid (date and time in ISO 8601 format).
      • progressReport: an object that contains information about the progress reports required by the content provider. progressReport supports progressReportIntervalInMilliseconds and progressReportDelayInMilliseconds. In this example, both are required.
        • progressReportDelayInMilliseconds: the offset for when the initial progress report must be sent. This event is sent only once, at the delay specified in the Play directive.
        • progressReportIntervalInMilliseconds: the interval at which progress reports must be sent periodically. This event is sent each time the interval elapses, measured from the start of the track.
      • token: an opaque token that represents the current audio stream.

The payload provides your client with all the information needed to successfully handle an audio stream and add it to your local playback queue.
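
To make this concrete, the payload above could be bound to a small set of classes. This is an illustrative sketch, assuming Jackson; the class and field names simply mirror the JSON keys and are not part of the AVS API:

import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.ObjectMapper;

public class PlayDirectivePayload {
    public String playBehavior;
    public AudioItem audioItem;

    public static class AudioItem {
        public String audioItemId;
        public Stream stream;
    }

    public static class Stream {
        public String url;
        public String streamFormat;
        public long offsetInMilliseconds;
        public String expiryTime;
        public ProgressReport progressReport;
        public String token;
        public String expectedPreviousToken; // present on enqueued streams
    }

    public static class ProgressReport {
        public Long progressReportDelayInMilliseconds;    // may be absent
        public Long progressReportIntervalInMilliseconds; // may be absent
    }

    public static PlayDirectivePayload fromJson(String payloadJson) throws java.io.IOException {
        ObjectMapper mapper = new ObjectMapper()
                .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);
        return mapper.readValue(payloadJson, PlayDirectivePayload.class);
    }
}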

Make sure that you keep track of the offsetInMilliseconds, progressReportDelayInMilliseconds, and progressReportIntervalInMilliseconds. These parameters provide progress reporting information to your client, and may contain values that need to be returned to AVS.

Progress Reporting

If progressReportDelayInMilliseconds and/or progressReportIntervalInMilliseconds are present in a Play directive’s payload, it’s the content provider’s way of telling your client that progress reporting is required for that specific stream.

When these parameters are present, your client must send the corresponding lifecycle events:

  • ProgressReportDelayElapsed: The ProgressReportDelayElapsed event must be sent to AVS if progressReportDelayInMilliseconds is present in the Play directive. The event must be sent once, at the specified delay measured from the start of the stream (not from the offsetInMilliseconds). For example, if the Play directive contains progressReportDelayInMilliseconds with a value of 20000, the ProgressReportDelayElapsed event must be sent 20,000 milliseconds from the start of the track. However, if the Play directive also contains an offsetInMilliseconds value of 10000, the event must be sent 10,000 milliseconds into playback, because the delay is measured from the start of the stream, not from the Play directive’s offset.
  • ProgressReportIntervalElapsed: The ProgressReportIntervalElapsed event must be sent to AVS if progressReportIntervalInMilliseconds is present in the Play directive. The event must be sent periodically at the specified interval, measured from the start of the stream (not from the offsetInMilliseconds). For example, if the Play directive contains progressReportIntervalInMilliseconds with a value of 20000, the ProgressReportIntervalElapsed event must be sent 20,000 milliseconds from the start of the track, and every 20,000 milliseconds thereafter until the stream ends. However, if the Play directive also contains an offsetInMilliseconds value of 10000, the first event must be sent 10,000 milliseconds into playback, and every 20,000 milliseconds after that, because the interval is measured from the start of the stream, not from the Play directive’s offset.
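
The offset arithmetic described above can be captured in a small scheduling sketch. This is a minimal illustration that assumes playback begins at the directive’s offset and proceeds without pauses; a real client should drive these timers from the media player’s reported position, and the sendProgressEvent helper is a hypothetical stand-in:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ProgressReportScheduler {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    // delayMs / intervalMs come from the Play directive's progressReport
    // object (null when absent); offsetMs is the directive's
    // offsetInMilliseconds. Both reports are measured from the start of
    // the stream, so the first firing is shifted back by the offset.
    public void schedule(Long delayMs, Long intervalMs, long offsetMs) {
        if (delayMs != null) {
            // Clamped at zero; whether to send at all when the offset is
            // already past the delay is an assumption made here.
            long initial = Math.max(0, delayMs - offsetMs);
            scheduler.schedule(
                    () -> sendProgressEvent("ProgressReportDelayElapsed"),
                    initial, TimeUnit.MILLISECONDS);
        }
        if (intervalMs != null) {
            // E.g. offset 10000, interval 20000: first report fires
            // 10,000 ms into playback, then every 20,000 ms.
            long elapsedInInterval = offsetMs % intervalMs;
            long initial = intervalMs - elapsedInInterval;
            scheduler.scheduleAtFixedRate(
                    () -> sendProgressEvent("ProgressReportIntervalElapsed"),
                    initial, intervalMs, TimeUnit.MILLISECONDS);
        }
    }

    private void sendProgressEvent(String name) {
        // Hypothetical: build the event with the stream's token and current
        // offsetInMilliseconds, then send it to AVS.
    }
}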

Sequence Diagrams

The following diagrams illustrate lifecycle events your client is expected to send in response to directives sent from Alexa (and subsequent actions taken by your product). In conjunction with logs produced by the Java Sample App, these diagrams can be used to troubleshoot development and certification issues.

Scenario 1: “Alexa, play rock music from iHeartRadio.”

In this scenario, a user makes a request to play rock music from iHeartRadio. The diagram below provides the appropriate sequencing of events sent to and directives expected from AVS.

PLEASE NOTE: In this example, the first stream plays until completion and the client sends a PlaybackFinished event.

Diagram 1

Scenario 2: Stop and resume an audio stream

In this scenario the user plays a song, and approximately 45 seconds into playback the user says, “Alexa, stop.” Approximately 10 seconds later, the user says, “Alexa, resume.” This scenario is used to ensure that your device is sending the correct progress reports from the origination of a stream. It also highlights the use of channels, a concept used to govern how a client should prioritize audio outputs, in this case audio playback and Alexa speech.

PLEASE NOTE: In this example, the user makes a request to stop audio playback. When the user barges in (interrupts audio playback), audio playback on the client is temporarily paused while the Dialog channel is active and in the foreground. When this occurs, your client must send a PlaybackPaused event. After Alexa has identified the request, StopCapture and Stop directives are sent, which instruct your client to close the microphone and to stop audio playback on the Content channel, respectively. In response to the Stop directive, a PlaybackStopped event must be sent. This is different from the previous example, where a PlaybackFinished event was sent when the stream played to completion.
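
The sequence in this note can be sketched as follows, assuming a hypothetical media-player interface; event transport is left as a comment:

public class ChannelFocusHandler {
    // Hypothetical view of the client's media player.
    interface Player {
        boolean isPlaying();
        void pause();
        void stop();
        String token();
        long offsetMs();
    }

    private final Player player;

    public ChannelFocusHandler(Player player) {
        this.player = player;
    }

    // User barge-in: the Dialog channel is foregrounded, so content
    // playback is paused and PlaybackPaused is reported.
    public void onDialogChannelForeground() {
        if (player.isPlaying()) {
            player.pause();
            sendEvent("PlaybackPaused", player.token(), player.offsetMs());
        }
    }

    // Stop directive on the Content channel: stop playback and report
    // PlaybackStopped (not PlaybackFinished, which is reserved for
    // streams that play to completion).
    public void onStopDirective() {
        player.stop();
        sendEvent("PlaybackStopped", player.token(), player.offsetMs());
    }

    private void sendEvent(String name, String token, long offsetMs) {
        // Hypothetical: build the AudioPlayer event (header + payload with
        // token and offsetInMilliseconds) and send it to AVS.
    }
}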

Diagram 2

Scenario 3-A: Use a physical control to navigate to the next stream in your playback queue

In this scenario the user plays a song, and approximately 15 seconds into playback the user presses the next button located on the device to skip to the next stream.

PLEASE NOTE: This example is for local controls, not actions performed in the Amazon Alexa app.

Diagram 3-A

Scenario 3-B: Use voice to navigate to the next stream in your playback queue

In this scenario the user plays a song, and approximately 15 seconds into playback the user says, “Alexa, next”.

Diagram 3-B

Scenario 4: Music playback is interrupted by a sounding alarm

In this scenario, a user asks an AVS device to play music. During playback a previously set alarm goes off, which is then stopped by the user. It highlights the use of channels, a concept used to govern how a client should prioritize audio outputs, in this case audio playback and a sounding alarm.

The diagram below provides the appropriate sequencing of events sent to and directives expected from AVS.

Diagram 4

Scenario 5: “Alexa, what movies are playing by me?”

In this scenario, a user makes a request for movies nearby. The diagram below provides the appropriate sequencing of events sent to and directives expected from AVS.

Diagram 5

Next Steps

Resources