Before today, the Alexa Skills Kit supported only short audio clips via SSML audio tags in your skill responses. Today we are excited to announce that we have added streaming audio support for Alexa skills, including playback controls. This means you can easily create skills that play back audio content like podcasts, news stories, and live streams.
New AudioPlayer and PlaybackController interfaces provide directives and requests for streaming audio and monitoring playback progression. With this new feature, your skill can send audio directives to start and stop the playback. The Alexa service can provide your skill with information about the audio playback’s state, such as when the track is nearly finished, or when playback starts and stops. Alexa can also now send requests in response to hardware buttons, such as those on a remote control.
To enable audio playback support in your skill, you simply need to turn on the Audio Player functionality in the Alexa developer portal and handle the new audio built-in intents.
In addition to the required built-in intents, AMAZON.PauseIntent and AMAZON.ResumeIntent, your skill should gracefully handle the following additional built-in intents: AMAZON.CancelIntent, AMAZON.LoopOffIntent, AMAZON.LoopOnIntent, AMAZON.NextIntent, AMAZON.PreviousIntent, AMAZON.RepeatIntent, AMAZON.ShuffleOffIntent, AMAZON.ShuffleOnIntent, and AMAZON.StartOverIntent.
Note: Users can invoke these built-in intents without using your skill’s invocation name. For example, while listening to a podcast skill you create, a user could say “Alexa, next” and your skill would play the next episode.
If your skill is currently playing audio, or was the skill most recently playing audio, these intents are automatically sent to your skill. Your code needs to expect them and not return an error. If any of these intents does not apply to your skill, handle it in an appropriate way in your code. For instance, you could return a response with text-to-speech indicating that the command is not relevant to the skill. The specific message depends on your skill and whether the intent is one that might make sense at some point.
Note: If your skill uses the AudioPlayer directives, you cannot extend the above built-in intents with your own sample utterances.
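To make this concrete, here is a minimal Node.js sketch of routing these built-in intents in a plain AWS Lambda handler, without any helper SDK. The buildResponse helper and the speech strings are illustrative assumptions, not part of the Alexa Skills Kit API:

// Minimal sketch of routing built-in playback intents in a raw Lambda handler.
// buildResponse and the speech strings are illustrative, not part of the ASK API.
exports.handler = function (event, context, callback) {
  if (event.request.type === 'IntentRequest') {
    switch (event.request.intent.name) {
      case 'AMAZON.PauseIntent':
        // Pause maps naturally to the AudioPlayer.Stop directive.
        return callback(null, buildResponse(null, [{ type: 'AudioPlayer.Stop' }]));
      case 'AMAZON.ShuffleOnIntent':
        // Shuffle does not apply to this skill, so explain instead of returning an error.
        return callback(null, buildResponse("Sorry, I can't shuffle episodes.", []));
      // ...handle the remaining built-in intents here...
      default:
        return callback(null, buildResponse("Sorry, I didn't understand that.", []));
    }
  }
  return callback(null, buildResponse(null, []));
};

// Wraps optional speech text and directives in the standard response envelope.
function buildResponse(speechText, directives) {
  var response = { directives: directives, shouldEndSession: true };
  if (speechText) {
    response.outputSpeech = { type: 'PlainText', text: speechText };
  }
  return { version: '1.0', response: response };
}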
The new AudioPlayer interface provides directives to control the audio stream. You must return a Play directive from one of your intents to start audio playback. Here is a list of the directives:
Directive | Description
---|---
AudioPlayer.Play | Sends Alexa a command to stream the audio file identified by the specified audioItem.
AudioPlayer.Stop | Stops any currently playing audio stream.
AudioPlayer.ClearQueue | Clears the queue of all audio streams.
As part of your skill’s responses to the Alexa service, you can now include audio directives. When including a directive in your response, set the type property to the directive you want to send, and include the directive in the directives array in your response. Here is an example of using directives in an Alexa response:
{
  "version": "1.0",
  "sessionAttributes": {},
  "response": {
    "outputSpeech": {},
    "card": {},
    "reprompt": {},
    "directives": [
      {
        "type": "AudioPlayer.Play",
        "playBehavior": "string",
        "audioItem": {
          "stream": {
            "token": "string",
            "url": "string",
            "offsetInMilliseconds": 0
          }
        }
      }
    ],
    "shouldEndSession": true
  }
}
When responding to a LaunchRequest or IntentRequest, your response can include both AudioPlayer directives and standard response properties such as outputSpeech, card, and reprompt. For example, if you provide outputSpeech in the same response as a Play directive, Alexa speaks the provided text before beginning to stream the audio.
The AudioPlayer.Play directive sends Alexa a command to stream the audio file identified by the specified audioItem. Use the playBehavior parameter to specify whether the stream begins playing immediately (REPLACE_ALL), is added to the queue (ENQUEUE), or replaces the streams already in the queue (REPLACE_ENQUEUED).
When sending a Play directive, you normally set the shouldEndSession flag in the response object to true to end the session. If you set this flag to false, Alexa sends the stream to the device for playback and then immediately pauses the stream to listen for the user’s response. Here is an example of a full response object sent in reply to a LaunchRequest or IntentRequest:
{
  "version": "1.0",
  "sessionAttributes": {},
  "response": {
    "outputSpeech": {
      "type": "PlainText",
      "text": "Playing the requested song."
    },
    "card": {
      "type": "Simple",
      "title": "Play Audio",
      "content": "Playing the requested song."
    },
    "reprompt": {
      "outputSpeech": {
        "type": "PlainText",
        "text": null
      }
    },
    "directives": [
      {
        "type": "AudioPlayer.Play",
        "playBehavior": "ENQUEUE",
        "audioItem": {
          "stream": {
            "token": "this-is-the-audio-token",
            "url": "https://my-audio-hosting-site.com/audio/sample-song.mp3",
            "offsetInMilliseconds": 0
          }
        }
      }
    ],
    "shouldEndSession": true
  }
}
Notice the response includes a simple home card and plain text outputSpeech. In this example, Alexa would say “Playing the requested song,” display the home card, and then begin playing back audio from the provided audioItem URL.
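If you are not using a helper library, a response like the one above can be assembled directly in Node.js. The following sketch assumes a buildPlayResponse helper of your own; only the response shape comes from the example above, and REPLACE_ALL is used here so the stream starts playing immediately:

// Builds a Play response without a helper library.
// The URL must be HTTPS, and the token is any string you use to identify the stream.
function buildPlayResponse(speechText, token, url) {
  return {
    version: '1.0',
    response: {
      outputSpeech: { type: 'PlainText', text: speechText },
      card: { type: 'Simple', title: 'Play Audio', content: speechText },
      directives: [
        {
          type: 'AudioPlayer.Play',
          playBehavior: 'REPLACE_ALL', // start this stream immediately
          audioItem: {
            stream: {
              token: token,
              url: url,
              offsetInMilliseconds: 0
            }
          }
        }
      ],
      shouldEndSession: true
    }
  };
}

// Example usage inside an intent handler:
// callback(null, buildPlayResponse('Playing the requested song.',
//   'this-is-the-audio-token',
//   'https://my-audio-hosting-site.com/audio/sample-song.mp3'));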
The AudioPlayer Interface sends requests to notify your skill about changes to the audio’s playback state. You can handle these requests to understand what is happening with the audio stream and you can optionally choose to respond.
Request Type | Description
---|---
AudioPlayer.PlaybackStarted | Sent when Alexa begins playing the audio stream previously sent in a Play directive. This lets your skill verify that playback began successfully.
AudioPlayer.PlaybackFinished | Sent when the stream that Alexa is playing comes to an end on its own.
AudioPlayer.PlaybackStopped | Sent when Alexa stops playing an audio stream in response to a voice request or an AudioPlayer directive.
AudioPlayer.PlaybackNearlyFinished | Sent when the currently playing stream is nearly complete and the device is ready to receive a new stream.
AudioPlayer.PlaybackFailed | Sent when Alexa encounters an error when attempting to play a stream.
Note: Since these requests are not sent in the context of a skill session, they do not include the session object. You can use the context object to get details such as the applicationId and userId if you need them. Here is an example of an AudioPlayer request created when audio playback has started:
{
  "version": "string",
  "context": {
    "System": {
      "application": {},
      "user": {},
      "device": {}
    }
  },
  "request": {
    "type": "AudioPlayer.PlaybackStarted",
    "requestId": "string",
    "timestamp": "string",
    "token": "string",
    "offsetInMilliseconds": 0,
    "locale": "string"
  }
}
Your skill is not required to respond to AudioPlayer requests, but if it does, be aware that it can respond only with the AudioPlayer directives mentioned earlier (Play, Stop, and ClearQueue). The response cannot include any of the standard properties such as outputSpeech, card, or reprompt.
For the full request format, see Request Format in the JSON Interface Reference for Custom Skills.
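As an example of responding, a skill that maintains an episode queue might answer AudioPlayer.PlaybackNearlyFinished with a Play directive that enqueues the next stream. This is a sketch only; the getNextEpisode helper and token scheme are assumptions about how your skill tracks its queue:

// Sketch: enqueue the next episode when the current stream is nearly finished.
exports.handler = function (event, context, callback) {
  if (event.request.type === 'AudioPlayer.PlaybackNearlyFinished') {
    var next = getNextEpisode(event.request.token); // what should play after the current stream
    if (!next) {
      // Nothing left to enqueue; an empty response is fine for AudioPlayer requests.
      return callback(null, { version: '1.0', response: {} });
    }
    return callback(null, {
      version: '1.0',
      response: {
        directives: [
          {
            type: 'AudioPlayer.Play',
            playBehavior: 'ENQUEUE', // add to the queue without interrupting playback
            audioItem: {
              stream: {
                token: next.token,
                expectedPreviousToken: event.request.token, // token of the stream now playing
                url: next.url,
                offsetInMilliseconds: 0
              }
            }
          }
        ]
      }
    });
  }
  return callback(null, { version: '1.0', response: {} });
};

// Hypothetical queue lookup; a real skill would consult its feed or database.
function getNextEpisode(currentToken) {
  return null;
}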
Voice requests such as “Alexa, next song” are sent to your skill as built-in intents (such as AMAZON.NextIntent) via a normal IntentRequest. The PlaybackController interface, however, provides requests to notify your skill when the user interacts with player controls, such as the buttons on a device or remote control.
Just like AudioPlayer requests, your skill can respond to these requests with AudioPlayer directives to start and stop playback. Note that some hardware devices do not support sending PlaybackController requests in response to button presses.
PlaybackController sends the following requests to notify your skill about playback control events:
Request Type | Description
---|---
PlaybackController.NextCommandIssued | Sent when the user uses a “next” button with the intent to skip to the next audio item.
PlaybackController.PauseCommandIssued | Sent when the user uses a “pause” button with the intent to stop playback.
PlaybackController.PlayCommandIssued | Sent when the user uses a “play” or “resume” button with the intent to start or resume playback.
PlaybackController.PreviousCommandIssued | Sent when the user uses a “previous” button with the intent to go back to the previous audio item.
Just like AudioPlayer requests, these requests do not include the session object, so you will need to use the context object to get details such as the applicationId and userId. Here is an example of a PlaybackController request sent when the “next” button is pressed:
{
  "version": "string",
  "context": {
    "System": {
      "application": {},
      "user": {},
      "device": {}
    },
    "AudioPlayer": {
      "token": "string",
      "offsetInMilliseconds": 0,
      "playerActivity": "string"
    }
  },
  "request": {
    "type": "PlaybackController.NextCommandIssued",
    "requestId": "string",
    "timestamp": "string",
    "locale": "string"
  }
}
Just like AudioPlayer requests, you can respond to a PlaybackController request only with AudioPlayer directives.
For the full request format, see Request Format in the JSON Interface Reference for Custom Skills.
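Handling these button events looks much like handling the voice-driven built-in intents. Below is a sketch for NextCommandIssued; note that the current stream’s token is read from the context.AudioPlayer object because there is no session, and getNextEpisode is again a hypothetical helper:

// Sketch: respond to a "next" button press with a Play directive.
exports.handler = function (event, context, callback) {
  if (event.request.type === 'PlaybackController.NextCommandIssued') {
    // There is no session, so the current stream's token comes from context.AudioPlayer.
    var next = getNextEpisode(event.context.AudioPlayer.token);
    if (!next) {
      return callback(null, { version: '1.0', response: {} });
    }
    return callback(null, {
      version: '1.0',
      response: {
        directives: [
          {
            type: 'AudioPlayer.Play',
            playBehavior: 'REPLACE_ALL', // the user explicitly skipped, so replace the current stream
            audioItem: {
              stream: { token: next.token, url: next.url, offsetInMilliseconds: 0 }
            }
          }
        ]
      }
    });
  }
  return callback(null, { version: '1.0', response: {} });
};

// Hypothetical queue lookup, as in the earlier sketch.
function getNextEpisode(currentToken) {
  return null;
}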
When your skill sends a Play directive, the Alexa service sends the audio stream to the device for playback. Once the session ends normally (for instance, because your response set the shouldEndSession flag to true), Alexa remembers that your skill started the playback until the user begins streaming audio from a different skill or service.
During this time, users can invoke the built-in playback control intents listed earlier, such as AMAZON.NextIntent, AMAZON.PauseIntent, and AMAZON.ResumeIntent, without using your skill’s invocation name.
Let’s take a look at an example of this for an imaginary custom skill called “My Podcast Player.” This example skill defines an intent, PlayLatestEpisode, mapped to the sample utterance “play the latest episode.” Here is how the flow of audio playback would work:
User: Alexa, ask My Podcast Player to play the latest episode. (The PlayLatestEpisode intent is sent to the skill, which responds with a Play directive; playback begins and the session ends.)
User: Alexa, next. (Note that no invocation name is used. The AMAZON.NextIntent is sent to the skill, which responds with a Play directive for the next episode.)
User: Alexa, pause. (The AMAZON.PauseIntent is sent to the skill, which responds with a Stop directive, and playback stops.)
Although at this point the audio is not playing and there is no current session, the Alexa service is still tracking My Podcast Player as the skill that most recently streamed audio. Assuming the device remains on and the user does not use any other audio streaming skills or services, the following could take place at any time later:
User: Alexa, resume. (The AMAZON.ResumeIntent is sent to My Podcast Player, which responds with a Play directive that resumes the episode from the stored offset.)
Keep in mind this only applies to the built-in intents. The intents you define (such as the example PlayLatestEpisode intent) must be invoked using a normal invocation phrase.
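The “Alexa, resume” step works because your skill can store the offset reported in AudioPlayer.PlaybackStopped and send it back later in a Play directive. The following sketch uses an in-memory store and a hypothetical urlForToken helper purely for illustration; a real skill would persist this per user, for example in a database:

// Sketch: remember where playback stopped, and resume from that point.
// In-memory stand-in for persistence; a real skill would use a database keyed by userId.
var savedStreams = {};

exports.handler = function (event, context, callback) {
  var request = event.request;

  if (request.type === 'AudioPlayer.PlaybackStopped') {
    // No session here, so identify the user from the context object.
    savedStreams[event.context.System.user.userId] = {
      token: request.token,
      offsetInMilliseconds: request.offsetInMilliseconds
    };
    return callback(null, { version: '1.0', response: {} });
  }

  if (request.type === 'IntentRequest' && request.intent.name === 'AMAZON.ResumeIntent') {
    var saved = savedStreams[event.session.user.userId];
    if (!saved) {
      return callback(null, {
        version: '1.0',
        response: {
          outputSpeech: { type: 'PlainText', text: 'There is nothing to resume.' },
          shouldEndSession: true
        }
      });
    }
    return callback(null, {
      version: '1.0',
      response: {
        directives: [
          {
            type: 'AudioPlayer.Play',
            playBehavior: 'REPLACE_ALL',
            audioItem: {
              stream: {
                token: saved.token,
                url: urlForToken(saved.token), // hypothetical token-to-URL mapping
                offsetInMilliseconds: saved.offsetInMilliseconds
              }
            }
          }
        ],
        shouldEndSession: true
      }
    });
  }

  return callback(null, { version: '1.0', response: {} });
};

// Hypothetical lookup that maps a stream token back to its audio URL.
function urlForToken(token) {
  return 'https://my-audio-hosting-site.com/audio/sample-song.mp3';
}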
For complete information, check out the full AudioPlayer Interface reference as well as the full PlaybackController Interface reference, both linked below.
We’ve also released a sample Alexa Audio Player skill for Node.js that provides a working framework for developers to quickly get started in building a skill that can play audio and respond to events.
For more information about getting started with Alexa, check out the following:
Audio Interface Reference
PlaybackController Reference
Alexa Dev Chat Podcast
Alexa Training with Big Nerd Ranch
Intro to Alexa Skills On Demand
Voice Design 101 On Demand
Alexa Skills Kit (ASK)
Alexa Developer Forums
-Dave (@TheDaveDev)