New Alexa Skills Kit (ASK) Feature: Audio Streaming in Alexa Skills

David Isbitski Aug 24, 2016

Until today, the Alexa Skills Kit supported only short audio clips, embedded via SSML audio tags in your skill responses. Today we are excited to announce that we have added streaming audio support for Alexa skills, including playback controls. This means you can easily create skills that play back audio content like podcasts, news stories, and live streams.

New AudioPlayer and PlaybackController interfaces provide directives and requests for streaming audio and monitoring playback progress. With this new feature, your skill can send audio directives to start and stop playback. The Alexa service can provide your skill with information about the audio playback’s state, such as when the track is nearly finished, or when playback starts and stops. Alexa can also now send requests in response to hardware buttons, such as those on a remote control.

Enabling Audio Playback Support in Your Skill

To enable audio playback support in your skill, you simply need to turn on the Audio Player functionality and handle the new audio intents. Navigate to the Alexa developer portal and do the following:

  • On the Skill Information page in the developer portal, set the Audio Player option to Yes.
     
  • Include the required built-in intents for pausing and resuming audio in your intent schema (a sample schema follows this list) and implement them in your skill:
    • AMAZON.PauseIntent
    • AMAZON.ResumeIntent
       
  • Send the AudioPlayer.Play directive from one of your intents to start audio playback
     
  • Handle AudioPlayer and PlaybackController Requests and optionally respond
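
For reference, here is a minimal sketch of an intent schema that includes the two required built-in intents alongside a hypothetical custom intent (PlayLatestEpisode, the same example intent used later in this post):

{
  "intents": [
    { "intent": "PlayLatestEpisode" },
    { "intent": "AMAZON.PauseIntent" },
    { "intent": "AMAZON.ResumeIntent" }
  ]
}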

In addition to the required built-in intents, your skill should gracefully handle the following additional built-in intents:
 

  • AMAZON.CancelIntent
  • AMAZON.LoopOffIntent
  • AMAZON.LoopOnIntent
  • AMAZON.NextIntent
  • AMAZON.PreviousIntent
  • AMAZON.RepeatIntent
  • AMAZON.ShuffleOffIntent
  • AMAZON.ShuffleOnIntent
  • AMAZON.StartOverIntent

Note: Users can invoke these built-in intents without using your skill’s invocation name. For example, while using a podcast skill you create, a user could say “Alexa, next” and your skill would play the next episode.

If your skill is currently playing audio, or was the skill most recently playing audio, these intents are automatically sent to your skill. Your code needs to expect them and not return an error. If any of these intents does not apply to your skill, handle it in an appropriate way in your code. For instance, you could return a response with text-to-speech indicating that the command is not relevant to the skill. The specific message depends on the skill and on whether the intent is one that might make sense at some point, for example (a sample response follows this list):
 

  • For a podcast skill, the AMAZON.ShuffleOnIntent intent might return the message: “I can’t shuffle a podcast.”
  • For version 1.0 of a music skill that doesn’t yet support playlists and shuffling, the AMAZON.ShuffleOnIntent intent might return: “Sorry, I can’t shuffle music yet.”
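
As a sketch, a graceful response to AMAZON.ShuffleOnIntent in a podcast skill could simply speak the example message above and end the session:

{
  "version": "1.0",
  "response": {
    "outputSpeech": {
      "type": "PlainText",
      "text": "I can't shuffle a podcast."
    },
    "shouldEndSession": true
  }
}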


Note: If your skill uses the AudioPlayer directives, you cannot extend the above built-in intents with your own sample utterances.

Implementing Audio Directives

The new AudioPlayer interface provides directives to control the audio stream. You must send the Play directive from one of your intents to start audio playback. Here is a list of the directives:

  • AudioPlayer.Play: Sends Alexa a command to stream the audio file identified by the specified audioItem.
  • AudioPlayer.Stop: Stops any currently playing audio stream.
  • AudioPlayer.ClearQueue: Clears the queue of all audio streams.


As part of your skill’s responses to the Alexa service, you can now include audio directives. When including a directive in your response, set the type property to the directive you want to send, and include the directive in the directives array of your response. Here is an example of using directives in an Alexa response:

{
  "version": "1.0",
  "sessionAttributes": {},
  "response": {
    "outputSpeech": {},
    "card": {},
    "reprompt": {},
    "directives": [
      {
        "type": "AudioPlayer.Play",
        "playBehavior": "string",
        "audioItem": {
          "stream": {
            "token": "string",
            "url": "string",
            "offsetInMilliseconds": 0
          }
        }
      }
    ],
    "shouldEndSession": true
  }
}
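
The other two directives are simpler. A response that stops playback only needs the directive type, as in this sketch:

{
  "version": "1.0",
  "response": {
    "directives": [
      { "type": "AudioPlayer.Stop" }
    ],
    "shouldEndSession": true
  }
}

AudioPlayer.ClearQueue additionally takes a clearBehavior property (CLEAR_ENQUEUED to clear only the queued streams, or CLEAR_ALL to also stop the current stream):

{
  "type": "AudioPlayer.ClearQueue",
  "clearBehavior": "CLEAR_ENQUEUED"
}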

When responding to a LaunchRequest or IntentRequest, your response can include both AudioPlayer directives and standard response properties such as outputSpeech, card, and reprompt. For example, if you provide outputSpeech in the same response as a Play directive, Alexa speaks the provided text before beginning to stream the audio.

Using the Play Directive

The AudioPlayer.Play directive sends Alexa a command to stream the audio file identified by the specified audioItem. Use the playBehavior parameter (REPLACE_ALL, ENQUEUE, or REPLACE_ENQUEUED) to determine whether the stream begins playing immediately or is added to the queue.

When sending a Play directive, you normally set the shouldEndSession flag in the response object to true to end the session. If you set this flag to false, Alexa sends the stream to the device for playback, then immediately pauses the stream to listen for the user’s response. Here is an example of a full response object returned for a LaunchRequest or IntentRequest:

{
  "version": "1.0",
  "sessionAttributes": {},
  "response": {
    "outputSpeech": {
      "type": "PlainText",
      "text": "Playing the requested song."
    },
    "card": {
      "type": "Simple",
      "title": "Play Audio",
      "content": "Playing the requested song."
    },
    "reprompt": {
      "outputSpeech": {
        "type": "PlainText",
        "text": null
      }
    },
    "directives": [
      {
        "type": "AudioPlayer.Play",
        "playBehavior": "ENQUEUE",
        "audioItem": {
          "stream": {
            "token": "this-is-the-audio-token",
            "url": "https://my-audio-hosting-site.com/audio/sample-song.mp3",
            "offsetInMilliseconds": 0
          }
        }
      }
    ],
    "shouldEndSession": true
  }
}

Notice the response includes a simple home card and PlainText outputSpeech. In this example, Alexa would say “Playing the requested song,” display the home card, and then begin playing the audio from the provided audioItem url.

Handling AudioPlayer Requests

The AudioPlayer Interface sends requests to notify your skill about changes to the audio’s playback state. You can handle these requests to understand what is happening with the audio stream and you can optionally choose to respond.

  • AudioPlayer.PlaybackStarted: Sent when Alexa begins playing the audio stream previously sent in a Play directive. This lets your skill verify that playback began successfully.
  • AudioPlayer.PlaybackFinished: Sent when the stream that Alexa is playing comes to an end on its own.
  • AudioPlayer.PlaybackStopped: Sent when Alexa stops playing an audio stream in response to a voice request or an AudioPlayer directive.
  • AudioPlayer.PlaybackNearlyFinished: Sent when the currently playing stream is nearly complete and the device is ready to receive a new stream.
  • AudioPlayer.PlaybackFailed: Sent when Alexa encounters an error when attempting to play a stream.

Note: Since these requests are not sent in the context of a skill session, they do not include the session object. You can use the context object to get details such as the applicationId and userId if you need them. Here is an example of an AudioPlayer request sent when audio playback has started:

{
  "version": "string",
  "context": {
    "System": {
      "application": {},
      "user": {},
      "device": {}
    }
  },
  "request": {
    "type": "AudioPlayer.PlaybackStarted",
    "requestId": "string",
    "timestamp": "string",
    "token": "string",
    "offsetInMilliseconds": 0,
    "locale": "string"
  }
}

Your skill is not required to respond to AudioPlayer requests, but if it does, be aware that it can only respond with the AudioPlayer directives mentioned earlier (Play, Stop, and ClearQueue). The response cannot include any of the standard properties such as outputSpeech, card, or reprompt.
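
For example, a common pattern (sketched below with placeholder token and URL values) is to respond to AudioPlayer.PlaybackNearlyFinished with another Play directive that enqueues the next stream. With the ENQUEUE play behavior, the stream object also carries an expectedPreviousToken identifying the track it should follow:

{
  "version": "1.0",
  "response": {
    "directives": [
      {
        "type": "AudioPlayer.Play",
        "playBehavior": "ENQUEUE",
        "audioItem": {
          "stream": {
            "token": "episode-43",
            "expectedPreviousToken": "episode-42",
            "url": "https://my-audio-hosting-site.com/audio/episode-43.mp3",
            "offsetInMilliseconds": 0
          }
        }
      }
    ]
  }
}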

For the full request format, see Request Format in the JSON Interface Reference for Custom Skills.

Handling PlaybackController Requests

Voice requests such as “Alexa, next song” are sent to your skill as built-in intents (such as AMAZON.NextIntent) via a normal IntentRequest. The PlaybackController interface, however, provides requests to notify your skill when the user interacts with player controls, such as the buttons on a device or remote control.

Just like AudioPlayer requests, your skill can respond to these requests with AudioPlayer directives to start and stop playback. Note that some hardware devices do not support sending PlaybackController requests in response to button presses.

PlaybackController sends the following requests to notify your skill about playback control events:
 

  • PlaybackController.NextCommandIssued: Sent when the user uses a “next” button with the intent to skip to the next audio item.
  • PlaybackController.PauseCommandIssued: Sent when the user uses a “pause” button with the intent to stop playback.
  • PlaybackController.PlayCommandIssued: Sent when the user uses a “play” or “resume” button with the intent to start or resume playback.
  • PlaybackController.PreviousCommandIssued: Sent when the user uses a “previous” button with the intent to go back to the previous audio item.

Just like AudioPlayer requests, these requests do not include the session object, so you will need to use the context object to get details such as the applicationId and userId. Here is an example of a PlaybackController request sent when the “next” button is pressed:

{
  "version": "string",
  "context": {
    "System": {
      "application": {},
      "user": {},
      "device": {}
    },
    "AudioPlayer": {
      "token": "string",
      "offsetInMilliseconds": 0,
      "playerActivity": "string"
    }
  },
  "request": {
    "type": "PlaybackController.NextCommandIssued",
    "requestId": "string",
    "timestamp": "string",
    "locale": "string"
  }
}

Just like AudioPlayer requests, you can only respond to a PlaybackController request with AudioPlayer directives.
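
For instance, here is a sketch of a response to PlaybackController.NextCommandIssued that replaces the current stream with the next item (the token and URL values are placeholders your skill would look up):

{
  "version": "1.0",
  "response": {
    "directives": [
      {
        "type": "AudioPlayer.Play",
        "playBehavior": "REPLACE_ALL",
        "audioItem": {
          "stream": {
            "token": "episode-44",
            "url": "https://my-audio-hosting-site.com/audio/episode-44.mp3",
            "offsetInMilliseconds": 0
          }
        }
      }
    ]
  }
}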

For the full request format, see Request Format in the JSON Interface Reference for Custom Skills.

Using the New Built-In Intents for Audio Playback

When your skill sends a Play directive, the Alexa service sends the audio stream to the device for playback. Once the session ends normally (for instance, if your response included the shouldEndSession flag set to true), Alexa remembers that your skill started the playback until the user does one of the following:
 

  • Invokes audio playback with a different skill.
  • Invokes another service that streams audio, such as the built-in music service or the flash briefing.
  • Reboots the device.
     

During this time, users can invoke the following built-in playback control intents without using your skill’s invocation name:

  • AMAZON.CancelIntent
  • AMAZON.LoopOffIntent
  • AMAZON.LoopOnIntent
  • AMAZON.NextIntent
  • AMAZON.PauseIntent
  • AMAZON.PreviousIntent
  • AMAZON.RepeatIntent
  • AMAZON.ResumeIntent
  • AMAZON.ShuffleOffIntent
  • AMAZON.ShuffleOnIntent
  • AMAZON.StartOverIntent
     

Let’s take a look at an example of this for an imaginary custom skill called “My Podcast Player.” This example skill defines an intent PlayLatestEpisode mapped to a sample utterance “play the latest episode.” Here is how the flow of audio playback would work:

User: Alexa, ask My Podcast Player to play the latest episode. 

  • Alexa opens a new skill session and sends the My Podcast Player skill the PlayLatestEpisode intent via a normal IntentRequest.
  • My Podcast Player skill sends a Play directive. The skill session closes and audio begins playing. 
     

User: Alexa, next. (Note that no invocation name is used.)

  • Alexa opens a new skill session and sends the My Podcast Player skill AMAZON.NextIntent.
  • My Podcast Player skill takes appropriate action for ‘next’ and closes the skill session. 
     

User: Alexa, pause. 

  • Alexa opens a new skill session and sends the skill AMAZON.PauseIntent.
  • My Podcast Player skill sends a Stop directive and closes the skill session. The audio is stopped.

Although at this point the audio is not playing and there is no current session, the Alexa service is still tracking My Podcast Player as the skill that most recently streamed audio. Assuming the device remains on and the user does not use any other audio streaming skills or services, the following could take place at any time later:

User: Alexa, resume. 

  • Alexa opens a new skill session and sends My Podcast Player the AMAZON.ResumeIntent.
  • My Podcast Player takes appropriate action to determine the previously playing track and sends a new Play directive to restart playback.

Keep in mind this only applies to the built-in intents. The intents you define (such as the example PlayLatestEpisode intent) must be invoked using a normal invocation phrase.
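
To make the last step concrete, here is a sketch of the response a resume handler might return for AMAZON.ResumeIntent, restarting the previously playing stream at the offset the skill saved earlier (the token, URL, and offset values are placeholders):

{
  "version": "1.0",
  "sessionAttributes": {},
  "response": {
    "outputSpeech": {
      "type": "PlainText",
      "text": "Resuming the latest episode."
    },
    "directives": [
      {
        "type": "AudioPlayer.Play",
        "playBehavior": "REPLACE_ALL",
        "audioItem": {
          "stream": {
            "token": "episode-42",
            "url": "https://my-audio-hosting-site.com/audio/episode-42.mp3",
            "offsetInMilliseconds": 903000
          }
        }
      }
    ],
    "shouldEndSession": true
  }
}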

For complete information on the AudioPlayer interface, check out the Audio Interface Reference, as well as the PlaybackController Reference, both linked below.

We’ve also released a sample Alexa Audio Player skill for Node.js that provides a working framework for developers to quickly get started in building a skill that can play audio and respond to events.

For more information about getting started with Alexa, check out the following:

Audio Interface Reference
PlaybackController Reference
Alexa Dev Chat Podcast
Alexa Training with Big Nerd Ranch
Intro to Alexa Skills On Demand
Voice Design 101 On Demand
Alexa Skills Kit (ASK)
Alexa Developer Forums

-Dave (@TheDaveDev)