Important: Anyone can build a music skill for public distribution in the United States. However, radio and podcast skills are currently in developer preview. To register your radio or podcast skill for the preview, contact your Alexa Music, Radio, or Podcast representative.
The Alexa.Media.Playback interface enables Alexa to start immediate playback of content on an Alexa device.
Understand ContentId
To build a high-quality music or podcast skill, you must understand ContentId. ContentId identifies a listening experience that a skill can return and play on a device. A ContentId can reference a track, an editorial playlist of popular songs, a custom (artist- or genre-seeded) station, an album, or a season of a program series.
ContentId must be globally unique within your skill, be long-lived, and always represent the same experience for all skill users. For example, imagine that a user says, "Alexa, play the album Rainier Fog by Alice in Chains", and the skill returns a ContentId of "123" to represent the album. This same ContentId should represent this album for all users. When another user says, "Alexa, play the album Rainier Fog by Alice in Chains", your skill should send the same ContentId of "123" in response to the GetPlayableContent request, even if the request happens one year after the original user's request.
Here is another example: Imagine that your music service has a "Top Weekly Songs" playlist, where the list of songs in the playlist changes from week to week to reflect the most popular songs on the charts. A user says, "Alexa, play the top weekly songs playlist". Your skill responds with a ContentId of "321", which represents the "Top Weekly Songs" playlist. When Alexa sends this ContentId in an Initiate request, your skill returns the first track of the playlist, for example, Shallow by Lady Gaga. One month later, the user says, "Alexa, play the top weekly songs playlist". Your skill again responds with a ContentId of "321" because this ContentId always represents the "Top Weekly Songs" playlist. However, this time when Alexa sends this ContentId in an Initiate request, your skill returns, for example, the song Better Now by Post Malone, because the playlist contents change weekly.
Important: As explained in the preceding examples, ContentId must be immutable.
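The following Python sketch illustrates the rule from the preceding examples: the ContentId stays fixed while the content it resolves to can change. It assumes a hypothetical in-memory catalog; the names PLAYLISTS, NAME_TO_CONTENT_ID, get_content_id, first_track, and refresh_playlist are illustrative and aren't part of the Alexa API.

```python
# Hypothetical in-memory catalog; a real skill would query its own backend.
# ContentId -> current contents. The ID is stable; the track list may change.
PLAYLISTS = {"321": {"name": "Top Weekly Songs", "tracks": ["Shallow"]}}
NAME_TO_CONTENT_ID = {"top weekly songs": "321"}


def get_content_id(requested_name):
    """Return the same ContentId for the same experience, for every user."""
    return NAME_TO_CONTENT_ID[requested_name.strip().lower()]


def first_track(content_id):
    """Resolve the ContentId to whatever the playlist contains right now."""
    return PLAYLISTS[content_id]["tracks"][0]


def refresh_playlist(content_id, tracks):
    """Weekly refresh: replace the tracks, never the ContentId."""
    PLAYLISTS[content_id]["tracks"] = list(tracks)
```

After a call such as refresh_playlist("321", ["Better Now"]), the same ContentId resolves to a different first track, which matches the "Top Weekly Songs" behavior described earlier.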
When a user sets a podcast alarm (for example, "Alexa, wake me up to the John Smith podcast from skill name at 8 AM"), Alexa saves the ContentId returned in the GetPlayableContent response. Each time the alarm is triggered, which might be months later for a repeating alarm, Alexa sends an Initiate request to your skill with the saved ContentId. The resulting queue of programs might be different because the program series changes daily, but the user is still listening to the John Smith podcast, so the result is correct.
When a user sets a music alarm (for example, "Alexa, wake me up to Can't Stop The Feeling by Justin Timberlake from skill name at 8 AM"), Alexa saves the ContentId returned in the GetPlayableContent response. Each time the alarm is triggered, which might be months later for a repeating alarm, Alexa sends an Initiate request to your skill with the saved ContentId, and the response should reflect the content the user requested when setting the alarm.
Similarly, when a user browses their history of music requests and selects an item to replay, Alexa calls Initiate with the saved ContentId. In the preceding example for "Top Weekly Songs," if the user sees "Top Weekly Songs" in their history and clicks to play it again, Alexa sends an Initiate request with a ContentId of "321." The resulting queue of songs might be different because the playlist changes weekly, but the user is still listening to the "Top Weekly Songs" playlist, so the result is correct.
Utterances
When you use the Alexa.Media.Playback interface, the voice interaction model is already built for you. The following example shows a customer utterance:
If you build your skill by using the ASK CLI, configure it in your skill manifest JSON.
Supporting premium audio
When the provider supports premium audio, the Initiate request contains a list of Endpoint objects. Each Endpoint corresponds to a playback device and identifies the content types that the provider can supply and that the device can play. Currently, the list contains a single Endpoint object, which represents the target device. Each Endpoint contains a list of ContentFormat objects, which hold the content type identifiers that the provider chooses from when it selects a playback stream.
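As a rough illustration of that selection step, the following Python sketch picks the provider's most preferred content type among those the target device can play. The key names "contentFormats" and "contentTypeId", the identifiers in PROVIDER_PREFERENCE, and the function name are assumptions made for this sketch; check the Endpoint and ContentFormat object references for the actual field names.

```python
# Provider's formats in order of preference (hypothetical identifiers).
PROVIDER_PREFERENCE = ["ULTRA_HD", "HD", "SD"]


def choose_content_type(endpoints, preference=PROVIDER_PREFERENCE):
    """Pick the most preferred content type that the target device can play.

    `endpoints` mirrors the list in the Initiate request. The key names
    "contentFormats" and "contentTypeId" are assumptions for this sketch.
    Returns None when nothing matches or when no endpoints are supplied.
    """
    if not endpoints:
        return None
    # The list currently holds a single Endpoint: the target device.
    formats = endpoints[0].get("contentFormats", [])
    playable = {fmt["contentTypeId"] for fmt in formats}
    for content_type in preference:
        if content_type in playable:
            return content_type
    return None
```

Returning None lets the caller fall back to its default stream when the request carries no usable premium format.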
Directives
Initiate directive
When Alexa receives a content identifier from a skill's GetPlayableContent response and is ready to start immediate playback of the content on an Alexa device, Alexa sends an Initiate request. The request includes the content identifier, and the skill responds with the stream URI for immediate playback of the content. The following table shows which content types use this directive:
| Content type | Required? |
| --- | --- |
| Music | Required |
| Radio | Required |
| Podcast | Required |
There are three primary scenarios that cause Alexa to call this directive:
The user requested music, radio, or a program series to play, so playback is initiated immediately.
A previously set music, radio, or podcast alarm is triggered. For example, the user set an alarm to play a song at 7:00 AM, so at that time Alexa makes an Initiate call to the skill.
The user selects content from a play history UI (for example, in the Alexa app or on an Alexa device with a screen) to hear the content again.
Podcast skills define two types of program series: serial and episodic (non-serial). In a serial program series, users expect your skill to play episodes in order, from oldest to latest. In an episodic program series, each episode is a stand-alone program, and users expect your skill to play episodes from latest to oldest by default. Therefore, in its response to the Initiate directive, your skill should return the first item with the oldest program for a serial program series and return the latest program for a non-serial (episodic) program.
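The ordering rule above can be sketched in Python as follows. The program shape (a dict with a "published" value that sorts chronologically, such as an ISO 8601 date string) and the function name first_program are assumptions for illustration only.

```python
def first_program(programs, is_serial):
    """Return the program that should start playback for a program series.

    Serial series start with the oldest program; episodic (non-serial)
    series start with the latest one. `programs` is a list of dicts with
    a sortable "published" value (hypothetical shape).
    """
    if not programs:
        raise ValueError("program series has no programs")
    pick = min if is_serial else max
    return pick(programs, key=lambda program: program["published"])
```

For a series with programs published on 2024-01-01 and 2024-03-01, a serial series would start with the January program and an episodic series with the March program.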
The item that's currently playing (active) on the target endpoint, if any. This property is absent when nothing is playing. Your skill should use this property to enforce concurrency limits. Specifically, it should use this property to determine whether the playback session starts on an endpoint where no stream is playing, or whether it replaces an existing stream on an endpoint.
| Field | Description | Type |
| --- | --- | --- |
| playbackModes | The playback modes requested by the user. If the user doesn't mention anything about a looped or shuffled queue, this attribute defaults to false for all supported playback modes. | Object |
| playbackModes.shuffle | True to shuffle the queue, false to play the queue in order. Note: Ignored for podcast skills. | Boolean |
| playbackModes.loop | True to start playing the queue again after it finishes, false to end. Note: Ignored for podcast skills. | Boolean |
| playbackPosition | (Podcast only) The position where playback should begin, based on the Alexa user's requirements. The only supported value is RESUME. If the user doesn't mention anything about where to start playback, this attribute is absent. | String |
| endpoints | (Premium audio only) A list of Endpoint objects containing the content type identifiers that the music provider supports that are playable on the target device. See the Endpoint object for more information. This field is present only if the provider supports premium audio. | |
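The following Python sketch shows one way a skill might read these optional fields with the defaults described above. The payload argument stands for the payload object of the Initiate request; the function name, the is_podcast_skill flag, and the shape of the returned dict are assumptions for this sketch.

```python
def read_playback_options(payload, is_podcast_skill=False):
    """Extract playback options from an Initiate payload, applying defaults.

    - playbackModes values default to False and are ignored for podcast skills.
    - playbackPosition is podcast only; RESUME is the only supported value.
    - endpoints is present only when the provider supports premium audio.
    """
    modes = payload.get("playbackModes") or {}
    honor_modes = not is_podcast_skill
    return {
        "shuffle": honor_modes and bool(modes.get("shuffle", False)),
        "loop": honor_modes and bool(modes.get("loop", False)),
        "resume": is_podcast_skill and payload.get("playbackPosition") == "RESUME",
        "endpoints": payload.get("endpoints") or [],
    }
```

Normalizing absent fields in one place keeps the rest of the handler free of repeated presence checks.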
When a user says, "Alexa, play the podcast/program <program series name>", the skill should return a valid response (containing a content reference) to the GetPlayableContent directive. Alexa then plays the content from that response. To start playback, Alexa sends an Initiate request, similar to the following example, instructing the skill to create a queue from the content reference.
The following example demonstrates the request that occurs when a user asks Alexa to resume content, for example "Alexa, resume the podcast/program <program series name>."
If you handle an Initiate directive successfully, respond with an Alexa.Response event.
In response to the first of the preceding examples, the skill creates a queue for the user based on the requested ContentId and returns the queue identifier and the first audio item to Alexa. The Initiate response should contain enough information for Alexa to know how to manage the queue, and the first track to play for the user. To get the second track to play for the user, Alexa calls GetNextItem after beginning to play the first track. Subsequent tracks are also retrieved with GetNextItem after each track begins playback.
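The queue flow described above can be sketched in Python with an in-memory store. This is a simplified illustration, not the wire format: the QUEUES store, the handler names, the tracks_for callback, and the "queueId" and "firstItem" keys of the return value are all assumptions for this sketch.

```python
import uuid

# Queue id -> ordered list of item dicts (in-memory, for illustration only).
QUEUES = {}


def handle_initiate(content_id, tracks_for):
    """Create a queue for the ContentId and return its id plus the first item.

    `tracks_for` is a hypothetical callback that resolves a ContentId to an
    ordered list of tracks in the skill's own catalog.
    """
    items = [{"id": str(uuid.uuid4()), "track": t} for t in tracks_for(content_id)]
    if not items:
        raise LookupError(f"no playable items for ContentId {content_id!r}")
    queue_id = str(uuid.uuid4())
    QUEUES[queue_id] = items
    return {"queueId": queue_id, "firstItem": items[0]}


def handle_get_next_item(queue_id, current_item_id):
    """Return the item after the current one, or None at the end of the queue."""
    items = QUEUES[queue_id]
    ids = [item["id"] for item in items]
    index = ids.index(current_item_id)  # raises ValueError for an unknown item
    return items[index + 1] if index + 1 < len(items) else None
```

A production skill would persist queues in durable storage so that GetNextItem calls can be served by any instance of the skill.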
The time it takes your skill to respond to an Initiate request directly impacts the Alexa user experience. Music skills should adhere to the following response latency limits.
| Call percentage | Latency limit (ms) |
| --- | --- |
| 50% | 100 |
| 90% | 250 |
| 99% | 400 |
Important: Longer response times might cause your skill to fail certification.
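To monitor these limits, a skill could compare measured latencies against the table, as in the following Python sketch. It uses a nearest-rank percentile, which is an assumption: the documentation doesn't state how the percentages are computed. The names LATENCY_LIMITS_MS, percentile, and latency_violations are illustrative.

```python
# Percentile -> latency limit in milliseconds, taken from the table above.
LATENCY_LIMITS_MS = {50: 100, 90: 250, 99: 400}


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of latencies."""
    ordered = sorted(samples)
    # Integer ceiling division avoids floating-point rounding in the rank.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def latency_violations(samples_ms):
    """Return {percentile: observed_ms} for every limit that is exceeded."""
    violations = {}
    for pct, limit in LATENCY_LIMITS_MS.items():
        observed = percentile(samples_ms, pct)
        if observed > limit:
            violations[pct] = observed
    return violations
```

An empty result means that the sampled Initiate responses stayed within all three limits.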
Initiate response event payload details
| Field | Description | Type | Required |
| --- | --- | --- | --- |
| playbackMethod | Information about the playback method that Alexa should use to achieve playback for the user, and the first track. | | |
The following Initiate response example demonstrates support for premium audio.
When the provider returns a response, the stream object contains a content format object with the content type ID selected by the provider for playback. Note that the selected content type ID may not correspond to what's actually played on the device. For example, if network quality is low, the device may select or fall back to a lower quality stream if one is present in the manifest that the stream URI points to.
The following example adds a background image for Alexa to display while playing music. For more information, see the background field of the BaseMetadata object.
In response to the preceding Initiate directive example, the skill creates a queue for the user based on the requested ContentId and returns the queue identifier and the first audio item to Alexa. The Initiate response should contain enough information for Alexa to know how to manage the queue, and the first program to play for the user. To get the second program to play for the user, Alexa makes an additional call after beginning to play the first program.
To respond to a resume request, your skill should use the offsetInMilliseconds field in the returned stream object to indicate where to start playback. The following example shows a response for a resume request.
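A minimal Python sketch of building that stream object follows. The id, uri, and offsetInMilliseconds keys match the stream object used in this document's sample response; the function name and the saved_position_ms argument (where the skill last recorded the user's position) are assumptions for this sketch.

```python
def build_stream(stream_id, uri, saved_position_ms=None, resume=False):
    """Build the stream object, starting at the saved position on resume.

    Playback starts at 0 unless this is a resume request and the skill has
    a saved position for the user (hypothetical bookkeeping by the skill).
    """
    offset = saved_position_ms if resume and saved_position_ms else 0
    return {"id": stream_id, "uri": uri, "offsetInMilliseconds": int(offset)}
```

For example, build_stream("STREAMID_92_14629004", "https://www.example.com/podcast.mp3", 90000, resume=True) yields a stream that starts 90 seconds into the program.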
The following is a sample response with the latest program content. Set the isLatest flag to true when a program is the latest, or most recently released, program in a program series. When the isLatest flag is true, the customer receives a prompt indicating that what they're about to hear is the latest episode. When the flag is false, or not specified in the response, the customer might hear a hint suggesting that they can ask to play the latest episode. If your skill doesn't implement the isLatest flag, customers receive incorrect prompts and playback behavior.
{
  "header": {
    "messageId": "2cae4d53-6bc1-4f8f-aa98-7dd2727ca84b",
    "namespace": "Alexa.Media.Playback",
    "name": "Initiate.Response",
    "payloadVersion": "1.0"
  },
  "payload": {
    "playbackMethod": {
      "type": "ALEXA_AUDIO_PLAYER_QUEUE",
      "id": "76f325d5-a648-4e8f-87ad-6e53cf99e4c7",
      "controls": [
        {"type": "TOGGLE", "name": "SHUFFLE", "enabled": true, "selected": false},
        {"type": "TOGGLE", "name": "LOOP", "enabled": true, "selected": false}
      ],
      "rules": {
        "feedback": {"type": "PREFERENCE", "enabled": false}
      },
      "firstItem": {
        "id": "e73befbe-8c27-4e4b-ab0c-9865ce8516f0",
        "playbackInfo": {"type": "DEFAULT"},
        "metadata": {
          "type": "PROGRAM",
          "name": {
            "speech": {"type": "PLAIN_TEXT", "text": "Example Program"},
            "display": "Example Program "
          },
          "series": {
            "speech": {"type": "PLAIN_TEXT", "text": "Example Program Series"},
            "display": "Example Program Series"
          },
          "isLatest": true,
          "art": {}
        },
        "durationInMilliseconds": 3725000,
        "controls": [
          {"type": "COMMAND", "name": "NEXT", "enabled": true},
          {"type": "COMMAND", "name": "PREVIOUS", "enabled": true},
          {"type": "COMMAND", "name": "SEEK_FORWARD", "enabled": true},
          {"type": "COMMAND", "name": "SEEK_BACKWARD", "enabled": true},
          {"type": "COMMAND", "name": "SEEK_POSITION", "enabled": true}
        ],
        "rules": {"feedbackEnabled": true},
        "stream": {
          "id": "STREAMID_92_14629004",
          "uri": "https://www.example.com/podcast.mp3",
          "offsetInMilliseconds": 0
        },
        "feedback": {"type": "PREFERENCE", "value": "POSITIVE"}
      }
    }
  }
}
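One way to compute the isLatest flag for the metadata object is sketched below in Python. It assumes that each program is a dict with an "id" and a "published" value that sorts chronologically (for example, an ISO 8601 date string); the function name and program shape are illustrative, and only the "type" and "isLatest" keys come from the sample response.

```python
def program_metadata(program, series_programs):
    """Return a partial metadata object with isLatest set for the program.

    isLatest is True only when `program` is the most recently released
    program in its series (hypothetical program shape, see lead-in).
    """
    latest = max(series_programs, key=lambda p: p["published"])
    return {"type": "PROGRAM", "isLatest": program["id"] == latest["id"]}
```

Setting the flag explicitly on every program, rather than omitting it, makes the skill's intent clear for both the latest-episode prompt and the hint.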
Initiate directive error handling
If your skill can't handle an Initiate directive successfully, it should respond with an Alexa.Media.ErrorResponse event or an Alexa.ErrorResponse event. For more information, see Alexa Music, Radio, and Podcast Skill API Error Responses.