Understand the Music, Radio, and Podcast Skill API

The Alexa Music, Radio, and Podcast Skill API is a set of interfaces for selection and control of audio content streamed through an Alexa-enabled device. When you use this API to build a skill, the voice interaction model is defined and handled for you. Alexa interprets user utterances and communicates these requests to your skill.

API capabilities

The Alexa Music, Radio, and Podcast Skill API enables you to do the following:

  • Integrate your service with Alexa so that users can play content from your music, radio, or podcast catalog on their Alexa-enabled devices.
  • Integrate your service with Alexa features like music alarms, multi-room music, and more.
  • Ingest your music, radio, or podcast catalog for voice modeling.
  • Provide metadata for audio streamed by your service.
  • Subscribe to reporting capabilities.

With the Alexa Music, Radio, and Podcast Skill API, Alexa handles the core audio and voice interaction experience for you. Your work is to onboard users and optimize your music, radio, or podcast service for Alexa.

Who can build music, radio, and podcast skills?

To offer a skill for general public use, you must submit it for certification. Additionally, your Alexa Music, Radio, or Podcast representative must invite you to participate in a developer preview of the skill. For preview announcements, check the Alexa Skills Kit (ASK) blog.

Prerequisites

Skills that use the Alexa Music, Radio, and Podcast Skill API should provide only music, radio streaming, or podcast content. To create such a skill, you need the following:

  • An Amazon developer account. Sign-up is free.
  • An Amazon Alexa-enabled device, such as Amazon Echo, registered to your Amazon developer account.
  • A streaming music, radio, or podcast service with a cloud API to control it.
  • The ability to provide your music, radio, or podcast catalog metadata to Amazon on a regular basis (for example, weekly) for voice modeling and entity resolution purposes.
  • Permission to stream the content that your skill or service makes available to users.
  • An AWS account. You host your skill code as an AWS Lambda function.
  • Knowledge of one of the programming languages supported by AWS Lambda: Node.js, Java, Python, C#, Go, Ruby, or PowerShell.
  • A basic understanding of OAuth 2.0, if your skill uses account linking.

How to create a music, radio, or podcast skill

To create a music, radio, or podcast skill, complete the steps described in Steps to Create a Music, Radio, or Podcast Skill.

How a music, radio, or podcast skill works

An Alexa music, radio, or podcast skill system consists of the following elements:

User
The person who listens to a music service, radio program, or podcast, and interacts with an Alexa-enabled device.
Music, Radio, and Podcast Skill API
A service that understands a user's voice commands and converts them into messages for a music, radio, or podcast skill.
AWS Lambda
An Amazon Web Services (AWS) compute service that hosts the skill code.
Music, Radio, or Podcast Skill
A standalone music, radio, or podcast capability that an Alexa user can discover, enable, use, or disable to enhance the Alexa experience. A skill includes both the cloud-based code and the developer console or CLI configuration.
Music, Radio, or Podcast Service Cloud
The cloud environment that manages your users and content.
Music, Radio, or Podcast Content
Audio content that's sent to Alexa for playback on an Alexa-enabled device.
Music, Radio, or Podcast Catalogs
Files that you provide to Amazon, which contain information about the music, radio, or podcast content available through your skill.

The following example workflow explains how an Alexa music skill system works:

  1. A user activates a music skill on an Alexa-enabled device, and then says, "Alexa, play Lady Gaga on <skill name>."
  2. The Alexa-enabled device hears this utterance and sends it to the Alexa service for interpretation.
  3. The Alexa service interprets the utterance as a GetPlayableContent request. It sends a JSON message to the skill to determine whether music or other audio is available to satisfy the user's utterance. The GetPlayableContent request includes the following:
    • The action ("resolve to playable content").
    • A list of resolved entities (artist, album, track, station, and so on) found in the music partner's catalog for that utterance.
    • An OAuth 2.0 token to authenticate the user (only for skills that have enabled account linking).
  4. The skill receives and parses the action request, the resolved entities, and the authentication details. It shares this information with the music service cloud.
  5. The skill communicates with the music service cloud to determine which audio satisfies the user's utterance. The cloud returns a content identifier that represents the audio. In this example, the identifier might represent a playlist of popular songs by Lady Gaga.
  6. The skill sends a GetPlayableContent response to the Music, Radio, and Podcast Skill API to indicate that the user's utterance can be satisfied. This response includes the content identifier for the audio.
  7. The Alexa service sends an Initiate request to the skill, which indicates that playback of the audio content should start. The skill returns an Initiate response that contains the first playable track.
  8. The Alexa service translates this response into speech on the user's device. For example, Alexa might say, "Playing popular songs by Lady Gaga." Alexa then queues the first track for immediate playback on the user's device.
  9. When playback of the first track is almost finished, Alexa uses a GetNextItem request to get the next track. The skill returns another track to the Alexa service for playback on the user's device. This process repeats until the skill, in response to a request for the next track, indicates that there are no more tracks to play.
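
The workflow above can be sketched as a single AWS Lambda handler that dispatches on the request name. The following Python is a minimal illustration under stated assumptions, not the API's actual message schema. The payload field names (selectionCriteria, contentId, queueId, currentIndex, firstItem), the entity and content IDs, the example.com URLs, the error payload, and the in-memory dictionaries that stand in for the music service cloud are all hypothetical. Consult the interface reference for the real request and response formats.

```python
import uuid

# Hypothetical stand-in for the music service cloud: maps a resolved artist
# entity ID to a content ID, and a content ID to an ordered list of tracks.
ARTIST_TO_CONTENT = {"artist.lady-gaga": "playlist.lady-gaga-popular"}
CONTENT_TRACKS = {
    "playlist.lady-gaga-popular": [
        "https://example.com/audio/track-1.mp3",
        "https://example.com/audio/track-2.mp3",
    ]
}


def build_response(request, payload):
    """Wrap a payload in a response whose name mirrors the request name."""
    header = request["header"]
    return {
        "header": {
            "messageId": str(uuid.uuid4()),
            "namespace": header["namespace"],
            "name": header["name"] + ".Response",
            "payloadVersion": header.get("payloadVersion", "1.0"),
        },
        "payload": payload,
    }


def get_playable_content(request):
    """Steps 3-6: map the resolved entities to a content identifier."""
    attributes = request["payload"]["selectionCriteria"]["attributes"]
    for attribute in attributes:
        content_id = ARTIST_TO_CONTENT.get(attribute.get("entityId"))
        if content_id is not None:
            return build_response(request, {"content": {"id": content_id}})
    # Placeholder error shape; the real API defines its own error response.
    return build_response(request, {"error": "CONTENT_NOT_FOUND"})


def initiate(request):
    """Step 7: start playback by returning the first playable track."""
    content_id = request["payload"]["contentId"]
    first_url = CONTENT_TRACKS[content_id][0]
    payload = {"queueId": content_id, "firstItem": {"index": 0, "url": first_url}}
    return build_response(request, payload)


def get_next_item(request):
    """Step 9: return the next track, or report that the queue is finished."""
    payload = request["payload"]
    tracks = CONTENT_TRACKS[payload["queueId"]]
    next_index = payload["currentIndex"] + 1
    if next_index >= len(tracks):
        return build_response(request, {"isQueueFinished": True})
    item = {"index": next_index, "url": tracks[next_index]}
    return build_response(request, {"isQueueFinished": False, "item": item})


HANDLERS = {
    "GetPlayableContent": get_playable_content,
    "Initiate": initiate,
    "GetNextItem": get_next_item,
}


def lambda_handler(event, context):
    """AWS Lambda entry point: route each request to its handler by name."""
    name = event["header"]["name"]
    handler = HANDLERS.get(name)
    if handler is None:
        raise ValueError(f"Unsupported request: {name}")
    return handler(event)
```

In this sketch, each response reuses the request's namespace and appends ".Response" to the request name. Keeping one small function per request type lets the in-memory dictionaries be replaced with calls to a real service cloud without changing the dispatcher.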