Understand the Music Skill API

The Alexa Music Skill API is a set of interfaces that enable selection and control of audio content streamed through an Alexa-enabled device. When you build a music skill, the voice interaction model is defined and handled for you. Alexa interprets user utterances and sends messages to your skill that communicate these requests.

Currently, anyone can build a music skill for distribution in the United States. If you have an Amazon Business Development representative, contact that person to learn about distribution in other locales.


The Alexa Music Skill API enables you to:

  • Integrate your music service with Alexa so that users can play music from your catalog on Alexa-enabled devices.
  • Integrate your music service with Alexa features like setting music alarms, multi-room music, and more.
  • Ingest your music catalog for voice modeling purposes.
  • Provide metadata for audio playing from your service.
  • Subscribe to reporting capabilities.

With the Alexa Music Skill API, you can rely on Alexa to innovate on the core audio and voice interaction experience while you focus on onboarding and optimizing your music service for Alexa.

Who can build music skills?

Anyone can build a music skill. Generally, music skill builders are developers who want to integrate their music service with Alexa, either for private use or for public distribution. Public distribution requires certification.

To create a music skill, you need the following:

  • An Amazon developer account. Sign up is free.
  • An Amazon Alexa-enabled device, such as Amazon Echo, registered to your Amazon developer account.
  • A streaming music service with a cloud API to control it.
  • The ability to provide your music catalog metadata to Amazon on a regular basis (for example, weekly) for voice modeling and entity resolution purposes.
  • Permission to stream the content that your music skill or music service makes available to users.
  • An AWS account. You host your skill code as an AWS Lambda function.
  • Knowledge of one of the programming languages supported by AWS Lambda: Node.js, Java, Python, C#, or Go.
  • A basic understanding of OAuth 2.0 if your skill uses account linking.

Steps to create a music skill (overview)

To create a music skill, complete the following steps. For more information about each step, see Steps to Create a Music Skill.

  1. Create the music skill.
  2. Create an AWS Lambda function for your skill code.
  3. Configure account linking (optional).
  4. Upload catalogs.
  5. Enable the skill.
  6. Test the skill.
  7. Submit the skill for certification.

How a music skill works

An Alexa music skill system consists of the following:

  • User: The person who subscribes to a music service and interacts with an Alexa-enabled device.
  • The Music Skill API: A service that understands a user's voice commands and converts them to messages that are sent to a music skill.
  • AWS Lambda: A compute service offered by Amazon Web Services (AWS) that hosts the music skill code.
  • Music Skill: Code and configuration that interprets messages received from Alexa and communicates with a music service cloud.
  • Music Service Cloud: Your cloud environment that manages your users and content.
  • Music Content: Audio content that is sent to Alexa for playback on an Alexa-enabled device.
  • Music Catalogs: Files that you provide to Alexa that contain information about all of the music content available through your music skill.

The following example scenario explains how an Alexa music skill system works:

  1. A user enables a music skill and then says, "Alexa, play Lady Gaga on <skill name>" to their Alexa-enabled device.
  2. The Alexa-enabled device hears this utterance and sends it to the Alexa service for interpretation.
  3. The Alexa service interprets the action as "play". It composes a JSON message (a GetPlayableContent API request) and sends it to the skill to determine if there is music or audio available to satisfy the user's utterance. The GetPlayableContent request includes:
    • The action ("resolve to playable content").
    • A list of resolved entities (such as artist, album, or track) that were found in the music partner's catalog for that utterance.
    • An OAuth 2.0 token authenticating the user (only for skills that have enabled account linking).
  4. The skill receives and parses the request for the action, the resolved entities, and authentication details. It uses this information to communicate with the music service cloud.
  5. The skill communicates with the music service cloud to determine what audio to return to satisfy the user's utterance. The music service cloud returns a content identifier representing the audio. In this example, the identifier might represent a playlist of popular songs by Lady Gaga.
  6. The skill sends a GetPlayableContent response back to the Music Skill API indicating that the user's utterance can be satisfied, and includes the identifier for the audio.
  7. The Alexa service sends an Initiate API request to the skill, indicating that playback of the audio content should start. The skill returns an Initiate response containing the first playable track to the Alexa service.
  8. The Alexa service translates this into a response on the user's device. For example, Alexa might say, "Playing popular songs by Lady Gaga". Alexa then queues the first track on the device's media player software for immediate playback.
  9. When the first track is almost done playing on the device, the Alexa service requests the next track from the skill using a GetNextItem API request. The skill returns another playable track to the Alexa service, which is sent to the user's device for playback. This process repeats until the skill, in response to a request for the next track, indicates there are no more tracks to play.
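
The request flow in the preceding scenario can be sketched as a single AWS Lambda handler in Python. This is a minimal illustration, not the actual message schema: the envelope with header.name and payload, the payload fields (entities, contentId, currentIndex, streamUrl, isQueueFinished), the ".Response" naming, the in-memory CATALOG, and the resolve_content helper are all simplified assumptions made for this example. Consult the API reference documentation for the real request and response formats.

```python
# Hypothetical stand-in for the music service cloud:
# content identifier -> ordered list of playable track URLs.
CATALOG = {
    "playlist.lady-gaga.popular": [
        "https://example.com/audio/track-1.mp3",
        "https://example.com/audio/track-2.mp3",
    ],
}


def resolve_content(entities):
    """Map resolved entities to a content identifier (hard-coded for illustration)."""
    for entity in entities:
        if entity.get("type") == "ARTIST":
            return "playlist.lady-gaga.popular"
    return None


def _response(request_name, payload):
    """Wrap a payload in a simplified response envelope (assumed shape)."""
    return {"header": {"name": request_name + ".Response"}, "payload": payload}


def lambda_handler(event, context):
    """Dispatch on the request name, following steps 3 to 9 of the scenario."""
    name = event["header"]["name"]
    payload = event.get("payload", {})

    if name == "GetPlayableContent":
        # Steps 4 to 6: decide whether any audio can satisfy the utterance.
        content_id = resolve_content(payload.get("entities", []))
        if content_id is None:
            return _response(name, {"error": "CONTENT_NOT_FOUND"})
        return _response(name, {"contentId": content_id})

    if name in ("Initiate", "GetNextItem"):
        tracks = CATALOG.get(payload.get("contentId"))
        if tracks is None:
            return _response(name, {"error": "CONTENT_NOT_FOUND"})

        if name == "Initiate":
            # Step 7: return the first playable track.
            return _response(name, {"index": 0, "streamUrl": tracks[0]})

        # Step 9: return the track after the current one, or signal the end.
        next_index = payload["currentIndex"] + 1
        if next_index >= len(tracks):
            return _response(name, {"isQueueFinished": True})
        return _response(
            name,
            {
                "isQueueFinished": False,
                "index": next_index,
                "streamUrl": tracks[next_index],
            },
        )

    return _response(name, {"error": "UNSUPPORTED_REQUEST"})
```

To try the sketch locally, call lambda_handler with a GetPlayableContent event that contains an ARTIST entity, pass the returned contentId to an Initiate event, and then send GetNextItem events with an increasing currentIndex until the response reports isQueueFinished as true.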

The preceding scenario illustrates the interfaces that you need to build a proof-of-concept music skill:

  • GetPlayableContent
  • Initiate
  • GetNextItem

For the best user experience, your skill should support other interfaces in addition to those in the preceding list. For example, GetPreviousItem allows users to replay the previous track. The API reference documentation contains additional details about the interfaces and the functionality they provide.
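
As a rough sketch of how a skill might back both next-track and previous-track requests, the following Python class looks up neighbors of the current track in an ordered queue. The PlayQueue class, its method names, and the idea that a request identifies the current track by index are assumptions made for illustration and are not part of the Music Skill API.

```python
class PlayQueue:
    """An ordered list of tracks that can be walked in either direction."""

    def __init__(self, tracks):
        self.tracks = list(tracks)

    def next_item(self, current_index):
        """Return (index, track) for the track after current_index, or None at the end."""
        index = current_index + 1
        if 0 <= index < len(self.tracks):
            return index, self.tracks[index]
        return None

    def previous_item(self, current_index):
        """Return (index, track) for the track before current_index, or None at the start."""
        index = current_index - 1
        if 0 <= index < len(self.tracks):
            return index, self.tracks[index]
        return None
```

Because the lookups do not store a cursor, each request can be answered from the current position it carries, which suits a Lambda function that keeps no state between invocations.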

Next step: Create a music skill

For the steps to create a music skill, see Steps to Create a Music Skill.