Add Visuals and Audio to Your Skill

Create a visual experience for your skill with graphics, images, slideshows, video, and animations using Alexa Presentation Language (APL). APL is a responsive layout language that lets you build visuals to render on Alexa-enabled multimodal devices. You can also build audio responses that mix and layer multiple Alexa voices, sound effects, and background music with APL for audio. You can combine audio and visual responses.

Visual and audio responses work on devices with screens, such as the Echo Show, TVs, and Alexa-enabled tablets. Audio responses also work on speaker devices such as the Amazon Echo and Echo Dot.
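Because visual content renders only on devices with screens, a skill typically checks whether the device supports APL before it adds a visual directive. The following Python sketch assumes the raw request JSON arrives as a dict and reads `context.System.device.supportedInterfaces`. The sample events are trimmed, hypothetical payloads, not complete requests.

```python
from typing import Any, Dict


def supports_apl(event: Dict[str, Any]) -> bool:
    """Return True if the requesting device reports the Alexa.Presentation.APL interface."""
    interfaces = (
        event.get("context", {})
        .get("System", {})
        .get("device", {})
        .get("supportedInterfaces", {})
    )
    return "Alexa.Presentation.APL" in interfaces


# Hypothetical, trimmed request envelopes for illustration only.
echo_show_event = {
    "context": {
        "System": {
            "device": {
                "supportedInterfaces": {
                    "Alexa.Presentation.APL": {"runtime": {"maxVersion": "2023.2"}}
                }
            }
        }
    }
}
speaker_event = {"context": {"System": {"device": {"supportedInterfaces": {}}}}}

print(supports_apl(echo_show_event))  # True
print(supports_apl(speaker_event))    # False
```

A skill can use a check like this to return speech alone on speaker devices and speech plus visuals on devices with screens.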

The APL content is part of the skill response

Custom voice model skills use a request and response interface. Alexa sends your AWS Lambda function or web service a request, such as a LaunchRequest or IntentRequest. Your skill handles this request and returns a response.
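Most skills use an Alexa Skills Kit SDK for request routing. The sketch below uses plain dicts instead so that it runs with no dependencies. It assumes the raw request JSON arrives as `event`, and the intent name `HelloWorldIntent` is hypothetical.

```python
from typing import Any, Dict


def build_speech_response(text: str, end_session: bool = True) -> Dict[str, Any]:
    """Wrap plain-text speech in the response envelope a custom skill returns."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Route the incoming request by its type, as a Lambda entry point would."""
    request = event["request"]
    if request["type"] == "LaunchRequest":
        # Keep the session open so the user can say something next.
        return build_speech_response("Welcome to Hello World.", end_session=False)
    if request["type"] == "IntentRequest":
        # Hypothetical intent name from the skill's interaction model.
        if request["intent"]["name"] == "HelloWorldIntent":
            return build_speech_response("Hello World!")
    return build_speech_response("Sorry, I didn't understand that.")


reply = lambda_handler(
    {"request": {"type": "IntentRequest", "intent": {"name": "HelloWorldIntent"}}}, None
)
print(reply["response"]["outputSpeech"]["text"])  # Hello World!
```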

APL works within this framework. When your skill returns a response, you include a directive to display a visual response or play an audio response. You pass the directive two items:

  • An APL document, which is a JSON object that defines either a visual or audio template. The document provides the structure and layout for the response. Conditional logic in the document lets the template adapt to different devices and situations.
  • An APL data source, which is a JSON object you define. The data source provides the content to populate the template. You use a data source for the content that might change when the user invokes your skill. This approach lets you separate the visual or audio presentation from the data.
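As a rough illustration of the document and data source working together, the Python sketch below builds an `Alexa.Presentation.APL.RenderDocument` directive. The document stays fixed while the data source carries the text that changes. The token, the data source name `helloData`, and the APL version string are assumptions; use the version your target devices support.

```python
import json
from typing import Any, Dict

# The document defines layout only. "payload" binds to the whole datasources object,
# so ${payload.helloData.message} reads the "message" field of the "helloData" data source.
HELLO_DOCUMENT: Dict[str, Any] = {
    "type": "APL",
    "version": "2023.2",  # assumed version string
    "mainTemplate": {
        "parameters": ["payload"],
        "items": [
            {
                "type": "Text",
                "text": "${payload.helloData.message}",
                "textAlign": "center",
                "textAlignVertical": "center",
            }
        ],
    },
}


def render_document_directive(message: str) -> Dict[str, Any]:
    """Pair the fixed document with a data source that carries the changing content."""
    return {
        "type": "Alexa.Presentation.APL.RenderDocument",
        "token": "helloWorldToken",  # hypothetical token that identifies this document
        "document": HELLO_DOCUMENT,
        "datasources": {"helloData": {"message": message}},
    }


print(json.dumps(render_document_directive("Hello World!"), indent=2))
```

To show different content, the skill passes a different message; the document itself does not change.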

User: Alexa, open Hello World and say hello.
Alexa sends your skill an IntentRequest. Your skill returns a response with speech and visual content.
Alexa: Hello World! (Alexa speaks this response and displays the visual content on the screen at the same time.)

A visual displayed as part of the skill response

Users interact with the APL response in different ways

Users can interact with Alexa-enabled devices with screens. For example, users can tap buttons on Echo Show devices, or use a remote to navigate the screen and select items on Fire TV devices. Users can also speak their requests to the skill, as they would with any Alexa device.

A visual response you build with APL can take advantage of these input modes. You define buttons and other touchable items in your APL document. These items run commands. A command can change the presentation on the screen, such as by changing the text the user sees. A command can also send a message to your skill in a request. You write handlers for these requests, similar to the intent handlers you write for voice requests like IntentRequest.
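The Python sketch below shows both kinds of command on one touchable item. When the user taps the `TouchWrapper`, `SetValue` changes the on-screen text and `SendEvent` sends the skill an `Alexa.Presentation.APL.UserEvent` request. The component ID `statusText`, the argument `helloButtonPressed`, and the version string are hypothetical choices for illustration.

```python
from typing import Any, Dict, Optional

BUTTON_DOCUMENT: Dict[str, Any] = {
    "type": "APL",
    "version": "2023.2",  # assumed version string
    "mainTemplate": {
        "items": [
            {
                "type": "Container",
                "items": [
                    # Text that the SetValue command below updates on screen.
                    {"type": "Text", "id": "statusText", "text": "Tap the button."},
                    {
                        "type": "TouchWrapper",
                        "item": {"type": "Text", "text": "Say hello"},
                        # An array of commands runs in order when the user selects the item.
                        "onPress": [
                            {
                                "type": "SetValue",
                                "componentId": "statusText",
                                "property": "text",
                                "value": "Saying hello...",
                            },
                            # SendEvent sends a UserEvent request to the skill.
                            {"type": "SendEvent", "arguments": ["helloButtonPressed"]},
                        ],
                    },
                ],
            }
        ]
    },
}


def handle_user_event(event: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Handle the UserEvent request that the SendEvent command produces."""
    request = event["request"]
    if request["type"] != "Alexa.Presentation.APL.UserEvent":
        return None  # Not a UserEvent; another handler should take it.
    arguments = request.get("arguments", [])
    if arguments and arguments[0] == "helloButtonPressed":
        return {
            "version": "1.0",
            "response": {
                "outputSpeech": {"type": "PlainText", "text": "Hello from the button!"}
            },
        }
    return None
```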

For speech interactions, you define intents to capture spoken requests and intent handlers to handle those requests in your code. When Alexa sends your skill an IntentRequest, the request includes information about the APL content displayed on the screen. Your handler can use this information to provide a relevant response.
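One way to use that information is to read the token of the document on screen from the request's `context` and tailor the spoken reply. This Python sketch assumes the skill set the hypothetical token `helloWorldToken` when it rendered the document.

```python
from typing import Any, Dict, Optional


def visible_document_token(event: Dict[str, Any]) -> Optional[str]:
    """Return the token of the APL document on screen, or None if the request has no APL context."""
    apl_context = event.get("context", {}).get("Alexa.Presentation.APL")
    if not apl_context:
        return None
    return apl_context.get("token")


def help_speech(event: Dict[str, Any]) -> str:
    """Tailor a help prompt to what the user is currently looking at."""
    if visible_document_token(event) == "helloWorldToken":  # hypothetical token
        return "You're on the Hello World screen. Tap the button or say hello."
    return "Say hello to get started."
```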

APL works on different types of devices

You can use APL to present both audio and visual content:

  • Play rich audio content on all Alexa devices with APL for audio.
  • Display content on devices with screens, such as the Echo Show, Fire Tablet, and Fire TV. APL provides full support for user interaction and rich content, such as images, video, and animation.

    Devices with screens come in different shapes and sizes. You can use conditional logic to adapt your design to the device. For example, you might display a horizontal list on a landscape device, but a vertical list on a portrait device.

  • Display content on devices with alphanumeric clock displays, such as the Echo Dot with clock. You can use a smaller set of APL features to display content on these devices.

    APL supports showing alphanumeric data on the display. These devices also support unique features like the ability to marquee text and show timers and countdowns. For details, see Understand Alexa Presentation Language and Character Displays.
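For the landscape and portrait example above, a minimal sketch of the conditional logic, written as a Python dict, might look like the following. It relies on APL inflating the first `mainTemplate` item whose `when` condition is true, so the entry without `when` acts as the fallback. The data source name `listData`, its items, and the version string are hypothetical.

```python
from typing import Any, Dict

# Each list item is a Text bound to the current array element, ${data}.
_ITEM = {"type": "Text", "text": "${data}"}

LIST_DOCUMENT: Dict[str, Any] = {
    "type": "APL",
    "version": "2023.2",  # assumed version string
    "mainTemplate": {
        "parameters": ["payload"],
        "items": [
            {
                # Chosen on landscape viewports.
                "when": "${viewport.width > viewport.height}",
                "type": "Sequence",
                "scrollDirection": "horizontal",
                "data": "${payload.listData.items}",
                "items": [_ITEM],
            },
            {
                # No "when": fallback for portrait or square viewports.
                "type": "Sequence",
                "scrollDirection": "vertical",
                "data": "${payload.listData.items}",
                "items": [_ITEM],
            },
        ],
    },
}

# Hypothetical data source for the list.
LIST_DATASOURCES = {"listData": {"items": ["Apples", "Bananas", "Cherries"]}}
```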

The APL concepts are the same regardless of the device you target.
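For example, an APL for audio response uses the same document, data source, and directive pattern as a visual one. In this Python sketch, a `Mixer` plays speech and background music at the same time. The `Alexa.Presentation.APLA.RenderDocument` directive type is real, while the version string, token, data source name, and music URL are assumptions.

```python
from typing import Any, Dict

# A Mixer plays its child components at the same time, layering speech over music.
APLA_DOCUMENT: Dict[str, Any] = {
    "type": "APLA",
    "version": "0.91",  # assumed version string; check the APL for audio reference
    "mainTemplate": {
        "parameters": ["payload"],
        "item": {
            "type": "Mixer",
            "items": [
                {
                    "type": "Speech",
                    "contentType": "PlainText",
                    "content": "${payload.greeting.text}",
                },
                {"type": "Audio", "source": "${payload.greeting.musicUrl}"},
            ],
        },
    },
}


def render_audio_directive(text: str, music_url: str) -> Dict[str, Any]:
    """Build an APL for audio directive; the document stays fixed while the data changes."""
    return {
        "type": "Alexa.Presentation.APLA.RenderDocument",
        "token": "helloAudioToken",  # hypothetical token
        "document": APLA_DOCUMENT,
        "datasources": {"greeting": {"text": text, "musicUrl": music_url}},
    }


directive = render_audio_directive(
    "Hello World!", "https://example.com/background-music.mp3"  # hypothetical URL
)
```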

Learn more about the parts of APL

For more details about the parts of APL you use to build an audio or visual response, see What Makes Up an APL Visual Response?.

Get started with a tutorial

To get started with a short tutorial that introduces APL, see Tutorial: Add Your First Visual Response to a Custom Skill.