Key takeaways
Alexa has a visual design framework called Alexa Presentation Language (APL), which allows you to build interactive voice and visual experiences across the device landscape. This multimodal experience can make skills more delightful and engaging to the customer. You can design custom visual elements for standard Alexa-enabled devices such as the Echo Show, Fire TV, and select Fire Tablet devices. Learn more about APL in the Alexa Learning Lab.
Need quick advice?
View the Design Checklist for Alexa Presentation Language (APL) for tips on how to create a great skill experience with APL.
Amazon created APL so you can design custom experiences that combine voice, audio, and visual elements in a single customer interface. The framework is adaptable, so one design can scale to multiple device types while keeping the visual and voice elements synchronized. There are many ways to use APL to enrich the customer experience. For example, you can give customers complementary information they can read at a glance from across the room, or offer visual cues, such as lists or search items. APL also supports voice commands, so customers can ask for an item on screen instead of relying on touch alone. This gives your skill fluidity between interaction types, making customer interactions seamless and intuitive.
For more information about Alexa Presentation Language, see Add Visuals and Audio to Your Skill.
 
You can display images on screen, with or without text, and make them respond to touch by wrapping them in a TouchWrapper. You can also apply filters to images, such as blur.
When placing components on top of images, use the overlay (scrim) to apply a colored opacity layer over your image to help with the legibility and accessibility of your content. When you want to de-emphasize an image, you can also change its opacity to create different effects. You can use images as …
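The scrim, blur, and opacity options described above can be sketched as an APL Image component. This is a minimal illustration, not a definitive template: it builds the document as a Python dict and prints the JSON. The image URL, the helper names, the headline text, and the APL version string are assumptions made for the example.

```python
import json

# Hypothetical image URL, used only for illustration.
BACKGROUND_URL = "https://example.com/images/market.jpg"


def build_background_image(source, scrim=True, blurred=False, opacity=1.0):
    """Return an APL Image component as a plain dict."""
    image = {
        "type": "Image",
        "source": source,
        "width": "100vw",
        "height": "100vh",
        "scale": "best-fill",
        "opacity": opacity,  # lower this value to de-emphasize the image
    }
    if scrim:
        # Scrim: black at roughly 60% opacity (#RRGGBBAA) drawn over the image.
        image["overlayColor"] = "#00000099"
    if blurred:
        image["filters"] = [{"type": "Blur", "radius": "8dp"}]
    return image


def build_document(image, headline):
    """Layer a headline over the image inside a minimal APL document."""
    return {
        "type": "APL",
        "version": "1.4",  # assumed version; use the one your skill targets
        "mainTemplate": {
            "items": [
                {
                    "type": "Container",
                    "items": [
                        image,
                        {
                            "type": "Text",
                            "text": headline,
                            "position": "absolute",
                            "color": "#FFFFFF",
                        },
                    ],
                }
            ]
        },
    }


document = build_document(
    build_background_image(BACKGROUND_URL, blurred=True), "Fresh this week"
)
print(json.dumps(document, indent=2))
```

White text over a dark scrim is one common pairing; the scrim color and strength should be chosen for the contrast your own imagery needs.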
 
Ensure that the touch target is tied to, and can be selected by, voice in addition to touch. If you wrap a text string in a TouchWrapper, it's best if the string represents the phrase that will trigger the intent. Because TouchWrapper components are meant to be touched, we recommend a minimum size of 48x48dp, which creates a physical touch target of about 9mm, regardless of screen size. You can use the TouchWrapper for …
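As a minimal sketch of the guidance above, the following builds a TouchWrapper whose label is the same phrase a customer would say, with the 48dp minimum applied. The event name "ItemSelected" and the helper name are hypothetical; the skill backend would need to handle the resulting user event and map it to the same logic as the voice intent.

```python
import json

# Minimum recommended touch target from the guidance above.
MIN_TOUCH_TARGET = "48dp"


def build_touch_item(phrase, event_name="ItemSelected"):
    """Wrap a spoken phrase in a TouchWrapper that reports presses to the skill.

    event_name is a hypothetical label chosen for this example.
    """
    return {
        "type": "TouchWrapper",
        "minWidth": MIN_TOUCH_TARGET,
        "minHeight": MIN_TOUCH_TARGET,
        # SendEvent forwards the arguments to the skill when the item is pressed.
        "onPress": {"type": "SendEvent", "arguments": [event_name, phrase]},
        # The visible label matches what the customer would say out loud.
        "item": {"type": "Text", "text": phrase},
    }


item = build_touch_item("Show me the recipe")
print(json.dumps(item, indent=2))
```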
 
When you show text, you can specify the text color, size, and weight for available fonts. You can use TouchWrapper and ScrollView to make your text touch-responsive and to display text that extends beyond the bounds of its container. This lets customers touch to scroll below the "fold" (the default viewable area of the screen before the customer scrolls). Note that APL does not support custom fonts.
When you want to add emphasis to text or remove it, you can change the text's color and opacity to help distinguish states, or primary and secondary content. Too much text on screen can distract from the voice experience and overwhelm the customer. Remember to …
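The text options above can be sketched as follows: a ScrollView holds a Container with a bold primary line and a lighter, partly transparent secondary line. The sizes, colors, sample strings, and helper name are illustrative assumptions rather than recommended values.

```python
import json


def build_scrolling_text(title, body):
    """Return a ScrollView with primary (title) and secondary (body) text."""
    return {
        "type": "ScrollView",
        "width": "100vw",
        "height": "100vh",
        # A ScrollView holds a single child; the Container stacks the text.
        "item": {
            "type": "Container",
            "items": [
                {
                    "type": "Text",
                    "text": title,
                    "color": "#FFFFFF",
                    "fontSize": "40dp",
                    "fontWeight": "bold",
                },
                {
                    "type": "Text",
                    "text": body,
                    "color": "#FFFFFF",
                    "fontSize": "28dp",
                    "fontWeight": "normal",
                    "opacity": 0.7,  # de-emphasize secondary content
                },
            ],
        },
    }


layout = build_scrolling_text("Today's forecast", "Sunny with light wind.")
print(json.dumps(layout, indent=2))
```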
 
You can use Pager to show a time-ordered sequence of items that typically advance automatically, such as slideshows. (On most devices, touching the screen pauses pagination.) Or you can use Sequence to show a continuous list of choices, such as local restaurants, and let customers navigate the list by voice, touch, or remote control.
Pager is best used for images or text that don't match exactly with the TTS that Alexa is reading, or for content that you don't want the customer to scroll through. For example, you can use Pager to automatically paginate through a carousel of images, or a series of cards displaying sports scores.
Critical information should be spoken. Customers may not be looking at the displayed content, or may miss items at the end of your presentation. Content presented using the Pager component works best when not combined with too many other layouts displayed on screen. Too many things happening at once can be distracting to customers.
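One possible way to get the automatic pagination described above is to pair a Pager with an AutoPage command that runs when the document mounts. The sketch below assumes that approach; the component id "scorePager", the helper name, the timing, and the sample scores are all made up for illustration.

```python
import json


def build_score_pager(scores, seconds_per_page=3.0):
    """Return an APL document whose Pager advances through score cards."""
    pager = {
        "type": "Pager",
        "id": "scorePager",  # hypothetical id, referenced by AutoPage below
        "width": "100vw",
        "height": "100vh",
        "items": [{"type": "Text", "text": score} for score in scores],
    }
    auto_page = {
        "type": "AutoPage",
        "componentId": "scorePager",
        "count": len(scores),
        "duration": int(seconds_per_page * 1000),  # milliseconds per page
    }
    return {
        "type": "APL",
        "version": "1.4",  # assumed version
        "onMount": [auto_page],
        "mainTemplate": {"items": [pager]},
    }


# Sample data invented for the example.
doc = build_score_pager(["Home 3, Away 1", "Home 0, Away 0", "Home 2, Away 2"])
print(json.dumps(doc, indent=2))
```

Because customers may miss a page, the spoken response should still carry any critical score rather than relying on the Pager alone.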
With the Sequence component, you can place a list within your skill. Sequences are best suited for presenting multiple options or results, in a predetermined order, for a customer to choose from. Use only one Sequence per screen so that the customer understands how to control it with voice commands.
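A single numbered Sequence, as described above, might look like the following sketch. Numbering the items gives customers an ordinal they can refer to by voice. The id, helper name, and restaurant names are invented for the example.

```python
import json


def build_restaurant_list(names):
    """Return one vertical, numbered Sequence bound to a list of names."""
    return {
        "type": "Sequence",
        "id": "restaurantList",  # hypothetical id
        "width": "100vw",
        "height": "100vh",
        "scrollDirection": "vertical",
        "numbered": True,  # exposes ${ordinal} to each child
        "data": names,
        # ${ordinal} and ${data} are APL data-binding expressions,
        # kept here as literal strings.
        "item": {"type": "Text", "text": "${ordinal}. ${data}"},
    }


sequence = build_restaurant_list(["Blue Door Cafe", "Harbor Grill", "Noodle Bar"])
print(json.dumps(sequence, indent=2))
```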
 
You can include video content within your APL layouts to continue your skill experience when the skill completes media playback. You can customize the video playback as well as build in playback controls like play, pause, and rewind buttons. Always include closed captioning in your videos, and remember to include a screenshot to use as a static preview.
Provide a way to pause the video content by voice and by using an on-screen button or other control. Customers should always control the video playback experience unless there is a specific reason for the experience to control it. Whenever possible, allow the customer to choose to repeat or loop a video. Finally, allow the customer to use familiar terms to control playback using voice. At a minimum, provide play, pause, and full-screen buttons.
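The on-screen controls described above can be sketched with a Video component and TouchWrapper buttons that issue ControlMedia commands. This is an illustration under assumptions: the video URL, the id "mainVideo", and the helper names are hypothetical, and only play, pause, and rewind are shown. Voice control and a full-screen toggle would need additional handling in the skill.

```python
import json

# Hypothetical video URL, used only for illustration.
VIDEO_URL = "https://example.com/media/intro.mp4"


def control_button(label, command, video_id="mainVideo"):
    """Return a touch target that sends a ControlMedia command to the video."""
    return {
        "type": "TouchWrapper",
        "minWidth": "48dp",
        "minHeight": "48dp",
        "onPress": {
            "type": "ControlMedia",
            "componentId": video_id,
            "command": command,
        },
        "item": {"type": "Text", "text": label},
    }


def build_video_layout(source):
    """Return a Container holding the video and a row of playback buttons."""
    return {
        "type": "Container",
        "items": [
            # autoplay is off so the customer decides when playback starts.
            {"type": "Video", "id": "mainVideo", "source": source, "autoplay": False},
            {
                "type": "Container",
                "direction": "row",
                "items": [
                    control_button("Play", "play"),
                    control_button("Pause", "pause"),
                    control_button("Rewind", "rewind"),
                ],
            },
        ],
    }


video_layout = build_video_layout(VIDEO_URL)
print(json.dumps(video_layout, indent=2))
```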
 
Use APL components & imagery strategically
Pair Alexa Presentation Language (APL) components with the appropriate voice interaction: for example, use lists for enhanced search and browsing via voice. Consider which APL features would best serve your customers at a given step in their experience. The images (photography, icons, and other vector graphics) your skill uses on screen should be appropriate to the customer’s context.
▢ APL components should enhance the voice experience; they shouldn’t detract from or complicate the customer experience for their own sake
▢ Visuals should be contextually relevant to the conversation; don't surface imagery that conflicts with what the skill might be telling the customer
▢ Use images that would enhance the understanding and experience of the content
▢ Avoid using generic imagery that doesn’t add value to the customer’s experience
▢ When possible, don't embed text in imagery (logos excluded)
▢ Use high-quality images that look crisp on a range of device sizes
▢ Scale images in a way that won't cause letterboxing on some devices
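As a worked example of the letterboxing item in the checklist above, the sketch below compares two uniform scaling modes: one that fits the whole image inside the viewport (which can leave empty bands) and one that fills the viewport (which can crop). It assumes these correspond to the Image component's best-fit and best-fill scale values. The pixel sizes and function names are example choices, not device specifications.

```python
def scaled_size(image_w, image_h, box_w, box_h, mode):
    """Return the (width, height) of an image after uniform scaling.

    mode "best-fit" keeps the whole image visible inside the box;
    mode "best-fill" covers the whole box and may crop the image.
    """
    fit = min(box_w / image_w, box_h / image_h)
    fill = max(box_w / image_w, box_h / image_h)
    factor = fit if mode == "best-fit" else fill
    return round(image_w * factor), round(image_h * factor)


def is_letterboxed(image_w, image_h, box_w, box_h):
    """True when fitting the whole image leaves empty space in the box."""
    width, height = scaled_size(image_w, image_h, box_w, box_h, "best-fit")
    return width < box_w or height < box_h


# Example numbers: a 16:9 image shown in a 16:10 viewport.
print(scaled_size(1920, 1080, 1280, 800, "best-fit"))
print(scaled_size(1920, 1080, 1280, 800, "best-fill"))
print(is_letterboxed(1920, 1080, 1280, 800))
```

When an image would be letterboxed on some device shapes, filling the viewport avoids the empty bands at the cost of cropping the edges, so keep important content away from the image borders.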
For more information about designing a great multimodal experience for your skill, see Multimodal design: Introduction.