Now that the Echo family includes devices with screens, such as Echo Show and Echo Spot, Alexa skill builders have to consider their graphical user interface (GUI) in addition to their voice user interface (VUI) during the voice design process. Here are some tips for designing multimodal, voice-first experiences that are engaging across all Alexa-enabled devices.
Voice needs to be the primary interaction method with Alexa, even when designing for devices with screens. Consider the display as a way to enhance your skill. Design your voice interaction first, then think about how you can enhance the conversation with visuals.
Be sure to keep your VUI consistent across all devices to avoid unnecessary development work. Your customers rely on your skill to deliver the same voice experience everywhere. The interaction model for your skill on a voice-only device should be the same as on a multimodal device. Create an experience that avoids display-centric commands like “touch the screen” or “click here.”
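As a minimal sketch of this voice-first approach, the Python below builds one spoken response for every device and attaches a visual only when the request says the device supports a display. The field names (`supportedInterfaces`, `Display.RenderTemplate`, `BodyTemplate1`) follow our reading of the Display interface JSON, so check them against the current reference. The helper names and the `welcome` token are hypothetical.

```python
from typing import Any, Dict


def supports_display(request_envelope: Dict[str, Any]) -> bool:
    """Return True when the requesting device reports the Display interface."""
    interfaces = (
        request_envelope.get("context", {})
        .get("System", {})
        .get("device", {})
        .get("supportedInterfaces", {})
    )
    return "Display" in interfaces


def build_response(
    request_envelope: Dict[str, Any], speech: str, title: str, body_text: str
) -> Dict[str, Any]:
    """Build the same spoken response for all devices; add a visual if possible."""
    response: Dict[str, Any] = {
        # The voice output never depends on whether a screen is present.
        "outputSpeech": {"type": "PlainText", "text": speech},
        "shouldEndSession": False,
    }
    if supports_display(request_envelope):
        # The screen only enhances the conversation with a text-heavy card.
        response["directives"] = [
            {
                "type": "Display.RenderTemplate",
                "template": {
                    "type": "BodyTemplate1",
                    "token": "welcome",  # hypothetical token for illustration
                    "backButton": "HIDDEN",
                    "title": title,
                    "textContent": {
                        "primaryText": {"type": "PlainText", "text": body_text}
                    },
                },
            }
        ]
    return {"version": "1.0", "response": response}
```

Because the speech is assembled before the display check, a voice-only device and a device with a screen receive identical dialogue, and the visual stays an enhancement rather than a requirement.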
It is good practice to account for what customers might say when interacting with a display. If they are looking at an Echo Spot screen, their interaction with the voice component may be different from that of a user looking away. For example, to return to a previous response in a skill, a user might say “Back” or “Up.” If so, what should the behavior be in the latter case, if any? Plan how you want the user to interact with voice in your skill, but also how they may interact with the visual components.
The templates for Echo Show are consistent with those for Echo Spot, which makes it easy to quickly design visual experiences that will work across devices.
Because the devices differ in size and shape, there are some notable differences in how you should use the templates. Still, the same fundamental principles apply to each template:
Body Template 1
Use this template to present information in long blocks of text.
Body Templates 2 and 3
Use these templates to present detailed information about a specific entity. This screen typically follows the selection of an item from a list, or appears when a user’s request yields only one item. Note: Hints can be displayed on Echo Show, but not on Echo Spot.
Body Template 6
Use this template as an introductory, title, or header screen.
Body Template 7
Use this template to display a full-width foreground image.
List Templates
Your list templates can display multiple choices or items to a user. List items should be selectable via both voice and touch.
List Template 1 should be used for lists where images are not the primary content because the content will be relatively small on Echo Spot.
List Template 2 should be used for lists where images are the primary content. Note that for Echo Spot, only one item will be visible at a time.
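To make the list guidance concrete, here is a hedged Python sketch that picks a list template based on whether images are the primary content, and resolves the chosen item whether the customer touched it or said its name. It assumes a touch arrives as a `Display.ElementSelected` request carrying the item's `token`, which is our understanding of the Display interface. The `ItemName` slot, the item titles, and both function names are hypothetical.

```python
from typing import Any, Dict, List, Optional


def choose_list_template(images_are_primary: bool) -> str:
    """Pick the list template type following the guidance above."""
    return "ListTemplate2" if images_are_primary else "ListTemplate1"


def selected_token(
    request: Dict[str, Any], items: List[Dict[str, str]]
) -> Optional[str]:
    """Return the token of the chosen list item, or None if nothing matches.

    Each entry in `items` is a dict with "token" and "title" keys.
    """
    request_type = request.get("type")

    if request_type == "Display.ElementSelected":
        # Touch: the request carries the token of the tapped list item.
        token = request.get("token")
        known_tokens = {item["token"] for item in items}
        return token if token in known_tokens else None

    if request_type == "IntentRequest":
        # Voice: match the spoken value against item titles.
        slots = request.get("intent", {}).get("slots", {})
        # "ItemName" is a hypothetical slot name used for illustration.
        spoken = (slots.get("ItemName") or {}).get("value")
        if spoken is None:
            return None
        for item in items:
            if item["title"].lower() == spoken.strip().lower():
                return item["token"]

    return None
```

Routing both input paths to the same token means the rest of the skill does not need to know whether the customer spoke or touched, which keeps list items selectable either way.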
Finally, regardless of the templates you choose to use, remember that you are building for both Echo Show and Echo Spot. You cannot design for a specific device. The templates make multimodal development easier and faster. Design once, and the content will translate appropriately for the device.
When designing for multimodal devices, it is important that your content is easy to consume. Consider brevity, arrangement, and pacing when you are writing your dialogue and designing your visuals.
There are some important technical design principles to consider with your visual components for both Echo Show and Echo Spot:
We’ve updated the Amazon Alexa Voice Design Guide with additional design practices and guidelines to help you make the most of the new capabilities of the Echo Show and Echo Spot visual templates. Visit the guide to get started.
Every month, developers can earn money for eligible skills that drive some of the highest customer engagement. Developers can increase their level of skill engagement and potentially earn more by improving their skill, building more skills, and making their skills available in the US, UK and Germany. Learn more about our rewards program and start building today.
Special thanks to Jaime Radwan for co-authoring this post!