How to Design Visual Components for Voice-First Alexa Skills

Jaime Radwan Dec 12, 2018

When we communicate with others, we use a variety of visual cues with our body language to give subtle emphasis to what we're saying. Just like we use body language to convey expression, Alexa can provide rich and engaging experiences to customers by adding visual and touch interactions to responses, in addition to the voice experience.

Imagine I were talking to you while your eyes were closed. You would still understand the message I am conveying and would be able to follow along and interact with me without any problems. But now imagine your eyes are open. You can see me as I talk to you, along with my facial expressions and my hand movements. These visuals create a richer conversational experience and help you become more engaged in what I'm communicating. That's how multimodal skills can enhance the voice experience.

With the recent release of the all-new Echo Show and the introduction of Alexa on Fire TV Cube, there are now tens of millions of Alexa-enabled devices with screens available to customers. Using the Alexa Presentation Language (APL), developers can easily add visuals to skills and build engaging multimodal voice experiences in a responsive way, while tailoring the skill to each device to enhance the customer experience.

Here are a few things to keep in mind as you start designing visual components to complement your voice-first Alexa skill.

Start with a Storyboard and Be Mindful with Your Visuals

If you've already designed and built your Alexa skill, you already have a script and voice flow. Storyboarding is a great way to quickly sketch out how to pair visuals to your text-to-speech output to enhance your experience. You can sketch, draw it on a whiteboard, or even use a graphics program. Use anything that allows you to quickly put your thoughts down on paper and visualize how what you show will pair with what is spoken.

Think about your visuals carefully. Remember the visuals, or the graphical user interface, you design for your skill should augment your voice experience, adding relevant content and context for the customer. What you display on the screen should always be in harmony with what Alexa is saying and shouldn't distract from the overall voice experience.

Remember, the visual display on Alexa-enabled devices should be used to enhance the experience, but it shouldn't be required for the customer to proceed through your skill. Customers are likely to multitask and will often alternate between looking at their device and just listening. Therefore, it is important to make sure your visual designs supplement the overall experience, rather than replace the voice interactions.
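One way to keep visuals optional can be sketched in Python against the raw request and response JSON, without any SDK: always return the speech, and attach an APL directive only when the requesting device reports APL support. The helper names `supports_apl` and `build_response` and the `storyboardScreen` token are assumptions made for illustration.

```python
from typing import Any, Dict, Optional

APL_INTERFACE = "Alexa.Presentation.APL"


def supports_apl(request_envelope: Dict[str, Any]) -> bool:
    """Return True when the requesting device lists the APL interface."""
    device = request_envelope.get("context", {}).get("System", {}).get("device", {})
    return APL_INTERFACE in device.get("supportedInterfaces", {})


def build_response(
    request_envelope: Dict[str, Any],
    speech: str,
    apl_document: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a response whose speech is complete without the screen."""
    response: Dict[str, Any] = {
        "outputSpeech": {"type": "PlainText", "text": speech},
        "shouldEndSession": False,
    }
    # The visual is an enhancement: add it only where it can be rendered.
    if apl_document is not None and supports_apl(request_envelope):
        response["directives"] = [
            {
                "type": "Alexa.Presentation.APL.RenderDocument",
                "token": "storyboardScreen",  # hypothetical token
                "document": apl_document,
            }
        ]
    return {"version": "1.0", "response": response}
```

Because the speech is built first and never depends on the directive, a customer who is only listening gets the same complete interaction as one who is looking at the screen.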

Consider Where and How a Customer Might Be Using Your Skill

Customers may engage with their device at different distances, such as casually glancing at the device from across a room (5-7 ft distance) or sitting next to the device (1-3 ft distance) to be able to interact with touch features. When designing the visual responses in your skill, determine the level of interaction required from the customer, such as touch or interactive elements. Always keep in mind that Alexa is a voice-first experience with complementary visuals that are delightful for the customer to use.

Also, due to the communal nature of Alexa devices, too much interaction could seem demanding to the customer and they may stop using the skill if they need to stay near a device all the time. On the other hand, too little interaction will not keep a customer engaged with the skill for long. It's all about finding that perfect balance for your skill's content and your customer.

Design from the Smallest Form Factor and Work Your Way Up

Devices come in all shapes and sizes, ranging from the small round Echo Spot, up to a 50-inch television with Fire TV, and everything in between. Starting with the smallest device allows you to perfect that core experience, giving the minimum visual information a customer needs to continue to move through a voice interaction. As screen size increases, you can add more contextual information (like text and additional images), but be careful to find a balance between what's needed at that moment and what is extraneous information. In other words, don't add content just to add content or fill up the screen.
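The smallest-first approach can be sketched as content tiers: a core tier that every screen receives, with extra tiers added only as the screen gets wider. This is a minimal Python sketch; the pixel breakpoints and the recipe field names are assumptions chosen for illustration, not official values, and `pixelWidth` is read from the viewport information sent with the request.

```python
from typing import Any, Dict

# Hypothetical breakpoints in pixels; tune them for the devices you target.
MEDIUM_MIN_WIDTH = 960
LARGE_MIN_WIDTH = 1280


def content_for_viewport(viewport: Dict[str, Any], recipe: Dict[str, Any]) -> Dict[str, Any]:
    """Start from the minimum content and add more only as width allows."""
    width = viewport.get("pixelWidth", 0)

    # Core tier: the least a customer needs to follow the voice interaction.
    content: Dict[str, Any] = {"title": recipe["title"]}

    # Medium tier: add context that supports what Alexa is saying.
    if width >= MEDIUM_MIN_WIDTH:
        content["image"] = recipe["image"]
        content["summary"] = recipe["summary"]

    # Large tier: add detail only when there is room for it.
    if width >= LARGE_MIN_WIDTH:
        content["ingredients"] = recipe["ingredients"]

    return content
```

Each tier is additive, so the core experience is designed once and never changes; the larger tiers can be trimmed back at any time if the extra content turns out to be filler.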

Use Text and Images in Meaningful Ways

By pairing text and image components together in APL, you can create visual layouts that add context or content that words alone could not express. Change the visual response with each interaction the customer has with your skill to acknowledge that the customer made a choice and Alexa responded. At the same time, use visual responses consistently for similar functionality or results; this adds predictability and reduces the learning curve for your customers. For example, if you have a recipe skill, the general layout for all recipes should look the same each time a customer requests one, changing only the data that is being displayed. Once you have your basic layouts and visual flows designed, adjust font sizes and weights to add visual hierarchy to your content. You can also include TouchWrappers to give customers another way to interact with richer responses.
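As a rough illustration of the recipe example, the sketch below expresses one APL document as a Python dictionary and reuses it for every recipe, swapping only the data source. The document pairs an Image with two Text components, uses font size and weight for hierarchy, and wraps the layout in a TouchWrapper. The token, the `recipeSelected` argument, and the recipe field names are hypothetical.

```python
from typing import Any, Dict

# One layout shared by every recipe; only the bound data changes.
RECIPE_DOCUMENT: Dict[str, Any] = {
    "type": "APL",
    "version": "1.0",
    "mainTemplate": {
        "parameters": ["payload"],
        "items": [
            {
                # Touch is an extra way in; voice must still work without it.
                "type": "TouchWrapper",
                "onPress": {
                    "type": "SendEvent",
                    "arguments": ["recipeSelected", "${payload.recipe.id}"],
                },
                "item": {
                    "type": "Container",
                    "items": [
                        {
                            "type": "Image",
                            "source": "${payload.recipe.imageUrl}",
                            "width": "100vw",
                            "height": "60vh",
                            "scale": "best-fill",
                        },
                        # Larger, bolder title sits above a lighter summary.
                        {
                            "type": "Text",
                            "text": "${payload.recipe.title}",
                            "fontSize": "40dp",
                            "fontWeight": "bold",
                        },
                        {
                            "type": "Text",
                            "text": "${payload.recipe.summary}",
                            "fontSize": "24dp",
                            "fontWeight": "normal",
                        },
                    ],
                },
            }
        ],
    },
}


def build_recipe_directive(recipe: Dict[str, Any]) -> Dict[str, Any]:
    """Pair the shared layout with the data for one recipe."""
    return {
        "type": "Alexa.Presentation.APL.RenderDocument",
        "token": "recipeScreen",  # hypothetical token
        "document": RECIPE_DOCUMENT,
        "datasources": {"recipe": recipe},
    }
```

Keeping the document constant and varying only `datasources` is what makes each recipe screen look familiar to the customer while still changing with every request.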

Tailor Your Visual Output for Different Device Form Factors

With APL and viewport characteristics, you can adapt your visual experience to fit each device, delivering a skill that feels tailored to that device for the customer. For example, a horizontal list of multiple items may be appropriate for an Echo Show or Fire TV. But given the small screen size of an Echo Spot, that same list may need to be reformatted to show only one item at a time. In addition to different layouts for each device, you can also use the viewport characteristics to send different resolution images to different sized devices. This helps cut down on latency for the customer, and gives developers the ability to send higher-quality images to larger devices, like a television screen.
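Both ideas above, switching layouts and sending appropriately sized images, can be sketched in Python using the viewport information that arrives with the request. The layout names, the rule that round screens get one item at a time, and the example image URLs are assumptions for illustration; `shape` and `pixelWidth` are viewport properties.

```python
from typing import Any, Dict, List, Tuple


def choose_layout(viewport: Dict[str, Any]) -> str:
    """Show one item at a time on round screens, a horizontal list elsewhere."""
    if viewport.get("shape") == "ROUND":
        return "singleItem"  # hypothetical layout name
    return "horizontalList"  # hypothetical layout name


def choose_image_url(viewport: Dict[str, Any], sources: List[Tuple[int, str]]) -> str:
    """Pick the largest image whose minimum width fits the viewport.

    `sources` holds (minimum_pixel_width, url) pairs. The smallest image
    is the fallback, which keeps the download light on small screens.
    """
    width = viewport.get("pixelWidth", 0)
    ordered = sorted(sources)  # ascending by minimum width
    chosen = ordered[0][1]
    for min_width, url in ordered:
        if width >= min_width:
            chosen = url
    return chosen


# Example image set; the URLs are placeholders.
IMAGE_SOURCES = [
    (0, "https://example.com/recipe-small.jpg"),
    (960, "https://example.com/recipe-medium.jpg"),
    (1920, "https://example.com/recipe-large.jpg"),
]
```

For a round 480-pixel viewport this selects the single-item layout and the small image, while a 1920-pixel rectangular viewport gets the horizontal list and the large image.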

Have Fun

The opportunity to pair visuals with the voice responses in your skill gives you new ways to surprise and delight your customers. Have fun with this; your customers will appreciate it. And remember, while visuals can enhance your voice experience on Alexa-enabled devices with screens, customers will still be able to enjoy a voice-only skill on their screen-based devices.

Enter the Alexa Skills Challenge: Multimodal

In addition to building a visually rich Alexa skill with APL, you can enter the Alexa Skills Challenge: Multimodal with Devpost and compete for cash prizes and Amazon devices. We invite you to participate and build voice-first multimodal experiences that customers can enjoy across tens of millions of Alexa-enabled devices with screens. Learn more, start building APL skills, and enter the challenge by January 22.
