Why build multimodal Alexa skills?

You have already seen why a skill might need the Alexa Presentation Language (APL), but you haven’t been formally introduced to it yet. This section covers the basics.

What is multimodal?

Multimodal interactions add other forms of communication, such as visual aids, to the voice experience that Alexa already provides.

While Alexa is always voice first, adding visuals as a secondary mode can greatly enrich the user experience on Alexa-enabled devices with screens. More than 100 million Alexa devices support Alexa visuals. These include Amazon devices, such as the Echo Spot, Echo Show, Fire TV, and Fire tablets, as well as devices from other manufacturers with Alexa Built-in.
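
Because visuals are a secondary mode, a skill typically checks whether the requesting device supports them before sending any. The following Python sketch shows one way to do that by inspecting the raw request JSON. It assumes the request has already been parsed into a dictionary; the function name `supports_apl` and the sample requests are hypothetical and exist only for illustration.

```python
from typing import Any, Dict

# Interface name a device reports when it can render APL visuals.
APL_INTERFACE = "Alexa.Presentation.APL"


def supports_apl(request_envelope: Dict[str, Any]) -> bool:
    """Return True if the requesting device reports APL support.

    Hypothetical helper: it reads context.System.device.supportedInterfaces
    from a request that was already parsed into a dict.
    """
    device = (
        request_envelope.get("context", {})
        .get("System", {})
        .get("device", {})
    )
    interfaces = device.get("supportedInterfaces") or {}
    return APL_INTERFACE in interfaces


# Illustrative requests: one from a device with a screen, one voice only.
screen_request = {
    "context": {
        "System": {
            "device": {"supportedInterfaces": {"Alexa.Presentation.APL": {}}}
        }
    }
}
voice_only_request = {
    "context": {"System": {"device": {"supportedInterfaces": {}}}}
}

print(supports_apl(screen_request))
print(supports_apl(voice_only_request))
```

A skill built this way always returns speech and attaches visuals only when the check succeeds, which keeps the experience voice first.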

Complementary visual aids

Multimodal devices provide an opportunity to add visual branding to your experience. You can add your own brand logo, color palette, and styling to create a unique visual experience for your users. For example, if you have a business skill, you might want to show a sales summary chart with your skill’s logo at the top and set your colors to represent metrics, such as gains and losses. High-quality visuals improve your skill's brand image.

Rich media experiences

While you always need a voice experience, some developers build skills primarily as a way to showcase visual media. Multimodal interfaces give you the ability to provide videos, images, and animations together with voice.

Alexa Presentation Language (APL)

The Alexa Presentation Language (APL) is designed to render visuals across the growing range of Alexa-enabled multimodal devices. With APL, you can add graphics, images, slideshows, video, and animations to create a coherent visual experience for your skill. In addition to screens, APL has variations that target non-screen multimodal devices, such as the Echo Dot with clock.
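
To make this more concrete, an APL document is a JSON object that a skill sends to the device inside a `RenderDocument` directive. The Python sketch below builds a minimal document with a single `Text` component and wraps it in that directive. The helper names, the token value, and the sample text are assumptions made for illustration, and a real skill would usually build a richer layout.

```python
import json
from typing import Any, Dict


def build_apl_document(message: str) -> Dict[str, Any]:
    """Build a minimal APL document that shows one line of centered text."""
    return {
        "type": "APL",
        "version": "1.4",
        "mainTemplate": {
            "parameters": ["payload"],
            "items": [
                {
                    "type": "Text",
                    "text": message,
                    "fontSize": "40dp",
                    "textAlign": "center",
                }
            ],
        },
    }


def build_render_directive(document: Dict[str, Any], token: str) -> Dict[str, Any]:
    """Wrap an APL document in a RenderDocument directive.

    The token is an arbitrary identifier chosen by the skill.
    """
    return {
        "type": "Alexa.Presentation.APL.RenderDocument",
        "token": token,
        "document": document,
    }


# Hypothetical usage: the directive goes in the "directives" list of the
# skill response, next to the spoken output.
directive = build_render_directive(
    build_apl_document("Hello from my skill"), token="welcomeToken"
)
print(json.dumps(directive, indent=2))
```

The directive travels alongside the speech output in the same response, so the visuals complement the voice experience rather than replace it.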

For more details about APL, see the following resources on the developer portal: