Multimodal Display Guide


For devices with screens supporting multimodal experiences, Alexa Voice Service (AVS) gives your customers additional ways to interact with Alexa, which for some use cases might provide a richer, more delightful interaction. There are additional considerations for developing your product for multimodal interactions. This page details design guidance and best practices for displaying Alexa multimodal responses.

Choosing an implementation

You can choose to implement support for the Alexa Presentation Language (APL) to receive UI markup and metadata, similar to HTML, for rendering Alexa visual responses.

See the Alexa Multi-Modal API Overview page for implementation details.

If you do not use APL, you must create your own Display Cards to render Alexa's visual responses. Display Card-specific design guidelines are currently available for the following device categories:

  • Tablet
  • TV
  • Low Resolution

If your device does not fall exactly into one of these categories, select the guidelines that most closely match your product's attributes. Note: This may require combining elements from different categories of guidance.

Device Attributes

Attribute               Tablet                  TV              Low Resolution
Aspect ratio            Variable                16:9            Variable
Input                   Touch                   5-way remote    None
Context of use          In home or on the go    In home         In home or on the go
Avg. viewing distance   2 ft.                   6-8 ft.         2-4 ft.
Visual complexity       High                    Medium          Low
Attention system        On screen               On screen       On device

For devices with variable resolutions, we provide layouts in percentages to adapt to your device's screen size.
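
As a rough illustration of percentage-based layout, the sketch below converts a region given as percentages of the screen into pixel coordinates. The `PercentRect` type, the `to_pixels` function, and the lower-third values are hypothetical and are not taken from the Display Card guidelines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PercentRect:
    """A region expressed as percentages (0-100) of the screen. Hypothetical type."""
    left: float
    top: float
    width: float
    height: float


def to_pixels(rect: PercentRect, screen_w: int, screen_h: int) -> tuple[int, int, int, int]:
    """Convert a percentage-based region to (x, y, width, height) in pixels."""
    x = round(screen_w * rect.left / 100)
    y = round(screen_h * rect.top / 100)
    w = round(screen_w * rect.width / 100)
    h = round(screen_h * rect.height / 100)
    return x, y, w, h


# Illustrative values only: a region covering roughly the bottom third of the screen.
lower_third = PercentRect(left=0, top=66.7, width=100, height=33.3)
print(to_pixels(lower_third, 1920, 1080))
print(to_pixels(lower_third, 1280, 720))
```

The same `PercentRect` yields a proportionally identical region on both screens, which is the point of specifying layouts in percentages.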

For technical information on how to enable Display Cards for your device, see Display Cards Overview.

Window Types

Alexa response content can appear in two types of windows: Overlay and Standard. Overlay windows occupy a specified area of the screen and are intended to share the screen with other experiences. Standard windows are used for fullscreen experiences. You should use an Overlay or Standard window appropriate to the experience you are delivering to the customer.

Your device must provide a Standard, and optionally an Overlay, window (viewport) appropriate to your device's defined screen parameters. Windows should:

  • Be consistent (for example, use either lower-third or right-hand Overlay windows, not both)
  • Properly display all provided content
  • Appear at a size and aspect ratio that allows all content to be displayed legibly

Important: Your product must not alter the response content provided, or obscure Alexa attention states for Listening and Thinking.
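
The window requirements above can be checked mechanically. The following is a minimal sketch, assuming a device describes its windows with a hypothetical `Window` record; the `kind` and `placement` labels are invented for illustration and are not AVS identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    kind: str       # "STANDARD" or "OVERLAY" (hypothetical labels)
    placement: str  # for example "fullscreen", "lower-third", "right-hand"


def validate_windows(windows: list[Window]) -> list[str]:
    """Return a list of problems found in a device's window configuration."""
    problems = []
    overlays = [w for w in windows if w.kind == "OVERLAY"]
    # A Standard window is required; an Overlay window is optional.
    if not any(w.kind == "STANDARD" for w in windows):
        problems.append("a Standard window is required")
    # Overlay windows should use one consistent placement, not several.
    if len({w.placement for w in overlays}) > 1:
        problems.append("Overlay windows should use one consistent placement")
    return problems


print(validate_windows([Window("STANDARD", "fullscreen"), Window("OVERLAY", "lower-third")]))
```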

Voice Chrome

The customer invokes Alexa with the wake word “Alexa” or a corresponding physical or GUI button. This activates the voice chrome to show Alexa is listening. Voice chrome is a visual indicator of the Alexa attention states, such as Listening, Thinking, and Speaking. Most devices should use on-screen voice chrome, although you may use on-device LEDs to display the states instead.

Details of how and when to display voice chrome can be found in the Interactions section below, and in the Interruption Scenarios section on this page. The colors and animations of voice chrome should follow the patterns specified in our Attention System documentation.

Interactions

Customer interactions with Alexa follow a series of steps, each of which has a corresponding screen state.

General interactions

Step 1: Invocation

The customer invokes Alexa with the wake word “Alexa” or a physical or GUI button. This activates the voice chrome to show Alexa is listening.

Step 2: Listening and thinking

Once Alexa is invoked and voice chrome is displayed, the customer can control Alexa with their voice. If Alexa recognizes an utterance, it processes the intent and returns a response. If Alexa doesn’t hear anything from the customer within 8 seconds, voice chrome dismisses.

Step 3: Response and dismissal

Alexa responds verbally and, when applicable, with information about how to display the response on screen. The visual response should stay on screen while Alexa is speaking and dismiss automatically when Alexa finishes. The customer can also clear the response with a button before Alexa is done speaking. If the customer interrupts Alexa with a new request, Alexa should stop speaking, open the voice chrome, and listen to the new request. You may also choose to dim the window (viewport) when Alexa is listening.
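
The three steps above can be sketched as a small state machine. This is an illustrative model only: the `Interaction` class, its method names, and the explicit `now` timestamps are assumptions made for the example and are not part of any AVS SDK. The 8-second timeout comes from Step 2.

```python
from enum import Enum, auto

LISTENING_TIMEOUT_S = 8.0  # Step 2: voice chrome dismisses after 8 seconds of silence


class State(Enum):
    IDLE = auto()
    LISTENING = auto()
    THINKING = auto()
    SPEAKING = auto()


class Interaction:
    """Hypothetical model of a general interaction; not an AVS SDK class."""

    def __init__(self) -> None:
        self.state = State.IDLE
        self.window_visible = False
        self._listening_since = 0.0

    def invoke(self, now: float) -> None:
        # Wake word or button. If Alexa was speaking, this interrupts the speech.
        # Any visible window stays up (a device may choose to dim it).
        self.state = State.LISTENING
        self._listening_since = now

    def utterance_recognized(self) -> None:
        if self.state is State.LISTENING:
            self.state = State.THINKING

    def response_started(self, has_visual: bool) -> None:
        if self.state is State.THINKING:
            self.state = State.SPEAKING
            self.window_visible = has_visual

    def speech_finished(self) -> None:
        # The window dismisses automatically when Alexa finishes speaking.
        if self.state is State.SPEAKING:
            self.state = State.IDLE
            self.window_visible = False

    def clear_pressed(self) -> None:
        # Assumption: the clear button hides the window but does not stop speech.
        self.window_visible = False

    def tick(self, now: float) -> None:
        # Dismiss voice chrome if nothing was heard within the timeout.
        if self.state is State.LISTENING and now - self._listening_since >= LISTENING_TIMEOUT_S:
            self.state = State.IDLE
```

Passing `now` explicitly instead of reading a clock keeps the timeout logic easy to test; a real device would drive `tick` from its own timer.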

Now Playing interactions

Some content, such as streaming music, allows the customer to have more control over the playback. For example, they can play, pause, rewind, and so on. Alexa allows customers to resume this Now Playing content after being interrupted by another interaction. Your device should maintain the window for Now Playing content, and allow customers to resume playback, until explicitly dismissed.

Step 1: Invocation

The customer invokes Alexa with the wake word “Alexa” or a physical or GUI button. This activates the voice chrome to show Alexa is listening.

Step 2: Listening and thinking

Once Alexa is invoked and voice chrome is displayed, the customer can control Alexa with their voice. If Alexa recognizes an utterance, it processes the intent and returns a response. If Alexa doesn’t hear anything from the customer within 8 seconds, voice chrome dismisses.

Step 3: Response and dismissal

Alexa responds verbally. The device displays a window with the content metadata and controls, and plays the requested media. The customer can control the media via voice commands or buttons. The window content updates with each new song.

If the customer:

  • Initiates another action with a remote control, the music must pause or attenuate, and the window remains though it may optionally dim.

  • Pauses or stops the music via voice or button, the audio stops and the window remains. The window should auto-dismiss after one minute of inactivity.

  • Activates Alexa again, the music pauses or attenuates, and voice chrome reappears awaiting the next command.
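
The Now Playing behaviors listed above can be sketched as follows. The `NowPlayingWindow` class and its method names are hypothetical, and the sketch makes two simplifying assumptions: the device attenuates rather than pauses when Alexa is activated, and inactivity is measured from the moment playback was paused.

```python
from typing import Optional

AUTO_DISMISS_S = 60.0  # the window auto-dismisses after one minute of inactivity


class NowPlayingWindow:
    """Hypothetical model of a Now Playing window; not an AVS SDK class."""

    def __init__(self) -> None:
        self.visible = False
        self.playing = False
        self.attenuated = False
        self._paused_at: Optional[float] = None

    def start(self) -> None:
        self.visible = True
        self.playing = True
        self.attenuated = False
        self._paused_at = None

    def pause(self, now: float) -> None:
        # Audio stops, but the window remains on screen.
        if self.visible:
            self.playing = False
            self._paused_at = now

    def resume(self) -> None:
        if self.visible:
            self.playing = True
            self._paused_at = None

    def alexa_activated(self) -> None:
        # Assumption: this device attenuates; pausing is the other allowed option.
        if self.playing:
            self.attenuated = True

    def alexa_done(self) -> None:
        self.attenuated = False

    def tick(self, now: float) -> None:
        # Dismiss the window once playback has been paused for a full minute.
        if self.visible and self._paused_at is not None and now - self._paused_at >= AUTO_DISMISS_S:
            self.visible = False
```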

Controls Per Music Service Provider

Alexa can play music from multiple Music Service Providers. Controls vary per provider offering. For example, a live radio station might not have forward and back controls. The correct controls for each offering will be specified in the PlayerInfo directive.
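
As an illustration of rendering only the controls a provider offers, the sketch below filters a list of controls by an `enabled` flag. The payload shape shown here (a `controls` list whose entries carry `name` and `enabled`) is a simplified assumption for the example; consult the PlayerInfo directive reference for the actual schema. The `enabled_controls` helper is hypothetical.

```python
def enabled_controls(payload: dict) -> list[str]:
    """Return the names of controls marked as enabled, in the order given."""
    return [
        control["name"]
        for control in payload.get("controls", [])
        if control.get("enabled", False)
    ]


# Assumed example payloads, not captured from a real service.
on_demand = {"controls": [
    {"name": "PREVIOUS", "enabled": True},
    {"name": "PLAY_PAUSE", "enabled": True},
    {"name": "NEXT", "enabled": True},
]}
live_radio = {"controls": [
    {"name": "PREVIOUS", "enabled": False},
    {"name": "PLAY_PAUSE", "enabled": True},
    {"name": "NEXT", "enabled": False},
]}

print(enabled_controls(on_demand))
print(enabled_controls(live_radio))
```

Whether a control that is not enabled is hidden or shown in a disabled state is a device design decision that this sketch does not settle.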

Best practices

The customer should retain control over Alexa responses and their corresponding display windows. Interactions should follow these best practices:

  • The customer should be able to clear a visual response before Alexa is done speaking. If the customer interrupts Alexa with a new request, Alexa should stop speaking, optionally dim the window (or viewport), open the Listening voice chrome, and listen to the new request.
  • Likewise, if a customer initiates another action not related to Alexa, for example using a remote control, Alexa should stop speaking.
  • If a customer pauses or stops the music via voice or button, then the audio stops and the window remains. The window should auto-dismiss after one minute of inactivity.
  • If the customer activates Alexa, any currently streaming music should pause or attenuate and voice chrome appears awaiting the next command.

Interruption scenarios

General window to general window

Alexa Interruption Scenarios example for TV: General to General

Step 1: Utterance 1

At the customer's first utterance, voice chrome overlays the existing window.

Step 2: TTS 1 + GUI 1

Alexa responds.

Step 3: Utterance 2 (Interruption)

When the customer interrupts Alexa (via wake word or button) with a second utterance, Alexa stops speaking and voice chrome overlays the existing window.

Step 4: TTS 2 + GUI 2

Alexa responds to the second utterance via voice and provides new window (or viewport) content. This window dismisses according to regular dismissal rules.

Now Playing window to general window

Alexa Interruption Scenarios example for TV: NowPlaying to General

Step 1: Utterance 1

At the customer's first utterance, voice chrome overlays the existing window.

Step 2: TTS 1 + GUI 1

Alexa responds. Music plays.

Step 3: Utterance 2 (Interruption)

When the customer interrupts the music (via wake word or button) with a second utterance, the music pauses or attenuates and voice chrome overlays the Now Playing window.

Step 4: TTS 2 + GUI 2

If the utterance is understood and a visual response is required, Alexa responds to the utterance via voice and a new window. Once the TTS completes, the music returns to the regular volume. The window dismisses according to regular dismissal rules, except that when dismissed, the customer returns to the Now Playing window.

If the song changes during this time, the Now Playing GUI should reflect the new song when the customer returns to it.
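
One way to model this scenario is to keep the Now Playing window underneath any general window, so that dismissing the general window reveals the current track. The `WindowManager` class below is a hypothetical sketch under that assumption, with window content reduced to plain strings.

```python
from typing import Optional


class WindowManager:
    """Hypothetical model: a general window layered over an optional Now Playing window."""

    def __init__(self) -> None:
        self.now_playing: Optional[str] = None  # current track, if a Now Playing window exists
        self.general: Optional[str] = None      # content of the general window on top, if any

    def show_now_playing(self, track: str) -> None:
        self.now_playing = track

    def track_changed(self, track: str) -> None:
        # Keep the Now Playing content current even while it is covered.
        if self.now_playing is not None:
            self.now_playing = track

    def show_general(self, content: str) -> None:
        # A new general response replaces any general window already showing.
        self.general = content

    def dismiss_general(self) -> None:
        self.general = None

    def visible(self) -> Optional[str]:
        # The general window, when present, sits on top of Now Playing.
        return self.general if self.general is not None else self.now_playing


wm = WindowManager()
wm.show_now_playing("Song A")
wm.show_general("Weather")
wm.track_changed("Song B")  # the song changes while the general window is up
wm.dismiss_general()
print(wm.visible())
```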

General window to Now Playing window

Alexa Interruption Scenarios example for TV: General to NowPlaying

Step 1: Utterance 1

At the customer's first utterance, voice chrome overlays the existing window (or viewport).

Step 2: TTS 1 + GUI 1

Alexa responds.

Step 3: Utterance 2 (Interruption)

When the customer interrupts Alexa (via wake word or button) with a second utterance, Alexa stops speaking and the voice chrome overlays the existing window.

Step 4: TTS 2 + GUI 2

Alexa responds to the second utterance with a Now Playing window. Once the TTS completes, music begins playing. This window dismisses according to regular dismissal rules.

Now Playing window to Error

Alexa Interruption Scenarios example for TV: NowPlaying to Error

Step 1: Utterance 1

At the customer's first utterance, voice chrome overlays the existing window (or viewport).

Step 2: TTS 1 + GUI 1

Alexa responds. Music plays.

Step 3: Utterance 2 (Interruption)

When the customer interrupts the music (via wake word or button) with a second utterance, the music pauses or attenuates and voice chrome overlays the Now Playing window.

Step 4: GUI 1

If the customer's utterance is not understood, voice chrome dismisses, the display returns to the Now Playing window, and the music resumes at its original volume.

Please also see Interrupts for more information about interrupt types and behaviors.

Transitions

The initial appearance, transitions, and dismissal of windows should be smooth, and follow these best practices:

  • A window (or viewport) must appear as soon as Alexa begins responding or media begins playing, and the window’s contents must match the Alexa response or media.
  • The window should stay up while Alexa is speaking and dismiss automatically when finished.
  • When animated, transitions should employ easing to create a smooth feel. Animated partial-screen windows should enter from the right, and full-screen windows should fade in.
  • Transitions between windows, as well as the transitions for displaying and dismissing windows, should be quick (under 1 second).
Alexa Display Cards for TV: TV Transition Example
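
The easing guidance above can be illustrated with a standard ease-out cubic curve. This is a sketch only: the guide does not name a specific easing function, and the 0.3-second duration, function names, and screen dimensions are assumptions chosen to stay under the 1-second limit.

```python
TRANSITION_S = 0.3  # assumed duration; the guide only requires under 1 second


def ease_out_cubic(t: float) -> float:
    """Map linear progress t in [0, 1] to eased progress that decelerates toward 1."""
    t = min(max(t, 0.0), 1.0)
    return 1.0 - (1.0 - t) ** 3


def slide_in_x(elapsed_s: float, duration_s: float, screen_w: float, target_x: float) -> float:
    """Left edge of a partial-screen window entering from the right edge of the screen."""
    progress = ease_out_cubic(elapsed_s / duration_s)
    return screen_w + (target_x - screen_w) * progress


def fade_in_opacity(elapsed_s: float, duration_s: float) -> float:
    """Opacity of a full-screen window fading in, from 0.0 to 1.0."""
    return ease_out_cubic(elapsed_s / duration_s)


# A right-hand window on a 1920-pixel-wide screen whose resting left edge is x = 1280.
for elapsed in (0.0, 0.15, 0.3):
    print(elapsed, slide_in_x(elapsed, TRANSITION_S, 1920, 1280), fade_in_opacity(elapsed, TRANSITION_S))
```

An ease-out curve moves quickly at first and slows as the window settles, which tends to read as smoother than linear motion.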

Last updated: Nov 27, 2023