Invoking Alexa

In the vehicle, customers can invoke Alexa by saying the wake word or pressing a button to begin speech dialogue. Alexa uses sound cues and visuals (voice chrome) to indicate listening state. Voice chrome also indicates when Alexa is thinking and speaking.

There are two primary ways to invoke Alexa in the vehicle. Both are required.

  1. Saying the wake word “Alexa”.
  2. Pressing the Push-to-talk (PTT) button or an on-screen Tap-to-talk (TTT) button to invoke Alexa directly, without the wake word.

Alexa wake word

(Required) Support wake word to invoke Alexa.

The wake word provides hands-free, voice-forward experiences with Alexa. Minimizing the need for drivers to view or touch the screen helps reduce visual (eyes off the road) and manual (hands off the wheel) distractions in the car. Customers can turn off the wake word in settings. Customers must enable Alexa before they can begin speaking with Alexa. See Menu and settings for details.

(Required) Enable invocation of Alexa only after the customer has completed Alexa setup.

To ensure customer privacy, don’t enable wake word or dialogue with Alexa until the customer has enabled Alexa in setup. See Setup for details.

Push-to-talk

PTT is another way customers can invoke Alexa. If the wake word is off, customers can still use PTT to speak to Alexa in their vehicle.

(Required) If the vehicle offers a PTT button, customers must be able to invoke Alexa via PTT without a wake word.

If a customer assigns Alexa as the default voice assistant, use a short press on the PTT button to invoke Alexa without the wake word. Alexa should be invoked immediately (within 250 ms) after the PTT button has been pressed and released.

If Alexa is not set as the default assistant for PTT, we recommend still allowing the customer to say “Alexa” after pressing the PTT button as another way to speak with Alexa.
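
The short-press behavior above can be sketched in a few lines of Python. This is only an illustration: the function names, the constant name, and the string labels are hypothetical and not part of any Alexa SDK. Only the 250 ms figure and the two cases (Alexa is or is not the default assistant) come from the text.

```python
# Hypothetical sketch: names and return labels are illustrative, not an SDK API.

PTT_INVOKE_BUDGET_MS = 250  # from the guideline: invoke within 250 ms of release


def route_short_press(alexa_is_default: bool) -> str:
    """Decide what a short press on the PTT button does."""
    if alexa_is_default:
        # Alexa is the default assistant: open the mic without the wake word.
        return "invoke_alexa"
    # Alexa is not the default: the recommended fallback is to let the
    # customer say "Alexa" after the press.
    return "accept_wake_word"


def invoked_in_time(released_at_ms: float, listening_at_ms: float) -> bool:
    """Check that listening started within the budget after button release."""
    elapsed = listening_at_ms - released_at_ms
    return 0 <= elapsed <= PTT_INVOKE_BUDGET_MS
```

A check like `invoked_in_time` could be fed from timestamps logged during integration testing to confirm that the latency target is met.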

PTT button

Tap-to-Talk

Customers can press the Tap-to-talk (TTT) button to invoke Alexa immediately with one tap and without needing to say “Alexa”. The TTT button should behave similarly to the PTT button.

(Required) If PTT is not available, provide an on-screen Alexa TTT button to invoke Alexa. This button must be accessible at all times.

(Required) TTT can only be used to invoke Alexa. TTT can’t be used for other voice assistants such as Google Assistant or Siri.

Examples of TTT images

(Required) The TTT on-screen button must be placed on the driver side and be accessible in the same screen location at all times, with minimal exceptions, such as when the screen displays the rear view while the vehicle is backing up.

See Visual Language for details on how to display TTT.

(Required) The TTT button must be the Alexa Talk Bubble button in the Design toolkit. Maintain consistent size and style to avoid confusion.

(Required) Allow Alexa to be invoked while mobile projection applications, such as Android Auto or Apple CarPlay, are running or other assistants are present.

Example of projection mode while Alexa is available.

Interrupting Alexa

(Required) Allow customers to interrupt when Alexa is speaking (barge-in).

Customers must be able to interrupt Alexa with all available invocation methods. When interrupted, Alexa stops speaking and starts listening. For example, when Alexa is speaking about the weather, the customer can barge in with the wake word, PTT, or TTT and say “will it rain tomorrow?”

(Required) Allow customers to cancel listening and speaking.

Customers can stop Alexa from listening by saying “cancel” or by pressing the PTT or TTT button during the listening state. When Alexa is speaking and the customer closes a display card, Alexa should stop speaking. See the table below for more interruption behaviors.

(Required) Implement interruption behaviors as described in the table below:

This table shows how interruptions are to be implemented for interactions where Alexa is already listening or speaking.

| Customer action | Idle | Listening | Thinking | Speaking |
| --- | --- | --- | --- | --- |
| Wake word | Start listening | No change | Barge-in | Barge-in |
| PTT press | Start listening | Cancel listening | Barge-in | Barge-in |
| TTT press | Start listening | Cancel listening | Barge-in | Barge-in |
| Presses a cancel, back or close button | - | Cancel listening | Cancel dialog | Cancel dialog |
| Dismisses the display card | - | - | Cancel dialog | Cancel dialog |
| Touches the screen (e.g. to scroll text or launch an app) | - | Listening continues | Thinking continues | Speech continues |

Example interruptions
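
One way to implement the interruption table is as a plain lookup keyed by customer action and current state. The Python sketch below is a minimal illustration under that assumption. The action keys, outcome labels, and the `outcome` function are hypothetical names, and `None` stands for the "-" cells where the table defines no behavior.

```python
from enum import Enum


class State(Enum):
    IDLE = 0
    LISTENING = 1
    THINKING = 2
    SPEAKING = 3


# One row per customer action; columns follow the order of State.
# None marks the "-" cells of the table. All labels are illustrative.
INTERRUPTION_TABLE = {
    "wake_word":     ("start_listening", "no_change", "barge_in", "barge_in"),
    "ptt_press":     ("start_listening", "cancel_listening", "barge_in", "barge_in"),
    "ttt_press":     ("start_listening", "cancel_listening", "barge_in", "barge_in"),
    "cancel_button": (None, "cancel_listening", "cancel_dialog", "cancel_dialog"),
    "dismiss_card":  (None, None, "cancel_dialog", "cancel_dialog"),
    "screen_touch":  (None, "continue", "continue", "continue"),
}


def outcome(action: str, state: State):
    """Return the table entry for a customer action in the given state."""
    return INTERRUPTION_TABLE[action][state.value]
```

Keeping the table as data rather than nested conditionals makes it easy to compare the implementation against the guideline cell by cell.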

The Alexa attention system

Alexa is a single personality that is coherent and familiar to customers across many devices. While the physical devices might be different, the attention system ensures Alexa behaves predictably and with familiarity. This consistency creates customer trust and strengthens the customer’s understanding of Alexa.

Alexa’s attention system consists of non-verbal audio and visual components that work together to communicate all of Alexa’s different states to the customer. Color, sound, and animation are critical for effectively communicating Alexa's state. Audio and visual cues must be synced so that Alexa’s state change indicators occur simultaneously as the customer wakes, speaks to, and listens to Alexa.

Sound Cues

Start of Request (wake) and End of Request (endpointing) sounds give customers confidence and clarity about when Alexa is listening without them needing to take their eyes off the road. All sounds mentioned here are provided in the Alexa Automotive Design Toolkit.

(Required) Play the Start of Request sound immediately after the wake word is detected.

Sound: med_ui_wakesound_hybrid, see Design toolkit.

This allows the customer to know when Alexa is listening without looking at the screen. This sound is required to play when visual cues display the Listening state.

(Required) Play the Start of Request sound immediately after a press of the PTT or TTT button.

Sound: med_ui_wakesound_hybrid, see Design toolkit.

This allows the customer to know the system is listening without looking at the screen. This sound is required to play when visual cues display the Listening state.

(Required) Play the End of Request sound at the end of speech input.

Sound: med_ui_endpointing, see Design toolkit.

This sound allows the customer to know Alexa has heard their request without looking at the screen. This sound is distinct from the Start of Request sound, and is required to play when the visual cues exit the Listening state.

(Required) Allow customers to turn off the Start and End of Request sounds under the Settings menu.

See Menu and settings for details.

(Required) Use Alexa’s sounds only for Alexa features.

Don’t use Alexa's sound cues for any other interactions, including other speech systems or voice assistants.

Voice chrome

(Required) Display the Alexa voice chrome when the customer invokes Alexa.

Voice chrome is a visual indicator of Alexa’s attention system and is displayed whenever the customer interacts with Alexa by voice. Use linear voice chrome, as it works best with Alexa’s Display Cards and does not obscure other on-screen content.

Voice chrome should reflect that Alexa is seamlessly integrated into the vehicle’s IVI and is not limited to a single app. Place voice chrome along the bottom edge of the screen as an overlay that does not cover the entire display. This provides a less jarring experience when invoking Alexa, and makes for a more seamless integration with the vehicle.

Summary:

  1. Place linear voice chrome along an edge of the screen, preferably at the bottom.
  2. Don’t use a full-screen overlay or popup with voice chrome.
  3. Overlay any current IVI screens, such as Navigation.
Linear voice chrome
Linear voice chrome over navigation

(Required) Use only Alexa brand graphics to indicate that Alexa is listening.

Except for the physical PTT button on the steering wheel, don't use additional icons to invoke or represent Alexa. Use only Alexa icons and voice chrome to represent Alexa.

Do not use these icons with Alexa

Attention system states

Attention states address the personality of Alexa at a high level across all domains. The core Alexa states are: Idle, Listening, Thinking, and Speaking. For products with visual cues, these states must be distinguishable from each other.

Idle

The Idle state can be considered Alexa’s default state. No visual voice chrome elements are displayed in this state, in contrast with all other states. This communicates Alexa is passively waiting for a request and not actively communicating.

Listening

The Listening state starts when Alexa has been invoked by wake word, PTT, or TTT and the microphone begins streaming the customer’s request to the Alexa Voice Service. There are three stages to the Listening state:

  • Start Listening - Alexa transitions from Idle to the Listening state and waits for a request from the customer.
  • Active Listening - When the customer begins speaking, Alexa transitions into the Active Listening state. If Alexa doesn’t hear anything from the customer, Alexa returns to the Idle state instead.
  • End Listening - When the customer's end of speech is identified, Alexa transitions out of Listening state.

(Required) In multi-turn interactions, the Start of Request sound must play each time the mic opens during the interaction. The End of Request sound must play each time the mic closes.
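
The listening stages and the multi-turn sound rule can be sketched as a small transition table in Python. This is an illustration, not the Alexa Auto SDK's implementation. The event names (`invoked`, `speech_started`, `no_speech`, `end_of_speech`, `response_ready`, `mic_reopened`, `response_done`) and the `run` function are hypothetical. The sketch also assumes that a no-speech timeout counts as the mic closing, and that the mic reopens from the Speaking state in a multi-turn interaction.

```python
# Hypothetical sketch: event and stage names are illustrative.
START_SOUND = "med_ui_wakesound_hybrid"  # Start of Request
END_SOUND = "med_ui_endpointing"         # End of Request

LISTENING_STAGES = {"start_listening", "active_listening"}

TRANSITIONS = {
    ("idle", "invoked"): "start_listening",
    ("start_listening", "speech_started"): "active_listening",
    ("start_listening", "no_speech"): "idle",           # assumed mic close
    ("active_listening", "end_of_speech"): "thinking",
    ("thinking", "response_ready"): "speaking",
    ("speaking", "mic_reopened"): "start_listening",    # assumed multi-turn path
    ("speaking", "response_done"): "idle",
}


def run(events, stage="idle"):
    """Replay events and return the final stage plus the sounds played."""
    sounds = []
    for event in events:
        next_stage = TRANSITIONS[(stage, event)]
        if next_stage == "start_listening":
            sounds.append(START_SOUND)   # mic opens
        elif stage in LISTENING_STAGES and next_stage not in LISTENING_STAGES:
            sounds.append(END_SOUND)     # mic closes
        stage = next_stage
    return stage, sounds
```

Replaying a two-turn interaction (`invoked`, `speech_started`, `end_of_speech`, `response_ready`, `mic_reopened`, `speech_started`, `end_of_speech`) yields the Start and End of Request sounds twice each, once per mic open and close.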

Thinking

When a customer completes a request, Alexa enters the Thinking state. This state lets the customer know the microphone is no longer active and Alexa is processing their request.

Speaking

The Speaking state is displayed when Alexa is responding to a request with text-to-speech (TTS). This state is not displayed when Alexa is responding with long-running mixable media such as music, books, and Flash Briefings.

(Required) Do not duplicate Alexa voice chrome or supplement with other attention state signifiers.

The Alexa voice chrome is a branded and established design pattern to convey attention states. Do not surface multiple instances of voice chrome or use other signifiers, such as icons and text, for different states.

Example interaction

PTT button

Examples

Colors: Blue #214CFB, Cyan #05FEFE, Red #FC361D

| State | Description | Voice chrome | Colors | Icons | Sound cues |
| --- | --- | --- | --- | --- | --- |
| Idle | Alexa is available through invocation methods. No visuals are displayed on-screen. | No visual indicators. | | | |
| Listening Start | Voice chrome appears and a sound cue plays once when the customer wakes Alexa, including by PTT. The microphone becomes active. | Start Listening | Blue, Cyan | | Sound: med_ui_wakesound_hybrid, see Design Toolkit |
| Listening Active | Voice chrome persists while Alexa is capturing speech from the customer. When end of speech is detected, a sound cue plays and voice chrome transitions to Thinking state. | Listening Active | Blue, Cyan | | At the end of listening: End of Request sound |
| Thinking | Voice chrome plays in a loop while Alexa is processing, or 'thinking about' what the customer has said. Displaying this state ensures that the customer understands that the interaction has not ended. | Thinking | Blue, Cyan | | |
| Speaking | Voice chrome plays in a loop while Alexa is responding to the customer via TTS. | Speaking | Blue, Cyan | | |
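
For a custom rendering layer, the state-to-color mapping in the table could be held as data. The Python sketch below is only an illustration: the dictionary and function names are hypothetical, and the Alexa Auto SDK voice chrome may organize this differently. The hex values and the per-state color lists are taken from the table. Red is listed in the palette, but no state in the table uses it.

```python
# Hypothetical sketch: names are illustrative; values come from the table.
PALETTE = {
    "blue": "#214CFB",
    "cyan": "#05FEFE",
    "red": "#FC361D",  # in the palette, not used by any state in the table
}

# Colors used by the voice chrome in each attention state.
STATE_COLORS = {
    "idle": (),                   # no visual indicators
    "listening": ("blue", "cyan"),
    "thinking": ("blue", "cyan"),
    "speaking": ("blue", "cyan"),
}


def chrome_colors(state: str) -> list:
    """Return the hex colors the voice chrome uses in the given state."""
    return [PALETTE[name] for name in STATE_COLORS[state]]
```

An empty list for Idle mirrors the rule that no voice chrome elements are displayed in that state.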

Note: Alexa voice chrome is available as part of the Alexa Auto SDK.

Privacy

(Required) Vehicles with restricted modes (such as valet mode or a guest driver mode) must disable Alexa in those modes.

Customers expect Alexa to protect their privacy. Disable invocation and access to Alexa when restricted modes are activated in the vehicle’s system, such as valet mode.

Example: A driver pulls up to a hotel and quickly turns on valet mode. The valet gets into the vehicle and is unable to use Alexa because valet mode is enabled. This ensures the valet cannot access the customer’s private information using Alexa.

This requirement does not apply to vehicles that do not have restricted modes.
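
The privacy rule, combined with the earlier requirement that Alexa is invocable only after setup, amounts to a simple gate. The Python sketch below is an illustration under assumed names: `VehicleMode`, `RESTRICTED_MODES`, and `alexa_available` are hypothetical, and a real vehicle would define its own set of modes.

```python
from enum import Enum


class VehicleMode(Enum):
    # Hypothetical modes; a real vehicle defines its own.
    NORMAL = "normal"
    VALET = "valet"
    GUEST = "guest"


# Modes in which invocation of and access to Alexa are disabled.
RESTRICTED_MODES = frozenset({VehicleMode.VALET, VehicleMode.GUEST})


def alexa_available(setup_complete: bool, mode: VehicleMode) -> bool:
    """Return True only if Alexa may be invoked in the current mode."""
    if mode in RESTRICTED_MODES:
        return False        # restricted mode overrides everything else
    return setup_complete   # no invocation before the customer enables Alexa
```

A vehicle without restricted modes would simply leave `RESTRICTED_MODES` empty, which matches the note that the requirement does not apply there.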

Mobile Projection

(Required) Customers should be able to use Alexa even when mobile projection is active.

  • Alexa voice chrome should always be fully visible, even when mobile projection is displayed. It should be in the foreground if it overlaps with the top or bottom part of the projection, or be placed above or below the projection so that it is fully visible to users.
  • The GUI displayed in mobile projection should remain tappable even when Alexa voice chrome is showing and when Alexa is speaking.
  • The Alexa Display Card should show while mobile projection is active, and customers should be able to close it easily by both voice and touch (see the Display Cards section). Customers should be able to invoke, cancel, and interrupt Alexa when mobile projection is active.
  • If mobile projection prohibits concurrent access to Alexa through the graphical interface, the product shall educate customers on how to allow concurrent use of Alexa.

Last updated: Nov 25, 2023