Functional Requirements for AVS Products
Customers who purchase a product with Amazon Alexa expect a familiar experience. This document provides functional and design requirements and recommendations to help you meet customer expectations and avoid issues as you develop, prototype, and prepare your Alexa-enabled product for commercial release.
- Common Terms
- Living Document
- 1. Core Requirements and Recommendations
- 2. Voice-Initiated Products
- 3. Touch-Initiated Products
- 4. Media Services
- 5. Alerts
- 6. Notifications
- 7. Display Cards
- 8. Setup and Authentication
- 9. Bluetooth
Common Terms
The following terms are used consistently throughout this document to signify requirements and recommendations:
- SHALL: Items preceded by SHALL are required for all commercial product releases.
- SHOULD: Items preceded by SHOULD are recommended for all commercial product releases and significantly improve the Alexa customer experience.
The following terms are used consistently throughout this document to describe Alexa features and concepts:
- Voice-initiated: Products activated by customer speech for a hands-free experience. They can also be activated by a customer’s touch.
- Touch-initiated: Products activated by a customer’s physical action on the product. These products do not support voice-initiated interactions.
- Tap-to-talk: Touch-initiated products activated by the customer pushing and releasing a button before speaking.
- Hold-to-talk: Touch-initiated products activated by the customer holding down a button while speaking.
- Attention states: The parts of an Alexa conversation flow, including Listening and Thinking.
- Physical control: A hardware or GUI control that is used to wake Alexa or adjust the product’s settings.
- Visual cues: LEDs or GUI elements that provide feedback to the customer on Alexa’s state.
- Audio cues: Sounds that provide feedback to the customer on Alexa’s transitions between attention states.
- Multi-turn: An interaction in which Alexa requests additional spoken information from the customer to complete the request. A multi-turn interaction begins when your product receives an ExpectSpeech directive from AVS.
Living Document
We look to the AVS community to innovate and create new Alexa-enabled experiences. As we learn from you and introduce new features and functionality, we will improve and update this document.
The current guidelines were published on June 26, 2017.
1. Core Requirements and Recommendations
The following requirements and recommendations are applicable to all products with Alexa enabled.
1.1. Your product SHALL be capable of audio input (i.e., capturing customer speech via one or more microphones) and of streaming captured speech to the cloud as specified in the SpeechRecognizer Interface.
1.2. Your product SHALL be capable of audio output (e.g., speaker, headphones, line out, or Bluetooth).
1.2.1. Your product SHOULD provide physical controls for adjusting volume.
1.3. Your product SHALL provide a physical control to manually initiate an interaction with Alexa.
1.3.1. Your product SHALL enable customers to interrupt Alexa-initiated output (e.g., media playback or an Alexa voice response) using voice or a physical control. The physical control that satisfies 1.3 SHALL also interrupt Alexa-initiated output.
1.3.2. If you choose to implement a GUI control to satisfy 1.3, it SHALL always be accessible from your user interface and SHALL NOT be hidden at any time.
1.3.4. The physical control SHOULD serve only the single purpose of initiating Alexa interactions.
1.4. Your product SHALL clearly convey core Alexa attention states to the customer. The core attention states are Listening, Thinking and Speaking. See the AVS UX Design Guidelines for further information about the Alexa attention states.
1.4.1. Your product SHOULD use prominent visual cues to satisfy 1.4. If your product uses visual cues, your product SHOULD indicate all attention states as defined in the AVS UX Design Guidelines.
1.4.2. If your product does not use prominent visual cues to satisfy 1.4, your product SHALL use prominent audio cues to indicate when the Alexa Listening state starts and when it stops. You may choose to allow customers to disable the sounds.
1.4.3. If your product uses both visual and audio cues to satisfy 1.4, the visual and audio cues SHALL be synchronized to indicate when the Alexa Listening state starts and when it stops.
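Requirements 1.4.1 through 1.4.3 can be illustrated with a small state-driven cue controller. This is a minimal sketch, not an AVS API: `AttentionState`, `CueController`, and the `show_visual`/`play_audio` hooks are hypothetical names standing in for your product's LED or GUI driver and sound player, and the cue names are placeholders. Driving both cues from the same transition call is one simple way to keep them synchronized.

```python
from enum import Enum, auto
from typing import Callable, List


class AttentionState(Enum):
    """Core attention states named in 1.4 (plus an idle state)."""
    IDLE = auto()
    LISTENING = auto()
    THINKING = auto()
    SPEAKING = auto()


class CueController:
    """Hypothetical controller that emits visual and audio cues together."""

    def __init__(self, show_visual: Callable[[AttentionState], None],
                 play_audio: Callable[[str], None]) -> None:
        self._show_visual = show_visual  # hypothetical LED/GUI hook
        self._play_audio = play_audio    # hypothetical sound-player hook
        self.state = AttentionState.IDLE

    def transition(self, new_state: AttentionState) -> None:
        old_state, self.state = self.state, new_state
        # Show a visual cue for every state (1.4.1).
        self._show_visual(new_state)
        # Mark the start and end of Listening with audio in the same call
        # as the visual update, so the two stay in step (1.4.2, 1.4.3).
        if new_state is AttentionState.LISTENING and old_state is not AttentionState.LISTENING:
            self._play_audio("listening_start")
        elif old_state is AttentionState.LISTENING and new_state is not AttentionState.LISTENING:
            self._play_audio("listening_end")


# Usage: record the cues instead of driving real hardware.
events: List[str] = []
cues = CueController(lambda s: events.append(f"visual:{s.name}"),
                     lambda name: events.append(f"audio:{name}"))
cues.transition(AttentionState.LISTENING)
cues.transition(AttentionState.THINKING)
```

After the two transitions, `events` holds the Listening visual and start sound, followed by the Thinking visual and end sound.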
1.5. Your product SHALL support multi-turn interactions with Alexa.
1.5.1. Your product SHALL use the same methods for conveying the start and end of the Listening attention state for all multi-turn interactions as for the initial interaction.
1.6. Your product SHALL support silencing alerts, adjusting volume, and stopping media when internet connectivity is unavailable.
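Requirements 1.3, 1.3.1, and 1.6 together imply that the local handlers for the Alexa control, stop, and volume should not depend on connectivity. The sketch below assumes a single hypothetical `LocalController` class; the field names, the 0-10 volume range, and the `interaction_unavailable` log entry are illustrative choices, not AVS behavior.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LocalController:
    """Hypothetical device-side handler for physical controls.

    Stop and volume never wait on the network, so they keep working
    when internet connectivity is unavailable (1.6).
    """
    online: bool = True
    volume: int = 5
    media_playing: bool = False
    alert_sounding: bool = False
    log: List[str] = field(default_factory=list)

    def press_alexa_button(self) -> None:
        # The control that starts an interaction (1.3) also interrupts
        # Alexa-initiated output (1.3.1).
        if self.alert_sounding or self.media_playing:
            self.stop()
        if self.online:
            self.log.append("start_interaction")
        else:
            # Hypothetical offline handling: no interaction is started.
            self.log.append("interaction_unavailable")

    def stop(self) -> None:
        # Silence alerts and stop media locally.
        self.alert_sounding = False
        self.media_playing = False
        self.log.append("stopped")

    def change_volume(self, delta: int) -> None:
        # Clamp to the hypothetical 0-10 device range.
        self.volume = max(0, min(10, self.volume + delta))


controller = LocalController(online=False, media_playing=True, alert_sounding=True)
controller.press_alexa_button()
controller.change_volume(+20)
```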
2. Voice-Initiated Products
The following guidelines are specific to voice-initiated products and extend the Core Requirements and Recommendations for those products.
2.1. Your product SHALL only use approved Amazon Alexa wake words, such as “Alexa”.
2.2. Your product SHALL automatically activate its microphones without waiting for the wake word in multi-turn interactions. See also 1.5.
2.3. Your product SHALL provide an always-available control to disable its microphones, and SHALL use clear, continuous visual cues to convey to the customer that the Alexa Microphone Off attention state is active.
2.3.1. You SHALL provide audio cues to indicate when the Microphone Off attention state is activated and deactivated.
2.4. Your product SHALL support enabling/disabling microphones when internet connectivity is unavailable. See also 1.6.
2.5. Your product SHALL support cloud-based wake word verification.
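A minimal sketch of the microphone-off control in 2.3 through 2.4, assuming hypothetical `set_indicator` and `play_sound` callbacks for your hardware and placeholder sound names. The toggle reads no network state, so it behaves the same offline.

```python
from typing import Callable, List


class MicrophoneControl:
    """Hypothetical always-available microphone toggle."""

    def __init__(self, set_indicator: Callable[[bool], None],
                 play_sound: Callable[[str], None]) -> None:
        self._set_indicator = set_indicator  # hypothetical LED/GUI hook
        self._play_sound = play_sound        # hypothetical sound hook
        self.muted = False

    def toggle(self) -> None:
        self.muted = not self.muted
        # Keep the Microphone Off visual cue on for as long as the
        # microphones are disabled (2.3).
        self._set_indicator(self.muted)
        # Play a cue on both activation and deactivation (2.3.1).
        self._play_sound("mic_off" if self.muted else "mic_on")

    def accept_audio(self) -> bool:
        # While muted, captured audio (including wake word detection)
        # is not processed.
        return not self.muted


indicator: List[bool] = []
sounds: List[str] = []
mic = MicrophoneControl(indicator.append, sounds.append)
mic.toggle()
```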
3. Touch-Initiated Products
The following guidelines are specific to touch-initiated products and extend the Core Requirements and Recommendations for those products. Unless noted, the guidelines apply to both tap-to-talk and hold-to-talk products.
3.1. Your product SHALL NOT require the use of a wake word as part of the customer utterance.
3.2. Your product’s microphones SHALL be disabled until the customer initiates an Alexa interaction.
3.3. Your product SHALL automatically activate its microphones without waiting for a touch interaction in multi-turn situations. The sole exception is hold-to-talk products, where the microphones are activated only while the customer holds down a physical control. See also 1.5.
3.4. Your product SHALL use audio cues to indicate the start and end of the Listening attention state.
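The multi-turn rules in 1.5.1, 2.2, and 3.3 reduce to one decision when an ExpectSpeech directive arrives: open the microphones now, or wait for the hold-to-talk control. The function below is a hypothetical handler sketch; `ProductType`, `open_mic_for_turn`, and the `start_listening` callback are illustrative names, and parsing the directive itself is out of scope.

```python
from enum import Enum
from typing import Callable


class ProductType(Enum):
    VOICE_INITIATED = "voice"
    TAP_TO_TALK = "tap"
    HOLD_TO_TALK = "hold"


def open_mic_for_turn(product: ProductType, button_held: bool,
                      start_listening: Callable[[], None]) -> bool:
    """Decide whether to open the microphones for a follow-up turn.

    Voice-initiated (2.2) and tap-to-talk (3.3) products reopen the
    microphones without a wake word or a new tap. Hold-to-talk products
    capture only while the control is held (the exception in 3.3).
    """
    if product is ProductType.HOLD_TO_TALK and not button_held:
        return False
    # Reuse the same routine, and therefore the same Listening cues,
    # as the initial turn (1.5.1).
    start_listening()
    return True
```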
4. Media Services
The following guidelines apply to all products that support media services such as Amazon Music, TuneIn, iHeartRadio, Audible, and Flash Briefing. For additional information on handling competing audio outputs, please review the Alexa Voice Service Interaction Model. These guidelines apply to both voice-initiated and touch-initiated products.
4.1. Your product SHALL pause or attenuate (lower speaker volume) audio output when a customer initiates an Alexa interaction during media playback.
4.1.1. Your product SHALL pause Audible content playback when interrupted by a customer.
4.1.2. If your product pauses media because of a customer interruption, it SHOULD resume playback automatically.
4.2. Your product SHALL allow customers to resume paused media using a voice request or a physical control.
4.3. Your product SHOULD sufficiently buffer media so that short interruptions in internet connectivity do not disrupt playback.
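One way to model 4.1 through 4.2 is a player object that either attenuates or pauses when an interaction starts, and restores its state when the interaction ends. This sketch assumes a hypothetical `MediaPlayer` class, a fixed attenuation level of 2, and that the interaction ends without a directive (such as a stop request) that changes playback; a real product would arbitrate this through the rules in the Alexa Voice Service Interaction Model.

```python
from dataclasses import dataclass
from typing import Optional

DUCK_VOLUME = 2             # hypothetical attenuated level
ALWAYS_PAUSE = {"Audible"}  # 4.1.1: Audible content is paused, not attenuated


@dataclass
class MediaPlayer:
    """Hypothetical player state used to illustrate 4.1-4.2."""
    source: str
    volume: int = 8
    playing: bool = True
    paused_by_interaction: bool = False
    ducked_from: Optional[int] = None

    def on_interaction_start(self, prefer_pause: bool = False) -> None:
        if not self.playing:
            return
        if self.source in ALWAYS_PAUSE or prefer_pause:
            self.playing = False
            self.paused_by_interaction = True
        else:
            # Attenuate instead of pausing (4.1).
            self.ducked_from = self.volume
            self.volume = min(self.volume, DUCK_VOLUME)

    def on_interaction_end(self) -> None:
        if self.ducked_from is not None:
            self.volume, self.ducked_from = self.ducked_from, None
        if self.paused_by_interaction:
            # 4.1.2 recommends resuming automatically.
            self.playing = True
            self.paused_by_interaction = False

    def resume(self) -> None:
        # Explicit resume from a voice request or physical control (4.2).
        self.playing = True
        self.paused_by_interaction = False
```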
5. Alerts
The following guidelines apply to delivering and controlling alerts, such as timers or alarms, and extend requirement 1.6 of the Core Requirements and Recommendations.
5.1. Your product SHALL deliver previously scheduled alerts to customers when internet connectivity is unavailable.
5.1.1. If alerts are delivered while internet connectivity is unavailable, your product SHALL send the appropriate events for the delivered alerts to Alexa when an internet connection is reestablished. For additional information, see Alerts Overview.
5.2. Your product SHOULD support the use of physical controls to stop sounding alerts.
5.3. Your product SHOULD support independent volume control for alerts. When a customer adjusts a product’s volume for media output, it SHOULD NOT affect the volume for alerts.
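Requirements 5.1 through 5.3 suggest keeping the alert schedule and alert volume on the device and queueing outbound events while offline. In this sketch, `AlertManager` and the `send_event` callback are hypothetical; the event names echo the Alerts interface, but the payloads are reduced to a token string, so consult the Alerts interface documentation for the actual event schema.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class AlertManager:
    """Hypothetical local alert handler."""
    send_event: Callable[[str, str], None]  # hypothetical transport hook
    online: bool = True
    alert_volume: int = 7  # kept separate from media volume (5.3)
    media_volume: int = 5
    pending: List[Tuple[str, str]] = field(default_factory=list)

    def fire(self, token: str) -> None:
        # Alerts sound from the local schedule whether or not the
        # product is online (5.1).
        self._report("AlertStarted", token)

    def stop(self, token: str) -> None:
        self._report("AlertStopped", token)

    def set_media_volume(self, level: int) -> None:
        # Changing media volume does not touch alert_volume (5.3).
        self.media_volume = level

    def _report(self, name: str, token: str) -> None:
        if self.online:
            self.send_event(name, token)
        else:
            self.pending.append((name, token))

    def on_reconnect(self) -> None:
        # Send events for alerts delivered while offline, in order (5.1.1).
        self.online = True
        while self.pending:
            self.send_event(*self.pending.pop(0))
```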
6. Notifications
The following guidelines apply to delivering and controlling notifications.
6.1. Your product SHALL download and use the audio asset specified in the notification directive’s payload.
6.1.1. If the download fails or times out, your product SHALL use the Notification sound provided by Amazon.
6.2. Your product SHOULD implement the visual Notification indicator patterns as defined in the Attention System guidance.
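For 6.1 and 6.1.1, the download path needs a timeout and a fallback. The sketch below uses Python's standard `urllib.request.urlopen` with a timeout; `FALLBACK_SOUND` is a hypothetical path to the Amazon-provided notification sound bundled with your product, and the 5-second timeout is an arbitrary example value. The `fetch` parameter exists so the download can be swapped out in tests.

```python
import urllib.request
from typing import Callable, Optional, Tuple

# Hypothetical location of the bundled Amazon-provided notification sound.
FALLBACK_SOUND = "sounds/notification_default.wav"


def fetch_url(url: str, timeout: float) -> bytes:
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def resolve_notification_sound(
        asset_url: str,
        fetch: Callable[[str, float], bytes] = fetch_url,
        timeout_s: float = 5.0) -> Tuple[str, Optional[bytes]]:
    """Return ("asset", data) on success, or ("fallback", None) on failure.

    On "fallback", the caller plays FALLBACK_SOUND (6.1.1).
    """
    try:
        return "asset", fetch(asset_url, timeout_s)
    except (OSError, ValueError):
        # URLError and timeouts are OSError subclasses; a malformed URL
        # raises ValueError before any network access.
        return "fallback", None
```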
7. Display Cards
The following guidelines are specific to screen-based products, and apply to both voice-initiated and touch-initiated products. See the AVS UX Design Guidelines and the TemplateRuntime Interface for more information about Display Cards.
7.1. Your product SHALL render all visual metadata to the specification for its screen size and SHALL NOT add, remove, or alter the metadata in any way when presented to the user.
7.1.1. If media is enabled, your product SHALL display all playback controls provided.
7.2. Your product SHOULD conform to the provided Display Card Design Guidelines.
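Requirements 7.1 and 7.1.1 amount to a pass-through rule: the renderer may lay metadata out for the screen but does not edit it, and it shows every control it receives. The payload keys used here (`title`, `controls`, `name`, `enabled`) are assumptions modeled loosely on a player card; check the TemplateRuntime Interface for the real schema.

```python
from typing import Any, Dict


def build_player_view(card: Dict[str, Any]) -> Dict[str, Any]:
    """Map a hypothetical card payload to what the screen renders."""
    # 7.1: copy text metadata as-is; no truncation, rewording, or
    # reordering happens here. Fitting it to the screen is a layout concern.
    view: Dict[str, Any] = {"title": dict(card.get("title", {}))}
    # 7.1.1: keep every provided control, in order, including disabled
    # ones (which the layout can render greyed out).
    view["buttons"] = [(c["name"], bool(c.get("enabled", False)))
                       for c in card.get("controls", [])]
    return view
```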
8. Setup and Authentication
The Alexa setup process communicates the value of Alexa and helps customers connect your product to their Amazon account. Ideally, the Alexa setup flow should be incorporated into the setup or first run experience on your product. See the AVS UX Design Guidelines for more branding and style information for the Alexa setup and authentication experience.
8.1. Your product SHALL use Login With Amazon (LWA) to authenticate the customer. See the Authorization section in the Alexa Voice Service API Overview for additional information.
8.2. Your product SHALL have an Alexa setup/sign in experience that follows the Setup and Authentication guidelines in the AVS UX Design Guidelines.
8.2.1. Your product SHALL have a Splash Screen before the customer enters the Login With Amazon authentication flow. Your Splash Screen SHALL include the required elements as defined in the AVS UX Setup and Authentication Guidelines.
8.2.2. Your product SHALL have a Things to Try screen after the customer exits the Login With Amazon authentication flow. Your Things to Try screen SHALL include the required elements as defined in the AVS UX Setup and Authentication Guidelines.
8.3. Your product SHALL support logout by the customer.
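The screen order in 8.2.1 and 8.2.2 and the logout requirement in 8.3 can be enforced with a small step machine. `SetupFlow` and `SetupStep` are hypothetical names, and the LWA exchange is represented only by the refresh token it yields; the actual authorization flow is described in the Authorization section of the Alexa Voice Service API Overview.

```python
from enum import Enum, auto
from typing import Optional


class SetupStep(Enum):
    SPLASH = auto()             # 8.2.1: shown before LWA
    LOGIN_WITH_AMAZON = auto()  # 8.1
    THINGS_TO_TRY = auto()      # 8.2.2: shown after LWA
    DONE = auto()


ORDER = [SetupStep.SPLASH, SetupStep.LOGIN_WITH_AMAZON,
         SetupStep.THINGS_TO_TRY, SetupStep.DONE]


class SetupFlow:
    """Hypothetical first-run flow that enforces the screen order."""

    def __init__(self) -> None:
        self.step = SetupStep.SPLASH
        self.refresh_token: Optional[str] = None

    def advance(self, refresh_token: Optional[str] = None) -> SetupStep:
        if self.step is SetupStep.LOGIN_WITH_AMAZON:
            if not refresh_token:
                return self.step  # stay until authorization succeeds
            self.refresh_token = refresh_token
        if self.step is not SetupStep.DONE:
            self.step = ORDER[ORDER.index(self.step) + 1]
        return self.step

    def logout(self) -> None:
        # 8.3: discard stored credentials and return to the start.
        self.refresh_token = None
        self.step = SetupStep.SPLASH
```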
8.4. You SHOULD include information on Alexa setup and use in your product’s instructional materials.
9. Bluetooth
These requirements are specific to products that use the Bluetooth interface:
9.1. Your product SHOULD support the Advanced Audio Distribution Profile (A2DP) Bluetooth profile.
If your product supports A2DP, it SHALL support receiving digital audio streams from an A2DP SOURCE device.
If your product supports A2DP, it SHALL support the Audio/Video Remote Control Profile (AVRCP) Bluetooth profile.
9.2. If your product uses the Bluetooth interface, it SHALL use the Bluetooth sounds provided by Amazon.
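A build-time check can catch the conditional Bluetooth rules in 9.1. This sketch assumes the product declares its profiles as plain strings (`A2DP-SINK`, `A2DP-SOURCE`, `AVRCP`); these labels are illustrative, not identifiers from any Bluetooth stack API. Receiving streams from an A2DP SOURCE device corresponds to the sink role.

```python
from typing import List, Set


def check_bluetooth_profiles(profiles: Set[str]) -> List[str]:
    """Return problems found against 9.1; an empty list means none found."""
    if not {"A2DP-SINK", "A2DP-SOURCE"} & profiles:
        # A2DP itself is only a SHOULD, so the SHALL checks do not apply.
        return ["SHOULD 9.1: A2DP not supported"]
    problems: List[str] = []
    if "A2DP-SINK" not in profiles:
        problems.append("SHALL: cannot receive audio from an A2DP SOURCE device")
    if "AVRCP" not in profiles:
        problems.append("SHALL: A2DP supported without AVRCP")
    return problems
```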