Functional Requirements for AVS Products
Customers who purchase a product with Amazon Alexa expect a familiar experience. This document provides functional and design requirements and recommendations to help you meet customer expectations and avoid issues as you develop, prototype, and prepare your product with AVS for commercial release.
The guidelines on this page apply to device makers planning to implement the general AVS APIs and SDKs. If your AVS implementation is more specialized, please see the following specific documentation for your requirements:
- Alexa for Auto – If you are implementing AVS in an automotive accessory, see the Alexa Automotive Documentation.
- Alexa for Business – If you are building with Alexa for Business, see Build with Alexa for Business and the Alexa for Business Requirements.
The following terms are used consistently throughout this document to signify requirements and recommendations:
- SHALL: Items preceded by SHALL are required for all commercial product releases.
- SHOULD: Items preceded by SHOULD are recommended for all commercial product releases and significantly improve the Alexa customer experience.
The following terms are used consistently throughout this document to describe Alexa features and concepts:
- Voice-initiated: Products activated by customer speech for a hands-free experience. They can also be activated by a customer's touch.
- Touch-initiated: Products activated by a customer’s physical action on the product. These products do not support voice-initiated interactions.
- Tap-to-talk: Touch-initiated products activated by the customer pushing and releasing a button before speaking.
- Hold-to-talk: Touch-initiated products activated by the customer holding down a button while speaking.
- Attention states: The parts of an Alexa conversation flow, including Listening and Thinking.
- Action button: A button used to wake or interrupt Alexa. It can be a hardware or GUI button on a device, a button on a product's remote control, or a GUI button in a companion app.
- Device control: A control used to adjust product settings or interact with media. The control can be a hardware or GUI button on a device, a button on a product's remote control, or a GUI button in a companion app.
- Visual cues: LEDs or GUI elements that provide feedback to the customer on Alexa's state.
- Audio cues: Sounds that provide feedback to the customer on Alexa's transitions between attention states.
- Multi-turn: An interaction in which Alexa requests additional spoken information from the customer to complete the interaction. Multi-turn interactions are initiated when your product receives an ExpectSpeech directive from AVS.
We look to the AVS community to innovate and create new Alexa Built-in experiences. As we learn from you and we introduce new features and functionality, this document will be improved and updated.
These current guidelines were published on April 16, 2019.
1. Core Requirements and Recommendations
The following requirements and recommendations are applicable to all products with Alexa Built-in.
1.1. Your product SHALL be capable of audio input (i.e. capturing customer speech via one or more microphones) and streaming captured speech to the cloud per the specs provided in the SpeechRecognizer Interface.
1.2. Your product SHALL be capable of audio output (e.g. speaker, headphones, line out, or Bluetooth).
1.2.1. Your product SHALL provide device controls for adjusting volume.
1.3. Your product SHALL provide an Action button. See UX for Product Buttons for more about the Action button.
1.3.1. The Action button SHALL enable customers to initiate an Alexa interaction.
1.3.2. The Action button SHALL enable customers to interrupt an Alexa output (e.g. media playback, Alexa voice responses, or Alerts). See UX Interrupt Guidance for more.
1.3.3. The Action button SHALL be easily accessible to your customer.
1.3.4. The Action button SHOULD have the single purpose of initiating Alexa interactions.
1.4. Your product SHALL clearly convey core Alexa attention states to the customer. The core attention states are Listening, Thinking and Speaking. See the AVS UX Design Overview for further information about the Alexa attention states.
1.4.1. Your product SHOULD use prominent visual cues to satisfy Requirement 1.4. If your product uses visual cues, your product SHOULD indicate all attention states as defined in the AVS UX Design Overview.
1.4.2. If your product does not use prominent visual cues to satisfy Requirement 1.4, your product SHALL use prominent audio cues to indicate when the Alexa Listening state starts and when it stops. You may choose to allow customers to disable the sounds.
1.4.3. If your product uses both visual and audio cues to satisfy Requirement 1.4, the visual and audio cues SHALL be synchronized to indicate when the Alexa Listening state starts and when it stops.
1.5.1. Your product SHALL use the same methods for conveying the start and end of the Listening attention state for all multi-turn interactions as for the initial interaction.
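The cue rules in Requirements 1.4 through 1.5.1 can be illustrated with a small state tracker. This is a minimal sketch, not an AVS API: the CueController class, the IDLE state, and the log list that stands in for real LED and speaker output are all assumptions made for illustration. It emits the visual and audio cues for a Listening transition in the same call, so they stay synchronized (1.4.3), and it uses one code path for every Listening transition, so a multi-turn interaction gets the same cues as the initial one (1.5.1).

```python
from enum import Enum


class AttentionState(Enum):
    IDLE = "idle"  # assumption: a resting state outside the core states
    LISTENING = "listening"
    THINKING = "thinking"
    SPEAKING = "speaking"


class CueController:
    """Hypothetical helper that emits cues on attention-state changes."""

    def __init__(self, has_visual_cues: bool, has_audio_cues: bool):
        # Requirement 1.4.2: without prominent visual cues, audio cues are required.
        if not has_visual_cues and not has_audio_cues:
            raise ValueError("product needs visual cues, audio cues, or both")
        self.has_visual_cues = has_visual_cues
        self.has_audio_cues = has_audio_cues
        self.state = AttentionState.IDLE
        self.log = []  # (cue_type, event) pairs standing in for real output

    def transition(self, new_state: AttentionState) -> None:
        old_state, self.state = self.state, new_state
        if self.has_visual_cues:
            self.log.append(("visual", new_state.value))
        if self.has_audio_cues:
            # Audio cues mark only the start and the end of Listening.
            entering = new_state is AttentionState.LISTENING
            leaving = old_state is AttentionState.LISTENING
            if entering and not leaving:
                self.log.append(("audio", "listening_start"))
            elif leaving and not entering:
                self.log.append(("audio", "listening_end"))
```

A product handling an ExpectSpeech directive would call the same transition(AttentionState.LISTENING) that it calls for the initial interaction.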
2. Voice-Initiated Products
The following guidelines are specific to voice-initiated products and extend the Core Requirements and Recommendations for those products.
2.1. Your product SHALL use only approved Amazon Alexa wake words, such as "Alexa".
2.1.1. Your product SHALL support cloud-based wake word verification.
2.1.2. Your product SHALL automatically activate its microphones without waiting for the wake word in multi-turn interactions. See also Requirement 1.5.
2.2. Your product SHALL enable customers to use voice to interrupt Alexa output (e.g. media playback, Alexa voice responses, and Alerts). See also Requirement 1.3.2.
2.3. Your product SHALL provide an always-available control to turn off the Alexa wake word or disable its microphones, putting your product into the Microphone Off state. For more, see UX Attention System.
2.3.1. Your product SHALL provide audio cues to indicate when the Microphone Off attention state is activated and deactivated.
2.3.2. Your product SHALL use visual cues to convey clearly and continually to the customer that the Alexa Microphone Off attention state is active.
2.4. Your product SHALL support enabling/disabling microphones when internet connectivity is unavailable. See also Requirement 1.6.
2.5. The microphones used for Alexa interactions on your product SHALL have +/- 1 dB sensitivity matching.
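One way to check Requirement 2.5 during hardware validation is sketched below. It rests on an assumption about how to read the requirement: every microphone's measured sensitivity must lie within 1 dB of a nominal value that you supply. The function name, the parameters, and the example figures are hypothetical; follow your microphone vendor's datasheet and test procedure for the actual measurement.

```python
def sensitivities_match(sensitivities_db, nominal_db, tolerance_db=1.0):
    """Return True if every measured sensitivity is within the tolerance.

    Assumption: "+/- 1 dB matching" means each microphone deviates from the
    nominal sensitivity by at most tolerance_db.
    """
    return all(abs(s - nominal_db) <= tolerance_db for s in sensitivities_db)


# Hypothetical measurements for a four-microphone array, in dBV/Pa.
measured = [-38.0, -37.5, -38.8, -38.2]
array_is_matched = sensitivities_match(measured, nominal_db=-38.0)
```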
3. Touch-Initiated Products
The following guidelines are specific to touch-initiated products and extend the Core Requirements and Recommendations for those products. Unless noted, the guidelines apply to both tap-to-talk and hold-to-talk products.
3.1. Your product SHALL NOT require the use of a wake word as part of the customer utterance.
3.2. Your product's microphones SHALL be disabled until the customer initiates an Alexa interaction.
3.3. Your product SHALL automatically activate its microphones without waiting for a touch interaction in multi-turn situations. The sole exception is hold-to-talk devices where the microphones are only activated when the customer holds down a device control. See also Requirement 1.5.
3.4. Your product SHALL enable customers to interrupt an Alexa output (e.g. media playback, Alexa voice responses, and Alerts) using the Action button. See also Requirement 1.3.2.
3.5. Your product SHALL use audio cues to indicate the start and end of the Listening attention state.
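Requirements 2.1.2 and 3.3 together describe when a product reopens its microphones during a multi-turn interaction. The decision can be sketched as a single function. The enum and function names are hypothetical and are not part of the AVS APIs.

```python
from enum import Enum


class InitiationType(Enum):
    VOICE = "voice-initiated"
    TAP_TO_TALK = "tap-to-talk"
    HOLD_TO_TALK = "hold-to-talk"


def should_open_microphone_on_expect_speech(
    initiation: InitiationType, button_held: bool = False
) -> bool:
    """Decide whether to open the microphones when ExpectSpeech arrives."""
    if initiation is InitiationType.HOLD_TO_TALK:
        # Sole exception in 3.3: the microphones open only while the
        # customer holds down the control.
        return button_held
    # Voice-initiated (2.1.2) and tap-to-talk (3.3) products open the
    # microphones automatically, without a wake word or another touch.
    return True
```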
4. Media Services
The following guidelines apply to all products that support media services such as Amazon Music, TuneIn, iHeartRadio, Audible and Flash Briefing. For additional information on handling competing audio outputs, please review the Alexa Voice Service Interaction Model. All of the guidelines below apply to both voice-initiated and touch-initiated products.
4.1. Your product SHALL pause or attenuate (lower speaker volume) audio output when a customer initiates an Alexa interaction during media playback.
4.1.1. Your product SHALL pause Audible content playback when interrupted by a customer.
4.1.2. If your product pauses media because of a customer interruption, it SHOULD resume playback automatically.
4.2. Your product SHALL allow customers to resume paused media using a voice request or a device control.
4.3. Your product SHOULD sufficiently buffer media so that short interruptions in internet connectivity do not disrupt playback.
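The pause-or-attenuate behavior in Requirements 4.1 through 4.1.2 can be sketched as follows. The MediaSession class, the content-type strings, and the 0 to 100 volume scale are assumptions for illustration only. A real product would also need to handle the case where the customer's request itself stops or replaces the media, which this sketch leaves out.

```python
ATTENUATED_VOLUME = 20  # hypothetical ducked level on a 0-100 scale


class MediaSession:
    """Tracks playback state for one media stream (illustrative only)."""

    def __init__(self, content_type: str, volume: int = 60, attenuate: bool = True):
        self.content_type = content_type  # e.g. "music" or "audible"
        self.volume = volume
        self.attenuate = attenuate  # product choice: attenuate instead of pause
        self.playing = True
        self._paused_by_interruption = False
        self._saved_volume = None

    def on_interaction_started(self) -> None:
        if not self.playing:
            return
        # Requirement 4.1.1: Audible content is paused, not attenuated.
        if self.content_type == "audible" or not self.attenuate:
            self.playing = False
            self._paused_by_interruption = True
        else:
            self._saved_volume = self.volume
            self.volume = min(self.volume, ATTENUATED_VOLUME)

    def on_interaction_finished(self) -> None:
        if self._paused_by_interruption:
            # Recommendation 4.1.2: resume automatically after the interruption.
            self.playing = True
            self._paused_by_interruption = False
        if self._saved_volume is not None:
            self.volume = self._saved_volume
            self._saved_volume = None
```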
5. Alerts
The following guidelines apply to delivering and controlling alerts, such as timers or alarms, and extend Requirement 1.6 of the Core Requirements and Recommendations.
5.1. Your product SHALL deliver previously scheduled alerts to customers when internet connectivity is unavailable.
5.1.1. If alerts are delivered while internet connectivity is unavailable, your product SHALL send the appropriate events for the delivered alerts to Alexa when an internet connection is reestablished. For additional information, see Alerts Overview.
5.2. Your product SHALL support the use of the Action button to stop sounding alerts. See Requirement 1.3.2.
5.3. Your product SHALL play alerts that contain voice responses, such as Reminders, at the same volume as other Alexa voice responses.
5.4. Your product SHOULD support independent volume control for alerts that do not contain voice responses, such as Timers. When a customer adjusts a product’s volume for Alexa voice responses, it SHOULD NOT affect the volume for these alerts.
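Requirements 5.1 and 5.1.1 imply that alert events raised while offline are held locally and sent once connectivity returns. A minimal sketch of that pattern is shown below. The AlertEventQueue class, the event dictionary shape, and the send_event callback are hypothetical; see the Alerts Overview for the events your product must actually send.

```python
from collections import deque


class AlertEventQueue:
    """Buffers alert events while offline and flushes them in order."""

    def __init__(self, send_event):
        self._send_event = send_event  # hypothetical callable that posts to AVS
        self._pending = deque()
        self.online = True

    def report(self, event_name: str, token: str) -> None:
        event = {"name": event_name, "token": token}
        if self.online:
            self._send_event(event)
        else:
            # The alert has already sounded locally; only the event waits.
            self._pending.append(event)

    def set_online(self, online: bool) -> None:
        self.online = online
        if online:
            # Flush in the order the events occurred.
            while self._pending:
                self._send_event(self._pending.popleft())
```

Delivering the alert itself does not depend on this queue. The product plays the previously scheduled alert from local storage, and the queue only defers reporting.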
6. Notifications
The following guidelines apply to delivering and controlling notifications.
6.1. Your product SHALL download and use the audio asset specified in the notification directive's payload.
6.1.1. If the download fails or times out, your product SHALL use the Notification sound provided by Amazon.
6.2. Your product SHOULD implement the visual Notification indicator patterns as defined in the Attention System guidance.
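The download-with-fallback behavior in Requirements 6.1 and 6.1.1 can be sketched with the Python standard library. The function names, the timeout value, and the way the default sound is passed in are assumptions for illustration. The sketch does not show how to read the asset URL from the directive payload.

```python
import urllib.request


def fetch_asset(url: str, timeout_s: float = 3.0) -> bytes:
    """Download the audio asset, raising on network errors or timeout."""
    with urllib.request.urlopen(url, timeout=timeout_s) as response:
        return response.read()


def resolve_notification_audio(asset_url: str, default_audio: bytes, fetch=fetch_asset) -> bytes:
    """Return the downloaded asset, or the Amazon-provided sound on failure."""
    try:
        data = fetch(asset_url)
    except (OSError, ValueError):
        # URLError and timeouts are OSError subclasses; a malformed URL
        # raises ValueError. Requirement 6.1.1: fall back to the default.
        return default_audio
    # Assumption: an empty body is treated as a failed download.
    return data if data else default_audio
```

The fetch parameter exists so the fallback path can be exercised in tests without network access.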
7. Visual Displays
The following requirements are specific to screen-based devices, and apply to both voice-initiated and touch-initiated products. You may display Alexa visual responses using APL, Display Cards, or both. See the APL Tech Docs for more information about APL and device viewports, including the Alexa.Presentation.APL and VisualCharacteristics interfaces. See the AVS UX Design Guidelines, TemplateRuntime Interface, and PlaybackController Interface for more information about Display Cards.
7.1. Your product SHALL display visual Alexa responses if it uses a pixel-based screen (for example, Smart TVs, Set-Top Boxes, AVRs, and MVPDs).
7.2. If you implement support for APL visual responses in your product, you SHALL implement a viewport (viewhost window) that is appropriate to your product and allows all Alexa response content to render legibly.
7.2.1. Your product SHALL NOT add, remove, or alter the data supplied by APL directives.
7.3. If you implement Display Cards you SHALL follow all requirements regarding their implementation.
7.3.1. Your product SHALL render all visual metadata to the specification for its screen size and SHALL NOT add, remove, or alter the metadata in any way when presented to the user.
7.3.2. If media is enabled, your product SHALL display all playback controls provided.
7.3.3. Any Display Cards your product uses SHOULD conform to the Display Card Design Guidelines.
8. Setup and Authentication
The Alexa setup process communicates the value of Alexa and helps customers connect your product to their Amazon account. Ideally, the Alexa setup flow should be incorporated into the setup or first run experience on your product. See the AVS UX Design Overview for more branding and style information for the Alexa setup and authentication experience.
8.1. Your product SHALL use Login With Amazon (LWA) to authenticate the customer. See the Authorization section in the Alexa Voice Service API Overview for additional information.
8.2. Your product SHALL have an Alexa setup and sign-in experience that follows the Setup and Authentication guidelines in the AVS UX Design Overview.
8.2.1. Your product SHALL have a Splash Screen before the customer enters the Login With Amazon (LWA) authentication flow. If your product does not use Code-Based Linking, your product SHALL use the AVS Hosted Splash Screen provided through the Login With Amazon authentication flow. Your Splash Screen SHALL include the required elements as defined in the AVS UX Setup and Authentication guidelines.
8.2.2. Your product SHALL have a Things to Try screen after the customer exits the Login With Amazon authentication flow. Your Things To Try screen SHALL include the required elements as defined in the AVS UX Setup and Authentication guidelines.
8.3. Your device SHOULD allow the customer to choose an Alexa language.
8.3.1. Your device SHOULD include language selection as part of product setup.
8.3.2. Your device SHOULD include an Alexa language selector in the companion app Settings.
8.3.3. If your device supports multilingual mode, it SHALL include a language selection screen which follows the format described in the locale combinations API.
8.4. Your product SHALL support logout by the customer.
8.5. You SHOULD include information on Alexa setup and use in your product's instructional materials.
9. Bluetooth
These requirements are specific to products that use the Bluetooth interface:
9.1. Your product SHOULD support the Advanced Audio Distribution Profile (A2DP) Bluetooth profile.
9.1.1. If your product supports A2DP, it SHALL support receiving digital audio streams from an A2DP SOURCE device.
9.1.2. If your product supports A2DP, it SHALL support the Audio/Video Remote Control Profile (AVRCP) Bluetooth profile.
9.2. If your product uses the Bluetooth interface, it SHALL use the Bluetooth connect and disconnect sounds provided by Amazon.