Functional Requirements for AVS Products


Customers who purchase a product with Amazon Alexa expect a familiar experience. This document provides functional and design requirements and recommendations to help you meet user expectations and avoid issues as you develop, prototype, and prepare your product with Alexa Voice Service (AVS) for commercial release.

The AVS functional and design guidelines apply to device makers planning to implement the general AVS APIs and SDKs. If your AVS implementation is more specialized, see the documentation specific to that implementation for your requirements.

Requirements subject to change

As Amazon introduces new Alexa features and functionality, these guidelines are periodically improved and updated.

Terminology

This document consistently uses the following terms to signify requirements and recommendations:

  • SHALL: Items preceded by SHALL are requirements for all commercial product releases.
  • SHOULD: Items preceded by SHOULD are recommendations for all commercial product releases and improve the Alexa user experience.

This document consistently uses the following terms to describe Alexa features and concepts:

  • Voice-initiated: Products activated by user speech for a hands-free experience or by user touch.
  • Touch-initiated: Products activated by a user physically touching a control on the product. These products don't support voice-initiated interactions.
  • Tap-to-talk: Touch-initiated products activated by the customer pushing and releasing a button before speaking.
  • Hold-to-talk: Touch-initiated products activated by the customer holding down a button when speaking.
  • Attention states: The parts of an Alexa conversation flow, including Listening and Thinking.
  • Action button: A button used to wake or interrupt Alexa. It can be a hardware or GUI button on a device, a button on a remote control for a device, or a GUI button in a companion app.
  • Device control: A control used to adjust product settings or interact with media. The control can be a hardware or GUI button on a device, a button on a remote control for a device, or a GUI button in a companion app.
  • Visual cues: Visual cues are LEDs or GUI elements that provide feedback to the user on the current Alexa state.
  • Audio cues: Audio cues are sounds that provide feedback to the user on transitions between Alexa attention states.
  • Multi-turn: A multi-turn interaction refers to situations where Alexa requests additional spoken information from the customer to complete an interaction. Multi-turn situations are initiated when your product receives an ExpectSpeech Directive from AVS.

1. Core requirements and recommendations

The following requirements and recommendations are applicable to all products with Alexa Built-in.

1.1. Your product SHALL be capable of audio input (i.e. capturing customer speech via one or more microphones) and streaming captured speech to the cloud per the specs provided in the SpeechRecognizer Interface. Customer privacy is a primary consideration for all aspects of speech capture. Customers must be able to physically control and visually discern whether the microphones are on or off. Customers must always be able to trust that the displayed state of the microphones is correct. For more details about privacy, see Privacy visual attention system.

1.1.1. The microphone ON/OFF control SHALL be hardware-based. Microphones SHALL be turned OFF by removing their power.

1.1.2. The product SHALL use a dedicated and persistent red light to indicate microphone OFF state (privacy ON).

1.2. Wearable devices, such as smart watches, SHOULD be capable of audio output. All other devices SHALL be capable of audio output, such as speaker, headphones, line out, or Bluetooth.

1.2.1. If your product provides audio output, it SHALL provide on-device controls for adjusting volume.

1.2.2. Wearable devices, such as smart watches, with no audio output capability SHALL provide haptic feedback in place of audio cues. Wearable devices with audio output SHOULD provide haptic feedback when audio output is turned off.

1.2.3. Alexa responses SHALL always be delivered to customers in full, without defects or alterations.

1.3. An Alexa Built-in product SHOULD have an Action button. For more details about Action buttons, see UX for Product Buttons.

  • If your product does include an Action button, the following requirements and recommendations apply:

1.3.1. The Action button SHALL enable customers to initiate an Alexa interaction.

1.3.2. The Action button SHALL enable customers to interrupt an Alexa output (e.g. media playback, Alexa voice responses, or Alerts). See UX Interrupt Guidance for more.

1.3.3. A voice-initiated product with an integrated touch-screen interface SHALL also display an on-screen Alert dismissal prompt.

1.3.4. The Action button SHALL be easily accessible to your customer.

1.3.5. The Action button SHOULD have the single purpose of initiating Alexa interactions.

  • If your product does not include an Action button, the following requirements and recommendations apply:

1.3.6. A voice-initiated product with an integrated touch-screen interface SHALL display an on-screen Alert dismissal prompt.

1.3.7. A voice-initiated headless product SHALL conditionally assign Alert dismissal to a physical button or touch-point on the product while an Alert is sounding. For more details about this type of conditional assignment, see Alert dismissal guidance.

1.4. Your product SHALL clearly convey core Alexa attention states to the customer using visual and audio cues. The core attention states are Listening, Thinking, Speaking, Microphone ON/OFF, Alerts, Notifications, and Do Not Disturb. See the AVS UX Design Overview for further information about the Alexa attention states.

1.4.1. The visual cues your product uses to satisfy Requirement 1.4 SHOULD be prominent. See Visual attention system prominence for further information about visual cue prominence.

1.4.2. For products that do not have prominent visual cues, Start of Request and End of Request audio cues SHALL be on by default.

1.4.3. Visual, audio, and haptic cues SHALL be synchronized to indicate when the Alexa Listening state starts and when it stops.

1.5. Your product SHALL support multi-turn interactions with Alexa.

1.5.1. Your product SHALL use the same methods for conveying the start and end of the Listening attention state for all multi-turn interactions as for the initial interaction.

1.6. Your product SHALL support silencing alerts, adjusting volume, and stopping media when internet connectivity is unavailable.

1.7. Device-level states such as Alexa Do Not Disturb ON/OFF and Microphone ON/OFF SHALL NOT be altered by customer sign-out and sign-in.
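
One way to satisfy Requirement 1.7 is to keep device-level state in a separate object from account-scoped session data, so that sign-out can never reset it. The following is a minimal sketch under that assumption. The class and field names (`DeviceState`, `Session`, `Product`) are hypothetical and are not part of any AVS API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DeviceState:
    """Device-level settings that outlive any one customer session."""
    do_not_disturb: bool = False
    # Mirrors the hardware microphone control; software only reads it.
    microphone_on: bool = True


@dataclass
class Session:
    """Account-scoped data that is cleared when the customer signs out."""
    access_token: Optional[str] = None
    preferences: dict = field(default_factory=dict)


class Product:
    def __init__(self) -> None:
        self.device = DeviceState()
        self.session = Session()

    def sign_in(self, access_token: str) -> None:
        # Only the session changes; device-level state is left untouched.
        self.session = Session(access_token=access_token)

    def sign_out(self) -> None:
        # Drop account data, but keep Do Not Disturb and microphone state.
        self.session = Session()
```

Because `sign_in` and `sign_out` only ever replace `self.session`, a product that enters Do Not Disturb before sign-out is still in Do Not Disturb after the next sign-in.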

2. Voice-initiated products

The following guidelines are specific to voice-initiated products and extend the Core Requirements and Recommendations for those products.

2.1. Your product SHALL use only approved Amazon Alexa wake words, such as "Alexa".

2.1.1. Your product SHALL support cloud-based wake word verification.

2.1.2. Your product SHALL automatically activate its microphones without waiting for the wake word in multi-turn interactions. See also Requirement 1.5.

2.2. Your product SHALL enable customers to use voice to interrupt Alexa output (e.g. media playback, Alexa voice responses, and Alerts). See also Requirement 1.3.2.

2.3. Your product SHALL provide an always-available, physical (non-GUI) control to disable the device microphones, putting your product into the Microphone Off state. For details, see UX Attention System.

2.3.1. You SHALL provide audio cues to indicate when a user activates or deactivates the Microphone Off attention state.

2.3.2. Your product SHALL use visual cues to convey clearly and continually to the customer that the Alexa Microphone Off attention state is active.

2.4. Your product SHALL support enabling/disabling microphones when internet connectivity is unavailable. See also Requirement 1.6.

2.5. The microphones used for Alexa interactions on your product SHALL have +/- 1 dB sensitivity matching.
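
Requirement 2.5 can be checked during hardware validation with a simple comparison. The sketch below assumes one possible reading of the requirement: every microphone's measured sensitivity lies within 1 dB of a nominal reference value. The function name, the nominal value, and the units used in the example are illustrative assumptions; confirm the intended measurement method with your AVS contact.

```python
from typing import Iterable

MATCHING_TOLERANCE_DB = 1.0  # the +/- 1 dB figure from Requirement 2.5


def sensitivities_match(nominal_db: float, measured_db: Iterable[float]) -> bool:
    """Return True if every microphone is within the tolerance of nominal.

    Assumes "matching" is measured against a shared nominal sensitivity.
    """
    return all(abs(value - nominal_db) <= MATCHING_TOLERANCE_DB for value in measured_db)
```

For example, with a hypothetical nominal sensitivity of -38 dB, microphones measured at -37.2 dB and -38.9 dB pass, while one measured at -36.5 dB fails.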

2.6. Your product SHALL report Wake Word Diagnostics Information (WWDI) as documented in the WWDI Integration Guide, which you can request from your AVS contact. WWDI implementation is required for AVS certification unless your product uses a third-party wake word engine, in which case you are not required to implement WWDI.

3. Touch-initiated products

The following guidelines are specific to touch-initiated products and extend the Core Requirements and Recommendations for those products. Unless noted, the guidelines apply to both tap-to-talk and hold-to-talk products.

3.1. Your product SHALL NOT require the use of a wake word as part of the user utterance.

3.2. Your microphones on the device SHALL be disabled until a user initiates an Alexa interaction.

3.3. Your product SHALL automatically activate its microphones without waiting for a touch interaction in multi-turn situations. The sole exception is hold-to-talk devices where the microphones are only activated when the customer holds down a device control. See also Requirement 1.5.
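
The multi-turn rules in Requirements 2.1.2 and 3.3 can be summarized as a single decision made when a follow-up is expected (that is, after an ExpectSpeech Directive arrives). The following is a minimal sketch; the enum and function names are hypothetical and do not come from the AVS Device SDK.

```python
from enum import Enum


class Initiation(Enum):
    VOICE = "voice"
    TAP_TO_TALK = "tap-to-talk"
    HOLD_TO_TALK = "hold-to-talk"


def open_mic_on_expect_speech(initiation: Initiation, button_held: bool = False) -> bool:
    """Return True if the microphones should open for a multi-turn follow-up."""
    if initiation is Initiation.HOLD_TO_TALK:
        # Exception in 3.3: the mic opens only while the customer holds the control.
        return button_held
    # Voice-initiated (2.1.2) and tap-to-talk (3.3): open automatically,
    # without waiting for a wake word or a new touch.
    return True
```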

3.4. Your product SHALL enable customers to interrupt an Alexa output (e.g. media playback, Alexa voice responses, and Alerts) using the Action button. See also Requirement 1.3.2.

3.5. Your product SHALL use audio cues to indicate the start and end of the Listening attention state.

4. Media services

The following guidelines apply to all products that support media services, such as Amazon Music, TuneIn, iHeartRadio, Audible, and Flash Briefing. These media service guidelines apply to both voice-initiated and touch-initiated products.

For more details about handling competing audio outputs, review the Alexa Voice Service Interaction Model.

4.1. Your product SHALL pause or lower the speaker volume for audio output when a customer initiates an Alexa interaction during media playback.

4.1.1. Your product SHALL pause Audible content playback when interrupted by a customer.

4.1.2. If your product pauses media because of a customer interruption, it SHOULD resume playback automatically.
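
Requirements 4.1 through 4.1.2 can be expressed as a small policy: choose between pausing and lowering the volume when an interaction starts, then undo that choice when it ends. The sketch below is one possible shape for that policy. The function names, the `"audible"` source label, and the action strings are illustrative assumptions rather than AVS identifiers.

```python
def interruption_action(content_source: str, prefer_pause: bool = False) -> str:
    """Choose how to treat playing media when an Alexa interaction starts."""
    if content_source.lower() == "audible":
        # 4.1.1: Audible content is always paused, never just attenuated.
        return "pause"
    # 4.1: other media may be either paused or lowered in volume.
    return "pause" if prefer_pause else "attenuate"


def action_after_interaction(action_taken: str) -> str:
    """Undo the interruption once the Alexa interaction has finished."""
    # 4.1.2: media paused by the interruption resumes automatically.
    return "resume" if action_taken == "pause" else "restore_volume"
```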

4.2. Your product SHALL allow customers to resume paused media through a voice request or a device control.

4.3. Your product SHOULD sufficiently buffer media so that short interruptions in internet connectivity don't disrupt playback.

5. Alerts

The following guidelines apply to delivering and controlling alerts, such as timers or alarms, and extend Requirement 1.6 of the Core Requirements and Recommendations.

5.1. Your product SHALL always deliver scheduled alerts to customers even when internet connectivity is unavailable.

5.1.1. If alerts are delivered while internet connectivity is unavailable, your product SHALL send the appropriate events for the delivered alerts to Alexa when an internet connection is reestablished. For additional information, see Alerts Overview.

5.2. Your product SHALL support the use of the Action button to stop sounding alerts. For more details, see Requirement 1.3.2.

5.3. Your product SHALL play alerts that contain voice responses, such as Reminders, at the same volume as other Alexa voice responses.

5.4. Your product SHOULD support independent volume control for alerts that do not contain voice responses, such as Timers. When a customer adjusts the device volume for Alexa voice responses, the adjustment SHOULD NOT affect the volume for these alerts.

5.5. A Timer or Alarm on your product SHALL sound for one hour from the scheduled time unless it is stopped with a voice request or a physical dismissal. For more details, see Requirement 5.2 and the Alerts API reference.

5.5.1. A wearable timepiece product with a native alert system, such as a smartwatch, may use a shorter alert sound duration to improve product battery life and harmonize the user experiences for native and Alexa alerts.

5.5.2. The sound durations for Timers and Alarms on a wearable timepiece product SHALL be at least 15 seconds and SHOULD be no longer than 120 seconds.

5.5.3. The sound duration of a Timer may be different from that of an Alarm on a wearable timepiece product.

5.5.4. If the Alexa and native Alerts systems have been fully unified on a product, it may be acceptable for all Alerts to use the same audio cues. This implementation requires Amazon review and approval.
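
The duration rules in Requirements 5.5 through 5.5.2 can be sketched as a helper that picks how long a Timer or Alarm sounds if the customer does not stop it. The function and constant names are hypothetical. Clamping to the 120-second upper bound is a design choice in this sketch, because that bound is a SHOULD rather than a SHALL.

```python
from typing import Optional

DEFAULT_ALERT_SECONDS = 60 * 60          # 5.5: one hour from the scheduled time
WEARABLE_MIN_SECONDS = 15                # 5.5.2: SHALL be at least 15 seconds
WEARABLE_MAX_RECOMMENDED_SECONDS = 120   # 5.5.2: SHOULD be no longer than 120 seconds


def alert_sound_duration(is_wearable_timepiece: bool,
                         requested_seconds: Optional[int] = None) -> int:
    """Return how long an undismissed Timer or Alarm sounds, in seconds."""
    if not is_wearable_timepiece or requested_seconds is None:
        # Only wearable timepieces may shorten the duration (5.5.1).
        return DEFAULT_ALERT_SECONDS
    # Keep the shortened duration inside the 15 to 120 second range.
    return max(WEARABLE_MIN_SECONDS,
               min(requested_seconds, WEARABLE_MAX_RECOMMENDED_SECONDS))
```

Per Requirement 5.5.3, a product can call this helper with different `requested_seconds` values for Timers and Alarms.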

5.6. Your product SHALL support delivery of Alerts at their scheduled time regardless of any power cycle events. If an Alert would have been delivered while the product's power is OFF, the following rules apply. For more details about API event reporting under these conditions, see Manage alerts locally.

5.6.1. Alerts scheduled for delivery within the 30 minutes before power ON SHALL be delivered when power is restored.

5.6.2. Alerts scheduled for delivery more than 30 minutes before power ON SHALL be discarded when power is restored.
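
The rules in Requirements 5.6.1 and 5.6.2 amount to sorting missed alerts by how long ago they were scheduled. The following is a minimal sketch of that sort, run once at power ON. The function name is hypothetical, and the sketch treats an alert scheduled exactly 30 minutes earlier as deliverable, because 5.6.2 discards only alerts older than 30 minutes. Event reporting to Alexa is out of scope here.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

MISSED_ALERT_WINDOW = timedelta(minutes=30)


def triage_alerts_at_power_on(
    scheduled_times: Iterable[datetime], power_on_time: datetime
) -> Tuple[List[datetime], List[datetime]]:
    """Split alerts missed while powered OFF into (deliver_now, discard)."""
    deliver_now: List[datetime] = []
    discard: List[datetime] = []
    for scheduled in scheduled_times:
        if scheduled > power_on_time:
            # Not missed: stays scheduled and sounds at its normal time.
            continue
        if power_on_time - scheduled <= MISSED_ALERT_WINDOW:
            deliver_now.append(scheduled)  # 5.6.1
        else:
            discard.append(scheduled)  # 5.6.2
    return deliver_now, discard
```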

6. Notifications

The following guidelines apply to delivering and controlling notifications.

6.1. Your product SHALL download and play the Notification arrival audio asset specified in the directive payload when a new Notification arrives.

6.1.1. If the download fails or times out, your product SHALL use the locally stored Notification arrival audio asset available from Alexa sound library for AVS.
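
The download-then-fall-back behavior in Requirements 6.1 and 6.1.1 can be sketched as follows. The function names and the 5-second timeout are assumptions made for illustration; the requirements do not specify a timeout value. The downloader is passed in as a parameter so the fallback path can be exercised without a network connection.

```python
import urllib.request
from typing import Callable


def http_download(url: str, timeout_seconds: float) -> bytes:
    """Fetch the audio asset over HTTP; raises OSError on failure or timeout."""
    with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
        return response.read()


def load_arrival_sound(
    asset_url: str,
    local_asset: bytes,
    download: Callable[[str, float], bytes] = http_download,
    timeout_seconds: float = 5.0,  # assumed value, not from the requirements
) -> bytes:
    """Return the Notification arrival audio, falling back to the local copy."""
    try:
        return download(asset_url, timeout_seconds)
    except OSError:
        # Covers URLError, connection errors, and timeouts (6.1.1).
        return local_asset
```

Here `asset_url` stands for the asset location taken from the directive payload, and `local_asset` for the locally stored sound from the Alexa sound library for AVS.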

6.2. When un-retrieved Notifications are in queue, the Notifications-in-Queue visual cue SHALL be persistently visible whenever your product is powered ON.

6.2.1. If a new Notification is delivered while your product is powered OFF or is in STBY mode, the one-time Notification arrival audio cue SHALL play and the persistent Notifications-in-Queue visual cue SHALL begin as soon as the product returns to power ON state.

6.2.2. If all Notifications-in-Queue have been retrieved by the customer elsewhere while your product is powered OFF or is in STBY mode, your product SHALL synchronize its Notification queue state to empty (with no visual cue) within 10 seconds after it returns to power ON state.

6.2.3. If your product is a TV or STB, the on-screen Notifications-in-Queue visual cue SHALL be persistently accessible via the system home, system settings menu, or on-device Alexa app and can also be displayed as a triggered, temporary reminder if media is not playing.

7. Visual displays

The following requirements are specific to screen-based devices, and apply to both voice-initiated and touch-initiated products. Display Alexa visual responses by implementing the Alexa Presentation Language (APL), Display Cards, or both.

7.1. Your product SHALL display visual Alexa responses if it uses a pixel-based screen, such as Smart TVs, Set-Top Boxes, AVRs, and MVPDs.

7.2. If you implement support for APL visual responses in your product, you SHALL implement an appropriate viewport or viewhost window for your device that allows all Alexa response content to render legibly.

7.2.1. Your product SHALL NOT add, remove, or alter the data supplied by APL directives.

7.3. If you implement Display Cards, you SHALL follow all requirements regarding their implementation.

7.3.1. Your product SHALL render all visual metadata to the specification for its screen size and SHALL NOT add, remove, or alter the metadata in any way when presented to the user.

7.3.2. If media is enabled, your product SHALL display all playback controls provided.

7.3.3. Any Display Cards your product uses SHOULD conform to the Display Card Design Guidelines.

7.4. If your product has a camera, the camera ON/OFF control SHALL be hardware-based.

7.4.1. The camera SHALL be turned OFF by removing power.

7.4.2. The product SHALL visibly indicate camera ON/OFF state.

7.4.3. The product SHOULD have a physical cover or shutter for the camera.

8. Setup and authentication

The Alexa setup process communicates the value of Alexa to your users and helps them connect your product to their Amazon account. Ideally, the Alexa setup flow should be incorporated into the setup or first run experience on your product. See the AVS UX Design Overview for more branding and style information for the Alexa setup and authentication experience.

8.1. Your product SHALL use Login With Amazon (LWA) to authenticate the customer. See the Authorization section in the Alexa Voice Service API Overview for additional information.

8.2. Your product SHALL have an Alexa setup/sign in experience that follows the Setup and Authentication guidelines in the AVS UX Design Overview.

8.2.1. Your product SHALL have a Splash Screen before the customer enters the Login With Amazon (LWA) authentication flow. If your product doesn't use Code-Based Linking, your product SHALL use the AVS Hosted Splash Screen provided through the Login With Amazon authentication flow. Your Splash Screen SHALL include the required elements as defined in the AVS UX Setup and Authentication guidelines.

8.2.2. Your product SHALL have a Things to Try screen after the customer exits the Login With Amazon authentication flow. Your Things To Try screen SHALL include the required elements as defined in the AVS UX Setup and Authentication guidelines.

8.3. Your product SHOULD allow the customer to choose an Alexa language.

8.3.1. Your product SHOULD include Alexa language selection as part of product setup.

8.3.2. Your product SHOULD include an Alexa language selector in the companion app or on-device Settings menu.

8.3.3. A product that supports multiple locales SHALL support multilingual mode for all possible locale pairs. Refer to System Interface and AVS Device SDK Release Notes documentation to determine which locale pairs are possible for your product.

8.3.4. If your product supports multilingual mode, it SHALL include an Alexa language selection screen which follows the format described in the locale combinations API.

8.4. Your product SHALL support logout by the customer.

8.5. You SHOULD include information about Alexa setup and use in your product's instructional materials.

9. Skills

Your product SHALL support product-appropriate Alexa Skill experiences.

9.1. Customers SHALL be able to successfully enable and experience basic and custom Alexa Skills on an ABI product.

9.2. Customers SHALL be able to successfully enable and experience multimodal Alexa Skills on a multimodal ABI product.

9.3. Customers SHALL be able to successfully enable and experience declared Alexa Video Skills on a supporting ABI product.

9.4. Customers SHALL be able to successfully enable and experience declared Alexa Smart Home Skills on a supporting ABI product.

10. Bluetooth

These requirements are specific to products that use the Bluetooth interface:

10.1. Your product SHOULD support the Advanced Audio Distribution Profile (A2DP) Bluetooth profile.

10.1.1. If your product supports A2DP, it SHALL support receiving digital audio streams from an A2DP SOURCE device.

10.1.2. If your product supports A2DP, it SHALL support the Audio/Video Remote Control Profile (AVRCP) Bluetooth profile.

10.2. If your product uses the Bluetooth interface, it SHALL use the Bluetooth connect and disconnect sounds provided by Amazon.

11. Reporting

Reporting requirements provide information to AVS about the distribution of AVS software versions. The following reporting requirements apply to all new products entering the AVS certification process after January 2, 2021:

11.1. Your product SHALL report a valid product firmware version to Alexa through the SoftwareInfo event in the System interface.

11.2. Your product SHALL report all capability interface versions that devices support through the Alexa.Discovery interface.

11.3. Your product SHALL declare support for the Alexa.SoftwareComponentReporter interface.

11.3.1. Products that implement the AVS Device SDK must use the Alexa.SoftwareComponentReporter interface to report their AVS Device SDK version to Alexa with the component name com.amazon.alexa.deviceSDK. Products that don't use the AVS Device SDK must still assert support for the Alexa.SoftwareComponentReporter interface and omit the entry for com.amazon.alexa.deviceSDK.
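
The two cases in Requirement 11.3.1 can be illustrated by building the capability entry for Alexa.SoftwareComponentReporter with and without the SDK component. The sketch below assumes a payload shape with a `configurations.softwareComponents` list of name and version pairs; verify the field names and interface version against the Alexa.SoftwareComponentReporter reference before relying on them. The function name and the example version strings are hypothetical.

```python
from typing import Dict, Optional

SDK_COMPONENT_NAME = "com.amazon.alexa.deviceSDK"


def software_component_capability(
    sdk_version: Optional[str],
    other_components: Optional[Dict[str, str]] = None,
) -> dict:
    """Build a capability entry; pass sdk_version=None if the SDK is not used."""
    components = []
    if sdk_version is not None:
        # Products built on the AVS Device SDK report its version here.
        components.append({"name": SDK_COMPONENT_NAME, "version": sdk_version})
    for name, version in (other_components or {}).items():
        components.append({"name": name, "version": version})
    return {
        "type": "AlexaInterface",                      # assumed field values
        "interface": "Alexa.SoftwareComponentReporter",
        "version": "1.0",
        "configurations": {"softwareComponents": components},
    }
```

A product that does not use the AVS Device SDK still returns the capability entry, so it asserts support for the interface, but the com.amazon.alexa.deviceSDK component is absent from the list.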

Last updated: Dec 11, 2023