Overview of the Alexa Voice Service (AVS) Device SDK

The Alexa Voice Service (AVS) Device SDK provides you with a set of C++ libraries to build an Alexa Built-in product. With these libraries, your device has direct access to cloud-based Alexa capabilities and can receive voice responses instantly. Your device can be almost anything – a smartwatch, a speaker, headphones – the choice is yours.

The SDK is modular and abstract. It provides separate components to handle necessary Alexa functionality including processing audio, maintaining persistent connections, and managing Alexa interactions.

Each component exposes Alexa APIs to customize your device integrations as needed. The SDK also includes a sample app that you can use to test interactions before integration.

Release notes

For a complete list of releases, updates, and known bugs, see the SDK release notes.

Version         Release date
1.26.0          November 15, 2021
1.25.0          August 24, 2021
1.24.0          June 4, 2021
1.23.0          March 29, 2021
1.22.0          December 8, 2020
1.21.0          October 26, 2020
1.20.1          August 6, 2020
1.20.0          June 22, 2020
1.19.1          April 27, 2020
1.19.0          April 13, 2020
1.18.0          February 19, 2020
1.17.0          December 10, 2019
1.16.0          October 25, 2019
1.15.0          September 25, 2019
1.14.0          July 09, 2019
1.13.0          May 05, 2019
1.12.1          April 02, 2019
1.12.0          February 25, 2019
Older versions  SDK release notes

SDK architecture

The following diagram illustrates components of the SDK and how data flows between them.

The green boxes are official components of the SDK – they include the following items:

  • Audio Input Processor (AIP)
  • Shared Data Stream (SDS)
  • Alexa Communications Library (ACL)
  • Alexa Directive Sequencer Library (ADSL)
  • Activity Focus Manager Library (AFML)
  • Capability Agent

The white and blue boxes aren't official components and depend on external libraries – these include the following items:

  • Audio Signal Processor (ASP)
  • Wake Word Engine (WWE)
  • Media Player

For general information about Alexa and client interaction, see the Interaction Model.

[Diagram: AVS Device SDK architecture]

Here's an example interaction with the SDK. This process might vary if you've added or removed any components.

  1. You ask a question, "Alexa, what is the weather?"
  2. The microphone captures the audio and writes it to the SDS.
  3. The WWE is always monitoring the SDS. When the WWE detects the wake word Alexa, it sends the audio to the AIP.
  4. The AIP sends a SpeechRecognizer event to AVS using the ACL.
  5. AVS processes the event and sends the appropriate directive back down through the ACL. The SDS then picks up the directive and sends it to the ADSL.
  6. The ADSL examines the header of the payload and determines what Capability Agent it must call.
  7. When the Capability Agent activates, it requests focus from the AFML.
  8. The Media Player plays the directive. For this example, Alexa responds with "The weather is nine degrees and cloudy with a chance of rain."

Here are some details about each individual component in the sequence.

Audio Signal Processor (ASP)

The ASP isn't a component of the AVS Device SDK. It's software running on a System on a Chip (SoC) or firmware on a dedicated Digital Signal Processor (DSP). Its job is to clean up the audio and produce a single audio stream, even if your device uses a multi-microphone array. Techniques used to clean the audio include Acoustic Echo Cancellation (AEC), noise suppression, beamforming, Voice Activity Detection (VAD), Dynamic Range Compression (DRC), and equalization.

Shared Data Stream (SDS)

The SDS is a single-producer, multi-consumer audio input buffer that transports data between a single writer and one or more readers. This ring buffer moves data between the components of the SDK without duplication. Because the buffer continuously overwrites itself, its memory footprint stays small. The SDS operates on product-specific and user-specified memory segments, which allows for interprocess communication. Keep in mind that the writer and readers might be in different threads or processes.

SDS handles these key tasks:

  1. Receives audio from the ASP and then passes it to the WWE.
  2. Passes the audio from the WWE to the ACL. The ACL then passes the audio to AVS for processing.
  3. Receives data attachments back from the ACL and passes them to the appropriate Capability Agent.
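
To make the single-writer, multi-reader idea concrete, here's a minimal sketch of a ring buffer in which every reader keeps its own position. It isn't the SDK's SDS implementation: the class name `RingBufferSketch` and its methods are hypothetical, the capacity must be greater than zero, and the code is single-threaded for brevity, so a real buffer would need synchronization between the writer and its readers.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration only; not the SDK's Shared Data Stream API.
// One writer appends samples; each reader keeps its own cursor, so the
// audio is stored once no matter how many readers consume it.
class RingBufferSketch {
public:
    // Assumes capacity > 0.
    explicit RingBufferSketch(std::size_t capacity) : m_data(capacity), m_written(0) {}

    // Writer side: overwrites the oldest samples once the buffer is full.
    void write(const std::vector<std::int16_t>& samples) {
        for (std::int16_t sample : samples) {
            m_data[m_written % m_data.size()] = sample;
            ++m_written;
        }
    }

    // Reader side: copies up to maxSamples starting at the reader's cursor
    // and advances the cursor. If the writer has lapped the reader, the
    // cursor first skips ahead to the oldest sample that is still stored.
    std::vector<std::int16_t> read(std::size_t& cursor, std::size_t maxSamples) const {
        if (m_written - cursor > m_data.size()) {
            cursor = m_written - m_data.size();
        }
        std::vector<std::int16_t> out;
        while (cursor < m_written && out.size() < maxSamples) {
            out.push_back(m_data[cursor % m_data.size()]);
            ++cursor;
        }
        return out;
    }

private:
    std::vector<std::int16_t> m_data;
    std::size_t m_written;  // Total number of samples ever written.
};
```

In this sketch, a wake word reader and an audio input reader would each hold their own cursor and call `read` on the same buffer. A slow reader loses the oldest audio instead of blocking the writer.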

Wake Word Engine (WWE)

The WWE is software that constantly monitors the SDS, waiting for a preconfigured wake word. When the WWE detects the correct wake word, it notifies the AIP to begin reading the audio. When using the AVS Device SDK, the wake word is always "Alexa."

The WWE consists of the following two binary interfaces:

  • Interface 1 – Handles general wake word detection.
  • Interface 2 – Handles specific wake word models.
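
The notification step can be pictured as an observer callback. The following is a rough sketch under stated assumptions, not the SDK's wake word interfaces: `WakeWordDetectorSketch` is a hypothetical class, and a plain substring search over text stands in for a real acoustic model. It shows only how a detector might tell its observers where the wake word begins and ends.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical illustration only; not the SDK's wake word interfaces.
class WakeWordDetectorSketch {
public:
    // Observers receive the keyword plus its begin and end offsets.
    using Observer =
        std::function<void(const std::string& keyword, std::size_t begin, std::size_t end)>;

    explicit WakeWordDetectorSketch(std::string keyword) : m_keyword(std::move(keyword)) {}

    void addObserver(Observer observer) { m_observers.push_back(std::move(observer)); }

    // Stand-in for a real acoustic model: searches a text "stream" for the
    // keyword. Notifies every observer and returns true if it is found.
    bool process(const std::string& stream) const {
        const std::size_t begin = stream.find(m_keyword);
        if (begin == std::string::npos) {
            return false;
        }
        for (const auto& observer : m_observers) {
            observer(m_keyword, begin, begin + m_keyword.size());
        }
        return true;
    }

private:
    std::string m_keyword;
    std::vector<Observer> m_observers;
};
```

Passing the offsets to the observer lets the reader of the audio start from the position of the wake word rather than from wherever the stream happens to be.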

Audio Input Processor (AIP)

Responsibilities of the AIP include reading audio from the SDS and then sending it to AVS for processing. The AIP also includes the logic to switch between different audio input sources. The AIP triggers with the following inputs:

  • External audio – Captured with on-device microphones, remote microphones and other audio input sources.
  • Tap-to-Talk – Captured with designated Tap-to-Talk inputs.
  • Speech directive – Sent from AVS to continue an interaction, for example, in a multiturn dialog.

When triggered, the AIP continues to stream audio until it receives a Stop directive or times out. AVS can only receive one audio input source at any given time.
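
The trigger, stop, and timeout behavior described above can be sketched as a small state holder. This is an illustration under stated assumptions rather than the SDK's AIP: the names `AudioInputSketch` and `Trigger` are hypothetical, and the clock is simulated through `advance` so that the example stays deterministic.

```cpp
#include <chrono>

// Hypothetical illustration only; not the SDK's Audio Input Processor API.
enum class Trigger { ExternalAudio, TapToTalk, SpeechDirective };

class AudioInputSketch {
public:
    explicit AudioInputSketch(std::chrono::milliseconds timeout)
        : m_timeout(timeout), m_elapsed(0), m_streaming(false), m_source(Trigger::ExternalAudio) {}

    // Starts streaming from the given source. Returns false if a stream is
    // already active, because only one input source is sent at a time.
    bool start(Trigger source) {
        if (m_streaming) {
            return false;
        }
        m_streaming = true;
        m_source = source;
        m_elapsed = std::chrono::milliseconds(0);
        return true;
    }

    // Called when a Stop directive arrives.
    void onStopDirective() { m_streaming = false; }

    // Simulated clock: streaming ends once the timeout has elapsed.
    void advance(std::chrono::milliseconds delta) {
        if (!m_streaming) {
            return;
        }
        m_elapsed += delta;
        if (m_elapsed >= m_timeout) {
            m_streaming = false;
        }
    }

    bool isStreaming() const { return m_streaming; }
    Trigger source() const { return m_source; }

private:
    std::chrono::milliseconds m_timeout;
    std::chrono::milliseconds m_elapsed;
    bool m_streaming;
    Trigger m_source;
};
```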

Alexa Communications Library (ACL)

The ACL manages the network connection between the SDK and AVS. The ACL performs the following key functions:

  • Establishes and maintains long-lived persistent connections with AVS. ACL adheres to the messaging specification detailed in Managing an HTTP/2 Connection with AVS.
  • Provides message sending and receiving capabilities. These capabilities include support for JSON-formatted text and binary audio content. For more details, see Structuring an HTTP/2 Request to AVS.
  • Forwards incoming directives to the ADSL.
  • Handles disconnections and reconnections. If the device disconnects, the ACL automatically attempts to reconnect.
  • Manages secure connections.
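
As an illustration of automatic reconnection, here's a minimal retry loop with a capped exponential backoff. The backoff policy, the function names `retryDelay` and `reconnect`, and all parameters are assumptions made for this sketch; they don't describe the retry schedule that the ACL actually uses.

```cpp
#include <algorithm>
#include <chrono>
#include <functional>

// Hypothetical illustration only; not the ACL's actual retry policy.
// Delay before retry number `attempt` (0-based): base * 2^attempt,
// capped at maxDelay.
std::chrono::milliseconds retryDelay(
    unsigned attempt,
    std::chrono::milliseconds base,
    std::chrono::milliseconds maxDelay) {
    std::chrono::milliseconds delay = base;
    for (unsigned i = 0; i < attempt && delay < maxDelay; ++i) {
        delay *= 2;
    }
    return std::min(delay, maxDelay);
}

// Calls `connect` until it succeeds or maxAttempts is reached. `wait` is
// injected so that the caller decides how to sleep between attempts.
bool reconnect(
    const std::function<bool()>& connect,
    const std::function<void(std::chrono::milliseconds)>& wait,
    unsigned maxAttempts,
    std::chrono::milliseconds base,
    std::chrono::milliseconds maxDelay) {
    for (unsigned attempt = 0; attempt < maxAttempts; ++attempt) {
        if (connect()) {
            return true;
        }
        if (attempt + 1 < maxAttempts) {
            wait(retryDelay(attempt, base, maxDelay));
        }
    }
    return false;
}
```

Injecting `wait` keeps the loop testable. In a product, it could wrap `std::this_thread::sleep_for` or a timer owned by your event loop.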

Alexa Directive Sequencer Library (ADSL)

The ADSL manages incoming directives, as outlined in the AVS Interaction Model. The ADSL performs the following key functions:

  1. Accepts directives from the ACL.
  2. Manages the lifecycle of each directive, including queuing, reordering, or canceling directives as necessary.
  3. Forwards directives to the appropriate Capability Agents by examining the directive header and reading the namespace of the interface.
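
The forwarding step can be sketched as a lookup keyed by the namespace in the directive header. This is a simplified illustration that leaves out queuing, reordering, and cancellation. `DirectiveSketch` and `DirectiveRouterSketch` are hypothetical names and not part of the SDK.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical illustration only; not the SDK's directive sequencer API.
struct DirectiveSketch {
    std::string nameSpace;  // Interface namespace from the directive header.
    std::string name;       // Directive name from the directive header.
    std::string payload;    // Raw payload, left unparsed in this sketch.
};

class DirectiveRouterSketch {
public:
    using Handler = std::function<void(const DirectiveSketch&)>;

    // Registers the handler that stands in for a Capability Agent.
    void registerHandler(const std::string& nameSpace, Handler handler) {
        m_handlers[nameSpace] = std::move(handler);
    }

    // Forwards the directive to the handler registered for its namespace.
    // Returns false if no handler is registered.
    bool route(const DirectiveSketch& directive) const {
        const auto it = m_handlers.find(directive.nameSpace);
        if (it == m_handlers.end()) {
            return false;
        }
        it->second(directive);
        return true;
    }

private:
    std::map<std::string, Handler> m_handlers;
};
```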

Capability Agents

A Capability Agent performs the desired action on a device. Capability Agents map directly to interfaces supported by AVS. For example, if you ask Alexa to play a song, a Capability Agent loads the song into your media player and plays it. A Capability Agent performs the following two tasks:

  1. Receives the appropriate directive from the ADSL.
  2. Reads the payload and performs the requested action on the device.
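
As a rough sketch of these two tasks, the following toy agent handles volume directives. It's modeled loosely on the Speaker interface, but it isn't the SDK's SpeakerManager: the types are hypothetical, and the payload is assumed to be already parsed into a single integer so that the example doesn't need a JSON library.

```cpp
#include <algorithm>
#include <string>

// Hypothetical illustration only; not the SDK's SpeakerManager.
// The payload is assumed to be parsed already into one integer value.
struct SpeakerDirectiveSketch {
    std::string name;  // For example "SetVolume", "AdjustVolume", or "SetMute".
    long value;        // Volume, volume delta, or mute flag (0 or 1).
};

class SpeakerAgentSketch {
public:
    SpeakerAgentSketch() : m_volume(0), m_muted(false) {}

    // Task 1: receive the directive. Task 2: perform the requested action.
    // Returns false for directives that this agent doesn't understand.
    bool handle(const SpeakerDirectiveSketch& directive) {
        if (directive.name == "SetVolume") {
            m_volume = clampVolume(directive.value);
        } else if (directive.name == "AdjustVolume") {
            m_volume = clampVolume(m_volume + directive.value);
        } else if (directive.name == "SetMute") {
            m_muted = directive.value != 0;
        } else {
            return false;
        }
        return true;
    }

    long volume() const { return m_volume; }
    bool muted() const { return m_muted; }

private:
    // Keeps the volume in the range 0 to 100.
    static long clampVolume(long value) { return std::max(0L, std::min(100L, value)); }

    long m_volume;
    bool m_muted;
};
```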

The following table maps the core AVS Interfaces to their equivalent AVS Device SDK Capability Agents. For a complete list of SDK interfaces, browse the SDK source files on GitHub.

AVS Interface        SDK Capability Agent   Description
Alerts               Alerts                 Setting, stopping, and deleting timers and alarms.
AudioPlayer          AudioPlayer            Managing and controlling audio playback.
Bluetooth            Bluetooth              Managing Bluetooth connections between peer devices and Alexa Built-in products.
DoNotDisturb         DoNotDisturb           Enabling the Do Not Disturb feature.
EqualizerController  Equalizer              Adjusting equalizer settings, such as decibel (dB) levels and modes.
InteractionModel     InteractionModel       Enabling a client to support complex interactions initiated by Alexa, such as Alexa Routines.
Notifications        Notifications          Displaying notification indicators.
PlaybackController   PlaybackController     Navigating a playback queue with GUI or buttons.
Multi-room Music     Multi-room music       Implementing the Multi-room Music (MRM) feature.
Speaker              SpeakerManager         Volume control, including mute and unmute.
SpeechRecognizer     Audio Input Processor  Speech capture.
SpeechSynthesizer    SpeechSynthesizer      Alexa speech output.
System               System                 Communicating product status/state to AVS.
TemplateRuntime      TemplateRuntime        Rendering visual metadata.

Activity Focus Manager Library (AFML)

The AFML makes sure the SDK handles directives in the correct order. It determines which capability has control over the input and output of the device at any time. For example, if you're playing music and an alarm goes off on your device, the alarm takes focus over the music. The music pauses and the alarm rings.

Focus uses a concept called channels to govern the prioritization of audiovisual inputs and outputs.

Channels exist in the foreground or background. At any given time, only one channel can inherit the foreground state and take focus. If more than one channel is active, a device must respect the following priority order: Dialog > Alerts > Content. When a channel in the foreground becomes inactive, the next active channel in the priority order moves into the foreground.

Focus management isn't specific to Capability Agents or Directive Handlers. Agents that aren't related to Alexa also use it. Because all agents go through the AFML, focus stays consistent across a device.
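
The channel priority rule can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the AFML itself: `FocusManagerSketch`, `Channel`, and `FocusState` are hypothetical names, and the sketch tracks only which channels are active.

```cpp
#include <set>

// Hypothetical illustration only; not the SDK's focus manager API.
// A lower value means a higher priority: Dialog > Alerts > Content.
enum class Channel { Dialog = 0, Alerts = 1, Content = 2 };
enum class FocusState { Foreground, Background, None };

class FocusManagerSketch {
public:
    void acquire(Channel channel) { m_active.insert(channel); }
    void release(Channel channel) { m_active.erase(channel); }

    // The highest-priority active channel is in the foreground, every
    // other active channel is in the background, and inactive channels
    // have no focus.
    FocusState stateOf(Channel channel) const {
        if (m_active.count(channel) == 0) {
            return FocusState::None;
        }
        return *m_active.begin() == channel ? FocusState::Foreground : FocusState::Background;
    }

private:
    std::set<Channel> m_active;  // Ordered by priority, highest first.
};
```

With music on the Content channel, acquiring Alerts moves Content to the background. Releasing Alerts moves Content back to the foreground, which matches the alarm example earlier in this section.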

Media player

The media player isn't a component of the AVS Device SDK. The SDK comes with wrappers for GStreamer and Android Media Player. If you want to use a different media player, you must build a wrapper for it with the MediaPlayer interface. For more details about custom media players, see media player.
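
The general shape of such a wrapper is an adapter: a class that implements the interface the SDK expects and delegates to your player. The sketch below uses a deliberately simplified, hypothetical interface (`PlayerInterfaceSketch`) and a made-up `ThirdPartyPlayer`. The real MediaPlayer interface has more methods and different signatures, so treat this only as an outline of the pattern.

```cpp
#include <string>

// Hypothetical, simplified stand-in for the interface a wrapper implements.
class PlayerInterfaceSketch {
public:
    virtual ~PlayerInterfaceSketch() = default;
    virtual bool setSource(const std::string& url) = 0;
    virtual bool play() = 0;
    virtual bool stop() = 0;
};

// Made-up third-party player with its own, incompatible API.
class ThirdPartyPlayer {
public:
    ThirdPartyPlayer() : m_running(false) {}
    void open(const std::string& uri) { m_uri = uri; }
    void start() { m_running = true; }
    void halt() { m_running = false; }
    bool running() const { return m_running; }

private:
    std::string m_uri;
    bool m_running;
};

// The wrapper translates interface calls into calls on the wrapped player.
class ThirdPartyPlayerWrapper : public PlayerInterfaceSketch {
public:
    ThirdPartyPlayerWrapper() : m_hasSource(false) {}

    bool setSource(const std::string& url) override {
        if (url.empty()) {
            return false;
        }
        m_player.open(url);
        m_hasSource = true;
        return true;
    }

    // Fails if no source has been set.
    bool play() override {
        if (!m_hasSource) {
            return false;
        }
        m_player.start();
        return true;
    }

    // Fails if nothing is playing.
    bool stop() override {
        if (!m_player.running()) {
            return false;
        }
        m_player.halt();
        return true;
    }

    bool isPlaying() const { return m_player.running(); }

private:
    ThirdPartyPlayer m_player;
    bool m_hasSource;
};
```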

Important considerations

Security requirements

All Alexa products must meet the AVS Security Requirements. When building the AVS Device SDK, you are required to adhere to the following security principles.

  • Protect configuration parameters, such as those found in the AlexaClientSDKConfig.json file, from tampering and inspection.
  • Protect executable files and processes from tampering and inspection.
  • Protect persistent states of the SDK from tampering and inspection.
  • Your C++ implementation of AVS Device SDK interfaces must not retain locks, crash, stop responding, or throw exceptions.
  • Use exploit mitigation flags and memory randomization techniques when you compile your source code to help prevent exploitation of buffer overflows and memory corruption.
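
For the requirement that interface implementations never throw, one possible approach is to catch exceptions at the boundary and report failure through a return value. The helper below is a minimal sketch of that idea. `runWithoutThrowing` is a hypothetical name, not an SDK function, and the helper doesn't address the other requirements in the list, such as avoiding retained locks.

```cpp
#include <functional>

// Hypothetical illustration only; not an SDK function.
// Runs `action` and converts any exception into a `false` return value,
// so that nothing propagates to the caller.
bool runWithoutThrowing(const std::function<void()>& action) noexcept {
    try {
        action();
        return true;
    } catch (...) {
        return false;
    }
}
```

An implementation of an SDK interface method could place its body inside such a wrapper, then log the failure and recover instead of letting the exception escape.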

Last updated: Nov 15, 2021