About the Alexa Voice Service (AVS) Interaction Model

A device that interacts with the Alexa Voice Service (AVS) encounters events and directives that produce competing audio. For example, a user might ask a question while Alexa is speaking, or a scheduled alarm might sound while music is streaming. The rules that govern the prioritization and handling of these inputs and outputs make up the AVS interaction model.

InteractionModel API

Implement InteractionModel 1.2 to enable Alexa Routines for your product. InteractionModel 1.2 includes the NewDialogRequest directive and modifications to the voice request lifecycle of the AVS interaction model.

Device vs. AVS-initiated audio interactions

Either the device or AVS might begin an audio interaction:

  • Device-initiated interactions – In a device-initiated interaction, the device sends an event to AVS. AVS processes the event and then returns any appropriate directives to the device in response. For example, when a user asks Alexa, "What time is it?", the device streams the captured user audio to AVS. After AVS processes the event, AVS returns a directive that instructs the device to output speech, such as, "It's 10:00 AM."
  • AVS-initiated interactions – In an AVS-initiated interaction, the device receives directives without any preceding device events. For example, when a user adjusts device volume from the Amazon Alexa app, the device doesn't send an event to Alexa. Alexa interprets the action taken in the Amazon Alexa app and sends a directive to the device, which the device then acts upon.

Send each event to AVS in its own event stream. AVS might return directives and corresponding audio attachments in the same stream or in a separate downchannel stream. The downchannel stream delivers AVS-initiated directives to your device. The downchannel remains open in a half-closed state from the device and open from the Alexa Voice Service for the life of a connection. You have several options for implementing event and downchannel streams, depending on transport protocol. For more details on establishing both event and downchannel streams over HTTP/2, see Managing an HTTP/2 Connection.

Voice request lifecycle

When a device sends events to AVS, make sure that the device enforces the following rules:

  • Your device must create a unique dialogRequestId for each Recognize event the device sends to AVS. The dialogRequestId correlates the Recognize event with directives sent to your device from AVS.
  • Don't reuse any dialogRequestId within a session.
  • Include the dialogRequestId in the Recognize event header.
  • Keep track of the active dialogRequestId.
  • The dialogRequestId remains active until the device sends the next Recognize event to AVS. After the device sends the next Recognize event, cancel any directives associated with older dialogRequestIds.
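The event-side rules above can be sketched in a few lines. This is a minimal illustration, not an official implementation: the DialogRequestTracker class and its method names are hypothetical, and the header layout (namespace, name, messageId, dialogRequestId) is assumed from the general shape of AVS event headers, so verify it against the SpeechRecognizer reference before relying on it.

```python
import uuid


class DialogRequestTracker:
    """Hypothetical helper that issues and tracks dialogRequestIds."""

    def __init__(self):
        self.active_id = None
        self._used = set()  # every dialogRequestId issued in this session

    def new_recognize_header(self):
        """Build a Recognize event header with a fresh dialogRequestId."""
        dialog_request_id = str(uuid.uuid4())
        # A uuid4 collision is practically impossible; the loop only makes
        # the "never reuse an ID within a session" rule explicit.
        while dialog_request_id in self._used:
            dialog_request_id = str(uuid.uuid4())
        self._used.add(dialog_request_id)
        # The newest Recognize event always owns the active ID.
        self.active_id = dialog_request_id
        return {
            "namespace": "SpeechRecognizer",
            "name": "Recognize",
            "messageId": str(uuid.uuid4()),  # assumed header field
            "dialogRequestId": dialog_request_id,
        }

    def is_active(self, dialog_request_id):
        """Return True if directives carrying this ID should still run."""
        return dialog_request_id == self.active_id
```

A device would call new_recognize_header() once per Recognize event and use is_active() to decide whether a queued directive belongs to the current request or should be canceled.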

If AVS initiates an interaction, AVS sends a NewDialogRequest directive with a dialogRequestId in the payload. This dialogRequestId replaces any older dialogRequestIds. Cancel any directives associated with the previous dialogRequestId.

When AVS sends directives to a device, make sure that the device enforces the following rules:

  • Process, in sequence, directives whose header dialogRequestId matches the active dialogRequestId.
  • Set the dialogRequestId in the payload of an InteractionModel.NewDialogRequest directive as the active dialogRequestId, and then implement the directives.
  • Process, in sequence, directives whose header dialogRequestId matches the dialogRequestId from a NewDialogRequest directive.
  • Implement directives without a dialogRequestId upon receipt.
  • Send an ExceptionEncountered event to AVS when your device encounters new or unknown directives.
  • If your device receives a Speak directive, play back the associated audio in full before processing subsequent directives.
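The directive-side rules above could be routed roughly as follows. This is a sketch under stated assumptions: DirectiveProcessor, the handlers mapping, and the send_exception callback are hypothetical names, directives are modeled as plain dictionaries with "header" and "payload" keys, and handlers are assumed to be synchronous, so a Speak handler that returns only after playback completes satisfies the rule about finishing speech before moving on.

```python
from collections import deque


class DirectiveProcessor:
    """Hypothetical router that applies the dialogRequestId rules."""

    def __init__(self, handlers, send_exception):
        self.handlers = handlers              # {(namespace, name): callable}
        self.send_exception = send_exception  # reports unknown directives
        self.active_id = None
        self.queue = deque()                  # directives for the active ID

    def set_active_id(self, dialog_request_id):
        """Activate a new ID and cancel directives queued for the old one."""
        if dialog_request_id != self.active_id:
            self.queue.clear()
            self.active_id = dialog_request_id

    def on_directive(self, directive):
        header = directive["header"]
        key = (header["namespace"], header["name"])
        if key == ("InteractionModel", "NewDialogRequest"):
            # AVS-initiated interaction: the payload carries the new ID.
            self.set_active_id(directive["payload"]["dialogRequestId"])
            return
        if key not in self.handlers:
            # Unknown directive: report it with ExceptionEncountered.
            self.send_exception(directive)
            return
        dialog_request_id = header.get("dialogRequestId")
        if dialog_request_id is None:
            self.handlers[key](directive)     # no ID: implement on receipt
        elif dialog_request_id == self.active_id:
            self.queue.append(directive)      # matching ID: run in sequence
        # Any other ID belongs to an older request and is dropped.

    def run_pending(self):
        """Run queued directives one at a time, in arrival order."""
        while self.queue:
            directive = self.queue.popleft()
            header = directive["header"]
            self.handlers[(header["namespace"], header["name"])](directive)
```

Call set_active_id() whenever the device sends a new Recognize event, so that both device-initiated and AVS-initiated interactions replace the active ID through the same path.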

For an example, see AudioInputProcessor.cpp in the AVS Device SDK.

Channels

Channels help your device to determine priority for audio inputs and outputs. A channel can be either active or inactive and can be in the foreground or in the background at any given time.

Types of channels

Organize all audio handled by your device into three types of channels:

  • Dialog channels – Active when either a user or Alexa is speaking.
  • Alerts channels – Active when a timer or alarm is sounding.
  • Content channels – Active when your device is playing media, such as audio streams.

Each channel maps to one or more AVS interfaces, and only one interface can be active on a given channel at a time. For example, the Dialog channel maps to the SpeechSynthesizer interface. When AVS returns a Speak directive to your device, the Dialog channel becomes active and remains active until Alexa has finished responding. Similarly, when a timer goes off, the Alerts channel becomes active and remains active until a user cancels the timer.

The following table shows which interfaces map to each channel:

Channel    Interfaces
Dialog     SpeechRecognizer, SpeechSynthesizer
Alerts     Alerts
Content    AudioPlayer

Multiple channels might be active concurrently. For instance, if a user is listening to music and asks Alexa a question, the Content and Dialog channels are concurrently active as long as the user or Alexa is speaking.
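One way to model the mapping between interfaces and channels is a lookup table plus a small state holder. The sketch below is illustrative only: ChannelState and its methods are hypothetical, and the policy that a newly acquiring interface replaces the previous owner of its channel is an assumption made to honor the one-interface-per-channel rule.

```python
# Interface-to-channel mapping taken from the table above.
CHANNEL_FOR_INTERFACE = {
    "SpeechRecognizer": "Dialog",
    "SpeechSynthesizer": "Dialog",
    "Alerts": "Alerts",
    "AudioPlayer": "Content",
}


class ChannelState:
    """Hypothetical record of which interface is active on each channel."""

    def __init__(self):
        self._active_interface = {}  # channel name -> owning interface

    def acquire(self, interface):
        """Mark the interface's channel active and return the channel name."""
        channel = CHANNEL_FOR_INTERFACE[interface]
        # Assumption: a new owner replaces the previous one, so a channel
        # never has more than one active interface.
        self._active_interface[channel] = interface
        return channel

    def release(self, interface):
        """Deactivate the channel, but only if this interface still owns it."""
        channel = CHANNEL_FOR_INTERFACE[interface]
        if self._active_interface.get(channel) == interface:
            del self._active_interface[channel]

    def active_channels(self):
        """Return the set of channels that are currently active."""
        return set(self._active_interface)
```

For the music-plus-question example, acquiring AudioPlayer and then SpeechRecognizer leaves both the Content and Dialog channels in active_channels().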

Foreground vs. background channels

A channel is either in the foreground or in the background. At any given time, a device can have only one channel in the foreground. When a channel in the foreground becomes inactive, the next active channel in the priority order moves into the foreground. If multiple channels are active, use the following priority order for your channels:

  1. Dialog
  2. Alerts
  3. Content

The following rules govern how channels interact:

  • Inactive channels are always in the background.
  • The Dialog channel is always in the foreground when active.
  • The Alerts channel is in the foreground when the Dialog channel is inactive.
  • The Content channel is in the foreground when all other channels are inactive.
  • When a channel in the foreground becomes inactive, the next active channel in the priority order moves into the foreground.
  • When the Content channel is in the background, your device pauses or attenuates audio playback.
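The foreground rules above reduce to choosing the highest-priority active channel. The following sketch assumes the set of active channels is tracked elsewhere; the function names are hypothetical.

```python
# Priority order from this section, highest priority first.
PRIORITY = ("Dialog", "Alerts", "Content")


def foreground_channel(active):
    """Return the highest-priority active channel, or None if all are idle."""
    for channel in PRIORITY:
        if channel in active:
            return channel
    return None


def focus_states(active):
    """Label every channel as 'foreground' or 'background'."""
    foreground = foreground_channel(active)
    return {
        channel: "foreground" if channel == foreground else "background"
        for channel in PRIORITY
    }
```

Because the result is recomputed from the active set each time, the rule that the next active channel moves forward when the foreground channel goes inactive needs no extra bookkeeping: remove the channel from the set and call focus_states() again.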

How to handle a directive for a given interface depends on the state of the associated channel. Is the channel active or inactive? Is the channel in the foreground or background? For example, if the Dialog channel is in the foreground, and an alarm sounds, the alarm should play in short alert mode as long as the Dialog channel is active. If an alarm sounds and the Dialog channel is inactive, a long alert should play.
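As a rough illustration of how channel state can drive output behavior, the function below picks an alert mode and a content mode from the set of active channels. The function name and the returned labels ("short_alert", "long_alert", "background", "normal") are hypothetical placeholders, not AVS identifiers.

```python
def output_behavior(active):
    """Return how each active output channel should behave right now."""
    dialog = "Dialog" in active
    alerts = "Alerts" in active
    content = "Content" in active
    behavior = {}
    if alerts:
        # An alert yields to speech by playing in short alert mode.
        behavior["Alerts"] = "short_alert" if dialog else "long_alert"
    if content:
        # Content stays in the background (paused or attenuated) while
        # any higher-priority channel is active.
        behavior["Content"] = "background" if (dialog or alerts) else "normal"
    return behavior
```

For example, output_behavior({"Dialog", "Alerts"}) selects the short alert, and once the Dialog channel is released, output_behavior({"Alerts"}) selects the long alert.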

For more details on how to handle each directive, see AVS API Overview.

Test the interaction model

Run the following test scenarios to verify that your implementation of the AVS interaction model works as expected. You can test these scenarios on an Amazon Echo device or with the AVS Device SDK.

Test Alert and Dialog channel interactions

To test Alert and Dialog channel interactions

  1. Ask Alexa to set a timer for five seconds.
  2. After Alexa notifies you that the timer was set, ask Alexa for the weather forecast.

    While Alexa provides the forecast, the timer should go off as a short alert until Alexa finishes speaking. This behavior indicates that the Dialog channel is active and the Alerts channel is in the background. After Alexa finishes speaking, the Alerts channel moves to the foreground, and a long alert should play until you stop the timer.

Test Content channel interactions

To test Content channel interactions

  1. Ask Alexa to set a timer for one minute.
  2. When Alexa notifies you that the timer has been set, ask Alexa to play your favorite song.

    The song begins playing. About one minute after you set the timer, the music should move to the background while the timer sounds. This behavior occurs because the Content channel can be in the foreground only if the other channels are inactive. The music should remain in the background until you stop the timer, at which point the song should return to normal volume or resume from a paused state.

Test Dialog channel interactions

To test Dialog channel interactions

  1. Ask Alexa to play your favorite song.
  2. After playback begins, ask Alexa for local news.

    The music should be sent to the background for your entire voice request and the response from Alexa. This behavior occurs because the Dialog channel is always in the foreground when active. When Alexa finishes responding, the music should return to normal volume or resume from a paused state.
