About the Alexa Voice Service (AVS) Interaction Model
A device that interacts with the Alexa Voice Service (AVS) encounters events/directives that produce competing audio. For example, a user might ask a question when Alexa is speaking, or a scheduled alarm plays when music is already streaming. The rules that govern the prioritization and handling of these inputs and outputs make up the AVS interaction model.
InteractionModel API
Implement InteractionModel 1.2 to enable Alexa Routines for your product. InteractionModel 1.2 includes the NewDialogRequest directive and modifications to the voice request lifecycle of the AVS interaction model.
Device vs. AVS-initiated audio interactions
Either the device or AVS can begin an audio interaction:
- Device-initiated interactions – In a device-initiated interaction, the device sends an event to AVS. AVS processes the event and then returns any appropriate directives to the device in response. For example, when a user asks Alexa, "What time is it?", the device streams the captured user audio to AVS. After AVS processes the event, it returns a directive instructing the device to output speech, such as, "It's 10:00 AM."
- AVS-initiated interactions – In an AVS-initiated interaction, the device receives directives without any preceding device events. For example, when a user adjusts the device volume from the Amazon Alexa app, the device sends no event to Alexa. Alexa interprets the action taken in the Amazon Alexa app and sends a directive to the device, which the device then acts on.
Send each event to AVS in its own event stream. AVS might return directives and corresponding audio attachments in the same stream or in a separate downchannel stream. The downchannel stream delivers AVS-initiated directives to your device. The downchannel is half-closed from the device side and stays open from the AVS side for the life of a connection. You have several options for implementing event and downchannel streams, depending on the transport protocol. For details on establishing both event and downchannel streams over HTTP/2, see Managing an HTTP/2 Connection.
Voice request lifecycle
When a device sends events to AVS, make sure that the device enforces the following rules:
- Your device must create a unique dialogRequestId for each Recognize event the device sends to AVS. The dialogRequestId correlates the Recognize event with directives sent to your device from AVS.
- Don't reuse any dialogRequestId within a session.
- Include the dialogRequestId in the Recognize event header.
- Keep track of the active dialogRequestId.
- The dialogRequestId remains active until the device sends the next Recognize event to AVS. After the device sends the next Recognize event, cancel any directives associated with older dialogRequestIds.
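The event-side rules can be illustrated with a short Python sketch. It assumes that a UUID4 string is an acceptable unique dialogRequestId; DialogRequestIdFactory and build_recognize_event are hypothetical helper names, not part of the AVS Device SDK. The header fields shown (namespace, name, messageId, dialogRequestId) follow the general AVS event layout, but check the SpeechRecognizer reference for the exact Recognize payload before relying on it.

```python
import uuid


class DialogRequestIdFactory:
    """Hypothetical helper: issues dialogRequestIds and never reuses one in a session."""

    def __init__(self):
        self._issued = set()
        self.active = None  # dialogRequestId of the most recent Recognize event

    def new_id(self):
        # uuid4 collisions are practically impossible, but the explicit check
        # makes the "don't reuse within a session" rule visible.
        while True:
            candidate = str(uuid.uuid4())
            if candidate not in self._issued:
                self._issued.add(candidate)
                self.active = candidate  # the newest id becomes the active one
                return candidate


def build_recognize_event(factory, payload):
    """Return a Recognize event whose header carries a fresh dialogRequestId."""
    return {
        "event": {
            "header": {
                "namespace": "SpeechRecognizer",
                "name": "Recognize",
                "messageId": str(uuid.uuid4()),
                "dialogRequestId": factory.new_id(),
            },
            # Assumed to be filled in by the caller per the SpeechRecognizer reference.
            "payload": payload,
        }
    }
```

Each call produces a new header id and updates factory.active, so the device always knows which dialog is current when directives arrive.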
If AVS initiates an interaction, AVS sends a NewDialogRequest directive with a dialogRequestId in the payload. This dialogRequestId replaces any older dialogRequestIds. Cancel any directives associated with the previous dialogRequestId.
When AVS sends directives to a device, make sure that the device enforces the following rules:
- Process directives whose header dialogRequestId matches the active dialogRequestId, in sequence.
- When you receive an InteractionModel.NewDialogRequest directive, set the dialogRequestId in its payload as the active dialogRequestId, and then process the directives that follow.
- Process directives whose header dialogRequestId matches the dialogRequestId from a NewDialogRequest directive, in sequence.
- Process directives that have no dialogRequestId immediately upon receipt.
- Your device must send an ExceptionEncountered event to AVS when it encounters new or unknown directives.
- If your device receives a Speak directive, you must fully play back the associated audio before processing subsequent directives.
For an example, see AudioInputProcessor.cpp in the AVS Device SDK.
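The directive-handling rules can be condensed into one routing function. The following Python sketch is illustrative only and is not how the AVS Device SDK is structured: DirectiveRouter, KNOWN_DIRECTIVES, send_event, and drain are hypothetical names, the set of known directives is a small assumed subset, and the ExceptionEncountered event shape is simplified (see the System.ExceptionEncountered reference for the real payload).

```python
import json
from collections import deque

# Hypothetical subset of (namespace, name) pairs this device implements.
# A real device would list every directive of every interface it supports.
KNOWN_DIRECTIVES = {
    ("SpeechSynthesizer", "Speak"),
    ("SpeechRecognizer", "ExpectSpeech"),
    ("AudioPlayer", "Play"),
    ("Alerts", "SetAlert"),
    ("Speaker", "SetVolume"),
    ("InteractionModel", "NewDialogRequest"),
}


class DirectiveRouter:
    """Sketch of the voice request lifecycle rules for incoming directives."""

    def __init__(self, send_event):
        self.send_event = send_event  # assumed callable that delivers an event to AVS
        self.active_id = None
        self.queue = deque()  # directives for the active dialog, handled in order
        self.immediate = []   # directives without a dialogRequestId

    def _activate(self, dialog_request_id):
        # A new active id cancels anything still queued for an older id.
        self.active_id = dialog_request_id
        self.queue = deque(
            d for d in self.queue
            if d["header"].get("dialogRequestId") == dialog_request_id
        )

    def on_recognize_sent(self, dialog_request_id):
        """Device-initiated: the id of the outgoing Recognize event becomes active."""
        self._activate(dialog_request_id)

    def receive(self, directive):
        header = directive["header"]
        key = (header["namespace"], header["name"])
        if key not in KNOWN_DIRECTIVES:
            # Simplified event shape for illustration.
            self.send_event({
                "header": {"namespace": "System", "name": "ExceptionEncountered"},
                "payload": {"unparsedDirective": json.dumps(directive)},
            })
            return "exception"
        if key == ("InteractionModel", "NewDialogRequest"):
            # AVS-initiated: the id in the payload replaces the active one.
            self._activate(directive["payload"]["dialogRequestId"])
            return "activated"
        dialog_id = header.get("dialogRequestId")
        if dialog_id is None:
            self.immediate.append(directive)  # a real device executes these on receipt
            return "immediate"
        if dialog_id == self.active_id:
            self.queue.append(directive)  # handled in sequence by drain()
            return "queued"
        return "dropped"  # belongs to a dialog that is no longer active

    def drain(self, execute):
        """Run queued directives strictly in order.

        execute() is assumed to block until a Speak directive's audio has
        played completely, so later directives wait for it.
        """
        while self.queue:
            execute(self.queue.popleft())
```

Keeping a single queue per active dialog makes cancellation cheap: switching the active id simply filters out everything that belonged to the old one.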
Channels
Channels help your device to determine priority for audio inputs and outputs. A channel can be either active or inactive and can be in the foreground or in the background at any given time.
Types of channels
Organize all audio handled by your device into three types of channels:
- Dialog channels – Active when either a user or Alexa is speaking.
- Alerts channels – Active when a timer or alarm is sounding.
- Content channels – Active when your device is playing media, such as audio streams.
Each channel maps to one or more AVS interfaces, and only one interface can be active on a given channel at a time. For example, the Dialog channel maps to the SpeechSynthesizer interface. When AVS returns a Speak directive to your device, the Dialog channel becomes active and remains active until Alexa has finished responding. Similarly, when a timer goes off, the Alerts channel becomes active and remains active until a user cancels the timer.
The following table shows which interfaces map to each channel:
Channel | Interfaces |
---|---|
Dialog | SpeechRecognizer, SpeechSynthesizer |
Alerts | Alerts |
Content | AudioPlayer |
Multiple channels might be active concurrently. For instance, if a user is listening to music and asks Alexa a question, the Content and Dialog channels are concurrently active as long as the user or Alexa is speaking.
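The interface-to-channel table and the idea of concurrently active channels can be expressed as a simple lookup. This Python sketch is a minimal illustration; Channel, INTERFACE_TO_CHANNEL, and active_channels are hypothetical names, and the input is assumed to be the list of interfaces currently capturing or playing audio.

```python
from enum import Enum


class Channel(Enum):
    DIALOG = "Dialog"
    ALERTS = "Alerts"
    CONTENT = "Content"


# Mirrors the interface table for this section.
INTERFACE_TO_CHANNEL = {
    "SpeechRecognizer": Channel.DIALOG,
    "SpeechSynthesizer": Channel.DIALOG,
    "Alerts": Channel.ALERTS,
    "AudioPlayer": Channel.CONTENT,
}


def active_channels(active_interfaces):
    """Return the set of channels made active by the given interfaces."""
    return {INTERFACE_TO_CHANNEL[name] for name in active_interfaces}
```

For the music-plus-question example, active_channels(["AudioPlayer", "SpeechSynthesizer"]) yields both the Content and Dialog channels.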
Foreground vs. background channels
Channels can be either in the foreground or in the background. At any given time, only one channel can be in the foreground. If multiple channels are active, use the following priority order for your channels:
- Dialog
- Alerts
- Content
The following rules govern how channels interact:
- Inactive channels are always in the background.
- The Dialog channel is always in the foreground when active.
- The Alerts channel is in the foreground when the Dialog channel is inactive.
- The Content channel is in the foreground when all other channels are inactive.
- When a channel in the foreground becomes inactive, the next active channel in the priority order moves into the foreground.
- When the Content channel is in the background, the device pauses or attenuates its audio playback.
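The priority rules reduce to picking the highest-priority active channel. In this minimal Python sketch, channels are plain strings, and foreground_channel and content_action are hypothetical helpers; whether background content is paused or attenuated is left as a product decision, as the rules above allow either.

```python
PRIORITY = ["Dialog", "Alerts", "Content"]  # highest priority first


def foreground_channel(active):
    """Return the channel in the foreground, or None if no channel is active.

    Every other channel, active or inactive, is in the background.
    """
    for channel in PRIORITY:
        if channel in active:
            return channel
    return None


def content_action(active):
    """Describe what Content playback should do right now (hypothetical labels)."""
    if "Content" not in active:
        return "idle"
    if foreground_channel(active) == "Content":
        return "play"
    # In the background, content is paused or attenuated.
    return "pause_or_attenuate"
```

Because the foreground is recomputed from the set of active channels, the rule "when a foreground channel becomes inactive, the next active channel moves into the foreground" falls out automatically: remove "Dialog" from the set and "Alerts" (or "Content") takes over.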
When AVS sends an ExpectSpeech directive in response to a Recognize event, prompting a user for additional speech, the Dialog channel should remain active until all directives associated with the request/response scenario are processed.
How to handle a directive for a given interface depends on the state of the associated channel. Is the channel active or inactive? Is the channel in the foreground or background? For example, if the Dialog channel is in the foreground and an alarm sounds, the alarm should play as a short alert for as long as the Dialog channel is active. If an alarm sounds and the Dialog channel is inactive, a long alert should play.
For more details on how to handle each directive, see AVS API Overview.
Test the interaction model
Run the following test scenarios to verify that your implementation of the AVS interaction model works as expected. You can run these scenarios on an Amazon Echo device or with the AVS Device SDK.
Test Alert and Dialog channel interactions
To test Alert and Dialog channel interactions
- Ask Alexa to set a timer for five seconds.
- After Alexa notifies you that the timer was set, ask Alexa for the weather forecast.
As Alexa provides you with the forecast, the timer should go off as a short alert until Alexa has finished speaking. This behavior indicates that the Dialog channel is active and in the foreground, so the Alerts channel is in the background. After Alexa finishes speaking, the Alerts channel moves to the foreground, and a long alert should continue to play until you stop the timer.
Test Content channel interactions
To test Content channel interactions
- Ask Alexa to set a timer for one minute.
- When Alexa notifies you that the timer has been set, ask Alexa to play your favorite song.
The song begins playing, and when the timer goes off, the music should move to the background while the timer sounds. This behavior occurs because the Content channel can only be in the foreground if the other channels are inactive. The music should remain in the background until you stop the timer, at which point your favorite song should return to normal volume or resume from a paused state.
Test Dialog channel interactions
To test Dialog channel interactions
- Ask Alexa to play your favorite song.
- After playback begins, ask Alexa for local news.
The music should move to the background for your entire voice request and the response from Alexa. This behavior occurs because the Dialog channel is always in the foreground when active. When Alexa finishes responding, the music should return to normal volume or resume from a paused state.
Next steps
- Alerts Overview
- AudioPlayer Overview
- Display Cards Overview
- Notifications Overview
- Recommended Media Support
Last updated: Dec 03, 2020