Understand the AVS SpeechRecognizer
The SpeechRecognizer interface is the core interface of the Alexa Voice Service (AVS) and exposes directives and events for capturing and interacting with user speech. This page discusses the concepts and process flows for the SpeechRecognizer interface.
SpeechRecognizer functionality
Every user utterance leverages SpeechRecognizer, including the following interaction types between an Alexa Built-in device and AVS:
- User Speech Capture: Captures user speech from an Alexa Built-in device.
- User Speech Prompting: Prompts a user for more speech input when needed by Alexa to deliver an appropriate response.
- Interaction Initiation Communication: Enables a device to inform AVS of how a user initiated an Alexa interaction, such as press-and-hold, tap-and-release, or voice-initiated (wake word enabled). See Device Form Factor and Alexa Interaction.
- ASR Profile Selection: SpeechRecognizer chooses the appropriate Automatic Speech Recognition (ASR) profile for your product, which allows Alexa to understand user speech and respond with precision. See Automatic Speech Recognition (ASR) profile. The sketch after this list shows how the initiation type and ASR profile can surface in a speech request.
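Both pieces of information travel with the captured audio that the client streams to AVS. The following Python sketch shows how such a speech-request body might be assembled. The field names and example values (PRESS_AND_HOLD, TAP, and WAKEWORD for the initiator; CLOSE_TALK, NEAR_FIELD, and FAR_FIELD for the profile) follow the SpeechRecognizer Recognize event, but the helper itself is purely illustrative and the exact payload shape should be confirmed against the SpeechRecognizer reference documentation.

```python
import uuid

def build_recognize_request(initiator_type: str, profile: str) -> dict:
    """Illustrative only: assemble the JSON body that accompanies streamed audio.

    initiator_type -- how the user started the interaction, for example
        "PRESS_AND_HOLD", "TAP", or "WAKEWORD".
    profile -- the ASR profile matching the product's listening range, for
        example "CLOSE_TALK", "NEAR_FIELD", or "FAR_FIELD".
    """
    return {
        "event": {
            "header": {
                "namespace": "SpeechRecognizer",
                "name": "Recognize",
                "messageId": str(uuid.uuid4()),
                "dialogRequestId": str(uuid.uuid4()),
            },
            "payload": {
                # 16-bit linear PCM, 16 kHz, single channel.
                "format": "AUDIO_L16_RATE_16000_CHANNELS_1",
                "profile": profile,
                "initiator": {"type": initiator_type},
            },
        }
    }

# Example: a push-to-talk remote typically pairs PRESS_AND_HOLD with CLOSE_TALK.
request_body = build_recognize_request("PRESS_AND_HOLD", "CLOSE_TALK")
```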
State diagram
The following diagram illustrates state changes driven by SpeechRecognizer components. Boxes represent SpeechRecognizer states, and the connectors represent state transitions.
SpeechRecognizer has the following states:
- IDLE: When not actively processing speech, SpeechRecognizer is in an idle state. The idle state occurs under the following conditions:
  - Before capturing user speech.
  - After a speech interaction with Alexa concludes and SpeechRecognizer returns to the idle state.
  - When an ExpectSpeech directive times out and the client sends the ExpectSpeechTimedOut event.
  Note: In a multi-turn Alexa interaction, if Alexa requires more user speech input, SpeechRecognizer should transition from the idle state to the expecting speech state without the user starting a new interaction.
- RECOGNIZING: When a user begins interacting with your client, specifically when the client streams captured audio to AVS, SpeechRecognizer should transition from the idle state to the recognizing state. SpeechRecognizer should remain in the recognizing state until the client stops recording speech or finishes streaming, at which point your SpeechRecognizer component should transition from the recognizing state to the busy state.
- BUSY: While processing the speech request, SpeechRecognizer should be in the busy state. You cannot start another speech request until SpeechRecognizer transitions out of the busy state. From the busy state, SpeechRecognizer transitions to the idle state if Alexa processes and completes the request, or to the expecting speech state if Alexa requires more speech input from the user.
- EXPECTING SPEECH: SpeechRecognizer should be in the expecting speech state when Alexa requires more audio input from a user. From the expecting speech state, SpeechRecognizer should transition to the recognizing state when a user interaction occurs or the interaction is started automatically on behalf of the user. It should transition to the idle state if Alexa detects no user interaction within the specified timeout window.
The following diagram illustrates the expected transitions among the four SpeechRecognizer states:

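The transitions described above and depicted in the diagram can also be summarized in code. The following sketch is a simplified, illustrative model of the four states and their legal transitions; it is not part of any SDK, and the event names used here are invented for the example.

```python
from enum import Enum, auto

class RecognizerState(Enum):
    IDLE = auto()
    RECOGNIZING = auto()
    BUSY = auto()
    EXPECTING_SPEECH = auto()

# Legal transitions drawn from the state descriptions above.
# The event names ("start_capture", "timeout", and so on) are made up for
# this illustration; they are not AVS directive or event names.
TRANSITIONS = {
    (RecognizerState.IDLE, "start_capture"): RecognizerState.RECOGNIZING,
    (RecognizerState.RECOGNIZING, "stop_capture"): RecognizerState.BUSY,
    (RecognizerState.BUSY, "response_complete"): RecognizerState.IDLE,
    (RecognizerState.BUSY, "expect_speech"): RecognizerState.EXPECTING_SPEECH,
    (RecognizerState.EXPECTING_SPEECH, "start_capture"): RecognizerState.RECOGNIZING,
    (RecognizerState.EXPECTING_SPEECH, "timeout"): RecognizerState.IDLE,
}

def next_state(state: RecognizerState, event: str) -> RecognizerState:
    """Return the next state, or raise if the transition is not allowed."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"Illegal transition: {event!r} while {state.name}")

# Walking through a multi-turn interaction: capture, busy, prompt for more
# speech, then capture again without the user starting a new interaction.
state = RecognizerState.IDLE
for event in ("start_capture", "stop_capture", "expect_speech", "start_capture"):
    state = next_state(state, event)
    print(f"{event} -> {state.name}")
```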
Wake words
The list of wake words informs Alexa of the possible valid wake words that a device might be set to listen for through the SetWakeWords, WakeWordsReport, and WakeWordsChanged messages.
Currently, the only wake word available for Alexa Built-in devices is ALEXA, which applies to every possible locale for a device. Therefore, specify ALEXA in the global DEFAULT scope.
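For illustration, declaring ALEXA in the global DEFAULT scope might look roughly like the configuration sketched below. The key names and nesting (configurations, wakeWords, scopes, values) and the interface version are assumptions in this sketch; verify the exact shape against the SpeechRecognizer capability and wake word documentation before relying on it.

```python
# Sketch only: how ALEXA might be declared in the DEFAULT scope of a device's
# SpeechRecognizer capability configuration. Key names, nesting, and the
# version number are assumptions to be checked against the AVS documentation.
speech_recognizer_capability = {
    "type": "AlexaInterface",
    "interface": "SpeechRecognizer",
    "version": "2.3",  # assumed version that supports the wake word messages
    "configurations": {
        "wakeWords": [
            {
                "scopes": ["DEFAULT"],   # global scope covering every locale
                "values": [["ALEXA"]],   # the only wake word currently supported
            }
        ]
    },
}
```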
Last updated: Nov 27, 2023