
Enable Cloud-Based Wake Word Verification

Cloud-Based Wake Word Verification is a feature that improves wake word accuracy for Alexa-enabled products by reducing false wakes caused by words that sound similar to the wake word. For example, “Alex”, “election”, and “Alexis” may each cause a false wake for “Alexa”. Cloud-Based Wake Word Verification also detects media mentions of the wake word, such as the mention of “Alexa” in an Amazon commercial.

Initial detection is performed by the wake word engine on the product, then the wake word is verified in the cloud. If a false wake is detected, AVS sends a StopCapture directive to the product on the downchannel, instructing it to close the audio stream and, if applicable, to turn off the blue LEDs to indicate that Alexa has stopped listening.

The following sections detail the work necessary to support this service.

Review the Streaming Requirements for Cloud-Based Wake Word Verification

Voice-initiated products start streaming user speech to AVS when a wake word, such as “Alexa”, is detected by the wake word engine; the stream is closed when the user stops speaking or the user’s intent has been identified and the service returns a StopCapture directive. For cloud-based wake word verification to work, the audio streamed to AVS must include the wake word, 500 milliseconds of pre-roll, and any user speech captured until a StopCapture directive is received. This allows AVS to verify the wake word included in the stream, reducing the number of erroneous responses due to false wakes.

  • Pre-roll, or the audio captured prior to the detection of the wake word, is used to calibrate the ambient noise level of the recording, which enhances speech recognition.
  • Inclusion of the wake word in the stream allows AVS to perform cloud-based wake word verification, which reduces false wakes (the number of times the wake word engine falsely recognizes the wake word).
  • If the wake word is not detected during cloud-based wake word verification, the audio samples are discarded.

The AVS documentation provides a recommendation to implement a shared memory ring buffer for writing and reading audio samples, and the specification for including the start and stop indices of the wake word in each Recognize event sent to AVS.
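
As a rough illustration of that recommendation, the sketch below (plain Python; the class and method names are hypothetical, not from any AVS SDK) keeps an absolute sample counter so the wake word engine can report startIndexInSamples and endIndexInSamples relative to the start of the stream, and shows the 500 ms pre-roll arithmetic at 16 kHz:

# Illustrative ring buffer for 16-bit PCM samples with absolute indexing.
class SampleRingBuffer:
    def __init__(self, capacity_samples):
        self.buf = [0] * capacity_samples
        self.capacity = capacity_samples
        self.total_written = 0  # absolute index of the next sample to write

    def write(self, samples):
        for s in samples:
            self.buf[self.total_written % self.capacity] = s
            self.total_written += 1

    def read(self, start_index, count):
        # Reject ranges that were overwritten or have not been written yet.
        if (start_index < self.total_written - self.capacity
                or start_index + count > self.total_written):
            raise IndexError("requested range is not available")
        return [self.buf[i % self.capacity]
                for i in range(start_index, start_index + count)]

# 500 ms of pre-roll at 16 kHz is 8,000 samples: when the engine reports the
# wake word starting at absolute index wake_start, the upload to AVS should
# begin at wake_start - PRE_ROLL_SAMPLES.
PRE_ROLL_SAMPLES = 8000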

Adjust Client Code for a New Context Object: RecognizerState

Context is a container used to communicate the state of your client components to AVS. To support cloud-based wake word verification, all wake word enabled products, regardless of how an interaction with Alexa is initiated, are required to send a new context object, RecognizerState, with each applicable event.

Sample Message


{
    "header": {
        "namespace": "SpeechRecognizer",
        "name": "RecognizerState"
    },
    "payload": {
        "wakeword": "ALEXA"
    }
}

Payload Parameters

| Parameter | Description | Type |
| --- | --- | --- |
| wakeword | Identifies the current wake word. Accepted value: "ALEXA" | string |
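
For illustration, a minimal helper that assembles the object above might look like this (the function name is hypothetical; only the JSON shape comes from this specification):

def recognizer_state(wake_word="ALEXA"):
    # Shape matches the RecognizerState sample message above.
    return {
        "header": {"namespace": "SpeechRecognizer", "name": "RecognizerState"},
        "payload": {"wakeword": wake_word},
    }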


Adjust Client Code for Updated Directives/Events

New key/value pairs have been added to the Recognize event and ExpectSpeech directive to support wake word verification. Please make sure you update your client code accordingly.

SpeechRecognizer.Recognize

The Recognize event has been updated to include the initiator object. It provides AVS with information about the interaction used to trigger Alexa and, if applicable, the start and stop indices required for cloud-based wake word verification.


The Recognize event is used to send user speech to AVS and translate that speech into one or more directives. This event must be sent as a multipart message: the first part is a JSON-formatted object, and the second part is binary audio captured by the product’s microphone. We encourage streaming (chunking) captured audio to the Alexa Voice Service to reduce latency; the stream should contain 10 ms of captured audio per chunk (320 bytes).

Additionally, the Recognize event includes the profile parameter and the initiator object, which tell Alexa which ASR profile should be used to best understand the captured audio and how the interaction with Alexa was initiated.

During multi-turn interactions, where Alexa requires additional information to act on a request, you will receive an ExpectSpeech directive that may include an initiator value. If present in ExpectSpeech, it must be sent to Alexa in the subsequent Recognize event. If initiator is not present in ExpectSpeech, do not include it in the payload of the Recognize event sent in response to ExpectSpeech.

All captured audio sent to AVS should be encoded as:

  • 16-bit linear PCM
  • 16 kHz sample rate
  • Single channel
  • Little endian byte order

For a protocol specific example, see Structuring an HTTP/2 Request.
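
As a sketch of the chunking arithmetic: at a 16 kHz sample rate with 16-bit samples and a single channel, 10 ms of audio is 160 samples, or 320 bytes. A generator along these lines (illustrative only, not SDK code) could slice captured audio for streaming:

CHUNK_BYTES = 320  # 10 ms at 16 kHz, 16-bit, single channel

def audio_chunks(pcm_bytes):
    """Yield 320-byte (10 ms) chunks of little-endian 16-bit PCM audio."""
    for offset in range(0, len(pcm_bytes), CHUNK_BYTES):
        yield pcm_bytes[offset:offset + CHUNK_BYTES]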

Sample Message

{
  "context": [
      // This is an array of context objects that are used to communicate the
      // state of all client components to Alexa. See Context for details.
  ],
  "event": {
    "header": {
      "namespace": "SpeechRecognizer",
      "name": "Recognize",
      "messageId": "{{STRING}}",
      "dialogRequestId": "{{STRING}}"
    },
    "payload": {
      "profile": "{{STRING}}",
      "format": "{{STRING}}",
      "initiator": {
        "type": "{{STRING}}",
        "payload": {
          "wakeWordIndices": {
            "startIndexInSamples": {{LONG}},
            "endIndexInSamples": {{LONG}}
          }   
        }
      }
    }
  }
}
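
A minimal sketch of filling in that envelope, assuming the wake word indices come from bookkeeping like the ring buffer sketched earlier (the helper name and defaults are illustrative; the context array that accompanies the event is omitted here and covered below):

import uuid

def recognize_event(start_index, end_index, profile="FAR_FIELD"):
    # dialogRequestId must be unique per Recognize event; see Header Parameters.
    # The surrounding message also carries the context array (see Context below).
    return {
        "event": {
            "header": {
                "namespace": "SpeechRecognizer",
                "name": "Recognize",
                "messageId": str(uuid.uuid4()),
                "dialogRequestId": str(uuid.uuid4()),
            },
            "payload": {
                "profile": profile,
                "format": "AUDIO_L16_RATE_16000_CHANNELS_1",
                "initiator": {
                    "type": "WAKEWORD",
                    "payload": {
                        "wakeWordIndices": {
                            "startIndexInSamples": start_index,
                            "endIndexInSamples": end_index,
                        }
                    },
                },
            },
        }
    }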

Binary Audio Attachment

Each Recognize event requires a corresponding binary audio attachment as one part of the multipart message. The following headers are required for each binary audio attachment:

Content-Disposition: form-data; name="audio"
Content-Type: application/octet-stream

{{BINARY AUDIO ATTACHMENT}}
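
A sketch of assembling the two-part body, assuming the metadata part headers described in Structuring an HTTP/2 Request (the boundary token and function name are illustrative):

import json

BOUNDARY = "avs-message-boundary"  # any token not appearing in the payload

def multipart_body(event_json, audio_bytes):
    # Part one: the JSON-formatted event (metadata).
    metadata = (
        "--" + BOUNDARY + "\r\n"
        'Content-Disposition: form-data; name="metadata"\r\n'
        "Content-Type: application/json; charset=UTF-8\r\n\r\n"
        + json.dumps(event_json) + "\r\n"
    )
    # Part two: the binary audio attachment, with the headers shown above.
    # In practice the audio part is streamed chunk by chunk rather than
    # buffered up front as it is in this simplified sketch.
    audio_headers = (
        "--" + BOUNDARY + "\r\n"
        'Content-Disposition: form-data; name="audio"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    closing = "\r\n--" + BOUNDARY + "--\r\n"
    return (metadata.encode("utf-8") + audio_headers.encode("utf-8")
            + audio_bytes + closing.encode("utf-8"))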

Context

This event requires your product to report the status of all client component states to Alexa in the context object. For additional information see Context.

| Context | Required |
| --- | --- |
| AlertsState | Yes |
| PlaybackState | Yes |
| VolumeState | Yes |
| SpeechState | Yes |
| RecognizerState | Optional |
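
For illustration, a context array could be assembled as follows; the namespace and name pairs for the other context objects follow the AVS Context documentation, and the empty payloads are placeholders for product-specific state:

def build_context(recognizer_state):
    # Helper to build one context object; payloads here are placeholders only.
    def obj(namespace, name, payload):
        return {"header": {"namespace": namespace, "name": name},
                "payload": payload}
    return [
        obj("Alerts", "AlertsState", {}),             # required
        obj("AudioPlayer", "PlaybackState", {}),      # required
        obj("Speaker", "VolumeState", {}),            # required
        obj("SpeechSynthesizer", "SpeechState", {}),  # required
        recognizer_state,                             # optional
    ]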

Header Parameters

| Parameter | Description | Type |
| --- | --- | --- |
| messageId | A unique ID used to represent a specific message. | string |
| dialogRequestId | A unique identifier that your client must create for each Recognize event sent to Alexa. This parameter is used to correlate directives sent in response to a specific Recognize event. | string |

Payload Parameters

| Parameter | Description | Type |
| --- | --- | --- |
| profile | Identifies the Automatic Speech Recognition (ASR) profile associated with your product. AVS supports three distinct ASR profiles optimized for user speech from varying distances. Accepted values: "CLOSE_TALK", "NEAR_FIELD", "FAR_FIELD". | string |
| format | Identifies the format of captured audio. Accepted value: "AUDIO_L16_RATE_16000_CHANNELS_1". | string |
| initiator | Includes information about how an interaction with AVS was initiated. IMPORTANT: initiator is required (i) for wake word enabled products that use cloud-based wake word verification, and (ii) when it is included in an ExpectSpeech directive. | object |
| initiator.type | Represents the action taken by the user to start streaming audio to AVS. Accepted values: "PRESS_AND_HOLD", "TAP", and "WAKEWORD". | string |
| initiator.payload | Includes information about the initiator, such as the wake word start and stop indices. | object |
| initiator.payload.wakeWordIndices | Required only for wake word enabled products that use cloud-based wake word verification. Contains startIndexInSamples and endIndexInSamples. | object |
| initiator.payload.wakeWordIndices.startIndexInSamples | Represents the index in the audio stream where the wake word starts (in samples). | long |
| initiator.payload.wakeWordIndices.endIndexInSamples | Represents the index in the audio stream where the wake word ends (in samples). | long |

Profiles

ASR profiles are tuned for different products, form factors, acoustic environments and use cases. Use the table below to learn more about accepted values for the profile parameter.

| Value | Optimal Listening Distance |
| --- | --- |
| CLOSE_TALK | 0 to 2.5 ft. |
| NEAR_FIELD | 0 to 5 ft. |
| FAR_FIELD | 0 to 20+ ft. |

Initiator

The initiator object tells Alexa how an interaction was triggered, and it determines two things:

  1. Whether StopCapture will be sent to your client when the end of speech is detected in the cloud.
  2. Whether cloud-based wake word verification will be performed on the stream.

The following values are accepted:

| Value | Description | Supported Profile(s) | StopCapture Enabled | Wake Word Verification Enabled | Wake Word Indices Required |
| --- | --- | --- | --- | --- | --- |
| PRESS_AND_HOLD | Audio stream initiated by pressing a button (physical or GUI) and terminated by releasing it. | CLOSE_TALK | N | N | N |
| TAP | Audio stream initiated by the tap and release of a button (physical or GUI) and terminated when a StopCapture directive is received. | NEAR_FIELD, FAR_FIELD | Y | N | N |
| WAKEWORD | Audio stream initiated by the use of a wake word and terminated when a StopCapture directive is received. | NEAR_FIELD, FAR_FIELD | Y | Y | Y |
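
The constraints in this table can be expressed as a simple client-side check, sketched below (the function and variable names are illustrative):

SUPPORTED_PROFILES = {
    "PRESS_AND_HOLD": {"CLOSE_TALK"},
    "TAP": {"NEAR_FIELD", "FAR_FIELD"},
    "WAKEWORD": {"NEAR_FIELD", "FAR_FIELD"},
}

def validate_initiator(initiator_type, profile, has_wake_word_indices):
    if profile not in SUPPORTED_PROFILES[initiator_type]:
        raise ValueError(f"{profile} is not supported with {initiator_type}")
    # Wake word indices are required exactly when the stream is wake word initiated.
    if (initiator_type == "WAKEWORD") != has_wake_word_indices:
        raise ValueError("wakeWordIndices is required iff initiator is WAKEWORD")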

SpeechRecognizer.ExpectSpeech

This directive has been updated to include initiator. In a multi-turn scenario, where Alexa requires additional information from the user to complete a request, the initiator sent to a client must be returned to AVS in the subsequent Recognize event.


ExpectSpeech is sent when Alexa requires additional information to fulfill a user’s request. It instructs your client to open the microphone and begin streaming user speech. If the microphone is not opened within the specified timeout window, an ExpectSpeechTimedOut event must be sent from your client to AVS.

During a multi-turn interaction with Alexa, your device will receive at least one SpeechRecognizer.ExpectSpeech directive instructing your client to start listening for user speech. The initiator value included in the payload of the SpeechRecognizer.ExpectSpeech directive must be passed in as the initiator in the subsequent SpeechRecognizer.Recognize event.

For information on the rules that govern audio prioritization, please review the Interaction Model.
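
A rough sketch of this handling, assuming a hypothetical client object that can open the microphone and send events (none of these method names come from the specification):

import threading

def handle_expect_speech(directive, client):
    """Sketch of ExpectSpeech handling; the client methods are hypothetical."""
    payload = directive["directive"]["payload"]
    timeout_s = payload["timeoutInMilliseconds"] / 1000.0
    initiator = payload.get("initiator")

    # Send ExpectSpeechTimedOut if the microphone is not opened in time.
    timer = threading.Timer(timeout_s, client.send_expect_speech_timed_out)
    timer.start()
    if client.open_microphone():
        timer.cancel()
        # Echo the opaque initiator back only if AVS provided one.
        if initiator is not None:
            client.send_recognize(initiator=initiator)
        else:
            client.send_recognize()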

Sample Message

{
  "directive": {
    "header": {
      "namespace": "SpeechRecognizer",
      "name": "ExpectSpeech",
      "messageId": "{{STRING}}",
      "dialogRequestId": "{{STRING}}"
    },
    "payload": {
      "timeoutInMilliseconds": {{LONG}},
      "initiator": "{{STRING}}"
    }
  }
}

Header Parameters

| Parameter | Description | Type |
| --- | --- | --- |
| messageId | A unique ID used to represent a specific message. | string |
| dialogRequestId | A unique ID used to correlate directives sent in response to a specific Recognize event. Note: dialogRequestId is only sent in response to a speech request. | string |

Payload Parameters

| Parameter | Description | Type |
| --- | --- | --- |
| timeoutInMilliseconds | Specifies how long the microphone will remain open, in milliseconds, before a timeout is issued. | long |
| initiator | An opaque string passed from AVS to your client. If present, it must be sent back to AVS as the initiator in the subsequent Recognize event. | string |
