SpeechSynthesizer Interface

When users ask your product a question or make a request, the SpeechSynthesizer interface returns Alexa’s speech response. For instance, when a user asks Alexa, “What’s the weather in Seattle?”, the Alexa Voice Service returns a Speak directive to your client with a binary audio attachment, which your client should process and play. This page covers SpeechSynthesizer directives and events.

States

SpeechSynthesizer has the following states:

PLAYING: While Alexa is speaking, SpeechSynthesizer should be in the PLAYING state. It should transition to the FINISHED state when playback of Alexa’s speech is complete.

FINISHED: When Alexa has finished speaking, SpeechSynthesizer should transition to the FINISHED state following a SpeechFinished event.
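
The two states can be tracked with a small state holder on the client. The following is a minimal sketch, not part of the AVS API: the class and method names are hypothetical, and starting in FINISHED before any speech has played is an assumption.

```python
from enum import Enum


class SpeechState(Enum):
    PLAYING = "PLAYING"
    FINISHED = "FINISHED"


class SpeechSynthesizerState:
    """Hypothetical client-side tracker for the SpeechSynthesizer state."""

    def __init__(self) -> None:
        # Assumption: nothing is playing at startup, so begin in FINISHED.
        self.state = SpeechState.FINISHED

    def on_playback_started(self) -> None:
        # Alexa's speech has begun playing.
        self.state = SpeechState.PLAYING

    def on_playback_finished(self) -> None:
        # Playback of Alexa's speech is complete.
        if self.state is not SpeechState.PLAYING:
            raise RuntimeError("playback finished without having started")
        self.state = SpeechState.FINISHED
```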

Speak Directive

This directive is sent from AVS to your client any time a speech response from Alexa is required. In most cases, the Speak directive is sent in response to a user request, such as a Recognize event. However, a Speak directive may also be sent to your client to preface an action that will be taken. For instance, when a user asks to set a timer, the client receives a SetAlert directive that instructs it to set the alert, and also a Speak directive that notifies the user that the timer was successfully set.

This directive is sent to your client as a multipart message: one part is a JSON-formatted directive and the other is a binary audio attachment.

Sample Message

{
    "directive": {
        "header": {
            "namespace": "SpeechSynthesizer",
            "name": "Speak",
            "messageId": "{{STRING}}",
            "dialogRequestId": "{{STRING}}"
        },
        "payload": {
            "url": "{{STRING}}",
            "format": "{{STRING}}",
            "token": "{{STRING}}"
        }
    }
}

Binary Audio Attachment

Each Speak directive will have a corresponding binary audio attachment as one part of the multipart message. The following multipart headers will precede the binary audio attachment:

Content-Type: application/octet-stream
Content-ID: {{Audio Item CID}}

{{BINARY AUDIO ATTACHMENT}}
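
Once the multipart message has been split into parts, the attachment can be looked up by comparing each part's Content-ID header with the directive's url. The sketch below assumes the parts are already available as (headers, body) pairs. The function name and that input shape are hypothetical, and tolerating angle brackets around the Content-ID value is a defensive assumption rather than documented behavior.

```python
from typing import Dict, List, Tuple

Part = Tuple[Dict[str, str], bytes]  # (headers, body) of one multipart part


def find_attachment(url: str, parts: List[Part]) -> bytes:
    """Return the body of the part whose Content-ID matches a cid: URL."""
    if not url.startswith("cid:"):
        raise ValueError(f"expected a cid: URL, got {url!r}")
    wanted = url[len("cid:"):]
    for headers, body in parts:
        # Assumption: the header value may be wrapped in angle brackets.
        content_id = headers.get("Content-ID", "").strip().strip("<>")
        if content_id == wanted:
            return body
    raise LookupError(f"no attachment with Content-ID {wanted!r}")
```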

Header Parameters

Parameter | Description | Type
messageId | A unique ID used to represent a specific message. | string
dialogRequestId | A unique ID used to correlate directives sent in response to a specific Recognize event. | string

Payload Parameters

Parameter | Description | Type
url | A unique identifier for audio content. The URL always follows the prefix cid:. Example: cid:{{STRING}} | string
format | Provides the format of returned audio. Accepted value: "AUDIO_MPEG" | string
token | An opaque token that represents the current text-to-speech (TTS) object. | string
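
The header and payload parameters above can be checked when the JSON part is parsed. This is a minimal sketch under stated assumptions: SpeakDirective and parse_speak are hypothetical names, every field shown in the sample message is treated as required, and a directive is rejected when format is anything other than the one accepted value.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeakDirective:
    message_id: str
    dialog_request_id: str
    content_id: str  # the url with its "cid:" prefix removed
    audio_format: str
    token: str


def parse_speak(raw: str) -> SpeakDirective:
    """Parse and validate the JSON part of a Speak directive."""
    directive = json.loads(raw)["directive"]
    header, payload = directive["header"], directive["payload"]
    if (header["namespace"], header["name"]) != ("SpeechSynthesizer", "Speak"):
        raise ValueError("not a SpeechSynthesizer Speak directive")
    if payload["format"] != "AUDIO_MPEG":
        raise ValueError(f"unsupported format: {payload['format']!r}")
    if not payload["url"].startswith("cid:"):
        raise ValueError("url must start with 'cid:'")
    return SpeakDirective(
        message_id=header["messageId"],
        dialog_request_id=header["dialogRequestId"],
        content_id=payload["url"][len("cid:"):],
        audio_format=payload["format"],
        token=payload["token"],
    )
```

The token is kept on the parsed object because the SpeechStarted and SpeechFinished events must echo it back.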

SpeechStarted Event

The SpeechStarted event should be sent to AVS after your client processes the Speak directive and begins playback of synthesized speech.

Sample Message

{
    "event": {
        "header": {
            "namespace": "SpeechSynthesizer",
            "name": "SpeechStarted",
            "messageId": "{{STRING}}"
        },
        "payload": {
            "token": "{{STRING}}"
        }
    }
}

Header Parameters

Parameter | Description | Type
messageId | A unique ID used to represent a specific message. | string

Payload Parameters

Parameter | Description | Type
token | The opaque token provided by the Speak directive. | string
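
A SpeechStarted event body can be assembled directly from the token received in the Speak directive. The sketch below uses a hypothetical function name, and generating messageId as a random UUID is an assumption; the parameter table only requires that the ID be unique.

```python
import json
import uuid


def build_speech_started(token: str) -> str:
    """Return the JSON body of a SpeechStarted event for the given token."""
    event = {
        "event": {
            "header": {
                "namespace": "SpeechSynthesizer",
                "name": "SpeechStarted",
                # Assumption: a random UUID is unique enough for messageId.
                "messageId": str(uuid.uuid4()),
            },
            # Echo the opaque token exactly as it arrived in the Speak directive.
            "payload": {"token": token},
        }
    }
    return json.dumps(event)
```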

SpeechFinished Event

The SpeechFinished event must be sent after your client processes a Speak directive and Alexa TTS is fully rendered to the user. If playback does not finish, for example because a user interrupts Alexa TTS with “Alexa, stop”, SpeechFinished is not sent.

Sample Message

{
    "event": {
        "header": {
            "namespace": "SpeechSynthesizer",
            "name": "SpeechFinished",
            "messageId": "{{STRING}}"
        },
        "payload": {
            "token": "{{STRING}}"
        }
    }
}

Header Parameters

Parameter | Description | Type
messageId | A unique ID used to represent a specific message. | string

Payload Parameters

Parameter | Description | Type
token | The opaque token provided by the Speak directive. | string
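
The ordering rule for the two events (SpeechStarted once playback begins, SpeechFinished only when speech plays to the end) can be sketched as follows. Everything here is illustrative: play stands in for a hypothetical player that calls its argument when audio starts and returns True only if playback was not interrupted, and send stands in for whatever your client uses to deliver events to AVS.

```python
import uuid
from typing import Callable


def speech_event(name: str, token: str) -> dict:
    """Build a SpeechSynthesizer event carrying the Speak directive's token."""
    return {
        "event": {
            "header": {
                "namespace": "SpeechSynthesizer",
                "name": name,
                "messageId": str(uuid.uuid4()),  # assumption: UUID as unique ID
            },
            "payload": {"token": token},
        }
    }


def render_speech(
    token: str,
    play: Callable[[Callable[[], None]], bool],
    send: Callable[[dict], None],
) -> bool:
    """Play speech and report SpeechStarted and, if completed, SpeechFinished."""
    # The player invokes this callback once audio actually begins.
    completed = play(lambda: send(speech_event("SpeechStarted", token)))
    if completed:
        # Only fully rendered speech is reported as finished.
        send(speech_event("SpeechFinished", token))
    return completed
```

If the user interrupts playback, play returns False and no SpeechFinished event is sent.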
