SpeechSynthesizer Interface (v1.0)

When users ask your product a question or make a request, the SpeechSynthesizer interface is used to return Alexa's speech response. For instance, when a user asks Alexa, "What's the weather in Seattle?", the Alexa Voice Service (AVS) returns a Speak directive to your client with a binary audio attachment, which your client should process and play. This page covers SpeechSynthesizer directives and events.

States

SpeechSynthesizer has the following states:

PLAYING: While Alexa is speaking, SpeechSynthesizer should be in the PLAYING state. SpeechSynthesizer should transition to the FINISHED state when playback of Alexa's speech is complete.

FINISHED: When Alexa finishes speaking, SpeechSynthesizer should transition to the FINISHED state following a SpeechFinished event.
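The two states can be tracked with a very small state machine. The sketch below assumes a Python client. The class and method names are hypothetical and are not part of AVS. It also assumes the client starts in FINISHED before any speech has played, because this page does not define an initial state.

```python
from enum import Enum


class PlayerActivity(Enum):
    """The two SpeechSynthesizer states defined for version 1.0."""
    PLAYING = "PLAYING"
    FINISHED = "FINISHED"


class SpeechSynthesizerState:
    """Hypothetical client-side tracker for SpeechSynthesizer state.

    Assumption: the tracker starts in FINISHED because nothing has been
    spoken yet; the interface itself does not define an initial state.
    """

    def __init__(self) -> None:
        self.activity = PlayerActivity.FINISHED
        self.token = ""

    def on_playback_started(self, token: str) -> None:
        # Alexa has started speaking the audio for this Speak token.
        self.token = token
        self.activity = PlayerActivity.PLAYING

    def on_playback_completed(self) -> None:
        # Playback of Alexa's speech is complete, so move to FINISHED.
        if self.activity is not PlayerActivity.PLAYING:
            raise RuntimeError("playback completed while not PLAYING")
        self.activity = PlayerActivity.FINISHED
```

In this sketch, on_playback_started would be called when the audio from a Speak directive begins playing. on_playback_completed would be called when that audio ends. Those are the same points at which the SpeechStarted and SpeechFinished events are sent.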

Capabilities API

To use version 1.0 of the SpeechSynthesizer interface, you must declare it in your call to the Capabilities API. For additional details, see Capabilities API.

Sample Object

{
    "type": "AlexaInterface",
    "interface": "SpeechSynthesizer",
    "version": "1.0"
}
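A client could keep this entry as a literal and add it to the list of interfaces it declares. The sketch below assumes a Python client, and the function name is hypothetical. The rest of the Capabilities API request body is defined on the Capabilities API page and is not reproduced here. Note that version is the string "1.0", not a JSON number.

```python
import json


def speech_synthesizer_capability() -> dict:
    """Return the SpeechSynthesizer 1.0 entry to declare via the Capabilities API."""
    return {
        "type": "AlexaInterface",
        "interface": "SpeechSynthesizer",
        "version": "1.0",  # a string, exactly as in the sample object
    }


# Hypothetical list of declared interfaces; a real client would merge this
# into the full Capabilities API request described on that page.
capabilities = [speech_synthesizer_capability()]
print(json.dumps(capabilities, indent=4))
```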

SpeechSynthesizer Context

Alexa expects the client to report the token, playerActivity (state), and offsetInMilliseconds of the currently playing TTS with each event that requires context.

To learn more about reporting Context, see Context Overview.

Sample Message

{
    "header": {
        "namespace": "SpeechSynthesizer",
        "name": "SpeechState"
    },
    "payload": {
        "token": "{{STRING}}",
        "offsetInMilliseconds": {{LONG}},
        "playerActivity": "{{STRING}}"
    }
}

Payload Parameters

| Parameter | Description | Type |
|---|---|---|
| token | An opaque token provided in the Speak directive. | string |
| offsetInMilliseconds | Identifies the current offset of TTS in milliseconds. | long |
| playerActivity | Identifies the component state of SpeechSynthesizer. Accepted values: PLAYING or FINISHED. | string |

| Player Activity | Description |
|---|---|
| PLAYING | Speech is playing. |
| FINISHED | Speech has finished playing. |
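The context entry can be assembled from a snapshot of the TTS player. The sketch below assumes a Python client. TtsPlayback and speech_state_context are hypothetical names, and the function builds only the entry shown in the sample message. Attaching it to an outgoing event follows the Context Overview page.

```python
import json
from dataclasses import dataclass


@dataclass
class TtsPlayback:
    """Hypothetical snapshot of the client's TTS player (not an AVS type)."""
    token: str       # token from the Speak directive being rendered
    offset_ms: int   # current playback position, in milliseconds
    playing: bool    # True while Alexa is speaking


def speech_state_context(p: TtsPlayback) -> dict:
    """Build the SpeechSynthesizer.SpeechState context entry."""
    return {
        "header": {"namespace": "SpeechSynthesizer", "name": "SpeechState"},
        "payload": {
            "token": p.token,
            # Python ints serialize as JSON numbers, matching the long type.
            "offsetInMilliseconds": p.offset_ms,
            "playerActivity": "PLAYING" if p.playing else "FINISHED",
        },
    }


# Example: context captured 1.5 seconds into a Speak directive's audio.
print(json.dumps(speech_state_context(TtsPlayback("abc", 1500, True)), indent=4))
```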

Speak Directive

This directive is sent from AVS to your client any time a speech response from Alexa is required. In most cases, the Speak directive is sent in response to a user request, such as a Recognize event. However, a Speak directive may also be sent to your client to preface an action that will be taken. For instance, when a user makes a request to set a timer, the client receives a SetAlert directive that instructs it to set the timer. It also receives a Speak directive that notifies the user that the timer was successfully set.

This directive is sent to your client as a multipart message with two parts: the JSON-formatted directive and a binary audio attachment.

Sample Message

{
    "directive": {
        "header": {
            "namespace": "SpeechSynthesizer",
            "name": "Speak",
            "messageId": "{{STRING}}",
            "dialogRequestId": "{{STRING}}"
        },
        "payload": {
            "url": "{{STRING}}",
            "format": "{{STRING}}",
            "token": "{{STRING}}"
        }
    }
}

Binary Audio Attachment

Each Speak directive will have a corresponding binary audio attachment as one part of the multipart message. The following multipart headers will precede the binary audio attachment:

Content-Type: application/octet-stream
Content-ID: {{Audio Item CID}}

{{BINARY AUDIO ATTACHMENT}}
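This page does not say how to split the multipart message into its parts. One possible approach uses Python's standard email parser, which understands MIME multipart bodies. It assumes the client already holds the complete body plus its multipart Content-Type header, including the boundary parameter. The function name extract_audio_attachments is hypothetical. A client reading a streaming connection might instead parse parts incrementally.

```python
from email import message_from_bytes
from email.policy import default


def extract_audio_attachments(content_type: str, body: bytes) -> dict:
    """Map each binary part's Content-ID (without angle brackets) to its bytes.

    content_type is the multipart Content-Type header of the response,
    including its boundary parameter.
    """
    # The email parser expects headers first, so prepend the outer Content-Type.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = message_from_bytes(raw, policy=default)
    attachments = {}
    for part in message.iter_parts():
        if part.get_content_type() == "application/octet-stream":
            # Content-IDs are often wrapped in <...>; strip the brackets so the
            # value can be compared with the cid: URL in the Speak payload.
            cid = str(part.get("Content-ID", "")).strip().strip("<>")
            attachments[cid] = part.get_payload(decode=True)
    return attachments
```

The parser drops the line break that precedes each boundary, as MIME requires. The returned bytes are therefore the audio exactly as sent.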

Header Parameters

| Parameter | Description | Type |
|---|---|---|
| messageId | A unique ID used to represent a specific message. | string |
| dialogRequestId | A unique ID used to correlate directives sent in response to a specific Recognize event. | string |
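Because dialogRequestId ties directives to the Recognize event that produced them, a client can group incoming directives by that value. In the timer example, the SetAlert and Speak directives would be expected to share one dialogRequestId. The sketch below is a minimal Python illustration with a hypothetical function name. It groups directives that lack a dialogRequestId under None, which is an assumption because this page only shows the field on Speak.

```python
from collections import defaultdict


def group_by_dialog_request(directives: list) -> dict:
    """Group directive names by header.dialogRequestId, preserving arrival order.

    Directives without a dialogRequestId are grouped under None (an assumption
    of this sketch).
    """
    groups = defaultdict(list)
    for message in directives:
        header = message["directive"]["header"]
        groups[header.get("dialogRequestId")].append(
            header["namespace"] + "." + header["name"]
        )
    return dict(groups)
```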

Payload Parameters

| Parameter | Description | Type |
|---|---|---|
| url | A unique identifier for audio content. The URL always begins with the prefix cid:. Example: cid:{{STRING}} | string |
| format | Provides the format of returned audio. Accepted value: "AUDIO_MPEG" | string |
| token | An opaque token that represents the current text-to-speech (TTS) object. | string |
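Before playing anything, a client can check the payload and work out which attachment the url refers to. The Python sketch below has a hypothetical function name. It also assumes the general cid: URL convention from RFC 2392, which this page does not state: the text after cid: is the URL-encoded Content-ID of the audio part, without angle brackets.

```python
from urllib.parse import unquote


def audio_cid_for_speak(message: dict) -> str:
    """Validate a Speak directive and return the Content-ID its url points to."""
    header = message["directive"]["header"]
    payload = message["directive"]["payload"]
    if (header["namespace"], header["name"]) != ("SpeechSynthesizer", "Speak"):
        raise ValueError("not a SpeechSynthesizer.Speak directive")
    if payload["format"] != "AUDIO_MPEG":
        raise ValueError("unsupported audio format: " + payload["format"])
    url = payload["url"]
    if not url.startswith("cid:"):
        raise ValueError("expected a cid: URL, got " + url)
    # Assumed RFC 2392 convention: the rest is a URL-encoded Content-ID.
    return unquote(url[len("cid:"):])
```

The returned value can be used as a lookup key into the binary parts of the same multipart message. The token should be kept for the SpeechStarted and SpeechFinished events.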

SpeechStarted Event

The SpeechStarted event should be sent to AVS after your client processes the Speak directive and begins playback of synthesized speech.

Sample Message

{
    "event": {
        "header": {
            "namespace": "SpeechSynthesizer",
            "name": "SpeechStarted",
            "messageId": "{{STRING}}"
        },
        "payload": {
            "token": "{{STRING}}"
        }
    }
}

Header Parameters

| Parameter | Description | Type |
|---|---|---|
| messageId | A unique ID used to represent a specific message. | string |

Payload Parameters

| Parameter | Description | Type |
|---|---|---|
| token | The opaque token provided by the Speak directive. | string |
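SpeechStarted and SpeechFinished share the same shape and differ only in name. A Python client could therefore build both with one helper. The function name is hypothetical. The sketch assumes a random UUID is an acceptable unique messageId; this page only requires a unique ID. Sending the event to AVS is not shown.

```python
import uuid


def speech_event(name: str, token: str) -> dict:
    """Build a SpeechStarted or SpeechFinished event for a Speak token."""
    if name not in ("SpeechStarted", "SpeechFinished"):
        raise ValueError("unknown SpeechSynthesizer event: " + name)
    return {
        "event": {
            "header": {
                "namespace": "SpeechSynthesizer",
                "name": name,
                # Assumption: a random UUID serves as the unique message ID.
                "messageId": str(uuid.uuid4()),
            },
            # Echo the opaque token from the Speak directive unchanged.
            "payload": {"token": token},
        }
    }
```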

SpeechFinished Event

The SpeechFinished event must be sent after your client processes a Speak directive and Alexa TTS is fully rendered to the user. If playback does not finish (for example, if the user interrupts Alexa TTS with "Alexa, stop"), SpeechFinished is not sent.
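The key rule is that SpeechFinished is conditional on complete playback. The minimal Python sketch below makes the rule explicit. The function name is hypothetical. fully_rendered is a hypothetical flag that the client's audio player would report when playback stops. The UUID messageId is an assumed scheme for producing a unique ID.

```python
import uuid
from typing import Optional


def speech_finished_if_complete(token: str, fully_rendered: bool) -> Optional[dict]:
    """Return a SpeechFinished event, or None if playback did not finish.

    fully_rendered is a hypothetical flag set by the audio player when all
    of the Speak audio has been played to the user.
    """
    if not fully_rendered:
        # Interrupted (for example by "Alexa, stop"): SpeechFinished is not sent.
        return None
    return {
        "event": {
            "header": {
                "namespace": "SpeechSynthesizer",
                "name": "SpeechFinished",
                "messageId": str(uuid.uuid4()),  # assumed unique-ID scheme
            },
            "payload": {"token": token},
        }
    }
```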

Sample Message

{
    "event": {
        "header": {
            "namespace": "SpeechSynthesizer",
            "name": "SpeechFinished",
            "messageId": "{{STRING}}"
        },
        "payload": {
            "token": "{{STRING}}"
        }
    }
}

Header Parameters

| Parameter | Description | Type |
|---|---|---|
| messageId | A unique ID used to represent a specific message. | string |

Payload Parameters

| Parameter | Description | Type |
|---|---|---|
| token | The opaque token provided by the Speak directive. | string |