SpeechSynthesizer 1.0

Important: Alexa Voice Service (AVS) developer tools are no longer generally available for Alexa Built-in. Please visit the Works with Alexa program if you are interested in building devices that connect to Alexa.

Warning: This page doesn't describe the latest version of the SpeechSynthesizer interface. If this isn't what you're looking for, use the Version Selector at the top of the page to navigate to the correct version.

When you ask Alexa a question, the SpeechSynthesizer interface returns the appropriate speech response.

For example, if you ask Alexa "What's the weather in Seattle?", your client receives a Speak directive from the Alexa Voice Service (AVS). This directive contains a binary audio attachment with the appropriate answer, which you must process and play.

States

SpeechSynthesizer has the following states:

PLAYING - While Alexa speaks, SpeechSynthesizer is in the PLAYING state. When speech playback completes, SpeechSynthesizer transitions to the FINISHED state.
FINISHED - When Alexa finishes speaking, SpeechSynthesizer enters the FINISHED state and your client sends a SpeechFinished event.

Capability assertion

SpeechSynthesizer 1.0 may be implemented by the device on its own behalf, but not on behalf of any connected endpoints.

New AVS integrations must assert support through Alexa.Discovery, but Alexa will continue to support existing integrations using the Capabilities API.

Sample Object

{
    "type": "AlexaInterface",
    "interface": "SpeechSynthesizer",
    "version": "1.0"
}

Context

For each currently playing TTS that requires context, your client must report playerActivity and offsetInMilliseconds.

To learn more about reporting Context, see Context Overview.

Sample Message

{
    "header": {
        "namespace": "SpeechSynthesizer",
        "name": "SpeechState"
    },
    "payload": {
        "token": "{{STRING}}",
        "offsetInMilliseconds": {{LONG}},
        "playerActivity": "{{STRING}}"
    }
}

Payload Parameters

Parameter	Description	Type
token	An opaque token provided in the `Speak` directive.	string
offsetInMilliseconds	Identifies the current TTS offset in milliseconds.	long
playerActivity	Identifies the component state of `SpeechSynthesizer`. Accepted values: `PLAYING` or `FINISHED`.	string

Player Activity	Description
`PLAYING`	Speech is playing.
`FINISHED`	Speech finished playing.
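As a rough illustration of the context entry described above, the following Python sketch assembles a SpeechState object from a token, an offset, and one of the two player activity values listed in the table. The function name `build_speech_state` and the validation rules are assumptions made for this example; they are not part of AVS.

```python
import json

# The player activity values listed in the Player Activity table above.
VALID_PLAYER_ACTIVITY = {"PLAYING", "FINISHED"}


def build_speech_state(token, offset_ms, player_activity):
    """Return a SpeechState context entry as a plain dict.

    Hypothetical helper: the name and the checks are illustrative only.
    """
    if player_activity not in VALID_PLAYER_ACTIVITY:
        raise ValueError("unsupported playerActivity: %r" % (player_activity,))
    if offset_ms < 0:
        raise ValueError("offsetInMilliseconds must not be negative")
    return {
        "header": {
            "namespace": "SpeechSynthesizer",
            "name": "SpeechState",
        },
        "payload": {
            "token": token,
            "offsetInMilliseconds": offset_ms,
            "playerActivity": player_activity,
        },
    }


if __name__ == "__main__":
    # Example: TTS identified by a placeholder token, 1.5 seconds into playback.
    print(json.dumps(build_speech_state("example-token", 1500, "PLAYING"), indent=4))
```

A client would typically include this object alongside the other context entries it reports; how those entries are collected is outside the scope of this sketch.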

Directives

Speak

AVS sends a Speak directive to your client every time Alexa delivers a speech response. Your client can receive a Speak directive in two ways:

When a user makes a voice request, such as asking Alexa a question. AVS sends a Speak directive to your client after it receives a Recognize event.
When a user performs an action, such as setting a timer. First, the timer starts with the SetAlert directive. Second, AVS sends a Speak directive to your client, notifying you that the timer started.

Sample Message

The Speak directive is a multipart message with two parts: one JSON-formatted directive and one binary audio attachment.

JSON

{
    "directive": {
        "header": {
            "namespace": "SpeechSynthesizer",
            "name": "Speak",
            "messageId": "{{STRING}}",
            "dialogRequestId": "{{STRING}}"
        },
        "payload": {
            "url": "{{STRING}}",
            "format": "{{STRING}}",
            "token": "{{STRING}}"
        }
    }
}

Binary Audio Attachment

The following multipart headers precede the binary audio attachment.

Content-Type: application/octet-stream
Content-ID: {{Audio Item CID}}

{{BINARY AUDIO ATTACHMENT}}

Header Parameters

Parameter	Description	Type
messageId	A unique ID used to represent a specific message.	string
dialogRequestId	A unique ID used to correlate directives sent in response to a specific `Recognize` event.	string

Payload Parameters

Parameter	Description	Type
url	A unique identifier for audio content. The value always begins with the prefix `cid:`. Example: `cid:{{STRING}}`	string
format	Provides the format of returned audio. Accepted value: "AUDIO_MPEG"	string
token	An opaque token that represents the current text-to-speech (TTS) object.	string
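To make the relationship between the JSON part and the audio attachment concrete, here is a minimal Python sketch that validates a Speak directive and uses the `cid:` value to look up the matching attachment. It assumes the multipart response has already been split into a decoded JSON dict and a mapping from Content-ID to raw bytes. The function names, the returned field names, and the handling of angle brackets around Content-ID values are all assumptions for illustration.

```python
def parse_speak_directive(message):
    """Extract the fields a client needs from a decoded Speak directive.

    Hypothetical helper: `message` is the JSON part already loaded as a dict.
    """
    directive = message["directive"]
    header = directive["header"]
    payload = directive["payload"]
    if header["namespace"] != "SpeechSynthesizer" or header["name"] != "Speak":
        raise ValueError("not a SpeechSynthesizer.Speak directive")
    url = payload["url"]
    if not url.startswith("cid:"):
        raise ValueError("payload.url does not start with 'cid:'")
    return {
        "message_id": header["messageId"],
        # Read defensively in case a directive omits dialogRequestId.
        "dialog_request_id": header.get("dialogRequestId"),
        "content_id": url[len("cid:"):],
        "format": payload["format"],
        "token": payload["token"],
    }


def find_audio_attachment(parsed, attachments):
    """Return the audio bytes whose Content-ID matches the directive.

    `attachments` maps Content-ID header values to bytes. Some multipart
    stacks wrap Content-ID values in angle brackets, so they are stripped
    before comparing (an assumption, not documented behavior).
    """
    for content_id, data in attachments.items():
        if content_id.strip().strip("<>") == parsed["content_id"]:
            return data
    raise KeyError("no attachment for Content-ID %r" % (parsed["content_id"],))
```

The token returned here is the value the client would later echo in the SpeechStarted and SpeechFinished events.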

Events

SpeechStarted

Send the SpeechStarted event to AVS after your client processes the Speak directive and begins playback of synthesized speech.

Sample Message

{
    "event": {
        "header": {
            "namespace": "SpeechSynthesizer",
            "name": "SpeechStarted",
            "messageId": "{{STRING}}"
        },
        "payload": {
            "token": "{{STRING}}"
        }
    }
}

Header Parameters

Parameter	Description	Type
messageId	A unique ID used to represent a specific message.	string

Payload Parameters

Parameter	Description	Type
token	The opaque token provided by the `Speak` directive.	string
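A SpeechStarted event could be assembled along these lines. This is only a sketch: the helper name `build_speech_started` is hypothetical, and generating `messageId` with a random UUID is an assumption about what counts as a suitably unique ID.

```python
import uuid


def build_speech_started(token):
    """Build a SpeechStarted event for the Speak directive identified by `token`.

    Hypothetical helper; messageId is a random UUID (an assumption).
    """
    return {
        "event": {
            "header": {
                "namespace": "SpeechSynthesizer",
                "name": "SpeechStarted",
                "messageId": str(uuid.uuid4()),
            },
            "payload": {
                # Echo the opaque token from the Speak directive unchanged.
                "token": token,
            },
        }
    }
```

The client would call this once playback of the synthesized speech has actually begun, then serialize the result and send it to AVS.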

SpeechFinished

When Alexa finishes speaking, send the SpeechFinished event. Send the event only after Alexa fully processes the Speak directive and finishes rendering the TTS. If a user cancels TTS playback, don't send the SpeechFinished event. For example, if a user interrupts the Alexa TTS with "Alexa, stop," don't send a SpeechFinished event.

Sample Message

{
    "event": {
        "header": {
            "namespace": "SpeechSynthesizer",
            "name": "SpeechFinished",
            "messageId": "{{STRING}}"
        },
        "payload": {
            "token": "{{STRING}}"
        }
    }
}

Header Parameters

Parameter	Description	Type
messageId	A unique ID used to represent a specific message.	string

Payload Parameters

Parameter	Description	Type
token	The opaque token provided by the `Speak` directive.	string
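The rule that a cancelled playback produces no SpeechFinished event can be sketched as a small tracker around the audio player callbacks. The class name `SpeechPlayback`, the callback names, the `send_event` parameter, and the local `IDLE` and `CANCELLED` labels are all hypothetical; only `PLAYING` and `FINISHED` come from this page.

```python
import uuid


class SpeechPlayback:
    """Tracks one Speak directive and emits events at the right moments."""

    def __init__(self, token, send_event):
        self.token = token
        self._send_event = send_event  # callable that delivers an event dict
        self.state = "IDLE"            # local label, not an AVS state

    def _event(self, name):
        return {
            "event": {
                "header": {
                    "namespace": "SpeechSynthesizer",
                    "name": name,
                    "messageId": str(uuid.uuid4()),  # assumed ID scheme
                },
                "payload": {"token": self.token},
            }
        }

    def on_playback_started(self):
        self.state = "PLAYING"
        self._send_event(self._event("SpeechStarted"))

    def on_playback_completed(self):
        # Only a playback that ran to the end reports SpeechFinished.
        if self.state != "PLAYING":
            return
        self.state = "FINISHED"
        self._send_event(self._event("SpeechFinished"))

    def on_playback_cancelled(self):
        # Cancelled speech (for example "Alexa, stop"): no SpeechFinished.
        self.state = "CANCELLED"       # local label, not an AVS state
```

For example, passing `sent.append` as `send_event` and calling `on_playback_started()` followed by `on_playback_cancelled()` leaves only the SpeechStarted event in `sent`.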

Last updated: Nov 27, 2023