SpeechSynthesizer 1.0

When you ask Alexa a question, the SpeechSynthesizer interface returns the appropriate speech response.

For example, if you ask Alexa "What's the weather in Seattle?" your client receives a SpeechSynthesizer.Speak directive from the Alexa Voice Service (AVS). This directive contains a binary audio attachment with the appropriate answer, which you must process and play.

The following sections cover SpeechSynthesizer directives and events.

States

SpeechSynthesizer has the following states:

  • PLAYING - When Alexa speaks, SpeechSynthesizer is in the PLAYING state. SpeechSynthesizer transitions to the FINISHED state when speech playback completes.
  • FINISHED - When Alexa finishes speaking, SpeechSynthesizer transitions to the FINISHED state with a SpeechFinished event.
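The two states above can be modeled as a small state tracker. The following is a minimal sketch, not part of the interface: the class and method names are hypothetical, and the choice to start in FINISHED before any speech has played is an assumption.

```python
from enum import Enum


class SpeechState(Enum):
    """The two SpeechSynthesizer states listed above."""
    PLAYING = "PLAYING"
    FINISHED = "FINISHED"


class SpeechStateTracker:
    """Hypothetical helper that tracks the state of the most recent TTS item."""

    def __init__(self):
        # Assumption: report FINISHED until the first speech starts playing.
        self.state = SpeechState.FINISHED

    def on_playback_started(self):
        # Alexa is speaking.
        self.state = SpeechState.PLAYING

    def on_playback_completed(self):
        # Speech playback completed.
        self.state = SpeechState.FINISHED
```

A client would call `on_playback_started()` when audio output begins and `on_playback_completed()` when it ends, then read `state.value` when reporting context.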

Capability assertion

SpeechSynthesizer 1.0 may be implemented by the device on its own behalf, but not on behalf of any connected endpoints.

New AVS integrations must assert support through Alexa.Discovery, but Alexa will continue to support existing integrations using the Capabilities API.

Sample Object

{
    "type": "AlexaInterface",
    "interface": "SpeechSynthesizer",
    "version": "1.0"
}

Context

For each currently playing TTS that requires context, your client must report playerActivity and offsetInMilliseconds.

To learn more about reporting Context, see Context Overview.

Sample Message

{
    "header": {
        "namespace": "SpeechSynthesizer",
        "name": "SpeechState"
    },
    "payload": {
        "token": "{{STRING}}",
        "offsetInMilliseconds": {{LONG}},
        "playerActivity": "{{STRING}}"
    }
}

Payload Parameters

| Parameter | Description | Type |
|---|---|---|
| token | An opaque token provided in the Speak directive. | string |
| offsetInMilliseconds | Identifies the current TTS offset in milliseconds. | long |
| playerActivity | Identifies the component state of SpeechSynthesizer. Accepted values: PLAYING or FINISHED. | string |

| Player Activity | Description |
|---|---|
| PLAYING | Speech is playing. |
| FINISHED | Speech finished playing. |
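As an illustration of the SpeechState message shown above, the following sketch assembles the context entry as a Python dictionary. The function name `build_speech_state` is hypothetical, and the check that the offset is non-negative is an assumption rather than a documented requirement.

```python
import json


def build_speech_state(token, offset_ms, player_activity):
    """Build the SpeechSynthesizer.SpeechState context entry (hypothetical helper)."""
    if offset_ms < 0:
        # Assumption: a playback offset is never negative.
        raise ValueError("offset_ms must be non-negative")
    return {
        "header": {
            "namespace": "SpeechSynthesizer",
            "name": "SpeechState",
        },
        "payload": {
            "token": token,                          # token from the Speak directive
            "offsetInMilliseconds": int(offset_ms),  # current TTS offset
            "playerActivity": player_activity,       # passed through unchecked
        },
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(json.dumps(build_speech_state("example-token", 1500, "PLAYING"), indent=4))
```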

Directives

Speak

AVS sends a Speak directive to your client every time Alexa delivers a speech response. Your client receives a Speak directive in two situations:

  1. When a user makes a voice request, such as asking Alexa a question. AVS sends a Speak directive to your client after it receives a Recognize event.
  2. When a user performs an action, such as setting a timer. First, the timer starts with the SetAlert directive. Second, AVS sends a Speak directive to your client, notifying you that the timer started.

Sample Message

The Speak directive is a multipart message containing two parts: one JSON-formatted directive and one binary audio attachment.

JSON

{
    "directive": {
        "header": {
            "namespace": "SpeechSynthesizer",
            "name": "Speak",
            "messageId": "{{STRING}}",
            "dialogRequestId": "{{STRING}}"
        },
        "payload": {
            "url": "{{STRING}}",
            "format": "{{STRING}}",
            "token": "{{STRING}}"
        }
    }
}

Binary Audio Attachment

The following multipart headers precede the binary audio attachment.

Content-Type: application/octet-stream
Content-ID: {{Audio Item CID}}

{{BINARY AUDIO ATTACHMENT}}

Header Parameters

| Parameter | Description | Type |
|---|---|---|
| messageId | A unique ID used to represent a specific message. | string |
| dialogRequestId | A unique ID used to correlate directives sent in response to a specific Recognize event. | string |

Payload Parameters

| Parameter | Description | Type |
|---|---|---|
| url | A unique identifier for audio content. The URL always follows the prefix cid:. Example: cid:{{STRING}} | string |
| format | Provides the format of returned audio. Accepted value: "AUDIO_MPEG" | string |
| token | An opaque token that represents the current text-to-speech (TTS) object. | string |
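To show how the url parameter ties the JSON directive to its binary audio attachment, here is a minimal sketch. It assumes the multipart response has already been split into a mapping from Content-ID header value to audio bytes. The function names and the `attachments` mapping are hypothetical, and tolerating angle brackets around the Content-ID is an assumption.

```python
def content_id_from_url(url):
    """Strip the cid: prefix from the directive's url field."""
    prefix = "cid:"
    if not url.startswith(prefix):
        raise ValueError(f"expected a cid: URL, got {url!r}")
    return url[len(prefix):]


def find_audio_attachment(message, attachments):
    """Return (token, audio_bytes) for a Speak directive.

    `attachments` is a hypothetical dict mapping each part's Content-ID
    header value to that part's binary body.
    """
    directive = message["directive"]
    header = directive["header"]
    if (header["namespace"], header["name"]) != ("SpeechSynthesizer", "Speak"):
        raise ValueError("not a SpeechSynthesizer.Speak directive")

    payload = directive["payload"]
    cid = content_id_from_url(payload["url"])

    # Assumption: Content-ID values may arrive wrapped in angle brackets.
    normalized = {key.strip("<>"): audio for key, audio in attachments.items()}
    if cid not in normalized:
        raise KeyError(f"no attachment with Content-ID {cid!r}")
    return payload["token"], normalized[cid]
```

The returned token is the value to echo back in the SpeechStarted and SpeechFinished events, and the bytes are the audio to play.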

Events

SpeechStarted

Send the SpeechStarted event to AVS after your client processes the Speak directive and begins playback of synthesized speech.

Sample Message

{
    "event": {
        "header": {
            "namespace": "SpeechSynthesizer",
            "name": "SpeechStarted",
            "messageId": "{{STRING}}"
        },
        "payload": {
            "token": "{{STRING}}"
        }
    }
}

Header Parameters

| Parameter | Description | Type |
|---|---|---|
| messageId | A unique ID used to represent a specific message. | string |

Payload Parameters

| Parameter | Description | Type |
|---|---|---|
| token | The opaque token provided by the Speak directive. | string |
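A client might assemble the SpeechStarted event as follows. This is a sketch only: `build_speech_started` is a hypothetical name, and using a random UUID for messageId is an assumption, since the only stated requirement is that the ID be unique.

```python
import uuid


def build_speech_started(token):
    """Build a SpeechStarted event for the given Speak directive token."""
    return {
        "event": {
            "header": {
                "namespace": "SpeechSynthesizer",
                "name": "SpeechStarted",
                # Assumption: a random UUID serves as the unique message ID.
                "messageId": str(uuid.uuid4()),
            },
            "payload": {
                # Echo the opaque token from the Speak directive unchanged.
                "token": token,
            },
        }
    }
```

Call it once playback of the synthesized speech has actually begun, passing the token from the directive that is being played.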

SpeechFinished

When Alexa finishes speaking, send the SpeechFinished event. Send the event only after Alexa fully processes the Speak directive and finishes rendering the TTS. If a user cancels TTS playback, don't send the SpeechFinished event. For example, if a user interrupts the Alexa TTS with "Alexa, stop," don't send a SpeechFinished event.

Sample Message

{
    "event": {
        "header": {
            "namespace": "SpeechSynthesizer",
            "name": "SpeechFinished",
            "messageId": "{{STRING}}"
        },
        "payload": {
            "token": "{{STRING}}"
        }
    }
}

Header Parameters

| Parameter | Description | Type |
|---|---|---|
| messageId | A unique ID used to represent a specific message. | string |

Payload Parameters

| Parameter | Description | Type |
|---|---|---|
| token | The opaque token provided by the Speak directive. | string |
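The rule that cancelled playback produces no SpeechFinished event can be sketched as a single decision point. The function name and the `playback_completed` flag are hypothetical, and the UUID messageId is an assumption.

```python
import uuid


def speech_finished_event(token, playback_completed):
    """Return a SpeechFinished event, or None when playback was cut short."""
    if not playback_completed:
        # Cancelled speech (for example, "Alexa, stop") sends no event.
        return None
    return {
        "event": {
            "header": {
                "namespace": "SpeechSynthesizer",
                "name": "SpeechFinished",
                # Assumption: a random UUID serves as the unique message ID.
                "messageId": str(uuid.uuid4()),
            },
            "payload": {"token": token},
        }
    }
```

The caller sends the returned event only when the result is not None, which keeps the check in one place instead of spreading it across the playback code.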