Speaker Capability Interface

The Speaker capability comprises the mechanics that enable a device to output audio through a speaker.

It defines messages on the directive and event topics, as well as receipt of binary data on the speaker topic.

Getting Started

AIA Envelope

This capability interface is compatible with v1 of the AIA envelope.

Topic Management

To support Speaker 1.0 messages, the device must participate in the directive, event, and speaker MQTT topics.

Capability Assertion

To use the Speaker 1.0 interface, the device must assert support through the Publish message on the capabilities topic.

Sample Object

{
  "type": "AisInterface",
  "interface": "Speaker",
  "version": "1.0",
  "configurations": {
    "audioBuffer": {
      "sizeInBytes": {{LONG}},
      "reporting": {
        "overrunWarningThreshold": {{LONG}},
        "underrunWarningThreshold": {{LONG}}
      }
    },
    "audioDecoder": {
      "format": "{{STRING}}",
      "bitrate": {
        "type": "{{STRING}}",
        "bitsPerSecond": {{LONG}}
      },
      "numberOfChannels": {{LONG}}
    }
  }
}
Field Name Description Value Type
audioBuffer Details about the on-device buffer that will be used to process speaker audio. object
audioBuffer.sizeInBytes
The total size of the audio buffer in bytes. long
audioBuffer.reporting
Details about how the on-device audio buffer state will be reported to AIA.

See the BufferStateChanged event for more information.
object
audioBuffer.reporting.overrunWarningThreshold
The buffer's offset in bytes that will trigger an OVERRUN_WARNING BufferStateChanged event. AIA will slow down the rate at which it's sending audio data in response to this event.

The correct value for this threshold will vary by device, depending on total size of the buffer, network latency, and audio data processing speed. It should be close to the maximum size of the buffer.
long
audioBuffer.reporting.underrunWarningThreshold
The buffer's offset in bytes that will trigger an UNDERRUN_WARNING BufferStateChanged event. AIA will speed up the rate at which it's sending audio data in response to this event.

The correct value for this threshold will vary by device, depending on total size of the buffer, network latency, and audio data processing speed. It should be as small as possible, while still allowing for uninterrupted audio playback during network delays.
long
audioDecoder Details about the audio decoder supported by the device. object
audioDecoder.format
The audio format supported by the device.

Accepted Values: OPUS

Note: Messages on the speaker topic contain raw Opus frames, not Opus in an OGG container.
string
audioDecoder.bitrate
Details about the bitrate the audio decoder supports. object
audioDecoder.bitrate.type
The type of bitrate.

Accepted Values:
CONSTANT: Each audio frame will have the same number of bits per second.
VARIABLE: Individual audio frames may have different numbers of bits per second.
string
audioDecoder.bitrate.bitsPerSecond
The number of encoded audio stream bits per second that the decoder expects to receive. If type is VARIABLE, this value represents the average bits per second expected.

Accepted Values: 64000, 128000
long
audioDecoder.numberOfChannels
The number of audio channels supported by the decoder.

Accepted Values: 1, 2
long
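
As an illustration of the fields above, the following Python sketch fills in the sample object with concrete values and checks them against the accepted values listed in the table. The helper name build_speaker_capability, the buffer size, and the two threshold values are hypothetical. The ordering check on the thresholds is an assumption drawn from the guidance that the overrun threshold sits near the top of the buffer and the underrun threshold near the bottom; it is not a rule stated by the interface.

```python
import json

ACCEPTED_FORMATS = {"OPUS"}
ACCEPTED_BITRATE_TYPES = {"CONSTANT", "VARIABLE"}
ACCEPTED_BITS_PER_SECOND = {64000, 128000}
ACCEPTED_CHANNELS = {1, 2}


def build_speaker_capability(size_in_bytes, overrun_threshold, underrun_threshold,
                             audio_format="OPUS", bitrate_type="CONSTANT",
                             bits_per_second=64000, channels=1):
    """Build a Speaker 1.0 capability object as a dict (hypothetical helper)."""
    if audio_format not in ACCEPTED_FORMATS:
        raise ValueError(f"unsupported format: {audio_format}")
    if bitrate_type not in ACCEPTED_BITRATE_TYPES:
        raise ValueError(f"unsupported bitrate type: {bitrate_type}")
    if bits_per_second not in ACCEPTED_BITS_PER_SECOND:
        raise ValueError(f"unsupported bitrate: {bits_per_second}")
    if channels not in ACCEPTED_CHANNELS:
        raise ValueError(f"unsupported channel count: {channels}")
    # Assumption: both thresholds lie inside the buffer, underrun below overrun.
    if not 0 <= underrun_threshold < overrun_threshold <= size_in_bytes:
        raise ValueError("expected 0 <= underrun < overrun <= sizeInBytes")
    return {
        "type": "AisInterface",
        "interface": "Speaker",
        "version": "1.0",
        "configurations": {
            "audioBuffer": {
                "sizeInBytes": size_in_bytes,
                "reporting": {
                    "overrunWarningThreshold": overrun_threshold,
                    "underrunWarningThreshold": underrun_threshold,
                },
            },
            "audioDecoder": {
                "format": audio_format,
                "bitrate": {"type": bitrate_type, "bitsPerSecond": bits_per_second},
                "numberOfChannels": channels,
            },
        },
    }


# Illustrative values only: a 64 KiB buffer with warnings at 7/8 and 1/8 full.
capability = build_speaker_capability(
    size_in_bytes=65536, overrun_threshold=57344, underrun_threshold=8192)
print(json.dumps(capability, indent=2))
```

Suitable threshold values depend on the device, as described in the table, so the numbers here should be treated as placeholders.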

directive Topic

OpenSpeaker

The OpenSpeaker directive instructs the device to open the speaker and prepare to play an audio stream sent in subsequent messages on the speaker topic.

The device must respond with a SpeakerOpened event.

Subsequent audio data sent on the speaker topic must be played until the device receives a CloseSpeaker directive or the user initiates a stop locally via a physical or GUI button or barge-in. After audio playback is stopped through any of these means, the device must send the SpeakerClosed event.

Note: When the device first opens the speaker, its audio buffer may be empty. In this case, the device does not need to send a BufferStateChanged event indicating UNDERRUN. However, if the device does not receive audio data on the speaker topic within 10 seconds of receiving this directive, it should close the AIA connection.
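
One possible way to track the 10-second limit in the note above is a small watchdog that is armed when OpenSpeaker arrives and disarmed by the first audio message. This is a sketch only: the class and method names are hypothetical, and timestamps are passed in explicitly (for example from time.monotonic()) so the logic stays independent of any particular event loop.

```python
from typing import Optional

AUDIO_TIMEOUT_SECONDS = 10.0


class OpenSpeakerWatchdog:
    """Tracks whether audio arrived in time after OpenSpeaker (hypothetical)."""

    def __init__(self, opened_at: float):
        # Deadline by which the first audio message must have arrived.
        self._deadline: Optional[float] = opened_at + AUDIO_TIMEOUT_SECONDS

    def on_audio_received(self) -> None:
        # The first audio message on the speaker topic disarms the watchdog.
        self._deadline = None

    def should_close_connection(self, now: float) -> bool:
        # True once the deadline has passed without any audio arriving.
        return self._deadline is not None and now >= self._deadline
```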

Sample Message

{
  "header": {
    "name": "OpenSpeaker",
    "messageId": "{{STRING}}"
  },
  "payload": {
    "offset": {{LONG}}
  }
}

Payload Parameters

Field Name Description Value Type
offset Specifies the byte offset in the speaker topic's audio stream at which to start playback. The device should start playing from the audio stream when it reaches this offset, discarding preceding data from that stream.

Note: The value is inclusive and will be greater than or equal to 0; it identifies the first byte of audio that should be played from the stream.
long
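
The discard rule for offset can be sketched as follows. The function name playable_bytes is hypothetical, and the sketch works purely on byte positions; it assumes the caller already knows the stream offset of each received segment and does not deal with Opus frame boundaries.

```python
def playable_bytes(segment_offset: int, segment: bytes, open_offset: int) -> bytes:
    """Return the part of a segment at or after the OpenSpeaker offset.

    segment_offset is the stream offset of the first byte in segment.
    Bytes before open_offset are discarded; open_offset itself is played.
    """
    skip = max(0, open_offset - segment_offset)
    return segment[skip:]
```

For example, with a segment that starts at stream offset 0 and an OpenSpeaker offset of 4, the first four bytes are dropped and playback starts with the fifth.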

CloseSpeaker

The CloseSpeaker directive instructs the device to stop playing audio through the speaker topic and close the speaker.

The device must respond with a SpeakerClosed event.

Note: If the speaker is already closed due to a local stop (for example, one caused by MicrophoneOpened or ButtonCommandIssued), the speaker should remain closed, and the device should not send the SpeakerClosed event.

Sample Message

{
  "header": {
    "name": "CloseSpeaker",
    "messageId": "{{STRING}}"
  },
  "payload": {
    "offset": {{LONG}}
  }
}

Payload Parameters

Field Name Description Value Type
offset Specifies the byte offset in the speaker topic's audio stream at which to stop playback. If the field is omitted or the device has already passed the offset, playback should stop immediately.

Note: The value is inclusive and will be greater than or equal to 0; it identifies the last byte of audio that should be played from the stream.
long
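
A minimal sketch of the stop rule, assuming the same byte-level view of the stream as above. The function name bytes_before_close is hypothetical. A missing offset is modeled as None, and an offset that lies before the segment yields no bytes, which corresponds to stopping immediately.

```python
from typing import Optional


def bytes_before_close(segment_offset: int, segment: bytes,
                       close_offset: Optional[int]) -> bytes:
    """Return the bytes of a segment that may still be played.

    close_offset is inclusive: the byte at close_offset is the last one played.
    """
    if close_offset is None:
        return b""  # offset omitted: stop immediately
    keep = close_offset - segment_offset + 1
    return segment[:max(0, keep)]
```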

SetVolume

The SetVolume directive instructs the device to adjust its speaker volume. This applies to the binary audio sent over the speaker topic.

The device must respond with a VolumeChanged event.

Note: AIA manages muting and unmuting through this directive, persisting previous volume levels in the cloud service and restoring them as appropriate. There are no separate messages for muting functionality.

Sample Message

{
  "header": {
    "name": "SetVolume",
    "messageId": "{{STRING}}"
  },
  "payload": {
    "volume": {{LONG}},
    "offset": {{LONG}}
  }
}

Payload Parameters

Field Name Description Value Type
volume The volume level to apply on a scale of 0 (mute) to 100 (maximum output). long
offset Specifies the byte offset in the speaker topic's audio stream at which the specified volume should take effect. The previously set volume should remain active until this offset is reached. If the field is omitted or the device has already passed the offset, the device should change its volume to the new value immediately.

Note: The value is inclusive; it identifies the first byte of audio that should be played with the specified volume level.
long
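
Deferred volume changes could be tracked as in the sketch below. The class VolumeScheduler and its method names are hypothetical, and next_offset is assumed to be the stream offset of the next byte the device will play. A change whose offset has already been reached, or whose offset is omitted, is applied at once; otherwise it is held until playback reaches that byte.

```python
from typing import List, Optional, Tuple


class VolumeScheduler:
    """Applies SetVolume changes at their stream offsets (hypothetical)."""

    def __init__(self, initial_volume: int):
        self.volume = initial_volume
        self._pending: List[Tuple[int, int]] = []  # (offset, volume)

    def on_set_volume(self, volume: int, offset: Optional[int],
                      next_offset: int) -> None:
        if not 0 <= volume <= 100:
            raise ValueError("volume must be between 0 and 100")
        if offset is None or offset <= next_offset:
            self.volume = volume  # omitted or already passed: apply now
        else:
            self._pending.append((offset, volume))
            self._pending.sort(key=lambda item: item[0])

    def volume_for(self, byte_offset: int) -> int:
        """Return the volume to use for the byte at byte_offset."""
        while self._pending and self._pending[0][0] <= byte_offset:
            _, self.volume = self._pending.pop(0)
        return self.volume
```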

event Topic

ButtonCommandIssued

The device sends the ButtonCommandIssued event to inform AIA of a user-initiated action related to audio playback control. For example, if a user presses a physical or GUI button to stop audio playback, the device should send this event with the corresponding command in the payload.

For button presses that stop or pause audio playback, the device may optionally execute that action immediately, rather than waiting for a CloseSpeaker directive. In that case, the device must also immediately send the SpeakerClosed event and report the last offset played from the audio stream on the speaker topic.

Sample Message

{
  "header": {
    "name": "ButtonCommandIssued",
    "messageId": "{{STRING}}"
  },
  "payload": {
    "command": "{{STRING}}"
  }
}

Payload Parameters

Field Name Description Value Type
command The command issued by the user.

Accepted Values: PLAY, NEXT, PREVIOUS, STOP, PAUSE

Note: The PAUSE value is only applicable to devices that have a pause button or functionality separate from the stop button.
string

SpeakerOpened

The device must send the SpeakerOpened event in response to the OpenSpeaker directive, as soon as it begins playing an audio stream from the speaker topic.

Sample Message

{
  "header": {
    "name": "SpeakerOpened",
    "messageId": "{{STRING}}"
  },
  "payload": {
    "offset": {{LONG}}
  }
}

Payload Parameters

Field Name Description Value Type
offset The byte offset in the speaker topic's audio stream at which playback was started.

Note: The value is inclusive; it identifies the first byte of audio played from the stream to the user.
long

SpeakerClosed

The device must send the SpeakerClosed event whenever it closes the speaker, as soon as it stops playing an audio stream from the speaker topic.

The device may have closed the speaker as a result of receiving the CloseSpeaker directive or a local trigger, such as a button press (see the ButtonCommandIssued event) or a user barge-in (see the MicrophoneOpened event).

If the speaker was already closed and this event was already sent, it should not be sent again in response to a subsequent CloseSpeaker directive.

Note: If the audio buffer underruns, the device should send a BufferStateChanged event with the UNDERRUN state, not a SpeakerClosed event.

Sample Message

{
  "header": {
    "name": "SpeakerClosed",
    "messageId": "{{STRING}}"
  },
  "payload": {
    "offset": {{LONG}}
  }
}

Payload Parameters

Field Name Description Value Type
offset The byte offset in the speaker topic's audio stream at which playback was stopped.

Note: The value is exclusive; all audio up to, but not including, this offset was played to the user.
long
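
Because this offset is exclusive while the CloseSpeaker offset is inclusive, the reported value is one past the last byte played. The sketch below shows that conversion. The function name is hypothetical, the use of a UUID for messageId is an assumption made only so the example is complete, and the sketch assumes at least one byte was played.

```python
import uuid


def speaker_closed_event(last_played_offset: int) -> dict:
    """Build a SpeakerClosed event from the offset of the last byte played."""
    return {
        "header": {
            "name": "SpeakerClosed",
            "messageId": str(uuid.uuid4()),  # assumption: any unique string
        },
        # Exclusive offset: one past the last byte that reached the user.
        "payload": {"offset": last_played_offset + 1},
    }
```

For instance, if a CloseSpeaker directive with offset 299 is honored in full, the resulting SpeakerClosed event reports offset 300.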

SpeakerMarkerEncountered

The device must send the SpeakerMarkerEncountered event for each marker it encounters in the audio stream of the speaker topic. The event must not be sent until all the audio preceding that marker has been played to the user.

If the marker is the first message in the speaker audio stream, the device must send the SpeakerMarkerEncountered event at the same time it begins playing audio to the user.

These markers and this event give Alexa updated information about audio playback progress.

Sample Message

{
  "header": {
    "name": "SpeakerMarkerEncountered",
    "messageId": "{{STRING}}"
  },
  "payload": {
    "marker": {{LONG}}
  }
}

Payload Parameters

Field Name Description Value Type
marker The value of the marker injected in the speaker audio stream. This will be a 32-bit unsigned integer. long

VolumeChanged

The device must send the VolumeChanged event whenever its volume has changed and in response to the SetVolume directive.

The device may have changed its volume as a result of receiving the SetVolume directive or a local trigger, such as a physical or GUI button.

This event must always be sent in response to the SetVolume directive, even if the volume is the same as the previously set value.

Sample Message

{
  "header": {
    "name": "VolumeChanged",
    "messageId": "{{STRING}}"
  },
  "payload": {
    "volume": {{LONG}},
    "offset": {{LONG}}
  }
}

Payload Parameters

Field Name Description Value Type
volume The volume level set, on a scale of 0 (mute) to 100 (maximum output). long
offset If the speaker is currently open, this specifies the byte offset in the speaker topic's audio stream at which the specified volume was applied.

Note: The value is inclusive; it identifies the first byte of audio that was played with the specified volume level.
long

speaker Topic

AIA sends binary audio data in the binary stream format to the device using the speaker topic.

An ongoing stream of sequential audio segments is published to this topic, with each message containing one or more audio frames.

Control messages for audio output to the user are communicated through events and directives on the event and directive topics, respectively. The audio data format is specified on the capabilities topic through the Speaker interface's capability assertion.

After the common header that is a part of every message using the v1 envelope format, messages on the speaker topic will include a binary stream header with the values specified below.

Audio Data

Component Byte Offset Size (Bytes) Name Description
Binary Stream Header 0 4 length The length in bytes of the data in this binary stream message, including only the audio stream header and payload. (The common header and binary stream header should not be included.)

This field is an unsigned 32-bit integer stored in little-endian byte order.
4 1 type The type of the audio binary stream message.

Possible Values: 0, signifying that the message contains audio to be played.
5 1 count The 0-indexed number of audio frames in this message.

Possible Values: 0-255, signifying the number of audio frames in the message (1-256, respectively).
6 2 reserved These bytes are reserved for alignment and backward-compatible future use. They will be set to binary 0s.
Audio Stream Header 8 8 offset Byte offset for the start of this segment in the ongoing binary audio stream.

This field is an unsigned 64-bit integer stored in little-endian byte order.

The number begins at 0 when a new connection is established, and each connection's stream will have contiguous offsets.
Audio Stream Payload 16 audio Data bytes for this segment of the ongoing binary audio stream to be played for the user.

The size of this field should be the value specified in the length field minus 8.
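
The layout in the table above maps directly onto Python's struct module. The following sketch parses one audio message; it assumes buf starts at the binary stream header (that is, the common header has already been removed and the content is available in plain form), and the function name parse_audio_message is hypothetical. How individual frames are delimited inside the audio field is not covered here, so the payload is returned as a single byte string.

```python
import struct

BINARY_STREAM_HEADER = struct.Struct("<IBBH")  # length, type, count, reserved
AUDIO_STREAM_HEADER = struct.Struct("<Q")      # offset
AUDIO_TYPE = 0


def parse_audio_message(buf: bytes) -> dict:
    """Parse one audio binary stream message starting at its header."""
    length, msg_type, count, _reserved = BINARY_STREAM_HEADER.unpack_from(buf, 0)
    if msg_type != AUDIO_TYPE:
        raise ValueError(f"not an audio message (type {msg_type})")
    (offset,) = AUDIO_STREAM_HEADER.unpack_from(buf, BINARY_STREAM_HEADER.size)
    start = BINARY_STREAM_HEADER.size + AUDIO_STREAM_HEADER.size  # byte 16
    audio_size = length - AUDIO_STREAM_HEADER.size                # length minus 8
    audio = buf[start:start + audio_size]
    if len(audio) != audio_size:
        raise ValueError("message is shorter than its length field")
    # count is 0-indexed, so the number of frames is count + 1.
    return {"offset": offset, "frame_count": count + 1, "audio": audio}
```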

Marker Data

Periodically, AIA will send a marker message to your device on the speaker topic, instead of audio stream data. The device must respond with the SpeakerMarkerEncountered event. These markers and the SpeakerMarkerEncountered event give Alexa updated information about audio playback progress.

Component Byte Offset Size (Bytes) Name Description
Binary Stream Header 0 4 length The length in bytes of the data in this binary stream message, including only the marker payload. (The common header and binary stream header should not be included.)

This field is an unsigned 32-bit integer stored in little-endian byte order.
4 1 type The type of the audio binary stream message.

Possible Values: 1, signifying that the message contains a marker.
5 1 count The 0-indexed number of audio markers in this message.

Note: The value will typically be 0, signifying that there is exactly one marker in the message.
6 2 reserved These bytes are reserved for alignment and backward-compatible future use. They will be set to 0.
Marker Payload 8 4 marker An opaque token formatted as an unsigned 32-bit integer, stored in little-endian byte order.

The device must return this value in the marker field of the SpeakerMarkerEncountered event after all preceding audio in the stream has been played to the user.
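
A marker message can be parsed in the same way and turned into the corresponding event. In this sketch the function names are hypothetical, buf is assumed to start at the binary stream header, and the handling of more than one marker per message (4 bytes each, back to back) is an assumption, since the table only describes the single-marker case in detail. The UUID used for messageId is likewise an assumption.

```python
import struct
import uuid

MARKER_TYPE = 1


def parse_marker_message(buf: bytes) -> list:
    """Return the marker values carried by one marker message."""
    length, msg_type, count, _reserved = struct.unpack_from("<IBBH", buf, 0)
    if msg_type != MARKER_TYPE:
        raise ValueError(f"not a marker message (type {msg_type})")
    marker_count = count + 1  # count is 0-indexed
    if length != 4 * marker_count:
        raise ValueError("length does not match the marker count")
    return list(struct.unpack_from(f"<{marker_count}I", buf, 8))


def marker_encountered_event(marker: int) -> dict:
    """Build the SpeakerMarkerEncountered event for one marker value."""
    return {
        "header": {
            "name": "SpeakerMarkerEncountered",
            "messageId": str(uuid.uuid4()),  # assumption: any unique string
        },
        "payload": {"marker": marker},
    }
```

The event should only be sent once all audio preceding the marker has been played, so a device would typically queue the parsed value alongside the audio rather than send the event on receipt.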

Full MQTT Payload Structure Example

The following is an example of the structure of a full MQTT payload on the speaker topic, containing two marker messages and two audio messages, each audio message containing more than one frame. For convenience in offset calculations, it is presented as the first MQTT message on the speaker topic since the last connection was established: all offsets start at 0.

Notes:

  • The example payload in this section is the entire AIA message, which is the payload of the general AWS IoT MQTT message.
  • The index specified in the "Message" column of the table indicates the value used by the index field in control plane messages, such as ExceptionEncountered. Each index number represents a separate binary stream message within the MQTT message.
Message Component Byte Offset Size (Bytes) Name Value Description
Common Header 0 36 Concrete structure and values omitted.
Specific values are immaterial to the example.
Index 0
Marker
Binary Stream Header 36 4 length 4 Because this binary stream message is 1 marker, its size of 4 is represented here.
40 1 type 1 This binary stream message is a marker.
41 1 count 0 This binary stream message contains 1 marker. (This field is 0-indexed.)
42 2 reserved 0 These bytes are reserved for alignment and backward-compatible future use.
Marker Payload 44 4 marker An opaque token formatted as an unsigned 32-bit integer, stored in little-endian byte order.

The device must return this value in the marker field of the SpeakerMarkerEncountered event after all preceding audio in the stream has been played to the user.
Index 1
Audio
Binary Stream Header 48 4 length 308 Because this binary stream message is an audio stream, the sum of the audio stream header's size (8) and all audio stream payloads (2 times 150) is represented here.
52 1 type 0 This binary stream message is an audio stream.
53 1 count 1 This binary stream message contains 2 audio frames. (This field is 0-indexed.)
54 2 reserved 0 These bytes are reserved for alignment and backward-compatible future use.
Audio Stream Header 56 8 offset 0 The first audio stream payload represents the first audio data in this stream.
Audio Stream Payload 1 64 150 audio Data bytes for this segment of the ongoing binary audio stream to be played for the user.
Audio Stream Payload 2 214 150 audio Data bytes for this segment of the ongoing binary audio stream to be played for the user.
Index 2
Marker
Binary Stream Header 364 4 length 4 Because this binary stream message is 1 marker, its size of 4 is represented here.
368 1 type 1 This binary stream message is a marker.
369 1 count 0 This binary stream message contains 1 marker. (This field is 0-indexed.)
370 2 reserved 0 These bytes are reserved for alignment and backward-compatible future use.
Marker Payload 372 4 marker An opaque token formatted as an unsigned 32-bit integer, stored in little-endian byte order.

The device must return this value in the marker field of the SpeakerMarkerEncountered event after all preceding audio in the stream has been played to the user.
Index 3
Audio
Binary Stream Header 376 4 length 458 Because this binary stream message is an audio stream, the sum of the audio stream header's size (8) and all audio stream payloads (3 times 150) is represented here.
380 1 type 0 This binary stream message is an audio stream.
381 1 count 2 This binary stream message contains 3 audio frames. (This field is 0-indexed.)
382 2 reserved 0 These bytes are reserved for alignment and backward-compatible future use.
Audio Stream Header 384 8 offset 300 The first audio stream payload in this binary stream message comes after two 150-length audio stream payloads in the stream.
Audio Stream Payload 1 392 150 audio Data bytes for this segment of the ongoing binary audio stream to be played for the user.
Audio Stream Payload 2 542 150 audio Data bytes for this segment of the ongoing binary audio stream to be played for the user.
Audio Stream Payload 3 692 150 audio Data bytes for this segment of the ongoing binary audio stream to be played for the user.
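
The byte offsets in the table can be reproduced with a short Python sketch that builds a payload of the same shape and walks it one binary stream message at a time. The function name walk_speaker_payload, the zero-filled common header and audio bytes, and the marker values 1001 and 1002 are placeholders chosen for illustration; the sketch also assumes the bytes after the 36-byte common header are available in plain form.

```python
import struct

COMMON_HEADER_SIZE = 36       # as in the example; its contents are not parsed
BINARY_STREAM_HEADER_SIZE = 8


def walk_speaker_payload(payload: bytes) -> list:
    """Return (index, byte_offset, type, item_count, length) per message."""
    messages = []
    pos = COMMON_HEADER_SIZE
    index = 0
    while pos < len(payload):
        length, msg_type, count, _reserved = struct.unpack_from(
            "<IBBH", payload, pos)
        messages.append((index, pos, msg_type, count + 1, length))
        # length covers everything after the binary stream header.
        pos += BINARY_STREAM_HEADER_SIZE + length
        index += 1
    return messages


example = (
    bytes(COMMON_HEADER_SIZE)
    + struct.pack("<IBBHI", 4, 1, 0, 0, 1001)                   # index 0: marker
    + struct.pack("<IBBHQ", 308, 0, 1, 0, 0) + bytes(300)       # index 1: 2 frames
    + struct.pack("<IBBHI", 4, 1, 0, 0, 1002)                   # index 2: marker
    + struct.pack("<IBBHQ", 458, 0, 2, 0, 300) + bytes(450)     # index 3: 3 frames
)

for message in walk_speaker_payload(example):
    print(message)
```

The byte offsets reported for the four messages (36, 48, 364, and 376) match the Binary Stream Header rows in the table, and the index values are the ones a control plane message such as ExceptionEncountered would refer to.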