Speaker Capability Interface
The Speaker capability comprises the mechanics that enable a device to output audio through a speaker.
It defines messages on the directive and event topics, as well as receipt of binary data on the speaker topic.
Getting Started
AIA Envelope
This capability interface is compatible with v1 of the AIA envelope.
Topic Management
To support Speaker 1.0 messages, the device must participate in the directive, event, and speaker MQTT topics.
Capability Assertion
To use the Speaker 1.0 interface, the device must assert support through the Publish message on the capabilities topic.
Sample Object
{ "type": "AisInterface", "interface": "Speaker", "version": "1.0", "configurations": { "audioBuffer": { "sizeInBytes": {{LONG}}, "reporting": { "overrunWarningThreshold": {{LONG}}, "underrunWarningThreshold": {{LONG}} } }, "audioDecoder": { "format": "{{STRING}}", "bitrate": { "type": "{{STRING}}", "bitsPerSecond": {{LONG}} }, "numberOfChannels": {{LONG}} } } }
Field Name | Description | Value Type |
---|---|---|
audioBuffer | Details about the on-device buffer that will be used to process speaker audio. | object |
audioBuffer.sizeInBytes | The total size of the audio buffer in bytes. | long |
audioBuffer.reporting | Details about how the on-device audio buffer state will be reported to AIA. See the BufferStateChanged event for more information. | object |
audioBuffer.reporting.overrunWarningThreshold | The buffer's offset in bytes that will trigger an OVERRUN_WARNING BufferStateChanged event. AIA will slow down the rate at which it sends audio data in response to this event. The correct value for this threshold will vary by device, depending on the total size of the buffer, network latency, and audio data processing speed. It should be close to the maximum size of the buffer. | long |
audioBuffer.reporting.underrunWarningThreshold | The buffer's offset in bytes that will trigger an UNDERRUN_WARNING BufferStateChanged event. AIA will speed up the rate at which it sends audio data in response to this event. The correct value for this threshold will vary by device, depending on the total size of the buffer, network latency, and audio data processing speed. It should be as small as possible while still allowing uninterrupted audio playback during network delays. | long |
audioDecoder | Details about the audio decoder supported by the device. | object |
audioDecoder.format | The audio format supported by the device. Accepted Values: OPUS. Note: Messages on the speaker topic contain raw Opus frames, not Opus in an Ogg container. | string |
audioDecoder.bitrate | Details about the bitrate the audio decoder supports. | object |
audioDecoder.bitrate.type | The type of bitrate. Accepted Values: CONSTANT: Each audio frame will have the same number of bits per second. VARIABLE: Individual audio frames may have different numbers of bits per second. | string |
audioDecoder.bitrate.bitsPerSecond | The number of encoded audio stream bits per second that the decoder expects to receive. If type is VARIABLE, this value represents the average bits per second expected. Accepted Values: 64000, 128000 | long |
audioDecoder.numberOfChannels | The number of audio channels supported by the decoder. Accepted Values: 1, 2 | long |
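As an illustration, the Python sketch below assembles a Speaker 1.0 capability object from the fields in the table and rejects values outside the accepted lists. The helper name `build_speaker_capability`, the sample buffer sizes, and the threshold ordering check (underrun threshold below overrun threshold, both within the buffer) are assumptions made for this example, not requirements stated by AIA.

```python
import json

# Accepted values copied from the Speaker 1.0 field table.
ACCEPTED_BITRATE_TYPES = {"CONSTANT", "VARIABLE"}
ACCEPTED_BITS_PER_SECOND = {64000, 128000}
ACCEPTED_CHANNELS = {1, 2}


def build_speaker_capability(size_in_bytes: int, overrun_warning: int,
                             underrun_warning: int, bitrate_type: str = "CONSTANT",
                             bits_per_second: int = 64000, channels: int = 1) -> dict:
    """Hypothetical helper: build a Speaker 1.0 capability assertion object."""
    if bitrate_type not in ACCEPTED_BITRATE_TYPES:
        raise ValueError(f"unsupported bitrate type: {bitrate_type}")
    if bits_per_second not in ACCEPTED_BITS_PER_SECOND:
        raise ValueError(f"unsupported bitsPerSecond: {bits_per_second}")
    if channels not in ACCEPTED_CHANNELS:
        raise ValueError(f"unsupported numberOfChannels: {channels}")
    # Assumed sanity check: underrun threshold < overrun threshold <= buffer size.
    if not 0 <= underrun_warning < overrun_warning <= size_in_bytes:
        raise ValueError("expected 0 <= underrun < overrun <= sizeInBytes")
    return {
        "type": "AisInterface",
        "interface": "Speaker",
        "version": "1.0",
        "configurations": {
            "audioBuffer": {
                "sizeInBytes": size_in_bytes,
                "reporting": {
                    "overrunWarningThreshold": overrun_warning,
                    "underrunWarningThreshold": underrun_warning,
                },
            },
            "audioDecoder": {
                "format": "OPUS",  # the only accepted format
                "bitrate": {"type": bitrate_type, "bitsPerSecond": bits_per_second},
                "numberOfChannels": channels,
            },
        },
    }


if __name__ == "__main__":
    # Example sizes are made up; tune them for the actual device.
    print(json.dumps(build_speaker_capability(65536, 57344, 8192), indent=2))
```

Validating on the device side keeps an unsupported configuration from ever being published on the capabilities topic.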
directive Topic
OpenSpeaker
The OpenSpeaker directive instructs the device to open the speaker and prepare to play an audio stream sent in subsequent messages on the speaker topic.
The device must respond with a SpeakerOpened event.
Subsequent audio data sent on the speaker topic must be played until the device receives a CloseSpeaker directive or the user initiates a stop locally via a physical or GUI button or barge-in. After audio playback is stopped through any of these means, the device must send the SpeakerClosed event.
Note: When the device first opens the speaker, its audio buffer may be empty. In this case, the device does not need to send a BufferStateChanged event indicating UNDERRUN. However, if the device does not receive audio data on the speaker topic within 10 seconds of receiving this directive, it should close the AIA connection.
Sample Message
{ "header": { "name": "OpenSpeaker", "messageId": "{{STRING}}" }, "payload": { "offset": {{LONG}} } }
Payload Parameters
Field Name | Description | Value Type |
---|---|---|
offset | Specifies the byte offset in the speaker topic's audio stream at which to start playback. The device should start playing from the audio stream when it reaches this offset, discarding preceding data from that stream. Note: The value is inclusive and will be greater than or equal to 0; it identifies the first byte of audio that should be played from the stream. | long |
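The sketch below combines the inclusive OpenSpeaker offset with the overflow behavior described under Audio Data: new audio keeps being added, the oldest bytes are dropped when the buffer runs out of space, and an OpenSpeaker offset that has already been dropped is fatal (the device should then close the AIA connection). `SpeakerBuffer` and `FatalBufferError` are hypothetical names; the sketch assumes segments arrive in stream order with contiguous offsets and does not model playback consumption.

```python
from collections import deque
from typing import Optional


class FatalBufferError(Exception):
    """The requested OpenSpeaker offset was already dropped from the buffer."""


class SpeakerBuffer:
    """Hypothetical bounded buffer of speaker-topic audio keyed by stream offset.

    Assumes segments arrive in stream order with contiguous offsets. Playback
    consumption is not modeled.
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.segments = deque()                # (start_offset, data), oldest first
        self.size = 0
        self.play_from: Optional[int] = None  # inclusive OpenSpeaker offset

    def append(self, offset: int, data: bytes) -> None:
        self.segments.append((offset, data))
        self.size += len(data)
        self._drop_before(self.play_from)
        # Buffer full: keep accepting new data by dropping the oldest bytes.
        while self.size > self.capacity:
            start, chunk = self.segments.popleft()
            excess = self.size - self.capacity
            if len(chunk) > excess:
                self.segments.appendleft((start + excess, chunk[excess:]))
                self.size -= excess
            else:
                self.size -= len(chunk)

    def open_speaker(self, offset: int) -> None:
        """Handle OpenSpeaker: keep only bytes at or after the inclusive offset."""
        if self.segments and self.segments[0][0] > offset:
            raise FatalBufferError(f"offset {offset} was already dropped")
        self.play_from = offset
        self._drop_before(offset)

    def _drop_before(self, offset: Optional[int]) -> None:
        """Discard buffered bytes that precede `offset`."""
        if offset is None:
            return
        while self.segments:
            start, chunk = self.segments[0]
            if start + len(chunk) <= offset:   # segment entirely before offset
                self.segments.popleft()
                self.size -= len(chunk)
            elif start < offset:               # segment straddles offset
                cut = offset - start
                self.segments[0] = (offset, chunk[cut:])
                self.size -= cut
                break
            else:
                break
```

Keying every segment by its absolute offset makes both rules a simple comparison against the head of the buffer.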
CloseSpeaker
The CloseSpeaker directive instructs the device to stop playing audio from the speaker topic and close the speaker.
The device must respond with a SpeakerClosed event.
Note: If the speaker is already closed due to a local stop (for example, one caused by a MicrophoneOpened or ButtonCommandIssued event), the speaker should remain closed, and the device should not send the SpeakerClosed event.
Sample Message
{ "header": { "name": "CloseSpeaker", "messageId": "{{STRING}}" }, "payload": { "offset": {{LONG}} } }
Payload Parameters
Field Name | Description | Value Type |
---|---|---|
offset | Specifies the byte offset in the speaker topic's audio stream at which to stop playback. If the field is omitted or the device has already passed the offset, playback should stop immediately. Note: The value is inclusive and will be greater than or equal to 0; it identifies the last byte of audio that should be played from the stream. | long |
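Because the CloseSpeaker offset is inclusive while the SpeakerClosed offset is exclusive, a device that plays through byte N and then stops would, on this reading, report N + 1 in SpeakerClosed. The Python sketch below encodes that arithmetic; `close_offset` is a hypothetical helper, and `next_offset` (the offset of the next byte the device would play) is an assumed piece of device state.

```python
from typing import Optional


def close_offset(next_offset: int, stop_offset: Optional[int]) -> int:
    """Return the exclusive offset at which playback ends.

    next_offset: offset of the next byte the device would play.
    stop_offset: inclusive CloseSpeaker offset, or None if the field was omitted.
    The result fits the SpeakerClosed payload, whose offset is exclusive.
    """
    # Omitted, or already passed: stop immediately, before next_offset.
    if stop_offset is None or stop_offset < next_offset:
        return next_offset
    # Otherwise keep playing through stop_offset (inclusive), then stop.
    return stop_offset + 1


if __name__ == "__main__":
    print(close_offset(1000, None))  # 1000: stop immediately
    print(close_offset(1000, 1499))  # 1500: bytes 1000..1499 are played
```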
SetVolume
The SetVolume directive instructs the device to adjust its speaker volume. This applies to the binary audio sent over the speaker topic.
The device must respond with a VolumeChanged event.
Note: AIA manages muting and unmuting through this directive, persisting previous volume levels in the cloud service and restoring them as appropriate. There are no separate messages for muting functionality.
Sample Message
{ "header": { "name": "SetVolume", "messageId": "{{STRING}}" }, "payload": { "volume": {{LONG}}, "offset": {{LONG}} } }
Payload Parameters
Field Name | Description | Value Type |
---|---|---|
volume | The volume level to apply, on a scale of 0 (mute) to 100 (maximum output). | long |
offset | Specifies the byte offset in the speaker topic's audio stream at which the specified volume should take effect. The previously set volume should remain active until this offset is reached. If the field is omitted or the device has already passed the offset, the device should change its volume to the new value immediately. Note: The value is inclusive; it identifies the first byte of audio that should be played with the specified volume level. | long |
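One way to honor the offset is to queue the new volume and apply it when playback reaches that byte. The sketch below is hypothetical (`VolumeController`, `next_offset`, and the per-byte `volume_for` lookup are illustrative names) and assumes SetVolume directives with offsets arrive in increasing offset order.

```python
from collections import deque
from typing import Optional


class VolumeController:
    """Hypothetical sketch of offset-scheduled SetVolume handling.

    Assumes SetVolume directives with offsets arrive in increasing offset order.
    """

    def __init__(self, volume: int = 50):
        self.volume = volume
        self.pending = deque()  # (inclusive offset, volume) not yet reached

    def on_set_volume(self, volume: int, offset: Optional[int], next_offset: int) -> None:
        """next_offset: offset of the next byte the device would play."""
        if not 0 <= volume <= 100:
            raise ValueError("volume must be between 0 and 100")
        if offset is None or offset <= next_offset:
            # Omitted, already passed, or about to be played: apply immediately.
            self.volume = volume
        else:
            # Keep the current volume until playback reaches `offset`.
            self.pending.append((offset, volume))

    def volume_for(self, offset: int) -> int:
        """Volume to use when playing the byte at `offset`."""
        while self.pending and self.pending[0][0] <= offset:
            self.volume = self.pending.popleft()[1]
        return self.volume
```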
event Topic
ButtonCommandIssued
The device sends the ButtonCommandIssued event to inform AIA of a user-initiated action related to audio playback control. For example, if a user presses a physical or GUI button to stop audio playback, the device should send this event with the corresponding command in the payload.
For button presses that stop or pause audio playback, the device may optionally perform that action immediately, rather than waiting for a CloseSpeaker directive. In that case, the device must also immediately send the SpeakerClosed event and report the last offset played from the audio stream on the speaker topic.
Sample Message
{ "header": { "name": "ButtonCommandIssued", "messageId": "{{STRING}}" }, "payload": { "command": "{{STRING}}" } }
Payload Parameters
Field Name | Description | Value Type |
---|---|---|
command | The command issued by the user. Accepted Values: PLAY, NEXT, PREVIOUS, STOP, PAUSE. Note: The PAUSE value is only applicable to devices that have a pause button or functionality separate from the stop button. | string |
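The Python sketch below shows one possible way to turn a button press into events, including the optional local stop. `on_button` and `make_event` are hypothetical helpers; generating `messageId` with `uuid4` and publishing ButtonCommandIssued before SpeakerClosed are assumptions, not ordering rules stated above. Following the SpeakerClosed field definition, the reported offset is the exclusive offset of the first byte that was not played.

```python
import uuid
from typing import List

BUTTON_COMMANDS = {"PLAY", "NEXT", "PREVIOUS", "STOP", "PAUSE"}


def make_event(name: str, payload: dict) -> dict:
    # messageId generation via uuid4 is an assumption for this sketch.
    return {"header": {"name": name, "messageId": str(uuid.uuid4())}, "payload": payload}


def on_button(command: str, stop_locally: bool, next_offset: int) -> List[dict]:
    """Return the events to publish for a button press (hypothetical helper).

    next_offset: offset of the first byte not yet played, which is the
    exclusive offset the SpeakerClosed payload expects.
    """
    if command not in BUTTON_COMMANDS:
        raise ValueError(f"unsupported command: {command}")
    events = [make_event("ButtonCommandIssued", {"command": command})]
    if stop_locally and command in ("STOP", "PAUSE"):
        # Optional local stop: close immediately and report where playback ended.
        events.append(make_event("SpeakerClosed", {"offset": next_offset}))
    return events
```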
SpeakerOpened
The device must send the SpeakerOpened event in response to the OpenSpeaker directive, as soon as it begins playing an audio stream from the speaker topic.
Sample Message
{ "header": { "name": "SpeakerOpened", "messageId": "{{STRING}}" }, "payload": { "offset": {{LONG}} } }
Payload Parameters
Field Name | Description | Value Type |
---|---|---|
offset | The byte offset in the speaker topic's audio stream at which playback was started. Note: The value is inclusive; it identifies the first byte of audio played from the stream to the user. | long |
SpeakerClosed
The device must send the SpeakerClosed event whenever it closes the speaker, as soon as it stops playing an audio stream from the speaker topic.
The device may have closed the speaker as a result of receiving the CloseSpeaker directive or a local trigger, such as a button press (see the ButtonCommandIssued event) or a user barge-in (see the MicrophoneOpened event).
If the speaker was already closed and this event was already sent, it should not be sent again in response to a subsequent CloseSpeaker directive.
Note: If the audio buffer underruns, the device should send a BufferStateChanged event with the UNDERRUN state, not a SpeakerClosed event.
Sample Message
{ "header": { "name": "SpeakerClosed", "messageId": "{{STRING}}" }, "payload": { "offset": {{LONG}} } }
Payload Parameters
Field Name | Description | Value Type |
---|---|---|
offset | The byte offset in the speaker topic's audio stream at which playback was stopped. Note: The value is exclusive; all audio up to, but not including, this offset was played to the user. | long |
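A device can satisfy the "send at most once" rule by tracking whether the speaker is open and routing every close path (CloseSpeaker, button press, barge-in) through a single method. In the hypothetical sketch below, `SpeakerState` and its `send` callback stand in for whatever publishes events on the event topic; a buffer underrun deliberately does not go through `close`.

```python
from typing import Callable


class SpeakerState:
    """Hypothetical sketch: ensures SpeakerClosed is sent at most once per open."""

    def __init__(self, send: Callable[[str, dict], None]):
        self.send = send      # publishes an event on the event topic (assumed)
        self.is_open = False

    def on_playback_started(self, offset: int) -> None:
        """Call when playback of the OpenSpeaker stream actually begins."""
        self.is_open = True
        self.send("SpeakerOpened", {"offset": offset})

    def close(self, offset: int) -> None:
        """Every close path goes through here: CloseSpeaker, button press, barge-in.

        offset is exclusive: the first byte that was not played. A buffer
        underrun should not call this; it is reported with BufferStateChanged.
        """
        if not self.is_open:
            return  # already closed locally; do not send SpeakerClosed again
        self.is_open = False
        self.send("SpeakerClosed", {"offset": offset})


if __name__ == "__main__":
    sent = []
    speaker = SpeakerState(lambda name, payload: sent.append((name, payload)))
    speaker.on_playback_started(0)
    speaker.close(1500)  # local stop, e.g. barge-in
    speaker.close(1800)  # later CloseSpeaker directive: ignored
    print(sent)
```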
SpeakerMarkerEncountered
The device must send the SpeakerMarkerEncountered event for each marker it encounters in the audio stream of the speaker topic. The event must not be sent until all the audio preceding that marker has been played to the user.
If the marker is the first message in the speaker audio stream, the device must send the SpeakerMarkerEncountered event at the same time it begins playing audio to the user.
These markers and this event give Alexa updated information about audio playback progress.
Sample Message
{ "header": { "name": "SpeakerMarkerEncountered", "messageId": "{{STRING}}" }, "payload": { "marker": {{LONG}} } }
Payload Parameters
Field Name | Description | Value Type |
---|---|---|
marker | The value of the marker injected in the speaker audio stream. This will be a 32-bit unsigned integer. | long |
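Because marker messages carry no offset of their own, a device has to keep them in stream order alongside the audio. The hypothetical `PlaybackQueue` below does this by queuing audio chunks and marker values together; it assumes the caller asks for the next chunk only after the previous one has finished playing, so a marker fires only once all audio before it has been played, and a leading marker fires just as the first chunk starts.

```python
from collections import deque
from typing import Callable, Optional, Union


class PlaybackQueue:
    """Hypothetical sketch: audio chunks and markers kept in stream order."""

    def __init__(self, on_marker: Callable[[int], None]):
        self.items = deque()        # bytes (audio) or int (marker), in arrival order
        self.on_marker = on_marker  # e.g. publishes SpeakerMarkerEncountered

    def push(self, item: Union[bytes, int]) -> None:
        self.items.append(item)

    def play_next(self) -> Optional[bytes]:
        """Return the next audio chunk to output, or None if nothing is queued.

        Call this only after the previous chunk has finished playing, so any
        marker reached here has all of its preceding audio already played.
        """
        while self.items:
            item = self.items.popleft()
            if isinstance(item, int):
                self.on_marker(item)
            else:
                return item
        return None
```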
VolumeChanged
The device must send the VolumeChanged event whenever its volume has changed and in response to the SetVolume directive.
The device may have changed its volume as a result of receiving the SetVolume directive or a local trigger, such as a physical or GUI button.
This event must always be sent in response to the SetVolume directive, even if the volume is the same as the previously set value.
Sample Message
{ "header": { "name": "VolumeChanged", "messageId": "{{STRING}}" }, "payload": { "volume": {{LONG}}, "offset": {{LONG}} } }
Payload Parameters
Field Name | Description | Value Type |
---|---|---|
volume | The volume level set, on a scale of 0 (mute) to 100 (maximum output). | long |
offset | If the speaker is currently open, this specifies the byte offset in the speaker topic's audio stream at which the specified volume was applied. Note: The value is inclusive; it identifies the first byte of audio that was played with the specified volume level. | long |
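The sketch below builds a VolumeChanged event that includes `offset` only when the speaker is open. Omitting the field entirely when the speaker is closed is this sketch's reading of "If the speaker is currently open", and `volume_changed_event` plus the `uuid4` message IDs are hypothetical choices.

```python
import uuid
from typing import Optional


def volume_changed_event(volume: int, offset: Optional[int] = None) -> dict:
    """Build a VolumeChanged event (hypothetical helper).

    offset: first byte played at the new volume, or None when the speaker is
    closed, in which case the field is omitted (an assumption of this sketch).
    """
    if not 0 <= volume <= 100:
        raise ValueError("volume must be between 0 and 100")
    payload = {"volume": volume}
    if offset is not None:
        payload["offset"] = offset
    # messageId generation via uuid4 is assumed, not specified here.
    return {"header": {"name": "VolumeChanged", "messageId": str(uuid.uuid4())},
            "payload": payload}
```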
speaker Topic
AIA sends binary audio data in the binary stream format to the device using the speaker topic.
An ongoing stream of sequential audio segments is published to this topic, with each message containing one or more audio frames.
Control messages for audio output to the user are communicated through events and directives on the event and directive topics, respectively. The audio data format is specified on the capabilities topic through the Speaker interface's capability assertion.
After the common header that is a part of every message using the v1 envelope format, messages on the speaker topic will include a binary stream header with the values specified below.
Audio Data
speaker topic. In this case, the device should continue adding new data to the audio buffer and dropping older audio data when it runs out of space. If the offset given in the OpenSpeaker directive has already been dropped from the audio buffer, this is considered a fatal case, and the device should close the AIA connection.
Component | Byte Offset | Size (Bytes) | Name | Description |
---|---|---|---|---|
Binary Stream Header | 0 | 4 | length | The length in bytes of the data in this binary stream message, including only the audio stream header and payload. (The common header and binary stream header should not be included.) This field is an unsigned 32-bit integer stored in little-endian byte order. |
Binary Stream Header | 4 | 1 | type | The type of the audio binary stream message. Possible Values: 0, signifying that the message contains audio to be played. |
Binary Stream Header | 5 | 1 | count | The 0-indexed number of audio frames in this message. Possible Values: 0-255, signifying 1-256 audio frames in the message, respectively. |
Binary Stream Header | 6 | 2 | reserved | These bytes are reserved for alignment and backward-compatible future use. They will be set to binary 0s. |
Audio Stream Header | 8 | 8 | offset | Byte offset for the start of this segment in the ongoing binary audio stream. This field is an unsigned 64-bit integer stored in little-endian byte order. The number begins at 0 when a new connection is established, and each connection's stream will have contiguous offsets. |
Audio Stream Payload | 16 | variable | audio | Data bytes for this segment of the ongoing binary audio stream to be played for the user. The size of this field should be the value specified in the length field minus 8. |
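The layout above maps directly onto Python's `struct` module: `<IBBH` is the 8-byte little-endian binary stream header and `<Q` is the 8-byte audio stream header. The hypothetical parser below reads one audio message; it does not split the payload into individual Opus frames, since the table does not describe per-frame boundaries inside the payload.

```python
import struct
from typing import NamedTuple, Tuple

BINARY_STREAM_HEADER = struct.Struct("<IBBH")  # length, type, count, reserved (8 bytes)
AUDIO_STREAM_HEADER = struct.Struct("<Q")      # offset (8 bytes)


class AudioMessage(NamedTuple):
    frame_count: int  # count field + 1, since count is 0-indexed
    offset: int       # absolute stream offset of the first audio byte
    audio: bytes      # raw Opus frame data


def parse_audio_message(buf: bytes, pos: int = 0) -> Tuple[AudioMessage, int]:
    """Parse one audio binary stream message starting at buf[pos].

    Returns the parsed message and the position just past it.
    """
    length, msg_type, count, _reserved = BINARY_STREAM_HEADER.unpack_from(buf, pos)
    if msg_type != 0:
        raise ValueError(f"expected audio message (type 0), got type {msg_type}")
    if length < AUDIO_STREAM_HEADER.size:
        raise ValueError(f"length {length} is smaller than the audio stream header")
    body = pos + BINARY_STREAM_HEADER.size
    end = body + length  # length excludes the 8-byte binary stream header
    if end > len(buf):
        raise ValueError("message is truncated")
    (offset,) = AUDIO_STREAM_HEADER.unpack_from(buf, body)
    audio = bytes(buf[body + AUDIO_STREAM_HEADER.size:end])
    return AudioMessage(count + 1, offset, audio), end
```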
Marker Data
Periodically, AIA will send a marker message to the device on the speaker topic, instead of audio stream data. The device must respond with the SpeakerMarkerEncountered event. These markers and the SpeakerMarkerEncountered event give Alexa updated information about audio playback progress.
Component | Byte Offset | Size (Bytes) | Name | Description |
---|---|---|---|---|
Binary Stream Header | 0 | 4 | length | The length in bytes of the data in this binary stream message, including only the marker payload. (The common header and binary stream header should not be included.) This field is an unsigned 32-bit integer stored in little-endian byte order. |
Binary Stream Header | 4 | 1 | type | The type of the audio binary stream message. Possible Values: 1, signifying that the message contains a marker. |
Binary Stream Header | 5 | 1 | count | The 0-indexed number of markers in this message. Note: The value will typically be 0, signifying that there is exactly one marker in the message. |
Binary Stream Header | 6 | 2 | reserved | These bytes are reserved for alignment and backward-compatible future use. They will be set to 0. |
Marker Payload | 8 | 4 | marker | An opaque token formatted as an unsigned 32-bit integer, stored in little-endian byte order. The device must return this value in the marker field of the SpeakerMarkerEncountered event after all preceding audio in the stream has been played to the user. |
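A marker message shares the 8-byte binary stream header and is followed by 4-byte little-endian marker values. The hypothetical parser below assumes that a message whose count is N carries N + 1 consecutive markers with a length of 4 × (N + 1); the table only confirms the typical single-marker case.

```python
import struct
from typing import List, Tuple

BINARY_STREAM_HEADER = struct.Struct("<IBBH")  # length, type, count, reserved
MARKER_SIZE = 4                                # one unsigned 32-bit marker


def parse_marker_message(buf: bytes, pos: int = 0) -> Tuple[List[int], int]:
    """Parse one marker binary stream message starting at buf[pos].

    Returns the marker values in stream order and the position just past the
    message. Assumes a message with count N carries N + 1 consecutive markers.
    """
    length, msg_type, count, _reserved = BINARY_STREAM_HEADER.unpack_from(buf, pos)
    if msg_type != 1:
        raise ValueError(f"expected marker message (type 1), got type {msg_type}")
    n = count + 1  # count is 0-indexed
    if length != n * MARKER_SIZE:
        raise ValueError(f"length {length} does not match {n} marker(s)")
    body = pos + BINARY_STREAM_HEADER.size
    markers = list(struct.unpack_from(f"<{n}I", buf, body))
    return markers, body + length
```

Each returned value is echoed back unchanged in the marker field of SpeakerMarkerEncountered once the preceding audio has played.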
Full MQTT Payload Structure Example
The following is an example of the structure of a full MQTT payload on the speaker topic, containing two marker messages and two audio messages, each audio message containing more than one frame. For convenience in offset calculations, it is presented as the first MQTT message on the speaker topic since the last connection was established: all offsets start at 0.
Notes:
- The example payload in this section is the entire AIA message, which is the payload of the general AWS IoT MQTT message.
- The index specified in the "Message" column of the table indicates the value used by the index field in control plane messages, such as ExceptionEncountered. Each index number represents a separate binary stream message within the MQTT message.
Message | Component | Byte Offset | Size (Bytes) | Name | Value | Description |
---|---|---|---|---|---|---|
Common Header | Common Header | 0 | 36 | | | Concrete structure and values omitted. Specific values are immaterial to the example. |
Index 0 Marker | Binary Stream Header | 36 | 4 | length | 4 | Because this binary stream message is 1 marker, its size of 4 is represented here. |
Index 0 Marker | Binary Stream Header | 40 | 1 | type | 1 | This binary stream message is a marker. |
Index 0 Marker | Binary Stream Header | 41 | 1 | count | 0 | This binary stream message contains 1 marker. (This field is 0-indexed.) |
Index 0 Marker | Binary Stream Header | 42 | 2 | reserved | 0 | These bytes are reserved for alignment and backward-compatible future use. |
Index 0 Marker | Marker Payload | 44 | 4 | marker | | An opaque token formatted as an unsigned 32-bit integer, stored in little-endian byte order. The device must return this value in the marker field of the SpeakerMarkerEncountered event after all preceding audio in the stream has been played to the user. |
Index 1 Audio | Binary Stream Header | 48 | 4 | length | 308 | Because this binary stream message is an audio stream, the sum of the audio stream header's size (8) and all audio stream payloads (2 times 150) is represented here. |
Index 1 Audio | Binary Stream Header | 52 | 1 | type | 0 | This binary stream message is an audio stream. |
Index 1 Audio | Binary Stream Header | 53 | 1 | count | 1 | This binary stream message contains 2 audio frames. (This field is 0-indexed.) |
Index 1 Audio | Binary Stream Header | 54 | 2 | reserved | 0 | These bytes are reserved for alignment and backward-compatible future use. |
Index 1 Audio | Audio Stream Header | 56 | 8 | offset | 0 | The first audio stream payload represents the first audio data in this stream. |
Index 1 Audio | Audio Stream Payload 1 | 64 | 150 | audio | | Data bytes for this segment of the ongoing binary audio stream to be played for the user. |
Index 1 Audio | Audio Stream Payload 2 | 214 | 150 | audio | | Data bytes for this segment of the ongoing binary audio stream to be played for the user. |
Index 2 Marker | Binary Stream Header | 364 | 4 | length | 4 | Because this binary stream message is 1 marker, its size of 4 is represented here. |
Index 2 Marker | Binary Stream Header | 368 | 1 | type | 1 | This binary stream message is a marker. |
Index 2 Marker | Binary Stream Header | 369 | 1 | count | 0 | This binary stream message contains 1 marker. (This field is 0-indexed.) |
Index 2 Marker | Binary Stream Header | 370 | 2 | reserved | 0 | These bytes are reserved for alignment and backward-compatible future use. |
Index 2 Marker | Marker Payload | 372 | 4 | marker | | An opaque token formatted as an unsigned 32-bit integer, stored in little-endian byte order. The device must return this value in the marker field of the SpeakerMarkerEncountered event after all preceding audio in the stream has been played to the user. |
Index 3 Audio | Binary Stream Header | 376 | 4 | length | 458 | Because this binary stream message is an audio stream, the sum of the audio stream header's size (8) and all audio stream payloads (3 times 150) is represented here. |
Index 3 Audio | Binary Stream Header | 380 | 1 | type | 0 | This binary stream message is an audio stream. |
Index 3 Audio | Binary Stream Header | 381 | 1 | count | 2 | This binary stream message contains 3 audio frames. (This field is 0-indexed.) |
Index 3 Audio | Binary Stream Header | 382 | 2 | reserved | 0 | These bytes are reserved for alignment and backward-compatible future use. |
Index 3 Audio | Audio Stream Header | 384 | 8 | offset | 300 | The first audio stream payload in this binary stream message comes after the two 150-byte audio stream payloads already sent in the stream. |
Index 3 Audio | Audio Stream Payload 1 | 392 | 150 | audio | | Data bytes for this segment of the ongoing binary audio stream to be played for the user. |
Index 3 Audio | Audio Stream Payload 2 | 542 | 150 | audio | | Data bytes for this segment of the ongoing binary audio stream to be played for the user. |
Index 3 Audio | Audio Stream Payload 3 | 692 | 150 | audio | | Data bytes for this segment of the ongoing binary audio stream to be played for the user. |
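To check the offsets in this example, the Python sketch below rebuilds the same layout with placeholder bytes (a zeroed 36-byte common header, made-up marker tokens 1 and 2, and zero-filled 150-byte frames) and walks the binary stream messages in order. `iter_binary_messages` and `build_example_payload` are hypothetical helpers, and the fixed 36-byte common header size is taken from this example only.

```python
import struct
from typing import Iterator, Tuple

COMMON_HEADER_SIZE = 36  # per the example above; contents not modeled
HEADER = struct.Struct("<IBBH")  # length, type, count, reserved


def iter_binary_messages(payload: bytes) -> Iterator[Tuple[int, int, int, bytes]]:
    """Yield (index, type, start_byte, body) for each binary stream message."""
    pos, index = COMMON_HEADER_SIZE, 0
    while pos < len(payload):
        length, msg_type, _count, _reserved = HEADER.unpack_from(payload, pos)
        body_start = pos + HEADER.size
        body_end = body_start + length  # length excludes the 8-byte header
        if body_end > len(payload):
            raise ValueError(f"message {index} is truncated")
        yield index, msg_type, pos, payload[body_start:body_end]
        pos, index = body_end, index + 1


def build_example_payload() -> bytes:
    """Rebuild the example layout with placeholder bytes (hypothetical helper)."""
    def marker(token: int) -> bytes:
        return struct.pack("<IBBHI", 4, 1, 0, 0, token)

    def audio(offset: int, frames: list) -> bytes:
        data = b"".join(frames)
        return struct.pack("<IBBHQ", 8 + len(data), 0, len(frames) - 1, 0, offset) + data

    frame = bytes(150)
    return (bytes(COMMON_HEADER_SIZE) + marker(1) + audio(0, [frame] * 2)
            + marker(2) + audio(300, [frame] * 3))


if __name__ == "__main__":
    for index, msg_type, start, body in iter_binary_messages(build_example_payload()):
        kind = "audio" if msg_type == 0 else "marker"
        print(index, kind, start, len(body))
```

Running it prints `0 marker 36 4`, `1 audio 48 308`, `2 marker 364 4`, and `3 audio 376 458`, matching the start offsets and length values in the table; the index counter corresponds to the index used in control plane messages such as ExceptionEncountered.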