Microphone Capability Interface
The Microphone capability comprises the mechanics that enable a device to send the user's speech to AIA.
It defines messages on the directive and event topics, as well as binary audio data sent on the microphone topic.
Getting Started
AIA Envelope
This capability interface is compatible with v1 of the AIA envelope.
Topic Management
To support Microphone 1.0 messages, the device must participate in the directive, event, and microphone MQTT topics.
Capability Assertion
To use the Microphone 1.0 interface, the device must assert support through the Publish message on the capabilities topic.
Sample Object
{
  "type": "AisInterface",
  "interface": "Microphone",
  "version": "1.0",
  "configurations": {
    "audioEncoder": {
      "format": "{{STRING}}"
    }
  }
}
Field Name | Description | Value Type |
---|---|---|
audioEncoder | Details about the audio encoder supported by the device. | object |
audioEncoder.format | The audio format supported. Accepted value: AUDIO_L16_RATE_16000_CHANNELS_1 (16-bit linear PCM, 16-kHz sample rate, single channel, little-endian byte order). | string |
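As an illustration, the assertion object above can be built programmatically before publishing it on the capabilities topic. The helper name below is a hypothetical convenience; per the table, AUDIO_L16_RATE_16000_CHANNELS_1 is the only accepted format value.

```python
import json

def microphone_capability():
    """Builds the Microphone 1.0 capability assertion object (hypothetical helper)."""
    return {
        "type": "AisInterface",
        "interface": "Microphone",
        "version": "1.0",
        "configurations": {
            # The only format accepted by Microphone 1.0:
            "audioEncoder": {"format": "AUDIO_L16_RATE_16000_CHANNELS_1"},
        },
    }

# Serialize for inclusion in the Publish message on the capabilities topic.
assertion_json = json.dumps(microphone_capability())
```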
directive Topic
OpenMicrophone
The OpenMicrophone directive instructs the device to open the microphone and prepare to send the audio stream in subsequent messages on the microphone topic. The device must respond with a MicrophoneOpened event, unless the microphone was already open (in which case the directive may be ignored).

The device must continue sending audio data on the microphone topic until it receives a CloseMicrophone directive or detects the end of the user's speech. When the device closes the microphone, it must send the MicrophoneClosed event.

Note: If the device is unable to open the microphone before the given timeoutInMilliseconds elapses, it must send the OpenMicrophoneTimedOut event. In that case, it must not send the MicrophoneOpened event.
Sample Message
{
  "header": {
    "name": "OpenMicrophone",
    "messageId": "{{STRING}}"
  },
  "payload": {
    "timeoutInMilliseconds": {{LONG}},
    "initiator": {{OBJECT}}
  }
}
Payload Parameters
Field Name | Description | Value Type |
---|---|---|
timeoutInMilliseconds | On push-to-talk and similar devices that require the user's interaction to open the microphone, this value specifies how long the device should wait before timing out and sending the OpenMicrophoneTimedOut event. Note: This does not apply to devices that can automatically open the microphone on receiving this directive. | long |
initiator | An optional object generated by AIA that specifies how the directive was initiated. If present, it must be returned verbatim in the corresponding MicrophoneOpened event, provided that event is sent before timeoutInMilliseconds elapses. If omitted, or if timeoutInMilliseconds has elapsed, it should not be included in any corresponding or subsequent MicrophoneOpened event. Note: The device must treat this object as opaque; its structure may change without warning and without a change to the Microphone interface version number. | object |
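On a push-to-talk device, the timeout and initiator rules above amount to a simple branch. A minimal sketch, where wait_for_press is a hypothetical callback that blocks until the user presses the talk button or the deadline passes, and message headers are abbreviated (messageId omitted):

```python
import time

def handle_open_microphone(payload, wait_for_press):
    """Decides which event to send in response to an OpenMicrophone directive.

    wait_for_press(deadline) is a hypothetical device hook returning True if
    the user activated the microphone before the deadline (a monotonic time).
    """
    timeout_s = payload["timeoutInMilliseconds"] / 1000.0
    deadline = time.monotonic() + timeout_s
    if wait_for_press(deadline):
        event = {"header": {"name": "MicrophoneOpened"},
                 "payload": {"profile": "CLOSE_TALK"}}
        # If the directive carried an initiator, return it verbatim.
        if "initiator" in payload:
            event["payload"]["initiator"] = payload["initiator"]
        return event
    # Timed out: send OpenMicrophoneTimedOut and never MicrophoneOpened.
    return {"header": {"name": "OpenMicrophoneTimedOut"}}
```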
CloseMicrophone
The CloseMicrophone directive instructs the device to stop sending audio on the microphone topic and close the microphone. The device must respond with a MicrophoneClosed event, unless the microphone was already closed.

Note: There is no payload object in this directive.
Sample Message
{
  "header": {
    "name": "CloseMicrophone",
    "messageId": "{{STRING}}"
  }
}
event Topic
MicrophoneOpened
The device must send the MicrophoneOpened event both:

- in response to the OpenMicrophone directive, and
- when the user initiates a voice interaction, such as through on-device wake word detection or a push-to-talk button press.

The device must send the event as soon as it starts streaming audio on the microphone topic. Sending this event corresponds to the device entering the LISTENING attention state and remaining in that state as long as the microphone is open. The LISTENING attention state is entirely controlled by the device; it will not be set via the SetAttentionState directive. The LISTENING attention state takes precedence over all other attention states and cannot be overridden by the SetAttentionState directive until the microphone is actually closed.

Notes:

- The device must continue streaming audio on the microphone topic until it receives a CloseMicrophone directive or detects the end of the user's speech.
- If the microphone is already open from a previous MicrophoneOpened event, this event must not be sent again until a MicrophoneClosed event has been sent.
- If audio is actively playing from the speaker topic when the device detects a new microphone interaction, the device may optionally stop playing the audio. If it does so, it must send a SpeakerClosed event before sending this MicrophoneOpened event.
Sample Message
{
  "header": {
    "name": "MicrophoneOpened",
    "messageId": "{{STRING}}"
  },
  "payload": {
    "profile": "{{STRING}}",
    "initiator": {
      "type": "{{STRING}}",
      "payload": {
        "wakeWord": "{{STRING}}",
        "wakeWordIndices": {
          "beginOffset": {{LONG}},
          "endOffset": {{LONG}}
        }
      }
    },
    "offset": {{LONG}}
  }
}
Payload Parameters
Field Name | Description | Value Type |
---|---|---|
profile | The Automatic Speech Recognition (ASR) profile associated with the device, reflecting the physical proximity of the user that the device is best suited for. Accepted values: CLOSE_TALK (0 to 2.5 feet), NEAR_FIELD (0 to 5 feet), FAR_FIELD (0 to 20+ feet). | string |
initiator | An object that specifies how the microphone was opened. If the MicrophoneOpened event is in response to an OpenMicrophone directive (before timeoutInMilliseconds elapsed, if applicable), the initiator object included in the directive should be returned verbatim here. | object |
initiator.type | The action taken by the user that caused the microphone to be opened. Note: This does not apply when the microphone was opened in response to an OpenMicrophone directive; in those cases, the entire initiator object (including this field) from the directive should be returned verbatim in this event. Accepted values: PRESS_AND_HOLD (the user is holding a button to capture audio), TAP (the user tapped a button to begin audio capture), WAKEWORD (the on-device wake word engine detected the user's wake word utterance; in this case, 500 milliseconds of pre-roll should be streamed on the microphone topic). | string |
initiator.payload | An object representing additional details about the initiator. Note: It should only be included when type is WAKEWORD or when the microphone was opened in response to an OpenMicrophone directive. | object |
initiator.payload.wakeWord | The wake word used to open the microphone, included only when initiator.type is WAKEWORD. Accepted value: ALEXA. | string |
initiator.payload.wakeWordIndices | Identifies where the wake word is in the audio stream on the microphone topic, included only when initiator.type is WAKEWORD. Note: This object is required for cloud-based validation of wake word detection. If the cloud determines that a wake word detection was faulty, it sends a CloseMicrophone directive to the device. | object |
initiator.payload.wakeWordIndices.beginOffset | The byte offset in the microphone audio stream corresponding to the start of the wake word utterance. | long |
initiator.payload.wakeWordIndices.endOffset | The byte offset in the microphone audio stream corresponding to the end of the wake word utterance. | long |
offset | The byte offset in the microphone topic's audio stream at which speech started. Note: The value is inclusive; it is the first byte of audio that contains the user's speech. If initiator.type is WAKEWORD, the value should indicate the first byte of the pre-roll. | long |
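Because wakeWordIndices and offset are byte offsets into a 16-bit, 16-kHz mono stream, sample positions convert to byte offsets by a factor of two. A minimal sketch of the arithmetic, with illustrative function names (the 500 ms pre-roll figure comes from the WAKEWORD description above):

```python
BYTES_PER_SAMPLE = 2    # AUDIO_L16: 16-bit linear PCM
SAMPLE_RATE_HZ = 16000
PRE_ROLL_MS = 500       # recommended pre-roll for WAKEWORD initiations

def wake_word_indices(begin_sample, end_sample):
    """Converts wake-word sample indices (from stream start) into the byte
    offsets expected in initiator.payload.wakeWordIndices."""
    return {
        "beginOffset": begin_sample * BYTES_PER_SAMPLE,
        "endOffset": end_sample * BYTES_PER_SAMPLE,
    }

def pre_roll_offset(wake_word_begin_offset):
    """The MicrophoneOpened offset should point at the first byte of pre-roll,
    i.e. 500 ms of audio before the wake word begins (clamped to the stream start)."""
    pre_roll_bytes = SAMPLE_RATE_HZ * PRE_ROLL_MS // 1000 * BYTES_PER_SAMPLE
    return max(0, wake_word_begin_offset - pre_roll_bytes)
```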
MicrophoneClosed
The device must send the MicrophoneClosed event whenever it closes the microphone and stops sending an audio stream on the microphone topic.

The device may have closed the microphone as a result of receiving the CloseMicrophone directive, because it detected the end of the user's speech, or because the microphone button was released in a PRESS_AND_HOLD interaction.

If the microphone was already closed and this event was already sent, it should not be sent again in response to a subsequent CloseMicrophone directive.
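The alternation rules above (no repeated MicrophoneOpened while the microphone is open, no repeated MicrophoneClosed after it has closed) amount to a two-state guard. A minimal sketch, with hypothetical names:

```python
class MicrophoneEventGuard:
    """Enforces the MicrophoneOpened/MicrophoneClosed alternation (sketch)."""

    def __init__(self):
        self.microphone_open = False

    def should_send_opened(self):
        # MicrophoneOpened must not be re-sent until MicrophoneClosed goes out.
        if self.microphone_open:
            return False
        self.microphone_open = True
        return True

    def should_send_closed(self):
        # A redundant CloseMicrophone must not trigger a second MicrophoneClosed.
        if not self.microphone_open:
            return False
        self.microphone_open = False
        return True
```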
Sample Message
{
  "header": {
    "name": "MicrophoneClosed",
    "messageId": "{{STRING}}"
  },
  "payload": {
    "offset": {{LONG}}
  }
}
Payload Parameters
Field Name | Description | Value Type |
---|---|---|
offset | The byte offset in the microphone topic's audio stream at which the microphone was closed. Note: The value is exclusive; all audio up to, but not including, this offset contains the user's speech. | long |
OpenMicrophoneTimedOut
The device must send the OpenMicrophoneTimedOut event if the user did not activate the microphone before the timeoutInMilliseconds in an OpenMicrophone directive elapsed. In this case, the device must not send the MicrophoneOpened event.

This is only applicable to push-to-talk and similar devices that require the user's interaction to open the microphone.

Note: There is no payload object in this event.
Sample Message
{
  "header": {
    "name": "OpenMicrophoneTimedOut",
    "messageId": "{{STRING}}"
  }
}
microphone Topic
The device sends binary audio data in the binary stream format to AIA on the microphone topic. An ongoing stream of sequential audio segments is published to this topic, with each message containing raw linear PCM samples without framing. Control messages for audio input from the user are communicated through events and directives on the event and directive topics, respectively. The audio data format is specified on the capabilities topic through the Microphone interface's capability assertion.

After the common header that is part of every message using the v1 envelope format, messages on the microphone topic include a binary stream header with the values specified below.
Audio Data
The device should publish microphone data in relatively small chunks to reduce the latency of Alexa's response. The device's available memory and latency goals will determine the precise sizes, but the general recommendation is to send messages on the microphone topic no more frequently than every 50 milliseconds, consistent with the MQTT publishing rate. This typically results in audio chunks of about 1.6 KB.
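The recommended cadence can be sanity-checked against the AUDIO_L16_RATE_16000_CHANNELS_1 format: 16,000 samples per second at 2 bytes per sample over 50 ms yields 1,600 bytes of raw PCM per message.

```python
SAMPLE_RATE_HZ = 16000
BYTES_PER_SAMPLE = 2   # 16-bit linear PCM, single channel
CHUNK_MS = 50          # recommended minimum publishing interval

# Bytes of raw PCM captured in one 50 ms publishing interval.
chunk_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHUNK_MS // 1000
```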
Component | Byte Offset | Size (Bytes) | Name | Description |
---|---|---|---|---|
Binary Stream Header | 0 | 4 | length | The length in bytes of the data in this binary stream message, including only the audio stream header and payload. (The common header and binary stream header should not be included.) This field is an unsigned 32-bit integer stored in little-endian byte order. |
 | 4 | 1 | type | The type of the audio binary stream message. Accepted value: 0, signifying that the message contains audio from the device's microphone. |
 | 5 | 1 | count | The 0-indexed number of audio frames in this message. Accepted value: 0, signifying that the device is sending raw linear PCM samples without framing. |
 | 6 | 2 | reserved | These bytes are reserved for alignment and backward-compatible future use. They must be set to binary 0s. |
Audio Stream Header | 8 | 8 | offset | Byte offset for the start of this segment in the ongoing binary audio stream. This field is an unsigned 64-bit integer stored in little-endian byte order. The number begins at 0 when a new connection is established, and each connection's stream must have contiguous offsets. |
Audio Stream Payload | 16 | length − 8 | audio | Data bytes for this segment of the ongoing binary audio stream captured from the microphone. The size of this field is the value specified in the length field minus 8. |
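Putting the table together, one microphone-topic message (excluding the v1 common header, which the envelope layer is assumed to prepend) can be packed as below; the helper name is illustrative:

```python
import struct

def pack_microphone_message(stream_offset, audio):
    """Packs one binary stream message for the microphone topic.

    Layout per the table above:
      length (u32 LE) | type (u8) | count (u8) | reserved (u16 = 0) |
      offset (u64 LE) | audio bytes
    """
    # length covers only the audio stream header (8 bytes) plus the payload.
    length = 8 + len(audio)
    binary_stream_header = struct.pack("<IBBH", length, 0, 0, 0)  # type=0, count=0
    audio_stream_header = struct.pack("<Q", stream_offset)
    return binary_stream_header + audio_stream_header + audio

# One 50 ms chunk of silence at 16-bit/16-kHz mono is 1,600 bytes of PCM.
message = pack_microphone_message(0, b"\x00\x00" * 800)
```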