Microphone Capability Interface

The Microphone capability interface defines the mechanism a device uses to send the user's speech to AIA.

It defines messages on the directive and event topics, as well as the binary audio data sent on the microphone topic.

Getting Started

AIA Envelope

This capability interface is compatible with v1 of the AIA envelope.

Topic Management

To support Microphone 1.0 messages, the device must participate in the directive, event, and microphone MQTT topics.

Capability Assertion

To use the Microphone 1.0 interface, the device must assert support through the Publish message on the capabilities topic.

Sample Object

{
  "type": "AisInterface",
  "interface": "Microphone",
  "version": "1.0",
  "configurations": {
    "audioEncoder": {
      "format": "{{STRING}}"
    }
  }
}
audioEncoder (object)
  Details about the audio encoder supported by the device.

audioEncoder.format (string)
  The audio format supported.

  Accepted Values:
  AUDIO_L16_RATE_16000_CHANNELS_1: 16-bit linear PCM, 16-kHz sample rate, single channel, little-endian byte order
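
As an illustrative sketch in Python (the Publish envelope that wraps this entry and the client used to publish it belong to the Capabilities interface and are assumptions here, not part of Microphone 1.0), the only value the device fills in is the audio format:

import json

# Illustrative sketch: the Microphone capability entry exactly as shown above,
# with the single accepted audio format filled in. The Publish envelope and
# the MQTT client that sends it on the capabilities topic are not shown.
microphone_capability = {
    "type": "AisInterface",
    "interface": "Microphone",
    "version": "1.0",
    "configurations": {
        "audioEncoder": {
            "format": "AUDIO_L16_RATE_16000_CHANNELS_1"
        }
    }
}

print(json.dumps(microphone_capability, indent=2))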

directive Topic

OpenMicrophone

The OpenMicrophone directive instructs the device to open the microphone and prepare to send the audio stream in subsequent messages on the microphone topic.

The device must respond with a MicrophoneOpened event, unless the microphone was already open (in which case the directive may be ignored).

The device must continue sending audio data on the microphone topic until the device receives a CloseMicrophone directive or the device detects the end of the user's speech. When the device closes the microphone, it must send the MicrophoneClosed event.

Note: If the device is unable to open the microphone before the given timeoutInMilliseconds elapses, it must send the OpenMicrophoneTimedOut event. In that case, it must not send the MicrophoneOpened event.

Sample Message

{
  "header": {
    "name": "OpenMicrophone",
    "messageId": "{{STRING}}"
  },
  "payload": {
    "timeoutInMilliseconds": {{LONG}},
    "initiator": {{OBJECT}}
  }
}

Payload Parameters

timeoutInMilliseconds (long)
  On push-to-talk and similar devices that require the user's interaction to open the microphone, this value specifies how long the device should wait before timing out and sending the OpenMicrophoneTimedOut event.

  Note: This does not apply to devices that can automatically open the microphone on receiving this directive.

initiator (object)
  An optional object generated by AIA that specifies how the directive was initiated.

  If present, it must be returned verbatim in the corresponding MicrophoneOpened event, provided that event is sent before timeoutInMilliseconds elapses.

  If omitted, or if timeoutInMilliseconds has elapsed, it should not be included in any corresponding or subsequent MicrophoneOpened event.

  Note: The device must treat this object as opaque. Its structure may change without warning and without a change to the Microphone interface version number.
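
A minimal sketch of the expected control flow for a push-to-talk style device, assuming two hypothetical primitives that are not defined by this interface: open_microphone(), which blocks until the user activates the microphone or the timeout elapses, and send_event(), which publishes a message on the event topic:

import uuid

def handle_open_microphone(directive_payload, open_microphone, send_event, stream_offset):
    # Sketch only: open_microphone and send_event are hypothetical device
    # primitives; stream_offset is the byte offset in the microphone stream
    # at which this capture will begin.
    timeout_ms = directive_payload.get("timeoutInMilliseconds", 0)
    initiator = directive_payload.get("initiator")  # opaque; echo back verbatim

    if not open_microphone(timeout_seconds=timeout_ms / 1000.0):
        # The user did not activate the microphone in time: report the
        # timeout and do not send MicrophoneOpened.
        send_event({"header": {"name": "OpenMicrophoneTimedOut",
                               "messageId": str(uuid.uuid4())}})
        return

    payload = {"profile": "CLOSE_TALK", "offset": stream_offset}
    if initiator is not None:
        payload["initiator"] = initiator  # returned exactly as received
    send_event({"header": {"name": "MicrophoneOpened",
                           "messageId": str(uuid.uuid4())},
                "payload": payload})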

CloseMicrophone

The CloseMicrophone directive instructs the device to stop sending audio through the microphone topic and close the microphone.

The device must respond with a MicrophoneClosed event, unless the microphone was already closed.

Note: There will be no payload object in this directive.

Sample Message

{
  "header": {
    "name": "CloseMicrophone",
    "messageId": "{{STRING}}"
  }
}

event Topic

MicrophoneOpened

The device must send the MicrophoneOpened event both

  • in response to the OpenMicrophone directive and
  • when the user initiates a voice interaction, such as through an on-device wake word detection or a push-to-talk button press.

The device must send the event as soon as it starts streaming audio on the microphone topic.

Notes:

  • The device must continue streaming audio on the microphone topic until it receives a CloseMicrophone directive or detects the end of the user's speech.
  • If the microphone is already open from a previous MicrophoneOpened event, this event must not be sent again until a MicrophoneClosed event has been sent.
  • If audio is actively playing from the speaker topic when the device detects a new microphone interaction, the device may optionally stop playing the audio. If it does so, it must send a SpeakerClosed event before sending this MicrophoneOpened event.

Sample Message

{
  "header": {
    "name": "MicrophoneOpened",
    "messageId": "{{STRING}}"
  },
  "payload": {
    "profile": "{{STRING}}",
    "initiator": {
      "type": "{{STRING}}",
      "payload": {
        "wakeWord": "{{STRING}}",
        "wakeWordIndices": {
          "beginOffset": {{LONG}},
          "endOffset": {{LONG}}
        }
      }
    },
    "offset": {{LONG}}
  }
}

Payload Parameters

profile (string)
  The Automatic Speech Recognition (ASR) profile associated with the device, reflecting the physical distance from the user for which the device is best suited.

  Accepted Values:
  CLOSE_TALK: 0 to 2.5 feet
  NEAR_FIELD: 0 to 5 feet
  FAR_FIELD: 0 to 20+ feet

initiator (object)
  An object that specifies how the microphone was opened.

  If the MicrophoneOpened event is in response to an OpenMicrophone directive (and before timeoutInMilliseconds elapsed, if applicable), the initiator object included in the directive should be returned verbatim here.

initiator.type (string)
  The action taken by the user that caused the microphone to be opened.

  Note: This does not apply when the microphone was opened in response to an OpenMicrophone directive. In that case, the entire initiator object (including this field) from the directive should be returned in this event verbatim.

  Accepted Values:
  PRESS_AND_HOLD: The user is holding a button to capture audio.
  TAP: The user tapped a button to begin audio capture.
  WAKEWORD: The on-device wake word engine detected the user's wake word utterance. In this case, 500 milliseconds of pre-roll should be streamed on the microphone topic.

initiator.payload (object)
  An object representing additional details about the initiator.

  Note: It should only be included when type is WAKEWORD or when the microphone was opened in response to an OpenMicrophone directive.

initiator.payload.wakeWord (string)
  The wake word used to open the microphone, included only when initiator.type is WAKEWORD.

  Accepted Value: ALEXA

initiator.payload.wakeWordIndices (object)
  Identifies where the wake word is in the audio stream on the microphone topic, included only when initiator.type is WAKEWORD.

  Note: This object is required for cloud-based validation of wake word detection. If the cloud determines that a wake word detection was faulty, it sends a CloseMicrophone directive to the device.

initiator.payload.wakeWordIndices.beginOffset (long)
  The byte offset in the microphone audio stream corresponding to the start of the wake word utterance.

initiator.payload.wakeWordIndices.endOffset (long)
  The byte offset in the microphone audio stream corresponding to the end of the wake word utterance.

offset (long)
  The byte offset in the microphone topic's audio stream at which speech started.

  Note: The value is inclusive; it is the first byte of audio that contains the user's speech. If initiator.type is WAKEWORD, the value should indicate the first byte of the pre-roll.
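
All of these offsets are byte positions in the microphone stream, so a wake-word device must convert its engine's sample indices into bytes. A small sketch of the arithmetic, assuming the AUDIO_L16_RATE_16000_CHANNELS_1 format (2 bytes per sample at 16 kHz) and hypothetical sample indices reported by the wake word engine relative to the start of the stream:

BYTES_PER_SAMPLE = 2      # 16-bit linear PCM, single channel
SAMPLE_RATE_HZ = 16_000
PRE_ROLL_MS = 500         # pre-roll recommended for WAKEWORD initiations
PRE_ROLL_BYTES = (PRE_ROLL_MS * SAMPLE_RATE_HZ // 1000) * BYTES_PER_SAMPLE

def wakeword_opened_payload(ww_begin_sample, ww_end_sample):
    # Sketch only: the sample indices are hypothetical outputs of the
    # device's wake word engine, counted from the start of the stream.
    begin_offset = ww_begin_sample * BYTES_PER_SAMPLE
    end_offset = ww_end_sample * BYTES_PER_SAMPLE
    return {
        "profile": "NEAR_FIELD",
        "initiator": {
            "type": "WAKEWORD",
            "payload": {
                "wakeWord": "ALEXA",
                "wakeWordIndices": {
                    "beginOffset": begin_offset,
                    "endOffset": end_offset,
                },
            },
        },
        # First byte of the pre-roll: 500 ms of audio before the wake word.
        "offset": max(0, begin_offset - PRE_ROLL_BYTES),
    }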

MicrophoneClosed

The device must send the MicrophoneClosed event whenever it closes the microphone and stops sending an audio stream on the microphone topic.

The device may have closed the microphone as a result of receiving the CloseMicrophone directive, because it detected the end of the user's speech, or because the microphone button was released in a PRESS_AND_HOLD interaction.

If the microphone was already closed and this event was already sent, it should not be sent again in response to a subsequent CloseMicrophone directive.

Sample Message

{
  "header": {
    "name": "MicrophoneClosed",
    "messageId": "{{STRING}}"
  },
  "payload": {
    "offset": {{LONG}}
  }
}

Payload Parameters

offset (long)
  The byte offset in the microphone topic's audio stream at which the microphone was closed.

  Note: The value is exclusive; all audio up to, but not including, this offset contains user speech.
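
Because the offset is exclusive, a device that keeps a running count of the audio bytes it has published on the microphone topic can report that count directly. A minimal sketch, assuming a hypothetical bytes_sent counter maintained by the device:

import uuid

def microphone_closed_event(bytes_sent):
    # Sketch only: bytes_sent is a hypothetical running total of audio bytes
    # published on the microphone topic for this connection. Since the offset
    # is exclusive (one past the last audio byte), the total is used as-is.
    return {
        "header": {"name": "MicrophoneClosed", "messageId": str(uuid.uuid4())},
        "payload": {"offset": bytes_sent},
    }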

OpenMicrophoneTimedOut

The device must send the OpenMicrophoneTimedOut event if the user did not activate the microphone before the timeoutInMilliseconds elapsed in an OpenMicrophone directive. In this case, the device must not send the MicrophoneOpened event.

This is only applicable on push-to-talk and similar devices that require the user's interaction to open the microphone.

Note: There will be no payload object in this event.

Sample Message

{
  "header": {
    "name": "OpenMicrophoneTimedOut",
    "messageId": "{{STRING}}"
  }
}

microphone Topic

The device sends binary audio data in the binary stream format to AIA using the microphone topic.

An ongoing stream of sequential audio segments is published to this topic, with each message containing raw linear PCM samples without framing.

Control messages for the microphone's audio stream are communicated through events and directives on the event and directive topics, respectively. The audio data format is specified on the capabilities topic through the Microphone interface's capability assertion.

After the common header that is a part of every message using the v1 envelope format, messages on the microphone topic will include a binary stream header with the values specified below.

Audio Data

The device should publish microphone data in relatively small chunks to reduce the latency of Alexa's response. The device's available memory and latency goals determine the precise sizes, but the general recommendation is to send messages on the microphone topic no more frequently than every 50 milliseconds, consistent with the MQTT publishing rate. At 16-bit, 16-kHz, single-channel PCM, 50 milliseconds of audio is 16,000 samples/s × 2 bytes/sample × 0.05 s = 1,600 bytes, so chunks are typically about 1.6 KB.

Binary Stream Header

length (byte offset 0, 4 bytes)
  The length in bytes of the data in this binary stream message, including only the audio stream header and payload. (The common header and binary stream header should not be included.)

  This field is an unsigned 32-bit integer stored in little-endian byte order.

type (byte offset 4, 1 byte)
  The type of the audio binary stream message.

  Accepted Values: 0, signifying that the message contains audio from the device's microphone.

count (byte offset 5, 1 byte)
  The 0-indexed number of audio frames in this message.

  Accepted Values: 0, signifying that the device is sending raw linear PCM samples without framing.

reserved (byte offset 6, 2 bytes)
  These bytes are reserved for alignment and backward-compatible future use. They must be set to binary 0s.

Audio Stream Header

offset (byte offset 8, 8 bytes)
  Byte offset for the start of this segment in the ongoing binary audio stream.

  This field is an unsigned 64-bit integer stored in little-endian byte order.

  The number begins at 0 when a new connection is established, and each connection's stream must have contiguous offsets.

Audio Stream Payload

audio (byte offset 16)
  Data bytes for this segment of the ongoing binary audio stream captured from the microphone.

  The size of this field should be the value specified in the length field minus 8.
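
As a sketch of this layout (not taken from any particular SDK), the two headers can be packed in little-endian byte order ahead of each audio chunk; the 1,600-byte chunk size matches the 50-millisecond recommendation above:

import struct

CHUNK_BYTES = 1600  # 50 ms of 16-bit, 16-kHz, single-channel PCM

def pack_microphone_message(audio_chunk, stream_offset):
    # Packs one microphone-topic binary stream message body: a 4-byte length,
    # 1-byte type, 1-byte count, 2 reserved bytes, then the 8-byte audio
    # stream offset and the raw PCM payload. The common header required by
    # the v1 envelope is not shown here.
    length = 8 + len(audio_chunk)  # audio stream header + payload only
    binary_stream_header = struct.pack(
        "<IBBH",
        length,  # unsigned 32-bit, little-endian
        0,       # type: audio from the device's microphone
        0,       # count: raw linear PCM samples without framing
        0,       # reserved: must be binary 0s
    )
    audio_stream_header = struct.pack("<Q", stream_offset)  # unsigned 64-bit
    return binary_stream_header + audio_stream_header + audio_chunk

# Example: send contiguous 50 ms chunks, advancing the offset by the number
# of audio bytes in each message so the connection's stream stays contiguous.
offset = 0
chunk = bytes(CHUNK_BYTES)  # placeholder for captured PCM samples
message = pack_microphone_message(chunk, offset)
offset += len(chunk)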