Manage an HTTP/2 Connection with AVS
The Alexa Voice Service (AVS) exposes an HTTP/2 endpoint and supports cloud-initiated directives, which allow you to access Alexa's built-in capabilities, such as timers and alarms, media transport controls, voice-controlled volume adjustment, and Amazon Alexa app integration. This page provides instructions for creating and maintaining an HTTP/2 connection with AVS.
Key Terms and Concepts
- Frame: The basic protocol unit in HTTP/2. Each frame type serves a different purpose; for example, HEADERS and DATA frames form the basis of HTTP requests and responses.
- Stream: An independent, bidirectional sequence of frames exchanged between a client and server within an HTTP/2 connection. For detailed information, see Streams and Multiplexing in RFC 7540.
- Interfaces: AVS exposes interfaces (SpeechRecognizer, AudioPlayer, SynchronizeState, etc.) that provide your product access to Alexa’s built-in skills.
- Downchannel: A stream you create in your HTTP/2 connection, which is used to deliver directives from the cloud to your client. The downchannel remains open in a half-closed state from the device and open from AVS for the life of the connection. The downchannel is primarily used to send cloud-initiated directives and audio attachments to your client.
Note: Your client should only create one downchannel stream per connection.
- Cloud-initiated Directives: Directives sent from the cloud to your client. For example, when a user adjusts device volume from the Amazon Alexa App, a directive is sent to your product without a corresponding voice request.
Prerequisites
Before creating an HTTP/2 connection with AVS, you'll need to:
- Obtain an Access Token
To access AVS, your product needs to obtain a Login with Amazon (LWA) access token, which grants your product access to the API on a customer's behalf. There are two methods used to obtain an access token for use with AVS.
Remote Authorization is used to authorize devices using a companion website or mobile app.
Local Authorization is used to authorize Alexa directly from an AVS-enabled product.
The LWA access token you obtain must be sent to AVS in the header of each event. If authentication fails for any reason, the connection with AVS is closed.
The following is a sample header. In addition to your access token, a boundary term is required in the header of each event sent to AVS.
    :method = POST
    :scheme = https
    :path = /{{API version}}/events
    authorization = Bearer {{YOUR_ACCESS_TOKEN}}
    content-type = multipart/form-data; boundary={{BOUNDARY_TERM_HERE}}
Note: Each product instance must have a unique deviceSerialNumber, which is passed in scope data during authorization.
- Choose an HTTP/2 Client Library
The following HTTP/2 client libraries are recommended for use with AVS:
Language | Library |
---|---|
C / C++ | nghttp2 |
C / C++ | curl and libcurl |
Java | OkHttp |
Java | Netty |
Java | Jetty |
For a complete list of implementations, see GitHub.
Warning: If using libcurl, your client must make a GET request to /ping every 5 minutes to maintain the connection. For details, see Ping and Timeout below.
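The sample event header from the prerequisites above can be assembled programmatically. The following is a minimal sketch, not an official client: `build_event_headers` and `new_boundary` are hypothetical helper names, the API version is passed in by the caller, and the UUID-based boundary format is an assumption. Many HTTP/2 libraries set the pseudo-headers (`:method`, `:scheme`, `:path`) themselves, so you may only need the last two entries.

```python
import uuid


def new_boundary() -> str:
    """Generate a boundary term that is unlikely to appear in the message body."""
    # Assumption: any unique token is acceptable as a boundary term.
    return f"boundary-{uuid.uuid4().hex}"


def build_event_headers(access_token: str, api_version: str, boundary: str) -> dict:
    """Return the headers for an event request, mirroring the sample header."""
    if not access_token:
        raise ValueError("an LWA access token is required")
    return {
        ":method": "POST",
        ":scheme": "https",
        ":path": f"/{api_version}/events",
        "authorization": f"Bearer {access_token}",
        "content-type": f"multipart/form-data; boundary={boundary}",
    }
```

The same boundary value must be used in the header and in the multipart body of that event.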
Base URLs
As of May 22nd, 2019, the default base URLs for AVS have changed. We recommend that all new and existing clients adopt these new URLs; however, the legacy base URLs will continue to be supported.
Base URLs
Region | Supported Countries/Regions | URL |
---|---|---|
Asia | Australia, Japan, New Zealand | https://alexa.fe.gateway.devices.a2z.com |
Europe | Austria, France, Germany, India, Italy, Spain, United Kingdom | https://alexa.eu.gateway.devices.a2z.com |
North America | Canada, Mexico, United States | https://alexa.na.gateway.devices.a2z.com |
Legacy Base URLs
Region | Supported Countries/Regions | URL |
---|---|---|
Asia | Australia, Japan, New Zealand | https://avs-alexa-fe.amazon.com |
Europe | Austria, France, Germany, India, Italy, Spain, United Kingdom | https://avs-alexa-eu.amazon.com |
North America | Canada, Mexico, United States | https://avs-alexa-na.amazon.com |
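One way to keep the regional base URLs out of request code is a small lookup table. This sketch uses the current (non-legacy) URLs from the first table; the region keys, the `endpoint_url` name, and the `api_version` and `resource` parameters are illustrative assumptions.

```python
# Current base URLs, keyed by hypothetical region identifiers.
BASE_URLS = {
    "asia": "https://alexa.fe.gateway.devices.a2z.com",
    "europe": "https://alexa.eu.gateway.devices.a2z.com",
    "north_america": "https://alexa.na.gateway.devices.a2z.com",
}


def endpoint_url(region: str, api_version: str, resource: str) -> str:
    """Build a full URL such as <base>/<API version>/events or .../directives."""
    try:
        base = BASE_URLS[region]
    except KeyError:
        raise ValueError(f"unknown region: {region!r}") from None
    return f"{base}/{api_version}/{resource}"
```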
Creating an HTTP/2 Connection
When your product is powered on, it should create a single HTTP/2 connection with AVS. This connection is used to handle all directives and events, including anything that is sent to your client on the downchannel stream. For additional details regarding connection management, see Server-initiated Disconnects below.
Maintaining a connection with AVS requires two things:
- Establishing the downchannel stream
- Synchronizing your product's component states with AVS (SpeechRecognizer, AudioPlayer, Alerts, Speaker, SpeechSynthesizer)
RecognizerState is only required if your client uses Cloud-Based Wake Word Verification.
- To establish a downchannel stream, your client must make a GET request to /{{API version}}/directives within 10 seconds of opening the connection with AVS. The request should look like this:
    :method = GET
    :scheme = https
    :path = /{{API version}}/directives
    authorization = Bearer {{YOUR_ACCESS_TOKEN}}
Following a successful request, the downchannel stream will remain open in a half-closed state from the client and open from AVS for the life of the connection. It is not uncommon for there to be long pauses between cloud-initiated directives.
- After establishing the downchannel stream, your client must synchronize its components' states with AVS. This requires making a POST request to /{{API version}}/events on a new event stream on the existing connection (Note: Do not open a new connection). This event stream should be closed when your client receives a response (directive). The following is an example SynchronizeState event:
    :method = POST
    :scheme = https
    :path = /{{API version}}/events
    authorization = Bearer {{YOUR_ACCESS_TOKEN}}
    content-type = multipart/form-data; boundary={{BOUNDARY_TERM_HERE}}

    --{{BOUNDARY_TERM_HERE}}
    Content-Disposition: form-data; name="metadata"
    Content-Type: application/json; charset=UTF-8

    {
        "context": [
            // This is an array of context objects that are used to communicate the
            // state of all client components to Alexa. See Context for details.
        ],
        "event": {
            "header": {
                "namespace": "System",
                "name": "SynchronizeState",
                "messageId": "{{STRING}}"
            },
            "payload": {
            }
        }
    }
    --{{BOUNDARY_TERM_HERE}}--
After synchronizing state, your client should be able to use this connection to:
- Send events to and receive directives from AVS
Note: Each event and its associated response are sent on a single event stream. When the response is received, the stream should be closed.
- Receive cloud-initiated directives on the downchannel stream
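The SynchronizeState request body from the steps above could be assembled along these lines. This is a sketch under stated assumptions: `build_synchronize_state_body` is a hypothetical helper, the `messageId` is generated as a UUID (the sample only requires a string), and the caller supplies the list of context objects.

```python
import json
import uuid


def build_synchronize_state_body(boundary: str, context: list) -> bytes:
    """Encode a SynchronizeState event as a one-part multipart/form-data body."""
    event = {
        "context": context,  # state of all client components
        "event": {
            "header": {
                "namespace": "System",
                "name": "SynchronizeState",
                "messageId": str(uuid.uuid4()),  # assumption: a UUID string
            },
            "payload": {},
        },
    }
    lines = [
        f"--{boundary}",
        'Content-Disposition: form-data; name="metadata"',
        "Content-Type: application/json; charset=UTF-8",
        "",  # blank line separates part headers from the part body
        json.dumps(event),
        f"--{boundary}--",  # closing delimiter
        "",
    ]
    return "\r\n".join(lines).encode("utf-8")
```

The boundary passed here must match the one declared in the request's content-type header.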
Maintaining an HTTP/2 Connection
Once you've established a connection, it's important to understand how to manage event streams, the downchannel stream, ping, timeouts, and server-initiated disconnects.
Things to Consider
Amazon makes the following recommendations for managing and maintaining your client's HTTP/2 connections:
- Reduce latency: To reduce latency, chunk all captured audio to be streamed to AVS. The stream should contain 10 ms of captured audio per chunk for Pulse Code Modulation (PCM) or 20 ms for Opus.
Important: Send all captured audio to AVS using either PCM or Opus (using the Opus Speech Encoder), adhering to the codec specifications in the SpeechRecognizer Interface.
- Concurrent stream limitations: An HTTP/2 connection with AVS supports only 10 concurrent streams, including event streams, the downchannel, and pings. Make sure to close event streams as responses are received.
- Read timeouts: Because AVS requires a downchannel stream to be open between AVS and a client for the life of the connection, set any read timeout for your client to at least 60 minutes.
- Connection timeouts: If your HTTP/2 client has connection pooling or marks connections as idle, adjust the timeout so the connection is not disrupted. If the connection is disrupted, make sure to complete the flow described for creating a connection, which includes re-establishing the downchannel stream and synchronizing state with AVS. Set the timeout to at least 60 minutes to ensure that your connection is not prematurely closed.
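To stay under the concurrent stream limit, a client can track stream slots explicitly. The sketch below is one possible approach rather than anything AVS prescribes; `StreamBudget` and its methods are hypothetical names, and the limit of 10 is taken from the recommendation above.

```python
import threading
from contextlib import contextmanager

MAX_CONCURRENT_STREAMS = 10  # includes event streams, the downchannel, and pings


class StreamBudget:
    """Tracks how many streams are open on the single AVS connection."""

    def __init__(self, limit: int = MAX_CONCURRENT_STREAMS) -> None:
        self._slots = threading.BoundedSemaphore(limit)

    def try_acquire(self) -> bool:
        """Reserve a slot without blocking; False means the limit is reached."""
        return self._slots.acquire(blocking=False)

    def release(self) -> None:
        """Return a slot once the stream has been closed."""
        self._slots.release()

    @contextmanager
    def stream(self):
        """Hold a slot for the duration of one event stream."""
        if not self.try_acquire():
            raise RuntimeError("no free HTTP/2 stream slots")
        try:
            yield
        finally:
            self.release()
```

Wrapping each event request in `with budget.stream():` releases the slot even when the request raises, which mirrors the advice to close event streams as responses are received.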
Codec Specifications
The following table shows the codec specifications required for your HTTP/2 connection. Note that while increasing Opus complexity increases processor load, it can also improve performance.
Specification | PCM | Opus |
---|---|---|
Number of channels | Single channel (mono) | Single channel (mono) |
Sample size | 16-bit linear PCM (LPCM) | 16-bit |
Sample rate | 16 kHz | 16 kHz |
Bitrate | 256 kbps | 32 kbps or 64 kbps, hard constant bitrate |
Byte order | Little endian | Little endian |
Frame size | 10 ms | 20 ms |
DATA frame size | 320 bytes | 80 bytes (32 kbps) or 160 bytes (64 kbps) |
Complexity | N/A | |
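The DATA frame sizes in the table follow directly from the bitrate and the frame duration, which the short sketch below reproduces. It also shows one way to split captured audio into frame-sized chunks. The function names are illustrative, and kbps is taken to mean 1,000 bits per second.

```python
def pcm_bitrate_bps(sample_rate_hz: int, sample_bits: int, channels: int) -> int:
    """Bitrate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * sample_bits * channels


def data_frame_bytes(bitrate_bps: int, frame_ms: int) -> int:
    """Bytes of audio carried by one frame at a constant bitrate."""
    bits_per_frame = bitrate_bps * frame_ms // 1000
    return bits_per_frame // 8


def chunk_audio(audio: bytes, chunk_bytes: int):
    """Yield successive chunks of audio; the final chunk may be shorter."""
    for start in range(0, len(audio), chunk_bytes):
        yield audio[start:start + chunk_bytes]


# 16 kHz * 16 bits * 1 channel = 256,000 bits/s, so 10 ms of PCM is 320 bytes.
PCM_CHUNK = data_frame_bytes(pcm_bitrate_bps(16_000, 16, 1), 10)
# 20 ms of Opus is 80 bytes at 32 kbps and 160 bytes at 64 kbps.
OPUS_32_CHUNK = data_frame_bytes(32_000, 20)
OPUS_64_CHUNK = data_frame_bytes(64_000, 20)
```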
Event Stream Lifecycle
Each new event is sent on its own stream. Typically, these streams will close after Alexa Voice Service has returned directives and corresponding audio attachments to your client.
Requests are handled sequentially. Therefore, new requests should be sent after Alexa has started responding to your previous request (once the previous request returns headers).
- Your product opens a stream and sends a multipart message consisting of one JSON-formatted event and up to one binary audio attachment (zero or one). For more information, see Structuring an HTTP/2 Request.
- AVS returns multipart messages consisting of one or more JSON-formatted directives and corresponding audio attachments on the same stream, potentially before streaming is complete. The url attribute that follows cid: in a Play or Speak directive will also appear in the header of the associated audio attachment.
- Following a response from AVS, the event stream should be closed.
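Pairing a Play or Speak directive with its audio attachment comes down to comparing the identifier after `cid:` with the identifier in the attachment's header. The sketch below assumes the client has already parsed the multipart response into a dictionary of attachments keyed by that identifier; the helper names and that dictionary shape are hypothetical, and the exact header encoding is not covered here.

```python
from typing import Dict, Optional

CID_PREFIX = "cid:"


def content_id_from_url(url: str) -> Optional[str]:
    """Return the identifier that follows 'cid:', or None for other URLs."""
    if url.startswith(CID_PREFIX):
        return url[len(CID_PREFIX):]
    return None


def find_attachment(url: str, attachments: Dict[str, bytes]) -> Optional[bytes]:
    """Look up the audio attachment referenced by a directive's url attribute."""
    content_id = content_id_from_url(url)
    if content_id is None:
        return None  # not an attachment reference
    return attachments.get(content_id)
```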
Downchannel Lifecycle
In parallel, directives may be sent to your client on the downchannel. Primarily, the downchannel is used for cloud-initiated directives.
- A GET request is made to the directives path within 10 seconds of creating a connection with AVS.
- This stream is used to send your client cloud-initiated directives and audio attachments, such as timers, alarms, and instructions originating from the Amazon Alexa app. Unlike an event stream, the downchannel is not immediately closed and is designed to remain open in a half-closed state from the client and open from AVS for prolonged periods of time.
- When the downchannel stream is closed, your client must immediately establish a new downchannel to ensure your client can receive cloud-initiated directives.
Ping and Timeout
Your client must perform one of the following actions; failure to do so will result in a closed connection:
- Send a PING frame to AVS every 5 minutes when the connection is idle.
- Make a GET request to /ping every 5 minutes when the connection is idle.
Sample Request
    :method = GET
    :scheme = https
    :path = /ping
    authorization = Bearer {{YOUR_ACCESS_TOKEN}}
On a failed PING, the connection should be closed and a new connection should be immediately created.
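Deciding when a ping is due only requires tracking the time of the last activity on the connection. The following sketch shows that bookkeeping and leaves the actual PING frame or GET /ping request to your HTTP/2 library. `IdlePingTimer` is a hypothetical name, and the injectable clock is there only to make the logic easy to test.

```python
import time

PING_INTERVAL_SECONDS = 5 * 60  # ping every 5 minutes while idle


class IdlePingTimer:
    """Reports when the connection has been idle long enough to need a ping."""

    def __init__(self, interval: float = PING_INTERVAL_SECONDS, clock=time.monotonic):
        self._interval = interval
        self._clock = clock
        self._last_activity = clock()

    def record_activity(self) -> None:
        """Call whenever a request, response, or ping uses the connection."""
        self._last_activity = self._clock()

    def ping_due(self) -> bool:
        """True once the idle time has reached the ping interval."""
        return self._clock() - self._last_activity >= self._interval
```

A client loop might check `ping_due()` periodically, send the ping, and then call `record_activity()`. Passing a slightly shorter interval leaves a safety margin for scheduling delays.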
Warning: If using libcurl, your client must make a GET request to /ping every 5 minutes to maintain the connection.
Server-initiated Disconnects
When the server initiates a disconnect, your client should:
- Open a new connection and route any new requests through it.
- Close the old connection after all open requests have been processed and their corresponding streams have been gracefully closed.
- Maintain a connection to any stream URL established before the disconnect was initiated (e.g. Amazon Music, Audible, etc.). A stream playing before a server-initiated disconnect occurs should continue to play as long as bytes are available.
If the attempt to create a new connection fails, the client is expected to retry with an exponential back-off.
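A retry loop with exponential back-off can be sketched as follows. The base delay, growth factor, cap, and attempt limit are illustrative assumptions, not values specified by AVS, and `connect` stands in for whatever function your client uses to open the connection.

```python
import random
import time


def backoff_delays(base=1.0, factor=2.0, max_delay=60.0, jitter=0.0, rng=random.random):
    """Yield delays that grow by `factor` up to `max_delay`, with optional jitter."""
    delay = base
    while True:
        # jitter=0.2 spreads each delay uniformly within +/-20% of its nominal value.
        yield delay * (1 + jitter * (2 * rng() - 1))
        delay = min(delay * factor, max_delay)


def connect_with_retry(connect, max_attempts=5, sleep=time.sleep):
    """Call connect() until it succeeds, sleeping between failed attempts."""
    delays = backoff_delays()
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            sleep(next(delays))
```

Adding a small amount of jitter helps keep many devices from reconnecting at the same instant after a shared outage.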
Next Steps
For information on how to structure a request, see: