
Manage an HTTP/2 Connection with AVS

The Alexa Voice Service (AVS) exposes an HTTP/2 endpoint and supports cloud-initiated directives, which allow you to access Alexa’s built-in capabilities, such as timers and alarms, media transport controls, voice-controlled volume adjustment, and Amazon Alexa app integration. This page provides instructions for creating and maintaining an HTTP/2 connection with AVS.

Key Terms and Concepts

  • Frame: The basic protocol unit in HTTP/2. Each frame type serves a different purpose; for example, HEADERS and DATA frames form the basis of HTTP requests and responses.
  • Stream: An independent, bidirectional sequence of frames exchanged between a client and server within an HTTP/2 connection. For detailed information, see Streams and Multiplexing in RFC 7540.
  • Interfaces: AVS exposes interfaces (SpeechRecognizer, AudioPlayer, SynchronizeState, etc.) that provide your product access to Alexa’s built-in skills.
  • Downchannel: A stream you create in your HTTP/2 connection, which is used to deliver directives from the cloud to your client. The downchannel remains open in a half-closed state from the device and open from AVS for the life of the connection. The downchannel is primarily used to send cloud-initiated directives and audio attachments to your client.
  • Cloud-initiated Directives: Directives sent from the cloud to your client. For example, when a user adjusts device volume from the Amazon Alexa App, a directive is sent to your product without a corresponding voice request.

Prerequisites

Before creating an HTTP/2 connection with AVS, you’ll need to:

  • Obtain an Access Token

    To access AVS, your product needs to obtain a Login with Amazon (LWA) access token, which grants your product access to the API on a customer’s behalf. There are two methods used to obtain an access token for use with AVS.

    Remote Authorization is used to authorize devices using a companion website or mobile app.

    Local Authorization is used to authorize Alexa directly from an AVS-enabled product.

    The LWA access token you obtain must be sent to AVS in the header of each event. If authentication fails for any reason, the connection with AVS is closed.

    The following is a sample header. In addition to your access token, a boundary term is required in the header of each event sent to AVS.

    :method = POST  
    :scheme = https  
    :path = /{{API version}}/events
    authorization = Bearer {{YOUR_ACCESS_TOKEN}}
    content-type = multipart/form-data; boundary={{BOUNDARY_TERM_HERE}}
    
  • Choose an HTTP/2 Client Library

    The following HTTP/2 client libraries are recommended for use with AVS:

    Language    Library
    C / C++     nghttp2
    C / C++     curl and libcurl
    Java        OkHttp
    Java        Netty
    Java        Jetty

    For a complete list of implementations, see GitHub.

Base URL

Region          Supported Countries            URL
Asia            Japan                          https://avs-alexa-fe.amazon.com
Europe          Austria, Germany, India, UK    https://avs-alexa-eu.amazon.com
North America   Canada, US                     https://avs-alexa-na.amazon.com
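
As a minimal sketch, the base URLs in the table above can be combined with the request paths used on this page. The names `BASE_URLS` and `endpoint_url` are illustrative and not part of AVS; the API version is passed in by the caller because this page only shows it as the placeholder `{{API version}}`.

```python
# Base URLs copied from the table above, keyed by region name.
BASE_URLS = {
    "Asia": "https://avs-alexa-fe.amazon.com",
    "Europe": "https://avs-alexa-eu.amazon.com",
    "North America": "https://avs-alexa-na.amazon.com",
}


def endpoint_url(region: str, api_version: str, resource: str) -> str:
    """Return the full URL of the 'events' or 'directives' path for a region.

    Hypothetical helper: the caller supplies the API version string.
    """
    if resource not in ("events", "directives"):
        raise ValueError(f"unknown resource: {resource!r}")
    return f"{BASE_URLS[region]}/{api_version}/{resource}"
```

An unknown region raises `KeyError`, which keeps a misconfigured product from silently connecting to the wrong endpoint.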

Creating an HTTP/2 Connection

When your product is powered on, it should create a single HTTP/2 connection with AVS. This connection is used to handle all directives and events, including anything that is sent to your client on the downchannel stream. For additional details regarding connection management, see server-initiated disconnects below.

Establishing a connection that AVS will keep open requires two things:

  1. To establish a downchannel stream your client must make a GET request to /{{API version}}/directives within 10 seconds of opening the connection with AVS. The request should look like this:

    :method = GET  
    :scheme = https  
    :path = /{{API version}}/directives
    authorization = Bearer {{YOUR_ACCESS_TOKEN}}   
    

    Following a successful request, the downchannel stream will remain open in a half-closed state from the client and open from AVS for the life of the connection. It is not uncommon for there to be long pauses between cloud-initiated directives.

  2. After establishing the downchannel stream, your client must synchronize its components’ states with AVS. This requires making a POST request to /{{API version}}/events on a new event stream on the existing connection (Note: Do not open a new connection). This event stream should be closed when your client receives a response (directive). The following is an example SynchronizeState event:

    :method = POST  
    :scheme = https  
    :path = /{{API version}}/events
    authorization = Bearer {{YOUR_ACCESS_TOKEN}}
    content-type = multipart/form-data; boundary={{BOUNDARY_TERM_HERE}}  
    
    --{{BOUNDARY_TERM_HERE}}
    Content-Disposition: form-data; name="metadata"  
    Content-Type: application/json; charset=UTF-8  
    
    {  
        "context": [   
           // This is an array of context objects that are used to communicate the
           // state of all client components to Alexa. See Context for details.
        ],  
        "event": {  
            "header": {  
                "namespace": "System",  
                "name": "SynchronizeState",  
                "messageId": "{{STRING}}"  
            },  
            "payload": {  
            }  
        }  
    }  
    
    --{{BOUNDARY_TERM_HERE}}--
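
The SynchronizeState request above can be assembled programmatically. The following is a minimal sketch, not a definitive implementation: `build_synchronize_state` is a hypothetical helper, the use of a UUID for `messageId` and for the default boundary is an assumption (the sample only requires a string), and the API version is supplied by the caller. Actually sending the headers and body is left to your HTTP/2 client library.

```python
import json
import uuid


def build_synchronize_state(access_token, api_version, context, boundary=None):
    """Return (headers, body) for a System.SynchronizeState event.

    `context` is the list of context objects describing client component state.
    """
    # Assumption: any unique string works as the boundary term.
    boundary = boundary or uuid.uuid4().hex
    event = {
        "context": context,
        "event": {
            "header": {
                "namespace": "System",
                "name": "SynchronizeState",
                # Assumption: a random UUID is an acceptable messageId string.
                "messageId": str(uuid.uuid4()),
            },
            "payload": {},
        },
    }
    headers = {
        ":method": "POST",
        ":scheme": "https",
        ":path": f"/{api_version}/events",
        "authorization": f"Bearer {access_token}",
        "content-type": f"multipart/form-data; boundary={boundary}",
    }
    # Multipart bodies use CRLF line endings between part headers and content.
    body = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="metadata"\r\n'
        "Content-Type: application/json; charset=UTF-8\r\n"
        "\r\n"
        f"{json.dumps(event)}\r\n"
        f"--{boundary}--\r\n"
    ).encode("utf-8")
    return headers, body
```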
    

After synchronizing state, your client should be able to use this connection to:

  • Send events to and receive directives from AVS
  • Receive cloud-initiated directives on the downchannel stream

Maintaining an HTTP/2 Connection

Once you’ve established a connection, it’s important to understand how to manage event streams, the downchannel stream, ping, timeouts, and server-initiated disconnects.

Things to Consider

  • We encourage streaming captured audio to AVS in 10ms chunks at 320 bytes (320 byte DATA frames sent as single units). Larger chunk sizes create unnecessary buffering, which negatively impacts AVS’ ability to process audio and may result in higher latencies.

    All captured audio sent to AVS should be encoded as:

    • 16bit Linear PCM (LPCM16)
    • 16kHz sample rate
    • Single channel
    • Little endian byte order

    For the complete spec, see the SpeechRecognizer Interface.

  • An HTTP/2 connection with AVS supports at most 10 concurrent streams. This includes event streams, the downchannel, and ping. Please ensure that event streams are being closed as responses are received.

  • Many libraries have a read timeout for how long a client will attempt to read without receiving any data. Since AVS requires a downchannel stream that needs to remain open between AVS and your client for the life of the connection, and this same stream may go prolonged periods without sending any data to your client, it is important that your read timeout is set to at least 60 minutes.

  • If your HTTP/2 client uses connection pooling or marks connections as idle, adjust the idle timeout so the connection is not disrupted. We recommend setting the timeout to at least 60 minutes to ensure your connection is not prematurely closed. If the connection is disrupted anyway, repeat the flow described for creating a connection: re-establish the downchannel stream and synchronize state with AVS.
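
The 320-byte figure in the first consideration follows from the audio format: 16,000 samples per second × 2 bytes per sample × 1 channel × 0.010 seconds = 320 bytes. A minimal sketch of slicing captured audio into chunks of that size is shown below; the constant names and `iter_chunks` are illustrative, and writing each chunk as a DATA frame is left to your HTTP/2 library.

```python
SAMPLE_RATE_HZ = 16_000   # 16kHz sample rate
BYTES_PER_SAMPLE = 2      # 16bit Linear PCM
CHANNELS = 1              # single channel
CHUNK_MS = 10             # recommended chunk duration

# 16000 * 2 * 1 * 10 / 1000 = 320 bytes per 10ms chunk
CHUNK_BYTES = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * CHUNK_MS // 1000


def iter_chunks(pcm: bytes, chunk_bytes: int = CHUNK_BYTES):
    """Yield consecutive slices of captured audio, each at most chunk_bytes long."""
    for offset in range(0, len(pcm), chunk_bytes):
        yield pcm[offset:offset + chunk_bytes]
```

The final chunk may be shorter than 320 bytes if the captured audio does not end on a 10ms boundary.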

Event Stream Lifecycle

Each new event is sent on its own stream. Typically, these streams will close after Alexa Voice Service has returned directives and corresponding audio attachments to your client.

Requests are handled sequentially. Therefore, new requests should be sent after Alexa has started responding to your previous request (once the previous request returns headers).

  1. Your product opens a stream and sends a multipart message consisting of one JSON-formatted event and at most one binary audio attachment. For more information, see Structuring an HTTP/2 Request.
  2. AVS returns multipart messages consisting of one or more JSON-formatted directives and corresponding audio attachments on the same stream, potentially before streaming is complete. The url attribute that follows cid: in a Play or Speak directive will also appear in the header of the associated audio attachment.
  3. Following a response from AVS the event stream should be closed.
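
Because a connection supports at most 10 concurrent streams, including the downchannel and ping, it can help to track how many event streams are open. The following is one possible sketch using a counting semaphore. `EventStreamBudget` is a hypothetical helper, and reserving two of the ten streams for the downchannel and ping is an assumption rather than an AVS requirement.

```python
import threading
from contextlib import contextmanager

MAX_CONCURRENT_STREAMS = 10  # limit stated for an AVS connection
RESERVED_STREAMS = 2         # assumption: one for the downchannel, one for ping


class EventStreamBudget:
    """Caps the number of event streams that are open at the same time."""

    def __init__(self, limit=MAX_CONCURRENT_STREAMS - RESERVED_STREAMS):
        self._slots = threading.BoundedSemaphore(limit)

    @contextmanager
    def stream(self, timeout=None):
        # Wait for a free slot; give up after `timeout` seconds if one is set.
        if not self._slots.acquire(timeout=timeout):
            raise TimeoutError("no free event stream slot")
        try:
            yield
        finally:
            # Release the slot once the event stream has been closed.
            self._slots.release()
```

Wrapping each event in `with budget.stream():` releases the slot even if sending the event raises, which mirrors the requirement that event streams be closed after a response.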

Downchannel Lifecycle

In parallel, directives may be sent to your client on the downchannel. Primarily, the downchannel is used for cloud-initiated directives.

  1. A GET request is made to the directives path within 10 seconds of creating a connection with AVS.
  2. This stream is used to send your client cloud-initiated directives and audio attachments, such as timers, alarms, and instructions originating from the Amazon Alexa app. Unlike an event stream, the downchannel is not immediately closed and is designed to remain open in a half-closed state from the client and open from AVS for prolonged periods of time.
  3. When the downchannel stream is closed, your client must immediately establish a new downchannel to ensure your client can receive cloud-initiated directives.

Ping and Timeout

Your client must perform one of the following actions; failure to do so will result in a closed connection:

  • Send a PING frame to AVS every 5 minutes when the connection is idle.
  • Make a GET request to /ping every 5 minutes when the connection is idle.

Sample Request

:method = GET  
:scheme = https  
:path = /ping  
authorization = Bearer {{YOUR_ACCESS_TOKEN}}
   

On a failed PING, the connection should be closed and a new connection should be created immediately.
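
One way to decide when a ping is due is to track the time of the last activity on the connection. The sketch below assumes that "idle" means no traffic has been sent or received since the last recorded activity; `IdlePingTimer` is an illustrative name, and sending the PING frame or the GET request to /ping is left to your HTTP/2 library.

```python
import time

PING_INTERVAL_SECONDS = 5 * 60  # ping every 5 minutes while idle


class IdlePingTimer:
    """Tracks connection activity and reports when an idle ping is due."""

    def __init__(self, interval=PING_INTERVAL_SECONDS, clock=time.monotonic):
        self._interval = interval
        self._clock = clock
        self._last_activity = clock()

    def record_activity(self):
        """Call whenever data is sent or received, including a successful ping."""
        self._last_activity = self._clock()

    def ping_due(self):
        """Return True once the connection has been idle for a full interval."""
        return self._clock() - self._last_activity >= self._interval
```

A monotonic clock is used by default so that wall-clock adjustments on the device do not delay or trigger pings.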

Server-initiated Disconnects

When the server initiates a disconnect, your client should:

  1. Open a new connection and route any new requests through it.
  2. Close the old connection after all open requests have been processed and their corresponding streams have been gracefully closed.
  3. Maintain a connection to any stream URL established before the disconnect was initiated (e.g. Amazon Music, Audible, etc.). A stream playing before a server-initiated disconnect occurs should continue to play as long as bytes are available.

If the attempt to create a new connection fails, the client is expected to retry with an exponential back-off.
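
A minimal sketch of retrying with an exponential back-off follows. The base delay, growth factor, cap, and number of attempts are assumptions chosen for illustration, since this page does not specify them; `connect` stands for whatever function your client uses to open a new connection and is expected to raise `OSError` on failure.

```python
import time


def backoff_delays(base=1.0, factor=2.0, cap=60.0, attempts=6):
    """Yield waiting times that grow by `factor` and never exceed `cap` seconds."""
    delay = base
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= factor


def connect_with_retry(connect, delays, sleep=time.sleep):
    """Call connect(); on failure, wait for each delay in turn and try again."""
    try:
        return connect()
    except OSError as error:
        last_error = error
    for delay in delays:
        sleep(delay)
        try:
            return connect()
        except OSError as error:
            last_error = error
    raise ConnectionError("could not re-establish a connection") from last_error
```

For example, `connect_with_retry(open_connection, backoff_delays())` would try immediately and then wait 1, 2, 4, 8, 16, and 32 seconds between further attempts, where `open_connection` is your own function. Adding random jitter to each delay is a common refinement when many devices may reconnect at once.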

Next Steps

For information on how to structure a request, see Structuring an HTTP/2 Request.
