Alexa.RTCSessionController Interface

The Alexa.RTCSessionController interface describes the messages that Alexa uses to interact with endpoints capable of real-time communication (RTC). The RTCSessionController interface supports 1-way (half duplex) or 2-way (full duplex) communication over audio and video. When you implement the RTCSessionController interface in your skill, Alexa customers can communicate with a visitor at their front door through their camera and intercom. For more information, see Announcing 2-Way Communication APIs.

Overview

Supported Communication Types

  • 1-way (half duplex) communication allows customers to communicate in two directions, but not simultaneously. For example:
    • A walkie-talkie
    • A push-to-talk door intercom
  • 2-way (full duplex) communication allows customers to communicate in two directions simultaneously. For example:
    • A telephone
    • A telephone door intercom

Utterances

Customers can start communicating with a person next to a real-time communication device by talking to their Alexa-enabled device (for example, an Echo Show or Echo Spot) or by using the microphone icon when they are in live streaming mode.

Customers can start conversations by saying one of the following:

User: Alexa, answer the front door
User: Alexa, get the call going with the front door
User: Alexa, please call front door
User: Alexa, respond to the front door
User: Alexa, speak to the front door
User: Alexa, talk to my front door camera
User: Alexa, talk to the front door
User: Alexa, talk to the person at the main door

Customers can end conversations by saying one of the following:

User: Alexa, go home
User: Alexa, stop

Prerequisites and SLA Requirements

To use the RTCSessionController API, you must meet the following requirements:

  • A minimum timeout of one minute is required.

  • For any offer sent to your skill, you must generate an answer within six seconds.

  • Your device or platform must be WebRTC compliant, or it must support the suite of protocols used by WebRTC together with the resiliency mechanisms that WebRTC uses, specifically:

    • Negative acknowledgement (NACK)
    • Picture loss indication (PLI)
    • Full intra request (FIR)
    • Receiver estimated maximum bitrate (REMB)
  • For resource considerations, you must support bundling and rtcp-mux. You use a bundle to send audio and video over the same connection to reduce the number of open sockets.

  • To support full-duplex communication, your device must employ effective algorithms for acoustic echo cancellation (AEC) and noise suppression.

  • To support half-duplex communication, you can use the Push to Talk feature through the typical live view scenario. Declare isFullDuplexAudioSupported as false in the discovery response.

  • To support video, you must use the following video codec:

    • H264 (up to profile high, level 4.1)
  • To support audio, you must use one of the following audio codecs:

    • Opus (preferred codec)
    • PCMU/G.711
    • AAC-LC, HE-AAC
  • For Interactive Connectivity Establishment (ICE) candidates, you can use either UDP or TCP, but you must use IPv4.
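Several of the requirements above are visible in the SDP itself, so a skill can sanity-check an offer or answer before sending it on. The following Python snippet is a minimal sketch, not part of the Alexa API: the function name check_sdp_prerequisites and the problem strings are hypothetical, the checks are simple line matches rather than a full SDP parser, and REMB and codec checks are left out.

```python
import re


def check_sdp_prerequisites(sdp):
    """Return a list of problems found in an SDP string (empty if none).

    Hypothetical helper: covers bundling, rtcp-mux, NACK/PLI/FIR feedback
    on video, and IPv4 connection lines only.
    """
    lines = [ln.strip() for ln in sdp.splitlines() if ln.strip()]
    problems = []

    # Bundling: audio and video share one connection.
    if not any(ln.startswith("a=group:BUNDLE") for ln in lines):
        problems.append("missing a=group:BUNDLE")

    # rtcp-mux: expect one a=rtcp-mux line per media (m=) section.
    media_count = sum(1 for ln in lines if ln.startswith("m="))
    if lines.count("a=rtcp-mux") < media_count:
        problems.append("a=rtcp-mux missing in at least one media section")

    # Resiliency feedback, checked only when a video section exists.
    if any(ln.startswith("m=video") for ln in lines):
        feedback = (
            ("NACK", r"^a=rtcp-fb:\S+ nack$"),
            ("PLI", r"^a=rtcp-fb:\S+ nack pli$"),
            ("FIR", r"^a=rtcp-fb:\S+ ccm fir$"),
        )
        for label, pattern in feedback:
            if not any(re.match(pattern, ln) for ln in lines):
                problems.append("missing " + label + " feedback")

    # Connection lines must be IPv4.
    if any(ln.startswith("c=") and " IP6 " in ln for ln in lines):
        problems.append("IPv6 connection line found; IPv4 is required")

    return problems
```

An empty list only means that none of these particular checks failed; it does not establish WebRTC compliance.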

Signaling Diagram

The RTCSessionController communication is shown in the following signaling diagram.

Diagram showing order of directives and events for RTCSessionController communication

Discovery

When you respond to a discovery request for a skill that controls a real-time communications device, you describe endpoints that support the Alexa.RTCSessionController interface. Use the standard discovery mechanism described in Alexa.Discovery, as shown in the following example:

Discover.Response example containing RTCSessionController

{
    "event": {
      "header": {
        "namespace":"Alexa.Discovery",
        "name":"Discover.Response",
        "payloadVersion":"3",
        "messageId":"ff746d98-ab02-4c9e-9d0d-b44711658414"
      },
      "payload":{
        "endpoints":[
          {
            "manufacturerName": "Sample Manufacturer",
            "modelName": "Sample Model",
            "friendlyName": "My front door camera",
            "description": "A smart front door camera",
            "displayCategories": [ "CAMERA" ],
            "cookie": {
                "key1": "Arbitrary key/value pairs for skill to reference this endpoint",
                "key2": "There can be multiple entries",
                "key3": "Use only for reference",
                "key4": "Do not use to maintain endpoint state"
            },
            "capabilities":
            [
              {
                "type": "AlexaInterface",
                "interface": "Alexa.RTCSessionController",
                "version": "3",
                "configuration": {
                  "isFullDuplexAudioSupported": true
                }
              }
            ]
          }
        ]
      }
    }
}

Payload details

Field | Description | Type | Required
manufacturerName | The name of the manufacturer of the device. | string | Yes
modelName | The model name of the device. | string | No, but strongly recommended
friendlyName | A friendly name for the device. | string | Yes
description | A description of the device. | string | Yes
displayCategories | The categories for the skill. Use CAMERA or DOORBELL. | array of strings | Yes
isFullDuplexAudioSupported | True if the device supports 2-way (full duplex) communication. False if the device supports 1-way (half duplex) communication. The default is false. | boolean | No
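
As an illustration of the fields above, the following Python sketch builds the capability entry and checks an endpoint description against the Required column of the table. The helper names rtc_capability and discovery_problems are hypothetical, and the check covers only the fields listed in this section, not the full Alexa.Discovery schema.

```python
REQUIRED_FIELDS = ("manufacturerName", "friendlyName", "description", "displayCategories")
ALLOWED_CATEGORIES = {"CAMERA", "DOORBELL"}


def rtc_capability(full_duplex=False):
    """Build the capability entry; isFullDuplexAudioSupported defaults to false."""
    return {
        "type": "AlexaInterface",
        "interface": "Alexa.RTCSessionController",
        "version": "3",
        "configuration": {"isFullDuplexAudioSupported": full_duplex},
    }


def discovery_problems(endpoint):
    """List the required fields that are missing or empty in an endpoint dict."""
    problems = ["missing " + field for field in REQUIRED_FIELDS if not endpoint.get(field)]
    categories = endpoint.get("displayCategories") or []
    # Only comment on the category values when some were supplied.
    if categories and not ALLOWED_CATEGORIES.intersection(categories):
        problems.append("displayCategories should include CAMERA or DOORBELL")
    return problems
```

A push-to-talk device would call rtc_capability() with the default, which declares half-duplex audio.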

Directives

InitiateSessionWithOffer Directive

Initiate a real-time communication session with a front door device.

User: Alexa, talk to my front door camera

InitiateSessionWithOffer directive example

{
    "directive": {
        "header": {
          "namespace": "Alexa.RTCSessionController",
          "name": "InitiateSessionWithOffer",
          "messageId": "d1ba3aa7-bff7-4406-9425-f25f04ec8d68",
          "correlationToken": "dFMb0z+PgpgdDmluhJ1LddFvSqZ/jCc8ptlAKulUj90jSqg==",
          "payloadVersion": "3"
        },
        "endpoint": {
          "scope": {
            "type": "BearerToken",
            "token": "access-token-from-skill"
          },
          "endpointId": "appliance-001",
          "cookie": {
            "keys": "key/value pairs received during discovery"
          }
        },
        "payload": {
          "sessionId" : "the session identifier",
          "offer": {
             "format" : "SDP",
             "value" : "<SDP offer value>"
          }
        }
    }
}

Payload details

Field | Description | Type | Required
sessionId | The identifier of the session that wants to connect. | A Version 4 UUID | Yes
offer | An SDP offer: an object whose format field is SDP and whose value field holds the SDP string. | object | Yes
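
A skill handler typically starts by pulling these two fields out of the directive. The following Python sketch shows one way to do that, assuming the request has already been decoded from JSON into a dict; the function name parse_initiate_session is hypothetical.

```python
import uuid


def parse_initiate_session(request):
    """Return (session_id, sdp_offer) from an InitiateSessionWithOffer request dict.

    Raises ValueError when the directive name, session identifier, or offer
    format is not what this section describes.
    """
    directive = request["directive"]
    header = directive["header"]
    if (header["namespace"], header["name"]) != (
        "Alexa.RTCSessionController",
        "InitiateSessionWithOffer",
    ):
        raise ValueError("not an InitiateSessionWithOffer directive")

    payload = directive["payload"]
    session_id = payload["sessionId"]
    # uuid.UUID raises ValueError itself if the string is not a UUID at all.
    if uuid.UUID(session_id).version != 4:
        raise ValueError("sessionId is not a Version 4 UUID")

    offer = payload["offer"]
    if offer.get("format") != "SDP":
        raise ValueError("unsupported offer format")
    return session_id, offer["value"]
```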

SessionConnected Directive

The directive to connect an RTC session. The payload for this message contains the identifier for the RTC session, received from the original InitiateSessionWithOffer directive.

SessionConnected directive example

{
    "directive": {
        "header": {
          "namespace": "Alexa.RTCSessionController",
          "name": "SessionConnected",
          "messageId": "d1ba3aa7-bff7-4406-9425-f25f04ec8d68",
          "correlationToken": "dFMb0z+PgpgdDmluhJ1LddFvSqZ/jCc8ptlAKulUj90jSqg==",
          "payloadVersion": "3"
        },
        "endpoint": {
          "scope": {
            "type": "BearerToken",
            "token": "access-token-from-skill"
          },
          "endpointId": "appliance-001",
          "cookie": {
            "keys": "key/value pairs received during discovery"
          }
        },
        "payload": {
          "sessionId": "session identifier"
        }
    }
}

Payload details

Field | Description | Type | Required
sessionId | The identifier of the session that wants to connect. | A Version 4 UUID | Yes

SessionDisconnected Directive

The directive to disconnect an RTC session. The payload for this message contains the identifier for the RTC session, received from the original InitiateSessionWithOffer directive.

SessionDisconnected directive example

{
    "directive": {
        "header": {
          "namespace": "Alexa.RTCSessionController",
          "name": "SessionDisconnected",
          "messageId": "d1ba3aa7-bff7-4406-9425-f25f04ec8d68",
          "correlationToken": "dFMb0z+PgpgdDmluhJ1LddFvSqZ/jCc8ptlAKulUj90jSqg==",
          "payloadVersion": "3"
        },
        "endpoint": {
          "scope": {
            "type": "BearerToken",
            "token": "access-token-from-skill"
          },
          "endpointId": "appliance-001",
          "cookie": {
            "keys": "key/value pairs received during discovery"
          }
        },
        "payload": {
            "sessionId" : "session identifier"
        }
    }
}

Payload details

Field | Description | Type | Required
sessionId | The identifier of the session that wants to disconnect. | A Version 4 UUID | Yes

Properties and Events

Properties

There are no reportable properties currently defined for this interface.

AnswerGeneratedForSession Event

If the InitiateSessionWithOffer directive was successfully handled, you should respond with an AnswerGeneratedForSession event. The payload for this message contains an SDP answer.

AnswerGeneratedForSession event example

{
    "event": {
        "header": {
            "namespace": "Alexa.RTCSessionController",
            "name": "AnswerGeneratedForSession",
            "messageId": "30d2cd1a-ce4f-4542-aa5e-04bd0a6492d5",
            "correlationToken": "dFMb0z+PgpgdDmluhJ1LddFvSqZ/jCc8ptlAKulUj90jSqg==",
            "payloadVersion": "3"
        },
        "endpoint": {
            "endpointId": "appliance-001"
        },
        "payload": {
            "answer": {
                "format" : "SDP",
                "value" : "<SDP answer value>"
            }
        }
    }
}

Payload details

Field | Description | Type | Required
answer | An SDP answer: an object whose format field is SDP and whose value field holds the SDP string. | object | Yes
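
The event mirrors the directive: it reuses the correlationToken and endpointId and carries the answer in the payload. The following Python sketch assembles it, assuming the SDP answer string has already been produced by your device or platform; the function name build_answer_event is hypothetical.

```python
import uuid


def build_answer_event(request, sdp_answer):
    """Build an AnswerGeneratedForSession event for an InitiateSessionWithOffer request."""
    directive = request["directive"]
    return {
        "event": {
            "header": {
                "namespace": "Alexa.RTCSessionController",
                "name": "AnswerGeneratedForSession",
                # Each message gets its own identifier.
                "messageId": str(uuid.uuid4()),
                # Echo the token so the event can be matched to the directive.
                "correlationToken": directive["header"]["correlationToken"],
                "payloadVersion": "3",
            },
            "endpoint": {"endpointId": directive["endpoint"]["endpointId"]},
            "payload": {"answer": {"format": "SDP", "value": sdp_answer}},
        }
    }
```

Generating the answer is the slow part; the six-second limit in the prerequisites applies to the whole round trip, not only to building this dict.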

SessionConnected Event

If the SessionConnected directive was successfully handled, you should respond with a SessionConnected event. The payload for this message contains the identifier for the RTC session, received from the original InitiateSessionWithOffer directive.

SessionConnected event example

{
  "event": {
    "header": {
      "namespace": "Alexa.RTCSessionController",
      "name": "SessionConnected",
      "messageId": "30d2cd1a-ce4f-4542-aa5e-04bd0a6492d5",
      "correlationToken": "dFMb0z+PgpgdDmluhJ1LddFvSqZ/jCc8ptlAKulUj90jSqg==",
      "payloadVersion": "3"
    },
    "endpoint": {
       "endpointId" : "appliance-001"
    },
    "payload": {
        "sessionId" : "session identifier"
    }
  }
}

Payload details

Field | Description | Type | Required
sessionId | The identifier of the session that was connected. | A Version 4 UUID | Yes

SessionDisconnected Event

If the SessionDisconnected directive was successfully handled, you should respond with a SessionDisconnected event. The payload for this message contains the identifier for the RTC session, received from the original InitiateSessionWithOffer directive.

SessionDisconnected event example

{
  "event": {
    "header": {
      "namespace": "Alexa.RTCSessionController",
      "name": "SessionDisconnected",
      "messageId": "30d2cd1a-ce4f-4542-aa5e-04bd0a6492d5",
      "correlationToken": "dFMb0z+PgpgdDmluhJ1LddFvSqZ/jCc8ptlAKulUj90jSqg==",
      "payloadVersion": "3"
    },
    "endpoint": {
       "endpointId" : "appliance-001"
    },
    "payload": {
        "sessionId" : "session identifier"
    }
  }
}

Payload details

Field | Description | Type | Required
sessionId | The identifier of the session that was disconnected. | A Version 4 UUID | Yes
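
The SessionConnected and SessionDisconnected events share the same shape and carry the same name as the directive they answer, so one helper can build both. The following Python sketch assumes the request has been decoded into a dict; the function name build_session_event is hypothetical.

```python
import uuid

SESSION_DIRECTIVES = ("SessionConnected", "SessionDisconnected")


def build_session_event(request):
    """Build the event that answers a SessionConnected or SessionDisconnected directive."""
    directive = request["directive"]
    name = directive["header"]["name"]
    if name not in SESSION_DIRECTIVES:
        raise ValueError("unexpected directive: " + name)
    return {
        "event": {
            "header": {
                "namespace": "Alexa.RTCSessionController",
                # The event uses the same name as the directive it responds to.
                "name": name,
                "messageId": str(uuid.uuid4()),
                "correlationToken": directive["header"]["correlationToken"],
                "payloadVersion": "3",
            },
            "endpoint": {"endpointId": directive["endpoint"]["endpointId"]},
            # Return the session identifier unchanged.
            "payload": {"sessionId": directive["payload"]["sessionId"]},
        }
    }
```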

Session Description Protocol Offer/Answer Format

The RTCSessionController interface uses the Session Description Protocol (SDP). For more information, see Session Description Protocol (SDP).

Offer/answer exchange example

v=0
o=- 3747690900 3747690900 IN IP4 0.0.0.0
s=a 2 z
c=IN IP4 0.0.0.0
t=0 0
a=group:BUNDLE audio0 video0
m=audio 1 RTP/SAVPF 96 0
a=candidate:1 1 UDP 2013266430 xxx.xxx.xxx.xxx 8620 typ host
a=candidate:2 1 TCP 1010827775 xxx.xxx.xxx.xxx 45351 typ host tcptype passive
a=candidate:3 2 UDP 2013266429 xxx.xxx.xxx.xxx 50066 typ host
a=candidate:4 2 TCP 1010827774 xxx.xxx.xxx.xxx 65157 typ host tcptype passive
a=candidate:5 2 TCP 1015022078 xxx.xxx.xxx.xxx 9 typ host tcptype active
a=candidate:6 1 TCP 1015022079 xxx.xxx.xxx.xxx 9 typ host tcptype active
a=setup:actpass
a=extmap:3 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
a=rtpmap:96 opus/48000/2
a=rtcp:9 IN IP4 0.0.0.0
a=rtcp-mux
a=sendrecv
a=mid:audio0
a=ssrc:118039096 cname:user2571875795@host-433aaf59
a=ice-ufrag:AGVf
a=ice-pwd:h3JAYGhIaQ/Nvyaz9dLoz9
a=fingerprint:sha-256 34:D4:54:17:0C:95:2A:79:FF:72:10:21:E9:6E:F3:77:86:2F:8D:6C:33:45:BA:14:1D:43:01:D7:CD:0A:1A:84
m=video 1 RTP/SAVPF 99
a=candidate:4 1 UDP 2013266430 xxx.xxx.xxx.xxx 8620 typ host
a=candidate:5 1 TCP 1015022079 xxx.xxx.xxx.xxx 9 typ host tcptype active
a=candidate:4 2 UDP 2013266429 xxx.xxx.xxx.xxx 50066 typ host
a=candidate:6 1 TCP 1010827775 xxx.xxx.xxx.xxx 45351 typ host tcptype passive
a=candidate:5 2 TCP 1015022078 xxx.xxx.xxx.xxx 9 typ host tcptype active
a=candidate:6 2 TCP 1010827774 xxx.xxx.xxx.xxx 65157 typ host tcptype passive
b=AS:500
a=setup:actpass
a=extmap:3 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
a=rtpmap:99 H264/90000
a=rtcp:9 IN IP4 0.0.0.0
a=rtcp-mux
a=sendrecv
a=mid:video0
a=rtcp-fb:99 nack
a=rtcp-fb:99 nack pli
a=rtcp-fb:99 ccm fir
a=ssrc:3643559644 cname:user2571875795@host-433aaf59
a=ice-ufrag:AGVf
a=ice-pwd:h3JAYGhIaQ/Nvyaz9dLoz9
a=fingerprint:sha-256 34:D4:54:17:0C:95:2A:79:FF:72:10:21:E9:6E:F3:77:86:2F:8D:6C:33:45:BA:14:1D:43:01:D7:CD:0A:1A:84
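
To make the structure of the example above easier to see, the following Python sketch splits an SDP string into its media sections and collects the mid, the rtpmap codecs, and the number of ICE candidates for each. It is a rough line-based reader for illustration, not a complete SDP parser, and the function name parse_media_sections is hypothetical.

```python
def parse_media_sections(sdp):
    """Return one dict per m= section with its kind, mid, codecs, and candidate count."""
    sections = []
    current = None
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            # "m=audio 1 RTP/SAVPF 96 0" starts a new section of kind "audio".
            current = {"kind": line[2:].split()[0], "mid": None, "codecs": {}, "candidates": 0}
            sections.append(current)
        elif current is None:
            # Session-level lines before the first m= line are skipped.
            continue
        elif line.startswith("a=mid:"):
            current["mid"] = line[len("a=mid:"):]
        elif line.startswith("a=rtpmap:"):
            payload_type, codec = line[len("a=rtpmap:"):].split(" ", 1)
            current["codecs"][payload_type] = codec
        elif line.startswith("a=candidate:"):
            current["candidates"] += 1
    return sections
```

Run against the example above, this yields an audio section with mid audio0 and codec opus/48000/2, and a video section with mid video0 and codec H264/90000, which are the two mids named in the a=group:BUNDLE line.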

Error Handling

You should reply with an error if you cannot complete the customer request for some reason. For more details, see Alexa.ErrorResponse.
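
As a minimal sketch of such a reply, the following Python function wraps an error type and message in an event in the Alexa namespace named ErrorResponse. The function name build_error_response is hypothetical, and the type value used in the usage note (ENDPOINT_UNREACHABLE) should be checked against the Alexa.ErrorResponse reference, which lists the allowed values.

```python
import uuid


def build_error_response(request, error_type, message):
    """Build an Alexa.ErrorResponse event for a directive that could not be handled."""
    directive = request["directive"]
    return {
        "event": {
            "header": {
                "namespace": "Alexa",
                "name": "ErrorResponse",
                "messageId": str(uuid.uuid4()),
                "correlationToken": directive["header"]["correlationToken"],
                "payloadVersion": "3",
            },
            "endpoint": {"endpointId": directive["endpoint"]["endpointId"]},
            "payload": {"type": error_type, "message": message},
        }
    }
```

For example, if the camera is offline when InitiateSessionWithOffer arrives, the skill could return build_error_response(request, "ENDPOINT_UNREACHABLE", "The camera did not respond.") instead of an AnswerGeneratedForSession event.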

Related Interfaces

Interface | Description
Alexa.CameraStreamController | Describes the messages used to retrieve camera streams from camera endpoints.
Alexa.DoorbellEventSource | Describes an endpoint that is capable of raising doorbell events.
Alexa.MotionSensor | Describes an endpoint that senses physical movement in an area.