Alexa.RTCSessionController Interface
Implement the Alexa.RTCSessionController interface in your Alexa skill for devices that are capable of real-time communication (RTC). When your skill implements the RTCSessionController interface, Alexa users can communicate remotely, for example with a visitor at their front door, by using a Fire TV or any Echo device, such as an Echo Dot, Echo Plus, Echo Show, or Echo Spot. For more information about security skills, see Smart Home Security Overview.
The RTCSessionController interface supports 1-way (half duplex) and 2-way (full duplex) communication. For an audio-only scenario, such as an Echo Plus connecting to a front door intercom, communication must be 2-way. For an audio and video scenario, such as an Echo Show connecting to a front door camera, video is 1-way only, and audio can be 1-way or 2-way.
While you develop and test your Alexa skill, you can use the Smart Home debugger to see logs from your WebRTC sessions in real time. For details, see Smart Home Debugger for WebRTC Skills.
For the list of languages that the RTCSessionController interface supports, see List of Alexa Interfaces and Supported Languages.
Utterances
When you use the Alexa.RTCSessionController interface, the voice interaction model is already built for you. Users can start communicating with a person who is next to a real-time communication device by talking to their Alexa-enabled device (for example, an Echo Show or Fire TV), or by using the microphone icon when they are in live streaming mode.
Users can start conversations by using one of the following utterances:
Alexa, show me the front door camera.
Alexa, answer the front door.
Alexa, talk to the front door.
Alexa, talk to the backyard camera.
Alexa, talk to the baby monitor.
Alexa, get the call going with the front door.
Alexa, please call front door.
Alexa, respond to the front door.
Alexa, speak to the front door.
Alexa, talk to my front door camera.
Alexa, talk to the person at the main door.
Users can end conversations by using one of the following utterances:
Alexa, go home.
Alexa, stop.
After the user says one of these utterances, Alexa sends a corresponding directive to your skill.
Overview
Supported communication types
- 1-way (half duplex) communication allows users to communicate in two directions, but not simultaneously. For example:
  - A walkie-talkie
  - A push-to-talk door intercom
- 2-way (full duplex) communication allows users to communicate in two directions simultaneously. For example:
  - A telephone
  - A telephone door intercom
Supported resolutions
The supported resolutions are 480p to 1080p.
Prerequisites and SLA requirements
Low latency is critical to an optimal user experience. To use the RTCSessionController API, you need the following:
- A minimum timeout of one minute is required.
- For any offer sent to your skill, you must respond with an SDP answer within six seconds.
- Minimize the number of ICE candidates to ensure a response time within six seconds.
- Your device or platform must be WebRTC compliant, or support the suite of protocols that WebRTC uses and all supported resiliency mechanisms used in WebRTC. Specifically:
  - To conserve resources, you must support bundling and rtcp-mux. Bundling sends audio and video over the same connection, which reduces the number of open sockets.
  - To support full-duplex communication, your device must employ effective algorithms for acoustic echo cancellation (AEC) and noise suppression.
  - To support half-duplex communication, you can use the Push to Talk feature through the typical live view scenario. Declare isFullDuplexAudioSupported as false in the discovery response.
  - To support video, you must use the following video codec:
    - H.264 (up to High profile, level 4.1)
  - To support audio, you must use one of the following audio codecs:
    - Opus (preferred codec)
    - PCMU/G.711
    - AAC-LC, HE-AAC
  - For Interactive Connectivity Establishment (ICE) candidates, you can use either UDP or TCP, but you must use IPv4. Trickle ICE is not supported, so you must gather all ICE candidates up front and send them in the SDP answer.
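Several of the requirements above can be checked mechanically before your skill sends an SDP answer. The following Python sketch is a rough pre-flight check, not an official validator: the names `check_sdp_answer` and `split_media_sections` are hypothetical, and the AAC encoding names (`mp4a-latm`, `mpeg4-generic`) are assumptions about what your media stack emits. It inspects only the `a=group:BUNDLE`, `a=rtcp-mux`, `a=rtpmap`, `a=candidate`, and `a=ice-options` lines, so it does not check timing, AEC, or codec profiles.

```python
import ipaddress

# Encoding names as they appear in a=rtpmap lines. "mp4a-latm" and "mpeg4-generic"
# are assumed names for AAC-LC/HE-AAC; adjust them to what your media stack emits.
ALLOWED_AUDIO = {"opus", "pcmu", "mp4a-latm", "mpeg4-generic"}
ALLOWED_VIDEO = {"h264"}


def split_media_sections(sdp):
    """Split SDP text into session-level lines and one list of lines per m= section."""
    session, sections = [], []
    for raw in sdp.strip().splitlines():
        line = raw.strip()
        if line.startswith("m="):
            sections.append([line])
        elif sections:
            sections[-1].append(line)
        else:
            session.append(line)
    return session, sections


def check_sdp_answer(sdp):
    """Return a list of problems found in an SDP answer; an empty list means none were found."""
    problems = []
    session, sections = split_media_sections(sdp)
    every_line = session + [line for section in sections for line in section]

    if not any(line.startswith("a=group:BUNDLE") for line in every_line):
        problems.append("missing a=group:BUNDLE")
    if any(line.startswith("a=ice-options:") and "trickle" in line for line in every_line):
        problems.append("trickle ICE is advertised but not supported")

    for section in sections:
        kind = section[0][2:].split()[0]  # media type from "m=<media> <port> ..."
        if "a=rtcp-mux" not in section:
            problems.append(f"{kind}: missing a=rtcp-mux")
        codecs = {line.split()[1].split("/")[0].lower()
                  for line in section if line.startswith("a=rtpmap:")}
        allowed = {"audio": ALLOWED_AUDIO, "video": ALLOWED_VIDEO}.get(kind)
        if allowed is not None and not codecs & allowed:
            problems.append(f"{kind}: no supported codec in {sorted(codecs)}")
        candidates = [line for line in section if line.startswith("a=candidate:")]
        if not candidates:
            problems.append(f"{kind}: no ICE candidates gathered up front")
        for candidate in candidates:
            # a=candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type>
            fields = candidate.split()
            address = fields[4] if len(fields) > 4 else ""
            try:
                if ipaddress.ip_address(address).version != 4:
                    problems.append(f"{kind}: candidate {address} is not IPv4")
            except ValueError:
                problems.append(f"{kind}: candidate address {address!r} is not an IP literal")
    return problems


if __name__ == "__main__":
    answer = "\r\n".join([
        "v=0",
        "o=- 1 1 IN IP4 0.0.0.0",
        "s=-",
        "t=0 0",
        "a=group:BUNDLE audio0 video0",
        "m=audio 9 UDP/TLS/RTP/SAVPF 96",
        "a=candidate:1 1 UDP 2013266430 192.0.2.10 8620 typ host",
        "a=rtpmap:96 opus/48000/2",
        "a=rtcp-mux",
        "m=video 9 UDP/TLS/RTP/SAVPF 99",
        "a=candidate:1 1 UDP 2013266430 192.0.2.10 8620 typ host",
        "a=rtpmap:99 H264/90000",
        "a=rtcp-mux",
    ])
    print(check_sdp_answer(answer))  # prints []
```

Returning a list of problems rather than raising on the first one makes it easier to log every issue at once while you test against the six-second deadline.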
Signaling diagram
The following signaling diagram shows RTCSessionController communication.

Properties
The Alexa.RTCSessionController interface does not define any reportable properties.
Discovery
You describe endpoints that support Alexa.RTCSessionController by using the standard discovery mechanism described in Alexa.Discovery. In addition, indicate whether the device supports full-duplex audio in the configuration of the Alexa.RTCSessionController capability.
Use CAMERA or DOORBELL for the display category. For the full list of display categories, see display categories.
In addition to the usual discovery response fields, for the RTCSessionController entry in the capabilities array, include a configuration object that contains the following field.
Field | Description | Type |
---|---|---|
isFullDuplexAudioSupported | True if the device supports 2-way (full duplex) communication. False if the device supports 1-way (half duplex) communication. The default is false. If your device does not support audio communication, set the value to false and include an `a=sendonly` attribute in your SDP answer. | Boolean |
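As a sketch of how the configuration field and the display category fit together, the following Python helpers build the RTCSessionController capability entry and a minimal endpoint object. The function names `rtc_session_capability` and `rtc_endpoint` are hypothetical; the field names and values come from the table above and the discovery example that follows. A real endpoint would also list the other interfaces it supports, such as Alexa.EndpointHealth.

```python
import json

# Display categories that the discovery guidance above allows for RTC endpoints.
RTC_DISPLAY_CATEGORIES = {"CAMERA", "DOORBELL"}


def rtc_session_capability(full_duplex_audio=False):
    """Return the Alexa.RTCSessionController entry for the capabilities array."""
    return {
        "type": "AlexaInterface",
        "interface": "Alexa.RTCSessionController",
        "version": "3",
        "configuration": {
            # False (the default) declares 1-way (half duplex) audio. A device with no
            # audio at all also declares False and adds a=sendonly to its SDP answer.
            "isFullDuplexAudioSupported": bool(full_duplex_audio),
        },
    }


def rtc_endpoint(endpoint_id, friendly_name, manufacturer, description,
                 display_category, full_duplex_audio=False):
    """Return a minimal endpoint object for the endpoints array of a Discover.Response."""
    if display_category not in RTC_DISPLAY_CATEGORIES:
        raise ValueError(f"use CAMERA or DOORBELL, not {display_category!r}")
    return {
        "endpointId": endpoint_id,
        "manufacturerName": manufacturer,
        "description": description,
        "friendlyName": friendly_name,
        "displayCategories": [display_category],
        "cookie": {},
        "capabilities": [
            rtc_session_capability(full_duplex_audio),
            {"type": "AlexaInterface", "interface": "Alexa", "version": "3"},
        ],
    }


if __name__ == "__main__":
    endpoint = rtc_endpoint("front-door-01", "My front door camera", "Example Corp",
                            "Front door camera", "CAMERA", full_duplex_audio=True)
    print(json.dumps(endpoint, indent=2))
```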
Discover response example
The following example shows a Discover.Response message for a security camera that supports the Alexa.RTCSessionController, MediaMetadata, and EndpointHealth interfaces.
{
"event": {
"header": {
"namespace":"Alexa.Discovery",
"name":"Discover.Response",
"payloadVersion": "3",
"messageId": "<message id>"
},
"payload":{
"endpoints":[
{
"endpointId": "<unique ID of the endpoint>",
"manufacturerName": "<the manufacturer name of the endpoint>",
"description": "<a description that is shown in the Alexa app>",
"friendlyName": "My front door camera",
"displayCategories": ["CAMERA"],
"cookie": {},
"capabilities": [
{
"type": "AlexaInterface",
"interface": "Alexa.RTCSessionController",
"version": "3",
"configuration": {
"isFullDuplexAudioSupported": true
}
},
{
"type": "AlexaInterface",
"interface": "Alexa.MediaMetadata",
"version": "3",
"proactivelyReported": true
},
{
"type": "AlexaInterface",
"interface": "Alexa.EndpointHealth",
"version": "3",
"properties": {
"supported": [
{
"name":"connectivity"
}
],
"proactivelyReported": true,
"retrievable": true
}
},
{
"type": "AlexaInterface",
"interface": "Alexa",
"version": "3"
}
]
}
]
}
}
}
Directives
InitiateSessionWithOffer Directive
Support the InitiateSessionWithOffer directive so that users can initiate a real-time communication session with a front door device.
The following example shows a user utterance:
Alexa, talk to my front door camera
InitiateSessionWithOffer directive payload details
Field | Description | Type |
---|---|---|
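sessionId | The identifier of the session that wants to connect. | String (version 4 UUID) |
offer | The SDP offer, as an object with a `format` field set to `SDP` and a `value` field that contains the SDP offer text. | Object |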
InitiateSessionWithOffer directive example
The following example illustrates an InitiateSessionWithOffer directive that Alexa sends to your skill.
{
"directive": {
"header": {
"namespace": "Alexa.RTCSessionController",
"name": "InitiateSessionWithOffer",
"messageId": "<message id>",
"correlationToken": "<an opaque correlation token>",
"payloadVersion": "3"
},
"endpoint": {
"scope": {
"type": "BearerToken",
"token": "<an OAuth2 bearer token>"
},
"endpointId": "<endpoint id>",
"cookie": {}
},
"payload": {
"sessionId" : "<the session identifier>",
"offer": {
"format" : "SDP",
"value" : "<an SDP offer value>"
}
}
}
}
InitiateSessionWithOffer response event
If you handle an InitiateSessionWithOffer directive successfully, respond with an AnswerGeneratedForSession event. You can respond synchronously or asynchronously. If you respond asynchronously, include a correlation token and a scope with an authorization token.
AnswerGeneratedForSession response event payload details
Field | Description | Type |
---|---|---|
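answer | The SDP answer, as an object with a `format` field set to `SDP` and a `value` field that contains the SDP answer text. | Object |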
AnswerGeneratedForSession response event example
{
"event": {
"header": {
"namespace": "Alexa.RTCSessionController",
"name": "AnswerGeneratedForSession",
"messageId": "<message id>",
"correlationToken": "<an opaque correlation token>",
"payloadVersion": "3"
},
"endpoint": {
"scope": {
"type": "BearerToken",
"token": "<an OAuth2 bearer token>"
},
"endpointId": "<endpoint id>"
},
"payload": {
"answer": {
"format" : "SDP",
"value" : "<an SDP answer value>"
}
}
}
}
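The following Python sketch shows one way to turn an InitiateSessionWithOffer directive into the AnswerGeneratedForSession event above. The function name `answer_generated_for_session` and its `async_token` parameter are hypothetical, and generating the SDP answer itself is left to your device's WebRTC stack. Copying the directive's scope suits a synchronous reply; for an asynchronous reply, this sketch assumes you pass the authorization token your skill uses to send events to Alexa. Either way, the answer must arrive within six seconds.

```python
import uuid


def answer_generated_for_session(directive, sdp_answer, async_token=None):
    """Build the AnswerGeneratedForSession event for an InitiateSessionWithOffer directive.

    For a synchronous reply the scope is copied from the directive. For an asynchronous
    reply, pass the authorization token your skill uses to send events (async_token).
    """
    header = directive["directive"]["header"]
    endpoint = directive["directive"]["endpoint"]
    if header["name"] != "InitiateSessionWithOffer":
        raise ValueError(f"unexpected directive {header['name']!r}")
    scope = ({"type": "BearerToken", "token": async_token}
             if async_token else endpoint["scope"])
    return {
        "event": {
            "header": {
                "namespace": "Alexa.RTCSessionController",
                "name": "AnswerGeneratedForSession",
                "messageId": str(uuid.uuid4()),
                # Ties this event to the directive that carried the offer.
                "correlationToken": header["correlationToken"],
                "payloadVersion": "3",
            },
            "endpoint": {"scope": scope, "endpointId": endpoint["endpointId"]},
            "payload": {"answer": {"format": "SDP", "value": sdp_answer}},
        }
    }
```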
InitiateSessionWithOffer directive error handling
If you can't handle an InitiateSessionWithOffer directive successfully, respond with an Alexa.ErrorResponse event. If the customer needs to configure the camera, return the NOT_SUPPORTED_IN_CURRENT_MODE error type and include the currentDeviceMode field with a value of NOT_PROVISIONED.
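For the not-provisioned case, the error event could be built as in the following Python sketch. The function name `not_provisioned_error` is hypothetical, and the envelope (namespace `Alexa`, name `ErrorResponse`, a payload with `type` and `message`) is assumed from the general Alexa.ErrorResponse format; check that reference for the full list of error types.

```python
import uuid


def not_provisioned_error(directive, message):
    """Build an Alexa.ErrorResponse telling Alexa that the camera still needs setup."""
    header = directive["directive"]["header"]
    endpoint = directive["directive"]["endpoint"]
    return {
        "event": {
            "header": {
                "namespace": "Alexa",
                "name": "ErrorResponse",
                "messageId": str(uuid.uuid4()),
                "correlationToken": header["correlationToken"],
                "payloadVersion": "3",
            },
            "endpoint": {"endpointId": endpoint["endpointId"]},
            "payload": {
                "type": "NOT_SUPPORTED_IN_CURRENT_MODE",
                "message": message,  # free-form text for logs, not spoken to the user
                "currentDeviceMode": "NOT_PROVISIONED",
            },
        }
    }
```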
SessionConnected Directive
The SessionConnected directive notifies you that your RTC session is connected.
SessionConnected directive payload details
Field | Description | Type |
---|---|---|
sessionId | The identifier for the session from the original InitiateSessionWithOffer directive. | String (version 4 UUID) |
SessionConnected directive example
The following example illustrates a SessionConnected directive that Alexa sends to your skill.
{
"directive": {
"header": {
"namespace": "Alexa.RTCSessionController",
"name": "SessionConnected",
"messageId": "<message id>",
"correlationToken": "<an opaque correlation token>",
"payloadVersion": "3"
},
"endpoint": {
"scope": {
"type": "BearerToken",
"token": "<an OAuth2 bearer token>"
},
"endpointId": "<endpoint id>",
"cookie": {}
},
"payload": {
"sessionId" : "<the session identifier>"
}
}
}
SessionConnected response event
If you handle a SessionConnected directive successfully, respond with a SessionConnected event. You can respond synchronously or asynchronously. If you respond asynchronously, include a correlation token and a scope with an authorization token.
SessionConnected response event payload details
Field | Description | Type |
---|---|---|
sessionId | The identifier for the session from the original InitiateSessionWithOffer directive. | String (version 4 UUID) |
SessionConnected response event example
{
"event": {
"header": {
"namespace": "Alexa.RTCSessionController",
"name": "SessionConnected",
"messageId": "<message id>",
"correlationToken": "<an opaque correlation token>",
"payloadVersion": "3"
},
"endpoint": {
"scope": {
"type": "BearerToken",
"token": "<an OAuth2 bearer token>"
},
"endpointId": "<endpoint id>"
},
"payload": {
"sessionId" : "<the session identifier>"
}
}
}
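The SessionConnected event mostly echoes the directive, as the following Python sketch shows. The function name `session_connected_event` is hypothetical; copying the scope from the directive assumes a synchronous reply.

```python
import uuid


def session_connected_event(directive):
    """Build the SessionConnected event that answers a SessionConnected directive."""
    d = directive["directive"]
    if d["header"]["name"] != "SessionConnected":
        raise ValueError(f"unexpected directive {d['header']['name']!r}")
    return {
        "event": {
            "header": {
                "namespace": "Alexa.RTCSessionController",
                "name": "SessionConnected",
                "messageId": str(uuid.uuid4()),
                "correlationToken": d["header"]["correlationToken"],
                "payloadVersion": "3",
            },
            "endpoint": {
                "scope": d["endpoint"]["scope"],
                "endpointId": d["endpoint"]["endpointId"],
            },
            # Echo the session ID from the original InitiateSessionWithOffer directive.
            "payload": {"sessionId": d["payload"]["sessionId"]},
        }
    }
```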
SessionConnected directive error handling
If you can't handle a SessionConnected directive successfully, respond with an Alexa.ErrorResponse event.
SessionDisconnected Directive
The SessionDisconnected directive notifies you that your RTC session is disconnected.
SessionDisconnected directive payload details
Field | Description | Type |
---|---|---|
sessionId | The identifier for the session from the original InitiateSessionWithOffer directive. | String (version 4 UUID) |
SessionDisconnected directive example
The following example illustrates a SessionDisconnected directive that Alexa sends to your skill.
{
"directive": {
"header": {
"namespace": "Alexa.RTCSessionController",
"name": "SessionDisconnected",
"messageId": "<message id>",
"correlationToken": "<an opaque correlation token>",
"payloadVersion": "3"
},
"endpoint": {
"scope": {
"type": "BearerToken",
"token": "<an OAuth2 bearer token>"
},
"endpointId": "<endpoint id>",
"cookie": {}
},
"payload": {
"sessionId" : "<the session identifier>"
}
}
}
SessionDisconnected response event
If you handle a SessionDisconnected directive successfully, respond with a SessionDisconnected event. You can respond synchronously or asynchronously. If you respond asynchronously, include a correlation token and a scope with an authorization token.
SessionDisconnected response event payload details
Field | Description | Type |
---|---|---|
sessionId | The identifier for the session from the original InitiateSessionWithOffer directive. | String (version 4 UUID) |
SessionDisconnected response event example
{
"event": {
"header": {
"namespace": "Alexa.RTCSessionController",
"name": "SessionDisconnected",
"messageId": "<message id>",
"correlationToken": "<an opaque correlation token>",
"payloadVersion": "3"
},
"endpoint": {
"scope": {
"type": "BearerToken",
"token": "<an OAuth2 bearer token>"
},
"endpointId": "<endpoint id>"
},
"payload": {
"sessionId" : "<the session identifier>"
}
}
}
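Because all three directives carry the same sessionId, a skill can track each session from offer to disconnect. The following Python sketch is a hypothetical in-memory tracker (`SessionRegistry` and `SessionState` are illustrative names, not part of the API). It maps each directive to the event name described in the sections above and drops the session on SessionDisconnected; a production skill running on stateless compute would need a shared store instead.

```python
from enum import Enum
from typing import Dict, Optional


class SessionState(Enum):
    OFFER_RECEIVED = "OFFER_RECEIVED"  # InitiateSessionWithOffer handled, answer sent
    CONNECTED = "CONNECTED"            # SessionConnected received


# Event name to send back for each directive, taken from the sections above.
RESPONSE_EVENT = {
    "InitiateSessionWithOffer": "AnswerGeneratedForSession",
    "SessionConnected": "SessionConnected",
    "SessionDisconnected": "SessionDisconnected",
}


class SessionRegistry:
    """Hypothetical in-memory tracker for RTC sessions, keyed by sessionId."""

    def __init__(self):
        self._sessions: Dict[str, SessionState] = {}

    def handle(self, directive):
        """Update the session state and return the name of the event to send back."""
        name = directive["directive"]["header"]["name"]
        session_id = directive["directive"]["payload"]["sessionId"]
        if name not in RESPONSE_EVENT:
            raise ValueError(f"unsupported directive {name!r}")
        if name == "InitiateSessionWithOffer":
            self._sessions[session_id] = SessionState.OFFER_RECEIVED
        elif name == "SessionConnected":
            self._sessions[session_id] = SessionState.CONNECTED
        else:
            # SessionDisconnected: release the camera stream and sockets for this
            # session here (hypothetical hook), then forget it.
            self._sessions.pop(session_id, None)
        return RESPONSE_EVENT[name]

    def state(self, session_id) -> Optional[SessionState]:
        return self._sessions.get(session_id)
```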
SessionDisconnected directive error handling
If you can't handle a SessionDisconnected directive successfully, respond with an Alexa.ErrorResponse event.
State reporting
Alexa sends a ReportState directive to request information about the state of an endpoint. When Alexa sends a ReportState directive, you send a StateReport event in response. The response contains the current state of all of the retrievable properties in the context object. You identify your retrievable properties in your discovery response. For details about state reports, see Understand State and Change Reporting.
The Alexa.RTCSessionController interface does not define any retrievable properties. However, if you also implement other interfaces in the skill for your camera device, you must participate in state reporting for the properties in those interfaces.
StateReport response event example
{
"event": {
"header": {
"namespace": "Alexa",
"name": "StateReport",
"messageId": "<message id>",
"correlationToken": "<an opaque correlation token>",
"payloadVersion": "3"
},
"endpoint": {
"endpointId": "<endpoint id>"
},
"payload": {}
},
"context": {
"properties": [
{
"namespace": "Alexa.EndpointHealth",
"name": "connectivity",
"value": {
"value": "OK"
},
"timeOfSample": "2017-02-03T16:20:50.52Z",
"uncertaintyInMilliseconds": 0
}
]
}
}
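The following Python sketch builds a StateReport like the one above for a camera whose only retrievable property is Alexa.EndpointHealth connectivity, as in the discovery example. The names `state_report` and `utc_timestamp` are hypothetical, and the accepted connectivity values are limited to the two that appear in this page's examples.

```python
import uuid
from datetime import datetime, timezone

# Connectivity values used in the examples on this page.
CONNECTIVITY_VALUES = {"OK", "UNREACHABLE"}


def utc_timestamp():
    """Current time in ISO 8601 UTC with milliseconds, for example 2017-02-03T16:20:50.520Z."""
    return datetime.now(timezone.utc).isoformat(timespec="milliseconds").replace("+00:00", "Z")


def state_report(correlation_token, endpoint_id, connectivity):
    """Build a StateReport whose context carries only Alexa.EndpointHealth connectivity.

    Alexa.RTCSessionController contributes no properties, so nothing from it appears here.
    """
    if connectivity not in CONNECTIVITY_VALUES:
        raise ValueError(f"unexpected connectivity value {connectivity!r}")
    return {
        "event": {
            "header": {
                "namespace": "Alexa",
                "name": "StateReport",
                "messageId": str(uuid.uuid4()),
                "correlationToken": correlation_token,
                "payloadVersion": "3",
            },
            "endpoint": {"endpointId": endpoint_id},
            "payload": {},
        },
        "context": {
            "properties": [{
                "namespace": "Alexa.EndpointHealth",
                "name": "connectivity",
                "value": {"value": connectivity},
                "timeOfSample": utc_timestamp(),
                "uncertaintyInMilliseconds": 0,
            }]
        },
    }
```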
Change reporting
You send a ChangeReport event to proactively report changes in the state of an endpoint. You identify the properties that you proactively report in your discovery response. For details about change reports, see Understand State and Change Reporting.
The Alexa.RTCSessionController interface does not define any proactively reported properties. However, if you also implement other interfaces in the skill for your camera device, you must participate in change reporting for the properties in those interfaces.
ChangeReport event example
{
"event": {
"header": {
"namespace": "Alexa",
"name": "ChangeReport",
"messageId": "<message id>",
"payloadVersion": "3"
},
"endpoint": {
"scope": {
"type": "BearerToken",
"token": "<an OAuth2 bearer token>"
},
"endpointId": "<endpoint id>"
},
"payload": {
"change": {
"cause": {
"type": "PERIODIC_POLL"
},
"properties": [
{
"namespace": "Alexa.EndpointHealth",
"name": "connectivity",
"value": {
"value": "UNREACHABLE"
},
"timeOfSample": "2017-02-03T16:20:50.52Z",
"uncertaintyInMilliseconds": 0
}
]
}
}
},
"context": {
"properties": [
]
}
}
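A ChangeReport is sent only when a value actually changes, and it has no correlation token because no directive triggered it. The following Python sketch illustrates that for EndpointHealth connectivity. The function name `connectivity_change_report` and its parameters are hypothetical; `token` stands for the authorization token your skill uses to send events to Alexa, and the default cause `PERIODIC_POLL` matches the example above.

```python
import uuid
from datetime import datetime, timezone


def connectivity_change_report(endpoint_id, token, previous, current, cause="PERIODIC_POLL"):
    """Return a ChangeReport if connectivity changed, otherwise None.

    A ChangeReport is proactive, so the header carries no correlation token.
    """
    if previous == current:
        return None  # nothing changed, nothing to report
    now = datetime.now(timezone.utc).isoformat(timespec="milliseconds").replace("+00:00", "Z")
    return {
        "event": {
            "header": {
                "namespace": "Alexa",
                "name": "ChangeReport",
                "messageId": str(uuid.uuid4()),
                "payloadVersion": "3",
            },
            "endpoint": {
                "scope": {"type": "BearerToken", "token": token},
                "endpointId": endpoint_id,
            },
            "payload": {
                "change": {
                    "cause": {"type": cause},
                    "properties": [{
                        "namespace": "Alexa.EndpointHealth",
                        "name": "connectivity",
                        "value": {"value": current},
                        "timeOfSample": now,
                        "uncertaintyInMilliseconds": 0,
                    }],
                }
            },
        },
        # Unchanged properties would go here; connectivity is the only one in this sketch.
        "context": {"properties": []},
    }
```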
Session Description Protocol Offer/Answer Format
The RTCSessionController interface uses the Session Description Protocol (SDP). For more information, see Session Description Protocol (SDP).
If your device does not support audio communication, include the `a=sendonly` attribute in your SDP answer to hide the microphone.
Offer/answer exchange example
v=0
o=- 3747690900 3747690900 IN IP4 0.0.0.0
s=a 2 z
c=IN IP4 0.0.0.0
t=0 0
a=group:BUNDLE audio0 video0
m=audio 1 RTP/SAVPF 96 0
a=candidate:1 1 UDP 2013266430 xxx.xxx.xxx.xxx 8620 typ host
a=candidate:2 1 TCP 1010827775 xxx.xxx.xxx.xxx 45351 typ host tcptype passive
a=candidate:3 2 UDP 2013266429 xxx.xxx.xxx.xxx 50066 typ host
a=candidate:4 2 TCP 1010827774 xxx.xxx.xxx.xxx 65157 typ host tcptype passive
a=candidate:5 2 TCP 1015022078 xxx.xxx.xxx.xxx 9 typ host tcptype active
a=candidate:6 1 TCP 1015022079 xxx.xxx.xxx.xxx 9 typ host tcptype active
a=setup:actpass
a=extmap:3 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
a=rtpmap:96 opus/48000/2
a=rtcp:9 IN IP4 0.0.0.0
a=rtcp-mux
a=sendrecv
a=mid:audio0
a=ssrc:118039096 cname:user2571875795@host-433aaf59
a=ice-ufrag:AGVf
a=ice-pwd:h3JAYGhIaQ/Nvyaz9dLoz9
a=fingerprint:sha-256 34:D4:54:17:0C:95:2A:79:FF:72:10:21:E9:6E:F3:77:86:2F:8D:6C:33:45:BA:14:1D:43:01:D7:CD:0A:1A:84
m=video 1 RTP/SAVPF 99
a=candidate:4 1 UDP 2013266430 xxx.xxx.xxx.xxx 8620 typ host
a=candidate:5 1 TCP 1015022079 xxx.xxx.xxx.xxx 9 typ host tcptype active
a=candidate:4 2 UDP 2013266429 xxx.xxx.xxx.xxx 50066 typ host
a=candidate:6 1 TCP 1010827775 xxx.xxx.xxx.xxx 45351 typ host tcptype passive
a=candidate:5 2 TCP 1015022078 xxx.xxx.xxx.xxx 9 typ host tcptype active
a=candidate:6 2 TCP 1010827774 xxx.xxx.xxx.xxx 65157 typ host tcptype passive
b=AS:500
a=setup:actpass
a=extmap:3 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
a=rtpmap:99 H264/90000
a=rtcp:9 IN IP4 0.0.0.0
a=rtcp-mux
a=sendrecv
a=mid:video0
a=rtcp-fb:99 nack
a=rtcp-fb:99 nack pli
a=rtcp-fb:99 ccm fir
a=ssrc:3643559644 cname:user2571875795@host-433aaf59
a=ice-ufrag:AGVf
a=ice-pwd:h3JAYGhIaQ/Nvyaz9dLoz9
a=fingerprint:sha-256 34:D4:54:17:0C:95:2A:79:FF:72:10:21:E9:6E:F3:77:86:2F:8D:6C:33:45:BA:14:1D:43:01:D7:CD:0A:1A:84
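In the example above, the audio section uses `a=sendrecv`. For a device that cannot play back the user's audio, the following Python sketch rewrites the direction attribute of each audio section to `a=sendonly`. The function name `make_audio_sendonly` is hypothetical, and this is a naive text rewrite; with a WebRTC stack you would more commonly set the audio transceiver's direction (for example, `RTCRtpTransceiver.direction` in the W3C WebRTC API) before creating the answer.

```python
DIRECTIONS = {"a=sendrecv", "a=sendonly", "a=recvonly", "a=inactive"}


def make_audio_sendonly(sdp):
    """Force every audio m= section of an SDP answer to a=sendonly.

    Existing direction attributes in audio sections are replaced; an audio section
    without one gets a=sendonly appended. Other sections are left unchanged.
    """
    out = []
    in_audio = False
    has_direction = False
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            if in_audio and not has_direction:
                out.append("a=sendonly")  # close out the previous audio section
            in_audio = line.startswith("m=audio")
            has_direction = False
        elif in_audio and line in DIRECTIONS:
            line = "a=sendonly"
            has_direction = True
        out.append(line)
    if in_audio and not has_direction:
        out.append("a=sendonly")
    # SDP lines are terminated by CRLF.
    return "\r\n".join(out) + "\r\n"
```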