Alexa.CameraStreamController Interface 3


Implement the Alexa.CameraStreamController interface in your Alexa skill so that users can view the feed from their security cameras. If your security camera device is capable of real-time communication (RTC), implement the Alexa.RTCSessionController interface instead. For more details about security skills, see Smart Home Security Overview.

For the list of languages that the CameraStreamController interface supports, see List of Alexa Interfaces and Supported Languages. For the definitions of the message properties, see Alexa Interface Message and Property Reference.

Utterances

When you use the Alexa.CameraStreamController interface, the Alexa service provides the voice interaction model for you. After the user says one of these utterances, Alexa sends a corresponding directive or report state request to your skill.

The following examples show some user utterances:

Alexa, show the front door camera.
Alexa, stop camera.
Alexa, hide camera.

Alexa, montre la caméra de la porte d'entrée.
Alexa, arrête la caméra.
Alexa, cache la caméra.

Alexa, zeige die Haustür-Kamera.
Alexa, stoppe Kamera.
Alexa, blende Kamera aus.

Alexa, सामने के दरवाज़े का कैमरा दिखाओ
Alexa, कैमरा बंद करो
Alexa, कैमरा छिपाओ

Alexa, mostra la videocamera della porta d'ingresso.
Alexa, chiudi la videocamera.
Alexa, nascondi la videocamera.

アレクサ、玄関のカメラを見せて
アレクサ、カメラを止めて
アレクサ、カメラを隠して

Alexa, mostra a câmera da porta da frente.
Alexa, oculte a câmera.
Alexa, esconda a câmera.

Alexa, muestra la cámara de la puerta principal.
Alexa, detén la cámara.
Alexa, oculta la cámara.
Alexa, esconde la cámara.

Real time streaming protocol usage

You can build an Alexa skill for cloud-enabled cameras that stream video and audio by using the Real Time Streaming Protocol (RTSP).

Prerequisites and SLA requirements

Low latency is critical to an optimal user experience. To use the CameraStreamController API, you need the following:

  • RTSP + RTP streaming protocol.

  • Interleaved TCP on port 443 (for both RTP and RTSP).

  • TCP socket encryption on port 443 using TLS 1.2.

  • Your RTSP implementation must support the DESCRIBE, SETUP, PLAY, and TEARDOWN commands. Amazon recommends a full RFC-compliant implementation.

  • All RTSP URI responses must occur within six seconds after you receive the request.

  • Your Lambda skill must respond to requests within six seconds. For the best user experience, respond within one second. Perform operations, such as waking a camera to begin streaming, asynchronously as background tasks.

  • Network conditions should dictate the frame rate. If network conditions are good, stream at a higher frame rate. If network conditions are poor, stream at a lower frame rate. Lower frame rates might introduce motion blur, but still allow streaming to occur.

  • Under good network conditions, the first frame should render on a device within six seconds from when the TLS handshake completes. Optimize startup latency by adjusting key frame rates and buffer times of the stream.
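
As a rough illustration of the port requirement above, the following Python sketch checks that a stream URI uses the rtsp scheme and names port 443 explicitly before your skill returns it. The function name uses_required_rtsp_port is hypothetical, and the sketch covers only the scheme and port. It doesn't verify TLS 1.2, latency, or RTSP command support.

```python
from urllib.parse import urlsplit

REQUIRED_PORT = 443  # the prerequisites call for interleaved TCP on port 443


def uses_required_rtsp_port(uri: str) -> bool:
    """Return True if the URI has the rtsp scheme and an explicit port 443."""
    parts = urlsplit(uri)
    try:
        port = parts.port  # raises ValueError when the port is malformed
    except ValueError:
        return False
    return parts.scheme == "rtsp" and port == REQUIRED_PORT
```

A check like this is cheap enough to run inside the request handler without affecting the response deadline.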

Local and remote execution recommendations

You can return a local URI on the same network as your device, or you can return a remote URI accessible from anywhere with an Internet connection. You should return a URI that makes the most sense for your device cloud configuration. Whether you return a local or remote URI, you must meet all requirements including the use of TLS 1.2.

In general, a URI isn't reachable both locally and remotely by default. You can make the URI accessible both ways through domain purchasing or port forwarding. These solutions are technically challenging, so offer them only if your customers need both local and remote URI access.

Supported streaming protocols

You can use the following supported protocol values: RTSP, HLS.

Supported authorization types

You can use the following supported authorization type values: BASIC, DIGEST, NONE.

Supported video codecs

You can use the following supported video codec values: H264, MPEG2, MJPEG, JPG.

Supported audio codecs

You can use the following supported audio codec values: G711, AAC, NONE.

Supported resolutions

The supported resolutions range from 480p to 1080p.
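
The supported values listed in the preceding sections can be collected into a small validation helper. The following Python sketch is one possible way to do that. The function name find_unsupported_fields is hypothetical, and reading the resolution range as a frame height between 480 and 1080 pixels is an assumption of this sketch.

```python
SUPPORTED_PROTOCOLS = {"RTSP", "HLS"}
SUPPORTED_AUTHORIZATION_TYPES = {"BASIC", "DIGEST", "NONE"}
SUPPORTED_VIDEO_CODECS = {"H264", "MPEG2", "MJPEG", "JPG"}
SUPPORTED_AUDIO_CODECS = {"G711", "AAC", "NONE"}
# Assumption: the resolution range is read as a frame height of 480 to 1080 pixels.
MIN_HEIGHT, MAX_HEIGHT = 480, 1080


def find_unsupported_fields(stream: dict) -> list:
    """Return the names of stream fields whose values are not supported."""
    allowed = {
        "protocol": SUPPORTED_PROTOCOLS,
        "authorizationType": SUPPORTED_AUTHORIZATION_TYPES,
        "videoCodec": SUPPORTED_VIDEO_CODECS,
        "audioCodec": SUPPORTED_AUDIO_CODECS,
    }
    problems = [name for name, values in allowed.items()
                if stream.get(name) not in values]
    height = stream.get("resolution", {}).get("height")
    if not isinstance(height, int) or not MIN_HEIGHT <= height <= MAX_HEIGHT:
        problems.append("resolution")
    return problems
```

An empty list means that every checked field holds a supported value.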

Properties and objects

CameraStream object

The Alexa.CameraStreamController interface uses the CameraStream object to represent a camera stream.

The following example shows a camera stream.

{
  "uri": "rtsp://username:password@link.to.video:443/feed1.mp4",
  "expirationTime": "2017-02-03T16:20:50.52Z",
  "idleTimeoutSeconds": 30,
  "protocol": "RTSP",
  "resolution": {
    "width": 1920,
    "height": 1080
  },
  "authorizationType": "BASIC",
  "videoCodec": "H264",
  "audioCodec": "AAC"
}

The CameraStream object includes the following properties.

Property Description Type
uri URI for the camera stream. For a temporary URI, specify an expiration time in the expirationTime field. If the URI expires, and an error occurs, Alexa calls InitializeCameraStreams again to get a new, unexpired URI. String
expirationTime Time that the stream expires. Defined in ISO 8601 format, YYYY-MM-DDThh:mm:ssZ. String
idleTimeoutSeconds Number of seconds of inactivity after which the camera stream times out. Integer
protocol Protocol for the stream; one of the supported protocol values. String
resolution Resolution of the stream. Resolution object
authorizationType Authorization type for accessing the stream; one of the supported authorization type values. String
videoCodec Video codec for the stream; one of the supported video codec values. String
audioCodec Audio codec for the stream; one of the supported audio codec values. String
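
The following Python sketch shows one way to assemble a CameraStream object with a temporary URI and an expirationTime in the ISO 8601 format described in the table. The function name build_camera_stream and the five-minute default lifetime are assumptions. The remaining values are copied from the example earlier in this section.

```python
from datetime import datetime, timedelta, timezone


def build_camera_stream(uri: str, now: datetime, valid_for_seconds: int = 300) -> dict:
    """Build a CameraStream whose temporary URI expires after valid_for_seconds."""
    # `now` must be timezone-aware so that the conversion to UTC is unambiguous.
    expires = now.astimezone(timezone.utc) + timedelta(seconds=valid_for_seconds)
    return {
        "uri": uri,
        "expirationTime": expires.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "idleTimeoutSeconds": 30,
        "protocol": "RTSP",
        "resolution": {"width": 1920, "height": 1080},
        "authorizationType": "BASIC",
        "videoCodec": "H264",
        "audioCodec": "AAC",
    }
```

Passing the current time as a parameter, for example datetime.now(timezone.utc), keeps the function deterministic and easy to test.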

CameraStreamConfiguration object

The CameraStreamConfiguration object represents the configurations that your camera supports. You identify the configurations that your camera supports in your discovery response.

The following example shows a CameraStreamConfiguration object.

{
  "protocols": ["RTSP"],
  "resolutions": [{"width":1920, "height":1080}, {"width":1280, "height":720}],
  "authorizationTypes": ["BASIC"],
  "videoCodecs": ["H264", "MPEG2"],
  "audioCodecs": ["G711"]
}

The CameraStreamConfiguration object includes the following properties.

Property Description Type
protocols Protocols that you support. An array of supported protocol values. Array
resolutions Resolutions that you support. An array of resolution objects. Array
authorizationTypes Authorization types that you support. An array of supported authorization type values. Array
videoCodecs Video codecs that you support. An array of supported video codec values. Array
audioCodecs Audio codecs that you support. An array of supported audio codec values. Array

Resolution object

The Resolution object represents the height and width of a camera stream.

The following example shows a Resolution object.

{
  "height": 720,
  "width": 1280
}

Discovery

You describe endpoints that support Alexa.CameraStreamController using the standard discovery mechanism described in Alexa.Discovery.

Use CAMERA for the display category. For the full list of display categories, see display categories.

In addition to the usual discovery response fields, for the CameraStreamController entry in the capabilities array, include the following fields.

Field Description Type
cameraStreamConfigurations The configurations that your camera supports. An array of CameraStreamConfiguration objects. Array
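
As a minimal sketch, the capability entry for the capabilities array could be built in Python as follows. The function name build_camera_capability is hypothetical, and treating all five configuration properties as mandatory is an assumption of this sketch, not a rule stated on this page.

```python
REQUIRED_KEYS = {"protocols", "resolutions", "authorizationTypes",
                 "videoCodecs", "audioCodecs"}


def build_camera_capability(configurations: list) -> dict:
    """Build the Alexa.CameraStreamController entry for a discovery response."""
    for index, configuration in enumerate(configurations):
        # Assumption: every configuration carries all five properties.
        missing = REQUIRED_KEYS - set(configuration)
        if missing:
            raise ValueError(f"configuration {index} is missing {sorted(missing)}")
    return {
        "type": "AlexaInterface",
        "interface": "Alexa.CameraStreamController",
        "version": "3",
        "cameraStreamConfigurations": configurations,
    }
```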

Discover response example

The following example shows a Discover.Response message for a security camera that supports the Alexa.CameraStreamController and Alexa.EndpointHealth interfaces.

{
  "event": {
    "header": {
      "namespace":"Alexa.Discovery",
      "name":"Discover.Response",
      "payloadVersion": "3",
      "messageId": "Unique identifier, preferably a version 4 UUID"
    },
    "payload":{
      "endpoints":[
        {
          "endpointId": "Unique ID of the endpoint",
          "manufacturerName": "Manufacturer of the endpoint",
          "description": "Description to be shown in the Alexa app",
          "friendlyName": "Front door camera",
          "displayCategories": ["CAMERA"],
          "cookie": {},
          "capabilities": [
            {
              "type": "AlexaInterface",
              "interface": "Alexa.CameraStreamController",
              "version": "3",
              "cameraStreamConfigurations" : [
                  {
                    "protocols": ["RTSP"],
                    "resolutions": [{"width":1920, "height":1080}, {"width":1280, "height":720}],
                    "authorizationTypes": ["BASIC"],
                    "videoCodecs": ["H264", "MPEG2"],
                    "audioCodecs": ["G711"]
                  },
                  {
                    "protocols": ["RTSP"],
                    "resolutions": [{"width":1920, "height":1080}, {"width":1280, "height":720}],
                    "authorizationTypes": ["NONE"],
                    "videoCodecs": ["H264"],
                    "audioCodecs": ["AAC"]
                  }
              ]
            },
            {
              "type": "AlexaInterface",
              "interface": "Alexa.EndpointHealth",
              "version": "3",
              "properties": {
                "supported": [
                  {
                    "name":"connectivity"
                  }
                ],
                "proactivelyReported": true,
                "retrievable": true
              }
            },
            {
              "type": "AlexaInterface",
              "interface": "Alexa",
              "version": "3"
            }
          ]
        }
      ]
    }
  }
}

Directives

Alexa sends the following Alexa.CameraStreamController interface directives to your skill.

InitializeCameraStreams directive

Support the InitializeCameraStreams directive so that users can ask to see the feed from their security camera.

The following examples show user utterances:

Alexa, show the front door camera.

Alexa, montre la caméra de la porte d'entrée.

Alexa, zeige die Haustür-Kamera.

Alexa, सामने के दरवाज़े का कैमरा दिखाओ

Alexa, mostra la videocamera della porta d'ingresso.

アレクサ、玄関のカメラを見せて

Alexa, mostra a câmera da porta da frente.

Alexa, muestra la cámara de la puerta principal.

InitializeCameraStreams directive example

The following example shows an InitializeCameraStreams directive that Alexa sends to your skill.

{
  "directive": {
    "header": {
      "namespace": "Alexa.CameraStreamController",
      "name": "InitializeCameraStreams",
      "messageId": "Unique version 4 UUID",
      "correlationToken": "Opaque correlation token",
      "payloadVersion": "3"
    },
    "endpoint": {
      "scope": {
        "type": "BearerToken",
        "token": "OAuth2.0 bearer token"
      },
      "endpointId": "Endpoint ID",
      "cookie": {}
    },
    "payload": {
      "cameraStreams": [
        {
          "protocol": "RTSP",
          "resolution": {"width": 1920, "height": 1080},
          "authorizationType": "BASIC",
          "videoCodec": "H264",
          "audioCodec": "AAC"
        },
        {
          "protocol": "RTSP",
          "resolution": {"width": 1280, "height": 720},
          "authorizationType": "NONE",
          "videoCodec": "MPEG2",
          "audioCodec": "G711"
        }
      ]
    }
  }
}

InitializeCameraStreams directive payload

Field Description Type
cameraStreams An array of CameraStream objects that provide information about the stream. Array

InitializeCameraStreams response event

If you handle an InitializeCameraStreams directive successfully, respond with an Alexa.CameraStreamController.Response event. In the context object, include the values of all relevant properties.

The following example shows an Alexa.CameraStreamController response.

{
  "event": {
    "header": {
      "namespace": "Alexa.CameraStreamController",
      "name": "Response",
      "messageId": "Unique identifier, preferably a version 4 UUID",
      "correlationToken": "Opaque correlation token that matches the request",
      "payloadVersion": "3"
    },
    "endpoint": {
      "scope": {
        "type": "BearerToken",
        "token": "OAuth2.0 bearer token"
      },
      "endpointId": "Endpoint ID"
    },
    "payload": {
      "cameraStreams": [
        {
          "uri": "rtsp://username:password@link.to.video:443/feed1.mp4",
          "expirationTime": "2017-02-03T16:20:50.52Z",
          "idleTimeoutSeconds": 30,
          "protocol": "RTSP",
          "resolution": {"width": 1920, "height": 1080},
          "authorizationType": "BASIC",
          "videoCodec": "H264",
          "audioCodec": "AAC"
        }
      ],
      "imageUri": "https://example.com/image.jpg"
    }
  },
  "context": {
    "properties": [
      {
        "namespace": "Alexa.EndpointHealth",
        "name": "connectivity",
        "value": {
          "value": "OK"
        },
        "timeOfSample": "2017-02-03T16:20:50.52Z",
        "uncertaintyInMilliseconds": 0
      }
    ]
  }
}

InitializeCameraStreams response payload details

Field Description Type Required
cameraStreams An array of CameraStream objects that provide information about the stream. Array Yes
imageUri The URI to a static image from a previous feed of the camera specified in the request. String Yes
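
The following Python sketch shows one way a skill might assemble the response event from an incoming directive. The function name build_initialize_response and its parameters are hypothetical. The sketch assumes that you already selected the streams and collected the context properties elsewhere in your skill.

```python
import uuid


def build_initialize_response(directive: dict, streams: list, image_uri: str,
                              context_properties: list) -> dict:
    """Build an Alexa.CameraStreamController Response event for a directive."""
    header = directive["directive"]["header"]
    endpoint = directive["directive"]["endpoint"]
    return {
        "event": {
            "header": {
                "namespace": "Alexa.CameraStreamController",
                "name": "Response",
                "messageId": str(uuid.uuid4()),  # a new version 4 UUID per message
                # Echo the token so that the response matches the request.
                "correlationToken": header["correlationToken"],
                "payloadVersion": "3",
            },
            "endpoint": {
                "scope": endpoint["scope"],
                "endpointId": endpoint["endpointId"],
            },
            "payload": {"cameraStreams": streams, "imageUri": image_uri},
        },
        "context": {"properties": context_properties},
    }
```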

InitializeCameraStreams directive error handling

If you can't handle an InitializeCameraStreams directive successfully, respond with an Alexa.ErrorResponse event. If the customer needs to configure the camera, return the NOT_SUPPORTED_IN_CURRENT_MODE error type and include the currentDeviceMode field with a value of NOT_PROVISIONED.
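
As a hedged sketch, the error for an unconfigured camera might be built in Python as follows. The function name build_not_provisioned_error is hypothetical. The envelope layout, with the Alexa namespace and the ErrorResponse name, and the message field are assumptions modeled on the other events on this page, so check them against the Alexa.ErrorResponse reference before you rely on them.

```python
import uuid


def build_not_provisioned_error(directive: dict) -> dict:
    """Build an error event for a camera that the customer must configure first."""
    header = directive["directive"]["header"]
    endpoint = directive["directive"]["endpoint"]
    return {
        "event": {
            "header": {
                "namespace": "Alexa",       # assumed envelope, see lead-in
                "name": "ErrorResponse",
                "messageId": str(uuid.uuid4()),
                "correlationToken": header["correlationToken"],
                "payloadVersion": "3",
            },
            "endpoint": {"endpointId": endpoint["endpointId"]},
            "payload": {
                "type": "NOT_SUPPORTED_IN_CURRENT_MODE",
                # Assumed free-text field for logging.
                "message": "The camera must be set up before it can stream.",
                "currentDeviceMode": "NOT_PROVISIONED",
            },
        }
    }
```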

State reporting

Alexa sends a ReportState directive to request information about the state of an endpoint. When Alexa sends a ReportState directive, you send a StateReport event in response. The response contains the current state of all the retrievable properties in the context object. You identify your retrievable properties in your discovery response. For details about state reports, see Understand State and Change Reporting.

StateReport response example

{
  "event": {
    "header": {
      "namespace": "Alexa",
      "name": "StateReport",
      "messageId": "Unique identifier, preferably a version 4 UUID",
      "correlationToken": "Opaque correlation token that matches the request",
      "payloadVersion": "3"
    },
    "endpoint": {
      "scope": {
        "type": "BearerToken",
        "token": "OAuth2.0 bearer token"
      },
      "endpointId": "Endpoint ID"
    },
    "payload": {}
  },
  "context": {
    "properties": [
      {
        "namespace": "Alexa.EndpointHealth",
        "name": "connectivity",
        "value": {
          "value": "OK"
        },
        "timeOfSample": "2017-02-03T16:20:50.52Z",
        "uncertaintyInMilliseconds": 0
      }
    ]
  }
}
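
The following Python sketch shows one way to produce a StateReport like the preceding example. The function names connectivity_property and build_state_report are hypothetical, and the sketch assumes that your skill already knows the current connectivity value and the time when it was sampled.

```python
import uuid
from datetime import datetime, timezone


def connectivity_property(value: str, sampled_at: datetime) -> dict:
    """Describe the Alexa.EndpointHealth connectivity state at a point in time."""
    # `sampled_at` must be timezone-aware so that it converts cleanly to UTC.
    time_of_sample = sampled_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "namespace": "Alexa.EndpointHealth",
        "name": "connectivity",
        "value": {"value": value},
        "timeOfSample": time_of_sample,
        "uncertaintyInMilliseconds": 0,
    }


def build_state_report(directive: dict, properties: list) -> dict:
    """Answer a ReportState directive with a StateReport event."""
    header = directive["directive"]["header"]
    endpoint = directive["directive"]["endpoint"]
    return {
        "event": {
            "header": {
                "namespace": "Alexa",
                "name": "StateReport",
                "messageId": str(uuid.uuid4()),
                "correlationToken": header["correlationToken"],
                "payloadVersion": "3",
            },
            "endpoint": {
                "scope": endpoint["scope"],
                "endpointId": endpoint["endpointId"],
            },
            "payload": {},
        },
        # The retrievable properties go in the context object.
        "context": {"properties": properties},
    }
```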

Change reporting

You send a ChangeReport event to report changes proactively in the state of an endpoint. You identify the properties that you proactively report in your discovery response. For details about change reports, see Understand State and Change Reporting.

ChangeReport event example

{
  "event": {
    "header": {
      "namespace": "Alexa",
      "name": "ChangeReport",
      "messageId": "Unique identifier, preferably a version 4 UUID",
      "payloadVersion": "3"
    },
    "endpoint": {
      "scope": {
        "type": "BearerToken",
        "token": "OAuth2.0 bearer token"
      },
      "endpointId": "Endpoint ID"
    },
    "payload": {
      "change": {
        "cause": {
          "type": "PERIODIC_POLL"
        },
        "properties": [
          {
            "namespace": "Alexa.EndpointHealth",
            "name": "connectivity",
            "value": {
              "value": "UNREACHABLE"
            },
            "timeOfSample": "2017-02-03T16:20:50.52Z",
            "uncertaintyInMilliseconds": 0
          }
        ]
      }
    }
  },
  "context": {
    "properties": [
    ]
  }
}
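
As a minimal sketch, a ChangeReport like the preceding example could be assembled in Python as follows. The function name build_change_report and its parameters are hypothetical. Placing unchanged properties in the context object is an assumption of this sketch, and the example above leaves that array empty.

```python
import uuid


def build_change_report(endpoint_id: str, token: str, changed: list,
                        unchanged: tuple = (), cause: str = "PERIODIC_POLL") -> dict:
    """Build a proactive ChangeReport event for one endpoint."""
    return {
        "event": {
            "header": {
                "namespace": "Alexa",
                "name": "ChangeReport",
                "messageId": str(uuid.uuid4()),
                "payloadVersion": "3",
            },
            "endpoint": {
                "scope": {"type": "BearerToken", "token": token},
                "endpointId": endpoint_id,
            },
            "payload": {
                "change": {
                    "cause": {"type": cause},
                    "properties": list(changed),
                }
            },
        },
        # Assumption: properties that did not change are reported here.
        "context": {"properties": list(unchanged)},
    }
```

Because the skill sends a ChangeReport proactively, the header carries no correlationToken, as in the preceding example.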

Last updated: Apr 21, 2023