Build Smart Home Camera Skills

A smart home skill that works with cloud-enabled cameras lets a customer say, “Alexa, show me the front door camera,” and see a video feed from that camera on an Alexa-enabled device that supports video streaming.

This document provides a list of Alexa devices that can display a camera feed, technical requirements for a camera feed, and details on camera-specific directives and error messages for a smart home skill targeting cameras.

Overview

To create a smart home skill, you provide configuration information in the Amazon Developer Portal and code that is hosted as an AWS Lambda function (an Amazon Web Services offering). The skill responds to directives from Alexa, communicates with connected devices such as cameras, and sends response events back to Alexa. To create smart home skills for connected cameras, you should be familiar with:

If you haven’t created a smart home skill in the past, you should review additional documentation as a prerequisite to this document. You will need to be familiar with how to create a smart home skill and write the Lambda function code. For more information see:

To add language support for an existing smart home skill or create a smart home skill that supports multiple languages, there are a few steps you need to take. For more information, see:

Device support by region

In general, your skill for cloud-enabled cameras provides a URI for streaming media. The customer’s Alexa-enabled device will stream the content found at that URI.

The types of devices that can display camera feeds vary by geographic region:

Region | Alexa Device Support
United States | Echo Show, all generations of Fire TV, 2nd generation Fire TV Stick, Fire Tablets (7th generation and later)
United Kingdom and Germany | All generations of Fire TV, 2nd generation Fire TV Stick, Fire Tablets (7th generation and later)

Technical and performance requirements

Your cameras must provide a video feed in the correct format and meet the security and performance requirements.

Technical and security requirements for cameras

The technical and security requirements for cameras and video feeds are as follows.

Category | Requirement
Streaming protocol(s) | RTSP + RTP
Transport protocols | Interleaved TCP on port 443 (for both RTP and RTSP)
Feed encryption | TCP socket encryption on port 443 using TLS 1.2
RTSP command support required | DESCRIBE, SETUP, PLAY, and TEARDOWN commands are required, although a full RFC-compliant implementation is recommended
Video format | H.264
Audio format | AAC/G711
Feed authentication methods | HTTP Digest authentication within the returned camera stream
RTSP URI responsiveness | All responses must occur within 6 seconds of receiving the request
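
As a rough illustration of the transport requirements above, the following Python sketch checks a stream URI before the skill returns it. The function name `check_stream_uri`, the example host, and the choice to require an explicit port in the URI are assumptions made for this example; the check covers only the URI itself and cannot confirm TLS 1.2 or codec support.

```python
from typing import List
from urllib.parse import urlparse

REQUIRED_SCHEME = "rtsp"  # streaming protocol from the requirements table
REQUIRED_PORT = 443       # interleaved TCP on port 443 for both RTP and RTSP


def check_stream_uri(uri: str) -> List[str]:
    """Return a list of problems found in a stream URI (empty if none).

    Hypothetical helper: it assumes the port is written explicitly in the URI.
    """
    problems = []
    parsed = urlparse(uri)
    if parsed.scheme.lower() != REQUIRED_SCHEME:
        problems.append(f"scheme is {parsed.scheme!r}, expected {REQUIRED_SCHEME!r}")
    if not parsed.hostname:
        problems.append("URI has no host")
    if parsed.port != REQUIRED_PORT:
        problems.append(f"port is {parsed.port}, expected {REQUIRED_PORT}")
    return problems
```

An empty list means the URI passed these basic checks; any other result lists what to correct before the URI is sent to Alexa.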

Performance requirements for a camera skill

Low latency is critical to an optimal user experience. Two factors have the most impact on latency: how quickly your skill responds when Alexa requests a camera stream, and how quickly the camera responds and the stream renders.

Category | Requirement | Recommendations
Lambda skill responsiveness | The response must occur within 6 seconds of receiving the request. For the best user experience, respond in less than 1 second. | Operations like waking a camera to begin streaming should be done asynchronously as background tasks.
URI stream responsiveness | Under good network conditions, the first frame should be rendered on the Alexa-enabled device within 6 seconds of the TLS handshake completing. | Optimize startup latency by adjusting key frame rates and buffer times of the stream.

Local and remote execution recommendations

There are no requirements regarding whether you should return a URI that is on the same local network as the device with Alexa or a remote URI accessible from anywhere with an Internet connection. You should return what makes the most sense for your device cloud configuration. Regardless of your URI choice, all technical requirements must still be met including the use of TLS 1.2.

In general, a URI is not reachable both locally and remotely by default. You can make the URI accessible locally and remotely through domain purchasing or port forwarding. These solutions are technically challenging, so provide them only if your customers need both local and remote URI access.

Camera Skill Implementation

Your skill must handle directives from Alexa and the skill must meet the security requirements. Your skill code:

  • Handles camera-related directives such as discovery and camera stream URI requests from Alexa as defined in the CameraStreamController interface
  • Communicates with the device cloud (cameras in this scenario), using the token provided
  • Returns a response or error event to Alexa
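
The three responsibilities above can be sketched as a single Lambda entry point in Python. This is a minimal sketch, not a complete skill: it assumes the version 3 message envelope (a `directive` with `header`, `endpoint`, and `payload`), it handles only the `InitializeCameraStreams` directive and leaves discovery out, and `lookup_stream_uri` is a hypothetical stand-in for a call to your device cloud.

```python
import uuid


def _header(namespace, name, correlation_token=None):
    """Build an event header; the field names assume the v3 envelope."""
    header = {
        "namespace": namespace,
        "name": name,
        "payloadVersion": "3",
        "messageId": str(uuid.uuid4()),
    }
    if correlation_token:
        header["correlationToken"] = correlation_token
    return header


def lookup_stream_uri(endpoint_id, token):
    """Hypothetical device-cloud call; replace with a real lookup."""
    return f"rtsp://camera.example.com:443/{endpoint_id}"


def lambda_handler(request, context=None):
    directive = request["directive"]
    header = directive["header"]
    namespace, name = header["namespace"], header["name"]
    correlation = header.get("correlationToken")

    if namespace == "Alexa.CameraStreamController" and name == "InitializeCameraStreams":
        endpoint = directive["endpoint"]
        token = endpoint["scope"]["token"]  # token used to reach the device cloud
        streams = []
        for requested in directive["payload"]["cameraStreams"]:
            stream = dict(requested)  # echo the requested stream properties
            stream["uri"] = lookup_stream_uri(endpoint["endpointId"], token)
            streams.append(stream)
        return {"event": {
            "header": _header(namespace, "Response", correlation),
            "endpoint": {"endpointId": endpoint["endpointId"]},
            "payload": {"cameraStreams": streams},
        }}

    # Anything this sketch does not handle is reported as an error event.
    return {"event": {
        "header": _header("Alexa", "ErrorResponse", correlation),
        "payload": {
            "type": "INTERNAL_ERROR",
            "message": f"Unhandled directive {namespace}.{name}",
        },
    }}
```

Keeping the device-cloud call behind one function makes it easier to keep the handler itself fast, which matters for the responsiveness requirements described earlier.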

Customer authorization

The Smart Home Skill API follows the OAuth 2.0 specification. Every request sent from the Smart Home Skill API to a smart home skill contains an OAuth access token that enables access to the customer’s device cloud. The device cloud must support the authorization code grant flow type. For more information about skill authentication and user account linking, see Authenticate an Alexa User to a User in Your System.
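
A skill typically reads the access token from the incoming request before calling the device cloud. The Python sketch below shows one way to do that. The token locations it checks (`endpoint.scope.token` for most directives and `payload.scope.token` for discovery) are an assumption based on the version 3 message envelope, and `extract_access_token` is a hypothetical helper name.

```python
def extract_access_token(request):
    """Return the OAuth access token from a directive, or None if absent.

    Assumed locations: directive.endpoint.scope.token for most directives,
    directive.payload.scope.token for discovery requests.
    """
    directive = request.get("directive", {})
    scope = (
        directive.get("endpoint", {}).get("scope")
        or directive.get("payload", {}).get("scope")
        or {}
    )
    return scope.get("token") or None
```

When this returns None, the skill can respond with an INVALID_AUTHORIZATION_CREDENTIAL error rather than attempting to call the device cloud.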

Message structure

The structure of messages for a smart home skill targeting cameras is mostly the same as for other devices. However, when you receive a discovery request or a request to initialize a camera feed, you must provide details about each camera endpoint by describing it with a cameraStream object.

The following table lists the properties of a cameraStream.

Property | Description | Type | Required?
cameraStream.protocol | Protocol for the stream, such as RTSP. | string | Yes
cameraStream.resolution | A resolution object that describes the resolution of the stream. Contains width and height properties. | object | Yes
cameraStream.resolution.width | The width of the video stream. | integer | Yes
cameraStream.resolution.height | The height of the video stream. | integer | Yes
cameraStream.authorizationType | The authorization type. Possible values are “BASIC”, “DIGEST”, or “NONE”. | string | Yes
cameraStream.videoCodec | The video codec for the stream. Possible values are “H264”, “MPEG2”, “MJPEG”, or “JPG”. | string | Yes
cameraStream.audioCodec | The audio codec for the stream. Possible values are “G711”, “AAC”, or “NONE”. | string | Yes

For more details about how this information is included in a discovery response event, see CameraStreamController.
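
To make the property table concrete, here is a small Python sketch that checks a cameraStream object against it. The allowed values are taken from the table. The helper name `validate_camera_stream` and the rule that width and height must be positive are assumptions added for this example.

```python
# Allowed values copied from the cameraStream property table.
ALLOWED_VALUES = {
    "authorizationType": {"BASIC", "DIGEST", "NONE"},
    "videoCodec": {"H264", "MPEG2", "MJPEG", "JPG"},
    "audioCodec": {"G711", "AAC", "NONE"},
}


def validate_camera_stream(stream):
    """Return a list of problems with a cameraStream dict (empty if valid)."""
    errors = []
    protocol = stream.get("protocol")
    if not isinstance(protocol, str) or not protocol:
        errors.append("protocol must be a non-empty string")

    resolution = stream.get("resolution")
    if not isinstance(resolution, dict):
        errors.append("resolution must be an object with width and height")
    else:
        for key in ("width", "height"):
            value = resolution.get(key)
            # bool is a subclass of int in Python, so reject it explicitly.
            # Requiring a positive value is an assumption of this sketch.
            if not isinstance(value, int) or isinstance(value, bool) or value <= 0:
                errors.append(f"resolution.{key} must be a positive integer")

    for field, allowed in ALLOWED_VALUES.items():
        if stream.get(field) not in allowed:
            errors.append(f"{field} must be one of {sorted(allowed)}")
    return errors


example_stream = {
    "protocol": "RTSP",
    "resolution": {"width": 1920, "height": 1080},
    "authorizationType": "DIGEST",
    "videoCodec": "H264",
    "audioCodec": "AAC",
}
```

Calling `validate_camera_stream(example_stream)` returns an empty list because every required property is present with an allowed value.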

Error messages


When your skill can’t respond to a directive because of authentication, customer error, hardware or other issues, it should respond with the correct error message. You send an ErrorResponse that indicates the correct error type. The following table lists scenarios and the error types you should send.

Scenario | Correct Error Message Type
The target camera endpoint has been discovered, but not configured by the customer. | NOT_SUPPORTED_IN_CURRENT_MODE with a `currentDeviceMode` value of NOT_PROVISIONED
The target camera's battery level is too low for streaming content. | ENDPOINT_LOW_POWER
The token is not valid, revoked, or missing from the request. | INVALID_AUTHORIZATION_CREDENTIAL
The token is expired. | INVALID_AUTHORIZATION_CREDENTIAL
The skill failed to connect to the target camera because the endpoint is offline or for other reasons. | ENDPOINT_UNREACHABLE
The target camera endpoint is temporarily unavailable. | NOT_SUPPORTED_IN_CURRENT_MODE
The target camera endpoint cannot be found. | NO_SUCH_ENDPOINT
The skill failed due to a runtime error. If possible, you should return a more specific error. | INTERNAL_ERROR
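
The error types in the table can be wrapped in a small builder, sketched below in Python. The error type names come from the table. The envelope layout (an `Alexa` namespace header with an `ErrorResponse` name, and `type` and `message` fields in the payload) is an assumption based on the version 3 message format, and `build_error_response` is a hypothetical helper name.

```python
import uuid

# Error types listed in the table above.
KNOWN_ERROR_TYPES = {
    "NOT_SUPPORTED_IN_CURRENT_MODE",
    "ENDPOINT_LOW_POWER",
    "INVALID_AUTHORIZATION_CREDENTIAL",
    "ENDPOINT_UNREACHABLE",
    "NO_SUCH_ENDPOINT",
    "INTERNAL_ERROR",
}


def build_error_response(error_type, message, endpoint_id=None,
                         correlation_token=None, current_device_mode=None):
    """Build an ErrorResponse event; the envelope layout is an assumption."""
    if error_type not in KNOWN_ERROR_TYPES:
        raise ValueError(f"Unknown error type: {error_type}")
    header = {
        "namespace": "Alexa",
        "name": "ErrorResponse",
        "payloadVersion": "3",
        "messageId": str(uuid.uuid4()),
    }
    if correlation_token:
        header["correlationToken"] = correlation_token
    payload = {"type": error_type, "message": message}
    if current_device_mode is not None:
        # For example NOT_PROVISIONED when the camera is not yet configured.
        payload["currentDeviceMode"] = current_device_mode
    event = {"header": header, "payload": payload}
    if endpoint_id is not None:
        event["endpoint"] = {"endpointId": endpoint_id}
    return {"event": event}
```

For a camera that was discovered but never configured, a call such as `build_error_response("NOT_SUPPORTED_IN_CURRENT_MODE", "Camera is not set up", endpoint_id="cam-1", current_device_mode="NOT_PROVISIONED")` produces the first row of the table.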

Additional resources