Build Smart Home Camera Skills
A smart home skill that works with cloud-enabled cameras lets a customer say, "Alexa, show me the front door camera," and then see a video feed from that camera on an Alexa-enabled device that supports video streaming.
To create a smart home skill, you provide configuration information in the Alexa Skills Kit developer console and write skill code, which is hosted as an AWS Lambda function (an Amazon Web Services offering). The skill responds to directives from Alexa, communicates with connected devices such as cameras, and sends events to Alexa.
- Device support and regions
- Technical and performance requirements
- Camera skill implementation
- Additional resources
If you haven't created a smart home skill in the past, you should review additional documentation as a prerequisite to this document. You will need to be familiar with how to create a smart home skill and write the Lambda function code. For more information see:
To add language support for an existing smart home skill or create a smart home skill that supports multiple languages, there are a few steps you need to take. For more information, see:
Device support and regions
Your skill for cloud-enabled cameras provides a URI for streaming media. The customer's Alexa-enabled device will stream the content found at that URI.
Camera skills can display camera feeds on the following device types in the following locales.
|Language (locale)||Devices supported|
|English (US), English (UK), German||Echo Show, Echo Spot, all generations of Fire TV, 2nd generation Fire TV Stick, Fire Tablets (7th generation and later)|
|English (CA), English (AU), English (IN)||Echo Spot|
Technical and performance requirements
Your cameras must provide a video feed in the correct format and meet the security and performance requirements.
Technical and security requirements for cameras
Following are the technical and security requirements for cameras and video feeds.
|Streaming protocol(s)||RTSP + RTP|
|Transport protocols||Interleaved TCP on port 443 (for both RTP and RTSP)|
|Feed encryption||TCP socket encryption on port 443 using TLS 1.2|
|RTSP command support required||The DESCRIBE, SETUP, PLAY, and TEARDOWN commands are required, although a full RFC-compliant implementation is recommended.|
|Feed authentication methods||HTTP Digest authentication within the returned camera stream.|
|RTSP URI responsiveness||All responses must occur within 6 seconds of the request being received.|
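As a rough illustration of the transport rows above, a skill could sanity-check a stream URI before returning it. This is a minimal sketch, not part of the Alexa API: the function name `uri_meets_transport_requirements` is hypothetical, the `rtsp` URI scheme is an assumption, and the check covers only the scheme and port, not the TLS version or RTSP command support.

```python
from urllib.parse import urlparse

REQUIRED_PORT = 443  # from the transport and encryption rows above


def uri_meets_transport_requirements(uri: str) -> bool:
    """Return True if the URI uses the rtsp scheme with an explicit port 443."""
    try:
        parsed = urlparse(uri)
        port = parsed.port  # raises ValueError if the port is malformed
    except ValueError:
        return False
    # Assumption: camera feeds are addressed with an rtsp:// URI.
    return parsed.scheme == "rtsp" and bool(parsed.hostname) and port == REQUIRED_PORT
```

For example, `uri_meets_transport_requirements("rtsp://camera.example.com:554/feed1")` is rejected because the port is not 443.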
Performance requirements for a camera skill
Low latency is critical to an optimal user experience. Two factors have the most impact on latency: how quickly your skill responds when Alexa sends a request for a camera stream, and how quickly the camera responds and renders the stream.
|Lambda skill responsiveness||The response must occur within 6 seconds of a request being received. For the best user experience, respond less than 1 second after a request is received.||Operations like waking a camera to begin streaming should be done asynchronously as background tasks.|
|URI stream responsiveness||Under good network conditions, the first frame should be rendered on a device with Alexa within 6 seconds of the TLS handshake completing.||Optimize startup latency by adjusting key frame rates and buffer times of the stream.|
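One way to keep an eye on the skill responsiveness figures above is to time the handler and compare the result with the 6-second limit and the 1-second target. The following is a minimal sketch; the names `classify_latency` and `timed` are hypothetical helpers, not part of any Alexa or AWS library.

```python
import functools
import time

HARD_LIMIT_SECONDS = 6.0  # a response must occur within 6 seconds
TARGET_SECONDS = 1.0      # recommended for the best user experience


def classify_latency(elapsed_seconds: float) -> str:
    """Label a response time against the two thresholds in the table."""
    if elapsed_seconds < TARGET_SECONDS:
        return "optimal"
    if elapsed_seconds <= HARD_LIMIT_SECONDS:
        return "acceptable"
    return "too_slow"


def timed(handler):
    """Wrap a Lambda-style handler and print how long each call took."""
    @functools.wraps(handler)
    def wrapper(event, context=None):
        start = time.monotonic()
        response = handler(event, context)
        elapsed = time.monotonic() - start
        print(f"{handler.__name__}: {elapsed:.3f}s ({classify_latency(elapsed)})")
        return response
    return wrapper
```

Applying `@timed` to a handler leaves its return value unchanged and only adds a log line per invocation.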
Local and remote execution recommendations
There are no requirements regarding whether you should return a URI that is on the same local network as the device with Alexa or a remote URI accessible from anywhere with an Internet connection. You should return what makes the most sense for your device cloud configuration. Regardless of your URI choice, all technical requirements must still be met including the use of TLS 1.2.
In general, a URI is not reachable both locally and remotely by default. You can make the URI accessible locally and remotely through domain purchasing or port forwarding. These solutions are technically challenging, so you should provide one only if your customers need both local and remote URI access.
Camera skill implementation
Your skill must handle directives from Alexa and the skill must meet the security requirements. Your skill code:
- Handles camera-related directives such as discovery and camera stream URI requests from Alexa as defined in the CameraStreamController capability
- Communicates with the device cloud (cameras in this scenario) using the token provided
- Returns a response or error event to Alexa
The Smart Home Skill API follows the OAuth 2.0 specification. Every request sent from the Smart Home Skill API to a smart home skill contains an OAuth access token that enables access to the customer's device cloud. The device cloud must support the authorization code grant flow type. For more information about skill authentication and user account linking, see Account Linking for Smart Home and Other Domains.
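The three responsibilities listed above (handle the directive, call the device cloud with the token, return a response or error event) can be sketched as a small Lambda handler. This is an illustrative sketch only: the message layout reflects one reading of the Smart Home Skill API version 3 format and should be checked against the CameraStreamController reference, `get_stream_uri` is a hypothetical stand-in for a real device-cloud call, and `camera.example.com` is a placeholder host.

```python
import uuid

CAMERA_NAMESPACE = "Alexa.CameraStreamController"


def get_stream_uri(endpoint_id: str, token: str) -> str:
    """Hypothetical device-cloud call; a real skill would use the token
    to authenticate with its own camera service."""
    return f"rtsp://camera.example.com:443/{endpoint_id}"


def _header(request_header: dict, namespace: str, name: str) -> dict:
    """Build an event header, echoing the correlation token if present."""
    header = {
        "namespace": namespace,
        "name": name,
        "payloadVersion": "3",
        "messageId": str(uuid.uuid4()),
    }
    if "correlationToken" in request_header:
        header["correlationToken"] = request_header["correlationToken"]
    return header


def lambda_handler(event, context=None):
    directive = event["directive"]
    header = directive["header"]
    endpoint = directive.get("endpoint", {})
    endpoint_ref = {"endpointId": endpoint.get("endpointId")}

    # Anything other than a camera stream request gets an error event.
    if (header["namespace"], header["name"]) != (CAMERA_NAMESPACE, "InitializeCameraStreams"):
        return {"event": {
            "header": _header(header, "Alexa", "ErrorResponse"),
            "endpoint": endpoint_ref,
            "payload": {"type": "INVALID_DIRECTIVE", "message": "Unsupported directive"},
        }}

    # Assumed location of the OAuth access token in the directive.
    token = endpoint["scope"]["token"]
    uri = get_stream_uri(endpoint["endpointId"], token)
    # Echo each requested stream description back with a URI attached.
    streams = [dict(requested, uri=uri) for requested in directive["payload"]["cameraStreams"]]
    return {"event": {
        "header": _header(header, CAMERA_NAMESPACE, "Response"),
        "endpoint": endpoint_ref,
        "payload": {"cameraStreams": streams},
    }}
```

Discovery handling is omitted to keep the sketch short; a complete skill would also answer discovery requests.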
The structure of messages for a smart home skill targeting cameras is mostly the same as for other devices. However, when you receive a discovery request or a request to initialize a camera feed, you must provide details about each camera endpoint by describing it with a
The following table lists the properties of a
||Protocol for the stream such as RTSP||string||Yes|
||A resolution object that describes the resolution of the stream. Contains
||Describes the width of the video stream.||integer||Yes|
||Describes the height of the video stream.||integer||Yes|
||Describes the authorization type. Possible values are "BASIC", "DIGEST", or "NONE".||string||Yes|
||The video codec for the stream. Possible values are "H264", "MPEG2", "MJPEG", or "JPG".||string||Yes|
||The audio codec for the stream. Possible values are "G711", "AAC", or "NONE".||string||Yes|
For more details about how this information is included in a discovery response event, see CameraStreamController.
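The value constraints in the table above can be checked before a stream description is sent. The sketch below is hedged: the dictionary keys (`protocol`, `resolution`, `width`, `height`, `authorizationType`, `videoCodec`, `audioCodec`) are assumed names chosen for illustration, and the function `validate_stream_description` is hypothetical. Only the allowed values come from the table.

```python
ALLOWED_AUTHORIZATION_TYPES = {"BASIC", "DIGEST", "NONE"}
ALLOWED_VIDEO_CODECS = {"H264", "MPEG2", "MJPEG", "JPG"}
ALLOWED_AUDIO_CODECS = {"G711", "AAC", "NONE"}


def validate_stream_description(stream: dict) -> list:
    """Return a list of problems; an empty list means the description looks valid."""
    problems = []

    if not isinstance(stream.get("protocol"), str):
        problems.append("protocol must be a string")

    resolution = stream.get("resolution")
    if not isinstance(resolution, dict):
        resolution = {}
    for dimension in ("width", "height"):
        value = resolution.get(dimension)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(value, int) or isinstance(value, bool) or value <= 0:
            problems.append(f"resolution.{dimension} must be a positive integer")

    # Key names here are assumptions; the allowed values come from the table.
    checks = (
        ("authorizationType", ALLOWED_AUTHORIZATION_TYPES),
        ("videoCodec", ALLOWED_VIDEO_CODECS),
        ("audioCodec", ALLOWED_AUDIO_CODECS),
    )
    for key, allowed in checks:
        if stream.get(key) not in allowed:
            problems.append(f"{key} must be one of {sorted(allowed)}")

    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue with a camera's configuration at once.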
When your skill can't respond to a directive because of authentication, customer error, hardware or other issues, it should respond with the correct error message. You send an ErrorResponse that indicates the correct error type. The following table lists scenarios and the error types you should send.
|Scenario||Correct Error Message Type|
|The target camera endpoint has been discovered, but not configured by the customer.||NOT_SUPPORTED_IN_CURRENT_MODE with a
|The target camera's battery level is too low for streaming content.||ENDPOINT_LOW_POWER|
|The skill failed to connect to the target camera because the endpoint is offline or for other reasons.||ENDPOINT_UNREACHABLE|
|The target camera endpoint is temporarily unavailable.||NOT_SUPPORTED_IN_CURRENT_MODE|
|The target camera endpoint cannot be found.||NO_SUCH_ENDPOINT|
|The skill failed due to a runtime error. If possible, you should return a more specific error.||INTERNAL_ERROR|
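The scenario table above can be expressed as a lookup from failure conditions to error types. This is a minimal sketch under stated assumptions: the exception classes are hypothetical names for conditions a device cloud might report, the event layout reflects one reading of the ErrorResponse format, and any additional payload fields that a particular error type requires are omitted.

```python
import uuid


# Hypothetical exceptions a device-cloud client might raise.
class CameraLowBattery(Exception): pass
class CameraOffline(Exception): pass
class CameraTemporarilyUnavailable(Exception): pass
class CameraNotFound(Exception): pass


# Error type names are taken from the scenario table.
ERROR_TYPES = {
    CameraLowBattery: "ENDPOINT_LOW_POWER",
    CameraOffline: "ENDPOINT_UNREACHABLE",
    CameraTemporarilyUnavailable: "NOT_SUPPORTED_IN_CURRENT_MODE",
    CameraNotFound: "NO_SUCH_ENDPOINT",
}


def error_type_for(exc: Exception) -> str:
    """Pick the most specific error type, falling back to INTERNAL_ERROR."""
    return ERROR_TYPES.get(type(exc), "INTERNAL_ERROR")


def build_error_response(exc: Exception, endpoint_id: str, correlation_token: str) -> dict:
    """Build an error event for the given failure (assumed message layout)."""
    return {"event": {
        "header": {
            "namespace": "Alexa",
            "name": "ErrorResponse",
            "payloadVersion": "3",
            "messageId": str(uuid.uuid4()),
            "correlationToken": correlation_token,
        },
        "endpoint": {"endpointId": endpoint_id},
        "payload": {"type": error_type_for(exc), "message": str(exc) or type(exc).__name__},
    }}
```

A handler could wrap its device-cloud call in `try`/`except Exception as exc` and return `build_error_response(exc, endpoint_id, correlation_token)`, so that unexpected failures still produce INTERNAL_ERROR rather than an unhandled exception.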