Build Smart Home Camera Skills
A smart home skill that works with cloud-enabled cameras lets a customer say, "Alexa, show me the front door camera" and see a video feed from that camera on an Alexa-enabled device that supports video streaming. You can also optionally support camera history so that a customer can ask to view a past recording.
To create a smart home skill, you provide configuration information in the Alexa Skills Kit developer console and write skill code, which is hosted as an AWS Lambda function (an Amazon Web Services offering). The skill responds to directives from Alexa, communicates with connected devices such as cameras, and sends events to Alexa.
- Device support and regions
- Technical, performance, and security requirements
- Camera skill implementation
- Additional resources
If you haven't created a smart home skill before, review the prerequisite documentation first. You need to be familiar with how to create a smart home skill and write the Lambda function code. For more information, see Steps to Build a Smart Home Skill.
You can build an Alexa skill for cloud-enabled cameras that stream video and audio using Web Real-Time Communication (WebRTC) or the Real Time Streaming Protocol (RTSP). We recommend that you use WebRTC whenever possible. For more information, see the Alexa.RTCSessionController (WebRTC) and Alexa.CameraStreamController (RTSP) API documentation.
To add language support for an existing smart home skill or create a smart home skill that supports multiple languages, there are a few steps you need to take. For more information, see Develop Smart Home Skills for Multiple Languages.
Device support and regions
Your skill for cloud-enabled cameras provides a URI for streaming media. The customer's Alexa-enabled device will stream the content found at that URI.
Camera skills can display camera feeds on the following device types in the following locales.
|Language (locale)||Devices supported|
|English (US), English (UK), German, Japanese||Echo Show and Echo Spot, All generations of Fire TV, 2nd generation Fire TV Stick, Fire Tablets (7th generation and later)|
|English (CA), English (AU), English (IN)||Echo Spot|
Technical, performance, and security requirements
Your cameras must provide a video feed in the correct format and meet the security and performance requirements. The requirements differ depending on which protocol or device capability the camera supports.
- For the requirements for a camera that uses WebRTC (recommended), see supported communication types in the Alexa.RTCSessionController documentation.
- For the requirements for a camera that uses RTSP, see prerequisites and requirements in the Alexa.CameraStreamController documentation.
Performance requirements for a camera skill
Regardless of which API your skill implements, low latency is critical to a good user experience. Two factors have the most impact: how quickly your skill responds when Alexa sends a request for a camera stream, and how quickly the camera responds and the stream is rendered.
|Lambda skill responsiveness||Responses must occur within 6 seconds of receiving a request. However, for the best user experience, responses should occur within 1 second of receiving a request.||Operations like waking a camera to begin streaming should be done asynchronously as background tasks.|
|URI stream responsiveness||Under good network conditions, the first frame should be rendered on a device with Alexa within 6 seconds from when the TLS handshake completes.||Optimize startup latency by adjusting key frame rates and buffer times of the stream.|
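One way to keep the Lambda response-time limits in the table above visible during development is to time each handler call and classify the result. The following is a minimal sketch, not part of the Smart Home Skill API: the names `classify_latency` and `timed` are hypothetical, and only the 6-second and 1-second thresholds come from the table.

```python
import time

MAX_RESPONSE_SECONDS = 6.0     # hard limit from the table above
TARGET_RESPONSE_SECONDS = 1.0  # recommended target from the table above


def classify_latency(elapsed_seconds):
    """Classify a response time against the limits in the table above."""
    if elapsed_seconds < TARGET_RESPONSE_SECONDS:
        return "ok"
    if elapsed_seconds <= MAX_RESPONSE_SECONDS:
        return "slow"
    return "too_slow"


def timed(handler):
    """Wrap a handler so each call returns (response, latency class)."""
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        response = handler(*args, **kwargs)
        return response, classify_latency(time.monotonic() - start)
    return wrapper
```

In practice you would likely log the latency class rather than return it, so that the handler's return value stays the response that Alexa expects.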
Local and remote execution recommendations
There are no requirements regarding whether you should return a URI that is on the same local network as the device with Alexa or a remote URI accessible from anywhere with an Internet connection. You should return what makes the most sense for your device cloud configuration. Regardless of your URI choice, all technical requirements must still be met including the use of TLS 1.2.
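If the service that fronts your stream URI happens to be written in Python, the TLS 1.2 requirement can be expressed as a floor on the server's TLS context. This is only a sketch under that assumption: `make_stream_tls_context` is a hypothetical helper name, and a real server would also load its certificate chain (for example with `load_cert_chain`) before accepting connections.

```python
import ssl


def make_stream_tls_context():
    """Build a server-side TLS context that refuses anything below TLS 1.2."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Reject older protocol versions during the handshake.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

The same floor applies whether the URI you return is local or remote.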
In general, a URI is not reachable both locally and remotely by default. You can make the URI accessible locally and remotely through domain purchasing or port forwarding. These solutions are technically challenging and so you should provide this kind of solution only if your customers need both local and remote URI access.
Camera skill implementation
Your skill must handle directives from Alexa and the skill must meet the security requirements. Your skill code:
- Handles camera-related directives such as discovery and camera stream URI requests from Alexa as defined in the RTCSessionController or CameraStreamController capabilities.
- Communicates with the device cloud (cameras in this scenario), using the token provided.
- Optionally notifies Alexa of new or deleted media recordings as defined in the MediaMetadata capability.
- Returns a response or error event to Alexa.
The Smart Home Skill API follows the OAuth2.0 specification. Every request sent from the Smart Home Skill API to a smart home skill contains an OAuth access token in the request to enable access to the customer's device cloud. The device cloud must support the authorization code grant flow type. For more information about skill authentication and user account linking, see Account Linking for Smart Home and Other Domains.
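The responsibilities above can be sketched as a small Lambda entry point that reads the access token from the directive and routes on the directive's namespace and name. This is an illustration only: the handler functions and the dictionaries they return are hypothetical placeholders, not the response formats defined by the Alexa interfaces, and the sketch covers only the RTSP path (an RTCSessionController skill would register its own directive names).

```python
def get_bearer_token(directive):
    """Return the OAuth access token carried in a directive, or None."""
    # Discovery carries the scope in the payload; endpoint-level directives
    # carry it on the endpoint object.
    for container in (directive.get("endpoint", {}), directive.get("payload", {})):
        scope = container.get("scope", {})
        if scope.get("type") == "BearerToken":
            return scope.get("token")
    return None


def handle_discovery(directive, token):
    # Placeholder: a real skill would query the device cloud with the token
    # and return a discovery response listing the customer's cameras.
    return {"action": "discover", "token": token}


def handle_camera_streams(directive, token):
    # Placeholder: a real skill would ask the device cloud for stream URIs.
    endpoint_id = directive["endpoint"]["endpointId"]
    return {"action": "stream", "endpointId": endpoint_id, "token": token}


HANDLERS = {
    ("Alexa.Discovery", "Discover"): handle_discovery,
    ("Alexa.CameraStreamController", "InitializeCameraStreams"): handle_camera_streams,
}


def lambda_handler(event, context):
    """Route an incoming directive to the matching handler."""
    directive = event["directive"]
    header = directive["header"]
    handler = HANDLERS.get((header["namespace"], header["name"]))
    if handler is None:
        # Placeholder: a real skill would return an ErrorResponse event here.
        return {"action": "unsupported"}
    return handler(directive, get_bearer_token(directive))
```

Keeping the routing table separate from the handlers makes it straightforward to add further directives later without touching the entry point.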
In addition, if your skill supports camera history, you need to request permission to send Alexa events, which enables you to identify the user a message is associated with. For more information, see Authenticate a Customer to Alexa with Permissions.
The structure of messages for a smart home skill targeting cameras differs depending on which protocol or device capability the camera supports.
- For more information about the structure of messages for a camera that uses WebRTC (recommended), see the Alexa.RTCSessionController documentation.
- For more information about the structure of messages for a camera that uses RTSP, see the Alexa.CameraStreamController documentation.
Camera history and media recordings
Most security cameras create recordings of events such as when motion or sound is detected. To make these available to customers through Alexa, you can implement the MediaMetadata capability. This capability defines messages for notifying Alexa of new, updated, and deleted media recordings. Alexa can then store a list of recordings and show them to the customer when they ask to see one. For more information, see the Alexa.MediaMetadata documentation.
When your skill can't respond to a directive because of authentication, customer error, hardware problems, or other issues, it should send an ErrorResponse event that indicates the correct error type. The following table lists scenarios and the error types you should send.
|Scenario||Correct Error Message Type|
|The target camera endpoint has been discovered, but not configured by the customer.||
|The target camera's battery level is too low for streaming content.||
|Skill failed to connect to target camera because endpoint is offline or for other reasons.||
|The target camera endpoint is temporarily unavailable.||
|The target camera endpoint cannot be found.||
|The skill failed due to a runtime error. If possible, you should return a more specific error.||
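As a rough illustration of the event shape, the sketch below builds an ErrorResponse with a type and a message. The helper name `build_error_response` and its parameters are hypothetical. The error-type column of the table above is empty in this copy, so the `ENDPOINT_UNREACHABLE` value in the usage example is taken from the general Alexa.ErrorResponse interface as an assumption, not from the table; check each scenario against that reference.

```python
import uuid


def build_error_response(error_type, message, endpoint_id=None, correlation_token=None):
    """Build an Alexa.ErrorResponse event as a plain dictionary."""
    header = {
        "namespace": "Alexa",
        "name": "ErrorResponse",
        "messageId": str(uuid.uuid4()),  # unique per message
        "payloadVersion": "3",
    }
    if correlation_token is not None:
        # Echo the token from the directive so Alexa can match the response.
        header["correlationToken"] = correlation_token

    event = {
        "header": header,
        "payload": {"type": error_type, "message": message},
    }
    if endpoint_id is not None:
        event["endpoint"] = {"endpointId": endpoint_id}
    return {"event": event}
```

For example, a skill that cannot reach a camera might return `build_error_response("ENDPOINT_UNREACHABLE", "Camera is offline", endpoint_id="cam-1")`, where `cam-1` is a made-up endpoint identifier.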