Build Smart Home Camera Skills

A smart home skill that works with cloud-enabled cameras lets a customer say, "Alexa, show me the front door camera" and then see the video feed from that camera on an Alexa-enabled device that supports video streaming.

To create a smart home skill, you provide configuration information in the Alexa Skills Kit developer console and write the skill code, which is hosted as an AWS Lambda function (an Amazon Web Services offering). The skill responds to directives from Alexa, communicates with connected devices such as cameras, and sends events to Alexa.

If you haven't created a smart home skill before, review the introductory smart home documentation before you continue with this document. You need to be familiar with how to create a smart home skill and how to write the Lambda function code. For more information, see:

To add language support for an existing smart home skill or create a smart home skill that supports multiple languages, there are a few steps you need to take. For more information, see:

Device support and regions

Your skill for cloud-enabled cameras provides a URI for streaming media. The customer's Alexa-enabled device will stream the content found at that URI.

Camera skills can display camera feeds on the following device types in the following locales.

Language (locale) | Devices supported
English (US), English (UK), German | Echo Show and Echo Spot, all generations of Fire TV, 2nd generation Fire TV Stick, Fire Tablets (7th generation and later)
English (CA), English (AU), English (IN) | Echo Spot

Technical and performance requirements

Your cameras must provide a video feed in the correct format and meet the security and performance requirements.

Technical and security requirements for cameras

Following are the technical and security requirements for cameras and video feeds.

Category | Requirement
Streaming protocols | RTSP + RTP
Transport protocols | Interleaved TCP on port 443 (for both RTP and RTSP)
Feed encryption | TCP socket encryption on port 443 using TLS 1.2
Required RTSP command support | DESCRIBE, SETUP, PLAY, and TEARDOWN commands are required, although a full RFC-compliant implementation is recommended
Video format | H.264
Audio format | AAC/G711
Feed authentication methods | HTTP Digest authentication within the returned camera stream
RTSP URI responsiveness | All responses must occur 6 seconds or less after the request is received
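
As a minimal sketch of checking the transport requirement before returning a stream URI, the hypothetical helper below only confirms that the URI names a host and explicitly targets port 443. The function name and the example host are assumptions for illustration. It does not verify TLS, the RTSP commands, or the codecs.

```python
from urllib.parse import urlsplit


def check_stream_uri(uri):
    """Return a list of problems with the URI; an empty list means none found.

    Only the host and port are checked. The other requirements in the table
    (TLS 1.2, RTSP commands, codecs) need a real connection to verify.
    """
    problems = []
    parts = urlsplit(uri)
    if not parts.hostname:
        problems.append("URI has no host")
    try:
        port = parts.port  # raises ValueError for a malformed port
    except ValueError:
        port = None
        problems.append("URI has a malformed port")
    else:
        if port != 443:
            problems.append("URI must explicitly use port 443")
    return problems


# Hypothetical URIs, for illustration only.
print(check_stream_uri("rtsp://camera.example.com:443/feed"))
print(check_stream_uri("rtsp://camera.example.com/feed"))
```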

Performance requirements for a camera skill

Low latency is critical to a good user experience. Two factors have the most impact on latency: how quickly your skill responds when Alexa sends a request for a camera stream, and how quickly the camera responds and renders the stream.

Category | Requirement | Recommendations
Lambda skill responsiveness | The response must occur 6 seconds or less after a request is received. For the best user experience, responses should occur less than 1 second after a request is received. | Operations like waking a camera to begin streaming should be done asynchronously as background tasks.
URI stream responsiveness | Under good network conditions, the first frame should be rendered on a device with Alexa within 6 seconds of the TLS handshake completing. | Optimize startup latency by adjusting key frame rates and buffer times of the stream.
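
One way to keep an eye on the skill responsiveness figures above is to time the handler during development. The sketch below is an illustration only: `classify_latency` and `timed` are hypothetical helper names, and the thresholds are taken from the 6-second requirement and the 1-second recommendation in the table.

```python
import functools
import time

REQUIRED_MAX_SECONDS = 6.0      # response must occur 6 seconds or less
RECOMMENDED_MAX_SECONDS = 1.0   # best experience: less than 1 second


def classify_latency(seconds):
    """Compare a measured response time against the two thresholds."""
    if seconds > REQUIRED_MAX_SECONDS:
        return "violates requirement"
    if seconds >= RECOMMENDED_MAX_SECONDS:
        return "meets requirement, slower than recommended"
    return "within recommendation"


def timed(handler):
    """Wrap a (event, context) handler and log how long each call took."""
    @functools.wraps(handler)
    def wrapper(event, context):
        start = time.monotonic()
        try:
            return handler(event, context)
        finally:
            elapsed = time.monotonic() - start
            print(f"handler took {elapsed:.3f}s: {classify_latency(elapsed)}")
    return wrapper
```

Wrapping a handler with `timed` leaves its return value unchanged, so the wrapper can be removed again without touching the handler itself.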

Local and remote execution recommendations

There is no requirement that the URI you return be on the same local network as the device with Alexa, or that it be a remote URI accessible from anywhere with an Internet connection. Return whichever makes the most sense for your device cloud configuration. Regardless of your URI choice, all technical requirements must still be met, including the use of TLS 1.2.

In general, a URI is not reachable both locally and remotely by default. You can make the URI accessible both locally and remotely through domain purchasing or port forwarding. These solutions are technically challenging, so provide them only if your customers need both local and remote URI access.

Camera skill implementation

Your skill must handle directives from Alexa and the skill must meet the security requirements. Your skill code:

  • Handles camera-related directives such as discovery and camera stream URI requests from Alexa as defined in the CameraStreamController capability
  • Communicates with the device cloud (cameras in this scenario) using the token provided
  • Returns a response or error event to Alexa
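
The three responsibilities above can be sketched as a small dispatcher. This is a hedged illustration, not the reference implementation: it assumes each request carries a `directive.header` with `namespace` and `name` fields, and that the relevant directive names are `Discover` (in `Alexa.Discovery`) and `InitializeCameraStreams` (in `Alexa.CameraStreamController`). Verify these against the CameraStreamController reference. The handler functions are hypothetical stubs.

```python
def route_directive(event, handlers):
    """Pick a handler based on the directive's namespace and name."""
    directive = event["directive"]
    header = directive["header"]
    key = (header["namespace"], header["name"])
    handler = handlers.get(key)
    if handler is None:
        raise KeyError(f"unsupported directive: {key}")
    return handler(directive)


# Hypothetical stubs. A real skill would call the device cloud here and
# return a response or error event to Alexa.
def handle_discovery(directive):
    return {"handled": "discovery"}


def handle_camera_streams(directive):
    return {"handled": "camera_streams"}


# Assumed namespace/name pairs; check them against the API reference.
HANDLERS = {
    ("Alexa.Discovery", "Discover"): handle_discovery,
    ("Alexa.CameraStreamController", "InitializeCameraStreams"): handle_camera_streams,
}


def lambda_handler(event, context):
    """AWS Lambda entry point: route each incoming directive."""
    return route_directive(event, HANDLERS)
```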

Customer authorization

The Smart Home Skill API follows the OAuth 2.0 specification. Every request sent from the Smart Home Skill API to a smart home skill contains an OAuth access token that enables access to the customer's device cloud. The device cloud must support the authorization code grant flow type. For more information about skill authentication and user account linking, see Account Linking for Smart Home and Other Domains.
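
As a rough sketch of reading that token in skill code, the hypothetical helper below looks for a `scope.token` value under `directive.endpoint` (assumed for most directives) and under `directive.payload` (assumed for discovery). These locations are assumptions made for illustration, so confirm them against the message reference for the directives you handle.

```python
def extract_token(directive):
    """Return the OAuth access token from a directive, or None if absent.

    Assumed locations: directive["endpoint"]["scope"]["token"] or
    directive["payload"]["scope"]["token"].
    """
    for container in (directive.get("endpoint"), directive.get("payload")):
        scope = (container or {}).get("scope") or {}
        token = scope.get("token")
        if token:
            return token
    return None


# Hypothetical directive fragment, for illustration only.
directive = {"endpoint": {"scope": {"type": "BearerToken", "token": "example-token"}}}
print(extract_token(directive))
```

A missing token is a case for the INVALID_AUTHORIZATION_CREDENTIAL error type described under Error messages.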

Message structure

The structure of messages for a smart home skill targeting cameras is mostly the same as for other devices. However, when you receive a discovery request or a request to initialize a camera feed, you must provide details about each camera endpoint by describing it with a cameraStream object.

The following table lists the properties of a cameraStream.

Property | Description | Type | Required?
cameraStream.protocol | Protocol for the stream, such as RTSP. | string | Yes
cameraStream.resolution | A resolution object that describes the resolution of the stream. Contains width and height properties. | object | Yes
cameraStream.resolution.width | Describes the width of the video stream. | integer | Yes
cameraStream.resolution.height | Describes the height of the video stream. | integer | Yes
cameraStream.authorizationType | Describes the authorization type. Possible values are "BASIC", "DIGEST", or "NONE". | string | Yes
cameraStream.videoCodec | The video codec for the stream. Possible values are "H264", "MPEG2", "MJPEG", or "JPG". | string | Yes
cameraStream.audioCodec | The audio codec for the stream. Possible values are "G711", "AAC", or "NONE". | string | Yes

For more details about how this information is included in a discovery response event, see CameraStreamController.
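
To make the property table concrete, here is a minimal sketch that checks a cameraStream object against the types and allowed values listed above. The function name and the sample values are illustrative assumptions. The property names and allowed values come from the table.

```python
ALLOWED_VALUES = {
    "authorizationType": {"BASIC", "DIGEST", "NONE"},
    "videoCodec": {"H264", "MPEG2", "MJPEG", "JPG"},
    "audioCodec": {"G711", "AAC", "NONE"},
}


def validate_camera_stream(stream):
    """Return a list of problems; an empty list means the object looks valid."""
    problems = []
    protocol = stream.get("protocol")
    if not isinstance(protocol, str) or not protocol:
        problems.append("protocol must be a non-empty string")
    resolution = stream.get("resolution")
    if not isinstance(resolution, dict):
        problems.append("resolution must be an object")
    else:
        for side in ("width", "height"):
            value = resolution.get(side)
            # bool is a subclass of int in Python, so exclude it explicitly.
            if not isinstance(value, int) or isinstance(value, bool):
                problems.append(f"resolution.{side} must be an integer")
    for field, allowed in ALLOWED_VALUES.items():
        value = stream.get(field)
        if not isinstance(value, str) or value not in allowed:
            problems.append(f"{field} must be one of {sorted(allowed)}")
    return problems


# Sample values chosen for illustration.
example_stream = {
    "protocol": "RTSP",
    "resolution": {"width": 1920, "height": 1080},
    "authorizationType": "DIGEST",
    "videoCodec": "H264",
    "audioCodec": "AAC",
}
print(validate_camera_stream(example_stream))
```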

Error messages

When your skill can't respond to a directive because of authentication, customer error, hardware or other issues, it should respond with the correct error message. You send an ErrorResponse that indicates the correct error type. The following table lists scenarios and the error types you should send.

Scenario | Correct error message type
The target camera endpoint has been discovered, but not configured by the customer. | NOT_SUPPORTED_IN_CURRENT_MODE with a currentDeviceMode value of NOT_PROVISIONED
The target camera's battery level is too low for streaming content. | ENDPOINT_LOW_POWER
The token is not valid, has been revoked, or is missing from the request. | INVALID_AUTHORIZATION_CREDENTIAL
The skill failed to connect to the target camera because the endpoint is offline or for other reasons. | ENDPOINT_UNREACHABLE
The target camera endpoint is temporarily unavailable. | NOT_SUPPORTED_IN_CURRENT_MODE
The target camera endpoint cannot be found. | NO_SUCH_ENDPOINT
The skill failed due to a runtime error. If possible, you should return a more specific error. | INTERNAL_ERROR
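
The scenario table above can be sketched as a lookup from an internal failure condition to the error type. The `CameraFailure` names are hypothetical, and the payload shape (`type`, `message`, and `currentDeviceMode` fields) is an assumption to check against the ErrorResponse reference. Only the error type strings come from the table.

```python
from enum import Enum


class CameraFailure(Enum):
    """Hypothetical internal failure conditions for a camera skill."""
    NOT_CONFIGURED = "not_configured"
    LOW_BATTERY = "low_battery"
    BAD_TOKEN = "bad_token"
    UNREACHABLE = "unreachable"
    TEMPORARILY_UNAVAILABLE = "temporarily_unavailable"
    NOT_FOUND = "not_found"
    RUNTIME_ERROR = "runtime_error"


ERROR_TYPES = {
    CameraFailure.NOT_CONFIGURED: "NOT_SUPPORTED_IN_CURRENT_MODE",
    CameraFailure.LOW_BATTERY: "ENDPOINT_LOW_POWER",
    CameraFailure.BAD_TOKEN: "INVALID_AUTHORIZATION_CREDENTIAL",
    CameraFailure.UNREACHABLE: "ENDPOINT_UNREACHABLE",
    CameraFailure.TEMPORARILY_UNAVAILABLE: "NOT_SUPPORTED_IN_CURRENT_MODE",
    CameraFailure.NOT_FOUND: "NO_SUCH_ENDPOINT",
    CameraFailure.RUNTIME_ERROR: "INTERNAL_ERROR",
}


def error_payload(failure, message):
    """Build the payload portion of an error response (assumed field names)."""
    payload = {"type": ERROR_TYPES[failure], "message": message}
    if failure is CameraFailure.NOT_CONFIGURED:
        # Discovered but not yet configured by the customer.
        payload["currentDeviceMode"] = "NOT_PROVISIONED"
    return payload
```

Keeping the mapping in one table makes it easier to return a specific error type and to fall back to INTERNAL_ERROR only when nothing more specific applies.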

Additional resources