Alexa.SmartVision.ObjectDetectionSensor Interface 1.0

Implement the Alexa.SmartVision.ObjectDetectionSensor interface in your Alexa skill so that customers can receive notifications when their smart-vision device detects an object, such as a person or package. Customers can enable the class of objects for which they want to receive notifications. Your skill reports smart-vision events to Alexa to notify the customer that the device detected an object of interest.

Typically, you use the Alexa.SmartVision.ObjectDetectionSensor interface with the Alexa.RTCSessionController interface. Also, you can use the Alexa.DataController interface to enable customers to review and delete detection events.

For the list of languages that the Alexa.SmartVision.ObjectDetectionSensor interface supports, see List of Alexa Interfaces and Supported Languages. For the definitions of the message properties, see Alexa Interface Message and Property Reference.

Object detection

Object detection is a computer vision technique to identify objects within an image or video stream. Some smart-vision devices can follow an object through a video stream, aggregate the frames of the same physical object, interpret the contents, and report the detected object information.

Typically, customers can configure the types of objects that they want their device to identify and report. Object detection occurs when the smart-vision device sees an object from the configured object class in the video stream. Your skill reports the object detection event to Alexa, and then Alexa notifies the customer. The customer can also review the history of reported events in the Alexa app.

If your smart-vision device can detect multiple objects in a single video stream, send an object detection event for each object, as soon as detection occurs, without waiting for the video processing session to end. As the session continues, your smart-vision processing software can add data to the event, aggregate frames of the same physical object, and detect other objects of interest from the same or different object classes in the video stream. You send additional events for other objects of interest that appear in the stream. Event data might include the detection time, the object class, a unique identifier for the detected object, and an image of the object.
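The per-object reporting flow described above can be sketched as follows. This is a minimal illustration, assuming a hypothetical tracker that hands your code `(object_id, object_class)` pairs for each processed frame. The names `DetectionSession` and `observe` are made up for this sketch and aren't part of the Alexa API.

```python
from dataclasses import dataclass, field


@dataclass
class DetectionSession:
    """Tracks which physical objects were already reported in one video session."""

    enabled_classes: set[str]
    reported: set[str] = field(default_factory=set)

    def observe(self, frame_detections: list[tuple[str, str]]) -> list[tuple[str, str]]:
        """Return the detections that need a new event, in frame order.

        Each detection is (object_id, object_class). A detection qualifies the
        first time its object_id appears, and only if its class is enabled.
        """
        new_events = []
        for object_id, object_class in frame_detections:
            if object_class not in self.enabled_classes:
                continue  # The customer didn't enable this class.
            if object_id in self.reported:
                continue  # Same physical object: aggregate frames, don't re-notify.
            self.reported.add(object_id)
            new_events.append((object_id, object_class))
        return new_events


session = DetectionSession(enabled_classes={"person", "package"})
first = session.observe([("obj-1", "person"), ("obj-2", "dog")])
second = session.observe([("obj-1", "person"), ("obj-3", "package")])
```

In this sketch, the first frame yields one event for `obj-1`, and the second frame yields an event only for the newly seen `obj-3`, without waiting for the session to end.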

Alexa caches data associated with the detected event, such as the event identifier. When the customer opens the Alexa app to review the detection event, Alexa uses the Alexa.DataController interface to retrieve the event data from your skill.

Object classes

The Alexa.SmartVision.ObjectDetectionSensor interface uses nouns from the WordNet® database to define the types of physical objects, called object classes, that the smart-vision device might detect. WordNet is a large lexical database of English words, grouped into related words and concepts, that many smart-vision devices use to identify objects.

The following table shows common object class names used with the Alexa.SmartVision.ObjectDetectionSensor interface. You can use any object class that your device supports.

Object class	Description
`package`	A parcel or bundle. An object or group of objects wrapped in paper or plastic, or packed in a box.
`person`	A human being.

(Source: Princeton University "About WordNet." WordNet. Princeton University. 2010.)

Note: Your smart-vision device must be able to detect at least one class of object.

Utterances

The Alexa.SmartVision.ObjectDetectionSensor interface doesn't define any user utterances. Instead, Alexa communicates with your skill about the object detection classes that the customer configures in the Alexa app.

Properties and objects

The Alexa.SmartVision.ObjectDetectionSensor interface includes the following properties and objects.

Reportable properties

The Alexa.SmartVision.ObjectDetectionSensor interface uses objectDetectionClasses as its primary property. You identify the properties that you support in your discovery response.

The objectDetectionClasses property defines the objects that the smart-vision endpoint can detect. The property is an array of ClassConfiguration objects.

ClassConfiguration object

The ClassConfiguration object provides information about the class of images that the endpoint can detect.

Property	Description	Type
`imageNetClass`	The class of images that the endpoint can detect. For valid class names, see WordNet database. Class names are nouns, such as person or package.	String

Discovery

You describe endpoints that support Alexa.SmartVision.ObjectDetectionSensor by using the standard discovery mechanism described in Alexa.Discovery.

Set retrievable to true for the properties that you report when Alexa sends your skill a state report request. Set proactivelyReported to true for the properties that you proactively report to Alexa in a change report.

Use CAMERA for the display category. For the full list of display categories, see display categories.

Sensor devices must also implement Alexa.EndpointHealth.

Note: If your skill also supports the Alexa.DataController interface, include one instance of Alexa.DataController only.

Configuration object

In addition to the usual discovery response fields, for Alexa.SmartVision.ObjectDetectionSensor, include a configuration object that contains the following fields.

Property	Description	Type	Required
`objectDetectionConfiguration`	Object detection classes that the endpoint supports.	Array of objects	Yes
`objectDetectionConfiguration[*].imageNetClass`	The class of images that the endpoint can detect. For valid class names, see WordNet database. Class names are nouns, such as person or package.	String	Yes
`objectDetectionConfiguration[*].isAvailable`	Indicates whether you can enable the class on the endpoint. Default value: `true`.	Boolean	No
`objectDetectionConfiguration[*].unavailabilityReason`	Indicates why the class isn't available on the endpoint. Valid values: `SUBSCRIPTION_REQUIRED`.	String	No
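
As a rough sketch, the configuration object could be assembled as follows. The helper names `class_entry` and `build_configuration` are hypothetical, and the key name `objectDetectionConfiguration` follows the Discover.Response example on this page.

```python
from typing import Optional


def class_entry(image_net_class: str, unavailability_reason: Optional[str] = None) -> dict:
    """Build one entry of the configuration array.

    isAvailable defaults to true, so this sketch includes it only when the
    class is unavailable.
    """
    entry = {"imageNetClass": image_net_class}
    if unavailability_reason is not None:
        if unavailability_reason != "SUBSCRIPTION_REQUIRED":
            raise ValueError(f"Unsupported unavailabilityReason: {unavailability_reason}")
        entry["isAvailable"] = False
        entry["unavailabilityReason"] = unavailability_reason
    return entry


def build_configuration(entries: list[dict]) -> dict:
    """Wrap the entries in the configuration object for the capability."""
    if not entries:
        # The device must be able to detect at least one class of object.
        raise ValueError("The endpoint must support at least one object class")
    return {"objectDetectionConfiguration": entries}


config = build_configuration([
    class_entry("person"),
    class_entry("package", "SUBSCRIPTION_REQUIRED"),
])
```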

Discover response example

The following example shows a Discover.Response message for an Alexa skill that supports the Alexa.SmartVision.ObjectDetectionSensor, Alexa.DataController, and Alexa.EndpointHealth interfaces.

{
    "event": {
        "header": {
            "namespace": "Alexa.Discovery",
            "name": "Discover.Response",
            "payloadVersion": "3",
            "messageId": "Unique identifier, preferably a version 4 UUID"
        },
        "payload": {
            "endpoints": [{
                "endpointId": "Unique ID of the endpoint",
                "manufacturerName": "Sample Manufacturer",
                "description": "Description that appears in the Alexa app",
                "friendlyName": "Your device name, displayed in the Alexa app",
                "displayCategories": ["CAMERA"],
                "additionalAttributes": {
                    "manufacturer": "Sample Manufacturer",
                    "model": "Sample Model",
                    "serialNumber": "Serial number of the device",
                    "firmwareVersion": "Firmware version of the device",
                    "softwareVersion": "Software version of the device",
                    "customIdentifier": "Optional custom identifier for the device"
                },
                "cookie": {},
                "capabilities": [{
                        "type": "AlexaInterface",
                        "interface": "Alexa.SmartVision.ObjectDetectionSensor",
                        "version": "1.0",
                        "properties": {
                            "supported": [{
                                "name": "objectDetectionClasses"
                            }],
                            "proactivelyReported": true,
                            "retrievable": true
                        },
                        "configuration": {
                            "objectDetectionConfiguration": [{
                                    "imageNetClass": "person"
                                },
                                {
                                    "imageNetClass": "package",
                                    "isAvailable": false,
                                    "unavailabilityReason": "SUBSCRIPTION_REQUIRED"
                                }
                            ]
                        }
                    },
                    {
                        "type": "AlexaInterface",
                        "interface": "Alexa.DataController",
                        "instance": "Camera.SmartVisionData",
                        "version": "1.0",
                        "properties": {},
                        "configuration": {
                            "targetCapability": {
                                "name": "Alexa.SmartVision.ObjectDetectionSensor",
                                "version": "1.0"
                            },
                            "dataRetrievalSchema": {
                                "type": "JSON",
                                "schema": "SmartVisionData"
                            },
                            "supportedAccess": ["BY_TIMESTAMP_RANGE"]
                        }
                    },
                    {
                        "type": "AlexaInterface",
                        "interface": "Alexa.EndpointHealth",
                        "version": "3.1",
                        "properties": {
                            "supported": [{
                                "name": "connectivity"
                            }],
                            "proactivelyReported": true,
                            "retrievable": true
                        }
                    },
                    {
                        "type": "AlexaInterface",
                        "interface": "Alexa",
                        "version": "3"
                    }
                ]
            }]
        }
    }
}

AddOrUpdateReport

You must proactively send an Alexa.Discovery.AddOrUpdateReport event if the feature support of your endpoint changes, for example, when the subscription status of a supported object class changes. For details, see AddOrUpdateReport event.
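
One way to decide when to send the report is to compare the previously discovered class configuration with the current one. The following is a minimal sketch under that assumption. The function names are hypothetical, and how you store the previous configuration is up to your skill.

```python
def normalize(entries: list[dict]) -> dict:
    """Map each class name to (isAvailable, unavailabilityReason).

    isAvailable defaults to True when omitted, so an omitted flag and an
    explicit true compare as equal.
    """
    return {
        entry["imageNetClass"]: (
            entry.get("isAvailable", True),
            entry.get("unavailabilityReason"),
        )
        for entry in entries
    }


def needs_add_or_update_report(previous: list[dict], current: list[dict]) -> bool:
    """Return True when the supported classes or their availability changed."""
    return normalize(previous) != normalize(current)


before = [
    {"imageNetClass": "person"},
    {"imageNetClass": "package", "isAvailable": False,
     "unavailabilityReason": "SUBSCRIPTION_REQUIRED"},
]
after = [{"imageNetClass": "package"}, {"imageNetClass": "person", "isAvailable": True}]
subscription_changed = needs_add_or_update_report(before, after)
```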

AddOrUpdateReport event example

The following example shows an AddOrUpdateReport message that reports that the package class no longer requires a subscription.

{
    "event": {
        "header": {
            "namespace": "Alexa.Discovery",
            "name": "AddOrUpdateReport",
            "payloadVersion": "3",
            "messageId": "Unique identifier, preferably a version 4 UUID"
        },
        "payload": {
            "endpoints": [{
                "endpointId": "Unique ID of the endpoint",
                "manufacturerName": "Sample Manufacturer",
                "description": "Description that appears in the Alexa app",
                "friendlyName": "Your device name, displayed in the Alexa app",
                "displayCategories": ["CAMERA"],
                "additionalAttributes": {
                    "manufacturer": "Sample Manufacturer",
                    "model": "Sample Model",
                    "serialNumber": "Serial number of the device",
                    "firmwareVersion": "Firmware version of the device",
                    "softwareVersion": "Software version of the device",
                    "customIdentifier": "Optional custom identifier for the device"
                },
                "cookie": {},
                "capabilities": [{
                        "type": "AlexaInterface",
                        "interface": "Alexa.SmartVision.ObjectDetectionSensor",
                        "version": "1.0",
                        "properties": {
                            "supported": [{
                                    "name": "objectDetectionClasses"
                                }
                            ],
                            "proactivelyReported": true,
                            "retrievable": true
                        },
                        "configuration": {
                            "objectDetectionConfiguration" : [
                                {
                                   "imageNetClass" : "person"
                                },
                                {
                                   "imageNetClass" : "package"
                                }
                            ]
                        }
                    },
                    {
                        "type": "AlexaInterface",
                        "interface": "Alexa.EndpointHealth",
                        "version": "3.1",
                        "properties": {
                            "supported": [{
                                "name": "connectivity"
                            }],
                            "proactivelyReported": true,
                            "retrievable": true
                        }
                    },
                    {
                        "type": "AlexaInterface",
                        "interface": "Alexa",
                        "version": "3"
                    }
                ]
            }]
        }
    }
}

Directives and events

The Alexa.SmartVision.ObjectDetectionSensor interface defines the following directives and events.

SetObjectDetectionClasses directive

Support the SetObjectDetectionClasses directive so that the customer can configure the objects for which they want to receive notifications. The customer can configure the object classes in the Alexa app. For this endpoint, you must disable events for any object class that isn't included in the request.

SetObjectDetectionClasses directive example

The following example shows a SetObjectDetectionClasses directive that Alexa sends to your skill. This example enables detection of objects in the person and package classes.

{
    "directive": {
        "header": {
            "namespace": "Alexa.SmartVision.ObjectDetectionSensor",
            "name": "SetObjectDetectionClasses",
            "payloadVersion": "1.0",
            "messageId": "Unique version 4 UUID",
            "correlationToken": "Opaque correlation token"
        },
        "endpoint": {
            "scope": {
                "type": "BearerToken",
                "token": "OAuth2 bearer token"
            },
            "endpointId": "endpoint id",
            "cookie": {}
        },
        "payload": {
            "objectDetectionClasses": [{
                    "imageNetClass": "person"
                },
                {
                    "imageNetClass": "package"
                }
            ]
        }
    }
}

SetObjectDetectionClasses directive payload

The following table shows the payload details for the SetObjectDetectionClasses directive that Alexa sends to your skill.

Property	Description	Type	Required
`objectDetectionClasses`	Classes of objects that the customer wants the camera to detect. You must disable object detection for any other object classes that your smart-vision camera supports.	Array of `ClassConfiguration` objects	Yes
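
A handler for this directive has to enable the requested classes and disable every other supported class. The following sketch shows that logic only. The function name and the returned `{class: enabled}` mapping are assumptions for illustration, and applying the result to the camera is left out.

```python
def apply_set_object_detection_classes(directive: dict, supported: set[str]) -> dict[str, bool]:
    """Return the desired on/off state for every supported class.

    Classes named in the directive are enabled. Every other supported class
    is disabled, as the directive requires.
    """
    payload = directive["directive"]["payload"]
    requested = {item["imageNetClass"] for item in payload["objectDetectionClasses"]}
    unsupported = requested - supported
    if unsupported:
        # Assumption: the caller turns this into an error response.
        raise ValueError(f"Unsupported object classes: {sorted(unsupported)}")
    return {name: name in requested for name in sorted(supported)}


sample_directive = {
    "directive": {
        "payload": {"objectDetectionClasses": [{"imageNetClass": "person"}]}
    }
}
desired_state = apply_set_object_detection_classes(sample_directive, {"person", "package"})
```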

SetObjectDetectionClasses response

If you handle a SetObjectDetectionClasses directive successfully and you can configure events for the requested object classes, respond with an Alexa.Response and include the resulting supported objectDetectionClasses array.

The following example shows a successful response to the SetObjectDetectionClasses directive.

{
    "event": {
        "header": {
            "namespace": "Alexa",
            "name": "Response",
            "messageId": "Unique identifier, preferably a version 4 UUID",
            "correlationToken": "Opaque correlation token that matches the request",
            "payloadVersion": "3"
        },
        "endpoint": {
            "scope": {
                "type": "BearerToken",
                "token": "OAuth2.0 bearer token"
            },
            "endpointId": "endpoint id",
            "cookie": {}
        },
        "payload": {}
    },
    "context": {
        "properties": [{
                "namespace": "Alexa.SmartVision.ObjectDetectionSensor",
                "name": "objectDetectionClasses",
                "value": [{
                        "imageNetClass": "person"
                    },
                    {
                        "imageNetClass": "package"
                    }
                ]
            },
            {
                "namespace": "Alexa.EndpointHealth",
                "name": "connectivity",
                "value": {
                    "value": "OK"
                },
                "timeOfSample": "2017-02-03T16:20:50.52Z",
                "uncertaintyInMilliseconds": 0
            }
        ]
    }
}
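
A response like the one above could be built from the incoming directive roughly as follows. This is a sketch, not a reference implementation: `build_response` is a hypothetical helper, it echoes the directive's endpoint object unchanged, and it reports only the objectDetectionClasses property in the context.

```python
import uuid
from datetime import datetime, timezone


def build_response(directive: dict, enabled_classes: list[str]) -> dict:
    """Build an Alexa.Response that reports the resulting enabled classes."""
    request = directive["directive"]
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "event": {
            "header": {
                "namespace": "Alexa",
                "name": "Response",
                "messageId": str(uuid.uuid4()),
                # Must match the token that Alexa sent in the directive.
                "correlationToken": request["header"]["correlationToken"],
                "payloadVersion": "3",
            },
            "endpoint": request["endpoint"],
            "payload": {},
        },
        "context": {
            "properties": [{
                "namespace": "Alexa.SmartVision.ObjectDetectionSensor",
                "name": "objectDetectionClasses",
                "value": [{"imageNetClass": name} for name in enabled_classes],
                "timeOfSample": now,
                "uncertaintyInMilliseconds": 0,
            }]
        },
    }


sample = {
    "directive": {
        "header": {"correlationToken": "token-123"},
        "endpoint": {"endpointId": "endpoint id", "cookie": {}},
    }
}
response = build_response(sample, ["person", "package"])
```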

SetObjectDetectionClasses directive error handling

If you can't handle a SetObjectDetectionClasses directive successfully, respond with an Alexa.SmartVision.ObjectDetectionSensor.ErrorResponse event. You can also respond with a generic Alexa.ErrorResponse event if your error isn't specific to object detection.

ObjectDetection event

Send the ObjectDetection event to the Alexa Event Gateway when your device recognizes an object from one of the configured object classes. For details, see Send Events to the Event Gateway. On receipt of the event, Alexa notifies the customer about the detected object. Also, Alexa caches the event data by eventIdentifier and endpointId so that the customer can later view and delete the event data.

Assign a unique eventIdentifier for each detected object in the video stream and send one event per detected object. Also, produce at most one detection event per detected object class and video stream. After your skill reports an ObjectDetection event, you must wait at least 30 seconds before sending another ObjectDetection event for the same object.

Important: Report detected objects to Alexa as soon as possible to minimize the latency between the event and notifications to the customer. You can update the event data during the camera session by using the Alexa.DataController interface.
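
The 30-second rule for repeat events about the same object can be enforced with a small gate in front of your event sender. The following is a minimal sketch. The class name `ObjectEventThrottle` is hypothetical, and `now` is any monotonically increasing time in seconds that your code supplies.

```python
class ObjectEventThrottle:
    """Allows at most one ObjectDetection event per object every 30 seconds."""

    MIN_INTERVAL_SECONDS = 30.0

    def __init__(self) -> None:
        self._last_sent: dict[str, float] = {}

    def should_send(self, object_id: str, now: float) -> bool:
        """Return True and record the send time if the object may be reported."""
        last = self._last_sent.get(object_id)
        if last is not None and now - last < self.MIN_INTERVAL_SECONDS:
            return False  # Too soon after the previous event for this object.
        self._last_sent[object_id] = now
        return True


throttle = ObjectEventThrottle()
decisions = [
    throttle.should_send("obj-1", 0.0),
    throttle.should_send("obj-1", 10.0),
    throttle.should_send("obj-2", 10.0),
    throttle.should_send("obj-1", 30.0),
]
```

In practice you might pass `time.monotonic()` as `now`, so that wall-clock adjustments don't affect the interval.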

ObjectDetection event example

The following example shows an ObjectDetection event that you send to Alexa. This example reports the detection of an object in the person class.

{
    "event": {
        "header": {
            "namespace": "Alexa.SmartVision.ObjectDetectionSensor",
            "name": "ObjectDetection",
            "messageId": "Unique identifier, preferably a version 4 UUID",
            "payloadVersion": "1.0"
        },
        "endpoint": {
            "scope": {
                "type": "BearerToken",
                "token": "OAuth2 bearer token"
            },
            "endpointId": "endpoint id"
        },
        "payload": {
            "events": [{
                "eventIdentifier": "2b3409df-d686-4a52-9bba-d361860bac61",
                "imageNetClass": "person",
                "timeOfSample": "2021-07-02T16:20:50.52Z",
                "uncertaintyInMilliseconds": 0,
                "objectIdentifier": "573409df-5486-7c52-b4ba-d361860bac73",
                "frameImageUri": "https://example.com/frames/frame1.jpg",
                "croppedImageUri": "https://example.com/images/image1.jpg"
            }]
        }
    }
}

ObjectDetection event payload

The following table shows the payload details for the ObjectDetection event.

Property	Description	Type	Required
`events`	Objects that the endpoint detected.	Array of objects	Yes
`events[*].eventIdentifier`	Uniquely identifies the event in the event history. You can use the identifier to retrieve and delete the event from the camera stream. Generate an event identifier for each detected object.	Version 4 UUID String	Yes
`events[*].imageNetClass`	Class of the detected object. For valid class names, see WordNet database.	String	Yes
`events[*].objectIdentifier`	Uniquely identifies the physical object detected in this session. Generate an identifier for each detected object. If your device doesn't distinguish between detected objects, don't include this identifier.	Version 4 UUID String	No
`events[*].frameImageUri`	URI to the frame that shows the detected object. The frame gives context to the scene where the device detected the object. If the device extracts frames, you can retrieve the frames by using the `Alexa.DataController` interface.	String	No
`events[*].croppedImageUri`	URI to the cropped image centered on the detected object. If the device extracts cropped images, you can retrieve the images by using the `Alexa.DataController` interface.	String	No
`events[*].timeOfSample`	Time the endpoint detected the object. Defined in ISO 8601 format, `YYYY-MM-DDThh:mm:ssZ`.	String	Yes
`events[*].uncertaintyInMilliseconds`	Uncertainty of the `timeOfSample` in milliseconds. This field represents the number of milliseconds before or after the endpoint detected the object. For example, uncertainty due to the transmission delay between the action in front of the camera and the corresponding object detection processing software.	Number	No
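
The payload fields in the table could be assembled as shown in the following sketch. The helper `build_detection_entry` is hypothetical. It generates a version 4 UUID for the event, formats the time as `YYYY-MM-DDThh:mm:ssZ`, and leaves out optional fields that you don't supply.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional


def build_detection_entry(
    image_net_class: str,
    detected_at: datetime,
    object_identifier: Optional[str] = None,
    frame_image_uri: Optional[str] = None,
    cropped_image_uri: Optional[str] = None,
    uncertainty_ms: Optional[int] = None,
) -> dict:
    """Build one element of the events array for an ObjectDetection event.

    detected_at should be timezone-aware. A naive datetime is interpreted as
    local time by astimezone().
    """
    entry = {
        "eventIdentifier": str(uuid.uuid4()),
        "imageNetClass": image_net_class,
        "timeOfSample": detected_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    optional = {
        "objectIdentifier": object_identifier,
        "frameImageUri": frame_image_uri,
        "croppedImageUri": cropped_image_uri,
        "uncertaintyInMilliseconds": uncertainty_ms,
    }
    # Include only the optional fields that the device actually produced.
    entry.update({key: value for key, value in optional.items() if value is not None})
    return entry


entry = build_detection_entry(
    "person",
    datetime(2021, 7, 2, 16, 20, 50, tzinfo=timezone.utc),
    frame_image_uri="https://example.com/frames/frame1.jpg",
)
payload = {"events": [entry]}
```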

ObjectDetection response

If your skill proactively sent an ObjectDetection event and Alexa handles the event successfully, your skill receives HTTP status code 202 Accepted. On error, Alexa sends the appropriate HTTP status code.

The following table shows the HTTP status codes sent in response to the event.

Status	Description
`202 Accepted`	Operation succeeded.
`400 Bad Request`	Indicates that the request is invalid or badly formatted. Verify the event payload and check for any missing or invalid fields.
`401 Unauthorized`	Indicates that the request didn't include the authorization token or the token is invalid or expired.
`403 Forbidden`	Indicates that the authorization token doesn't have sufficient permissions or the skill is disabled.
`404 Not Found`	Indicates that the skill doesn't exist in the corresponding stage.
`413 Request Entity Too Large`	Maximum number or size of a parameter exceeds the limit.
`429 Too Many Requests`	Number of requests per minute is too high. Use exponential back-off and retry the request.
`500 Internal Server Error`	An error occurred on the server. The skill can retry by using exponential back-off.
`503 Service Unavailable`	Server is busy or unavailable. The skill can retry by using exponential back-off.
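
The retry guidance in the table can be reduced to a small decision function. The following sketch assumes a base delay of one second that doubles on each attempt, capped at 60 seconds. Those numbers, and the function name `next_action`, are illustrative choices, not values that Alexa prescribes.

```python
RETRYABLE_STATUS_CODES = {429, 500, 503}


def next_action(status: int, attempt: int,
                base_delay: float = 1.0, max_delay: float = 60.0) -> tuple[str, float]:
    """Decide what to do after the event gateway answers.

    Returns ("done", 0.0) on success, ("retry", delay_seconds) for statuses
    that the table marks as retryable, and ("fail", 0.0) for everything else.
    attempt is zero-based: 0 for the first retry.
    """
    if status == 202:
        return ("done", 0.0)
    if status in RETRYABLE_STATUS_CODES:
        delay = min(max_delay, base_delay * (2 ** attempt))
        return ("retry", delay)
    # 400, 401, 403, 404, and 413 need a fix to the request or credentials,
    # so repeating the same request wouldn't help.
    return ("fail", 0.0)


actions = [next_action(202, 0), next_action(429, 3), next_action(503, 10), next_action(401, 0)]
```

A production sender would usually add random jitter to the delay so that many devices don't retry at the same moment.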

Event update and deletion

As soon as your device recognizes a configured object, you send the ObjectDetection event to Alexa. On receipt of the event, Alexa sends a notification to the customer. As the video stream continues, you can update the event data by using the Alexa.DataController interface to send a DataReport event to Alexa. For example, you might want to aggregate frames of the same physical object. Alexa doesn't send a notification when you update the event data.

You can also use the Alexa.DataController interface to delete data stored on Alexa. For example, you send a DataDeleted event to Alexa when the customer deletes the detection event directly from your camera or your camera app.

Smart-vision data schema example

You send smart-vision event data as an array in the data property of the DataReport event.

The following example shows a DataReport event from a smart camera that includes data for two frames and the associated images of the detected object.

{
    "event": {
        "header": {
            "namespace": "Alexa.DataController",
            "name": "DataReport",
            "instance": "Camera.SmartVisionData",
            "messageId": "Unique identifier, preferably a version 4 UUID",
            "correlationToken": "Opaque correlation token that matches the request",
            "payloadVersion": "1.0"
        },
        "endpoint": {
            "scope": {
                "type": "BearerToken",
                "token": "OAuth2 bearer token"
            },
            "endpointId": "endpoint id"
        },
        "payload": {
            "paginationContext": {
                "nextToken": "token"
            },
            "dataSchema": {
                "type": "JSON",
                "schema": "SmartVisionData"
            },
            "data": [{
                    "eventIdentifier": "2b3409df-d686-4a52-9bba-d361860bac61",
                    "imageNetClass": "person",
                    "mediaId": "2c3409df-d686-4a52-9bba-d361860bac61",
                    "objectIdentifier": "4a3409df-d686-4a52-9bba-d361860bacbf",
                    "frameIndex": 2,
                    "frameWidthInPixels": 1980,
                    "frameHeightInPixels": 1080,
                    "frameImageUri": "https://example.com/frames/frame1.jpg",
                    "croppedImageUri": "https://example.com/images/image1.jpg"
                },
                {
                    "eventIdentifier": "2b3409df-d686-4a52-9bba-d361860bac61",
                    "imageNetClass": "person",
                    "mediaId": "2c3409df-d686-4a52-9bba-d361860bac62",
                    "objectIdentifier": "4a3409df-d686-4a52-9bba-d361860bacbf",
                    "frameIndex": 5,
                    "frameWidthInPixels": 1980,
                    "frameHeightInPixels": 1080,
                    "frameImageUri": "https://example.com/frames/frame2.jpg",
                    "croppedImageUri": "https://example.com/images/image2.jpg"
                }
            ]
        }
    }
}

Smart-vision data schema definition

The following table shows the JSON data schema defined by the Alexa.SmartVision.ObjectDetectionSensor interface.

Property	Description	Type	Required
`eventIdentifier`	Uniquely identifies the event in the event history. You can send updated data for the same camera session and customer.	Version 4 UUID String	Yes
`imageNetClass`	Class of the detected object.	String	Yes
`mediaId`	Uniquely identifies the media recording in which the event occurred.	Version 4 UUID String	No
`frameIndex`	Frame number.	Integer	No
`frameWidthInPixels`	Width of the frame in pixels.	Integer	No
`frameHeightInPixels`	Height of the frame in pixels.	Integer	No
`objectIdentifier`	Uniquely identifies the physical object detected in this session. Generate an identifier for each detected object. If your device doesn't distinguish between detected objects, don't include this identifier.	Version 4 UUID String	No
`frameImageUri`	URI to the frame that shows the detected object. The frame gives context to the scene where the device detected the object. If the device extracts frames, you can retrieve the frames by using the `Alexa.DataController` interface.	String	No
`croppedImageUri`	URI to the cropped image centered on the detected object. If the device extracts cropped images, you can retrieve the images by using the `Alexa.DataController` interface.	String	No
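
Before sending a DataReport, you might check each data entry against the schema table above. The following is a minimal sketch of such a check. The function `validate_entry` and its message strings are hypothetical, and the check covers only field presence and basic JSON types, not UUID or URI syntax.

```python
FIELD_TYPES = {
    "eventIdentifier": str,
    "imageNetClass": str,
    "mediaId": str,
    "frameIndex": int,
    "frameWidthInPixels": int,
    "frameHeightInPixels": int,
    "objectIdentifier": str,
    "frameImageUri": str,
    "croppedImageUri": str,
}
REQUIRED_FIELDS = ("eventIdentifier", "imageNetClass")


def validate_entry(entry: dict) -> list[str]:
    """Return a list of problems found in one SmartVisionData entry."""
    problems = []
    for name in REQUIRED_FIELDS:
        if name not in entry:
            problems.append(f"missing required field: {name}")
    for name, value in entry.items():
        expected = FIELD_TYPES.get(name)
        if expected is None:
            problems.append(f"unknown field: {name}")
        elif isinstance(value, bool) or not isinstance(value, expected):
            # bool is a subclass of int in Python, so reject it explicitly.
            problems.append(f"wrong type for {name}: expected {expected.__name__}")
    return problems


good = validate_entry({
    "eventIdentifier": "2b3409df-d686-4a52-9bba-d361860bac61",
    "imageNetClass": "person",
    "frameIndex": 2,
})
bad = validate_entry({"imageNetClass": "person", "frameIndex": "2"})
```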

State reporting

Alexa sends a ReportState directive to request information about the state of an endpoint. When Alexa sends a ReportState directive, you send a StateReport event in response. The response contains the current state of all retrievable properties in the context object. You identify your retrievable properties in your discovery response. For details about state reports, see Understand State and Change Reporting.

StateReport response example

In this example, the smart-vision endpoint supports the person and package object detection classes.

{
    "event": {
        "header": {
            "namespace": "Alexa",
            "name": "StateReport",
            "messageId": "Unique identifier, preferably a version 4 UUID",
            "correlationToken": "Opaque correlation token that matches the request",
            "payloadVersion": "3"
        },
        "endpoint": {
            "scope": {
                "type": "BearerToken",
                "token": "OAuth2 bearer token"
            },
            "endpointId": "endpoint id",
            "cookie": {}
        },
        "payload": {}
    },
    "context": {
        "properties": [{
                "namespace": "Alexa.SmartVision.ObjectDetectionSensor",
                "name": "objectDetectionClasses",
                "value": [{
                        "imageNetClass": "person"
                    },
                    {
                        "imageNetClass": "package"
                    }
                ],
                "timeOfSample": "2024-07-03T11:20:50.52Z",
                "uncertaintyInMilliseconds": 0
            },
            {
                "namespace": "Alexa.EndpointHealth",
                "name": "connectivity",
                "value": {
                    "value": "OK"
                },
                "timeOfSample": "2024-07-03T10:45:00.52Z",
                "uncertaintyInMilliseconds": 0
            }
        ]
    }
}

Change reporting

You send a ChangeReport event to report changes proactively in the state of an endpoint. You identify the properties that you proactively report in your discovery response. For details about change reports, see Understand State and Change Reporting.

The payload contains the values of the properties that changed. The context contains the values of other relevant properties.
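
The split between payload and context can be sketched as follows, assuming your skill keeps the previously reported state and the current state as dictionaries keyed by `(namespace, name)`. The function name is hypothetical, and the sketch uses one shared `timeOfSample` for all properties, which is a simplification. The header and endpoint objects are omitted.

```python
def build_change_report_body(previous: dict, current: dict,
                             time_of_sample: str, cause_type: str) -> dict:
    """Put changed properties in the payload and unchanged ones in the context."""

    def as_property(key: tuple[str, str], value) -> dict:
        namespace, name = key
        return {
            "namespace": namespace,
            "name": name,
            "value": value,
            "timeOfSample": time_of_sample,
            "uncertaintyInMilliseconds": 0,
        }

    changed = [as_property(k, v) for k, v in current.items() if previous.get(k) != v]
    unchanged = [as_property(k, v) for k, v in current.items() if previous.get(k) == v]
    return {
        "payload": {"change": {"cause": {"type": cause_type}, "properties": changed}},
        "context": {"properties": unchanged},
    }


SENSOR = ("Alexa.SmartVision.ObjectDetectionSensor", "objectDetectionClasses")
HEALTH = ("Alexa.EndpointHealth", "connectivity")
old_state = {
    SENSOR: [{"imageNetClass": "person"}, {"imageNetClass": "package"}],
    HEALTH: {"value": "OK"},
}
new_state = {SENSOR: [{"imageNetClass": "person"}], HEALTH: {"value": "OK"}}
body = build_change_report_body(old_state, new_state,
                                "2024-07-03T10:20:50Z", "PHYSICAL_INTERACTION")
```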

ChangeReport event example

The following example shows a ChangeReport event after the customer changes their preference for which objects to detect.

{
    "event": {
        "header": {
            "namespace": "Alexa",
            "name": "ChangeReport",
            "messageId": "Unique identifier, preferably a version 4 UUID",
            "payloadVersion": "3"
        },
        "endpoint": {
            "scope": {
                "type": "BearerToken",
                "token": "OAuth2 bearer token"
            },
            "endpointId": "endpoint id"
        },
        "payload": {
            "change": {
                "cause": {
                    "type": "PHYSICAL_INTERACTION"
                },
                "properties": [{
                    "namespace": "Alexa.SmartVision.ObjectDetectionSensor",
                    "name": "objectDetectionClasses",
                    "value": [{
                        "imageNetClass": "person"
                    }],
                    "timeOfSample": "2024-07-03T10:20:50.52Z",
                    "uncertaintyInMilliseconds": 0
                }]
            }
        }
    },
    "context": {
        "properties": [{
            "namespace": "Alexa.EndpointHealth",
            "name": "connectivity",
            "value": {
                "value": "OK"
            },
            "timeOfSample": "2024-07-03T10:19:02.12Z",
            "uncertaintyInMilliseconds": 60000
        }]
    }
}

Last updated: Aug 23, 2024