Skill defects and errors degrade interaction quality, cause frustration, and diminish the customer experience. Amazon’s research has found that new smart home customers are more likely to abandon their devices in the first few weeks if they run into issues with them. Consider a scenario where customers try to turn on their living room lights using Alexa, only to be met with "Sorry, living room light isn’t responding." Customers who hit errors like this tend to lose trust in smart home technology and may abandon both the skill and the product.
In this article, we identify the common error types and offer solutions to address them so that third-party (3P) device makers/developers like you can take appropriate steps to deliver high-quality smart home experiences.
Common error types
#1: ENDPOINT_UNREACHABLE
This error type is reported when the endpoint has either gone offline or has been disconnected from the network.
Solution: If you know that a device has remained offline or disconnected for a long period of time, it is safe to assume that the device is non-functional. In this case, we advise prompting the customer to reconnect the device from your app, or to remove it if it is no longer in use. If you decide to remove such a device, you then need to send a DeleteReport event so that Alexa can remove the endpoint, which reduces unnecessary traffic to and from Alexa.
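The DeleteReport mentioned above can be sketched as a small payload builder. This is a minimal sketch, assuming the Alexa.Discovery DeleteReport event shape with payload version 3. The function name build_delete_report, the endpoint ID, and the token value are hypothetical, and actually sending the event to Alexa with a valid LWA access token is left out.

```python
import json
import uuid


def build_delete_report(endpoint_ids, access_token):
    """Build a DeleteReport event for endpoints removed from the 3P account.

    Assumption: access_token is a valid LWA access token for the customer.
    """
    return {
        "event": {
            "header": {
                "namespace": "Alexa.Discovery",
                "name": "DeleteReport",
                "messageId": str(uuid.uuid4()),  # unique ID per message
                "payloadVersion": "3",
            },
            "payload": {
                # One entry per endpoint that Alexa should forget.
                "endpoints": [{"endpointId": eid} for eid in endpoint_ids],
                "scope": {"type": "BearerToken", "token": access_token},
            },
        }
    }


if __name__ == "__main__":
    # Hypothetical endpoint ID and token, for illustration only.
    report = build_delete_report(["living-room-light-01"], "example-lwa-token")
    print(json.dumps(report, indent=2))
```

Several long-unreachable endpoints for the same customer can be batched into one report by passing all of their IDs in endpoint_ids.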
#2: SKILL_RESPONSE_TIMEOUT_EXCEPTION
This error can occur when the skill back-end times out after waiting for a response for more than eight seconds. However, this error can also occur when too many requests are sent at once, and the Lambda function gets overwhelmed. For example, you can experience decreased success rates for your control directives due to large spikes in traffic, primarily caused by customer routines executing at specific times (e.g., 7 AM for morning routines). Without proper planning, these large spikes in requests can overwhelm a provider's services. Additionally, Amazon Web Services (AWS) Lambda has a few limitations regarding scaling. If those limits are exceeded, Lambda will start rejecting requests (throttling) to manage the load. As a result, it would appear to the customer that the routine has not executed as expected.
Solution: You can address this error by following the below steps:
#3: INTERNAL_ERROR
This error type reports errors caused by internal service failures on your end that you cannot avoid. However, we have observed that this error is sometimes misreported. We recommend that you always send a more specific error type when one applies. For skill errors, the more accurate the error type, the better Alexa can guide customers through appropriate steps and offer a suitable speech response.
Solution: You can address this error by following the below steps:
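The recommendation to report a more specific error type instead of INTERNAL_ERROR can be illustrated as follows. This is a minimal sketch, assuming an Alexa ErrorResponse event with payload version 3. The exception classes, the ERROR_TYPES mapping, and the function names are hypothetical stand-ins for whatever failures your own backend raises.

```python
import uuid


class DeviceOfflineError(Exception):
    """Hypothetical: the device cloud could not reach the device."""


class ValueOutOfRangeError(Exception):
    """Hypothetical: the requested value is outside the device's range."""


# Hypothetical mapping from backend exceptions to Alexa error types.
ERROR_TYPES = {
    DeviceOfflineError: "ENDPOINT_UNREACHABLE",
    ValueOutOfRangeError: "VALUE_OUT_OF_RANGE",
}


def error_type_for(exc):
    """Pick the most specific known error type, else INTERNAL_ERROR."""
    for exc_class, error_type in ERROR_TYPES.items():
        if isinstance(exc, exc_class):
            return error_type
    return "INTERNAL_ERROR"


def build_error_response(directive, exc):
    """Build an ErrorResponse for the directive that failed with exc."""
    request_header = directive["directive"]["header"]
    header = {
        "namespace": "Alexa",
        "name": "ErrorResponse",
        "messageId": str(uuid.uuid4()),
        "payloadVersion": "3",
    }
    # Echo the correlation token only when the directive carried one.
    if "correlationToken" in request_header:
        header["correlationToken"] = request_header["correlationToken"]
    return {
        "event": {
            "header": header,
            "endpoint": {
                "endpointId": directive["directive"]["endpoint"]["endpointId"]
            },
            "payload": {"type": error_type_for(exc), "message": str(exc)},
        }
    }
```

With a mapping like this, INTERNAL_ERROR becomes the fallback for genuinely unexpected failures rather than the default for every exception.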
#4: INVALID_SKILL_RESPONSE_EXCEPTION
Invalid responses to skill directives can negatively impact the Alexa Smart Home customer experience. Examples include a malformed response object or missing properties in the response payload. When the skill response does not comply with the JSON format defined by the documented payload schema and its object definitions, Alexa reports an INVALID_SKILL_RESPONSE_EXCEPTION.
Solution: You can address this error through the following steps:
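One way to catch malformed responses before they reach Alexa is a structural check in the skill itself. The sketch below is illustrative only: it checks a small, assumed subset of the response envelope (the event object, four header fields, and a payload object) and is not a substitute for validating against the full documented payload schema. The names REQUIRED_HEADER_FIELDS and find_response_problems are hypothetical.

```python
# Assumed minimal set of header fields; the documented schema is the
# authoritative source for what each response type requires.
REQUIRED_HEADER_FIELDS = ("namespace", "name", "messageId", "payloadVersion")


def find_response_problems(response):
    """Return a list of structural problems found in a skill response."""
    event = response.get("event") if isinstance(response, dict) else None
    if not isinstance(event, dict):
        return ["missing 'event' object"]

    problems = []
    header = event.get("header")
    if not isinstance(header, dict):
        problems.append("missing 'event.header' object")
    else:
        for field in REQUIRED_HEADER_FIELDS:
            if not header.get(field):
                problems.append(f"missing 'event.header.{field}'")

    if not isinstance(event.get("payload"), dict):
        problems.append("missing 'event.payload' object")
    return problems


if __name__ == "__main__":
    broken = {"event": {"header": {"namespace": "Alexa", "name": "Response"}}}
    for problem in find_response_problems(broken):
        print(problem)
```

Logging the returned problems together with the directive's message ID makes it easier to match a rejected response to its log entry later.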
#5: NO_SUCH_ENDPOINT
Skills might receive directives for endpoints that were previously deleted from the 3P app. This can happen because a DeleteReport sent to the Alexa Event Service can fail, either due to expired or invalid Login with Amazon (LWA) tokens or due to infrastructure failures such as server errors on the Alexa Event Service.
Solution: If your skill receives directives for endpoints that are not on the customer's linked 3P account, we recommend responding with NO_SUCH_ENDPOINT and sending a DeleteReport with the endpoint ID so that the endpoint can be removed from the customer's Alexa account.
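The first half of this recommendation, answering with NO_SUCH_ENDPOINT, might look like the following. This is a minimal sketch, assuming an Alexa ErrorResponse event with payload version 3. The function name no_such_endpoint_response and the known_endpoint_ids lookup are hypothetical; in practice the set of known endpoints would come from your own device cloud.

```python
import uuid


def no_such_endpoint_response(directive, known_endpoint_ids):
    """Return an ErrorResponse if the endpoint is unknown, else None.

    known_endpoint_ids: endpoint IDs currently on the customer's linked
    3P account (assumed to be loaded by the caller).
    """
    request_header = directive["directive"]["header"]
    endpoint_id = directive["directive"]["endpoint"]["endpointId"]
    if endpoint_id in known_endpoint_ids:
        return None  # endpoint exists; handle the directive normally

    header = {
        "namespace": "Alexa",
        "name": "ErrorResponse",
        "messageId": str(uuid.uuid4()),
        "payloadVersion": "3",
    }
    # Echo the correlation token only when the directive carried one.
    if "correlationToken" in request_header:
        header["correlationToken"] = request_header["correlationToken"]
    return {
        "event": {
            "header": header,
            "endpoint": {"endpointId": endpoint_id},
            "payload": {
                "type": "NO_SUCH_ENDPOINT",
                "message": f"Endpoint {endpoint_id} is not on the linked account",
            },
        }
    }
```

When the function returns an error response, the caller would also queue a DeleteReport for the same endpoint ID so that Alexa stops sending directives for it.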
Dashboard to track error types
Now that we have covered the common error types and their mitigation steps, you can use the error dashboard in the Alexa Developer Console to review the top errors that contribute to a negative experience for your customers. Clicking on an error brings up the most recent message IDs linked to it. Even if you purge logs every few days, the dashboard provides the most recent data. The dashboard also lets you download the message IDs related to the errors. You can then use these message IDs to look up logs for your AWS Lambda function, find the root cause of the error, and fix issues in your skill implementation. Detailed steps can be found here.
Click on the download icon to download the message IDs and root-cause the issue.
You can use CloudWatch Logs Insights to search for the message IDs if the regular CloudWatch log search fails. Take a look at an example query below.
fields @timestamp, @message | filter @message like /<messageId>/
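When the dashboard download contains many message IDs, a small helper can generate one query instead of searching for each ID by hand. This is a minimal sketch: the function name build_insights_query and its limit parameter are hypothetical, and it assumes message IDs consist only of letters, digits, and hyphens so that they can be placed safely inside the /.../ pattern.

```python
import re

# Assumption: message IDs are UUID-like (letters, digits, hyphens only).
SAFE_MESSAGE_ID = re.compile(r"[A-Za-z0-9-]+")


def build_insights_query(message_ids, limit=100):
    """Build a CloudWatch Logs Insights query matching any given message ID."""
    if not message_ids:
        raise ValueError("at least one message ID is required")
    for message_id in message_ids:
        if not SAFE_MESSAGE_ID.fullmatch(message_id):
            raise ValueError(f"unexpected characters in message ID: {message_id!r}")

    # One 'like' clause per ID, joined with 'or'.
    conditions = " or ".join(f"@message like /{mid}/" for mid in message_ids)
    return (
        "fields @timestamp, @message"
        f" | filter {conditions}"
        " | sort @timestamp desc"
        f" | limit {limit}"
    )


if __name__ == "__main__":
    # Hypothetical message IDs, for illustration only.
    print(build_insights_query(["1a2b3c4d-0000-1111-2222-333344445555",
                                "9f8e7d6c-aaaa-bbbb-cccc-ddddeeeeffff"]))
```

The resulting string can be pasted into the Logs Insights console for the log group of your skill's Lambda function.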