December 22, 2017 | Basit Hussain
As customers give and receive new Alexa-enabled devices around the holidays, you may experience an increase in skill usage. It’s important to scale your backend to prepare for such traffic spikes.
Most skills use the AWS Lambda compute service for backend processing and Amazon DynamoDB or Amazon Simple Storage Service (Amazon S3) for data persistence. If you use AWS for your backend, the best practices below will help you scale AWS Lambda and other AWS services to handle spikes in skill usage, and show you how to set up Amazon CloudWatch alarms for monitoring.
Concurrent executions refers to the number of executions of your function code happening at any given time. AWS Lambda has a default safety throttle of 1,000 concurrent executions per account per region. This means that if you have multiple Lambda functions for multiple skills under the same account, all of those skills together cannot exceed 1,000 concurrent executions. To prevent other functions from consuming concurrency that should be dedicated to your Alexa skill, AWS Lambda lets you set a maximum concurrency on a per-function basis. To request a limit increase for concurrent executions, create an AWS support case. For more information, refer to our guide, How to Scale Your Alexa Skill Using Amazon Web Services.
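As a sketch, you can reserve per-function concurrency with the boto3 `put_function_concurrency` API. The function name and the value of 100 here are placeholders; this assumes AWS credentials are configured for the account that owns the function.

```python
import boto3

# Hypothetical function name; requires configured AWS credentials.
lambda_client = boto3.client("lambda")

# Reserve 100 concurrent executions for the skill's function so that
# other functions in the account cannot starve it of concurrency.
lambda_client.put_function_concurrency(
    FunctionName="alexa-skill-handler",
    ReservedConcurrentExecutions=100,
)
```

Note that reserved concurrency is subtracted from the account's shared pool, so size it against your account limit.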
By default, each Lambda function has 128 MB of memory and a 3-second timeout. You can find these values in Basic settings under the Configuration tab for each Lambda function in the AWS Console. Increase the timeout from the default of 3 seconds to 7 seconds. The Alexa service timeout is 8 seconds, so make sure you get a response back from Lambda before Alexa times out.
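The same change can be scripted instead of clicked through in the console. A minimal sketch using boto3's `update_function_configuration`, assuming a placeholder function name and configured AWS credentials:

```python
import boto3

# Hypothetical function name; requires configured AWS credentials.
lambda_client = boto3.client("lambda")

# Raise the timeout to 7 seconds (safely under the 8-second Alexa
# service timeout) and bump memory above the 128 MB default.
lambda_client.update_function_configuration(
    FunctionName="alexa-skill-handler",
    Timeout=7,
    MemorySize=256,
)
```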
Increase the memory (MB) allocation depending on your needs and expected traffic. If your code is CPU-bound or memory-bound, it might be cost-effective to allocate more memory. Allocating more memory also provides more CPU power to your function, so it can execute faster. This can result in an improved customer experience through faster response times, as well as a lower running cost by reducing your Lambda function's execution time. Execution time also affects concurrency: when functions run faster, there are fewer concurrent executions for the same number of invocations.
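The relationship between execution time and concurrency follows directly from Little's law: steady-state concurrency is roughly the invocation rate multiplied by the average duration. A quick back-of-the-envelope helper (the numbers below are illustrative, not from the original post):

```python
def estimated_concurrency(invocations_per_second: float,
                          avg_duration_seconds: float) -> float:
    """Rough steady-state concurrency estimate (Little's law):
    concurrent executions ~= request rate x average duration."""
    return invocations_per_second * avg_duration_seconds

# 100 invocations/sec at a 2-second average duration needs about
# 200 concurrent executions...
print(estimated_concurrency(100, 2.0))   # 200.0

# ...but the same traffic at 0.5 seconds needs only about 50,
# which is why faster functions consume less of your limit.
print(estimated_concurrency(100, 0.5))   # 50.0
```

This is why cutting execution time in half also halves the concurrency the same traffic consumes.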
We recommend configuring CloudWatch alarms on the Throttles metric for Lambda. It measures the number of Lambda function invocation attempts that are throttled because invocation rates exceed the account's concurrency limits (error code 429). Failed invocations may trigger a retry attempt that succeeds. Set up CloudWatch monitoring and alerts for your Lambda function so that you can track your function's health and get alerted when something goes wrong, such as when you're being throttled.
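A sketch of such an alarm using boto3's `put_metric_alarm`. The alarm name, function name, and SNS topic ARN are placeholders, and the call assumes configured AWS credentials:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm whenever any invocation is throttled in a one-minute window.
# Function name and SNS topic ARN below are hypothetical.
cloudwatch.put_metric_alarm(
    AlarmName="alexa-skill-lambda-throttles",
    Namespace="AWS/Lambda",
    MetricName="Throttles",
    Dimensions=[{"Name": "FunctionName", "Value": "alexa-skill-handler"}],
    Statistic="Sum",
    Period=60,
    EvaluationPeriods=1,
    Threshold=0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],
)
```

A threshold of zero throttles is deliberately strict: during a holiday traffic spike you want to know about the very first throttled invocation, not the thousandth.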
Even if you generate application logs for Amazon DynamoDB events, we still recommend creating CloudWatch alarms on your Amazon DynamoDB tables and their corresponding global secondary indexes (GSIs) for additional visibility. Because Amazon DynamoDB is a fully managed database service, the only metrics available are those related to your application's calls to DynamoDB. At a minimum, consider creating alarms for ConsumedReadCapacityUnits, ConsumedWriteCapacityUnits, WriteThrottleEvents, ReadThrottleEvents, SuccessfulRequestLatency, and SystemErrors.
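DynamoDB alarms follow the same pattern as the Lambda example, just with the `AWS/DynamoDB` namespace and a `TableName` dimension. A sketch for one of the recommended metrics, WriteThrottleEvents, with a hypothetical table name and assuming configured AWS credentials:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Hypothetical table name; alarm on any throttled write in a
# five-minute window. Add a GSI alarm by using the
# GlobalSecondaryIndexName dimension alongside TableName.
cloudwatch.put_metric_alarm(
    AlarmName="skill-table-write-throttles",
    Namespace="AWS/DynamoDB",
    MetricName="WriteThrottleEvents",
    Dimensions=[{"Name": "TableName", "Value": "SkillUserState"}],
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=0,
    ComparisonOperator="GreaterThanThreshold",
)
```

Repeat the same call for the other metrics listed above, adjusting the statistic and threshold to fit each one.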
When you establish an AWS account, it has default limits for the maximum read and write capacity units you can provision across all of your Amazon DynamoDB tables in a given region, as well as per-table limits that apply when you create a table. Provisioned capacity limits can be increased at any time, and there's virtually no limit to the size and throughput an Amazon DynamoDB table can reach. You can view the initial default limits on the Limits page in the Amazon DynamoDB Developer Guide; your account might have higher limits if you have worked with AWS Support in the past to increase them.
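When sizing provisioned capacity against those limits, it helps to work through the standard DynamoDB unit rules: one write capacity unit covers one write per second of an item up to 1 KB, and one read capacity unit covers one strongly consistent read per second of an item up to 4 KB (or two eventually consistent reads). A small sketch of that arithmetic, with illustrative numbers not taken from the original post:

```python
import math

def write_capacity_units(item_size_kb: float,
                         writes_per_second: float) -> int:
    """One WCU = one write/sec of an item up to 1 KB; larger items
    consume one WCU per 1 KB, rounded up."""
    return math.ceil(item_size_kb) * math.ceil(writes_per_second)

def read_capacity_units(item_size_kb: float, reads_per_second: float,
                        strongly_consistent: bool = True) -> int:
    """One RCU = one strongly consistent read/sec of an item up to
    4 KB, or two eventually consistent reads."""
    units = math.ceil(item_size_kb / 4) * reads_per_second
    if not strongly_consistent:
        units /= 2
    return math.ceil(units)

# 50 writes/sec of 1.5 KB items: 2 WCU per write -> 100 WCU.
print(write_capacity_units(1.5, 50))                           # 100
# 80 eventually consistent reads/sec of 3 KB items -> 40 RCU.
print(read_capacity_units(3, 80, strongly_consistent=False))   # 40
```

Comparing numbers like these against your account and per-table limits tells you whether you need a limit increase before the traffic spike arrives.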
By following these recommendations, you can avoid any possible service degradation. Download our guide, How to Scale Your Alexa Skill with Amazon Web Services, to prepare your skill for growth.
Enable more customers to engage with your skill, and get paid for eligible skills that customers love. Every month, developers can earn money for eligible skills that drive the highest customer engagement in seven eligible skill categories. Learn more and start scaling your skill today.
Special thanks to Kashif Imran, Senior Solutions Architect at AWS, for his contributions to this blog post and the AWS scaling guide.