When building for voice, speed matters. When you speak to a friend, you typically expect a fast response, which reassures you that your friend is listening and engaged in the conversation. Similarly, when customers talk to Alexa, they expect her to respond quickly. For your Alexa skill, high performance is important for engaging customers and retaining their interest.
In this post, we present a list of AWS Lambda best practices and Amazon DynamoDB and Amazon S3 resources for creating high performance Alexa skills. Depending on your needs, skill traffic and availability across different regions, implement the relevant best practices to address any issues identified and improve overall User Perceived Latency (UPL).
If you would like to refresh your knowledge or familiarize yourself with the basic concepts, check out the AWS Lambda Getting Started and Using AWS Lambda with Alexa Skills guides.
Tip #1: Improve Your Alexa Skill Latency by Provisioning the Right Amount of Memory
Lambda allocates CPU in proportion to the memory you allocate to your function. If you provision more than 1.8GB of memory, Lambda assigns multiple CPU cores. When choosing the amount of memory for your function, consider:
- Whether your function is memory or CPU bound. For CPU-intensive functions, provisioning more memory may lead to faster function execution and therefore lower cost, even if not all of the allocated memory is used.
- Whether you will benefit from multi-core processing based on your code and chosen language. If you do not use multi-threading, you might not see any improvement beyond 1.8GB.
It is important to test your function at different memory values and compare memory usage and duration to identify the optimal amount of memory.
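One way to compare test runs at different memory values is to convert each measurement to GB-seconds (allocated memory multiplied by duration), the unit Lambda's duration charge is based on. The following is a minimal sketch; the measurements are made-up values for illustration, not real benchmark results.

```python
def gb_seconds(memory_mb: int, duration_ms: float) -> float:
    """GB-seconds used by one invocation: allocated memory times duration."""
    return (memory_mb / 1024) * (duration_ms / 1000)


def cheapest_setting(measurements: dict) -> int:
    """Return the memory size (MB) whose average invocation uses the fewest GB-seconds."""
    return min(measurements, key=lambda mb: gb_seconds(mb, measurements[mb]))


# Hypothetical load-test averages: memory (MB) -> average duration (ms).
measured = {256: 1800.0, 512: 800.0, 1024: 420.0, 2048: 400.0}

for mb, ms in measured.items():
    print(f"{mb} MB: {ms:.0f} ms, {gb_seconds(mb, ms):.3f} GB-s")
print("Lowest GB-seconds:", cheapest_setting(measured), "MB")
```

With these sample numbers, 512 MB uses the fewest GB-seconds, but 1024 MB costs only slightly more and responds almost twice as fast, which may be the better trade-off for a voice experience. Doubling again to 2048 MB barely changes the duration, the pattern you might see when the code cannot use additional cores.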
Tip #2: Adjust the Function Timeout
Your AWS Lambda function timeout value should be less than 8 seconds, which is the maximum amount of time Alexa waits for a skill to respond. You can load test your skill to understand how long your function takes to execute and to detect issues with any API calls to other services that may not be set up to handle Lambda scaling. Keep in mind that setting the timeout higher than necessary lets slow invocations run longer, which may increase the function duration and, as a result, your concurrency requirements and costs.
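Inside the function, you can also keep outbound API calls within the time budget. The sketch below derives a timeout for a downstream call from the time the invocation has left, using the `get_remaining_time_in_millis()` method of the Lambda context object. The safety margin, the per-call cap, and the `FakeContext` class are assumptions for illustration only.

```python
ALEXA_RESPONSE_LIMIT_MS = 8000  # maximum time Alexa waits, per the text above
SAFETY_MARGIN_MS = 500          # assumed headroom for building the response


def downstream_timeout_seconds(context, cap_seconds: float = 3.0) -> float:
    """Pick a timeout for an outbound call from the time this invocation has left."""
    remaining_ms = min(context.get_remaining_time_in_millis(), ALEXA_RESPONSE_LIMIT_MS)
    budget = max((remaining_ms - SAFETY_MARGIN_MS) / 1000, 0.0)
    return min(budget, cap_seconds)


class FakeContext:
    """Stand-in for the Lambda context object, for local experiments only."""

    def __init__(self, remaining_ms: int):
        self._remaining_ms = remaining_ms

    def get_remaining_time_in_millis(self) -> int:
        return self._remaining_ms


print(downstream_timeout_seconds(FakeContext(7000)))
print(downstream_timeout_seconds(FakeContext(2000)))
```

When the result is 0, there is no time left for the call, so the handler would be better off skipping it and returning a fallback response.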
Tip #3: Avoid Request Throttling by Setting the Concurrency
Your skill's AWS Lambda function receives a request every time the user interacts with your skill, when an event is dispatched to it, or when it gets invoked by another service. The term concurrency refers to the number of instances of your Lambda function serving requests at any given time. By default, your AWS account has a concurrency limit of 1,000 concurrent executions across all functions in a region, and Lambda starts throttling requests when this limit is exceeded. To ensure you have configured the right values for concurrency, use the formula below to estimate your requirements:
Concurrency = requests per second * average function duration (in seconds)
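As a worked example of the formula, here is a small sketch. The traffic figures are hypothetical, and the result is rounded up because a partially used instance still counts as one instance.

```python
import math

DEFAULT_ACCOUNT_LIMIT = 1000  # default regional concurrency limit mentioned above


def estimate_concurrency(requests_per_second: float, avg_duration_seconds: float) -> int:
    """Estimate concurrent executions as request rate times average duration."""
    return math.ceil(requests_per_second * avg_duration_seconds)


# Hypothetical traffic: 200 requests per second, 0.5 s average duration.
typical = estimate_concurrency(200, 0.5)
# Hypothetical peak: 3,000 requests per second at the same duration.
peak = estimate_concurrency(3000, 0.5)

print(typical, typical > DEFAULT_ACCOUNT_LIMIT)
print(peak, peak > DEFAULT_ACCOUNT_LIMIT)
```

The first case needs about 100 concurrent executions, well within the default limit. The second needs about 1,500, so it would be throttled unless the limit is raised or the function duration is reduced.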
You can use AWS Lambda metrics on CloudWatch such as ConcurrentExecutions, Invocations, and Duration to monitor concurrency. You can also set up a CloudWatch alarm to get notified if a metric threshold is surpassed, and AWS Budgets to monitor costs. If your concurrency exceeds the default limit, you can submit a request in the Support Center console to increase it. You can also use reserved concurrency for a function to limit capacity and control scaling and the downstream impact on services or databases your function calls.
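To illustrate the alarm setup, the sketch below builds the parameters for CloudWatch's `put_metric_alarm` call on the ConcurrentExecutions metric. The function name, alarm name, and threshold are hypothetical; the alarm itself is only created if you pass the parameters to boto3 as shown in the final comment.

```python
def concurrency_alarm_params(function_name: str, threshold: int) -> dict:
    """Build put_metric_alarm arguments for a per-function concurrency alarm."""
    return {
        "AlarmName": f"{function_name}-concurrency-high",  # hypothetical naming scheme
        "Namespace": "AWS/Lambda",
        "MetricName": "ConcurrentExecutions",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Maximum",
        "Period": 60,               # evaluate one-minute data points
        "EvaluationPeriods": 1,
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanThreshold",
    }


# Hypothetical skill function; alert at 80% of the default limit of 1,000.
params = concurrency_alarm_params("my-alexa-skill", 800)
print(params["AlarmName"], params["Threshold"])

# To create the alarm (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```

You would typically also add an `AlarmActions` entry, for example an SNS topic ARN, so that someone is actually notified when the alarm fires.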
Tip #4: Add Caching for Improved Latency
By caching responses to users’ requests, you reduce the number of Lambda function invocations and improve your skill latency. Read this blog post to find out how to use API Gateway as a passthrough API server and store data so that it can be served faster.
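API Gateway caching is covered in the linked post. A different and simpler form of caching, shown here only as an illustration, is to keep recently fetched data in memory inside the function so that warm invocations skip the downstream call. This does not reduce Lambda invocations, only the work each one does. The `fetch` callback and the five-minute TTL are assumptions.

```python
import time

# Module scope: the dictionary survives between invocations while the
# execution environment stays warm, and starts empty after a cold start.
_CACHE = {}


def cached_fetch(key, fetch, ttl_seconds=300.0, now=time.monotonic):
    """Return a cached value for key, calling fetch(key) only when missing or expired."""
    entry = _CACHE.get(key)
    current = now()
    if entry is not None and current - entry[0] < ttl_seconds:
        return entry[1]
    value = fetch(key)
    _CACHE[key] = (current, value)
    return value


def slow_lookup(key):
    """Hypothetical stand-in for a slow API or database call."""
    return f"value for {key}"


print(cached_fetch("daily-fact", slow_lookup))  # calls slow_lookup
print(cached_fetch("daily-fact", slow_lookup))  # served from memory
```

Because each execution environment has its own copy of the cache, this suits data that is the same for all users and tolerates being a few minutes stale.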
Tip #5: Avoid Cross-Regional Calls
Consider a skill that is available in the US and Germany and points to a Lambda function deployed in the US East (N. Virginia) (us-east-1) AWS Region. Skill users in Germany may experience higher response times than those in the US, because their skill requests trigger a cross-regional call from their region to us-east-1, where the skill's Lambda function is deployed.
Cross-regional calls may increase your skill latency. For skills that are available in multiple geographical areas, it is best practice to create Lambda functions in the closest AWS regions supported by Alexa. When configuring your skill endpoint in the developer console, fill in the ARN to be used in each area. For example, for US and Germany you need an ARN in us-east-1 for North America and one in eu-west-1 for Germany.
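If you manage your skill with the ASK CLI rather than the developer console, the same per-region endpoints live in the skill manifest. The sketch below builds that fragment, assuming the `apis.custom.regions` layout of `skill.json` with the region keys `NA` and `EU`. The account ID and function name in the ARNs are placeholders.

```python
import json


def custom_api_endpoints(default_arn: str, regional_arns: dict) -> dict:
    """Build the custom-skill endpoint section with a default and per-region ARNs."""
    return {
        "custom": {
            "endpoint": {"uri": default_arn},
            "regions": {
                region: {"endpoint": {"uri": arn}}
                for region, arn in regional_arns.items()
            },
        }
    }


# Placeholder ARNs: replace the account ID and function name with your own.
na_arn = "arn:aws:lambda:us-east-1:123456789012:function:my-alexa-skill"
eu_arn = "arn:aws:lambda:eu-west-1:123456789012:function:my-alexa-skill"

apis = custom_api_endpoints(na_arn, {"NA": na_arn, "EU": eu_arn})
print(json.dumps({"apis": apis}, indent=2))
```

Requests from regions without their own entry go to the default endpoint, so the default should point at the region where most of your users are.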
Tip #6: Optimize Your Function Code
Follow the Best Practices for Working with AWS Lambda Functions - Function Code. In your skill, consider removing any work that does not need to block the response, such as analytics code, from the front-end, unless you use more than 1.8GB of memory (multi-core processing).
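One item from that guidance is to take advantage of execution environment reuse by doing expensive setup outside the handler. The sketch below shows the pattern with a hypothetical `_load_catalog` function standing in for creating an SDK client or loading a data file, and a counter that exists only to make the behavior visible.

```python
INIT_COUNT = 0  # demonstration only: counts how often the setup code runs


def _load_catalog() -> dict:
    """Hypothetical expensive setup, such as creating clients or parsing files."""
    global INIT_COUNT
    INIT_COUNT += 1
    return {"greeting": "Welcome back!"}


# Runs once per execution environment (during the cold start), not per request.
CATALOG = _load_catalog()


def lambda_handler(event, context):
    """Build a minimal Alexa response from data prepared at module load."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": CATALOG["greeting"]},
            "shouldEndSession": True,
        },
    }


# Two invocations in the same (warm) environment share a single setup run.
lambda_handler({}, None)
lambda_handler({}, None)
print("setup runs:", INIT_COUNT)
```

The trade-off is that this work moves into the cold start, so keep module-level setup limited to what most requests actually need.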
Tip #7: Improve User Experience by Addressing Cold Starts
A cold start is the extra time your Lambda function requires to serve the first request to your skill, during which Lambda downloads your skill's code, initializes a new container, bootstraps the runtime, and starts the skill's function. Language choice, package size, and code all have an impact. To improve cold start times:
- Optimize your function code: If your code is not optimized it might lead to higher cold start times. (See also Tip #6 above.)
- Reduce your Lambda package size: When adding dependencies on the ASK SDK or AWS SDK, select only the modules or services you need rather than the full SDKs. If you have multiple skills that share common code such as SDKs or other libraries, you can move them to AWS Lambda Layers and exclude them from your deployment package.
- Increase allocated memory: This will increase CPU capacity and may help cold start times.
If you are using Java, check out the Running APIs Written in Java on AWS Lambda blog post.
Amazon Simple Storage Service (S3)
If you host your files on S3, cross-regional calls between your AWS Lambda function and S3 may add latency. Make sure that your S3 bucket is in the same region as your Lambda function, and if you are hosting in multiple regions, create an S3 bucket in each region. You can use cross-region replication to maintain copies in different regions. If you are still experiencing issues, check the size of your file and try to reduce it if it is too large. For audio files, you could use different codecs and audio compression techniques and, if necessary, split a file into smaller ones and play them in a queue. If you still observe delays, you can use Amazon CloudFront, which leverages regional edge caches to improve performance. Find out more about CloudFront pricing.
Amazon DynamoDB
When using DynamoDB for data storage, you have the option to choose the capacity mode for your tables. If you get consistent traffic or you can predict the load, provisioned mode (default) is recommended. With provisioned mode, you can increase or decrease the default limits for throughput capacity depending on the read/write activity of your skill. You can also optionally allow DynamoDB auto scaling to manage throughput capacity. If your skill traffic is less predictable or you see spikes in traffic, consider the on-demand mode. Refer to this blog post for a performance test for on-demand capacity. Review the DynamoDB Best Practices for further suggestions on how to improve performance.
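To show how the two capacity modes differ in practice, the sketch below builds the arguments for DynamoDB's `create_table` call in either mode. The table name, the `id` partition key, and the capacity units are assumptions for illustration. The table is only created if you pass the parameters to boto3 as shown in the final comment.

```python
def table_params(table_name: str, on_demand: bool,
                 read_units: int = 5, write_units: int = 5) -> dict:
    """Build create_table arguments for on-demand or provisioned capacity mode."""
    params = {
        "TableName": table_name,
        # Assumed schema: a single string partition key named "id".
        "AttributeDefinitions": [{"AttributeName": "id", "AttributeType": "S"}],
        "KeySchema": [{"AttributeName": "id", "KeyType": "HASH"}],
    }
    if on_demand:
        # On-demand: no throughput to manage, suited to unpredictable or spiky traffic.
        params["BillingMode"] = "PAY_PER_REQUEST"
    else:
        # Provisioned: you set read/write capacity, optionally managed by auto scaling.
        params["BillingMode"] = "PROVISIONED"
        params["ProvisionedThroughput"] = {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        }
    return params


print(table_params("my-skill-table", on_demand=True)["BillingMode"])
print(table_params("my-skill-table", on_demand=False)["ProvisionedThroughput"])

# To create the table (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("dynamodb").create_table(**table_params("my-skill-table", True))
```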
High performance is essential for creating an engaging, natural voice-first experience. Get started today and follow the above best practices for your existing or new Alexa skills to create compelling voice interactions that retain customer interest.
- AWS Lambda Developer Guide
- AWS Lambda Scaling Guidelines
- Getting Started with Amazon Simple Storage Service
- DynamoDB Best Practices
- Best Practices for Building Alexa Smart Home Camera Skills