When building for voice, speed matters. When you speak to a friend, you expect a prompt response that reassures you your friend is listening and engaged in the conversation. Similarly, when customers talk to Alexa, they expect her to respond quickly. For your Alexa skill, high performance is essential to engaging customers and retaining their interest.
In this post, we present a list of AWS Lambda best practices, along with Amazon DynamoDB and Amazon S3 resources, for creating high-performance Alexa skills. Depending on your needs, your skill's traffic, and its availability across regions, apply the relevant best practices to address the issues you identify and improve overall User Perceived Latency (UPL).
Lambda allocates CPU proportionally to the memory you allocate to your function; if you provision more than 1.8 GB of memory, Lambda assigns multiple CPU cores. When choosing the amount of memory for your function, consider the trade-off between performance and cost: more memory means more CPU and potentially shorter execution times, but a higher price per unit of duration.
Your AWS Lambda function's timeout value should be less than 8 seconds, which is the maximum amount of time Alexa waits for a skill to respond. Load test your skill to understand how long your function takes to execute and to detect issues with API calls to other services that may not be set up to handle Lambda's scaling. Keep in mind that an overprovisioned timeout allows longer executions, which in turn raises your concurrency requirements and costs.
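As a sketch of the two settings above, the parameters below could be passed to boto3's `update_function_configuration` call for your skill's function. The function name and memory size are illustrative placeholders, not recommendations from this post.

```python
# Sketch: Lambda configuration for an Alexa skill back end.
# "my-alexa-skill" and the 512 MB value are placeholders; pass this
# dict to boto3's lambda_client.update_function_configuration(**params).

def skill_function_config(function_name: str) -> dict:
    return {
        "FunctionName": function_name,
        "Timeout": 7,       # seconds; keep below the 8-second Alexa limit
        "MemorySize": 512,  # MB; Lambda allocates CPU proportionally to memory
    }

params = skill_function_config("my-alexa-skill")
```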
Your Alexa skill's AWS Lambda function receives a request every time the user interacts with your skill, when an event is dispatched to it, or when it is invoked by another service. The term concurrency refers to the number of instances serving requests to your Lambda function at any given time. By default, your AWS account has a limit of 1,000 concurrent executions across all functions in a region, and Lambda starts throttling requests when this limit is exceeded. To ensure you have configured the right values for concurrency, use the formula below to estimate your requirements:
Concurrency = requests per second * average function duration (in seconds)
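The formula above translates directly into a small helper you can use when sizing your limits; the traffic numbers here are only an example.

```python
def estimate_concurrency(requests_per_second: float, avg_duration_seconds: float) -> float:
    """Estimate concurrent Lambda executions: requests/second * duration."""
    return requests_per_second * avg_duration_seconds

# Example: 200 requests per second with a 250 ms average duration needs
# roughly 50 concurrent executions -- well under the default 1,000 limit.
needed = estimate_concurrency(200, 0.25)
```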
You can use AWS Lambda metrics in CloudWatch, such as ConcurrentExecutions, Invocations, and Duration, to monitor concurrency. You can also set up a CloudWatch Alarm to get notified when a metric threshold is surpassed, and AWS Budgets to monitor costs. If your concurrency exceeds the default limit, you can submit a request in the Support Center console to increase it. You can also use reserved concurrency for a function to limit capacity and control scaling and the downstream impact on services or databases your function calls.
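As a sketch, the parameters below define an alarm on the account-level ConcurrentExecutions metric that fires before you hit the default 1,000 limit. The alarm name and threshold are illustrative; in your own setup you would pass this dict to boto3's `cloudwatch_client.put_metric_alarm(**alarm)`.

```python
# Sketch: CloudWatch alarm parameters for approaching the default
# concurrency limit. The AWS/Lambda namespace and ConcurrentExecutions
# metric are standard CloudWatch names; the threshold is an example.

def concurrency_alarm(threshold: int = 800) -> dict:
    return {
        "AlarmName": "lambda-concurrency-approaching-limit",
        "Namespace": "AWS/Lambda",
        "MetricName": "ConcurrentExecutions",
        "Statistic": "Maximum",
        "Period": 60,            # evaluate one-minute windows
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
    }

alarm = concurrency_alarm()
```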
By caching responses to users' requests, you reduce the number of Lambda function invocations and improve your skill's latency. Read this blog post to find out how to use Amazon API Gateway as a passthrough API server and store data so that it can be served faster.
Assume a skill that is available in the US and Germany and points to a Lambda function deployed in the US East (N. Virginia) (us-east-1) AWS Region. Skill users in Germany may experience higher response times than those in the US, because their skill requests trigger a cross-region call from their region to us-east-1, where the skill's Lambda function is deployed.
Cross-region calls can increase your skill's latency. For skills available in multiple geographic areas, it is best practice to create Lambda functions in the AWS regions supported by Alexa that are closest to your users. When configuring your skill endpoint in the developer console, fill in the ARN to use for each area. For example, for the US and Germany you need one ARN in us-east-1 for North America and one in eu-west-1 for Germany.
Follow the Best Practices for Working with AWS Lambda Functions - Function Code. In your skill's front end, consider removing any non-blocking background work, including analytics code, unless your function uses more than 1.8 GB of memory and can take advantage of multi-core processing.
Cold start is the time your Lambda function requires to serve the first request to your skill, during which it downloads your skill's code, initializes a new container, bootstraps the runtime, and starts the skill's function. Your language choice, package size, and initialization code all have an impact. To improve cold start times:
If you are using Java, check out the Running APIs Written in Java on AWS Lambda blog post.
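One language-agnostic tactic is to do expensive initialization once, at module load, so that only the cold start pays for it and warm invocations reuse the result. A sketch, with a counter standing in for an SDK client so the effect is visible:

```python
# Heavy setup (SDK clients, config parsing, connections) runs once per
# container at module load, not on every invocation.
INIT_COUNT = 0

def build_client():
    global INIT_COUNT
    INIT_COUNT += 1
    return {"ready": True}   # stand-in for e.g. a boto3 client

CLIENT = build_client()      # executed during cold start only

def handler(event, context):
    # Warm invocations reuse CLIENT instead of rebuilding it.
    return {"client_ready": CLIENT["ready"]}
```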
If you host your files on S3, cross-region calls between your AWS Lambda function and S3 can add latency. Make sure that your S3 bucket is in the same region as your Lambda function, and if you are hosting in multiple regions, create an S3 bucket in each region. You can use cross-region replication to maintain copies in different regions. If you are still experiencing issues, check the size of your file and try to reduce it if it is too large. For audio files, you could use different codecs and audio compression techniques and, if necessary, split a file into smaller ones and play them in a queue. If you still observe delays, you can use Amazon CloudFront, which leverages regional edge caches to improve performance. Find out more about CloudFront pricing.
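With per-region replicated buckets, the function can select the bucket co-located with it at runtime. A sketch, assuming hypothetical bucket names; the AWS_REGION environment variable is set automatically in the Lambda runtime:

```python
import os

# Hypothetical per-region buckets holding replicated copies of your assets.
REGION_BUCKETS = {
    "us-east-1": "my-skill-assets-us-east-1",
    "eu-west-1": "my-skill-assets-eu-west-1",
}

def local_bucket(default: str = "my-skill-assets-us-east-1") -> str:
    """Pick the bucket co-located with the running function."""
    return REGION_BUCKETS.get(os.environ.get("AWS_REGION", ""), default)
```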
When using DynamoDB for data storage, you have the option to choose the capacity mode for your tables. If you get consistent traffic or can predict the load, provisioned mode (the default) is recommended. With provisioned mode, you can increase or decrease the default limits for throughput capacity depending on the read/write activity of your skill. You can also optionally allow DynamoDB auto scaling to manage throughput capacity. If your skill's traffic is less predictable or you see spikes in traffic, consider on-demand mode. Refer to this blog post for a performance test of on-demand capacity. Review the DynamoDB Best Practices for further suggestions on how to improve performance.
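As a sketch of the on-demand option, the table definition below uses DynamoDB's `PAY_PER_REQUEST` billing mode; the table and attribute names are illustrative. In your own setup you would pass this dict to boto3's `dynamodb_client.create_table(**table)`.

```python
# Sketch: table definition for a skill with unpredictable traffic.
# BillingMode "PAY_PER_REQUEST" selects on-demand capacity, so no
# read/write capacity planning is needed; names are placeholders.

def skill_table(name: str = "AlexaSkillSessions") -> dict:
    return {
        "TableName": name,
        "AttributeDefinitions": [
            {"AttributeName": "userId", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "userId", "KeyType": "HASH"},
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }

table = skill_table()
```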
High performance is essential for creating an engaging, natural voice-first experience. Get started today and follow the above best practices for your existing or new Alexa skills to create compelling voice interactions that retain customer interest.