Picture this: You’ve worked hard to design, build, and test your Alexa skill. Everything is working great. It passes certification with flying colors. But soon after launch, you start hearing reports that your skill is behaving badly. Customers who ask for information about New York are hearing information about Seattle. Customers who ask for cat facts are getting dog facts. Customers who give the right answer to a trivia question are being told that it’s the wrong answer. Chaos! What’s going wrong?
These could all be symptoms of poor state management. Let me share some of the basic concepts of state management, best practices to embrace, and pitfalls to avoid.
Alexa skills are, by design, inherently stateless. Unless you do some work, each request/response loop between the Alexa service and your web service is a clean slate with no knowledge of what happened before.
But there are many times when you need to make decisions on what to do next based on what happened previously. You need to know some information about who the user is and what they have done before. You may need to remember their current choices and actions to inform future decisions. In short, you need to do some state management.
Let me share how to apply state management at three different levels: the application level, the session level, and the user level.
Examples of application-level state management include settings that are truly global across your entire application. These are settings that need to apply to all users in all situations. These may include things like environment variables or connection strings.
If you are using AWS Lambda, be aware that any global variables you declare in your Lambda code persist between invocations and can be picked up by different users whenever the Lambda container is reused. This is behavior that you can’t control. If you don't plan for it, you'll discover exciting new bugs as soon as your Lambda starts running at scale under real user load! This explains why one customer may ask for cat facts and end up hearing dog facts instead: a Lambda container initialized with a ‘factType=dog’ variable got reused for the next customer. It also explains why you might not have caught these bugs in testing: you didn’t test under enough load to force container reuse.
If you truly need application-level state management, you can rely on Lambda global variables. This is most beneficial when setting up these variables is costly and involves reading from a slow data store. In cases like these, you can limit this performance hit to only the cases where the global variables are not already initialized.
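A minimal sketch of both points, assuming a Node.js Lambda handler. The names `loadConfig`, `getConfig`, `loadCount`, and the `factType` slot are hypothetical, and the slow data store is simulated by a function that simply returns a fixed object. The shared global is initialized once per container, while the per-request value stays inside the handler.

```javascript
// Module scope: these values survive across invocations while the container stays warm.
let cachedConfig = null; // safe to share, because it is identical for every user
let loadCount = 0;       // counts how often the "slow" load actually runs

// Hypothetical stand-in for an expensive read from a slow data store.
async function loadConfig() {
  loadCount += 1;
  return {
    facts: {
      cat: 'Cats sleep a lot.',
      dog: 'Dogs wag their tails.',
    },
  };
}

// Pay the loading cost only when the global is not yet initialized.
async function getConfig() {
  if (cachedConfig === null) {
    cachedConfig = await loadConfig();
  }
  return cachedConfig;
}

// Per-request values such as factType are local to the handler, never module-level.
async function handler(event) {
  const config = await getConfig();
  const intent = event.request.intent || {};
  const slots = intent.slots || {};
  const factType = slots.factType ? slots.factType.value : 'cat';
  return {
    version: '1.0',
    response: {
      outputSpeech: { type: 'PlainText', text: config.facts[factType] },
      shouldEndSession: true,
    },
  };
}
```

If `factType` were declared next to `cachedConfig` instead, a warm container could carry one customer's choice over to the next customer's request.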
For more reading on this topic, refer to Frederik Willaert's LinkedIn post.
To better understand session-level state management, let's start with a definition of an Alexa session. The blue lights on your Alexa device are your friend! An Alexa session starts when the user invokes your skill, turning on the light ring. This is the request. Your skill will formulate a response which will then be spoken to the user.
With this response, you control what happens next. Your skill's response will include the shouldEndSession boolean parameter. When set to true, the session ends with the delivery of the response, turning off the blue lights. But if this parameter is false, you're indicating that Alexa should open the stream for another request following the delivery of the response. The blue light will be on, and your session remains active.
If the user doesn't respond, the reprompt that you specified will be delivered and the stream will remain open for one last chance at a new request. If the user still doesn't respond, the stream will close and the session will end when the blue light turns off.
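As a rough illustration of the two outcomes described above, here is a small sketch that builds the response body by hand. The helper name `buildResponse` and the sample speech are assumptions for this example. Supplying a reprompt keeps the session open, and omitting it ends the session.

```javascript
// Build an Alexa response body. Passing a reprompt keeps the session open.
function buildResponse(speechText, repromptText) {
  const keepOpen = Boolean(repromptText);
  const response = {
    outputSpeech: { type: 'PlainText', text: speechText },
    shouldEndSession: !keepOpen,
  };
  if (keepOpen) {
    // Spoken if the user stays silent after the first response.
    response.reprompt = {
      outputSpeech: { type: 'PlainText', text: repromptText },
    };
  }
  return { version: '1.0', response };
}

// Session stays active: Alexa listens for another request.
const ask = buildResponse(
  'Which city would you like to hear about?',
  'You can say New York or Seattle.'
);

// Session ends with the delivery of this response.
const tell = buildResponse('Goodbye!');
```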
During this time while your skill's session is active, requests sent to Alexa are directed to the sandbox of your skill. You'll notice that even utterances like "volume up" are directed to your skill. This is true until your session ends.
One exception to the above is the use of the AudioPlayer interface. If your skill uses this interface, your skill session ends as soon as your skill sends a play directive. Your skill may get reinvoked through one of the built-in intents for playback control or any other invocation pattern, but this results in a new skill session.
You can manage state within the context of your Alexa session using the session object. LaunchRequest, IntentRequest, and SessionEndedRequest all include a session object in the JSON definition. Any session-level data that you need to track can be sent with your response object in the sessionAttributes property as key/value pairs. They will be passed back to you with the next request object.
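The round trip might look like the following sketch, which works directly on the raw request and response JSON. The trivia question, the `AnswerIntent` intent, its `answer` slot, and the `expectedAnswer` attribute are all hypothetical names chosen for illustration. The expected answer is stored in `sessionAttributes` on the way out and read back from `session.attributes` on the next request.

```javascript
// Hypothetical single trivia question, for illustration only.
const QUESTION = { text: 'What is the capital of France?', answer: 'paris' };

// Small helper that assembles a response with session attributes attached.
function speak(text, sessionAttributes, shouldEndSession) {
  return {
    version: '1.0',
    sessionAttributes,
    response: {
      outputSpeech: { type: 'PlainText', text },
      shouldEndSession,
    },
  };
}

function handleRequest(event) {
  // Attributes sent with the previous response come back on the next request.
  const attributes = (event.session && event.session.attributes) || {};
  const request = event.request;

  if (request.type === 'LaunchRequest') {
    // Remember which answer we are waiting for, and keep the session open.
    return speak(QUESTION.text, { expectedAnswer: QUESTION.answer }, false);
  }

  if (request.type === 'IntentRequest' && request.intent.name === 'AnswerIntent') {
    const slot = request.intent.slots && request.intent.slots.answer;
    const given = String((slot && slot.value) || '').trim().toLowerCase();
    const correct = given === attributes.expectedAnswer;
    const text = correct ? 'That is correct!' : 'Sorry, that is not the right answer.';
    return speak(text, {}, true);
  }

  return speak('Goodbye!', {}, true);
}
```

Because the expected answer travels with the session rather than living in a global variable, two customers playing at the same time cannot overwrite each other's question.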
There are some built-in properties that may also be useful to you: session.user.userId and session.user.accessToken.
The Alexa Node.js SDK provides additional tools that make it even easier to work with session state, allowing you to switch between session state and user state with only minor code changes.
If you have a simple multi-turn game, you may be fine using the Alexa session state to persist the game state while the session is in progress. If you are only relying on the Alexa session state, each new session means the start of a new game.
However, if you want a game to span multiple sessions, you'll need to manage state at the user level. You can do this in a lightweight, best-effort way by keying your state on the userId returned in the request body. Just be aware that this ID will change if the user disables and re-enables your skill. That's what makes this approach best-effort.
The user state is the state that you maintain for the same user across multiple sessions. Compared to the session state, the user state can offer an improved experience by allowing a game to continue across sessions. In other applications, tracking state at the user level becomes essential. For example, you may need to share state between multiple applications: your customers may interact with multiple web services, mobile apps, and chatbots in addition to your Alexa skill. If you need to persist any information about the user across these different endpoints, the Alexa session state will not accomplish this. In this situation, you must manage the state externally.
If you require user-level state management, you'll have to maintain an external data store to persist this state information. Your external data store could be a DynamoDB table, S3 bucket, or any other external data store. The advantages are many, but they come at a cost. There is a financial cost to running these services. There’s also a performance (latency) cost in reading/writing from external data stores. There are also additional factors to consider, like code complexity, maintenance, scaling, and so on. You will also need to consider the security and privacy implications of persisting potentially sensitive user information.
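To make the idea concrete, here is a minimal sketch of user-level state keyed by the userId from the request. The `InMemoryUserStore` class and its `load`/`save` methods are hypothetical stand-ins for an external store such as a DynamoDB table, and a real implementation would replace the `Map` with calls to that service. The `visits` and `score` fields are likewise assumptions for this example.

```javascript
// Hypothetical in-memory stand-in for an external store such as a DynamoDB table.
class InMemoryUserStore {
  constructor() {
    this.items = new Map();
  }

  // Returns the saved state for this user, or null if none exists yet.
  async load(userId) {
    const saved = this.items.get(userId);
    return saved ? JSON.parse(saved) : null;
  }

  // Serializes the state, as a real external store would.
  async save(userId, state) {
    this.items.set(userId, JSON.stringify(state));
  }
}

// Load the user's state at the start of a session and write it back afterwards.
async function handleLaunch(event, store) {
  const userId = event.session.user.userId;
  const state = (await store.load(userId)) || { visits: 0, score: 0 };
  state.visits += 1;
  await store.save(userId, state);

  const text = state.visits === 1
    ? 'Welcome! Starting a new game.'
    : `Welcome back. Your score so far is ${state.score}.`;

  return {
    version: '1.0',
    response: {
      outputSpeech: { type: 'PlainText', text },
      shouldEndSession: false,
    },
  };
}
```

Every call to `load` and `save` here would be a network round trip against a real store, which is where the latency cost mentioned above comes from.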
The process of selecting a specific external data store is beyond the scope of this article. The criteria will be influenced by the different endpoints that need to share the data, performance and availability requirements, etc. For example, you can read about some of the pros and cons of using DynamoDB vs. S3 in our FAQ.
User-level state management also requires a permanent user identifier. The only way to obtain a user identifier that persists across enabling and disabling your skill is to use account linking.
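One possible way to guard a handler on account linking is sketched below. The helper name `requireLinkedAccount` and the spoken text are assumptions for this example. When the request carries no `accessToken`, the skill answers with a `LinkAccount` card that prompts the user to link their account in the Alexa app. When a token is present, the helper returns `null` and the skill can go on to resolve the permanent user identifier from its own account system.

```javascript
// Returns a "please link your account" response, or null if the user is already linked.
function requireLinkedAccount(event) {
  const user = (event.session && event.session.user) || {};
  if (user.accessToken) {
    return null; // Linked: the caller can look up the permanent user ID with this token.
  }
  return {
    version: '1.0',
    response: {
      outputSpeech: {
        type: 'PlainText',
        text: 'Please link your account in the Alexa app to continue.',
      },
      card: { type: 'LinkAccount' },
      shouldEndSession: true,
    },
  };
}
```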
When working with state management, make sure to keep these two tips in mind:
If you follow these guidelines, you’ll never get your customer’s cat facts and dog facts mixed up again!