Editor’s Note: The Alexa Skills Kit (ASK) offers multiple ways to personalize your skill experience to offer more engaging and contextual interactions for customers. Today we welcome a community expert in voice design—John Gillilan, Alexa Champion and founder of bondad.fm—to share some best practices.
Recently at Alexa Live, Alexa Senior Solutions Architect Eric Fahsl and I discussed some key design principles and some powerful Alexa features that enable skill builders like yourself to deliver more personalized experiences to your users. With over 90,000 published Alexa skills, it’s more important than ever to be thoughtful not only about how you design your interactions, but also about how you leverage everything the Alexa Skills Kit (ASK) has to offer when building your responses. In this post, I recap what we covered during the session. You can also watch the full 45-minute session below or click the time-stamped links throughout this post to jump to specific examples.
To kick off our session, we looked at five characteristics of a personalized skill experience:
- Answer any question only once
- Preferences should become defaults
- Context is kept throughout the session
- Language should feel natural
- Users feel their experience is unique
To bring these characteristics to life in your skill, consider these three focus areas: memory, variety, and localization.
Like many of you, I host my skills on AWS Lambda, Amazon’s serverless compute service. It’s a great choice because it integrates really well with Alexa, it only runs when it’s needed, and it scales up automatically to handle increased traffic to your skills. But it’s important to understand that by design, AWS Lambda is stateless – meaning that it’s up to you as a developer to maintain short-term and/or long-term memory as needed.
Over the course of a real conversation, we pick up on things from the other person – little cues, references, and contexts. And we naturally store these things in our short-term memory as we navigate and bounce around the conversation. Session attributes are a great way to create a similar sense of state within a single skill session. There are all sorts of powerful use cases for session attributes, including intent-specific error states (@ 18:00) and context-specific handler code that leverages the canHandle() function in the ASK Software Development Kit (SDK) (@ 20:21). But no matter the use case, session attributes disappear when the session closes. It’s like shaking an Etch-A-Sketch. Poof! Sayonara!
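To make the short-term memory idea concrete, here is a minimal sketch in plain Python. It does not use the ASK SDK: `HandlerInput`, `dispatch`, the two handler classes, and the attribute names `last_prompt` and `error_count` are all hypothetical stand-ins chosen for illustration. It only shows the shape of the pattern, where a handler's `can_handle` check looks at session attributes as well as the intent name.

```python
from dataclasses import dataclass, field


@dataclass
class HandlerInput:
    """Hypothetical stand-in for the SDK's handler input (not the real class)."""
    intent_name: str
    session_attributes: dict = field(default_factory=dict)


class YesAfterQuizOfferHandler:
    """Handles "yes" only when the skill has just offered a quiz."""

    def can_handle(self, handler_input):
        return (
            handler_input.intent_name == "AMAZON.YesIntent"
            and handler_input.session_attributes.get("last_prompt") == "offer_quiz"
        )

    def handle(self, handler_input):
        # Remember what we asked last so the next turn has context.
        handler_input.session_attributes["last_prompt"] = "quiz_question"
        return "Great, here is your first question."


class FallbackHandler:
    """Catches everything else and escalates the error message on repeats."""

    def can_handle(self, handler_input):
        return True

    def handle(self, handler_input):
        errors = handler_input.session_attributes.get("error_count", 0) + 1
        handler_input.session_attributes["error_count"] = errors
        if errors == 1:
            return "Sorry, I didn't catch that."
        return "Still not getting it. You can say 'help' for options."


# Order matters: the most specific handler is checked first.
HANDLERS = [YesAfterQuizOfferHandler(), FallbackHandler()]


def dispatch(handler_input):
    """Return the response of the first handler that accepts the input."""
    for handler in HANDLERS:
        if handler.can_handle(handler_input):
            return handler.handle(handler_input)
    raise LookupError("no handler accepted the request")
```

The same "yes" utterance lands in different handlers depending on what the session remembers, and the fallback wording changes once the error count grows. In a real skill the dictionary would come from the SDK's session attributes rather than being passed around by hand.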
Persistent attributes, on the other hand, allow you to recall information in subsequent sessions. I use Amazon DynamoDB because it’s so well integrated with the ASK SDK, but you can use your own data store if you’re feeling adventurous! Persistent attributes are a great way to store user preferences, the current track and timestamp in AudioPlayer, or progress and achievements within a game. Some other creative uses include contextual, dynamic welcome messaging (@ 25:52) and intent-level user onboarding (@ 27:44).
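As a rough illustration of dynamic welcome messaging, the sketch below keeps per-user attributes across sessions and picks a greeting based on them. Everything here is an assumption for the sake of the example: `InMemoryStore` is a hypothetical stand-in for a real data store such as DynamoDB, and the attribute names `visit_count` and `favorite_team` are invented.

```python
class InMemoryStore:
    """Hypothetical stand-in for a persistent store, keyed by user ID."""

    def __init__(self):
        self._items = {}

    def get_attributes(self, user_id):
        # Return a copy so callers cannot mutate stored state by accident.
        return dict(self._items.get(user_id, {}))

    def save_attributes(self, user_id, attributes):
        self._items[user_id] = dict(attributes)


def welcome_message(store, user_id):
    """Pick a greeting from what we remember, then record this visit."""
    attrs = store.get_attributes(user_id)
    visits = attrs.get("visit_count", 0)

    if visits == 0:
        speech = "Welcome! I can give you scores and news. Which team do you follow?"
    elif attrs.get("favorite_team"):
        speech = f"Welcome back! Want the latest on the {attrs['favorite_team']}?"
    else:
        speech = "Welcome back! What would you like to do?"

    attrs["visit_count"] = visits + 1
    store.save_attributes(user_id, attrs)
    return speech
```

A first-time user gets the onboarding prompt, while a returning user with a saved preference skips straight to it, which is one way a preference can become a default.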
One of the foundations of human conversation is how we naturally create variants to express the same thing. It’s this dynamic flow of language that makes it fun and engaging to talk with people. Eric and I looked at three tactics to achieve this when building skills: randomization, contextualization, and language adaptation (@ 31:55).
When implementing dialog randomization, I would challenge you to think more modularly than the string level: randomizing multiple segments within a single output multiplies the number of possible permutations across your skill experience. I’ve found that “non-happy path” dialog, error states in particular, can really benefit from this, as hearing the same generic error message verbatim multiple times in a row is a surefire way to leave users feeling that your skill isn’t really understanding them.
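Segment-level randomization might look something like the sketch below. The three phrase pools and the function names are made up for illustration. The point is simply that three pools of three phrases each already yield 27 distinct error messages.

```python
import random
from itertools import product

# Hypothetical phrase pools; each error message takes one phrase from each.
APOLOGIES = ["Sorry,", "Hmm,", "Oops,"]
EXPLANATIONS = [
    "I didn't catch that.",
    "I'm not sure what you meant.",
    "that one got past me.",
]
REPROMPTS = [
    "Could you say it again?",
    "Want to try once more?",
    "You can also say 'help'.",
]
POOLS = (APOLOGIES, EXPLANATIONS, REPROMPTS)


def build_error_message(rng=random):
    """Assemble an error message by picking one phrase from each pool.

    Pass a seeded random.Random instance as rng for repeatable output.
    """
    return " ".join(rng.choice(pool) for pool in POOLS)


def all_variants():
    """List every message the pools can produce."""
    return [" ".join(parts) for parts in product(*POOLS)]
```

Adding a single phrase to any one pool adds nine new messages rather than one, which is the payoff of randomizing below the level of the whole string.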
Alexa is currently available in multiple locales (with US Spanish and Portuguese right around the corner), so zoom out and think globally! One best practice is to decouple your dialog from your handler code and leverage libraries like i18n to do the string picking for you based on a user’s locale. Even if you’re just shipping in your local region to begin with, this will help future-proof your architecture if you decide to embrace more languages down the line.
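To show what decoupling dialog from handler code can look like, here is a small hand-rolled sketch. It is not the i18n library mentioned above, just an illustration of the same idea. The string keys, the sample phrases, and the `translate` helper are all hypothetical. Handlers ask for a key, and the lookup falls back from the exact locale to the base language to a default.

```python
# Hypothetical string tables, kept apart from any handler logic.
STRINGS = {
    "en": {
        "WELCOME": "Welcome to the skill!",
        "GOODBYE": "Goodbye!",
        "SCORE": "You scored {points} points.",
    },
    # Regional tables only override what actually differs.
    "en-GB": {"GOODBYE": "Cheerio!"},
    "de": {
        "WELCOME": "Willkommen beim Skill!",
        "GOODBYE": "Auf Wiedersehen!",
    },
}
DEFAULT_LANGUAGE = "en"


def translate(key, locale, **params):
    """Look up key for locale, falling back to language, then the default."""
    language = locale.split("-")[0]
    for table_name in (locale, language, DEFAULT_LANGUAGE):
        table = STRINGS.get(table_name, {})
        if key in table:
            return table[key].format(**params)
    raise KeyError(f"no string for {key!r}")
```

A handler would call something like `translate("GOODBYE", "en-GB")` with the locale taken from the incoming request, so adding a new language means adding a table rather than touching handler code.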
But localization doesn’t just mean literal translation. Language is cultural. And culture is nuanced! Localizing your skill is about understanding the way people speak in each region. This is especially critical for localized interaction model design, but it’s just as applicable to your dialog. Failure to consider this will make your skill experience feel a little disconnected from the people you’re trying to speak with. Check out @ 39:55 to hear about my adventures with British stones when localizing the NFL’s skill for the UK.
There's always something new to learn about the Alexa Skills Kit and new features to explore. My advice is to bite off one thing at a time, try your best to understand it, and think about how it might bring the next skill you’ve been imagining to life. Because at the end of the day, the constant evolution of voice technology is in service of the things that we’ve yet to create.
Thanks for reading. You can find me in the chat during Alexa office hours on Twitch or on Twitter. Feel free to say hello!