As part of my role as an Alexa evangelist, I get to meet talented developers who have embraced voice as their medium of choice to innovate. Gal Shenar is one of those developers. With more than 35 live Alexa skills and 250,000 monthly active users, Gal has “cracked the code” for building highly engaging Alexa skills.
Gal’s story of how he got started with Alexa is very inspiring to me. After learning how to build Alexa skills during a workshop with veteran Alexa evangelist Rob McCauley, he became excited by the potential to create delightful voice experiences. Shortly after this workshop, he started publishing his creations to the Alexa Skills Store. Today he’s the creator of engaging voice-first games such as Escape the Room, Escape the Airplane, and Angel Investor. He also owns his own voice development studio called Stoked Skills.
I met Gal at AWS re:Invent 2018 and was thrilled to have the opportunity to chat with him about games, voice design, and building a business. Gal knows what it takes to build highly engaging, and high-converting, voice-first games. His lessons learned are valuable to anyone looking to build voice gaming experiences.
Here are some of the best practices he shared about building voice-first games for Alexa.
Once you have an idea, don’t be shy, says Gal. Try it out and see if it works well on voice. If it doesn’t, your users will let you know. This is the frame of mind that drove Gal to successfully bring to Alexa something nobody else thought feasible: a voice-driven escape game. Traditionally, these are detail-rich games meant to be enjoyed in person. But he was able to port the concept over to Alexa when he created his Escape the Room skill.
Playing a good voice game “is like reading a book, but [the user plays a big part in] how the story evolves,” says Gal. He figured out that an escape-the-room voice game made sense, as a naturally communal experience on Alexa. The whole family could solve the puzzles together, picturing the rooms in their minds in slightly different and personal ways.
Thinking about your audience is as important as developing the right idea. Define your audience, but be flexible enough to account for the fact that your users will vary a lot in terms of age, ability, and technical prowess. You never know who will be using your skill and ideally you want your skill to be adaptable to all the possible users.
When developing an Alexa skill, you have three basic choices for how to voice your content. At one extreme, you have voice acting. At the other, you have Alexa as the sole narrator. Recording voice actors is great for creating the precise atmosphere for your game, but it does not afford you a lot of flexibility.
For example, if you have to add new content, you have to go back to the voice actor. Also, you can’t easily echo back user input during an in-skill conversation. At the opposite end of the spectrum, with Alexa as the sole voice for your game, you have total flexibility in the content you can render (see the SSML reference). However, Alexa’s voice, while familiar, may not trigger the same sense of immersion as a voice that’s unique to your game.
Another option is Amazon Polly, which provides the best of both worlds. Polly is an AWS service that converts text into lifelike speech and lets you iterate faster. Polly integrates easily with Alexa skills, and you can select one (or many) of the dozens of different voices and supported locales.
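As a rough illustration of mixing voices, here is a minimal Python sketch that builds an SSML response in which Alexa narrates and named voices speak as characters, including a line that echoes back what the player said. It assumes the skill returns SSML and uses the SSML `<voice>` tag; the helper names `say_as` and `build_speech`, the game lines, and the choice of the "Matthew" and "Joanna" voices are illustrative assumptions, not taken from Gal's skills.

```python
from xml.sax.saxutils import escape


def say_as(voice: str, text: str) -> str:
    """Wrap text in an SSML <voice> tag so the named voice speaks it."""
    # escape() protects characters such as & and < inside the spoken text.
    return f'<voice name="{voice}">{escape(text)}</voice>'


def build_speech(*parts: str) -> str:
    """Join already-formed SSML fragments into a single <speak> document."""
    return "<speak>" + " ".join(parts) + "</speak>"


# Hypothetical player utterance, echoed back by a character voice.
user_input = "open the drawer & look inside"

ssml = build_speech(
    "You step into the study.",  # spoken in Alexa's default voice
    say_as("Matthew", "Who dares enter my room?"),
    say_as("Joanna", f"You said: {user_input}"),
)
print(ssml)
```

Because the lines are plain text rather than recordings, changing a character's dialogue or echoing user input is just a string edit, which is the flexibility that recorded voice acting lacks.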
Listening to too much speech can be slow and interest-draining for your users. Sound effects can do two things for you: they are great at creating atmosphere, and they can convey context to the user quickly. Think about the difference in atmosphere between describing the opening of an old creaking door and hearing the actual sound. Or the immediacy of a negative buzzer versus repeating “that’s the wrong answer” in a quiz game.
Sprinkling the right amount of sound effects into your speech will delight your users with variety and convey a lot of information in very little time. If you’re looking for sound effects, start with the Alexa Skills Kit Sound Library.
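To make the buzzer example concrete, here is a small Python sketch that swaps a spoken "that's the wrong answer" for a short sound effect using the SSML `<audio>` tag. The clip URLs, the `SOUNDS` table, and the helper names are hypothetical placeholders; in a real skill you would point `src` at a clip you host or at an entry from the sound library.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical clip locations, used only for illustration.
SOUNDS = {
    "creaky_door": "https://example.com/audio/creaky_door.mp3",
    "wrong_buzzer": "https://example.com/audio/wrong_buzzer.mp3",
}


def sound(name: str) -> str:
    """Return an SSML <audio> tag for a named sound effect."""
    # quoteattr() adds the surrounding quotes and escapes the URL safely.
    return f"<audio src={quoteattr(SOUNDS[name])}/>"


def wrong_answer_response(next_prompt: str) -> str:
    """Play a buzzer instead of saying the answer was wrong, then move on."""
    return f"<speak>{sound('wrong_buzzer')} {escape(next_prompt)}</speak>"


print(wrong_answer_response("Next question: what is the capital of France?"))
```

The response carries the "wrong answer" signal in the buzzer and goes straight to the next prompt, so the player hears less repeated speech.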
One of the things that make Gal’s skills enjoyable to play is that the game dynamics are fluid. To do this well, he recommends spending a lot of time imagining, projecting, and visualizing how your users will play. This includes simplifying when needed.
For his Escape the Room skill, for instance, he reduced the number of actions a player can perform. In some rooms, he does not require the user to pick up items, opting instead to pick them up automatically when they are found. There is no need for the extra step of picking up something you just saw, especially since not everything can be picked up. Remove unneeded friction.
When building his first Escape the Room voice game, Gal quickly realized that he needed to account for scale and reusability. He decided to invest the time to design a modular framework to separate the voice experience from the content. Once he got that right, he had a consistent UI scaffold, ready to take on new content. The way he did this was to identify and implement a series of primitives (look around, look at something, use item) that basically represented a mini game with contextual autonomy.
Each room in Escape the Room is structured in the same way, with files for each direction and object in the room. Each object and direction inherits from a class that already has all of the underlying code to make it behave properly, so all he needs to provide is a description for each object and how it behaves when interacting with items. Gal achieved his end goal: a mini game structure that he can simply fill with new JSON.
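The idea of separating the voice experience from the content can be sketched in a few lines of Python. This is not Gal's framework: it is a simplified illustration in which a single hypothetical `Room` class implements the three primitives (look around, look at something, use item) and all room-specific text lives in JSON. The JSON field names (`directions`, `objects`, `gives`, `unlocked_by`, `on_use`) are assumptions made up for this example.

```python
import json
from dataclasses import dataclass, field

# Hypothetical room content; adding a new room means writing new JSON only.
ROOM_JSON = """
{
  "name": "study",
  "directions": {
    "north": "A heavy oak door, locked tight.",
    "east": "A desk with a single drawer."
  },
  "objects": {
    "drawer": {"description": "Inside the drawer you find a brass key.",
               "gives": "brass key"},
    "door": {"description": "The door has a brass lock.",
             "unlocked_by": "brass key",
             "on_use": "The key turns and the door swings open. You escaped!"}
  }
}
"""


@dataclass
class Room:
    """Generic engine: the behavior lives here, the content comes from JSON."""
    content: dict
    inventory: set = field(default_factory=set)

    def look_around(self, direction: str) -> str:
        return self.content["directions"].get(
            direction, "You see nothing of interest there.")

    def look_at(self, name: str) -> str:
        obj = self.content["objects"].get(name)
        if obj is None:
            return f"There is no {name} here."
        if "gives" in obj:
            # Found items are collected automatically, with no pick-up step.
            self.inventory.add(obj["gives"])
        return obj["description"]

    def use_item(self, item: str, target: str) -> str:
        if item not in self.inventory:
            return f"You don't have the {item}."
        obj = self.content["objects"].get(target)
        if obj is None or obj.get("unlocked_by") != item:
            return f"The {item} does nothing here."
        return obj["on_use"]


room = Room(json.loads(ROOM_JSON))
print(room.look_around("east"))
print(room.look_at("drawer"))
print(room.use_item("brass key", "door"))
```

With this split, the same three primitives serve every room, and a new room is only a new JSON document.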
As much as we would like them to, users are not likely to use our skills the way we thought they would. There are almost infinite variations in the way people speak. It’s called natural language for a reason! So we have to account for the unexpected and handle it gracefully, without creating friction. Gal goes a step further and thinks in terms of directing his users to the parts of his skills that he knows can provide a delightful experience. He does that with carefully balanced guidance: too much “helpful” speech early on and drop-offs will increase; too little and users will get stuck.
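One possible way to balance guidance is to escalate it: start with a light nudge and only spell out the options after repeated misses. The Python sketch below illustrates that idea under assumptions of its own; the prompt wording, the `misses` session key, and the function names are hypothetical and do not come from Gal's skills.

```python
# Hypothetical prompts, ordered from the lightest nudge to the most explicit.
HINTS = [
    "Sorry, I didn't catch that. What would you like to do?",
    "You can look around, look at an object, or use an item. What next?",
    "Try saying: look at the desk.",
]


def fallback_prompt(session: dict) -> str:
    """Return guidance that becomes more explicit with each consecutive miss."""
    misses = session.get("misses", 0)
    session["misses"] = misses + 1
    # Stay on the most explicit hint once the list is exhausted.
    return HINTS[min(misses, len(HINTS) - 1)]


def on_understood(session: dict) -> None:
    """Reset the counter once the player says something the skill handles."""
    session["misses"] = 0


session = {}
print(fallback_prompt(session))  # light nudge
print(fallback_prompt(session))  # lists the available actions
on_understood(session)
print(fallback_prompt(session))  # back to the light nudge
```

Players who are doing fine never hear the longer help text, while players who are stuck get progressively more direction.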
Handling the unexpected user input gracefully is not enough though. Great skill developers like Gal act upon those temporary failures and improve their game. Read the reviews your users will leave and act upon that feedback. That’s a treasure trove for you to take your skill in the right direction. As Gal says, “The reviews have been one of the most useful forms of feedback to improve my skills.”
There’s a lot we can do with voice-first games, and there are also new ways to make money with Alexa. Gal found a creative way to monetize Escape the Room and Escape the Airplane with in-skill purchasing by providing premium access to hints for solving the puzzles. More on that, and other monetization considerations, in an upcoming post about my conversation with Gal on monetizing Alexa skills.
To learn more about building voice-first games for Alexa, register for our upcoming webinar: How to Build an Alexa Game Skill That Your Customers Will Love on March 4 (10am-12pm PT).