So, you want to build a game on Alexa. If you don't, you should! We just kicked off the Alexa Games Skills Challenge (read the blog announcement post).
You're pumped, that's great! But before you dive into the coding, below are a few principles that will hopefully help you scope and refine your game ideas in the context of voice. Although some apply to building games in general, most are specific to building games for voice using the Alexa Skills Kit. Whatever you build should also follow the voice design best practices found in the Amazon Alexa Voice Design Guide.
Making a game fun seems obvious, right? Well, bear with me. Remember that we are building for a very specific kind of interface, one that relies mainly on two faculties: our hearing, to receive communication from the game, and our voice, to send input back. Alexa is also inherently turn-based: every user input has to be processed and sent to your skill's back end, which then responds. It's therefore important to choose game mechanics that are not degraded by the type of interactivity that Alexa offers.
A great example of a game skill that understood these constraints is Yes Sire. Currently with a 4.8 star rating and 435 reviews in the Alexa Skills Store on amazon.co.uk, it really proves that simplicity can go a long way if the core mechanic does not conflict with the medium. Quoting their (brilliant) description: “you sit as a medieval lord of the realm, presented with an ever-expanding array of difficult choices. Make good choices and stay in power as long as you can!” The gameplay is fun and can get quite difficult, but the core mechanic is simple: just answer “yes” or “no” to each decision that comes your way. Every decision the player makes has a cascading effect on the resources available to the kingdom and the player's social standing.
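To make the yes/no mechanic concrete, here is a minimal sketch of how a decision with cascading resource effects could be modeled. It is a hypothetical illustration and not how Yes Sire is actually implemented: the `Decision` class, the resource names, and all the numbers are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    prompt: str
    yes_effects: dict  # resource deltas applied if the player says "yes"
    no_effects: dict   # resource deltas applied if the player says "no"


def apply_answer(resources, decision, said_yes):
    """Return a new resources dict after applying the chosen effects."""
    effects = decision.yes_effects if said_yes else decision.no_effects
    updated = dict(resources)
    for name, delta in effects.items():
        updated[name] = updated.get(name, 0) + delta
    return updated


def is_overthrown(resources):
    """The reign ends as soon as any resource drops to zero or below."""
    return any(value <= 0 for value in resources.values())


# Hypothetical decision and starting state, invented for this sketch.
TAX = Decision(
    prompt="The treasurer wants to raise taxes. Do you agree?",
    yes_effects={"gold": 20, "influence": -15},
    no_effects={"gold": -10, "influence": 5},
)
START = {"gold": 50, "influence": 10}
```

Because every turn is a single prompt followed by a single “yes” or “no”, the whole game loop fits naturally into Alexa's request and response cycle.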
Another example is Opearlo's Guess My Name (4.3/5 stars, 93 reviews on amazon.co.uk). The skill chooses a random famous character and players have to give the name as soon as they know it. Given their game is usually played in a group setting, they came up with an interesting game mechanic: the skill continuously outputs progressively easier hints in a single output, similar to a countdown. It's a race against time: as soon as you know what character Alexa is thinking of, you shout it out, like “Alexa, are you Barack Obama?” They observed that in focus groups users would naturally speak to one another trying to determine what character was being thought of. This would have caused issues with the voice recognition had the game regularly prompted the user for input at every hint.
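The “all hints in a single output” idea can be sketched with a small SSML builder. This is only an assumption about how such a countdown might be assembled, not Opearlo's actual code; the function name, the pause length, and the sample hints are made up.

```python
from xml.sax.saxutils import escape


def build_hint_countdown(hints, pause_seconds=2):
    """Join hints (hardest first) into one SSML string with a pause after each."""
    pause = f'<break time="{pause_seconds}s"/>'
    # Escape each hint so characters such as "&" do not break the markup.
    body = "".join(f"<s>{escape(hint)}</s>{pause}" for hint in hints)
    return f"<speak>{body}</speak>"


# Hypothetical hints for a famous fictional character.
HINTS = [
    "I am a fictional detective.",
    "I live at 221B Baker Street.",
]
```

Since the whole countdown is one response, Alexa never opens the microphone between hints, so the chatter in the room is not mistaken for an answer.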
You don't need to copy Yes Sire to make a hit game, and extremely successful games like Trivia Hero allow open answers to trivia questions (although they put in a lot of effort to make sure even the wrong answers are added to the voice model—more on that later).
Remember, our ears are the main sensory organ for the majority of games on Alexa. Another industry that has consistently engineered their content for the ears is radio — and they have been around for over a hundred years. Learn from them! Listen to your favorite radio channel and try to notice all the subtle jingles and sound effects that are added to enhance the auditory experience.
When it comes to audio you have to make a design decision: do you use Alexa's voice (i.e. the text-to-speech engine) or do you record custom audio for every response your skill will have? Or, will you do something in the middle, such as using other generated voices (e.g. from Amazon Polly) or only use custom audio for parts of the game?
Going full custom audio gives your game a unique “signature” and offers absolute control over inflection, pronunciation and emphasis. If done correctly, it will also sound very natural and immerse the user in a world in a way that a text-to-speech voice cannot match. Check out One Piercing Note: A RuneScape Quest for a brilliant example. The drawback of designing a game with pre-recorded human audio is that once the audio is recorded, the copy cannot be changed without re-recording or editing the audio files. It can be costly, especially if you are using professional voice actors and you want to continuously update the game with fresh content.
Leveraging text-to-speech is not as natural as a professional voice actor, but offers several advantages. The first is that content can be dynamically generated on the fly, just by changing the text. For some game settings, a computer-generated voice may even fit better than a human one (e.g. if you are building a space game and Alexa assumes the role of a ship assistant or AI co-pilot).
Leverage SSML
Although Alexa does a pretty good job of reading your text and punctuation to determine how to read your content, don't forget you have a ton of controls on how Alexa's text-to-speech engine interprets your text thanks to a standard markup language called SSML: speech synthesis markup language. Wrapping your text responses with these markup tags will essentially instruct Alexa how to “read” your text. For example you can:
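Whisper with the amazon:effect tag
Play short audio clips with the audio tag
Add pauses with the break tag
Stress words with the emphasis tag
Mark sentences and paragraphs with the s and p tags (respectively)
Specify a phonetic pronunciation with the phoneme tag
Adjust volume, pitch and rate with the prosody tag
Control how dates, numbers and similar text are read with the say-as tag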
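As a rough illustration of several of these tags working together, the sketch below wraps an SSML fragment in the JSON envelope that a custom skill returns. The helper name `ssml_response` and the sample line are invented for this example; if you use an Alexa SDK, it will normally build this envelope for you.

```python
import json


def ssml_response(ssml_body, end_session=False):
    """Wrap an SSML fragment in the JSON envelope a custom skill returns."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {
                "type": "SSML",
                "ssml": f"<speak>{ssml_body}</speak>",
            },
            "shouldEndSession": end_session,
        },
    }


# Hypothetical line for a space game, combining several SSML tags.
INTRO = (
    "<p><s>Welcome back, captain.</s></p>"
    '<break time="1s"/>'
    '<amazon:effect name="whispered">Something is moving in the cargo bay.</amazon:effect> '
    '<emphasis level="strong">Do not</emphasis> open the airlock. '
    '<prosody rate="slow" pitch="low">Self-destruct in</prosody> '
    '<say-as interpret-as="digits">321</say-as>.'
)

if __name__ == "__main__":
    print(json.dumps(ssml_response(INTRO), indent=2))
```

It is worth listening to the result on a real device, since markup that looks right on screen does not always sound right out loud.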
Implementing a great game doesn't stop when the Alexa certification team gives your skill the green light. Ensure that your skill has replay-ability or the content is updated regularly so that players naturally feel compelled to return to your game over time. Here are some tips on how to get users to come back:
Back in the day, I remember playing Medal of Honor: Allied Assault Multiplayer Spearhead Demo (quite a mouthful). Despite the demo offering only two maps, I played for months on end thanks to the fact that the game was multiplayer (I actually applied to be part of an exclusive sniper clan ST1 Nox). What made the game so fun was that the other human players were unpredictable, and therefore gave infinite replay-ability to the limited demo game. Had it been single player, I probably would have stopped after the first hour.
Keeping in mind the constraints from the first tip above, try to find ways to interconnect your players' experiences and tap into this incredible source of replay-ability: not knowing what other people will do.
The simplest way to do this is with leaderboards. Keep track of high scores, and tell users how they are performing with respect to others. You can even make your games exclusively multiplayer by interconnecting player decisions (a la EVE Online).
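A leaderboard needs very little logic to get started. The sketch below assumes the high scores are already loaded into a plain dictionary keyed by a player identifier; the function names and the wording of the spoken reply are illustrative only.

```python
def rank_of(scores, player_id):
    """Return (rank, total_players); rank 1 is the best, ties share a rank."""
    player_score = scores[player_id]
    better = sum(1 for score in scores.values() if score > player_score)
    return better + 1, len(scores)


def leaderboard_speech(scores, player_id):
    """Build a short sentence telling the player where they stand."""
    rank, total = rank_of(scores, player_id)
    if rank == 1:
        return f"You are in first place out of {total} players!"
    return f"You are ranked number {rank} out of {total} players."
```

For a large player base you would probably compute ranks in the database rather than in memory, but the spoken result stays the same.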
One word of caution: Remember that the Alexa experience is inherently turn based, in that users need to wait for Alexa to process their input, send it to the skill, and respond back. Therefore, avoid choosing game mechanics that rely on quasi real-time interactions. Opt instead for asynchronous interactions that are not extremely time dependent. For example if you decide to build a strategy game, each player could have a whole day to complete their turn before advancing the game tick by 1 at midnight.
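The “one turn per day” idea can be sketched by deriving the game tick from the calendar instead of running a timer. This assumes midnight is measured in UTC and that `now` is a timezone-aware datetime; both function names are hypothetical.

```python
from datetime import date, datetime, timezone


def current_tick(game_start, now):
    """Number of UTC midnights that have passed since the game's start date."""
    return (now.astimezone(timezone.utc).date() - game_start).days


def can_submit_turn(last_submitted_tick, game_start, now):
    """A player may act once per tick; None means they have never acted."""
    if last_submitted_tick is None:
        return True
    return last_submitted_tick < current_tick(game_start, now)
```

With this approach nothing has to run at midnight: the tick is simply recomputed whenever a player opens the skill.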
Also keep in mind the context in which users typically use Alexa devices: in the home and in the car. What do those two contexts have in common? There may be multiple people present! And usually, those extra people are friends or members of the family. These settings are a perfect opportunity to play fun social games that bring people together. A great example is Would You Rather (4.6/5 stars). This skill has perfectly understood the usual context of an Alexa device, and used it to its advantage. The game is simple, yet it sparks long, lively debates among the players, and users love it!
For this reason, when possible, try to make the experience compatible with local multiplayer. This will simplify the back-end architecture since the multiplayer state is confined to a single instance of the game. For example, a quiz game like Trivia Hero (4.5/5 stars) supports up to 20 local players!
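Here is a minimal sketch of local multiplayer state that lives entirely in one game session. The state is a plain dictionary of lists, strings and numbers, so it could be kept in the skill's session attributes. The function names are hypothetical, and the default limit of 20 players simply mirrors the Trivia Hero figure above.

```python
def new_game():
    """Create an empty, JSON-serializable game state."""
    return {"players": [], "scores": {}, "turn": 0}


def add_player(state, name, max_players=20):
    """Register a player, rejecting duplicates and overfull games."""
    if len(state["players"]) >= max_players:
        raise ValueError("The game is full.")
    if name in state["scores"]:
        raise ValueError(f"{name} has already joined.")
    state["players"].append(name)
    state["scores"][name] = 0


def current_player(state):
    """Players take turns in the order they joined."""
    return state["players"][state["turn"] % len(state["players"])]


def record_answer(state, correct):
    """Score the current player's answer and pass the turn to the next player."""
    if correct:
        state["scores"][current_player(state)] += 1
    state["turn"] += 1
```

Since all players share one device and one session, there is no need to synchronize state between separate instances of the game.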
You can also consider adding support for the Echo Buttons, a programmable Alexa Gadget that pairs with your Alexa device. Each button illuminates and can be pressed to trigger a variety of play experiences. Check out some of the code samples. Keep in mind however, that although the buttons will add a whole new dimension to your game, your skill should gracefully handle the possibility that not everyone has those devices.
Customers love listening to stories (whether it be via Audible or story-based game skills such as One Piercing Note or The Magic Door). Remember that all Alexa skills should be voice-first, regardless of whether you are making use of the multimodal displays. This means that your words will be the main interface. Some see not having a screen as a disadvantage. I see it as being able to harness the highest resolution graphics processor in existence: our imagination.
Immerse your users in your game world by dedicating a good amount of time to detail, backstory and above all: good writing. As I always say: write for the ears, not the eyes. What sounds good in our heads as we type it out does not always sound as good when spoken out loud. Act out every interaction, regardless of how short, and test it out with real people before committing it to your game.
Going back to tip four on local multiplayer: remember that Alexa-enabled devices may be around other family members, and some of your biggest fans could be the ones that didn't buy the device itself: kids. Compared to grumpy adults (trust me, I am one), children have more fervent imaginations, shorter learning curves and a higher tolerance for imperfection (read: more forgiving). If you adapt your game to be kid-friendly, or even entirely dedicated to children, you will not regret it! As Spider-Man's uncle said: with great power comes great responsibility. Be mindful of your audience, try to add positive messages, and remember that every moment in a kid's life is a learning one.
If you follow the best practices above, you're on your way to building an engaging Alexa game skill. And as more and more customers use your skill, they will expect a frictionless voice experience. Here are a few additional tips for ensuring your skill is ready for high customer usage.
Don't underestimate the work that goes into building a solid voice model. As your skill receives more traffic, the chance that someone says something you haven't thought of rises sharply. Put in the work ahead of time to catch bugs and missed utterances by using the beta testing functionality before submitting for certification.
Even once the skill is live, make sure to monitor it constantly and catch any missed utterances that should be added or at least handled gracefully. View your users' anonymised intent history, and continuously improve your skill.
As mentioned in the first tip, Labworks.io's Trivia Hero is an open-ended trivia game. Consequently, they needed to ensure that even wrong answers were understood correctly, so that they could be accurately repeated back to the user. In general, having good confidence in what the user said also makes it easier to provide contextual replies (e.g. Alexa: “What is the capital of England?” → User: “Liverpool” → Alexa: “Liverpool is a city in England, but London is the capital”).
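The contextual reply in the capital-city example can be sketched as follows. This is not Trivia Hero's implementation; the function name and the way known cities are passed in are assumptions made for the illustration.

```python
def capital_reply(answer, correct_capital, country, cities_in_country):
    """Pick a reply based on whether a wrong answer is still a known city."""
    guess = answer.strip().title()
    known_cities = {city.lower() for city in cities_in_country}
    if guess.lower() == correct_capital.lower():
        return f"Correct! {correct_capital} is the capital of {country}."
    if guess.lower() in known_cities:
        # The wrong answer was recognized, so it can be said back in context.
        return f"{guess} is a city in {country}, but {correct_capital} is the capital."
    return f"Not quite. The capital of {country} is {correct_capital}."
```

The middle branch is only reachable if the wrong answers are part of your voice model, which is exactly why that extra effort pays off.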
If you decide to use your own server (instead of the recommended AWS Lambda back end), make sure it's ready to scale and can absorb a high number of transactions per second. If you publish your game across several countries, you could potentially have hundreds of thousands of unique users per day, if not more. If you plan to use AWS Lambda, make sure you're aware of the limits of the free tier and be sure to sign up for AWS promotional credits.
Like your server, your database could also become a bottleneck. Ensure you stress-test your database of choice and if you see it's buckling once you start bringing the heat, consider scaling it or using one specifically engineered to scale such as Amazon DynamoDB.
The “Measure” tab in the Alexa Developer Console offers great analytics, including a full-blown path analysis of your users based on intents. This gives you an overview of how users typically flow through your skill, allowing you to pinpoint areas of high drop-off or friction (e.g. paths that lead to the Help intent).
One thing you might want to monitor yourself is which attributes (or slot values) users choose over others. If your game lets players choose an animal as their character type, logging which animals are chosen will give you a good pulse on what people like, which animals need adding, and which popular animals may even merit additional dedicated content! 169labs, a skill development agency based in Germany, uses Slack to keep a pulse on missed slot values: if a user requests a value that is not handled by the back end, the skill posts a message to Slack so the team is instantly alerted. With a mechanism like this in place, if users start requesting a specific value (e.g. unicorn), you can prioritize adding a unicorn character to your game.
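A monitoring hook along these lines can be sketched in a few lines. The `SlotMonitor` class is hypothetical, and `notify` is an assumed callable that takes a message string. In production it might post to a Slack webhook, as in the 169labs example, but here it is left abstract so the sketch stays self-contained.

```python
from collections import Counter


class SlotMonitor:
    """Counts chosen slot values and reports values the game does not support."""

    def __init__(self, supported, notify):
        self.supported = {value.lower() for value in supported}
        self.notify = notify      # hypothetical hook: any callable taking a string
        self.chosen = Counter()   # supported values that users picked
        self.missed = Counter()   # requested values the game lacks

    def record(self, value):
        """Log one slot value; return True if the game supports it."""
        key = value.strip().lower()
        if key in self.supported:
            self.chosen[key] += 1
            return True
        self.missed[key] += 1
        self.notify(
            f"Unsupported value '{key}' requested "
            f"({self.missed[key]} time(s) so far)."
        )
        return False
```

For example, `SlotMonitor(["cat", "dog"], print)` would print an alert each time a player asks for a unicorn, and `missed.most_common()` then shows which characters are most in demand.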
Developers in the United Kingdom, Republic of Ireland, Germany, and Austria can enter the Alexa Skills Challenge: Games with BeMyApp. This is an opportunity for you to create entertaining, engaging, and educational game skills for Alexa and compete for over €50,000 in prizes. All you need to do is publish an Alexa skill within the Games, Trivia, and Accessories category in the Alexa Skills Store on amazon.de or amazon.co.uk. Click here to learn more and enter the challenge.
If you have questions about the skill you're building for the Alexa Skills Challenge: Games, reach out to me on Twitter at @muttonia or join me during office hours below to get answers. You can also check out the following resources: