Designing New Multimodal Games with the Alexa Web API

Joe Muoio Dec 04, 2020

The Alexa Web API for Games opens up a new frontier in gaming with Alexa, giving you tools to create new multimodal experiences for your players. Whether you’re enhancing an existing voice-only game or creating a new skill with the on-screen experience as the star of the show, you can follow these design best practices to make your skill a memorable gaming experience.

Building on universal multimodal best practices

Through Alexa Presentation Language (APL), we've learned what best practices make for great multimodal experiences. Your Web API for Games skill can build on these design insights to make your game even more accessible and fun to play. Here are some best practices for any multimodal Alexa skill:

Be voice-first, but not voice-only

Whether you’re building a game only playable on Alexa Web API-capable devices or enhancing a voice-only game with the Alexa Web API, Alexa experiences are voice-driven, and your players will expect to play the game primarily with their voice. This means:

  • Core game play is voice-driven: Even if looking at the screen is required to play, make sure voice interactions drive the main game.
    • Each touch (or remote) target on screen should have a voice command analog. Don’t force your player to interact by touch/remote to enjoy the game.
    • Allow players to easily switch between modalities if it makes sense for your game. Players may prefer (or need) to use their voice for some interactions and touch for others at different times, so consider adding optional corresponding touch interactions for some core game controls. You have full control over most DOM events, so you should incorporate these into your design.
  • The on-screen experience is timely, useful, and delightful: The most significant moments in the game play are punctuated with both useful and delightful interactions. Ensure your visuals are displaying important, contextually relevant data such as the score of the game, or visual feedback on who just won the round. Focus your design energies on the most impactful moments, characters, and user interactions. For instance, when working with characters, consider animating them, or celebrate the player when they finish a level with some timely audio and animation.

Design for accessibility, discoverability, and readability

Since the Alexa Web API for Games reaches Fire TV and Echo Show devices specifically, design for how customers use these devices. Customers frequently view TV and Echo Show screens from distances of three to ten feet.

  • Design for optimal viewing distances: Be sure that the important parts of your game are viewable at 10 feet, and the rest of the game’s on-screen components are viewable from 3 feet away. Use fonts in sizes and weights that your players can read in their context: their living rooms, kitchens, desks, and so on. In addition, consider best practices when working with the pixel density on various devices. Learn more about typography best practices in the Alexa Design Guide.
  • Choose a useful, accessible color palette: Use colors to convey meaning and visual hierarchy. Colors that stand out more should be used on more important elements of the design. Use a contrast ratio of 4.5:1 or better between your foreground and background colors. Online contrast calculators can check this for you. Learn more about color best practices in the Alexa Design Guide.
  • Use the layout to help the player: The most important information, controls, and action should take up the most real estate on the screen. Make sure screens with similar functions look similar to the player. For instance, if you have an on-screen heads-up display (HUD) for stats that appears in the bottom-right quadrant of the screen during game play, it should always appear there when the player asks for that information. Learn more about how to use a grid to create clear layouts in the Alexa Design Guide.
  • Make your game accessible to all players: Support an intent for telling the customer what is on screen, if possible. While you may not be able to capture everything in a succinct voice response, respond with what is most important to the player’s decision on what to do next. Learn more about how to design accessible skills in the Alexa Design Guide.
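If you would rather check the 4.5:1 contrast guideline in your own build tooling than in an online calculator, the arithmetic is short. The sketch below follows the WCAG 2.x relative luminance and contrast ratio formulas; the `Rgb` type and the function names are illustrative, not part of any Alexa library.

```typescript
// An sRGB colour with 8-bit channels (0-255).
type Rgb = { r: number; g: number; b: number };

// Convert one 8-bit sRGB channel to its linear-light value (WCAG 2.x formula).
function linearize(channel: number): number {
  const c = channel / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance: 0 for black, 1 for white.
function relativeLuminance(color: Rgb): number {
  return (
    0.2126 * linearize(color.r) +
    0.7152 * linearize(color.g) +
    0.0722 * linearize(color.b)
  );
}

// Contrast ratio between two colours: 1 for identical colours, 21 for black on white.
function contrastRatio(a: Rgb, b: Rgb): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  const lighter = Math.max(la, lb);
  const darker = Math.min(la, lb);
  return (lighter + 0.05) / (darker + 0.05);
}

// True when the pair meets the minimum ratio (4.5:1 by default).
function meetsMinimumContrast(fg: Rgb, bg: Rgb, minimum: number = 4.5): boolean {
  return contrastRatio(fg, bg) >= minimum;
}
```

Running a check like this over your palette during development can catch low-contrast text before it reaches a TV across the room.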

Design an engaging experience for all devices

It may not be possible to realize your full game design on all Alexa-enabled devices. Speaker-only devices, like the Echo Dot, and older devices, like the Echo Spot, do not support the Alexa Web API for Games. Here are some best practices for games that can only be played on Alexa Web API-capable devices:

  • Support features with robust audio descriptions for devices without a screen: Enable features which make sense to be available on audio-only devices. For instance, if you’re tracking the player’s score, milestones, and achievements, they should be able to ask any device “Alexa, ask Joe’s Word Game who has the top score” and Joe’s Word Game on a speaker-only device should have an answer.
  • Tell customers what the skill can support, and where they can play the game: If a player accesses your skill with a command you can’t support on a speaker-only device, direct them to what the skill can do on that device and what the player can do on an Alexa Web API-capable device. For instance, if a player asks “Alexa, play Joe’s Word Game” on a device without a screen, Joe’s Word Game should say something like “You can play a round of Joe’s Word Game on any Echo Show or Fire TV. Want to hear the latest scores?”
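One way to apply this fallback in a skill backend is to check the incoming request before deciding whether to start the web application. The sketch below assumes the request envelope lists `Alexa.Presentation.HTML` under `context.System.device.supportedInterfaces` on capable devices. The `planLaunch` helper, the `LaunchPlan` shape, and the welcome line for screen devices are hypothetical, and the types cover only the slice of the envelope the sketch reads.

```typescript
// Minimal slice of the Alexa request envelope that this sketch reads.
interface RequestEnvelope {
  context: {
    System: {
      device?: { supportedInterfaces?: Record<string, unknown> };
    };
  };
}

// Assumed interface name for Web API for Games support.
const HTML_INTERFACE = "Alexa.Presentation.HTML";

// True when the requesting device reports support for the HTML interface.
function supportsWebApi(envelope: RequestEnvelope): boolean {
  const interfaces: Record<string, unknown> =
    envelope.context.System.device?.supportedInterfaces ?? {};
  return HTML_INTERFACE in interfaces;
}

// Hypothetical result type: what the launch handler should do next.
interface LaunchPlan {
  startWebApp: boolean;
  speech: string;
}

// Decide between starting the game and redirecting the player.
function planLaunch(envelope: RequestEnvelope): LaunchPlan {
  if (supportsWebApi(envelope)) {
    return { startWebApp: true, speech: "Welcome back to Joe's Word Game." };
  }
  return {
    startWebApp: false,
    speech:
      "You can play a round of Joe's Word Game on any Echo Show or Fire TV. " +
      "Want to hear the latest scores?",
  };
}
```

Keeping the decision in one small function makes it easy to reuse from every intent handler that depends on the screen.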

For your voice-only experience, follow best practices for any voice-first/voice-only skill for Alexa experiences that do not have a screen. Learn more in the Alexa Design Guide.

Web API for Games specific guidance

Because the Alexa Web API for Games unlocks new kinds of experiences for your players with features not available for other skills, you’ll need to consider a few additional design possibilities for these interactions.

  • Consider session duration and how players will play your game. You can leave the web application open for 30 minutes without interaction. You can use this extended session time for playing web audio, animations, and more. If the customer says anything to stop the skill session explicitly (such as “quit”), the web application will be closed.
  • Use web audio to have dynamic control over the playback of your sounds over multiple turns in the conversation. You can play many simultaneous audio tracks and adjust the volume, seek, pause, and loop the audio sources in conjunction with your animations and the Alexa text-to-speech from your skill backend. If you use the Alexa JavaScript library utility, fetchAndDemux, you can even play the Alexa text-to-speech with the power of web audio.
  • If you are using web audio, consider that it does not reduce its volume by default. Be sure to listen to the voice events and reduce the volume so the audio does not interrupt your player when they are trying to speak. Also, listen to the speech events so you can reduce the volume of web audio while Alexa text-to-speech is playing.
  • Consider using the microphone APIs to request that the microphone open. This lets you design non-conversational experiences. For instance, you can open the microphone in response to an on-screen button press or after some web audio stops playing.
  • Design fallback scenarios for push-to-talk-only devices if you instruct the customer to say the wake word as a part of the game play. For example, on a Fire TV Stick you may want to show an overlay telling the player to “press the microphone button now” instead of “say Alexa” since the device does not have wake word detection.
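The volume guidance above comes down to tracking two things, whether the microphone is open and whether Alexa speech is playing, and lowering your game audio while either is true. Here is a minimal sketch of that bookkeeping. The `AudioDucker` class and its gain values are hypothetical, and the wiring described in the trailing comment assumes the Alexa JavaScript client exposes microphone and speech callbacks, so check the names against the API reference.

```typescript
// Tracks voice activity and derives the gain to apply to game audio.
class AudioDucker {
  private micOpen: boolean = false;
  private speechPlaying: boolean = false;
  private readonly normalGain: number;
  private readonly duckedGain: number;

  constructor(normalGain: number = 1.0, duckedGain: number = 0.2) {
    this.normalGain = normalGain;
    this.duckedGain = duckedGain;
  }

  // Call when the microphone opens or closes; returns the new target gain.
  setMicOpen(open: boolean): number {
    this.micOpen = open;
    return this.targetGain();
  }

  // Call when Alexa text-to-speech starts or stops; returns the new target gain.
  setSpeechPlaying(playing: boolean): number {
    this.speechPlaying = playing;
    return this.targetGain();
  }

  // Audio stays ducked until both the microphone and the speech are finished.
  targetGain(): number {
    return this.micOpen || this.speechPlaying ? this.duckedGain : this.normalGain;
  }
}

// Wiring sketch (assumed callback names, not verified here):
//   alexaClient.voice.onMicrophoneOpened(() => applyGain(ducker.setMicOpen(true)));
//   alexaClient.voice.onMicrophoneClosed(() => applyGain(ducker.setMicOpen(false)));
//   alexaClient.speech.onStarted(() => applyGain(ducker.setSpeechPlaying(true)));
//   alexaClient.speech.onStopped(() => applyGain(ducker.setSpeechPlaying(false)));
// where applyGain could ramp a Web Audio GainNode, for example with
//   gainNode.gain.setTargetAtTime(value, audioContext.currentTime, 0.1).
```

Tracking both flags matters because the events can overlap: if the microphone closes while Alexa is still speaking, the music should stay quiet until the speech ends.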

Skill-side authoritative design

If you are creating a skill-side authoritative Alexa game, you are putting the core of your game logic in the back end. The cloud side will be the authority of the game and the HTML side of the experience will be purely additive. If you have built any Alexa skill before, you will be familiar with this approach as it is the only approach you can take without the Alexa Web API for Games! The main benefit is that a voice-only experience is easy to create, which lets you reach more players with your game. Here are some design considerations if you want to build a skill-side authoritative game:

  • Make the game fully available in a voice-only context, if possible. For some games, the visuals might be vital to the experience even if the game is built as skill-side authoritative. But if you can provide the core gaming experience by voice alone, do it.
  • Design your visuals to react to the different skill-side changes. When an intent handler is invoked, play a corresponding animation or update the screen to reflect the new state. This requires sending your state information from the backend with the HandleMessage directive as things change.
  • Use audio generated from the skill side for consistency. You can also manage a combination of web audio and text-to-speech by leveraging the fetchAndDemux API for more granular control when syncing visuals and text-to-speech. If you want to use even more web audio, consider that this will not be available to the voice-only experience without additional complexity. APL for Audio is a great tool, as well, if you need dynamic audio in your game and want to apply this to all Alexa-enabled devices.
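To make the state-sync idea concrete, here is a rough sketch of a backend helper that updates the game state and wraps it in a message for the web application. It assumes the directive type string is `Alexa.Presentation.HTML.HandleMessage` with the payload under `message`. The `GameState` shape, the `kind` field, and both function names are made up for illustration.

```typescript
// Hypothetical game state owned by the skill backend.
interface GameState {
  round: number;
  scores: Record<string, number>;
  lastEvent: string;
}

// Directive shape assumed for pushing a message to the web application.
interface HandleMessageDirective {
  type: "Alexa.Presentation.HTML.HandleMessage";
  message: { kind: string; state: GameState };
}

// Pure state transition: the skill stays the single authority on the score.
function recordCorrectAnswer(state: GameState, player: string): GameState {
  const previous = state.scores[player] ?? 0;
  return {
    round: state.round + 1,
    scores: { ...state.scores, [player]: previous + 1 },
    lastEvent: `${player} scored`,
  };
}

// Wrap the latest state so the web application can animate the change.
function buildStateUpdate(state: GameState): HandleMessageDirective {
  return {
    type: "Alexa.Presentation.HTML.HandleMessage",
    message: { kind: "stateUpdate", state },
  };
}
```

An intent handler would call `recordCorrectAnswer`, add the result of `buildStateUpdate` to its response directives, and return the same outcome as speech, so voice-only devices never miss the update.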

HTML authoritative

In an HTML authoritative skill, the main game logic will be handled on the web application in JavaScript running on the device. This is really nice for time- or frame-based games and would be familiar to you if you have made JavaScript games for the web. In this kind of experience, the core game will not be available on all Alexa-enabled devices. But you have more freedom to experiment with novel games and can even port a JavaScript game you have already made, as long as it makes sense to play it by voice!

  • Consider using more web audio. You are not bound by the 240-second limit imposed on audio by Alexa services since the browser on the device will be playing this audio. Without a requirement to have parity with a voice-only experience, the complexity in building two audio experiences is gone and you can make use of dynamic audio more easily.
  • Route voice commands to the web application for processing. Your Alexa skill side can remain fairly simple if you route the core game intents to the WebView with the HandleMessage directive. Design the game from the perspective of the web application and add voice commands to all important actions in the game.
  • Time-based games like simulation games become much more practical. While they are possible in any Alexa game skill, now you can make items move around on screen in real time and interact with the customer without trips to the cloud.
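On the web application side, routing can be as simple as a table that maps intent names to game actions. The sketch below is one possible shape: the `Game` state, the intent names, and the slot names are all hypothetical, and the skill is assumed to forward each intent as a small JSON message.

```typescript
// Hypothetical message the skill forwards for each recognized intent.
interface VoiceCommand {
  intent: string;
  slots?: Record<string, string>;
}

// Hypothetical game state kept entirely in the web application.
interface Game {
  x: number;
  y: number;
  shotsFired: number;
}

type Slots = Record<string, string>;
type Handler = (game: Game, slots: Slots) => Game;

// Translate a spoken direction into a grid step; unknown words do nothing.
function step(direction: string | undefined): [number, number] {
  switch (direction) {
    case "left": return [-1, 0];
    case "right": return [1, 0];
    case "up": return [0, -1];
    case "down": return [0, 1];
    default: return [0, 0];
  }
}

// One entry per game action; touch handlers can call the same functions.
const handlers = new Map<string, Handler>([
  ["MoveIntent", (game, slots) => {
    const [dx, dy] = step(slots["direction"]);
    return { ...game, x: game.x + dx, y: game.y + dy };
  }],
  ["FireIntent", (game) => ({ ...game, shotsFired: game.shotsFired + 1 })],
]);

// Apply a forwarded voice command; unknown intents leave the game unchanged.
function handleCommand(game: Game, command: VoiceCommand): Game {
  const handler = handlers.get(command.intent);
  return handler ? handler(game, command.slots ?? {}) : game;
}
```

Because each action is a plain function, an on-screen button can call the same handler as the voice command, which keeps every touch target paired with a voice analog.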


The Alexa Web API for Games gives you a large design space to explore. While much of the general multimodal skill design wisdom still holds true, the possibilities open up with the power that the Alexa Web API for Games gives. Design for voice first and follow best practices to make your game accessible and fun to play. You have more tools at your disposal with web audio, the microphone interface, and extended skill session APIs to make some novel and immersive games. Let me know what you are building next @JoeMoCode on Twitter.
