Web API for Games: Tips and Tricks (January 2023 Edition)

Ryan Matthews Jan 24, 2023
Alexa Skills

Today, there are over 100 million Alexa-enabled devices around the world. The number of active users of Alexa has more than doubled since 2019, and engagement with third-party Alexa-powered devices has doubled since 2020. Many of Alexa’s customers love playing games in a variety of categories — from trivia games like Question of the Day to game shows like The Price is Right — on their devices, and in this new series, you’ll learn some tips and tricks on how to build great games on Alexa that focus on richer screen experiences leveraging the Web API for Games.


This first blog touches on the art of the possible, shares some success stories from our top developers, and dives into performance optimizations you can make to keep your customers coming back for more. In the next blog, we'll go deeper on performance and speak specifically about sample framework integrations. Stay tuned!


Selecting the right Alexa technology for your game

When developing games for Alexa devices with screens, Alexa provides two technologies for rendering your presentation: Alexa Presentation Language (APL), and the Web API for Games. Depending on your game’s needs, you may use either or both technologies.

Alexa Presentation Language (APL)

APL is a compact, declarative design language targeted at creating flexible graphical layouts with basic touch interaction, using minimal device resources. All Alexa devices with screens are capable of displaying APL, and APL provides tools for adapting your design to the vast variety of Alexa enabled screens, across many form factors, from many manufacturers. APL’s document model complements the Alexa Skills Kit voice model, allowing you to send a new layout with every skill response. APL is designed around a data binding concept, making it easy to dynamically adjust your layouts by defining slots in your layout, which will be swapped at presentation time by consulting a separate JSON object you specify.

APL includes components for arranging bitmap and vector images and text, as well as components that can host audio and video playback. APL provides a local command processing facility that allows you to specify how messages are sent to your skill’s endpoint in response to customer interactions, as well as make presentation adjustments to indicate you have done so. The Alexa Developer Console provides a comprehensive APL editor with syntax checking and preview.

If your game is primarily voice driven and you’d like to augment it with some simple imagery and optional button press interactions, then to reach the largest number of Alexa devices, consider implementing an APL version. To learn more, see the trainings and workshops, which are designed to help you deploy an APL skill in as little as 5 hours. APL Ninja is a great community-driven resource for exchanging tips and tricks with other builders.

Web API for Games

The Web API for Games lets your Alexa skill deploy a web app to compatible Alexa devices, which will host your app in a Chromium based web view. Your web app can leverage any standard web technologies like HTML, CSS, JavaScript, WebAudio, and WebGL to create traditional, fully animated, 2D and 3D, device side processed, interactive games. Using the Alexa JavaScript API, your web app can send and receive authenticated messages to your skill’s endpoint, letting you include Alexa voice intents and Text To Speech processing in your game.

Because Web API game development uses standard web technology, you can easily leverage any existing JavaScript modules like language processing tools, texture generators, or event routers, existing HTML5 presentation frameworks for graphics and audio like ThreeJS, PixiJS, HowlerJS, or CreateJS, or even entire game engines like Phaser, or BabylonJS. You can host your web app at any publicly visible HTTPS location, and you can make additional HTTP calls to load assets and use backend services anywhere on the web.

The Web API for Games is only supported on select Alexa devices that have sufficient processing power to host a web app. This includes all Amazon Echo Show devices (excluding the Echo Spot), and Amazon Fire TV Stick and Cube devices from 2018 and later. If you’d like to create a game that requires GPU-accelerated 2D and 3D rendering and animation, touch interactions that require custom logic like gestures or drag and drop, sound mixing that carries between player interactions, or game logic that requires a simulation loop, then consider making a Web API version.

Working across APL, Web API, and devices without screens

If you’ve decided that your game can support and would benefit from both the reach of APL and the richness of Web API, then you can provide both versions to your customer. The ASK skill request identifies which capabilities the device in use has, so you can decide on the fly which of your experiences to offer. Similarly, if your skill does not find any display capabilities in the skill request, you may offer a voice-only version of your game, fall back to some alternative interaction like reminding customers of their high score, or just explain that they need a device with a screen in order to play. Remember, customers may discover your skill from a wide variety of sources. If they can’t play your game right now, give them a reason to come back later on a different device!
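As a rough sketch of that on-the-fly decision, the helper below picks an experience from the supportedInterfaces block of the skill request. The function name and the three experience labels are illustrative assumptions and are not part of the ASK SDK; only the two interface keys come from the request itself.

```typescript
// Experience labels are hypothetical names used only in this sketch.
type Experience = 'web-api' | 'apl' | 'voice-only';

// Minimal stand-in for context.System.device.supportedInterfaces.
type SupportedInterfaces = { [interfaceName: string]: unknown };

// Prefer the richest experience the device reports, then fall back.
function chooseExperience(supported: SupportedInterfaces | undefined): Experience {
    if (supported && supported['Alexa.Presentation.HTML']) {
        return 'web-api';
    }
    if (supported && supported['Alexa.Presentation.APL']) {
        return 'apl';
    }
    // No display interface reported: offer a voice-only path.
    return 'voice-only';
}
```

A launch handler could switch on the returned label to decide which directive, if any, to attach to its response.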

Here's a good way of thinking about the two multimodal frameworks:


|               | Alexa Presentation Language (APL) | Web API |
|---------------|-----------------------------------|---------|
| Availability  | All Alexa devices with screens | All Echo Show and Fire TV devices |
| Languages     | APL, JSON | HTML, CSS, JavaScript, TypeScript, WebGL Shaders, etc. |
| Tooling       | Alexa Developer Console APL editor and simulator, VS Code integration | Any web development environment, desktop web browser developer tools |
| Example Games | Jeopardy, Mini Games, Star Trek Trivia | Volley Solitaire, The Vortex, Song Quiz |

Getting started with the Web API

First, check if a device supports HTML. Before sending a presentation directive, you should check whether the device has the capability to support HTML and, if it does not, fall back to APL or provide a voice-only version of your game. This is the same supportedInterfaces block that can be checked for APL-capable skills today.

In TypeScript, checking for HTML support might look like:

import { HandlerInput } from 'ask-sdk-core';

// Returns true when the requesting device reports the Alexa.Presentation.HTML interface.
export function supportsHTMLInterface(handlerInput: HandlerInput): boolean {
    const { context } = handlerInput.requestEnvelope;
    const supportedInterfaces = context.System?.device?.supportedInterfaces;
    return Boolean(supportedInterfaces?.['Alexa.Presentation.HTML']);
}

Once you determine that the requesting device supports the interface, you can send an Alexa.Presentation.HTML.Start directive to spin up a web view on your device and get started (tech docs).
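For orientation, here is a minimal sketch of what building that directive might look like. The helper name, the example URL, and the 300 second timeout are assumptions made for illustration, and only a subset of the directive's fields is shown; check the tech docs for the full set.

```typescript
// Subset of the Start directive fields used in this sketch.
interface HtmlStartDirective {
    type: 'Alexa.Presentation.HTML.Start';
    data?: { [key: string]: unknown };
    request: { uri: string; method: 'GET' };
    configuration: { timeoutInSeconds: number };
}

// Hypothetical helper: builds a Start directive for a web app URL.
function buildStartDirective(uri: string, data?: { [key: string]: unknown }): HtmlStartDirective {
    // The Web API only loads web apps over HTTPS.
    if (uri.indexOf('https://') !== 0) {
        throw new Error('The web app must be served over HTTPS');
    }
    const directive: HtmlStartDirective = {
        type: 'Alexa.Presentation.HTML.Start',
        request: { uri, method: 'GET' },
        configuration: { timeoutInSeconds: 300 }, // assumed value
    };
    if (data !== undefined) {
        directive.data = data; // optional startup payload for the web app
    }
    return directive;
}
```

In a skill handler, the result would typically be attached with the response builder's addDirective call, for example with a placeholder URL such as https://example.com/game/index.html.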


The remainder of this guide is not intended for beginners. For that, refer to the Getting Started with the Web API for Games blog or check out our Hello World boilerplate repo to start developing now.

Going further with the Web API: Reference games

We’ve built three reference games that demonstrate the flexibility of what you can do with the Web API, and serve as great starting points in your game development journey. GitHub repo links for each of the games below will be provided in a future blog update, but in the interim, enable the games and get inspired!

Controlling a character with your phone

WebRTC is an open source framework that enables real-time communication between web applications and mobile devices. In Slime Time (click to try it), you can pair your phone with your Alexa-enabled device via a QR code and control the character on-screen using your phone as a virtual d-pad while engaging with the in-game characters with your voice.

Match-three mechanics and gesture support

A very popular mobile game mechanic is match-three, where the customer tries to form lines, chains, or groups of three or more of the same elements. Match-three games make great casual games as they’re easy to understand, simple to play, and entertaining. Puzzle Hunters (click to try it) is a classic take on this genre and combines team building with pet collection mechanics to make for an addictive customer experience (CX). This game implements gesture support.

Logic puzzles and idle/casual games

Nonograms are picture logic puzzles in which cells in a grid must be colored or left blank according to the numbers at the side of the grid to reveal a hidden pixel art-like picture. Number Painter (click to try it) is a sample implementation of this type of logic puzzle that also demonstrates how delightful a casual game CX can be when played with your voice on Alexa.

More Inspiration: What other types of games can I build with the Web API?

The developer community has created a plethora of voice-forward HTML5-based games on Alexa to help get you inspired. Trivia and game show titles, RPG space simulators, board games and companion skills, casual and idle games, and brain teasers are just a few of the genres available. The titles below demonstrate high-fidelity 2D and 3D graphics and complex layouts that are highly interactive. Just say, "Alexa, open {skill name}", to play.

Turning a hobby into a business

For additional insight on how some of our developers have scaled their hobbies into successful businesses using the Web API, check out the following developer spotlights:

Web API Tips and Tricks

Now that you have a good idea of what you can build, how do you get started? You can think of Web API game development on an Echo Show or Fire TV device as being comparable to developing for the web on mid-tier mobile architectures. When rendering, you’ll want minimal texture counts per object, simplified shaders, few material inputs, etc. A lot of Echo devices won’t be able to do physically based lighting at full resolution and frame rate, so you’ll want to revert to older diffuse and specular models. For audio, you’ll be able to mix multiple streams, but won’t be able to use complex effects chains, especially those that rely on CPU-heavy operations like script nodes.

Here are a few tips and tricks to get started.

Which framework/s to develop with

The first decision to make is whether you need a full game engine or whether one or more smaller libraries might do based on the complexity of your development team and game. A game engine like Phaser might offer integrated object scripting, physics, level layout, persistence abstraction, material definition, asset optimization, parameterized animation, localization, and much more, as well as custom tools to edit and build all of your data. On the other hand, game engines tend to be large, opinionated, complex pieces of software. If your needs are simple, they may be met and optimized more simply by combining basic rendering and audio libraries like ThreeJS and Howler.

Whether engine or library, the frameworks that work best on Echo Shows and Fire TVs are those that are web native and have a small resource footprint. Here are a few of our favorites:

React or basic HTML+CSS

  • React: A declarative, efficient, and flexible JavaScript library for building user interfaces that lets you compose complex UIs from small and isolated pieces of code called components. Components tell React what to render on-screen and as they can leverage JavaScript expressions, they’re extremely flexible. 

WebGL Renderers

  • ThreeJS: An easy to use, lightweight, cross-browser, general purpose 3D library.
  • PixiJS: An open source, web-based 2D rendering system that provides a scene graph, interaction manager, and support for shader programs. One of the major features that distinguishes PixiJS from other web-based rendering solutions is speed - making it a great choice for Alexa game development.

Game Engines

  • Phaser: A comprehensive 2D game engine that is highly flexible, has an active community, and recently released a web-based IDE to help manage scenes, sprites, animations and more.

Serving assets while you iterate on your game

Once in production, you’ll expect Alexa devices to cache your web assets to prevent unnecessary repeat fetches from your server. During development, though, you’ll instead want every HTTP GET to go back to your server, so that you can confidently preview each iteration of your work. Here are a few techniques for defeating that on-device cache:

You can add a meaningless query string to the end of any of your request URIs. This works, no matter what other ways you’ve previously loaded the asset. For instance in your Alexa.Presentation.HTML.Start directive you could include the following:

let url = `${myBaseURL}?timestamp=${Date.now()}`
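If your base URL might already carry a query string, a slightly more defensive variant could look like the sketch below. The helper name is hypothetical, and it deliberately ignores URL fragments to stay short.

```typescript
// Hypothetical helper: appends a throwaway timestamp parameter to a URL.
// The "now" parameter exists only to make the function easy to test.
function withCacheBuster(baseUrl: string, now: number = Date.now()): string {
    // Use '&' when the URL already has a query string, '?' otherwise.
    const separator = baseUrl.indexOf('?') === -1 ? '?' : '&';
    return baseUrl + separator + 'timestamp=' + now;
}
```

Because the parameter changes on every call, this is best kept to development builds so that production devices can still cache your assets.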

You can also configure the cache policy for your HTML page with meta tags in its head element. This only works if the page is actually fetched from the server: if the device already has an older copy of the page cached, it will not see the new information. These tags might look like:

<meta http-equiv="Cache-Control" content="no-cache, no-store, must-revalidate">
<meta http-equiv="Pragma" content="no-cache">

You could also modify the HTTP responses from your server to include the following header. Note that this also will not work if the device has already cached the asset.

Cache-Control: no-cache

Finally, the most complex version of cache control, but one that is also beneficial during production is content hashing. You can calculate a simple hash of the contents of each asset, e.g., using the MD5 algorithm, and then append it to the filename. This produces a forever unique version of the filename that refers to just that version. This allows multiple versions of your assets to live side by side, and for your application to select a specific one based on the contents.
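The sketch below illustrates the idea of content-hashed file names. To stay dependency free it swaps MD5 for a small FNV-1a hash, which is not cryptographic but is enough to illustrate versioned names; in a real build you would more likely let your bundler or a crypto library compute the hash. Both function names are hypothetical.

```typescript
// FNV-1a (32-bit) over raw bytes, returned as 8 hex characters.
// This stands in for MD5 purely to keep the sketch self-contained.
function fnv1aHex(bytes: Uint8Array): string {
    let hash = 0x811c9dc5; // FNV offset basis
    for (let i = 0; i < bytes.length; i++) {
        hash ^= bytes[i];
        hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime
    }
    return ('00000000' + hash.toString(16)).slice(-8);
}

// Inserts the content hash before the file extension,
// e.g. "sprites.png" becomes "sprites.<hash>.png".
function hashedFileName(fileName: string, contents: Uint8Array): string {
    const hash = fnv1aHex(contents);
    const dot = fileName.lastIndexOf('.');
    if (dot <= 0) {
        return fileName + '.' + hash; // no extension to preserve
    }
    return fileName.slice(0, dot) + '.' + hash + fileName.slice(dot);
}
```

Since the name changes only when the bytes change, unchanged assets keep hitting the device cache while edited ones are fetched fresh.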

If you find yourself needing to completely clear the HTTP cache, please hard reboot your Echo device.

Choose a hosting solution

The Web API will only read your assets over HTTPS, with a valid SSL certificate, but otherwise you are free to host your game anywhere on the web. During development, you may want to use a simple object store with a web front end like Amazon S3 to host your assets. You could also use a reverse proxy like ngrok to serve directly from your development machine. When publishing to customers, you’ll want to consider the bandwidth and availability necessary to serve many customers around the clock, at low latency. Consider using AWS Amplify, a full-stack hosting solution complete with a content delivery network (CDN).

Read the Getting Started With... guide for other options.

Performance optimization

While it is convenient to develop on the desktop, always test on real Alexa devices to get an accurate understanding of your game’s performance. Do not forget that there is a wide range of devices, with a wide range of performance characteristics. It’s a good idea to test your game on at least an Echo Show and a Fire TV.

While testing your game in the browser on your computer, make use of Chrome’s DevTools to display a frames per second (FPS) meter and other helpful information like memory used. To enable this overlay, open More tools > Rendering in DevTools and check Frame Rendering Stats.



When measuring performance on device, you will want to add a JavaScript performance monitor like Stats.js. This low-footprint (though still performance-impacting!) module offers a variety of helpful configurations, of which the FPS and memory monitors are the two most helpful. Keep in mind that your computer is much more powerful than an Echo Show, so treat the FPS you see on desktop as a relative metric: identify where your game slows down, as those slowdowns will be exacerbated on device.
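To show the idea behind an FPS meter, here is a minimal sliding-window counter. It is not Stats.js and does not use its API; the class name and the one second window are assumptions for illustration.

```typescript
// Counts how many frames were recorded within the most recent time window.
class FrameRateTracker {
    private frameTimes: number[] = [];
    private windowMs: number;

    constructor(windowMs: number = 1000) {
        this.windowMs = windowMs;
    }

    // Record one frame timestamp (in ms) and return the number of frames
    // seen in the last windowMs, which approximates FPS for a 1000 ms window.
    tick(nowMs: number): number {
        this.frameTimes.push(nowMs);
        const cutoff = nowMs - this.windowMs;
        while (this.frameTimes.length > 0 && this.frameTimes[0] <= cutoff) {
            this.frameTimes.shift(); // drop frames that fell out of the window
        }
        return this.frameTimes.length;
    }
}
```

In a game loop, you would call tick with the timestamp passed to your requestAnimationFrame callback and display or log the result every so often.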

To inspect logs and errors, it’s also useful to echo them into a debug div that you float on top of your game on device. Consider also investing in a Fire TV device, which lets you enable the Android Debug Bridge (ADB); when paired with a computer, this gives you access to Chrome’s DevTools and console output. Otherwise, you can leverage a service like RemoteJS to send your console logs to the cloud.
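One possible shape for the debug div approach is sketched below: a small buffer that keeps the most recent log lines as text, which you can then copy into an overlay element. The class name, line limit, and wiring shown in the comments are assumptions, not an Alexa API.

```typescript
// Turns a logged value into a short printable string.
function formatLogPart(part: unknown): string {
    if (typeof part === 'string') {
        return part;
    }
    if (part instanceof Error) {
        return part.name + ': ' + part.message;
    }
    try {
        const json = JSON.stringify(part);
        return json === undefined ? String(part) : json;
    } catch (e) {
        return String(part); // e.g. circular structures
    }
}

// Keeps only the newest maxLines entries so the overlay stays small.
class LogBuffer {
    private lines: string[] = [];
    private maxLines: number;

    constructor(maxLines: number = 20) {
        this.maxLines = maxLines;
    }

    // Adds one entry and returns the full overlay text.
    add(level: string, parts: unknown[]): string {
        this.lines.push('[' + level + '] ' + parts.map(formatLogPart).join(' '));
        if (this.lines.length > this.maxLines) {
            this.lines.splice(0, this.lines.length - this.maxLines);
        }
        return this.lines.join('\n');
    }
}

// Possible wiring in the web app (assumes a <pre id="debug"> element exists):
//   const buffer = new LogBuffer();
//   const original = console.log;
//   console.log = (...args: unknown[]) => {
//       original(...args);
//       document.getElementById('debug')!.textContent = buffer.add('log', args);
//   };
```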

To understand your in game memory usage, you can use the alexa.performance.getMemoryInfo() method from the Alexa SDK. It is not recommended to use this method in the live skill as it can impact performance. 

Alexa devices come in a range of resolutions. If possible, detect the screen resolution from the HTML window element when your game starts, and then select from lower and higher variations of your assets to avoid wasting resources on smaller devices.
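As a sketch of that selection step, the helper below maps a screen size to one of two asset tiers. The tier names, the 1280 pixel threshold, and the folder layout are all assumptions you would tune for your own game.

```typescript
// Hypothetical asset tiers; a real game might define more than two.
type AssetTier = 'low' | 'high';

// Chooses a tier from the longest screen side so that portrait and
// landscape orientations of the same device get the same assets.
function pickAssetTier(screenWidth: number, screenHeight: number): AssetTier {
    const longestSide = Math.max(screenWidth, screenHeight);
    return longestSide >= 1280 ? 'high' : 'low'; // assumed threshold
}

// Builds an asset URL assuming assets are stored in one folder per tier.
function assetUrl(baseUrl: string, tier: AssetTier, fileName: string): string {
    return baseUrl + '/' + tier + '/' + fileName;
}

// In the web app this might be called once at startup:
//   const tier = pickAssetTier(window.innerWidth, window.innerHeight);
```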

Consider under-sampling your canvas and textures to achieve higher frame rates for games that would benefit more from that than from sharper visuals. On Fire TVs, for example, your content could be displayed on a 4K TV but might look just as good, and render much faster, on a 720p canvas. You can achieve this by setting the canvas element’s dimensions explicitly, and then setting its CSS to fill the screen. Experiment with different combinations of texture and canvas resolution to see where your specific game is bottlenecked.
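The arithmetic for picking an under-sampled canvas size can be sketched as follows. The function name and the idea of capping by height are assumptions; the comments show how the result might be applied to a canvas element.

```typescript
interface Size {
    width: number;
    height: number;
}

// Scales the screen size down so the canvas is at most maxHeight pixels tall,
// preserving aspect ratio. Screens already within the cap are left unchanged.
function undersampledCanvasSize(screen: Size, maxHeight: number): Size {
    if (screen.height <= maxHeight) {
        return { width: screen.width, height: screen.height };
    }
    const scale = maxHeight / screen.height;
    return { width: Math.round(screen.width * scale), height: maxHeight };
}

// Applying it in the web app (assumes "canvas" is your HTMLCanvasElement):
//   const size = undersampledCanvasSize(
//       { width: window.innerWidth, height: window.innerHeight }, 720);
//   canvas.width = size.width;      // rendering resolution
//   canvas.height = size.height;
//   canvas.style.width = '100vw';   // displayed size fills the screen
//   canvas.style.height = '100vh';
```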

Alexa skills do well when the customer can jump in and out of them quickly. Some Alexa features, like Reminders, In Skill Purchasing (ISP), and Shared Activities also require Alexa to take over presentation and communicate with the customer, requiring game skills to reinitialize their presentation when control returns to them. As a result, you should optimize your game to draw its first frame as soon as possible, deferring all other code, image, video, and audio asset loads until later. 

Try to schedule your asset loads such that they are loaded as late as possible, in parallel to other activities in the game. For instance, you would want to finish loading and displaying the background for your main menu, before starting on the background for round 1, and avoid loading round 2’s background before you’re ready to present round 1. Find a good balance between (1) lots of small assets that you can swap in and out, optimizing load order as well as memory usage by letting the disk cache hold onto them for you, and (2) reducing the number of assets to optimize the number of HTTP requests and draw calls in your game, by combining smaller assets into larger sheets. Texture Packer is an example of a tool for combining images for this purpose in Phaser, while AudioSprite shows how audio sprites can be implemented in CreateJS. Both techniques are generalizable and customizable to your game’s unique needs.

Consider adding support for portrait mode on the Echo Show 15

The Echo Show 15 is our largest Show to date and it comes with a unique trick: the screen can be positioned in either portrait or landscape orientations. Most frameworks like Phaser and ThreeJS have support to detect orientation and perform some action like locking or asking users to flip their device. Using responsive web design techniques to automatically resize, hide, shrink, or enlarge UI elements based on the viewport dimensions, your skill can support portrait mode with minimal effort.

The larger investment here is around design and other creative CX decisions on whether your game will just support portrait mode or do something unique to take advantage of that new orientation like loading a mini map in portrait instead of skewing the viewport.

For implementation details, refer to this blog

Song Quiz (try it)

Syncing visuals with speech

A compelling animation technique available to games built with the Web API is the ability to synchronize text-to-speech (TTS) audio with screen animations, for example, synchronizing lip movements or the phasing in/out of a robotic voice. 

The general workflow is to use the transformers object, which defines how to convert speech into an audio stream and provides your web app with a URL to that stream, and then to feed that URL into the Alexa JS API’s fetchAndDemuxMP3 method. The mark tags in the returned MP3’s metadata then let you synchronize presentation and speech in your chosen framework. Breaking that down, here are the individual steps you can take using vanilla JavaScript:

  • Add the <mark name=""/> tag to embed named cues in an audio source (details)
  • In the web app, demux the streams to get (1) a JSON list of all the tags and (2) an audio buffer to play back using the Web Audio API
  • Start playing the audio, and record the start time
  • Set up a requestAnimationFrame loop. At each frame, subtract the start time from the current time to get the elapsed time, and then walk through the tag list, emitting an event into the web app for every tag that occurs before the elapsed time and hasn’t been processed yet.
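The last step, walking the tag list against elapsed time, can be sketched as a small cursor. The TimedTag shape and the class name are assumptions for illustration; map whatever tag list the demux step gives you into this shape.

```typescript
// Assumed shape for a tag: its mark name and its offset into the audio.
interface TimedTag {
    name: string;
    timeMs: number;
}

// Hands out each tag exactly once, as soon as playback has reached it.
class TagCursor {
    private tags: TimedTag[];
    private nextIndex: number = 0;

    constructor(tags: TimedTag[]) {
        // Sort a copy so the caller's array is left untouched.
        this.tags = tags.slice().sort((a, b) => a.timeMs - b.timeMs);
    }

    // Returns every tag at or before elapsedMs that has not been returned yet.
    advance(elapsedMs: number): TimedTag[] {
        const due: TimedTag[] = [];
        while (this.nextIndex < this.tags.length && this.tags[this.nextIndex].timeMs <= elapsedMs) {
            due.push(this.tags[this.nextIndex]);
            this.nextIndex++;
        }
        return due;
    }
}

// Inside a requestAnimationFrame loop this might be used as:
//   const elapsed = performance.now() - startTime;
//   for (const tag of cursor.advance(elapsed)) { /* trigger animation for tag.name */ }
```

Keeping an index instead of rescanning the list means each frame only does work for tags that have just become due.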

Wrapping it up

We hope you learned something new from these tips, and we’ll be publishing another blog soon with more content. We look forward to seeing what you build! If you need help, come find us on the Alexa Community Slack.
