Add Voice Control and Speech to the Web App

You can use the Alexa Web API for Games to add Alexa speech and voice commands to your web-based game. This document describes some common scenarios for adding these elements to your game.

Make Alexa speak to the user

Your web app can make Alexa talk to the user while they interact with the web app. For example, Alexa might respond to the user's touch interactions or narrate what's happening in the game. Alexa can also prompt the user for a spoken response, as described later.

For example:

User touches the "fire" button on the screen.
Alexa: Firing the torpedoes…(sound effects)…sorry, looks like you missed, you'll have to wait till your next turn to try again! (As Alexa speaks, the display on the web app changes.)
Web app presents new graphics and waits for the user's touch input.

To make Alexa speak to the user

  1. In the web app, call alexa.skill.sendMessage() to send the skill a message.
  2. In your skill code, create a handler for the Alexa.Presentation.HTML.Message request generated by the sendMessage() call. This handler returns a response with:
    • The outputSpeech Alexa should say.
    • The shouldEndSession property left undefined (not set).

    This tells Alexa to speak the text without opening the microphone. The session remains open.

  3. In the web app, register listener functions to respond to Alexa events. Alexa notifies your app when speech starts and stops.
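The steps above might be sketched as follows. Step 2 is shown as raw response JSON rather than ASK SDK calls, and the speech text, message payload, and the `ui` helper are illustrative; the listener names in step 3 follow the Alexa JavaScript API (alexa.speech.onStarted and alexa.speech.onStopped).

```javascript
// Step 2 sketch: build the raw response JSON for an
// Alexa.Presentation.HTML.Message request (no ASK SDK).
function buildSpeechOnlyResponse(message) {
  return {
    version: "1.0",
    response: {
      outputSpeech: {
        type: "SSML",
        ssml: "<speak>Firing the torpedoes!</speak>",
      },
      // shouldEndSession deliberately left undefined: Alexa speaks the
      // text and keeps the session open without opening the microphone.
    },
  };
}

// Step 3 sketch: register speech listeners in the web app. The `alexa`
// client object (obtained from Alexa.create()) is passed in to keep the
// sketch testable; `ui` is a hypothetical helper for your game's display.
function registerSpeechListeners(alexa, ui) {
  alexa.speech.onStarted(() => ui.setSpeaking(true));
  alexa.speech.onStopped(() => ui.setSpeaking(false));
}
```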

Prompt the user for voice input

Your web app can make Alexa prompt the user for voice input during the game, such as in response to a button press in the game. For example:

User touches the "fire" button on the screen.
Alexa: Firing the torpedoes…(sound effects)…sorry, looks like you missed, do you want to try that again?
Alexa opens the microphone to listen to the user's response.
User: Yes (The skill gets a normal intent from the interaction model, such as AMAZON.YesIntent.)
…game continues…

When deciding how to prompt the user for voice input, consider what methods they have for initiating speech on their own. If the device has a push-to-talk button, it might be more natural for the user to initiate conversations by pressing, or pressing and holding, that button. Alternatively, if the device supports wake word activation, the user might be able to play your game hands-free by using the wake word. You can check which methods the device supports by inspecting the alexa.capabilities.microphone object.
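A capability check might look like the following sketch. The supportsWakeWord property is described later in this document; supportsPushToTalk is an assumed sibling property on alexa.capabilities.microphone, and the hint strings are illustrative.

```javascript
// Sketch: pick an on-screen hint based on how the device's microphone
// can be activated. `microphone` is alexa.capabilities.microphone.
function pickPromptHint(microphone) {
  if (microphone.supportsWakeWord) {
    return 'Say "Alexa, fire!" to attack.';
  }
  if (microphone.supportsPushToTalk) {
    return "Press and hold the action button to speak.";
  }
  return "Touch a target to attack.";
}
```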

To prompt the user for voice input

  1. In the web app, call alexa.skill.sendMessage() to send the skill a message.
  2. In your skill code, create a handler for the Alexa.Presentation.HTML.Message request. This handler returns a response with:
    • The outputSpeech Alexa should say.
    • A reprompt to use if the user doesn't respond.
    • The shouldEndSession property set to false.

    This tells Alexa to speak the text, then open the microphone for the user's response.

  3. In the web app, register listener functions to respond to Alexa events. Alexa notifies your app when speech starts/stops and when the microphone opens/closes.
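The steps above might be sketched as follows. Step 2 is shown as raw response JSON (no ASK SDK) with illustrative speech text; the listener names in step 3 follow the Alexa JavaScript API (alexa.speech and alexa.voice), and `ui` is a hypothetical display helper.

```javascript
// Step 2 sketch: speech, a reprompt, and shouldEndSession set to false
// so that Alexa opens the microphone after speaking.
function buildPromptResponse() {
  return {
    version: "1.0",
    response: {
      outputSpeech: {
        type: "PlainText",
        text: "Sorry, looks like you missed. Do you want to try that again?",
      },
      reprompt: {
        outputSpeech: {
          type: "PlainText",
          text: "Do you want to fire again?",
        },
      },
      shouldEndSession: false,
    },
  };
}

// Step 3 sketch: besides speech start/stop, listen for microphone
// open/close so the game can show a listening indicator.
function registerVoiceListeners(alexa, ui) {
  alexa.speech.onStarted(() => ui.setState("speaking"));
  alexa.speech.onStopped(() => ui.setState("idle"));
  alexa.voice.onMicrophoneOpened(() => ui.setState("listening"));
  alexa.voice.onMicrophoneClosed(() => ui.setState("idle"));
}
```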

These steps trigger a normal Alexa skill interaction. Alexa speaks the outputSpeech, then opens the microphone for a few seconds to listen for the user's response. If the user doesn't respond, Alexa speaks the reprompt and then opens the microphone again. If the user still doesn't respond, Alexa closes the microphone but keeps the session open, because the web app is still displayed on the screen.

After the user responds to the prompt with an utterance that resolves to an intent in your model, your skill gets an IntentRequest. An intent handler in your skill should handle this request. For example, your intent handler might return a response that contains:

  • An Alexa.Presentation.HTML.HandleMessage directive that passes relevant information from the user's spoken response to the web app.
  • (optional) outputSpeech if you want Alexa to say something to the user.
  • The shouldEndSession property set to either undefined (when you don't need to open the microphone for another response) or false (when you do want to open the microphone for additional spoken input).

Finally, in your web app, call alexa.skill.onMessage() to register a callback to respond to the incoming message.
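An intent handler's response might be sketched as raw JSON like the following. The AMAZON.YesIntent scenario continues the torpedo example above; the message payload shape (action, hit) and the speech text are illustrative.

```javascript
// Sketch: response to an IntentRequest (e.g. AMAZON.YesIntent) that
// carries the outcome to the web app via a HandleMessage directive.
function buildYesIntentResponse(hit) {
  return {
    version: "1.0",
    response: {
      directives: [
        {
          type: "Alexa.Presentation.HTML.HandleMessage",
          message: { action: "fireResult", hit: hit },
        },
      ],
      outputSpeech: {
        type: "PlainText",
        text: hit ? "Direct hit!" : "Splash. Another miss!",
      },
      // shouldEndSession omitted (undefined): speak without reopening the
      // microphone. Set it to false instead to prompt for more input.
    },
  };
}
```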

Prompt the user for voice input via the JavaScript API

On devices that support wake word activation, you can programmatically request that the microphone open by calling the alexa.voice.requestMicrophoneOpen API.

To prompt the user for voice input

  1. In the web app, check that alexa.capabilities.microphone.supportsWakeWord is true.

  2. If the device supports wake word activation, call alexa.voice.requestMicrophoneOpen.

  3. You can register callbacks for onOpened, onClosed, and onError to update your game state.

  4. After the user stops talking, their speech resolves to an intent in your interaction model, and your skill receives the corresponding IntentRequest.

Note these two common error responses:

  1. microphone-already-open: Indicates that the user already opened the microphone through some other means (such as by saying the wake word).

  2. request-open-unsupported: Indicates that the device doesn't support this API, and that you should avoid calling it again for as long as the app is running. This usually means you called the API on a device that doesn't support wake word activation.
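The steps and error handling above might be sketched as follows. The `alexa` client object is passed in, `ui` is a hypothetical display helper, and the module-level flag implements the "avoid it for the rest of the run" guidance for request-open-unsupported.

```javascript
// Cleared once the device reports that requestMicrophoneOpen is unsupported.
let micOpenSupported = true;

// Sketch: programmatically open the microphone on wake-word devices.
function tryRequestMicrophone(alexa, ui) {
  if (!alexa.capabilities.microphone.supportsWakeWord || !micOpenSupported) {
    return; // no wake word support, or the API already reported unsupported
  }
  alexa.voice.requestMicrophoneOpen({
    onOpened: () => ui.setState("listening"),
    onClosed: () => ui.setState("idle"),
    onError: (reason) => {
      if (reason === "request-open-unsupported") {
        micOpenSupported = false; // don't call again while the app runs
      }
      // "microphone-already-open" is benign: the user opened it themselves.
    },
  });
}
```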

Get user-initiated voice input

While your web app is on the screen, the user can use the wake word to speak to Alexa at any time. Your skill should expect user-initiated voice input while the web app is active.

User touches the screen to select several targets. Web app responds with normal sound effects and graphics.
User: Alexa, fire at the targets! (Since the skill session is open, the user can invoke an intent in your skill with just the wake word and an utterance.)
Skill receives an IntentRequest corresponding to the "fire at the targets" utterance.
Alexa: Roger, firing the torpedoes now!
Sound effects and graphics.

To get user-initiated voice input

  1. In your skill's interaction model, add intents with sample utterances that users might speak when playing your game.
  2. In your intent handlers for these intents, return:
    • An Alexa.Presentation.HTML.HandleMessage directive that passes relevant information from the user's spoken request to the web app.
    • (optional) outputSpeech if you want Alexa to say something to the user.
    • The shouldEndSession property set to either undefined (when you don't need to open the microphone for another response) or false (when you do want to open the microphone for additional spoken input).
  3. In your web app, call alexa.skill.onMessage() to register a callback to respond to the incoming message.
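Step 3 might be sketched as follows. The message fields (action, hit) are illustrative; use whatever shape your HandleMessage directives send, and `game` stands in for your game logic.

```javascript
// Sketch: route incoming skill messages to the game. The callback
// registered with alexa.skill.onMessage() receives each message sent
// via an Alexa.Presentation.HTML.HandleMessage directive.
function registerSkillMessageHandler(alexa, game) {
  alexa.skill.onMessage((message) => {
    switch (message.action) {
      case "fireResult":
        game.showFireResult(message.hit);
        break;
      default:
        console.warn("Unhandled skill message:", message);
    }
  });
}
```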

Use transformers to render voice natively in HTML

The Alexa.Presentation.HTML.Start and Alexa.Presentation.HTML.HandleMessage directives take an optional transformers array. A transformer converts either SSML or plain text into an audio stream and provides your web app with a URL to that stream. You can use the fetchAndDemuxMP3 function to fetch and demux this audio stream, extracting the audio buffer and speech marks.
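Playing a transformed stream might be sketched as follows with the Web Audio API. This assumes fetchAndDemuxMP3 is exposed under alexa.utils.speech, that it resolves with the raw MP3 bytes (audioBuffer) plus the speech marks, and that the bytes still need decoding before playback; verify these details against the client library.

```javascript
// Sketch: fetch a transformer-generated stream, decode it, and play it.
// `audioContext` is a Web Audio AudioContext; `url` comes from the
// transformer output delivered to your web app.
async function playTransformedSpeech(alexa, audioContext, url) {
  const { audioBuffer, speechMarks } =
    await alexa.utils.speech.fetchAndDemuxMP3(url);
  const decoded = await audioContext.decodeAudioData(audioBuffer);
  const source = audioContext.createBufferSource();
  source.buffer = decoded;
  source.connect(audioContext.destination);
  source.start();
  return speechMarks; // e.g. for syncing animations to spoken words
}
```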