Receive Voice Input to an Echo Button Skill

Unlike typical custom skills, Echo Button skills must be able to handle two types of user input: button presses and the user's voice. To monitor button presses, you start an Input Handler. Your skill must also be prepared to handle voice input at any time. This topic describes how your Echo Button skill should respond to Alexa based on whether the skill specifically expects voice input.

Types of Voice Input

When a skill is in session, there are two ways that users can provide voice input to the skill. If the microphone is open, the user can speak to the skill directly, without prefacing the speech with "Alexa". If the microphone is closed, the user can still provide voice input, but must preface their request with "Alexa".

Listening for Voice Input

This section describes how your skill can prepare to receive voice input and button input.

Opening the Microphone

If your skill specifically expects voice input, it should do the following for the best user experience:

  • Include text-to-speech (TTS) that asks the user a question.
  • Set shouldEndSession to false. This preserves the current session and opens the microphone at the end of the response, so that it is ready for the user to speak another intent. If the user doesn't respond, Alexa speaks the reprompt (if you provided one). If the user still doesn't respond, the session closes.
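
The two points above can be sketched as a response body built in Python. This is a minimal illustration, not a definitive implementation: the function name build_voice_prompt_response and the sample wording are hypothetical, and the layout assumes the standard custom-skill response JSON.

```python
def build_voice_prompt_response(question, reprompt):
    """Ask a question and open the microphone for the user's answer.

    The function name and arguments are illustrative assumptions.
    """
    return {
        "version": "1.0",
        "response": {
            # TTS that asks the user a question.
            "outputSpeech": {
                "type": "SSML",
                "ssml": f"<speak>{question}</speak>",
            },
            # Optional reprompt, spoken if the user stays silent.
            "reprompt": {
                "outputSpeech": {
                    "type": "SSML",
                    "ssml": f"<speak>{reprompt}</speak>",
                }
            },
            # False keeps the session alive and opens the microphone.
            "shouldEndSession": False,
        },
    }


response = build_voice_prompt_response(
    "Would you like to play another round?",
    "Say yes to play again, or no to stop.",
)
```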

Without Opening the Microphone

It is common for Echo Button skills to reach a point at which they need to keep the session open, but do not want the microphone to open and attempt to recognize speech. For example, a skill might give the user time to solve a puzzle by pressing Echo Buttons.

If your skill expects button presses but does not specifically expect voice input, it should do the following for the best user experience:

  • Include text-to-speech (TTS) that lets the user know what to do.
  • Set shouldEndSession to null, or don't define it at all. This keeps the session open without opening the microphone.
  • Include a GameEngine.StartInputHandler directive to monitor button presses.
  • If the Input Handler timeout specified in your GameEngine.StartInputHandler directive is longer than 5 seconds, the response must also include an audio file so that the user knows that the skill is still in session. The following example shows how to play a 30-second ticking sound, which is available for you to use:

    <speak>
       Ready, set, go!
       <audio src="https://s3.amazonaws.com/ask-soundlibrary/foley/amzn_sfx_rhythmic_ticking_30s_01.mp3" /> 
    </speak> 
    
    Sound Effect              SSML
    Rhythmic ticking (30s)    <audio src='https://s3.amazonaws.com/ask-soundlibrary/foley/amzn_sfx_rhythmic_ticking_30s_01.mp3'/>
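
Putting the list above together, the following Python sketch builds a response that waits for button presses with the microphone closed. It is an illustration under stated assumptions: the function name, the recognizer name button_down_recognizer, and the event names button_down_event and timeout are hypothetical, and the directive layout reflects the Game Engine interface as commonly documented, so verify the fields against the interface reference before relying on them.

```python
TICKING_AUDIO_URL = (
    "https://s3.amazonaws.com/ask-soundlibrary/foley/"
    "amzn_sfx_rhythmic_ticking_30s_01.mp3"
)


def build_button_wait_response(speech, timeout_ms):
    """Keep the session open for button presses without opening the mic."""
    ssml = f"<speak>{speech}"
    if timeout_ms > 5000:
        # Timeouts longer than 5 seconds need audio so the user knows
        # that the skill is still in session.
        ssml += f' <audio src="{TICKING_AUDIO_URL}" />'
    ssml += "</speak>"

    directive = {
        "type": "GameEngine.StartInputHandler",
        "timeout": timeout_ms,
        "recognizers": {
            # Hypothetical recognizer: matches any button-down action.
            "button_down_recognizer": {
                "type": "match",
                "fuzzy": False,
                "anchor": "end",
                "pattern": [{"action": "down"}],
            }
        },
        "events": {
            # Hypothetical event names chosen for this sketch.
            "button_down_event": {
                "meets": ["button_down_recognizer"],
                "reports": "matches",
                "shouldEndInputHandler": False,
            },
            "timeout": {
                "meets": ["timed out"],
                "reports": "history",
                "shouldEndInputHandler": True,
            },
        },
    }

    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "SSML", "ssml": ssml},
            "directives": [directive],
            # shouldEndSession is deliberately left undefined, so the
            # session stays open and the microphone stays closed.
        },
    }
```

Note that the ticking clip lasts 30 seconds, so a longer timeout would need additional audio to cover the full wait.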

Example

This example describes a typical sequence in a "first to buzz in" trivia game that uses Echo Buttons. Alexa asks a trivia question, a user presses their button to buzz in, and then the user answers the question. In this case, the following interactions occur:

  1. Alexa asks the trivia question – To set this up, the skill's response includes speech, starts an Input Handler, and sets shouldEndSession to null. This setup waits for a button press without opening the microphone.

  2. A user buzzes in – The Game Engine sends an Input Handler event to the skill to notify it of the button press.

  3. Alexa prompts the user for an answer – The skill responds to the Input Handler event with shouldEndSession set to false to open the microphone for the user to say the answer. The skill might also include speech such as "Player 1, what is your answer?"
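
The three interactions above might be sketched in Python as follows. This is a rough illustration, not the reference implementation: the function names, the buzz_in event name, the buzz_recognizer name, and the players mapping from gadget ID to player label are all hypothetical, and the request is assumed to carry an events list whose entries include a name and an inputEvents list with a gadgetId.

```python
def ask_trivia_question(question):
    """Step 1: speak the question and wait for a buzz, microphone closed."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "SSML", "ssml": f"<speak>{question}</speak>"},
            "directives": [{
                "type": "GameEngine.StartInputHandler",
                "timeout": 5000,
                "recognizers": {
                    "buzz_recognizer": {
                        "type": "match",
                        "fuzzy": False,
                        "anchor": "end",
                        "pattern": [{"action": "down"}],
                    }
                },
                "events": {
                    "buzz_in": {
                        "meets": ["buzz_recognizer"],
                        "reports": "matches",
                        "shouldEndInputHandler": True,
                    },
                    "timeout": {
                        "meets": ["timed out"],
                        "reports": "history",
                        "shouldEndInputHandler": True,
                    },
                },
            }],
            # shouldEndSession left undefined: session open, mic closed.
        },
    }


def prompt_for_answer(request, players):
    """Steps 2 and 3: on a buzz-in event, open the mic for the answer."""
    for event in request.get("events", []):
        if event.get("name") != "buzz_in":
            continue
        input_events = event.get("inputEvents", [])
        gadget_id = input_events[0].get("gadgetId") if input_events else None
        player = players.get(gadget_id, "Player")
        return {
            "version": "1.0",
            "response": {
                "outputSpeech": {
                    "type": "SSML",
                    "ssml": f"<speak>{player}, what is your answer?</speak>",
                },
                # False opens the microphone for the spoken answer.
                "shouldEndSession": False,
            },
        }
    return None  # Not a buzz-in event; handle it elsewhere.
```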

Input Handlers and Reprompts

As with any Alexa skill, your Echo Button skill can specify a reprompt in its response. If the microphone has been open for eight seconds without user input, Alexa speaks the reprompt and then keeps the microphone open for eight more seconds.

If the user still doesn't respond after the reprompt, the microphone turns off and the session closes. The skill is notified of this through a SessionEndedRequest, but it is not given a chance to reopen the session or interact with the user in any other way. If the skill did not specify a reprompt at all, the skill exits after the initial eight seconds without user input.

With Echo Button skills, there are additional things that you need to consider when working with reprompts:

  • A button event will cancel any pending reprompt. For example, a reprompt set by the response to a buttonDown event can be cancelled by the subsequent buttonUp event. The reprompt can also be cancelled by an Input Handler timeout event.
  • You cannot rely solely on the reprompt feature of the text or SSML response to reprompt the user to press a button. You must start an Input Handler to monitor for button presses. The handler for the Input Handler timeout event can reprompt the user for input.
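
As a sketch of the second point, a handler for the Input Handler timeout event can speak a reminder and start a fresh Input Handler. Everything named here is an assumption made for illustration: the function name, the timeout event name, the buttonReprompts session attribute, and the max_reprompts limit are hypothetical, and input_handler_directive stands for whatever GameEngine.StartInputHandler directive your skill already builds.

```python
def handle_timeout_event(request, session_attributes, input_handler_directive,
                         max_reprompts=2):
    """Reprompt for a button press when the Input Handler times out."""
    names = [event.get("name") for event in request.get("events", [])]
    if "timeout" not in names:
        return None  # Not a timeout event; handle it elsewhere.

    count = session_attributes.get("buttonReprompts", 0)
    if count >= max_reprompts:
        # Give up after a few reminders and end the session.
        return {
            "version": "1.0",
            "response": {
                "outputSpeech": {
                    "type": "SSML",
                    "ssml": "<speak>Let's play again another time. Goodbye!</speak>",
                },
                "shouldEndSession": True,
            },
        }

    return {
        "version": "1.0",
        # Remember how many reminders have been spoken so far.
        "sessionAttributes": {**session_attributes, "buttonReprompts": count + 1},
        "response": {
            "outputSpeech": {
                "type": "SSML",
                "ssml": "<speak>Still there? Press your button to continue.</speak>",
            },
            # Start a new Input Handler so button presses are monitored again.
            "directives": [input_handler_directive],
            # shouldEndSession left undefined: session open, mic closed.
        },
    }
```

Counting reminders in session attributes is one possible design; it prevents the skill from reprompting indefinitely when nobody is near the buttons.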