Receive Voice Input to a Gadget Skill

Unlike typical custom skills, skills for Alexa Gadgets must be able to handle two types of input from the user: the user's voice, and gadget input. For example, a robot gadget might enable the user to physically raise the robot's arm or, alternatively, say, "Raise my robot's arm."

To monitor gadget input, your skill must send a CustomInterfaceController.StartEventHandler directive in response to any request from Alexa. Your skill must also be prepared to handle voice input at any time. This topic describes how your gadget skill should respond to Alexa based on whether the skill specifically expects voice input.

Types of voice input

When a skill is in session, there are two ways that users can provide voice input to the skill. If the microphone is open, the user can speak to the skill directly, without prefacing the speech with "Alexa." If the microphone is closed, the user can still provide voice input, but must preface their request with "Alexa."

Listening for voice input

This section contains information about how your skill can prepare to receive voice input and gadget input.

Opening the microphone

If your skill specifically expects voice input, your skill should do the following, for the best user experience:

  • Include text-to-speech (TTS) that asks the user a question.
  • Set shouldEndSession to false. This preserves the current session and opens the microphone at the end of the response, so that the user can speak another intent. If the user doesn't respond, Alexa speaks the reprompt (if you provided one). If the user still doesn't respond, the session closes.
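The two points above can be sketched as a response body built from plain Python dictionaries. This is a minimal illustration rather than the toolkit's own code: the function name build_voice_prompt_response and the sample wording are hypothetical, and the envelope layout (version, outputSpeech, reprompt) follows the standard custom skill response format rather than anything stated in this topic.

```python
import json


def build_voice_prompt_response(question, reprompt):
    """Build a response that asks a question and then opens the microphone.

    Hypothetical helper for illustration only.
    """
    return {
        "version": "1.0",
        "response": {
            # TTS that asks the user a question.
            "outputSpeech": {
                "type": "SSML",
                "ssml": f"<speak>{question}</speak>",
            },
            # Spoken if the user stays silent after the question.
            "reprompt": {
                "outputSpeech": {
                    "type": "SSML",
                    "ssml": f"<speak>{reprompt}</speak>",
                }
            },
            # False keeps the session and opens the microphone.
            "shouldEndSession": False,
        },
    }


response = build_voice_prompt_response(
    "Should I raise the robot's arm?",
    "Say yes to raise the arm, or no to leave it.",
)
print(json.dumps(response, indent=2))
```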

Without opening the microphone

It is common for gadget skills to reach a point at which they need to keep the session open, but do not want the microphone to open and attempt to recognize speech. For example, a skill might give the user time to solve a puzzle by pressing lighted buttons on a gadget.

If your skill expects gadget input but does not specifically expect voice input, your skill should do the following, for the best user experience:

  • Include text-to-speech (TTS) that lets the user know what to do.
  • Make sure that the response doesn't include a value for shouldEndSession.
  • Include a CustomInterfaceController.StartEventHandler directive to start an event handler to monitor gadget input.
  • If the expiration.durationInMilliseconds specified in your CustomInterfaceController.StartEventHandler directive is longer than 5 seconds, the response must also include an audio file so that the user knows that the skill is still in session. The following example shows how to play a 30-second ticking sound, which is available for you to use:

    <speak>
       Ready, set, go!
       <audio src="https://s3.amazonaws.com/ask-soundlibrary/foley/amzn_sfx_rhythmic_ticking_30s_01.mp3" /> 
    </speak> 
    
    Sound effect: Rhythmic ticking (30s)
    SSML: <audio src='https://s3.amazonaws.com/ask-soundlibrary/foley/amzn_sfx_rhythmic_ticking_30s_01.mp3'/>
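As a rough sketch of how the points in this list fit together, the following Python builds a response that waits for gadget input without opening the microphone. The helper name, the Custom.Example namespace, and the token, eventFilter, and filterMatchAction fields are assumptions that this topic does not spell out; check them against the directive reference before relying on them. Only expiration.durationInMilliseconds, the 5-second rule, and the ticking sound come from the text above.

```python
import uuid

# Ticking sound from the example above.
TICKING_AUDIO = (
    "https://s3.amazonaws.com/ask-soundlibrary/foley/"
    "amzn_sfx_rhythmic_ticking_30s_01.mp3"
)
# Event handlers longer than this must be accompanied by audio.
AUDIO_REQUIRED_ABOVE_MS = 5000


def build_gadget_wait_response(speech, duration_ms, namespace="Custom.Example"):
    """Build a response that monitors gadget input with the microphone closed.

    Hypothetical helper; "Custom.Example" is a placeholder namespace.
    """
    ssml = speech
    if duration_ms > AUDIO_REQUIRED_ABOVE_MS:
        # Lets the user know that the skill is still in session.
        # A real skill would pick audio that matches the duration.
        ssml += f' <audio src="{TICKING_AUDIO}" />'

    directive = {
        "type": "CustomInterfaceController.StartEventHandler",
        # Assumed: a unique token that identifies this event handler.
        "token": str(uuid.uuid4()),
        "expiration": {"durationInMilliseconds": duration_ms},
        # Assumed filter shape: forward events from one custom namespace.
        "eventFilter": {
            "filterExpression": {
                "==": [{"var": "header.namespace"}, namespace]
            },
            "filterMatchAction": "SEND_AND_TERMINATE",
        },
    }

    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "SSML", "ssml": f"<speak>{ssml}</speak>"},
            "directives": [directive],
            # shouldEndSession is deliberately left out.
        },
    }
```

Calling build_gadget_wait_response("Ready, set, go!", 30000) yields speech followed by the ticking audio, while a 4000-millisecond handler yields speech only.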

Example

The following example shows a typical exchange in a trivia game that uses lighted push-button gadgets. Alexa asks a trivia question, a user presses their button to buzz in, and then the user answers the question. In this case, the following interactions occur:

  1. Alexa asks the trivia question – To set this up, the skill's response includes speech and a CustomInterfaceController.StartEventHandler directive. The response doesn't include a value for shouldEndSession. This setup waits for gadget input without opening the microphone.

  2. A user buzzes in – Alexa sends a CustomInterfaceController.EventsReceived request to the skill to notify it of the gadget input.

  3. Alexa prompts the user for an answer – The skill responds to the CustomInterfaceController.EventsReceived request with a response that sets shouldEndSession to false to open the microphone for the user to say the answer. The skill might also include speech such as "Player 1, what's your answer?"
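Step 3 might look something like the following sketch, which turns a buzz-in event into a spoken prompt and opens the microphone. The PLAYERS mapping, the endpoint IDs, and the handler name are hypothetical, and the assumed request layout (an events list whose entries carry endpoint.endpointId) should be verified against the CustomInterfaceController.EventsReceived reference.

```python
# Hypothetical mapping from gadget endpoint IDs to player names.
PLAYERS = {"endpoint-1": "Player 1", "endpoint-2": "Player 2"}


def handle_buzz_in(request_envelope, players=PLAYERS):
    """Respond to a buzz-in by prompting that player and opening the microphone."""
    request = request_envelope["request"]
    if request["type"] != "CustomInterfaceController.EventsReceived":
        raise ValueError(
            "expected a CustomInterfaceController.EventsReceived request"
        )

    # Assumed layout: each event names the endpoint (gadget) that sent it.
    events = request.get("events", [])
    endpoint_id = None
    if events:
        endpoint_id = events[0].get("endpoint", {}).get("endpointId")

    # Fall back to a generic name if the gadget is not recognized.
    player = players.get(endpoint_id, "Player")
    prompt = f"{player}, what's your answer?"

    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "SSML", "ssml": f"<speak>{prompt}</speak>"},
            "reprompt": {
                "outputSpeech": {"type": "SSML", "ssml": f"<speak>{prompt}</speak>"}
            },
            # False opens the microphone so that the user can say the answer.
            "shouldEndSession": False,
        },
    }
```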

Event handlers and reprompts

As with any Alexa skill, your gadget skill can specify a reprompt in its response. If the microphone has been open for a few seconds without user input, Alexa speaks the reprompt and then opens the microphone for a few more seconds.

If, after the reprompt, the user still doesn't respond, the microphone turns off and the session closes. To notify the skill about the session closure, Alexa sends the skill a SessionEndedRequest, but the skill isn't given a chance to reopen the session or interact with the user in any other way. If the skill didn't specify a reprompt at all, the skill exits after the initial few seconds without user input.
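Because the skill cannot interact with the user at this point, a SessionEndedRequest handler is usually limited to clean-up. The following is a minimal sketch under that assumption. The handler name is hypothetical, and the reason field is taken from the standard request format rather than from this topic.

```python
import logging

logger = logging.getLogger(__name__)


def handle_session_ended(request_envelope):
    """Record why the session closed and return a response with no content."""
    request = request_envelope["request"]
    if request["type"] != "SessionEndedRequest":
        raise ValueError("expected a SessionEndedRequest")

    # Assumed field: the reason Alexa gives for closing the session,
    # for example after the reprompt goes unanswered.
    reason = request.get("reason", "UNKNOWN")
    logger.info("Session ended: %s", reason)

    # No speech or directives: the session cannot be reopened here.
    return {"version": "1.0", "response": {}}
```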

With gadget skills, there are additional things that you need to consider when working with reprompts: