Receive Voice Input to a Gadget Skill

Note: On December 31, 2021, we paused support for third-party device makers working with Alexa Gadgets, while we work to create an even better developer and customer experience. Please stay tuned to the Amazon developer portal for updates. In the interim, please visit the landing pages for Alexa Voice Service, Alexa Connect Kit, and Alexa Skills Kit to discover ways you can provide new customer experiences with voice.

Unlike typical custom skills, skills for Alexa Gadgets must be able to handle two types of input from the user: the user's voice, and gadget input. For example, a robot gadget might enable the user to physically raise the robot's arm or, alternatively, say, "Raise my robot's arm."

To monitor gadget input, your skill must send a CustomInterfaceController.StartEventHandler directive in response to any request from Alexa. Your skill must also be prepared to handle voice input at any time. This topic describes how your gadget skill should respond to Alexa based on whether the skill specifically expects voice input.

Types of voice input

When a skill is in session, there are two ways that users can provide voice input to the skill. If the microphone is open, the user can speak to the skill directly, without prefacing the speech with "Alexa." If the microphone is closed, the user can still provide voice input, but must preface their request with "Alexa."

Important: Regardless of the type of input that the skill expects, it is the skill's responsibility to ensure that the user knows that the skill is still in session. For example, if the skill is waiting for the user to interact with the gadget, the skill must play audio so that the user knows that they are still interacting with the skill.

Listening for voice input

This section contains information about how your skill can prepare to receive voice input and gadget input.

Opening the microphone

If your skill specifically expects voice input, your skill should do the following, for the best user experience:

Include text-to-speech (TTS) that asks the user a question.
Set shouldEndSession to false. This preserves the current session and opens the microphone at the end of the response, so that it is ready for the user to speak another intent. If the user doesn't respond, Alexa speaks the reprompt (if you provided one). If the user still doesn't respond, the session closes.
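
The two points above can be sketched as a small helper that builds the JSON response body. This is a minimal illustration, not an official SDK call: the function name build_voice_prompt_response and the sample question are hypothetical, and the envelope layout (version, response, outputSpeech, reprompt) is assumed to follow the standard custom-skill response format.

```python
import json


def build_voice_prompt_response(question, reprompt=None):
    """Build a response that asks a question and then opens the microphone.

    Hypothetical helper: the name and arguments are illustrative only.
    """
    response = {
        "outputSpeech": {"type": "SSML", "ssml": f"<speak>{question}</speak>"},
        # False keeps the session alive and opens the microphone after the speech.
        "shouldEndSession": False,
    }
    if reprompt is not None:
        # Spoken if the user stays silent after the first prompt.
        response["reprompt"] = {
            "outputSpeech": {"type": "SSML", "ssml": f"<speak>{reprompt}</speak>"}
        }
    return {"version": "1.0", "response": response}


if __name__ == "__main__":
    example = build_voice_prompt_response(
        "Which arm should the robot raise?", "Left or right?"
    )
    print(json.dumps(example, indent=2))
```

Keeping the reprompt optional mirrors the behavior described above: without it, the session closes after the first period of silence.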

Without opening the microphone

It is common for gadget skills to reach a point at which they need to keep the session open, but do not want the microphone to open and attempt to recognize speech. For example, a skill might give the user time to solve a puzzle by pressing lighted buttons on a gadget.

If your skill expects gadget input but does not specifically expect voice input, your skill should do the following, for the best user experience:

Include text-to-speech (TTS) that lets the user know what to do.
Make sure that the response doesn't include a value for shouldEndSession.
Include a CustomInterfaceController.StartEventHandler directive to start an event handler to monitor gadget input.

If the expiration.durationInMilliseconds specified in your CustomInterfaceController.StartEventHandler directive is longer than 5 seconds, the response must also include an audio file so that the user knows that the skill is still in session. The following example shows how to play a 30-second ticking sound, which is available for you to use:

<speak>
   Ready, set, go!
   <audio src="https://s3.amazonaws.com/ask-soundlibrary/foley/amzn_sfx_rhythmic_ticking_30s_01.mp3" /> 
</speak> 

Sound Effect	SSML
Rhythmic ticking (30s)	`<audio src='https://s3.amazonaws.com/ask-soundlibrary/foley/amzn_sfx_rhythmic_ticking_30s_01.mp3'/>`
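
As a rough sketch of how these pieces fit together, the following helper builds a response that speaks instructions, starts an event handler, leaves out shouldEndSession, and appends the ticking sound only when the handler runs longer than 5 seconds. The helper name, the Custom.Buzzer namespace, and the Pressed event name are hypothetical. The directive field layout (token, eventFilter, expiration) reflects an assumed reading of the Custom Interface Controller reference and should be checked against it.

```python
TICKING_30S_URL = (
    "https://s3.amazonaws.com/ask-soundlibrary/foley/"
    "amzn_sfx_rhythmic_ticking_30s_01.mp3"
)

# Hypothetical filter: forward only "Pressed" events from a made-up
# "Custom.Buzzer" interface, then stop the event handler.
BUTTON_FILTER = {
    "filterExpression": {
        "and": [
            {"==": [{"var": "header.namespace"}, "Custom.Buzzer"]},
            {"==": [{"var": "header.name"}, "Pressed"]},
        ]
    },
    "filterMatchAction": "SEND_AND_TERMINATE",
}


def build_gadget_wait_response(instructions, token, duration_ms, event_filter):
    """Wait for gadget input while keeping the microphone closed."""
    ssml = f"<speak>{instructions}"
    if duration_ms > 5000:
        # Handlers longer than 5 seconds need audio so the user knows
        # that the skill is still in session.
        ssml += f' <audio src="{TICKING_30S_URL}" />'
    ssml += "</speak>"
    directive = {
        "type": "CustomInterfaceController.StartEventHandler",
        "token": token,
        "eventFilter": event_filter,
        "expiration": {
            "durationInMilliseconds": duration_ms,
            "expirationPayload": {},
        },
    }
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "SSML", "ssml": ssml},
            "directives": [directive],
            # shouldEndSession is deliberately left out of the response.
        },
    }
```

Note that the clip is 30 seconds long, so a longer handler duration would need additional or longer audio.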

Example

The following example shows a typical sequence in a trivia game that uses lighted push-button gadgets. Alexa asks a trivia question, a user presses their button to buzz in, and then the user answers the question. In this case, the following interactions occur:

Alexa asks the trivia question – To set this up, the skill's response includes speech and a CustomInterfaceController.StartEventHandler directive. The response doesn't include a value for shouldEndSession. This setup waits for gadget input without opening the microphone.
A user buzzes in – Alexa sends a CustomInterfaceController.EventsReceived request to the skill to notify it of the gadget input.
Alexa prompts the user for an answer – The skill responds to the CustomInterfaceController.EventsReceived request with a response that sets shouldEndSession to false to open the microphone for the user to say the answer. The skill might also include speech such as "Player 1, what's your answer?"
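
The three interactions above might be wired together roughly as follows. This is a sketch under stated assumptions: it assumes the trivia question is asked in response to a LaunchRequest, which the example does not specify, and the function names, the sample question, the token value, and the Custom.Buzzer filter are all hypothetical. The 5000 ms duration is chosen so that no extra audio is needed.

```python
def ask_trivia_question(question, token):
    """Step 1: speak the question and wait for a buzz-in, microphone closed."""
    directive = {
        "type": "CustomInterfaceController.StartEventHandler",
        "token": token,
        # Hypothetical filter for a made-up "Custom.Buzzer" interface.
        "eventFilter": {
            "filterExpression": {
                "==": [{"var": "header.namespace"}, "Custom.Buzzer"]
            },
            "filterMatchAction": "SEND_AND_TERMINATE",
        },
        "expiration": {"durationInMilliseconds": 5000, "expirationPayload": {}},
    }
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "SSML", "ssml": f"<speak>{question}</speak>"},
            "directives": [directive],
            # No shouldEndSession: the session stays open, microphone closed.
        },
    }


def prompt_for_answer():
    """Step 3: open the microphone so the player can say the answer."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {
                "type": "SSML",
                "ssml": "<speak>Player 1, what's your answer?</speak>",
            },
            "shouldEndSession": False,
        },
    }


def handle_request(envelope):
    """Route an incoming request to the matching step of the trivia flow."""
    request_type = envelope["request"]["type"]
    if request_type == "LaunchRequest":  # assumed trigger for the question
        return ask_trivia_question("What is the capital of France?", "token-1")
    if request_type == "CustomInterfaceController.EventsReceived":
        # Step 2 happened on the gadget; the skill now asks for the answer.
        return prompt_for_answer()
    # Anything else is outside this sketch; end the session.
    return {"version": "1.0", "response": {"shouldEndSession": True}}
```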

Event handlers and reprompts

As with any Alexa skill, your gadget skill can specify a reprompt in its response. If the microphone has been open for a few seconds without user input, Alexa speaks the reprompt and then opens the microphone for a few more seconds.

If, after the reprompt, the user still doesn't respond, the microphone turns off and the session closes. To notify the skill about the session closure, Alexa sends the skill a SessionEndedRequest, but the skill isn't given a chance to reopen the session or interact with the user in any other way. If the skill didn't specify a reprompt at all, the skill exits after the initial few seconds without user input.

With gadget skills, there are additional things that you need to consider when working with reprompts:

A gadget event of any type (CustomInterfaceController.EventsReceived or CustomInterfaceController.Expired) cancels any pending reprompt.
You cannot rely solely on the reprompt in the text or SSML response to prompt the user to interact with the gadget. To monitor for gadget input, you must send a CustomInterfaceController.StartEventHandler directive. To prompt the user again, respond to the CustomInterfaceController.Expired request with new speech.
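
One way to reprompt for gadget input is to answer the CustomInterfaceController.Expired request with new speech and a fresh event handler. The sketch below tracks how many prompts remain in the expirationPayload. It assumes that Alexa returns that payload in the Expired request. The remainingPrompts key, the function names, the spoken text, and the Custom.Buzzer filter are hypothetical.

```python
import uuid

TICKING_30S_URL = (
    "https://s3.amazonaws.com/ask-soundlibrary/foley/"
    "amzn_sfx_rhythmic_ticking_30s_01.mp3"
)


def start_event_handler(duration_ms, remaining_prompts):
    """Build a StartEventHandler directive that remembers the prompt budget."""
    return {
        "type": "CustomInterfaceController.StartEventHandler",
        "token": str(uuid.uuid4()),  # fresh token for each handler
        # Hypothetical filter for a made-up "Custom.Buzzer" interface.
        "eventFilter": {
            "filterExpression": {
                "==": [{"var": "header.namespace"}, "Custom.Buzzer"]
            },
            "filterMatchAction": "SEND_AND_TERMINATE",
        },
        "expiration": {
            "durationInMilliseconds": duration_ms,
            # Assumed to be echoed back in the Expired request.
            "expirationPayload": {"remainingPrompts": remaining_prompts},
        },
    }


def handle_expired(request):
    """Reprompt for gadget input, or end the session when prompts run out."""
    payload = request.get("expirationPayload", {})
    remaining = payload.get("remainingPrompts", 0)
    if remaining > 0:
        ssml = (
            "<speak>Still there? Press your button to buzz in. "
            f'<audio src="{TICKING_30S_URL}" /></speak>'
        )
        return {
            "version": "1.0",
            "response": {
                "outputSpeech": {"type": "SSML", "ssml": ssml},
                "directives": [start_event_handler(10000, remaining - 1)],
                # No shouldEndSession: keep waiting with the microphone closed.
            },
        }
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "SSML", "ssml": "<speak>Goodbye.</speak>"},
            "shouldEndSession": True,
        },
    }
```

Because the new handler runs for 10 seconds, the response includes the ticking audio so that the user knows the skill is still in session.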

Last updated: Feb 14, 2022