Handling Requests Sent by Alexa

Introduction

Most of the coding tasks for the cloud-based service for a custom skill are related to:

  • Handling the different types of requests sent by the Alexa service.
  • Returning appropriate responses to these requests.

This document describes the types of requests and provides examples of responses.

About the Code Samples in this Topic and the Java Library

You can code your service in any programming language. The Alexa Skills Kit includes an optional Java library, which provides helper classes and methods you can use if you want to build your service in Java. Code samples shown in this topic use the Java library as an example, but using the Java library or coding in Java is not required.

The Java library provides a Speechlet interface with methods you can implement for each type of request your service receives. The class that manages accepting requests from the Alexa service and dispatching them to your Speechlet depends on how you are hosting your service:

  • When deploying your service as a function on AWS Lambda (a service offered by Amazon Web Services), you configure the function with a handler. You can create this handler by extending the SpeechletRequestStreamHandler class provided in the library. The handler then dispatches requests by calling the appropriate Speechlet methods (onLaunch(), onIntent(), and so on).
  • When deploying your service as a web service on a cloud provider, you extend the SpeechletServlet class. This class is an implementation of a Java EE servlet that handles serializing and deserializing the body of the HTTP request and calls the appropriate Speechlet methods based on the request (onLaunch(), onIntent(), and so on).

For several complete Java examples, see the amzn/alexa-skills-kit-java repository. A Java Speechlet can be hosted either as a Lambda function or as a web service.

Verifying that the Request is Intended for Your Service

Before your web service or Lambda function accepts a request, you should verify that the request is actually intended for your service. This protects your endpoint from someone else discovering your endpoint address, configuring their own skill with that endpoint, and using that configuration to send requests to your service.

To do this validation, every request sent by Alexa includes an application ID. You can check this ID to ensure that the request was intended for your service.

Getting the Application ID for a Skill

The identifier is provided in the developer portal:

  1. Log in to the developer portal and navigate to the Alexa section by clicking Apps & Services and then clicking Alexa in the top navigation.
  2. Find the skill in the list and click Edit.
  3. Click Skill Information.
  4. Note the Application ID displayed on the page.

Verifying that Application ID in the Request Matches Your ID (Java Library)

If you are using the SpeechletServlet class in the Java library, the class handles application ID verification for incoming requests. You need to set the application ID or a comma-separated list of IDs in the system property:

com.amazon.speech.speechlet.servlet.supportedApplicationIds

If the applicationId provided with the request does not match an ID provided in this property, the SpeechletServlet does not call any methods, but instead returns an HTTP error code (400 Bad Request). Leaving this property blank turns off application ID verification. This is acceptable for development and testing, but it is recommended that you provide your application ID to enable this check before publishing your skill to end users.

You can set the supportedApplicationIds system property by passing an argument to the JVM when the web service starts up:

  • If you are running your service within Eclipse using a Launcher class (such as the Launcher.java class provided in the samples), you can add a VM argument to the run configuration:

    -Dcom.amazon.speech.speechlet.servlet.supportedApplicationIds=comma-separated-list-of-ids

  • If you are starting your application using an Ant script, add the following property within the <java> tag of your script:

      <java classname="Launcher" classpathref="java.sdk.classpath" fork="true">
          <sysproperty key="com.amazon.speech.speechlet.servlet.supportedApplicationIds"
                       value="comma-separated-list-of-ids" />
      </java>
    
  • If you are hosting your service within Elastic Beanstalk, you can configure system properties in the Elastic Beanstalk console. See the “Setting Java System Properties for Elastic Beanstalk” section in Deploying a Web Service for a Custom Skill to AWS Elastic Beanstalk.
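Whichever way the property is set, it can help to confirm at startup that the value actually reached the JVM. The following is a minimal sketch of such a check. The SupportedApplicationIds class and its methods are hypothetical helpers, not part of the Java library, and the way the library itself parses the property (for example, how it treats whitespace) may differ.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper for illustration; not part of the Alexa Skills Kit Java library.
final class SupportedApplicationIds {
    // Property name taken from this topic.
    static final String PROPERTY =
            "com.amazon.speech.speechlet.servlet.supportedApplicationIds";

    // Splits a comma-separated value into a set of trimmed, non-empty IDs.
    // A null or blank value yields an empty set.
    static Set<String> parse(String propertyValue) {
        Set<String> ids = new LinkedHashSet<String>();
        if (propertyValue != null) {
            for (String id : propertyValue.split(",")) {
                String trimmed = id.trim();
                if (!trimmed.isEmpty()) {
                    ids.add(trimmed);
                }
            }
        }
        return Collections.unmodifiableSet(ids);
    }

    // Reads the property from the running JVM and parses it.
    static Set<String> fromSystemProperty() {
        return parse(System.getProperty(PROPERTY));
    }
}
```

Logging the result of SupportedApplicationIds.fromSystemProperty() when the service starts makes it easy to spot an empty set, which would mean that application ID verification is turned off.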

If you are using the SpeechletRequestStreamHandler class in the Java library to host your Speechlet in AWS Lambda, you need to extend the SpeechletRequestStreamHandler class and pass the supported IDs and your Speechlet to the class’s constructor. For example:

public final class HelloWorldSpeechletRequestStreamHandler extends SpeechletRequestStreamHandler {
    private static final Set<String> supportedApplicationIds = new HashSet<String>();
    static {
        /*
         * This Id can be found on https://developer.amazon.com/edw/home.html#/ 
         * "Edit" the relevant
         * Alexa Skill and put the relevant Application Ids in this Set.
         */
        supportedApplicationIds.add("amzn1.echo-sdk-ams.app.[unique-value-here]");
    }

    public HelloWorldSpeechletRequestStreamHandler() {
        super(new HelloWorldSpeechlet(), supportedApplicationIds);
    }
}

Verifying that Application ID in the Request Matches Your ID (Other Languages)

The applicationId is provided as part of the session object in the JSON body of the request:

{
  "version": "string",
  "session": {
    "new": boolean,
    "sessionId": "string",
    "application": {
      "applicationId": "string"
    },
    ...(remainder of request not shown)    
  }
}

Within your code, retrieve this applicationId property from the JSON body of the request and compare it to the ID assigned to the skill. If the applicationId provided with the request does not match your ID, reject the request by returning an HTTP error code.
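The comparison itself is small. The sketch below shows it in plain Java without the library, assuming the JSON body has already been parsed into nested Map objects by whatever JSON parser your service uses. The ApplicationIdCheck class is a hypothetical name used only for illustration.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; assumes the request body was parsed into nested maps.
final class ApplicationIdCheck {
    // Returns true only if session.application.applicationId is present
    // and matches one of the IDs assigned to your skill.
    static boolean isIntendedForThisSkill(Map<String, Object> requestBody,
            Set<String> supportedIds) {
        Object session = (requestBody == null) ? null : requestBody.get("session");
        if (!(session instanceof Map)) {
            return false;
        }
        Object application = ((Map<?, ?>) session).get("application");
        if (!(application instanceof Map)) {
            return false;
        }
        Object applicationId = ((Map<?, ?>) application).get("applicationId");
        return applicationId instanceof String && supportedIds.contains(applicationId);
    }
}
```

When this method returns false, the service would return an HTTP error code instead of processing the request. Treating a missing session or application object as a mismatch is a deliberate choice here, so that malformed requests are rejected as well.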

Types of Requests Sent by Alexa

Your service must accept and respond to three different types of requests:

  • LaunchRequest
  • IntentRequest
  • SessionEndedRequest

If you choose to use the AudioPlayer interface to stream audio, Alexa also sends AudioPlayer and PlaybackController requests.

Determining the Request Type (Java Library)

When you use the Java library, the SpeechletServlet or SpeechletRequestStreamHandler class determines the type of request sent by Alexa and calls the corresponding Speechlet method:

  • onLaunch()
  • onIntent()
  • onSessionEnded()

The method calls include arguments for an object representing the type of request (LaunchRequest, IntentRequest, or SessionEndedRequest) and an object representing the current session (Session).

When using the AudioPlayer interface, your Speechlet class should also implement the AudioPlayer and PlaybackController interfaces. These interfaces provide similar methods for routing these requests, such as onPlaybackStarted().

Determining the Request Type (Other Languages)

If you write your web service or Lambda function using a language other than Java, your code needs to inspect the incoming request, determine the request type, then handle each type appropriately. The request type is a property of the request object in the JSON:

{
  "version": "1.0",
  "session": {
    ...(session properties not shown)
  },
  "request": {
    "type": "LaunchRequest",
    "requestId": "request.id.string",
    "timestamp": "string"
  }
}

The type property of the request can be:

  • LaunchRequest
  • IntentRequest
  • SessionEndedRequest
  • A type prefixed with AudioPlayer (for example, AudioPlayer.PlaybackStarted). See AudioPlayer Requests.
  • A type prefixed with PlaybackController (for example, PlaybackController.NextCommandIssued). See PlaybackController.

See the Standard Request Types Reference for details about the JSON structure of these requests.
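As an illustration of this dispatch step, the sketch below maps the type string to a handler category. The RequestTypeRouter class and its Handler enum are hypothetical names, and the sketch assumes the type value has already been read from the parsed JSON.

```java
// Hypothetical router for illustration; not part of the Alexa Skills Kit Java library.
final class RequestTypeRouter {
    enum Handler { LAUNCH, INTENT, SESSION_ENDED, AUDIO_PLAYER, PLAYBACK_CONTROLLER, UNKNOWN }

    // Maps the request.type string from the JSON to a handler category.
    static Handler route(String type) {
        if (type == null) {
            return Handler.UNKNOWN;
        }
        switch (type) {
            case "LaunchRequest":
                return Handler.LAUNCH;
            case "IntentRequest":
                return Handler.INTENT;
            case "SessionEndedRequest":
                return Handler.SESSION_ENDED;
            default:
                break;
        }
        // AudioPlayer and PlaybackController types are identified by prefix.
        if (type.startsWith("AudioPlayer.")) {
            return Handler.AUDIO_PLAYER;
        }
        if (type.startsWith("PlaybackController.")) {
            return Handler.PLAYBACK_CONTROLLER;
        }
        return Handler.UNKNOWN;
    }
}
```

Returning an explicit UNKNOWN value, rather than throwing, leaves it to the caller to decide how to reject a type it does not recognize.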

LaunchRequest

Your service receives a LaunchRequest when the user invokes the skill with the invocation name, but does not provide any command mapping to an intent. For example:

User: Alexa, talk to Daily Horoscopes

For skills that just do one thing (such as telling a joke), the service can take action without requesting more information from the user. Services that need more information from the user may need to respond with a prompt. For guidance on how to design good prompts, see the Voice Design Handbook.

In this Java example, the service writes information to a log and then calls a method to return a welcome response.

@Override
public SpeechletResponse onLaunch(final LaunchRequest request, final Session session)
        throws SpeechletException {

    log.info("onLaunch requestId={}, sessionId={}", request.getRequestId(),
            session.getSessionId());
    return getWelcomeResponse();
}

A LaunchRequest always starts a new session.

IntentRequest

Your service receives an IntentRequest when the user speaks a command that maps to an intent. The request object sent to your service includes the specific intent and any defined slot values.

In this context, an intent represents a high-level action that fulfills a user’s spoken request. Intents can optionally have arguments called slots that collect additional information needed to fulfill the user’s request. Intents are specific to Alexa and do not share the same structure as Android intents.

For details about defining intents and slots, see Defining the Voice Interface and Custom Interaction Model Reference.

Note that an IntentRequest can either start a new session or continue an existing session, depending on how the user begins interacting with the skill:

  1. The user asks Alexa a question or states a command, all in a single phrase. This sends a new IntentRequest and starts a new session:

    User: Alexa, Ask Daily Horoscopes for Gemini.

    In this case, no LaunchRequest is ever sent to your service. The IntentRequest starts the session instead.

  2. Once a session is already open, the user states a command that maps to one of the intents defined in your voice interface. This sends the IntentRequest within the existing session:

    User: Alexa, talk to Daily Horoscopes (This command sends the service a LaunchRequest and opens a new session)
    Alexa: You can ask for a horoscope or a lucky number. Which will it be? (response to the LaunchRequest)
    User: Give me the horoscope for Gemini (This command sends the service an IntentRequest with the existing, open session)
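A service that does not use the Java library can tell these two cases apart with the new flag of the session object in the JSON body. The sketch below assumes the session object has already been parsed into a Map. The SessionStart class is a hypothetical name used only for illustration.

```java
import java.util.Map;

// Hypothetical helper for illustration; assumes the session object was parsed into a map.
final class SessionStart {
    // True when an IntentRequest arrives with session.new set to true,
    // meaning no LaunchRequest preceded it.
    static boolean intentStartedSession(String requestType, Map<String, Object> session) {
        return "IntentRequest".equals(requestType)
                && session != null
                && Boolean.TRUE.equals(session.get("new"));
    }
}
```

A skill might use this check to run any per-session setup that it would otherwise perform when handling a LaunchRequest.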

In this Java example, the onIntent() method retrieves the intent name and then calls different methods depending on whether the intent is MyColorIsIntent or WhatsMyColorIntent.

@Override
public SpeechletResponse onIntent(final IntentRequest request, final Session session)
        throws SpeechletException {
    log.info("onIntent requestId={}, sessionId={}", request.getRequestId(),
            session.getSessionId());

    // Get intent from the request object.
    Intent intent = request.getIntent();
    String intentName = (intent != null) ? intent.getName() : null;

    // Note: If the session is started with an intent, no welcome message will be rendered;
    // rather, the intent specific response will be returned.
    if ("MyColorIsIntent".equals(intentName)) {
        return setColorInSession(intent, session);
    } else if ("WhatsMyColorIntent".equals(intentName)) {
        return getColorFromSession(intent, session);
    } else {
        throw new SpeechletException("Invalid Intent");
    }
}

Note that the Alexa Skills Kit provides a collection of built-in intents for very common actions, such as stopping, canceling, and providing help. If you choose to use these intents, your code for handling IntentRequest requests needs to handle them as well. For details about the built-in intents, see Implementing the Built-in Intents.
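As a rough sketch of how the built-in intents might fit into the same name-based dispatch, the helper below classifies an intent name before the skill-specific checks run. The BuiltInIntentRouter class and its Action enum are hypothetical. Treating AMAZON.StopIntent and AMAZON.CancelIntent the same way is an assumption that suits simple skills and is not a requirement.

```java
// Hypothetical helper for illustration; not part of the Alexa Skills Kit Java library.
final class BuiltInIntentRouter {
    enum Action { HELP, END_SESSION, CUSTOM }

    // Classifies an intent name as one of three built-in cases or as a custom intent.
    static Action classify(String intentName) {
        if ("AMAZON.HelpIntent".equals(intentName)) {
            return Action.HELP;
        }
        // Assumption: stop and cancel both simply end the session in this sketch.
        if ("AMAZON.StopIntent".equals(intentName)
                || "AMAZON.CancelIntent".equals(intentName)) {
            return Action.END_SESSION;
        }
        return Action.CUSTOM;
    }
}
```

An onIntent() implementation could call classify() first and fall through to its own intent names only when the result is CUSTOM.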

SessionEndedRequest

Your service receives a SessionEndedRequest when a currently open session is closed for one of the following reasons:

  1. The user says “exit”.
  2. The user does not respond or says something that does not match an intent defined in your voice interface while the device is listening for the user’s response.
  3. An error occurs.

Note that setting the shouldEndSession flag to true in your response also ends the session. In this case, your service does not receive a SessionEndedRequest.

Your service cannot send back a response to a SessionEndedRequest.
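Because no response can be returned, handling this request usually amounts to cleanup and logging. The sketch below turns the reason into a log-friendly description. It assumes the request carries a reason property with the values USER_INITIATED, EXCEEDED_MAX_REPROMPTS, and ERROR. That property is not described in this topic, so check the Standard Request Types Reference before relying on it. The SessionEndReason class is a hypothetical name.

```java
// Hypothetical helper for illustration; the reason values are assumptions to verify
// against the Standard Request Types Reference.
final class SessionEndReason {
    // Returns a short description of why the session ended, suitable for logging.
    static String describe(String reason) {
        if (reason == null) {
            return "no reason supplied";
        }
        switch (reason) {
            case "USER_INITIATED":
                return "the user ended the session";
            case "EXCEEDED_MAX_REPROMPTS":
                return "the user did not respond or said something that matched no intent";
            case "ERROR":
                return "an error occurred";
            default:
                return "unrecognized reason: " + reason;
        }
    }
}
```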

AudioPlayer and PlaybackController Requests

Your service receives the AudioPlayer and PlaybackController requests only if you are using the AudioPlayer interface to stream audio. Alexa sends AudioPlayer requests to notify your service about the current status of the playback. PlaybackController requests are sent when the user interacts with the device using hardware buttons, such as a remote control.

See AudioPlayer Requests and PlaybackController Requests for details about the JSON format of these requests.

Returning a Response

Your service returns responses for LaunchRequest and IntentRequest requests. Each response must adhere to the response format documented in JSON Interface Reference for Custom Skills.

When you use the Java library, the SpeechletResponse class represents a valid request response. Create an object of this class and set its properties to create your response.

Your response can include:

  • Text to be converted to speech and spoken back to the user. You can provide plain text or text marked up with SSML.
  • A card to be displayed in the Amazon Alexa App. A card can include a plain text title and body content, and optionally a single image. See Including a Card in Your Skill’s Response for details about including a card in your response.
  • Text to be converted to speech and spoken to the user if a re-prompt is needed (Reprompt class in the Java library, reprompt object in the JSON). This is used if your service keeps the session open after sending the response, but the user does not say anything that maps to an intent defined in your voice interface while the audio stream is open.
  • A set of directives specifying device-level actions to take using a particular interface, such as the AudioPlayer interface for streaming audio. For details about the directives you can include in your response, see the AudioPlayer Interface Reference.
    • In the Java library, use the SpeechletResponse.setDirectives() method to add a directive to your response. The library includes classes for each possible directive (such as AudioPlayer.PlayDirective).
    • In the JSON, add an array containing the JSON for each directive to the directives property.
  • A flag indicating whether the session closes or remains open after Alexa reads back the text in your response (SpeechletResponse.setShouldEndSession() in the Java library, shouldEndSession property in the JSON).

In addition, a response can include attributes to save with the current session. This is useful for persisting data within a session.

  • In the Java library, use the Session.setAttribute() and Session.getAttribute() methods to set and retrieve session attributes. A Session object is passed into the onLaunch(), onIntent(), and onSessionEnded() methods.
  • If you are not using the Java library, set session attributes with the sessionAttributes property in the JSON.
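For a service that builds the JSON directly, the sketch below assembles a minimal response body as nested maps, ready to be serialized by a JSON library of your choice. The ResponseBodySketch class is a hypothetical name, and the sketch covers only plain-text speech, session attributes, and the shouldEndSession flag. See the JSON Interface Reference for Custom Skills for the full format.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; builds the response body as nested maps.
final class ResponseBodySketch {
    static Map<String, Object> build(String speechText,
            Map<String, Object> sessionAttributes, boolean shouldEndSession) {
        // Plain-text speech to be spoken back to the user.
        Map<String, Object> outputSpeech = new LinkedHashMap<String, Object>();
        outputSpeech.put("type", "PlainText");
        outputSpeech.put("text", speechText);

        Map<String, Object> response = new LinkedHashMap<String, Object>();
        response.put("outputSpeech", outputSpeech);
        response.put("shouldEndSession", shouldEndSession);

        // Copy the attributes so later changes by the caller do not affect the body.
        Map<String, Object> attributes = new LinkedHashMap<String, Object>();
        if (sessionAttributes != null) {
            attributes.putAll(sessionAttributes);
        }

        Map<String, Object> body = new LinkedHashMap<String, Object>();
        body.put("version", "1.0");
        body.put("sessionAttributes", attributes);
        body.put("response", response);
        return body;
    }
}
```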

The Java library includes some convenience methods on the SpeechletResponse class for constructing responses:

  • newTellResponse - constructs a response to tell the user something. This response closes the session after playing the response.
  • newAskResponse - constructs a response to ask the user a question. This response keeps the session open for the user to respond.

See the Custom Skill Java Library Reference for details about these methods.

The following Java example extracts a color value from a slot, saves it in the session attributes, and then builds a response that includes both text-to-speech and an Amazon Alexa App card:

private SpeechletResponse setColorInSession(final Intent intent, final Session session) {
    // Get the slots from the intent.
    Map<String, Slot> slots = intent.getSlots();

    // Get the color slot from the list of slots.
    Slot favoriteColorSlot = slots.get(COLOR_SLOT);
    String speechText, repromptText;

    // Check for favorite color and create output to user.
    if (favoriteColorSlot != null) {
        // Store the user's favorite color in the Session and create response.
        String favoriteColor = favoriteColorSlot.getValue();
        session.setAttribute(COLOR_KEY, favoriteColor);
        speechText =
                String.format("I now know that your favorite color is %s. You can ask me your "
                        + "favorite color by saying, what's my favorite color?", favoriteColor);
        repromptText =
                "You can ask me your favorite color by saying, what's my favorite color?";

    } else {
        // Render an error since we don't know what the user's favorite color is.
        speechText = "I'm not sure what your favorite color is, please try again";
        repromptText =
                "I'm not sure what your favorite color is. You can tell me your favorite "
                        + "color by saying, my favorite color is red";
    }

    return getSpeechletResponse(speechText, repromptText, true);
}

...

private SpeechletResponse getSpeechletResponse(String speechText, String repromptText,
        boolean isAskResponse) {
    // Create the Simple card content.
    SimpleCard card = new SimpleCard();
    card.setTitle("Session");
    card.setContent(speechText);

    // Create the plain text output.
    PlainTextOutputSpeech speech = new PlainTextOutputSpeech();
    speech.setText(speechText);

    if (isAskResponse) {
        // Create reprompt
        PlainTextOutputSpeech repromptSpeech = new PlainTextOutputSpeech();
        repromptSpeech.setText(repromptText);
        Reprompt reprompt = new Reprompt();
        reprompt.setOutputSpeech(repromptSpeech);

        return SpeechletResponse.newAskResponse(speech, reprompt, card);

    } else {
        return SpeechletResponse.newTellResponse(speech, card);
    }
}

Including Short Pre-Recorded Audio in your Response

You can embed short recorded audio within your service’s response by including the URL of an MP3 file. The Alexa service then plays the MP3 while rendering the response. This is useful in several scenarios:

  • Providing a response using a voice associated with your brand, rather than the standard Alexa voice. For example:
    • A skill for news headlines might use audio of a recognizable reporter reading back a news headline.
    • A skill that ties in with a video game might use the voice of a game character to deliver pre-recorded responses.
  • Providing sound effects alongside normal text-to-speech responses, especially in games and other skills designed for entertainment.
  • Incorporating recognizable jingles and other sounds to help users associate the skill with your brand.
  • Providing responses that are not well-supported by the current text-to-speech processing, such as reading back responses in non-English languages in a translator skill.

To include pre-recorded audio, you provide your response in SSML and use an <audio> tag to specify the URL of the MP3 file. For example, note the following SSML:

<speak>
    Welcome to Car-Fu. 
    <audio src="https://carfu.com/audio/carfu-welcome.mp3" /> 
    You can order a ride, or request a fare estimate. Which will it be?
</speak> 

When Alexa renders this response, it would sound like this:

Alexa: Welcome to Car-Fu.
(the specified carfu-welcome.mp3 audio file plays)
Alexa: You can order a ride, or request a fare estimate. Which will it be?

The audio files you include within your service’s response must be very short to ensure a good user experience. You can include up to five audio files in a single response. The combined total time for all audio files in a single response cannot be more than ninety (90) seconds. For specific requirements and limitations on the audio files, see Speech Synthesis Markup Language (SSML) Reference.

For more about using SSML in your responses and other supported tags, see Speech Synthesis Markup Language (SSML) Reference.
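The sketch below shows one way to assemble SSML like the Car-Fu example in code and to check the five-file limit before sending the response. The SsmlAudioSketch class is a hypothetical helper. It escapes XML special characters in the text it is given, and it cannot check the ninety-second limit, because that depends on the audio files themselves.

```java
// Hypothetical helper for illustration; not part of the Alexa Skills Kit Java library.
final class SsmlAudioSketch {
    // Limit on audio files per response, as stated in this topic.
    static final int MAX_AUDIO_FILES = 5;

    // Wraps text before and after a single audio clip in a <speak> element.
    static String withAudio(String before, String audioUrl, String after) {
        return "<speak>" + escape(before)
                + "<audio src=\"" + escape(audioUrl) + "\" />"
                + escape(after) + "</speak>";
    }

    // Counts the <audio> tags in an SSML string.
    static int countAudioTags(String ssml) {
        int count = 0;
        int index = ssml.indexOf("<audio");
        while (index >= 0) {
            count++;
            index = ssml.indexOf("<audio", index + 1);
        }
        return count;
    }

    static boolean withinAudioLimit(String ssml) {
        return countAudioTags(ssml) <= MAX_AUDIO_FILES;
    }

    // Escapes characters that would otherwise break the XML.
    private static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```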

Handling Possible Input Errors

Unlike a visual interface, a voice interface cannot prevent users from entering invalid data. In addition, misunderstandings when interpreting spoken language may introduce errors in slot values. Your code needs to check for these errors and handle them appropriately.

Invalid Custom Slot Type Values

For slots that are defined as custom slot types, it is possible to get values that are not part of the list of values defined for the type.

For example, a custom LIST_OF_SIGNS slot type might define the twelve Zodiac signs. If the user says “what is the horoscope for blue”, Alexa may send your service a request with the word “blue” in the Sign slot. In this case, your code should test the Sign value and give the user some indication that they did not provide a valid zodiac sign. See Voice Design Best Practices for recommendations around handling dialog errors.
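A minimal sketch of such a test is shown below. The ZodiacSignValidator class is a hypothetical helper, and the sketch assumes the slot type lists the signs by their English names. In a real skill, the set should mirror the values actually defined for the custom slot type.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration; the list of values is an assumption.
final class ZodiacSignValidator {
    private static final Set<String> SIGNS = Collections.unmodifiableSet(
            new HashSet<String>(Arrays.asList(
                    "aries", "taurus", "gemini", "cancer", "leo", "virgo", "libra",
                    "scorpio", "sagittarius", "capricorn", "aquarius", "pisces")));

    // Returns true if the slot value is one of the known signs, ignoring case
    // and surrounding whitespace. A null value is never valid.
    static boolean isValidSign(String slotValue) {
        return slotValue != null
                && SIGNS.contains(slotValue.trim().toLowerCase(Locale.ROOT));
    }
}
```

When isValidSign() returns false, the skill can respond with a prompt that tells the user what kind of value it expects, rather than attempting the lookup.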

Invalid Numeric, Date, Time and Duration Slots

The slot types AMAZON.NUMBER, AMAZON.DATE, AMAZON.TIME, and AMAZON.DURATION are all intended to convert the speech into different data types. When using these slots, incorrect user input or spoken language understanding errors can prevent this conversion from taking place, so the values in these slots may be invalid.

For example, an intent for simple arithmetic might use AMAZON.NUMBER slots like this:

SampleNumberAddIntent   add {xValue} and {yValue}

Alexa attempts to convert the data provided in xValue and yValue into digits. However, there is no way to force the user to only include words representing numeric values for the xValue and yValue slots when speaking this phrase. A user might say nonsense, such as “add red and blue”, or Alexa might misinterpret the user’s words and be unable to convert them into digits.

Because of this, your code should always validate the slot data before using it. For example:

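// xValue and yValue are slots already retrieved from
// the intent. Make sure they are not null first.
// StringUtils is assumed to be org.apache.commons.lang3.StringUtils.
// Its isNumeric() accepts digits only, so values such as "-3" or "1.5"
// are treated as invalid here.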
if (xValue != null && yValue != null) {
    // Get the values from the slots. Slot values are always provided as
    // strings
    String xString = xValue.getValue();
    String yString = yValue.getValue();

    // Verify the strings contain numeric values and then parse the
    // strings to actual numbers
    Float xFloat = (StringUtils.isNumeric(xString)) ? Float
            .parseFloat(xString) : null;
    Float yFloat = (StringUtils.isNumeric(yString)) ? Float
            .parseFloat(yString) : null;

    if (xFloat != null && yFloat != null) {

        // Both slots contained valid numbers, so continue
        // with normal processing...
        ...

    } else {
        // One or both of the slots did not contain a numeric value
        speechOutput = "Sorry, I did not hear the numbers you wanted to add.";
    }
}

Intents with a Variable Number of Slots

It is possible to design an intent that has one or more slots, but then specify sample utterances that don’t use all of those slots. For example, a GetHoroscope intent might be defined with both Sign and Date slots and the following sample utterances:

GetHoroscope what is the horoscope for {Sign}
GetHoroscope what is the horoscope for {Sign} for the month of {Date}
GetHoroscope get me my horoscope

In this case, the GetHoroscope intent sent to your service might include no slot values, one slot value for Sign, or two slot values for both Date and Sign.

Therefore, before attempting to use a slot value, verify that it is not null.
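The sketch below illustrates that check for the GetHoroscope example. It assumes the slot values have already been copied into a Map from slot name to spoken value, with missing slots absent or null. The HoroscopeSlots class, the descriptions it returns, and the choice to fall back to today's horoscope when no Date is given are all hypothetical.

```java
import java.util.Map;

// Hypothetical helper for illustration; shows null checks for optional slots.
final class HoroscopeSlots {
    // Describes which lookup to perform given the slots that were filled.
    static String describeLookup(Map<String, String> slotValues) {
        String sign = (slotValues == null) ? null : slotValues.get("Sign");
        String date = (slotValues == null) ? null : slotValues.get("Date");

        if (sign == null) {
            // "get me my horoscope": no slots were provided.
            return "ask the user for a sign";
        }
        if (date == null) {
            // Assumption: default to today's horoscope when no date is given.
            return "horoscope for " + sign + " (today)";
        }
        return "horoscope for " + sign + " on " + date;
    }
}
```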
