Voice Interface and User Experience Testing for a Custom Skill

Voice interface and user experience testing focuses on:

  • Testing the user experience to ensure that the skill is aligned with several key features of Alexa that help create a great experience for customers.
  • Reviewing the intent schema, the set of sample utterances, and the list of values for any custom slot types you have defined to ensure that they are correct, complete, and adhere to voice design best practices.

    These components are defined on the Interaction Model page for your skill in the developer portal.

These tests address the following goals:

  • Increase the number of ways end users can phrase requests to your skill.
  • Evaluate the ease of speech recognition when using your skill (was Alexa able to recognize the right words?).
  • Improve language understanding (when Alexa recognizes the right words, did she understand what to do?).
  • Ensure that users can speak to Alexa naturally and spontaneously.
  • Ensure that Alexa understands most requests you make, within the context of a skill’s functionality.
  • Ensure that Alexa responds to users’ requests in an appropriate way, by either fulfilling them or explaining why she can’t.

Many of these tests verify that your skill adheres to the design guidelines described in Alexa Voice Design Guide. You may want to review those guidelines while working through this section. For recommendations for sample utterances, see Best Practices for Sample Utterances and Custom Slot Type Values.

Note that many of these tests require that you have a device for voice testing. If you do not have a device with Alexa, you can use the Test Simulator (beta) to test your Alexa skill.

This document is oriented towards skills that do not include a screen or touch component.

To return to the high-level testing checklist, see Certification Requirements for Custom Skills.

4.1. Session Management

Every response sent from your skill to the Alexa service includes a flag indicating whether the conversation with the user (the session) should end or continue. If the flag is set to continue, Alexa then listens and waits for the user’s response. For Amazon devices such as Amazon Echo that have a blue light ring, the device lights up to give the user a visual cue that Alexa is listening for the user’s response. On Echo Show, the bottom of the screen flashes blue. On Echo Spot, a blue light ring flashes around the circular screen.

This test verifies that the text-to-speech provided by your skill and the session flag work together for a good user experience. Responses that ask questions leave the session open for a reply, while responses that fulfill the user’s request close the session.
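
This guide does not assume any particular implementation, but as a hedged sketch of how the session flag is typically controlled in code, the following uses the ASK SDK for Python; the intent names and response text are hypothetical placeholders. Calling ask() sets a re-prompt and keeps the session open, while set_should_end_session(True) closes it:

from ask_sdk_core.dispatch_components import AbstractRequestHandler
from ask_sdk_core.utils import is_intent_name


class AskQuestionHandler(AbstractRequestHandler):
    """A response that asks a question should keep the session open."""

    def can_handle(self, handler_input):
        return is_intent_name("PlanTripIntent")(handler_input)  # hypothetical intent

    def handle(self, handler_input):
        return (handler_input.response_builder
                .speak("Where would you like to go?")
                .ask("You can name any city. Where would you like to go?")  # keeps the session open
                .response)


class FulfillRequestHandler(AbstractRequestHandler):
    """A response that completes the request should close the session."""

    def can_handle(self, handler_input):
        return is_intent_name("GetFactIntent")(handler_input)  # hypothetical intent

    def handle(self, handler_input):
        return (handler_input.response_builder
                .speak("Here is your fact. Goodbye.")
                .set_should_end_session(True)  # explicitly closes the session
                .response)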

Test and Expected Results

1.

Invoke the skill without specifying an intent, for example:

  • Open <Invocation Name>.

Respond to the prompt provided by the skill and verify that you get a correct response.

After every response that asks the user a question, the session remains open and the device waits for your response.

After every response that completes the user’s request, the interaction ends.

2.

Test a variety of intents – both those that ask questions and those that complete the user’s request.

After every response that asks the user a question, the session remains open and the device waits for your response.

After every response that completes the user’s request, the interaction ends.

4.2. Intent and Slot Combinations

A skill may have several intents and slots. This test verifies that each intent returns the expected response with different combinations of slots.

Test and Expected Results

1.

Test the skill’s intent responses using different combinations of slot values.

You can use one of the one-shot phrases for starting the skill, for example:

  • Ask <Invocation Name> to <do something>

Be sure to invoke every intent, not just those that are typically used in a one-shot manner.

Evaluate the response for each intent.

The response is appropriate for the context of the request.

For example, if the request includes a slot value, the response is relevant to that information. If a request to that same intent does not include the slot, the response uses a default or asks the user for clarification (a minimal code sketch of this pattern follows the table below).

You may want to use a table of intent and slot values to track this test and ensure that you test every intent and slot combination. For example:

Intent       Slot Combination     Sample Utterance to Test
IntentName   SlotOne              This is an utterance to test this intent and slot one
IntentName   SlotTwo              This is an utterance to test this intent and slot two
IntentName   SlotOne + SlotTwo    This is an utterance to test this intent with both slot one and slot two
(Repeat a row for each additional valid intent and slot combination.)
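
As a rough illustration of the slot-present versus slot-missing behavior described above, here is a minimal handler sketch using the ASK SDK for Python; the intent name, slot name, and wording are invented for the example:

from ask_sdk_core.dispatch_components import AbstractRequestHandler
from ask_sdk_core.utils import is_intent_name


class HoroscopeIntentHandler(AbstractRequestHandler):
    """Hypothetical intent with an optional Sign slot."""

    def can_handle(self, handler_input):
        return is_intent_name("GetHoroscopeIntent")(handler_input)

    def handle(self, handler_input):
        slots = handler_input.request_envelope.request.intent.slots or {}
        sign_slot = slots.get("Sign")
        sign = sign_slot.value if sign_slot else None

        if sign:
            # Slot provided: the response is relevant to that value.
            return (handler_input.response_builder
                    .speak("Here is the horoscope for {}.".format(sign))
                    .set_should_end_session(True)
                    .response)

        # Slot missing: fall back to a clarifying question instead of an error.
        return (handler_input.response_builder
                .speak("Which zodiac sign would you like the horoscope for?")
                .ask("You can say a sign, such as Leo or Virgo.")
                .response)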

4.3. Intent Response (Design)

A good user experience for a skill depends on the skill having well-designed text-to-speech responses. Alexa Voice Design Guide: What Alexa Says provides recommendations for designing your skill’s responses. This test verifies that your skill’s responses meet these recommendations.

You can use the same set of intent and slot combinations used for the Intent and Slot Combinations test.

Test and Expected Results

1.

Test the skill’s intent responses using different combinations of slot values.

You can use one of the one-shot phrases for starting the skill, for example:

  • Ask <Invocation Name> to <do something>

Be sure to invoke every intent, not just those that are typically used in a one-shot manner.

Try a variety of sample utterances for each intent.

If the skill vocalizes any examples for users to try, use those examples exactly as instructed by the skill.

Evaluate the response for each intent.

The response meets each of the following requirements:

  • Answers the user’s request in a concise manner.
  • Provides information in consumable chunks.
  • Does not include technical or legal jargon.
  • Intents that are not typically used in a one-shot manner still return a relevant response or inform users how to begin using the skill.
  • The response is spoken in the same language used by the Alexa account. For instance, when testing with an account configured with German, the text-to-speech responses are in German. When testing with an account configured with English (US), the text-to-speech responses are in English.

For a better user experience, the response should also meet these recommendations:

  • Easy to understand
  • Written for the ear, not the eye


4.4. Supportive Prompting

A user can begin an interaction with your skill without providing enough information to know what they want to do. This might be either a no intent request (the user invokes the skill but does not specify any intent at all) or a partial intent request (the user specifies the intent but does not provide the slot values necessary to fulfill the request).

In these cases, the skill must provide supportive prompts asking the user what they want to do. This test verifies that your skill provides useful prompts for these scenarios.
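
The prompt itself is ordinary response text returned from your LaunchRequest handler. As a hedged example only (the skill name, options, and wording are placeholders), a supportive launch prompt in the ASK SDK for Python might look like this:

from ask_sdk_core.dispatch_components import AbstractRequestHandler
from ask_sdk_core.utils import is_request_type


class LaunchRequestHandler(AbstractRequestHandler):
    """Handles "Open <Invocation Name>" (a LaunchRequest with no intent)."""

    def can_handle(self, handler_input):
        return is_request_type("LaunchRequest")(handler_input)

    def handle(self, handler_input):
        # Brief prompt that names the skill and offers the most common options.
        speech = ("Welcome to Plan My Trip. You can plan a trip, or "
                  "ask for your saved itinerary. Which would you like?")
        reprompt = "You can say, plan a trip, or, get my itinerary."
        return (handler_input.response_builder
                .speak(speech)
                .ask(reprompt)  # keeps the session open while waiting for an answer
                .response)

A partial intent can be handled in the same spirit: check for the missing slot in the intent handler and return a question that asks for it, or let a dialog model collect required slots for you.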

Test and Expected Results

1.

Invoke the skill with no intent. You can do this by using a phrase that sends a LaunchRequest rather than an IntentRequest. For example:

  • Open <Invocation Name>.

Verify that you get a prompt, then respond to the prompt and verify that you get a correct response.

  • The skill prompts you for information about what you want to do.
  • The prompt includes the skill’s name so you know you are in the right place.
  • The prompt gives you specific options about what to do, but is brief. If the skill has many functions, the prompt gives the most common options.
  • The prompt does not give verbose instructions telling the user what to say (such as “to do xyz, say xyz”). The prompt is concise.
  • When you respond to the prompt, the skill continues prompting until all needed information is collected, then provides a contextualized, non-error response.
  • If no information is needed from users after launch, the skill completes a core function and closes the session.

2.

Invoke the skill with a partial intent. You can do this by using a phrase that invokes the intent without including all the required slot data. For example:

  • Ask <Invocation Name> to <do something> (leave out slot data in the command)

Verify that you get a prompt, then respond to the prompt and verify that you get a correct response.

If the skill does not define any slots, you can skip this test, as it is not possible to send a partial intent.

  • The skill prompts you for the missing information.
  • The prompt gives you specific options about what to do, but is brief. If the skill has many functions, the prompt gives the most common options.
  • The prompt does not give verbose instructions telling the user what to say (such as “to do xyz, say xyz”). The prompt is concise.
  • When you respond to the prompt, the skill continues prompting until all needed information is collected, then provides a contextualized, non-error response.

See What Alexa Says for recommendations for designing prompts.

4.5. Invocation Name

Users say the invocation name for a skill to begin an interaction. Inspect the skill’s invocation name and verify that it meets the invocation name requirements described in Choosing the Invocation Name for a Custom Skill.

4.6. One-Shot Phrasing for Sample Utterances

Most skills provide quick, simple, “one-shot” interactions in which the user asks a question or gives a command, the skill responds with an answer or confirmation, and the interaction is complete. In these interactions, the user invokes your skill and states their intent all in a single phrase.

The ask and tell phrases are the most natural phrases for starting these types of interactions. Therefore, it is critical that you write sample utterances that work well with these phrases and are easy and natural to say.

In these tests, you review the sample utterances you’ve written for the skill, then test them by voice to ensure that they work as expected.

Test and Expected Results

1.

Inspect the skill’s sample utterances to ensure that they contain the right phrasing to match the different phrases for invoking a skill with a specific intent.

Noun phrases: phrases that can follow “ask <invocation name> for…” or “tell <invocation name> about…”

  • “ask <invocation name> for my favorite color
  • “tell <invocation name> about my appointment today at 3 pm

Questions, in both interrogative and inverted forms: phrases that can follow “ask <invocation name> …”

  • “ask <invocation name> where is my car
  • “ask <invocation name> where my car is

Commands: phrases that can follow “tell <invocation name> to…” or “ask <invocation name> to…”

  • “ask <invocation name> to get me a car
  • “tell <invocation name> to find my favorite book

(In the examples above, the phrase that follows the invocation name is the sample utterance.)

  • Noun, question, and command utterances are all included.
  • At least five varieties of each of these three types of phrases are present (five noun forms, five question forms, and five command forms).
  • When combined with the ask and tell phrases, the sample utterances are intuitive and natural.

2.

Launch the skill using each of the following common “ask” patterns (ideally do multiple variations for each pattern):

  • Ask <Invocation Name> for <something>
  • Ask <Invocation Name> about <something>
  • Ask <Invocation Name> to <do something>

  • Each of these common “ask” patterns works.
  • The skill successfully launches and completes the request.
  • The phrase is easy and natural to say.

3.

Launch the skill with the generic “ask” pattern (recommended test if this is a natural phrase for your skill):

  • Ask <Invocation Name> <question>

Test with questions starting with different question words (who, what, how, and so on).

The specific question words that sound natural with your skill may vary. For example, these types of questions do not flow well with “Space Geek.” A user is unlikely to say something like “Ask Space Geek what is a space fact?”

  • The generic “ask” pattern works if appropriate for your skill.
  • The skill successfully launches and completes the request.
  • The phrase is easy and natural to say.

4.

Launch the skill using the following common “tell” pattern:

  • Tell <Invocation Name> to <do something>

  • The common “tell” pattern works.
  • The skill successfully launches and completes the request.
  • The phrase is easy and natural to say.

5.

Review the “Invoking a Skill with a Specific Request (Intent)” section in Understanding How Users Invoke Custom Skills and test as many of the phrases as apply to your skill.

Note that not all of the phrases apply to all skills. For example, the “Ask…whether…” phrasing would probably not make sense for a skill asking about weather or tide information, so the skill would still pass this test even without this phrase.

  • The skill successfully launches and completes the request.
  • The phrase is easy and natural to say.

4.7. Variety of Sample Utterances

Given the flexibility and variation of spoken language in the real world, there will often be many different ways to express the same request. Therefore, your sample utterances must include multiple ways to phrase the same intent.

In this test, inspect the sample utterances for all intents, not just the “one shot” intents described in One-Shot Phrasing for Sample Utterances.

Test and Expected Results

1.

Inspect the skill’s intent schema and sample utterances:

  1. For each intent, identify several ways a user could phrase a request for that intent.
  2. Verify whether the sample utterances mapped to that intent cover those phrasings.
  3. Examine any slots that appear in the sample utterances.

The five most common synonyms for each phrase pattern are present. For example, if the skill contains “get me <some value>”, then the utterances include synonyms such as “give me <some value>”, “tell me <some value>”, and so on.

Each sample utterance must be unique. There cannot be any duplicate sample utterances mapped to different intents.

Each slot is used only once within a sample utterance.
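
Both rules can be checked mechanically. As an optional, hypothetical aid (not something this guide requires), the short script below scans an interaction model JSON file exported from the developer console and flags duplicate utterances and slots repeated within a single utterance; the file name and the exact JSON layout are assumptions to verify against your own export:

import json
import re
from collections import defaultdict

# Hypothetical path; point this at your exported interaction model.
with open("interaction_model.json") as f:
    model = json.load(f)

seen = defaultdict(list)  # utterance text -> intents that use it
for intent in model["interactionModel"]["languageModel"]["intents"]:
    for sample in intent.get("samples", []):
        seen[sample.lower().strip()].append(intent["name"])
        # Each slot may appear only once within a single sample utterance.
        slot_names = re.findall(r"{(\w+)}", sample)
        repeated = {s for s in slot_names if slot_names.count(s) > 1}
        if repeated:
            print("Repeated slot(s) {} in: {}".format(sorted(repeated), sample))

# Duplicate sample utterances mapped to different intents are not allowed.
for text, intents in seen.items():
    if len(intents) > 1:
        print("Duplicate utterance '{}' used by: {}".format(text, ", ".join(intents)))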

4.8. Intents and Slot Types

Slots are defined with different types. Built-in types such as AMAZON.DATE convert the user’s spoken text into a different format (such as converting the spoken text “march fifth” into the date format “2017-03-05”). Custom slot types are used for items that are not covered by Amazon Alexa’s built-in types.
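
For example, a slot typed as AMAZON.DATE reaches your code as the converted string rather than the raw speech. The sketch below (ASK SDK for Python, with a hypothetical TravelDate slot) shows how a handler might read and parse that value; note that AMAZON.DATE can also return week, season, or decade formats, which this sketch simply treats as unparseable:

import datetime


def parse_travel_date(handler_input):
    """Read a hypothetical TravelDate slot declared with the AMAZON.DATE type."""
    slots = handler_input.request_envelope.request.intent.slots or {}
    slot = slots.get("TravelDate")
    raw = slot.value if slot else None  # e.g. "2017-03-05" for "march fifth"
    if not raw:
        return None
    try:
        return datetime.date.fromisoformat(raw)
    except ValueError:
        # Week ("2017-W13"), season ("2017-SU"), or decade values land here.
        return None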

For this test, review the intent schema and ensure that the correct slot types are used for the type of data the slot is intended to collect.

Note that this test assumes you have migrated to the updated slot types as described in Migrating to the Improved Built-in and Custom Slot Types. If you are still using the previous version (for instance, DATE instead of AMAZON.DATE), then you also need to perform the deprecated Sample Utterances (Slot Type Values) test described in the appendix.

Test and Expected Results

1.

Inspect the skill’s intent schema to identify all slot types.

Verify that the types match the type of data to be collected.

  • The slots for each intent match the recommended slot types listed in the Slot Types table below.
  • Slots that collect a value from a list use a custom slot type.

Slot Types (what each type of slot collects):

AMAZON.NUMBER: Integer numbers.

AMAZON.DATE: Relative and absolute dates (“this weekend” and “august twenty sixth twenty fifteen”).

AMAZON.TIME: The time of day (“three thirty p. m.”).

AMAZON.DURATION: A period of time (“five minutes”).

Custom slot types: A value from a list (horoscope signs, all NFL football teams, supported cities, recipe ingredients, and so on). See Custom Slot Type Values for additional testing for your custom slot types.

AMAZON.LITERAL: Not recommended; consider replacing AMAZON.LITERAL with a custom slot type if possible. If your schema does include AMAZON.LITERAL, also review the sample utterances and make sure that appropriate sample slot values are provided for each instance of the slot:

  • All values with a high likelihood of input
  • An appropriate distribution of word counts in the input (for instance, if only one-word values are possible, include only one-word values; if 5-6 word values are possible but rare, include only a handful of 5-6 word values)

4.9. Custom Slot Type Values

The custom slot type is used for items that are not covered by Amazon’s built-in types and is recommended for most use cases where a slot value is one of a set of possible values.

Test and Expected Results

1.

Inspect the skill’s intent schema to identify all slots that use custom slot types.

For each custom slot type, review the set of values you provided for the type.

  • If possible, the list of values includes all values you expect to be used. For example, a horoscope skill with a LIST_OF_SIGNS custom type would include all twelve Zodiac signs as values for the type.
  • If the list cannot cover every possible value, it covers as many representative values as possible.
  • If the list cannot cover every possible value, the values reflect the expected word counts. For instance, if values of one to four words are possible, use values of one to four words in your value list, and distribute them proportionally: if a four-word value occurs in an estimated 10% of inputs, then four-word values should make up only about 10% of the values in your list.
  • All custom values are written in the selected language. For instance, all custom slot type values on the German tab must be in German.

For guidelines for defining custom slot type values, see Recommendations for Custom Slot Type Values.

4.10. Writing Conventions for Sample Utterances

Sample utterances must be written according to defined rules in order to successfully build a speech model for your skill.

Test and Expected Results

1.

Review the text of all sample utterances.

All sample utterances adhere to the following writing conventions:

  • Capital letters and punctuation are not used. Exceptions: periods may appear in initialisms and spelled-out letters (for example, “t. v.”), hyphens may be used but should be very infrequent, and apostrophes may be used in possessives.
  • Individual letters are followed by a period and a space before the next letter or word: “TV” is written as “t. v.”, “OK” is written as “o. k.” Compounds are written similarly: “AccessHD” is written as “access h. d.”
  • The invocation name must not appear in isolation or within supported launch phrasing. For example, a skill with the invocation name “Daily Horoscopes” cannot contain any sample utterances that are just “daily horoscopes” or sample utterances containing launch phrases such as “tell daily horoscopes.” For a complete list of launch phrases, see Understanding How Users Invoke Custom Skills.
  • All sample utterances are written in the selected language. For instance, the sample utterances on the German tab must be in German.

For more information about syntax rules for sample utterances, see the Custom Interaction Model Reference.

4.11. Error Handling

Unlike a visual interface, where the user can only interact with the objects presented on the screen, there is no way to limit what users can say in a speech interaction. Your skill needs to handle a variety of errors in an intelligent and user-friendly way. This test verifies your skill’s ability to handle common errors.

For more information on validating user input, please see Handling Possible Input Errors.
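
How you handle these cases depends on your implementation. As one hedged illustration (ASK SDK for Python, with an invented intent and slot), the handler below validates the slot itself and returns a clarifying prompt from code; this in-code prompt is what test 2 below exercises, as opposed to the stored re-prompt that test 1 exercises:

from ask_sdk_core.dispatch_components import AbstractRequestHandler
from ask_sdk_core.utils import is_intent_name


class OneshotTideIntentHandler(AbstractRequestHandler):
    """Hypothetical intent that expects an AMAZON.DATE slot named Date."""

    def can_handle(self, handler_input):
        return is_intent_name("OneshotTideIntent")(handler_input)

    def handle(self, handler_input):
        slots = handler_input.request_envelope.request.intent.slots or {}
        date_slot = slots.get("Date")
        date_value = date_slot.value if date_slot else None

        if not date_value:
            # Missing or unconvertible slot data: clarify what to say and keep the session open.
            return (handler_input.response_builder
                    .speak("I didn't catch the day. You can say a date, such as "
                           "tomorrow or next Saturday. Which day do you want?")
                    .ask("Which day do you want the tide for?")
                    .response)

        return (handler_input.response_builder
                .speak("Here is the tide information for {}.".format(date_value))
                .set_should_end_session(True)
                .response)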

Test and Expected Results

1.

Invoke the skill without specifying an intent, for example:

  • Open <Invocation Name>.

When prompted to respond, say nothing.

  • The skill responds with a prompt that clarifies the information you need to provide.
  • The prompt clearly indicates what you need to say.
  • The prompt ends with a question and keeps the session open for your response.

Note that in this scenario, the prompt you hear is the re-prompt included in the previous response.

2.

Invoke the skill using the following phrase:

  • Open <Invocation Name>.

When prompted to respond, say something that matches one of your skill’s intents, but with invalid slot data.

For instance, if the intent expects an AMAZON.DATE slot, say something that cannot be converted to a date.

Repeat this test for each slot.

  • The skill responds with a prompt or help text that clarifies the information you need to provide.
  • The prompt clearly indicates what you need to say.
  • The prompt ends with a question and keeps the session open for your response.

Note that in this scenario, the prompt is not the re-prompt included in the previous response. This prompt must come from error handling within the code that handles the intent.

4.12. Providing Help

A skill must have a help intent that can provide additional instructions for navigating and using the skill. Implement the AMAZON.HelpIntent to provide this. You do not need to provide your own sample utterances for this intent, but you do need to implement it in the code for your skill. For details, see Implementing the Built-in Intents.

This test verifies that this intent exists and provides useful information.
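
AMAZON.HelpIntent is the built-in intent named above; the handler sketch below (ASK SDK for Python, with placeholder help text for an imaginary skill) shows the general shape of a help response that informs, ends with a question, and leaves the session open:

from ask_sdk_core.dispatch_components import AbstractRequestHandler
from ask_sdk_core.utils import is_intent_name


class HelpIntentHandler(AbstractRequestHandler):
    def can_handle(self, handler_input):
        return is_intent_name("AMAZON.HelpIntent")(handler_input)

    def handle(self, handler_input):
        speech = ("Plan My Trip helps you organize trips by city and date. "
                  "You can plan a new trip, or ask for your saved itinerary. "
                  "To leave, just say stop. What would you like to do?")
        return (handler_input.response_builder
                .speak(speech)
                .ask("Would you like to plan a trip or hear your itinerary?")  # leaves the session open
                .response)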

Test and Expected Results

1.

Invoke the skill without specifying an intent, for example:

  • Open <Invocation Name>.

When prompted to respond, say “help”.

For a simple skill that gives a complete response even with no specific intent (such as the Space Geek sample), invoke the help intent directly:

  • Ask <Invocation Name> for help.

The help response:

  • Provides instructions to help the user navigate the skill’s core functionality.
  • Is more informative than the prompt users hear when launching the skill with no intent. For example, the help prompt could explain more about what the skill does or inform users how to exit the skill.
  • Educates users on what the skill can do, as opposed to what they need to say in order for the skill to function.
  • Ends with a question prompting the user to complete their request.
  • Leaves the session open to get a response from the user.

For more about designing help for your skill, see What Alexa Says.

4.13. Stopping and Canceling

Your skill must respond appropriately to common utterances for stopping and canceling actions (such as “stop,” “cancel,” “never mind,” and others). The built-in AMAZON.StopIntent and AMAZON.CancelIntent intents provide these utterances. In most cases, these intents should just exit the skill, but you can map them to alternate functionality if it makes sense for your particular skill. See Implementing the Built-in Intents.
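
For the common case where both intents simply exit the skill, a single handler can cover them. The sketch below uses the ASK SDK for Python and is an illustration only, not the required implementation:

from ask_sdk_core.dispatch_components import AbstractRequestHandler
from ask_sdk_core.utils import is_intent_name


class StopOrCancelHandler(AbstractRequestHandler):
    def can_handle(self, handler_input):
        return (is_intent_name("AMAZON.StopIntent")(handler_input)
                or is_intent_name("AMAZON.CancelIntent")(handler_input))

    def handle(self, handler_input):
        return (handler_input.response_builder
                .speak("Goodbye.")
                .set_should_end_session(True)  # exit the skill
                .response)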

Test and Expected Results

1.

Start the skill and invoke an intent that prompts the user for a response.

After hearing the prompt, say “stop.”

One of the following occurs:

  • The skill exits.
  • The skill returns a response that is appropriate to the skill’s functionality. The response also makes sense in the context of the request to “stop.” For example, a skill that places orders could send back a reply confirming that the user’s order has been canceled.

If the skill responds to all requests with a complete response and never provides a prompt, skip this test.

2.

Invoke an intent that responds with lengthy text-to-speech. As soon as Alexa begins speaking the response, say “Alexa, stop” to interrupt the response.

After the wake word interrupts Alexa, one of the following occurs.

  • The skill exits.
  • The skill returns a response that is appropriate to the skill’s functionality. The response also makes sense in the context of the request to “stop.” For example, a skill that places orders could send back a reply confirming that the user’s order has been canceled.

If all of the skill’s responses are too short to reasonably interrupt, skip this test.

3.

Start the skill and invoke an intent that prompts the user for a response.

After hearing the prompt, say “cancel.”

One of the following occurs:

  • The skill exits.
  • The skill returns a response that is appropriate to the skill’s functionality. The response also makes sense in the context of the request to “cancel.” For example, a skill that places orders could send back a reply confirming that the user’s order has been canceled.

If the skill responds to all requests with a complete response and never provides a prompt, skip this test.

4.

Invoke an intent that responds with lengthy text-to-speech. As soon as Alexa begins speaking the response, say “Alexa, cancel” to interrupt the response.

After the wake word interrupts Alexa, one of the following occurs.

  • The skill exits.
  • The skill returns a response that is appropriate to the skill’s functionality. The response also makes sense in the context of the request to “cancel.” For example, a skill that places orders could send back a reply confirming that the user’s order has been canceled.

If all of the skill’s responses are too short to reasonably interrupt, skip this test.

5.

Invoke any intent that starts the skill session. While the session is open, say “Exit.” This ends the session and sends your skill a SessionEndedRequest.

The skill closes without returning an error response.
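
Note that a SessionEndedRequest cannot be answered with speech. Assuming an ASK SDK for Python implementation (an assumption, since this guide is implementation-neutral), a handler that accepts the request and returns an empty response is enough to avoid an error:

from ask_sdk_core.dispatch_components import AbstractRequestHandler
from ask_sdk_core.utils import is_request_type


class SessionEndedRequestHandler(AbstractRequestHandler):
    def can_handle(self, handler_input):
        return is_request_type("SessionEndedRequest")(handler_input)

    def handle(self, handler_input):
        # Optional: log handler_input.request_envelope.request.reason for debugging.
        return handler_input.response_builder.response  # empty response, no speech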

Appendix: Deprecated Test for Sample Utterances (Slot Type Values)

If all of your slots use the newer slot types with the AMAZON namespace (such as AMAZON.DATE), you do not need to do this test.

In previous versions of the Alexa Skills Kit, it was necessary to include slot values showing different ways of phrasing the slot data in your sample utterances. For example, sample utterances for a DATE slot were written like this:

OneshotTideIntent when is high tide on {january first|Date}
OneshotTideIntent when is high tide {tomorrow|Date}
OneshotTideIntent when is high tide {saturday|Date}
...(many more utterances showing different ways to say the date)

If your skill still uses this syntax for the built-in slot types, you need to review the sample slot values in your sample utterances. We strongly recommend migrating to the updated slot types that no longer require the sample values.

Test and Expected Results

1.

Inspect the skill’s intent schema to identify all slot types, then inspect the slot type values found in the sample utterances.

Verify that the slot type values provide sufficient variety for good recognition.

  • NUMBER: provide multiple ways of stating integer numbers, and include samples showing the full range of numbers you expect (for example, include “ten”, “one hundred”, and several samples in between). If you expect the numbers used to be only within a small range, include every number within that range as a sample value in an utterance.
  • DATE: provide both relative and absolute date samples (for example, “today”, “tomorrow”, “september first”, “june twenty sixth twenty fifteen”). If you expect a certain set of phrases to be more likely than others, include samples of those phrases.
  • TIME: provide samples of stating the time (“three thirty p. m.”).
  • DURATION: provide samples of indicating different time periods (“five minutes”, “ten days”, “four years”).
  • LITERAL: see recommendations for AMAZON.LITERAL, above. The LITERAL and AMAZON.LITERAL slot types work the same way.
