Batch Test Your Natural Language Understanding (NLU) Model
Use the NLU evaluation tool in the developer console to batch test the natural language understanding (NLU) model for your Alexa skill.
To evaluate your model, you define a set of utterances mapped to the intents and slots you expect to be sent to your skill. This is called an annotation set. Then you start an NLU evaluation with the annotation set to determine how well your skill's model performs against your expectations. The tool can help you measure the accuracy of your NLU model and run regression tests to ensure that changes to your model don't degrade the customer experience.
You can use the NLU evaluation tool with skill models for all locales. You can also access these tools with the Skill Management API (SMAPI) or the ASK Command Line Interface (ASK CLI). See NLU Evaluation Tool API.
- Annotations and annotation sets
- Create and edit annotation sets
- Start an evaluation
- Review the results and update your model
- Use the NLU evaluation tool for regression testing
- Related topics
You can use the NLU evaluation tool once you have defined an interaction model and successfully built it.
The tool does not call an endpoint, so you do not need to develop the service for your skill to test your model.
Annotations and annotation sets
To evaluate your model with the NLU evaluation tool, you create an annotation set. This is a set of utterances, each mapped to the intent and slots you expect to be sent to your skill. Each utterance with its expected intent and slots is called an annotation.
Each annotation has the following fields:
- Utterance
  - The utterance to test.
  - Do not include the wake word or invocation name. Provide just the utterance as it would be used after the user invokes the skill.
  - You can use either written form or spoken form for the utterance. For example, you can use numerals ("5") or write out numbers ("five"). For more examples, see the rules for custom slot type values.
- Expected Intent
  - The intent that the utterance should trigger.
- Expected Slot Names
  - (Optional) The name of the slot that the utterance should fill. You can provide more than one expected slot for an utterance.
- Expected Slot Values
  - Required for each Expected Slot Name. Provides the value you expect the utterance to fill for the specified slot.
- Reference Timestamp (UTC)
  - (Optional) A date and time in UTC format to use as the basis for relative date and time values. Use this when the utterance tests the AMAZON.DATE or AMAZON.TIME slots with words that represent relative dates and times, such as "today," "tomorrow," and "now." Provide the full date and time, including milliseconds, for example: 2018-10-25T23:50:02.135Z. For more information, see Create an annotation with relative dates or times.
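The fields above can be modeled as a small record when you prepare annotations outside the console. The following is a minimal sketch only: the `Annotation` class and the `format_reference_timestamp` helper are hypothetical names used for illustration and are not part of any Alexa tool or API. The helper shows one way to produce a UTC timestamp that includes milliseconds.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class Annotation:
    """One utterance with its expected intent and slots (hypothetical structure)."""
    utterance: str                      # no wake word or invocation name
    expected_intent: str
    # Maps each expected slot name to its expected slot value.
    expected_slots: Dict[str, str] = field(default_factory=dict)
    # Optional UTC timestamp used as the basis for relative dates and times.
    reference_timestamp: Optional[str] = None


def format_reference_timestamp(moment: datetime) -> str:
    """Format a timezone-aware datetime as UTC with millisecond precision."""
    utc = moment.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"


annotation = Annotation(
    utterance="test the date slot with tomorrow",
    expected_intent="TestDateSlotIntent",
    expected_slots={"DateSlotExample": "2019-08-22"},
    reference_timestamp=format_reference_timestamp(
        datetime(2019, 8, 21, tzinfo=timezone.utc)
    ),
)
print(annotation.reference_timestamp)  # 2019-08-21T00:00:00.000Z
```

Keeping slot names and values in one mapping reflects the rule that every Expected Slot Name needs an Expected Slot Value.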
Create and edit annotation sets
You can manage annotation sets in the developer console from the Build > Custom > Annotation Sets page. You can then create and edit annotations directly in the developer console.
Create an annotation set in the developer console
- Open your skill in the developer console.
- Navigate to Build > Custom > Annotation Sets.
- Click Create Annotation Set.
- At the top of the page, enter a name for the annotation set.
- Create the annotations.
Edit an annotation set
- Open your skill in the developer console.
- Navigate to Build > Custom > Annotation Sets.
- Find the annotation set to edit and click its name or the Edit link.
Create the annotations in the developer console
For more about the fields for an annotation, refer back to Annotations and annotation sets.
- Create or edit an annotation set.
- Enter the utterance to test and click the plus (+) or press Enter.
- In the table of utterances, click in the Expected Intent field and select the intent the utterance should trigger.
- If the utterance should also fill a slot, select the Expected Slot Name and enter the Expected Slot Values, then click the +.
- If needed, click in the Reference Timestamp field and select the date and time from the date picker. This fills in the selected timestamp in UTC format. See Create an annotation with relative dates or times.
- After you have added all the new annotations, click Save Annotation Set.
Create an annotation with relative dates or times
The AMAZON.DATE and AMAZON.TIME slot types let users specify dates and times relative to the current date. For example, the utterance "today" normally resolves to the current date. The slot value therefore depends on the day you test the utterance.
To test these types of utterances with the NLU evaluation tool, enter a specific date and time in the Reference Timestamp (UTC) field. This value is then used instead of the actual current date and time when calculating the date and time slot values.
For example, note the following annotations:
| Utterance | Expected Intent | Expected Slot Names | Expected Slot Values |
|---|---|---|---|
| test the date slot with tomorrow | TestDateSlotIntent | DateSlotExample | 2019-08-22 |
| test the date slot with next Monday | TestDateSlotIntent | DateSlotExample | 2019-08-26 |
Without a Reference Timestamp, these utterances would pass only if you ran the evaluation on August 21, 2019. Set the Reference Timestamp for each of these to 2019-08-21T00:00:00.000Z. Then, regardless of the actual date and time, the NLU evaluation tool resolves the slots as though it were midnight on August 21, 2019, so the specified Expected Slot Values match the actual results.
Select the date and time from the calendar picker. This adds the date/time in UTC format:
YYYY-MM-DDThh:mm:ss.sTZD, for example: 1997-07-16T19:20:30.45Z.
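To see why the expected values in the example depend on the reference timestamp, the date arithmetic can be reproduced in a few lines. This sketch does not reflect how Alexa resolves slots internally. The function names are hypothetical, and reading "next Monday" as the first Monday strictly after the reference date is an assumption made for illustration.

```python
from datetime import datetime, timedelta, timezone


def parse_reference_timestamp(text: str) -> datetime:
    """Parse a UTC timestamp such as 2019-08-21T00:00:00.000Z."""
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def resolve_tomorrow(reference: datetime) -> str:
    """Return the date one day after the reference date."""
    return (reference + timedelta(days=1)).date().isoformat()


def resolve_next_monday(reference: datetime) -> str:
    """Return the first Monday strictly after the reference date (assumed meaning)."""
    # Monday is weekday 0. If the reference is itself a Monday, skip a full week.
    days_ahead = (0 - reference.weekday()) % 7 or 7
    return (reference + timedelta(days=days_ahead)).date().isoformat()


reference = parse_reference_timestamp("2019-08-21T00:00:00.000Z")
print(resolve_tomorrow(reference))     # 2019-08-22
print(resolve_next_monday(reference))  # 2019-08-26
```

With any other reference date the two functions return different values, which is why an annotation without a fixed Reference Timestamp passes on only one day.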
Start an evaluation
Once you have at least one annotation set defined for your skill, you can start an evaluation. This evaluates the natural language understanding (NLU) model built from your skill's interaction model, using the specified annotation set.
For live skills, you can choose whether to run the evaluation against the development version or the live version.
You can run multiple evaluations at the same time.
- From any page in the Build > Custom > Interaction Model section, click the Evaluate Model button in the upper-right corner. Evaluate Model is also available on the Annotation Sets page.
- Select the NLU Evaluation tab.
- From the Stage list, select Development or Live (if applicable).
- From the Annotation Source list, select one of your annotation sets.
- Click Run an Evaluation.
The evaluation starts and its current status is displayed in the NLU Evaluation Results table. Note that an evaluation may take several minutes. You can close the Evaluate Model panel and do other work on your skill. Check back later to see the results of the test.
Review the results and update your model
You can review the results of an evaluation on the NLU Evaluation panel, then drill down into the results for a specific evaluation. All past evaluations are saved for later review.
Review a summary of NLU evaluation results
Click the Evaluate Model button, then select the NLU Evaluation tab. The table at the bottom of the panel shows each in-progress and completed evaluation.
Each evaluation in the table displays the following information:
- Evaluation ID
  - Unique ID for the evaluation. Once an evaluation is complete, this becomes a link you can click to see the full report.
- Status
  - Indicates whether the evaluation is Complete.
- Results
  - Displays the results of the evaluation. An evaluation is considered PASSED if all the tests within the annotation set returned the expected intent and slot values. An evaluation is considered FAILED if any of the tests within the annotation set failed to return the expected intent and slot values.
- Annotation Src
  - Unique ID for the annotation set used in the evaluation. Click this link to open the annotation set page.
- Stage
  - The skill stage that was tested (Development or Live).
- Start Time
  - The time the evaluation was started.
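The PASSED and FAILED rule described above can be expressed as a short check. This is an illustrative sketch, not the tool's implementation: `Outcome`, `annotation_passes`, and `evaluation_result` are hypothetical names, and ignoring slots that the model returns but the annotation does not list is an assumption.

```python
from typing import Dict, List, NamedTuple, Tuple


class Outcome(NamedTuple):
    """An intent plus slot values, either expected or actually returned."""
    intent: str
    slots: Dict[str, str]


def annotation_passes(expected: Outcome, actual: Outcome) -> bool:
    """A test passes when the intent and every expected slot value match."""
    if expected.intent != actual.intent:
        return False
    # Assumption: slots returned by the model but not listed as expected are ignored.
    return all(actual.slots.get(name) == value
               for name, value in expected.slots.items())


def evaluation_result(tests: List[Tuple[Outcome, Outcome]]) -> str:
    """PASSED only if every test in the annotation set passes."""
    return "PASSED" if all(annotation_passes(e, a) for e, a in tests) else "FAILED"


tests = [
    # Intent and slot value both match.
    (Outcome("TestDateSlotIntent", {"DateSlotExample": "2019-08-22"}),
     Outcome("TestDateSlotIntent", {"DateSlotExample": "2019-08-22"})),
    # Intent matches, but the slot value differs, so this test fails.
    (Outcome("TestDateSlotIntent", {"DateSlotExample": "2019-08-26"}),
     Outcome("TestDateSlotIntent", {"DateSlotExample": "2019-08-19"})),
]
print(evaluation_result(tests))  # FAILED
```

A single failing test is enough to mark the whole evaluation as FAILED, which matches the all-or-nothing rule in the Results description.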
Get the results for a specific evaluation
To see the results for a given evaluation, open the summary of results. Click the Evaluation ID link to open the results. Use the results page to see which utterances failed the test. For each utterance that failed, the table shows the expected value and the actual value, highlighted in red.
Click the Export JSON button to download the report in JSON format.
Update your skill
Use the evaluation results to identify failing utterances. Add these to your interaction model as sample utterances and slot values, rebuild, then re-run the evaluation with the same annotation set to see if the changes improved the accuracy.
Use the NLU evaluation tool for regression testing
The NLU evaluation tool is especially useful for regression testing. Once you have an annotation set that passes all the tests, you can re-run the evaluation whenever you make changes to your interaction model to ensure that your changes did not degrade your skill's accuracy.
If you do encounter issues, you can revert your skill to an earlier version of your interaction model. See Use a previous version of the interaction model.
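If you record which utterances passed in each evaluation, comparing two runs shows exactly which utterances regressed. The sketch below assumes you have already reduced each run to a mapping from utterance to pass or fail. The `find_regressions` function and the sample data are hypothetical and do not reflect the format of the exported JSON report.

```python
from typing import Dict, List


def find_regressions(baseline: Dict[str, bool], current: Dict[str, bool]) -> List[str]:
    """Return utterances that passed in the baseline run but do not pass now.

    An utterance missing from the current run is treated as failing (assumption).
    """
    return sorted(
        utterance
        for utterance, passed in baseline.items()
        if passed and not current.get(utterance, False)
    )


# Hypothetical results from two evaluations of the same annotation set.
baseline_run = {
    "test the date slot with tomorrow": True,
    "test the date slot with next Monday": True,
}
run_after_model_change = {
    "test the date slot with tomorrow": True,
    "test the date slot with next Monday": False,
}
print(find_regressions(baseline_run, run_after_model_change))
# ['test the date slot with next Monday']
```

An empty list means the model change did not break any utterance that previously passed.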
Related topics
- Create the Interaction Model for Your Skill
- Create Intents, Utterances, and Slots
- Create and Edit Custom Slot Types
- Define the Dialog to Collect and Confirm Required Information
- Test Your Utterances as You Build Your Model
- Validate Slot Values
- Alexa Design Guide
- Manage Skills in the Developer Console
- Build Your Skill
- NLU Evaluation Tool (SMAPI)