Build, Test, and Tune Your Skills with Three New Tools
Leo Ohannesian Oct 09, 2019

We’re excited to announce the General Availability of two tools that focus on your voice model’s accuracy: the Natural Language Understanding (NLU) Evaluation Tool and Utterance Conflict Detection. We are also excited to announce that you can now build your own quality and usage reporting with the Get Metrics API, now in Beta. These tools help complete the suite of Alexa skill testing and analytics tools that aid in creating and validating your voice model prior to publishing your skill, detecting possible issues when your skill is live, and refining your skill over time.

The NLU Evaluation Tool helps you batch test utterances and compare how they are interpreted by your skill’s NLU model against your expectations. The tool has three use cases:

  1. Prevent overtraining NLU models: Overtraining your NLU model with too many sample utterances and slot values can reduce accuracy. Instead of adding exhaustive sample utterances to your interaction model, you can now run NLU Evaluations with utterances you expect users to say. If any utterance resolves to the wrong intent or slot, you can improve the accuracy of your skill’s NLU model by adding only those utterances as new training data (by creating new sample utterances or slots).
  2. Regression tests: You can create regression tests and run them after adding new features to your skills to ensure your customer experience stays intact.
  3. Accuracy measurements: You can measure the accuracy of your skill’s NLU model by running an NLU Evaluation with the anonymized frequent live utterances surfaced in Intent History (production data), and then measure the impact on accuracy of any changes you make to your NLU model.
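
To make the batch-testing idea concrete, here is a minimal sketch in Python. It does not call the NLU Evaluation Tool. The `fake_resolve` function and its lookup table are hypothetical stand-ins for your skill's NLU model, and the `(intent, slots)` tuple shape is an assumption made for illustration. The sketch shows only the core comparison: resolve each utterance, compare the result with what you expected, and report accuracy along with the utterances that missed.

```python
from typing import Callable, Dict, List, Tuple

# (intent name, slot values). This shape is an assumption made for the sketch.
Resolution = Tuple[str, Dict[str, str]]
EvalCase = Tuple[str, Resolution]  # (utterance, expected resolution)


def run_evaluation(cases: List[EvalCase], resolve: Callable[[str], Resolution]):
    """Return (accuracy, failures) for a batch of expected resolutions."""
    failures = []
    for utterance, expected in cases:
        actual = resolve(utterance)
        if actual != expected:
            failures.append((utterance, expected, actual))
    passed = len(cases) - len(failures)
    accuracy = passed / len(cases) if cases else 0.0
    return accuracy, failures


# Hypothetical stand-in for the skill's NLU model: an exact-match lookup
# that falls back to AMAZON.FallbackIntent for anything it does not know.
_FAKE_MODEL: Dict[str, Resolution] = {
    "order a large pizza": ("OrderIntent", {"size": "large"}),
    "what's on the menu": ("MenuIntent", {}),
    "cancel my order": ("CancelOrderIntent", {}),
}


def fake_resolve(utterance: str) -> Resolution:
    return _FAKE_MODEL.get(utterance, ("AMAZON.FallbackIntent", {}))


cases: List[EvalCase] = [
    ("order a large pizza", ("OrderIntent", {"size": "large"})),
    ("what's on the menu", ("MenuIntent", {})),
    ("cancel my order", ("CancelOrderIntent", {})),
    ("get me a small pie", ("OrderIntent", {"size": "small"})),
]

accuracy, failures = run_evaluation(cases, fake_resolve)
print(f"accuracy: {accuracy:.0%}")
for utterance, expected, actual in failures:
    print(f"missed: {utterance!r} expected {expected[0]}, got {actual[0]}")
```

Only the utterances listed as misses are candidates for new sample utterances or slot values, which is how this workflow helps you avoid overtraining.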

Utterance Conflict Detection helps you detect utterances that are accidentally mapped to multiple intents, a situation that reduces the accuracy of your Alexa skill’s Natural Language Understanding (NLU) model. The tool runs automatically on each model build. You can use it before publishing the first version of your skill or as you add intents and slots over time, so that you do not build models with unintended conflicts.
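
As a rough illustration of what a conflict looks like, the sketch below flags sample utterances that appear under more than one intent. It is not how Utterance Conflict Detection is implemented: it only catches exact matches after simple normalization and ignores slots entirely. The intent names and the `find_conflicts` helper are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def find_conflicts(samples_by_intent: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each normalized utterance to the intents that claim it, keeping
    only utterances claimed by more than one intent."""
    intents_by_utterance = defaultdict(set)
    for intent, samples in samples_by_intent.items():
        for sample in samples:
            # Normalize case and whitespace so trivial variants still collide.
            normalized = " ".join(sample.lower().split())
            intents_by_utterance[normalized].add(intent)
    return {
        utterance: sorted(intents)
        for utterance, intents in intents_by_utterance.items()
        if len(intents) > 1
    }


# Hypothetical interaction model fragment.
model = {
    "OrderIntent": ["order a pizza", "get me food"],
    "ReorderIntent": ["Order a  pizza", "order again"],
}

for utterance, intents in find_conflicts(model).items():
    print(f"{utterance!r} maps to: {', '.join(intents)}")
```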

Finally, with the Get Metrics API (Beta) you can analyze key metrics, such as unique customers, in your preferred tools for analysis or aggregation. For example, you can now connect to AWS CloudWatch and create monitors, alarms, and dashboards to stay on top of changes that may impact customer engagement.

With these three additions to the ASK Tech toolset, we will recap the overall suite of testing and feedback tools you have available and where they fall in the overall skill development lifecycle. The skill development lifecycle can be separated into three general steps that come after your design phase (see situational design): building, testing, and tuning.

Build Your Dialog Model
As you define your intents, slots, and dialogs from the ground up per your situational design definition, you will want to test how utterances fall into your model. This is where the utterance profiler is useful. You can enter utterances to see how they resolve to your intents and slots. When an utterance does not invoke the right intent or slot, you can update your sample utterances or slots and retest, all before writing any code for your skill. You should also set up a fallback intent for requests that your skill does not know how to interpret, otherwise known as unhandled requests. As you build out your voice model, you can now use Utterance Conflict Detection to ensure that there are no conflicting utterances in your VUI. Utterance Conflict Detection identifies utterances (and slots) that map to more than one intent. Surfacing conflicting utterances helps you find areas where the NLU model of your skill could break and cause an unintended customer experience.

Test Before Go-Live
As you approach voice model readiness, you will want to test using the built-in Alexa Simulator. You can also test-distribute to your Alexa device or set up beta testing. As your voice model solidifies, you can start using the NLU Evaluation Tool to batch test utterances and see how they fit into your voice model. You will need to define a set of utterances mapped to the intents and slots you expect to be sent to your skill. You can then run an NLU Evaluation and, depending on the results, add to your slots and intents to improve the accuracy of your skill. Before going live, you will want to both functionally test and debug your skill.

Tune Over Time
The skill development journey has only begun when you go live. You can use Interaction Path Analysis to begin to understand your customers’ journey through your skill and where possible bottlenecks are. Interaction Path Analysis shows aggregate skill usage patterns in a visual format, including which intents your customers use and in what order. This enables you to verify whether customers are using the skill as expected, and to identify interactions where customers become blocked or commonly exit the skill. You can use insights gained from Interaction Path Analysis to make your flow more natural, fix errors, and address unmet customer needs.

The Intent History page of the developer console displays aggregated, anonymized frequent live utterances and the resolved intents. You can use this to learn how users interact with your skill and to identify improvements you may want to make to your interaction model. The Intent History page displays the frequent utterances in two tabs: Unresolved Utterances, which did not successfully map to an intent, and Resolved Utterances, which mapped successfully to an intent and slot. This lets you review the utterances, update your interaction model to account for phrases that were not routed correctly, and mark utterances as resolved. For example, suppose you see a particular utterance that was sent to AMAZON.FallbackIntent, but it is actually a phrase that should trigger one of your custom intents. You can map that utterance directly to that intent and update your interaction model right from the Intent History page. Alternatively, if an utterance that falls to the fallback intent suggests a feature worth adding, you can extend your voice model to support it. As mentioned above, you can also use the utterances surfaced in Intent History to run an NLU Evaluation and generate an accuracy indicator for your skill. You can re-run the same evaluation after making changes to your skill model to measure the overall impact on your skill experience, which is otherwise known as a regression test.
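
One way to think about that regression check is sketched below in Python. The pass/fail dictionaries are hypothetical. They stand in for the outcome of two evaluation runs and are not the NLU Evaluation Tool's actual result format. The point of the sketch is that an unchanged accuracy number can still hide a regression, so it helps to compare results utterance by utterance.

```python
from typing import Dict, List


def accuracy(results: Dict[str, bool]) -> float:
    """Fraction of utterances that resolved as expected."""
    return sum(results.values()) / len(results) if results else 0.0


def find_regressions(before: Dict[str, bool], after: Dict[str, bool]) -> List[str]:
    """Utterances that passed in the earlier run but not in the later one.
    An utterance missing from the later run is treated as a failure."""
    return sorted(
        utterance
        for utterance, passed in before.items()
        if passed and not after.get(utterance, False)
    )


# Hypothetical pass/fail outcomes from two evaluation runs.
before = {"play jazz": True, "stop": True, "play something": False}
after = {"play jazz": True, "stop": False, "play something": True}

print(f"before: {accuracy(before):.0%}, after: {accuracy(after):.0%}")
print("regressions:", find_regressions(before, after))
```

In this example both runs score the same accuracy, yet "stop" has regressed, which the per-utterance comparison makes visible.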

Access to skill metrics was previously restricted to pre-configured dashboards displaying static metrics in the developer console. Static metrics are insightful but fall short when you need to automate mechanisms that guarantee operational continuity. In contrast, with the Get Metrics API (Beta), you can feed live metrics into your preferred analysis tools to pinpoint changes in your skill's performance and behavior. You can now compute your own aggregated metrics or create automation that feeds that data into a monitoring system like AWS CloudWatch, where you can create alarms or trigger changes in your skill based on certain inputs. For example, you can track how new customers are interacting with your skill and set up alarms that tell you when indicators of a bad user experience surface, such as customers landing on AMAZON.FallbackIntent at a higher rate than normal. The Get Metrics API (Beta) also works across multiple skills, so you can now set up aggregated reporting across all of your skills without switching back and forth to the developer console.
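
The fallback-rate alarm described above could be sketched as follows. This is an illustration of the threshold logic only. The per-intent count dictionaries are a hypothetical data shape, not the Get Metrics API response format, and the 1.5x threshold is an arbitrary assumption. In practice you would populate the counts from your own metrics pipeline and hand the decision to your monitoring system.

```python
from typing import Dict

FALLBACK_INTENT = "AMAZON.FallbackIntent"


def fallback_rate(intent_counts: Dict[str, int]) -> float:
    """Share of all resolved requests that landed on the fallback intent."""
    total = sum(intent_counts.values())
    if total == 0:
        return 0.0
    return intent_counts.get(FALLBACK_INTENT, 0) / total


def exceeds_baseline(current_rate: float, baseline_rate: float, factor: float = 1.5) -> bool:
    """True when the current rate is more than `factor` times the baseline.
    The default factor is an arbitrary choice for this sketch."""
    return current_rate > baseline_rate * factor


# Hypothetical per-intent request counts for a normal week and for today.
baseline_counts = {"PlayIntent": 90, FALLBACK_INTENT: 10}
today_counts = {"PlayIntent": 80, FALLBACK_INTENT: 20}

baseline = fallback_rate(baseline_counts)
today = fallback_rate(today_counts)

if exceeds_baseline(today, baseline):
    print(f"ALARM: fallback rate {today:.0%} vs baseline {baseline:.0%}")
```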

With the new Get Metrics API, you can save time and increase visibility into the key insights that we provide in order to optimize skill engagement. The Get Metrics API is available for skill builders in all locales and currently supports the Custom skill model, the pre-built Flash Briefing model, and the Smart Home Skill API.

Start Optimizing Today
Begin working with the three new tools to create an optimal customer experience. Start by reading our technical documentation on the NLU Evaluation Tool, Utterance Conflict Detection, and the Get Metrics API (Beta) today!