Use the Intent Confidence Dashboard to Improve Skill Accuracy

Omkar Phatak Apr 02, 2020

Today, we are excited to announce the launch of intent confidence as part of the ASK analytics dashboard. Intent confidence indicates your interaction model’s performance by mapping customer utterances to intents with high, medium, and low confidence. In aggregate, you can see how your skill is performing against how customers are using it. This blog covers a few methods you can use to improve the overall intent confidence of your skill if it is not performing to your expectations. You can view the intent confidence metric under the Analytics tab of the developer console.

[Screenshot: Intent Confidence Dashboard]

NLU and Alexa Skills

There are many ways to improve your skill’s intent confidence, despite the many challenges that human languages pose: vast combinations of words, varying vocabulary, ambiguity, grammatical and structural complexity, context-based complications, accents, dialects, and more. Two key technologies working seamlessly to make this possible are automatic speech recognition (ASR), which converts speech to text, and natural language understanding (NLU), which extracts meaning from text. Alexa’s NLU technology maps customer requests to the correct response from your skill. The data you provide in your interaction model - intents, slots, slot values, sample utterances, and more - trains the Alexa NLU on how to route customer requests to the appropriate part of your skill. In this blog, we’ll talk about improving the NLU accuracy of your Alexa skills, and thus your skill’s intent confidence.

How to Improve the NLU Accuracy of Your Skills

An NLU accuracy error occurs when a customer invokes a skill with a specific request and the skill provides an inappropriate response or doesn’t get invoked at all. You can now review intent confidence within the developer console’s Analytics tab to keep track of NLU accuracy. Here are a few things, within your control as a developer, that can reduce your skill’s NLU error rate and ensure a smooth customer experience:

1. Avoid Common Invocation Name Issues

The choice of invocation name is important for your skill from a voice recognition and discoverability point of view. Keep in mind that your invocation name isn’t necessarily the same as the display name for your skill in the Alexa skill store. Here are a few invocation name issues that you can easily avoid:

  • Apostrophes and Periods: If your skill is named ‘USA Facts’, the invocation name with the best recognition is ‘u. s. a. facts’ (with periods and single spaces between the acronym letters). Also, apostrophes should only be used for the possessive form of nouns in invocation names (e.g. ‘sam’s horoscope’).
  • No Words Without Spaces: Words run together without spaces (e.g. ‘horoscopereader’) shouldn’t be used in an invocation name. This type of invocation name will most likely cause recognition issues.
  • Write Numbers as Words: If your invocation name contains numbers, they must be written out as words. For a skill named ‘1984 Quotes’, the invocation name should be ‘nineteen eighty four quotes’ (see the model fragment after this list).
  • Avoid Phonemes: Phonetic spellings are not recommended in an invocation name. For example, you will most likely face voice recognition issues if the invocation name is defined as ‘you ess a facts’ instead of ‘u. s. a. facts’.
  • Brand Names and Made-Up Words: It’s recommended that you thoroughly test invocation names that are brand names or made-up/non-vocabulary words before submitting them for certification, as their recognition accuracy may be low. Whenever possible, use words your customers can easily recognize and pronounce correctly.
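For illustration, here is how such an invocation name might appear in the skill’s interaction model. This is a minimal sketch shown as a Python dict mirroring the model’s JSON; everything except invocationName is elided:

```python
# Minimal sketch of an interaction model fragment, shown as a Python dict
# mirroring the JSON the developer console stores.
interaction_model = {
    "interactionModel": {
        "languageModel": {
            # Numbers written out as words, all lowercase, no phonemes:
            "invocationName": "nineteen eighty four quotes",
            "intents": [],  # intents and sample utterances go here
            "types": [],    # custom slot types go here
        }
    }
}
```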

2. Include a Comprehensive Set of Sample Utterances Resembling Real World Interactions

The success of your skill in responding to customer utterances is directly related to how closely your sample utterances resemble real-world customer interactions. Work backwards from every intent or piece of functionality within your skill and think about all the possible ways customers could pose the questions and requests that intent will answer. Here are some best practices for building a robust interaction model:

  • Use suggested slot values: When entering your slot values and synonyms through the developer console, you will see suggested slot values that are relevant to the ones you are entering. These are served by a machine learning-based algorithm that finds slot values similar to those you are adding. Incorporating the suggested slot values into the interaction model will enrich your slot’s coverage of customer choices, in turn improving its recognition accuracy. For more details, refer to our guide on using suggested slot values.
  • Use reference-based catalog management: What if you could fetch your slot type values from an external data source instead of having to hard-code them all when publishing the skill? Reference-based catalog management lets you do just that. Use it for highly dynamic slot types whose values are constantly in flux, such as ‘song names’.
  • Use a Thesaurus: Use a thesaurus to find synonyms for the words used in sample utterances as well as slot values, and ensure that all possible variations of the words involved are included.
  • Add as many carrier phrases as possible: A carrier phrase is the content of an utterance other than its slots; examples are "search for", "please", or "find out". Provide as much context around slots as possible by including a variety of sample utterances and carrier phrases (e.g. ‘what is’, ‘please provide’), as shown in the sketch after this list. This helps illustrate the usage of slots in as many ways as possible, maximizing your chances of building a well-rounded interaction model.
  • Include all possible utterance variations: Consider all the straightforward as well as colloquial ways customers could speak the same sentence. Include all possible variations of the sentences, contractions (what’s, they’ve, etc.), and shortened informal sentences.
  • Avoid overlap in utterances: Ensure the same utterance does not map to multiple intents and/or slots, and avoid assigning the same slot value to different slots. Use the Utterance Conflict Detection feature to detect and remove any such overlaps.
  • Resolve all spelling, grammar, and punctuation errors: Just like in any interface, error-free text with grammatically accurate phrases and sentences is absolutely necessary in a VUI (voice user interface) interaction model. It leads to a better customer experience and superior recognition.
  • Relevant Utterances and Slot Values: Ensure only relevant slot values and sample phrases are present for the slots and intents you’ve used. Irrelevant content will throw off your skill’s utterance recognition and make it error-prone. Examine and remove all irrelevant content from every intent.
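To make this concrete, here is what a well-rounded intent could look like in the interaction model, again sketched as a Python dict. The intent name ‘HoroscopeIntent’, the custom ‘ZodiacSign’ slot type, and the sample phrases are illustrative assumptions:

```python
# Hypothetical "HoroscopeIntent": the same two slots appear with several
# different carrier phrases (including a contraction), so the model sees
# the slots in varied contexts.
horoscope_intent = {
    "name": "HoroscopeIntent",
    "slots": [
        {"name": "sign", "type": "ZodiacSign"},   # custom slot type
        {"name": "date", "type": "AMAZON.DATE"},  # built-in slot type
    ],
    "samples": [
        "what is the horoscope for {sign} {date}",
        "please provide the {sign} horoscope for {date}",
        "tell me the {sign} horoscope for {date}",
        "read my {sign} horoscope for {date}",
        "what's the outlook for {sign} {date}",
    ],
}
```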

3. Use Custom or Built-in Slots and Built-in Intents Wherever Relevant

If multiple utterances contain different words belonging to the same category (for example, Virgo, Aries, etc. belonging to the ‘Zodiac’ category in our example skill), it helps to add custom slots wherever applicable in your sample utterances. Amazon also provides a wide range of built-in slots covering numbers, dates, times, phrases, and lists of items. These cover some of the most common use cases for skills. If you use any of these built-in slot types, you do not need to provide slot values or sample utterances for them, as they are pre-built and provided. Both of these slot types (custom and built-in) reduce the number of utterances you need to provide for your skill.

For example, in our ‘Horoscope Reader’ skill, it makes sense to add a custom slot for zodiac signs - ‘{sign}’, with slot values covering all 12 signs - and the built-in AMAZON.DATE slot in your utterances:

Alexa, ask Horoscope Reader my horoscope for {sign} {AMAZON.DATE}.

Here, the AMAZON.DATE slot has built-in values like "today", "yesterday", "tomorrow", "july", or "august" that it can convert into a date format.

Use built-in intents wherever relevant, in cases where the actions are of a general nature like stop or cancel. These built-in intents tend to have higher NLU accuracy than custom intents built for the same purpose, so use built-in intents like AMAZON.CancelIntent, AMAZON.StopIntent, and AMAZON.NoIntent whenever relevant. For more details on built-in intents, please refer to our guide here.
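Putting this together, the fragment below is a hedged sketch (as a Python dict; the ‘ZodiacSign’ type name is our own choice) of a language model that defines the custom slot type and registers a few built-in intents, which need no sample utterances of their own:

```python
# Hypothetical language model fragment for 'Horoscope Reader': a custom
# slot type enumerating the 12 signs, plus built-in intents that ship
# with pre-built utterance data.
language_model_fragment = {
    "types": [{
        "name": "ZodiacSign",
        "values": [{"name": {"value": sign}} for sign in [
            "aries", "taurus", "gemini", "cancer", "leo", "virgo",
            "libra", "scorpio", "sagittarius", "capricorn",
            "aquarius", "pisces"]],
    }],
    "intents": [
        {"name": "AMAZON.StopIntent", "samples": []},
        {"name": "AMAZON.CancelIntent", "samples": []},
        {"name": "AMAZON.NoIntent", "samples": []},
    ],
}
```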

4. Use Fallback Intent to Handle Out-of-Domain Requests

Despite building a robust interaction model that covers most scenarios, customers may at times express utterances that are out-of-domain (i.e., utterances that aren't mapped to any of your intents). In such cases, your skill should still gracefully handle the request and gently redirect customers with a message that conveys what the skill can do for them and sets the right expectations. For exactly this purpose, we provide the Fallback intent, which can take care of these unmapped utterances. Here is an example:

User: 'Alexa, open Horoscope Reader'

Alexa: 'Welcome to Horoscope Reader. What is your Sun Sign?'

User: 'What is Robert De Niro’s Sun sign?'
(This utterance isn’t mapped to any of the Horoscope Reader intents. Since Alexa cannot map this utterance to any intent, AMAZON.FallbackIntent is triggered.)

Alexa: 'The Horoscope Reader skill can't help with that, but I can tell you your daily horoscope. What is your Sun sign?'

(The response gently nudges the customer to ask questions within the skill's domain.)
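In code, this is a small request handler. Below is a minimal sketch using the ASK SDK for Python, with response copy mirroring the dialogue above:

```python
# Minimal AMAZON.FallbackIntent handler (ASK SDK for Python). The
# response text is illustrative and should match your skill's domain.
from ask_sdk_core.dispatch_components import AbstractRequestHandler
from ask_sdk_core.utils import is_intent_name


class FallbackIntentHandler(AbstractRequestHandler):
    """Redirect out-of-domain requests back to what the skill can do."""

    def can_handle(self, handler_input):
        return is_intent_name("AMAZON.FallbackIntent")(handler_input)

    def handle(self, handler_input):
        speech = ("The Horoscope Reader skill can't help with that, "
                  "but I can tell you your daily horoscope. "
                  "What is your Sun sign?")
        return (handler_input.response_builder
                .speak(speech)
                .ask("What is your Sun sign?")
                .response)
```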

For more details, please refer to our detailed guide here.

5. Review Intent History

The intent history feature helps you improve resolution accuracy for your live or in-development skills by letting you work backwards from actual customer interaction data. It provides anonymized and aggregated customer utterance and slot data, along with the confidence levels (High, Medium, Low) with which the current run-time interaction model resolves them to your skill’s intents and slots. The tool displays daily data for a skill locale only when the skill has at least 10 unique users for that day, and the data is a select sample rather than a complete record of all utterances.

By studying this data, you can check the confidence level with which each utterance is resolved to an intent. You can then either change its mapping to a different intent or slot, or retain the existing one. If the data shows frequently used customer utterances that are currently missing from your interaction model, add them to improve accuracy. You may also identify carrier and common phrase patterns that are not yet included in your interaction model and update it accordingly.

For example, let’s say you open the intent history tab for our skill 'Horoscope Reader' and it shows “talk about my horoscope” as a frequent utterance that currently resolves to the Fallback intent or to no intent at all. This indicates that the phrase is not triggering the intended request, so it falls through to the Fallback intent and the skill does not work in this case. To fix this, map the phrase as a sample utterance for the appropriate intent in your skill. The intent history feature is accessible on the left-hand side of the developer console under the “Build” tab, as depicted in the screenshot below. Please refer to our detailed guide here for more details.

[Screenshot: Intent History]

6. Use the Utterance Profiler to Test Intent Mapping Accuracy

A simple way to improve your skill’s intent mapping accuracy is to study utterance profiler data (accessible through the Build page in the developer console, by clicking the utterance profiler button in the upper-right corner). The profiler lets you check whether your interaction model resolves to the right intents and slots, even before your skill’s endpoint code is ready, so you can test the standalone interaction model in the very early stages without deploying code. For utterances that do not resolve to the right intents or slots, you can go back and iteratively update the interaction model until they resolve correctly. You can see an example of utterance profiler use for a skill in the screenshot below.

[Screenshot: Utterance Profiler]

7. Use Entity Resolution to Eliminate Slot Value Redundancy

The entity resolution feature can help improve the NLU accuracy of customer utterances containing slot values that mean the same thing and do not need to be handled differently. You achieve this by defining synonyms for those slot values. Entity resolution addresses the redundancy in your slot values and makes it easy for synonyms to be handled by the same code.

For example, if your skill has a slot named {weather} with possible values such as ‘storm’, ‘hurricane’, ‘gale’, and ‘squall’, and your code responds to all of them in the same way, you can use entity resolution by defining ‘storm’ as the canonical slot value, ‘hurricane’, ‘gale’, and ‘squall’ as its synonyms, and ‘storm’ as the unique ID.
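In the interaction model, that setup looks something like the sketch below (a Python dict mirroring the model JSON; the ‘WeatherType’ name is an assumption):

```python
# Hypothetical slot type using entity resolution: one canonical value
# with an explicit ID, and synonyms that all resolve to it.
weather_type = {
    "name": "WeatherType",
    "values": [{
        "id": "storm",                 # unique ID your code switches on
        "name": {
            "value": "storm",          # canonical value
            "synonyms": ["hurricane", "gale", "squall"],
        },
    }],
}
```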

Once you’ve set this up, if a customer says ‘gale’ in an utterance, your skill back-end will receive ‘gale’ as the spoken value, ‘storm’ as the canonical value, and ‘storm’ as the unique ID. No matter which synonym appears in the utterance, your skill can take the unique ID and respond according to that slot value, reducing the redundancy in code that would otherwise be introduced by slot values with the same meaning. Refer to our detailed guide here for more.
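Here is a minimal sketch of how the back-end might read the resolved value with the ASK SDK for Python. The ‘WeatherIntent’ and ‘weather’ slot names match the example above; a production handler would also check the resolution status code:

```python
# Reading the entity-resolution result from an incoming intent request.
from ask_sdk_core.dispatch_components import AbstractRequestHandler
from ask_sdk_core.utils import is_intent_name


class WeatherIntentHandler(AbstractRequestHandler):
    def can_handle(self, handler_input):
        return is_intent_name("WeatherIntent")(handler_input)

    def handle(self, handler_input):
        slot = handler_input.request_envelope.request.intent.slots["weather"]
        canonical_id = slot.value  # e.g. "gale"; fall back to the raw value
        resolutions = slot.resolutions
        if resolutions and resolutions.resolutions_per_authority:
            authority = resolutions.resolutions_per_authority[0]
            if authority.values:   # populated on a successful match
                canonical_id = authority.values[0].value.id  # "storm"
        speech = "Here is the {} forecast.".format(canonical_id)
        return handler_input.response_builder.speak(speech).response
```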

8. Use the NLU Evaluation Tool

The NLU evaluation tool provides a scalable technique for batch testing the NLU accuracy of your interaction model. Instead of testing each utterance manually on the developer console, this tool lets you create a complete set of utterances mapped to expected intents and slots, known as an annotation set, and use it to automate batch testing of your skill’s interaction model. Each test result is marked as passed or failed depending on whether the utterance invoked the right intent and slots. This automates your testing process and makes regression testing possible. For more details, please refer to this guide. As depicted in the screenshot below, you can access the NLU evaluation tool under the 'Build' tab on the developer console.

[Screenshot: NLU Evaluation Tool]
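Conceptually, an annotation set is a table of utterances paired with the intent and slots you expect them to resolve to. The sketch below is illustrative only; the exact import format the console accepts is described in the linked guide:

```python
# Illustrative only: the shape of the data an annotation set captures,
# reusing the hypothetical HoroscopeIntent from earlier examples. The
# real import format is defined in the NLU evaluation tool guide.
annotation_set = [
    {"utterance": "what is the horoscope for virgo today",
     "expected_intent": "HoroscopeIntent",
     "expected_slots": {"sign": "virgo", "date": "today"}},
    {"utterance": "stop",
     "expected_intent": "AMAZON.StopIntent",
     "expected_slots": {}},
]
```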

9. Use Dynamic Entities to Improve Slot Recognition

Your skill may have slots that are highly dynamic in nature ('food item names', for example). With a static slot value list, voice recognition of such dynamic values can be poor. In these cases, you can use dynamic entities to modify or replace slot values at run time and offer a personalized experience for customers. Dynamic entities substantially improve speech recognition by biasing the skill’s interaction model towards the newly added slot values at run time. For example, you might be building a restaurant skill that lets customers order items. Dynamic entities let customers order the 'daily specials' by passing the current daily special slot values at run time, even though they were never entered in the pre-built ‘static’ model. For skills that use device location, like a hyper-local food ordering skill, different slot values for restaurant names can be served at run time based on the device location provided. For implementation details, please refer to our detailed guide here.
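Below is a minimal sketch of replacing slot values at run time with the ASK SDK for Python. The slot type name ‘MenuItemType’ and the specials themselves are illustrative assumptions:

```python
# Biasing recognition toward today's specials with a dynamic entities
# directive (Dialog.UpdateDynamicEntities).
from ask_sdk_model.dialog import DynamicEntitiesDirective
from ask_sdk_model.er.dynamic import (Entity, EntityListItem,
                                      EntityValueAndSynonyms, UpdateBehavior)


def build_daily_specials_directive(specials):
    """specials: list of (id, value, synonyms) fetched at run time."""
    entities = [
        Entity(id=item_id,
               name=EntityValueAndSynonyms(value=value, synonyms=synonyms))
        for item_id, value, synonyms in specials
    ]
    return DynamicEntitiesDirective(
        update_behavior=UpdateBehavior.REPLACE,
        types=[EntityListItem(name="MenuItemType", values=entities)])


# Inside an intent handler:
# directive = build_daily_specials_directive(
#     [("soup.pumpkin", "pumpkin soup", ["soup of the day"])])
# handler_input.response_builder.add_directive(directive)
```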

Review Intent Confidence Today

Studying your skill’s intent confidence, using the tools above to address issues, and iteratively updating your interaction model is a great way to keep your skill accurate over time. We hope we have 'invoked' your curiosity enough to motivate your 'intent' to explore this important topic further. For questions, you can reach me on Twitter at @omkarphatak.
