Improving Natural Language Understanding Accuracy of Your Alexa Skills

Omkar Phatak Jan 16, 2020

The Alexa Skills Kit gives you Star Trek-esque creative powers to help computers understand and speak human languages. Alexa technology achieves this despite the formidable challenges that human languages pose: a practically infinite combination of words, vast vocabularies, ambiguity, grammatical and structural complexity, context-based complications, accents, dialects, and more. Two key technologies work together to make this possible: automatic speech recognition (ASR), which converts speech to text, and natural language understanding (NLU), which extracts meaning from text. Alexa’s NLU technology maps customer utterances to the correct response, making it the most critical ingredient in the recipe for your skill’s success. In this post, we’ll talk about improving the NLU accuracy of your Alexa skills.


The Anatomy of an Alexa Skill Interaction

To understand how NLU works for your Alexa skill, consider the following customer utterance for a hypothetical skill named “Horoscope Reader”:

Alexa, Ask Horoscope Reader my horoscope for Virgo today.

Here's what happens next. Alexa hears the wake word ('Alexa'), is triggered into action, and listens to the customer utterance. ASR then converts the customer’s utterance into text, which is broken down into its component parts and identified as follows.


The word 'Ask' is identified as the 'Launch Phrase', 'Horoscope Reader' as the 'Skill Name', and the rest of the sentence as the 'Customer Request', through statistical modeling and exact-match rules. Next, Alexa refers to the skill's interaction model to map the customer request, “my horoscope for Virgo today”, to the correct intent, 'GetHoroscope', and maps the slot values ‘Virgo’ and ‘today’ to the slots {sign} and {time} respectively. These values are then passed to your skill's backend code as a structured JSON request to elicit the correct response, which in this case is the horoscope prediction for Virgo on that day. This response is then converted from text to speech and played back to the customer.

As we can see, to accurately answer customer requests for your skill, Alexa needs an interaction model to refer to at run time. This model should contain as many permutations and combinations of utterances, slots, and slot values as practical, mapped to intents, to cover a wide range of possible customer utterances. Hence, the quality of the interaction model is a crucial factor in the NLU accuracy of your skill.
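
To make the structured JSON request more concrete, here is a minimal Python sketch of what the backend might receive for the utterance above and how the intent and slot values could be read from it. The payload is trimmed to the fields discussed here, the date is illustrative (AMAZON.DATE-style slots deliver "today" as a calendar date), and `extract_intent_and_slots` is a hypothetical helper name, not part of any SDK.

```python
# Trimmed, illustrative IntentRequest envelope for
# "my horoscope for Virgo today" (values are assumptions for this example).
sample_request = {
    "version": "1.0",
    "request": {
        "type": "IntentRequest",
        "locale": "en-US",
        "intent": {
            "name": "GetHoroscope",
            "slots": {
                "sign": {"name": "sign", "value": "virgo"},
                # A date slot resolves "today" to a date string.
                "time": {"name": "time", "value": "2020-01-16"},
            },
        },
    },
}


def extract_intent_and_slots(envelope):
    """Return (intent_name, {slot_name: value}) for an IntentRequest.

    Hypothetical helper: returns (None, {}) for any other request type.
    """
    request = envelope.get("request", {})
    if request.get("type") != "IntentRequest":
        return None, {}
    intent = request.get("intent", {})
    slots = {
        name: slot.get("value")
        for name, slot in (intent.get("slots") or {}).items()
    }
    return intent.get("name"), slots


intent_name, slot_values = extract_intent_and_slots(sample_request)
print(intent_name, slot_values)
```

In a real skill you would typically let the Alexa Skills Kit SDK do this parsing; the sketch only shows which parts of the request carry the NLU result.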


How to Improve the NLU Accuracy of Your Skills

An NLU accuracy error occurs when a customer invokes a skill with a specific request and the skill provides an inappropriate response or isn’t invoked at all. Here are a few things, within your control as a developer, that can reduce your skill’s NLU error rate and ensure a smooth customer experience:

1. Avoid Common Invocation Name Issues

The choice of invocation name is important for your skill from a voice recognition and discoverability point of view. Here are a few invocation name issues that you can easily avoid:

  • Apostrophes and Periods: If your skill is named USA facts, for best recognition, the correct invocation name would be ‘u. s. a. facts’ (with a period and a single space after each letter of the acronym). Also, apostrophes should only be used for the possessive form of nouns in invocation names (e.g. ‘sam’s horoscope’).
  • No Words Without Spaces: Words run together without spaces (e.g. horoscopereader) shouldn’t be used in the invocation name. This type of invocation name will most likely cause recognition issues.
  • Write Numbers as Words: If your invocation name contains numbers, they must be written in words. For a skill named ‘1984 Quotes’, the invocation name should be ‘nineteen eighty four quotes’.
  • Avoid Phonetic Spellings: Spelling out sounds phonetically is not recommended in an invocation name. For example, you will most likely face voice recognition issues if the invocation name is defined as ‘you ess a facts’ instead of ‘u. s. a. facts’.
  • Brand Names and Made-Up Words: It’s recommended that you thoroughly test invocation names that are brand names or made-up/non-vocabulary words, prior to submitting them for certification, as the recognition accuracy levels for them may be low. As far as possible, prefer invocation names that are part of standard vocabulary.
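
Some of the checks above are mechanical enough to automate before you submit a skill. The following Python sketch is a rough, hypothetical linter (`check_invocation_name` is not an Alexa tool) that covers only the digit, period, apostrophe, and unspaced-acronym rules from this list. It cannot detect run-together words, phonetic spellings, or made-up words, which still need manual testing.

```python
import re


def check_invocation_name(name):
    """Return a list of warnings for a candidate invocation name.

    Hypothetical helper covering only the rules listed above.
    """
    issues = []
    name = name.replace("\u2019", "'")  # treat curly apostrophes like straight ones
    if re.search(r"\d", name):
        issues.append("numbers should be written as words")
    for token in name.split():
        # Periods are expected only after single acronym letters: "u. s. a."
        if "." in token and not re.fullmatch(r"[A-Za-z]\.", token):
            issues.append(f"{token!r}: use a period and a space after each acronym letter")
        # Apostrophes are expected only in possessives: "sam's"
        if "'" in token and not re.fullmatch(r"[A-Za-z]+'s", token):
            issues.append(f"{token!r}: apostrophes are only for possessive nouns")
        # An all-caps token such as "USA" is probably an unspaced acronym.
        if len(token) > 1 and token.isalpha() and token.isupper():
            issues.append(f"{token!r}: spell acronyms as separate letters with periods")
    return issues


for candidate in ["u. s. a. facts", "USA facts", "1984 quotes", "sam's horoscope"]:
    print(candidate, "->", check_invocation_name(candidate))
```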

2. Include a Comprehensive Set of Sample Utterances Resembling Real World Interactions

The success of your skill in responding to customer utterances is directly related to how closely your sample utterances resemble real world customer interactions. Work backwards from every intent or functionality within your skill to think about all the possible ways in which customers could pose questions/requests that will be answered by those intents. Here are some of the best practices for building a robust interaction model:

  • Use a Thesaurus: Use a thesaurus to find synonyms for words used in sample utterances and ensure that the likely variations of those words are included.
  • Add as many carrier phrases as possible: A carrier phrase is the entire content of an utterance, except slots. Examples are "search for", "please" or "find out". Try to provide as much context around slots as possible by including a variety of sample utterances and carrier phrases (e.g. ‘what is’, ‘please provide’, etc.). This illustrates the usage of slots in many different ways, improving your chances of building a well-rounded interaction model.
  • Include all possible utterance variations: Look at all the possible straightforward as well as colloquial ways in which the same sentence could be spoken by customers. Include all possible variations of the sentences, contractions (what’s, they’ve etc.) and shortened informal sentences.
  • Use dedicated Intents for slots: Slots in isolation are likely to have higher recognition accuracy if they have their own dedicated intents. Try to build your interaction model with isolated slots mapping to their own unique intents, instead of multiple slots mapping to the same intent.
  • Avoid overlap in utterances/slots: Ensure there is no overlap between utterances mapped to different intents, i.e. avoid the same utterance being mapped to multiple intents. Also, avoid assigning the same slot value to different slots. Use the Utterance Conflict Detection API to detect and remove any such overlaps.
  • Resolve all spelling, grammar, punctuation errors: Just like in any interface, in a VUI (voice user interface), error-free text in the interaction model with grammatically accurate phrases/sentences is absolutely necessary. It leads to a better customer experience and superior recognition.
  • Relevant Utterances and Slot Values: Ensure only relevant slot values and sample phrases are present with respect to slots and intents that you've used. Irrelevant content is going to throw off your skill’s utterance recognition and will make it error-prone. Examine and remove all irrelevant content from all intents.
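
As a quick local sanity check for the overlap rule, you could scan your own sample utterances for duplicates across intents before building the model. The Python sketch below is an assumption-laden illustration: it is not the Utterance Conflict Detection API, it only catches exact duplicates after normalizing case and whitespace, and the `GetCompatibility` intent is invented for the example.

```python
from collections import defaultdict


def normalize(utterance):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(utterance.lower().split())


def find_overlaps(intents):
    """Map each sample utterance that appears under more than one intent
    to the sorted list of intents that contain it."""
    seen = defaultdict(set)
    for intent in intents:
        for sample in intent.get("samples", []):
            seen[normalize(sample)].add(intent["name"])
    return {u: sorted(names) for u, names in seen.items() if len(names) > 1}


# Hypothetical intents for the Horoscope Reader example.
intents = [
    {"name": "GetHoroscope",
     "samples": ["what is my horoscope", "tell me my horoscope for {sign}"]},
    {"name": "GetCompatibility",
     "samples": ["What is my  horoscope", "is {sign} compatible with {otherSign}"]},
]

overlaps = find_overlaps(intents)
print(overlaps)
```

A check like this will miss near-duplicates and conflicts that arise from slot values, so it complements the conflict detection tooling rather than replacing it.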

3. Use Custom or Built-In Slots Wherever Relevant

If your utterances contain different words belonging to the same category (for example, Virgo, Aries etc. belonging to the ‘Zodiac’ category in our example skill), it helps to add custom slots wherever applicable in your sample utterances. Also, Amazon provides a wide range of built-in slots describing numbers, dates, times, phrases and lists of items. These cover some of the most common use cases for skills. If you use any of these built-in slot types, you do not need to provide slot values or sample utterances for them, as they are pre-built and provided. Both slot types (custom and built-in) automatically reduce the number of utterances you need to provide for your skill.

For example, in our ‘Horoscope Reader’ skill, it makes sense to add a custom slot for zodiac signs, ‘{sign}’, with slot values covering all 12 signs, and a built-in AMAZON.DATE slot in your utterances:

Alexa, Ask Horoscope Reader my horoscope for {sign} {AMAZON.DATE}.

Here, the AMAZON.DATE slot has built-in values like “today”, “yesterday”, “tomorrow”, or “august”, “july” that it can convert into a date format.
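
For illustration, here is roughly how the custom and built-in slots could be declared in the skill's interaction model, written as a Python dictionary that mirrors the model's JSON. The slot type name `ZODIAC_SIGN` and the sample utterances are assumptions for this example. Note that sample utterances refer to slots by name, so the date slot appears as `{time}` (the slot name used in the walkthrough earlier) with type AMAZON.DATE.

```python
import json

ZODIAC_SIGNS = [
    "aries", "taurus", "gemini", "cancer", "leo", "virgo",
    "libra", "scorpio", "sagittarius", "capricorn", "aquarius", "pisces",
]

# Sketch of an interaction model; type name and samples are illustrative.
interaction_model = {
    "interactionModel": {
        "languageModel": {
            "invocationName": "horoscope reader",
            "intents": [
                {
                    "name": "GetHoroscope",
                    "slots": [
                        {"name": "sign", "type": "ZODIAC_SIGN"},  # custom slot type
                        {"name": "time", "type": "AMAZON.DATE"},  # built-in slot type
                    ],
                    "samples": [
                        "my horoscope for {sign} {time}",
                        "what is the horoscope for {sign} {time}",
                        "tell me the {sign} horoscope",
                    ],
                }
            ],
            # Only the custom type needs values; built-in types ship with theirs.
            "types": [
                {
                    "name": "ZODIAC_SIGN",
                    "values": [{"name": {"value": sign}} for sign in ZODIAC_SIGNS],
                }
            ],
        }
    }
}

print(json.dumps(interaction_model["interactionModel"]["languageModel"]["intents"], indent=2))
```

Three sample utterances with two slots stand in for what would otherwise be dozens of hand-written sign and date combinations.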

4. Use Entity Resolution to Eliminate Slot Value Redundancy

The entity resolution feature can help improve the NLU accuracy of customer utterances that have slot values with the same meaning, which do not need to be handled differently. This is achieved by defining synonyms for those slot values. It addresses the redundancy in your slot values and makes it easy for synonyms to be handled by the same code.

For example, if your skill has a slot named {weather} with possible slot values ‘storm’, ‘hurricane’, ‘gale’, ‘squall’ etc., and you have similar code responses for all of them, you can use entity resolution by defining ‘storm’ as the canonical slot value, ‘hurricane’, ‘gale’ and ‘squall’ as synonyms, and ‘storm’ as the unique ID.

Once you’ve set this up, if a user says ‘gale’ in the utterance, your skill back-end will receive ‘gale’ as the customer utterance, ‘storm’ as the canonical value and ‘storm’ as the unique ID. No matter which synonym appears in the utterance, your skill can now take the unique ID and respond according to that slot value, thus reducing the redundancy in code that would otherwise be introduced by slot values with the same connotation. Refer to our detailed guide for more.
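
Here is a minimal Python sketch of reading those three values from the slot object your back-end receives. The slot payload is a trimmed, illustrative example (the authority string in particular is a placeholder), and `resolve_slot` is a hypothetical helper name.

```python
# Trimmed, illustrative slot object for the spoken value "gale".
gale_slot = {
    "name": "weather",
    "value": "gale",  # what the customer actually said
    "resolutions": {
        "resolutionsPerAuthority": [
            {
                "authority": "placeholder-authority-for-the-weather-slot-type",
                "status": {"code": "ER_SUCCESS_MATCH"},
                "values": [{"value": {"name": "storm", "id": "storm"}}],
            }
        ]
    },
}


def resolve_slot(slot):
    """Return (spoken_value, canonical_name, canonical_id).

    The canonical parts are None when entity resolution found no match.
    """
    spoken = slot.get("value")
    authorities = (slot.get("resolutions") or {}).get("resolutionsPerAuthority", [])
    for authority in authorities:
        if authority.get("status", {}).get("code") == "ER_SUCCESS_MATCH":
            match = authority["values"][0]["value"]
            return spoken, match["name"], match["id"]
    return spoken, None, None


print(resolve_slot(gale_slot))
```

Branching on the returned ID rather than on the spoken value is what lets one code path serve every synonym.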

5. Use the Utterance Profiler to Test Intent Mapping Accuracy  

A simple way of improving your skill’s intent mapping accuracy is to study the utterance profiler data (accessible through the Build page in the developer console, by clicking the utterance profiler button in the upper-right corner). It lets you check whether your interaction model is resolving to the right intents and slots, even before building your skill’s backend code. For the utterances that are not resolving to the right intents or slots, you can go back and iteratively update the interaction model until it is resolving correctly. You can see an example of utterance profiler use for a skill in the screenshot below.

Utterance Profiler

6. Use the NLU Evaluation Tool  

The NLU evaluation tool provides a scalable technique for batch testing the NLU accuracy of your interaction model. Instead of testing each utterance manually on the developer console, the tool lets you create a complete set of utterances mapped to expected intents and slots, known as the annotation set, and automate the batch testing of your skill’s interaction model using it. Test results are marked as passed or failed depending on whether they resolved to the right intents and slots. This automates your testing process and makes regression testing possible. For more details, please refer to this guide. As depicted in the screenshot below, you can access the NLU evaluation tool under the 'Build' tab on the developer console.

NLU Evaluation Tool
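
To illustrate the idea of an annotation set and pass/fail results, here is a small local Python sketch. It does not call the NLU evaluation tool or reproduce its file format: the field names are assumptions, and `toy_resolver` is a deliberately naive keyword matcher standing in for the real interaction model.

```python
def evaluate(annotation_set, resolve):
    """Run every annotated utterance through `resolve` and record pass/fail."""
    results = []
    for case in annotation_set:
        intent, slots = resolve(case["utterance"])
        passed = intent == case["intent"] and slots == case.get("slots", {})
        results.append({"utterance": case["utterance"], "passed": passed})
    return results


def toy_resolver(utterance):
    """Stand-in for the NLU model: a naive keyword matcher (assumption)."""
    signs = {"virgo", "aries"}
    words = utterance.lower().split()
    found = [word for word in words if word in signs]
    if "horoscope" in words and found:
        return "GetHoroscope", {"sign": found[0]}
    return "AMAZON.FallbackIntent", {}


# Hypothetical annotation set: utterances with expected intents and slots.
annotation_set = [
    {"utterance": "my horoscope for virgo", "intent": "GetHoroscope", "slots": {"sign": "virgo"}},
    {"utterance": "what is the aries horoscope", "intent": "GetHoroscope", "slots": {"sign": "aries"}},
    {"utterance": "how is aries doing", "intent": "GetHoroscope", "slots": {"sign": "aries"}},
]

results = evaluate(annotation_set, toy_resolver)
for result in results:
    print("PASS" if result["passed"] else "FAIL", result["utterance"])
```

Keeping the annotation set under version control and rerunning it after each model change is what turns it into a regression test.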

7. Review Intent History

The intent history feature helps you improve resolution accuracy for your live or in-development skills by letting you work backwards from actual customer interaction data. It provides anonymized and aggregated customer utterance and slot data, along with the confidence levels (High, Medium, Low) with which the current run-time interaction model resolves them to your skill’s intents and slots. The tool displays daily data for a skill locale only if it has at least 10 unique users for that day, and it shows a select sample rather than all utterances.

By studying this data, you can check the confidence level with which each utterance is resolved to an intent. You may either take action to change its mapping to a different intent/slot or retain the existing one. If you see frequently-used user utterance requests in the data that are currently missing in your interaction model, add them to improve accuracy. You may also identify carrier and common phrase patterns that are currently not included in your interaction model and update them.

For example, let’s say you open the intent history tab for our skill 'Horoscope Reader' and it shows “talk about my horoscope” as a frequent utterance that is currently resolved to “Fallback Intent” or is not resolved to any intent. This indicates that the phrase is not currently triggering the launch request, so it maps to the Fallback intent and the skill does not work in this case. To fix this, you would add the phrase as a sample utterance for the “LaunchRequest” intent of your skill. The intent history feature is accessible on the left-hand side of the developer console under the “Build” tab, as depicted in the screenshot below. Please refer to our detailed guide for more details.

Intent History

8. Use Fallback Intent to Handle Out-of-Domain Requests

Despite building a robust interaction model that could cover most scenarios, at times, customers may express utterances that are out-of-domain i.e. utterances that aren't mapped to any of your intents. In such cases, your skill should still be able to gracefully handle the request and gently redirect your customers with a message that conveys what the skill can do for them and set the right expectations. For this exact purpose, we have provided the Fallback Intent that can help you take care of these unmapped utterances. Here is an example:

User: 'Alexa, open Horoscope Reader'

Alexa: 'Welcome to Horoscope Reader. What is your Sun Sign?'

User: 'What is Robert De Niro’s Sun sign?'

(This utterance isn’t mapped to any of the Horoscope Reader intents. Since Alexa cannot map this utterance to any intent, AMAZON.FallbackIntent is triggered.)

Alexa: 'The Horoscope Reader skill can't help with that, but I can tell your daily horoscope. What is your Sun sign?'

(The response gently nudges the customer to ask questions within the skill's domain.) 

For more details, please refer to our detailed guide.
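
The exchange above could be handled along the following lines. This is a minimal Python sketch that works on the raw request and response JSON rather than the Alexa Skills Kit SDK; `build_response` and `handle_request` are hypothetical helper names, and handlers for the skill's other intents are omitted.

```python
FALLBACK_SPEECH = (
    "The Horoscope Reader skill can't help with that, "
    "but I can tell your daily horoscope. What is your Sun sign?"
)


def build_response(speech, reprompt):
    """Build a plain-text response that keeps the session open."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "reprompt": {"outputSpeech": {"type": "PlainText", "text": reprompt}},
            "shouldEndSession": False,
        },
    }


def handle_request(envelope):
    """Route AMAZON.FallbackIntent to a gentle redirect message."""
    request = envelope.get("request", {})
    intent_name = request.get("intent", {}).get("name")
    if request.get("type") == "IntentRequest" and intent_name == "AMAZON.FallbackIntent":
        return build_response(FALLBACK_SPEECH, "What is your Sun sign?")
    return None  # other handlers omitted in this sketch


fallback_request = {
    "request": {"type": "IntentRequest", "intent": {"name": "AMAZON.FallbackIntent"}}
}
print(handle_request(fallback_request)["response"]["outputSpeech"]["text"])
```

Keeping the session open and ending on a question gives the customer an immediate way back into the skill's domain.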

9. Use Dynamic Entities to Improve Slot Recognition

Your skill may have slots that are highly dynamic in nature ('food item names' for example). With a static slot value list, voice recognition of slots with dynamic values could be poor. In such cases, you may use Dynamic Entities to modify or replace slot values at run time, to offer a personalized experience for customers. The use of dynamic entities substantially improves speech recognition by dynamically biasing the skill’s interaction model towards newly added slot values at run time. For example, you might be building a restaurant skill that lets customers order items. Dynamic entities let customers order the 'daily specials' by passing the current daily special slot values at run time, even though they may not have been entered in the pre-built ‘static’ model. For skills that use device location, like a hyper-local food ordering skill, different slot values for restaurant names would be served at run time, based on the device location provided. For implementation details, please refer to our detailed guide.
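
As a sketch of the daily-specials case, the Python snippet below builds a `Dialog.UpdateDynamicEntities` directive that your skill could include in the `directives` list of a response. The slot type name `MenuItem`, the item IDs, and the menu entries are assumptions made up for this example, and `build_dynamic_entities_directive` is a hypothetical helper.

```python
def build_dynamic_entities_directive(slot_type, items):
    """Build a directive that replaces the dynamic values of `slot_type`.

    `items` is a list of (id, name, synonyms) tuples.
    """
    return {
        "type": "Dialog.UpdateDynamicEntities",
        "updateBehavior": "REPLACE",
        "types": [
            {
                "name": slot_type,
                "values": [
                    {"id": item_id, "name": {"value": name, "synonyms": synonyms}}
                    for item_id, name, synonyms in items
                ],
            }
        ],
    }


# Hypothetical daily specials, e.g. loaded from the restaurant's menu service.
daily_specials = [
    ("special_1", "mushroom risotto", ["risotto"]),
    ("special_2", "pumpkin soup", ["soup of the day"]),
]

directive = build_dynamic_entities_directive("MenuItem", daily_specials)

response = {
    "version": "1.0",
    "response": {
        "outputSpeech": {"type": "PlainText", "text": "What would you like to order?"},
        "directives": [directive],
        "shouldEndSession": False,
    },
}
print(directive["types"][0]["values"][0])
```

Sending the directive early in the session, for example with the welcome response, means the new values are already in place when the customer names an item.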


To sum up, continuously testing with real users, studying the skill response data through the tools described above, and iteratively updating the interaction model to improve resolution accuracy is the surefire way of addressing NLU issues. The key lies in getting the interaction model right. We hope we have 'invoked' your curiosity enough to motivate your 'intent' of exploring this important topic further. For questions, you may reach out to me on Twitter at @omkarphatak.
