Use LLMs to improve accuracy in your skill's Interaction Model

Key takeaways

You can improve your existing custom skill’s customer experience right now using a text-based large language model (LLM) of your choice. Use an LLM to improve accuracy by generating a robust set of utterances to add to your intents.

 


Introduction & prerequisites

You can use large language models (LLMs) to improve your existing custom skill.

  1. First, you can improve your skill’s accuracy and reduce errors by generating a large set of possible customer utterances to add to your custom intents. Learn more about Intents, Utterances, and Slots, and how they affect your customers’ experiences.
  2. Second, you can quickly add wide variation to your skill’s responses so the content sounds fresh and conversational.

This article will help you use an LLM to improve your skill’s accuracy by generating more utterances to add to the intents in your interaction model (IM), so customers can speak more naturally to your skill and be understood.

Learn how to create your Interaction Model for your skill.

Will built-in intents or slot types satisfy your needs? Check the Built-in Intent Library and the Slot Type Reference.


Generate a large set of utterances for your custom intents

When you create a new custom intent, you provide a name and a list of utterances that customers might say to hear the response you’ve written for that intent. You can start by just writing out the full phrases, and then identify the slots within the phrases later.
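Under the hood, each custom intent in your interaction model JSON is just a name, an optional list of slots, and a list of sample utterances with slots marked in curly braces. Here’s a minimal sketch written as a Python dict mirroring that schema; “AnswerIntent” and the “answer” slot are hypothetical names for the trivia example below:

```python
# Minimal sketch of a custom intent as it appears in the interaction
# model JSON, written here as a Python dict. "AnswerIntent" and the
# "answer" slot are hypothetical names.
answer_intent = {
    "name": "AnswerIntent",
    "slots": [
        {"name": "answer", "type": "AMAZON.NUMBER"},  # built-in slot type
    ],
    "samples": [
        # Start with full phrases; carve out slots like {answer} later.
        "the answer is {answer}",
        "I think it's {answer}",
        "{answer}",
    ],
}
```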

Some of the most common sources of errors and customer frustration stem from the Interaction Model lacking utterances that match customer speech to an intent, because …

  1. Not enough utterances that represent natural customer speech were included; or
  2. The utterances included aren’t the ones customers will actually use (people don’t speak that way); or
  3. Too many utterances, too much generic speech, or utterances that don’t follow best practices were included; or
  4. The same or similar-sounding utterances were included for more than one intent, causing Utterance Conflicts.


These IM “misses” might cause Alexa to tell the customer she can’t fulfill their request or didn’t find what they were looking for when the customer knows the skill should be capable of it, or does, in fact, have that content. That’s a Trustbuster. These misses may also cause Alexa to default to fallback responses and errors that don’t make any sense, or even ruin the experience of a competitive game. How disappointing for a customer to answer a tough trivia question correctly, only to have Alexa tell them their answer is incorrect because the skill didn’t have “<what the customer said>” as an accepted response in the Interaction Model! For example:

 

Avoid

 

Customer: Alexa, ask Seattle Super Trivia to play today’s game.

Alexa: Welcome back to Seattle Super Trivia. Get on your rain boots. Here’s the first question of the day. Seattle gets an average of how much rain per year? Is it A. About 20 inches, B. About 40 inches, or is it C. About 100 inches?

Customer: I think it’s 40.

Alexa: <fail sound fx> Ouch. Not quite. The answer was B. About 40 inches. Let’s see if the rain lets up on the next one.

Customer: Alexa, stop!

Our interaction model for this hypothetical skill didn’t support the customer answering “I think it’s 40.” Remember, customers will speak naturally to answer this question, prefacing answers with phrases such as “I think it’s…,” and will use partial phrases rather than the full phrases your skill (and maybe even your display) used, such as saying simply “40.”

An LLM can help with this problem. You can use it to help you generate additional utterances and better anticipate what customers might say to your skill. You can do this by using a general prompt, or with a prompt that asks for variations on some sample phrases you specify.

If we prompt an LLM with something like the following example, we can get a good starter set of utterances, or catch utterances we may have missed. You probably won’t want to use the whole list; instead, select the ones that sound most natural, and re-generate the list by requesting variations of those:

 

Prompt

What are the different ways a person might state the correct answer to the following trivia question: “Seattle gets an average of how much rain per year? Is it A. About 20 inches, B. About 40 inches, or is it C. About 100 inches?” Give me 100 examples using 10 words or fewer each to choose B.


Selected output

  • About [40] inches
  • Around [40] inches
  • Roughly [40]
  • It’s [40] inches
  • The answer is [40] inches
  • I think the answer’s [40]
  • I think it might be [40]
  • Is it [40]?
  • [B]
  • It’s [B]
  • [B] [40]
  • [B] [40] is the answer
  • [B] [40] I think
  • oh, I bet it’s [B] [40]
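If you’d rather script this step than paste prompts into a chat UI, any text LLM with an API will do. Below is a minimal Python sketch assuming the OpenAI Python client; the model name is only an example, and you’d still hand-review every phrase it returns:

```python
# Minimal sketch: generate candidate utterances with an LLM.
# Assumes the OpenAI Python client ("pip install openai") and an API
# key in the OPENAI_API_KEY environment variable; the model name is
# only an example, and any capable text model works.
from openai import OpenAI

client = OpenAI()

prompt = (
    'What are the different ways a person might state the correct '
    'answer to the following trivia question: "Seattle gets an '
    'average of how much rain per year? Is it A. About 20 inches, '
    'B. About 40 inches, or is it C. About 100 inches?" '
    'Give me 100 examples using 10 words or fewer each to choose B.'
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model; substitute your own
    messages=[{"role": "user", "content": prompt}],
)

# One candidate per line; a human still curates this list by hand.
candidates = [
    line.strip("•- ").strip()
    for line in response.choices[0].message.content.splitlines()
    if line.strip()
]
print(candidates[:10])
```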

We can use a few high-quality phrases we generated above to further prompt for even more variation. For example:


Prompt

Give me 100 variations someone might say on the following statements, using 10 words or fewer. [Paste your list from the above step]


Selected output

  • Possibly 40.
  • Is 40 the answer?
  • B is my guess.
  • It's B, around 40.
  • B around 40.
  • My hunch is B 40.
  • Around 40 inches, maybe?
  • Almost 40 inches.
  • It's close, about 40.
  • I reckon 40 inches.
  • B, it's almost certainly 40.
  • B, 40, I believe.
  • B sounds like 40.

Learn more about Best Practices for Sample Utterances and Custom Slot Type Values so you can select the phrases that will yield further high-quality responses. You might also want to survey or interview multiple people about what they might say, so the examples you include in your LLM prompt start from the most natural-sounding responses.

Keep in mind that when Alexa is listening to the customer, the mic will stay open for their response for eight seconds. You might want to specify in your prompt a maximum number of words, as we did above.
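If you’re scripting these steps, a small cleanup pass can merge the output of both prompting rounds, drop duplicates, and enforce that word cap before you do your human review. A minimal sketch in Python (the 10-word limit simply matches the prompts above):

```python
import string

def clean_utterances(candidates, max_words=10):
    """Normalize, dedupe, and length-filter LLM-generated phrases.

    This is only a pre-filter: a human still reviews every phrase
    before it goes into the interaction model.
    """
    seen, kept = set(), []
    for phrase in candidates:
        # Lowercase and drop punctuation so near-duplicates collapse.
        normalized = phrase.lower().translate(
            str.maketrans("", "", string.punctuation)
        ).strip()
        if not normalized or len(normalized.split()) > max_words:
            continue  # empty, or too long for a quick spoken answer
        if normalized not in seen:
            seen.add(normalized)
            kept.append(normalized)
    return kept

# Example: merge and clean the outputs of both prompting rounds.
print(clean_utterances(["Possibly 40.", "possibly 40", "Is 40 the answer?"]))
# -> ['possibly 40', 'is 40 the answer']
```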


How do you know when you have enough utterances?

Double-check that your variations cover some of these natural speech patterns:

  • Use of pleasantries such as "please tell me…" and "give me the second one please" to make a request
  • Questions about who, what, why, when, where, which, how
  • Questions about “did” “does” “do”, such as “did you …”, “does it …”, and “do they…”
  • Requests or questions about can, could, should, would, may, must, shall, such as “can you …” and “should I…”
  • Phrases that include is, was, will, am, are, were
  • Phrases that include have, has, had
  • Some common variants of the above, such as: what's, whose, didn't, ain't
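If you’d like to check this mechanically, a quick pass over your sample utterances can show which of these patterns aren’t represented yet. A rough sketch (the pattern list is abbreviated; extend it with the variants above):

```python
# Rough coverage check: which common question/request words from the
# checklist above already appear in your sample utterances? The list
# here is abbreviated; extend it with the variants you care about.
PATTERNS = ["please", "who", "what", "why", "when", "where", "which",
            "how", "did", "does", "do", "can", "could", "should",
            "would", "is", "was", "are", "have", "has", "had"]

def missing_patterns(samples):
    words = {word for sample in samples for word in sample.lower().split()}
    return [p for p in PATTERNS if p not in words]

print(missing_patterns(["please tell me about ferns",
                        "do you have tips for watering"]))
# -> every checklist word not yet covered, e.g. 'who', 'what', ...
```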

 

You may want to include some of these patterns in your prompt. For example:


Prompt

What are 100 different ways a person might say the following phrases, using fewer than 10 words: please tell me, tell me about, does it, do they, can you, give me, I want, do you have


Selected output

  • What do you have for …
  • I need to know …
  • Please tell me …
  • What about …
  • Do you know …
  • Show me …
  • Let me know …
  • I wonder …
  • Can you tell me …
  • What’s there for …
  • Tell me about …

We can combine the above outputs with selected phrases and add them to the intents of our interaction model. Let’s say we’re creating a skill to help customers keep track of the health of their houseplants. Some things the customer might say that we’ll want to support, informed by the above activity, might include the following (a sketch of the resulting intent follows the list):

  • What info do you have for Monsteras?
  • I need to know how much water my cactus needs
  • Please tell me if I need to feed my plants yet
  • What about pest control?
  • Do you know anything about aphids?
  • Show me what a healthy snake plant should look like
  • Let me know how to repot a succulent
  • I wonder what the bugs are on my plant
  • Can you tell me why my plant is dying?
  • What’s there for advice on fertilizing?
  • Tell me about spider plants
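Folding the phrases you keep into an intent then looks something like the sketch below, with specifics such as the plant name carved out into a slot. “GetPlantInfoIntent”, the “plant” slot, and the “PlantType” custom slot type are all hypothetical names you would define yourself:

```python
# Hypothetical intent for the houseplant skill. "GetPlantInfoIntent",
# the "plant" slot, and the "PlantType" custom slot type are all
# illustrative names, not built-ins.
get_plant_info_intent = {
    "name": "GetPlantInfoIntent",
    "slots": [{"name": "plant", "type": "PlantType"}],
    "samples": [
        "what info do you have for {plant}",
        "tell me about {plant}",
        "do you know anything about {plant}",
        "I wonder what the bugs are on my {plant}",
    ],
}
```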

Keep in mind, however, that adding vastly more utterances (hundreds to thousands) to your custom skill model isn’t always better. You (a human) must still ensure there aren’t overlapping utterances across intents, and review every utterance the LLM generated before you add it. Finally, once you’ve added your utterances, one intent at a time, test your skill on a device to ensure accuracy and expected outcomes don’t degrade.
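The overlap check, at least, is easy to automate. A minimal sketch that flags the same normalized sample appearing under more than one intent, run against an exported interaction model JSON (the file name is just an example):

```python
import json
from collections import defaultdict

# Flag sample utterances that appear under more than one intent, which
# would cause utterance conflicts. "interaction_model.json" is just an
# example path; point this at your exported interaction model.
with open("interaction_model.json") as f:
    model = json.load(f)

owners = defaultdict(set)
for intent in model["interactionModel"]["languageModel"]["intents"]:
    for sample in intent.get("samples", []):
        owners[sample.strip().lower()].add(intent["name"])

for utterance, intents in sorted(owners.items()):
    if len(intents) > 1:
        print(f"Conflict: {utterance!r} appears in {sorted(intents)}")
```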