Alexa Skills Playbook

How to Build a High Quality, Engaging Alexa Skill

Scale with your growth

One of the first things to consider is whether to host your skill in the cloud.

When you create an Alexa skill, you can store your code and resources yourself. However, if you create an Alexa-hosted skill, then Alexa stores your code and resources on Amazon Web Services (AWS) for you.

An Alexa-hosted skill comes with several other benefits:

  • Quick start: you can use one of the skill templates available in the developer console.
  • Online code editor: you get access to an online code editor enabling you to build your skill end-to-end on the developer console.
  • CLI and VS Code toolkit: you can use the Alexa Skills Kit Command Line Interface (ASK CLI) or the Alexa Skills Toolkit (ASK Toolkit) for Visual Studio Code to create and manage Alexa-hosted skills.
  • Low latency: Alexa-hosted skills have replicated skill endpoints in all Alexa service regions by default, ensuring the lowest possible latency for your customers.
  • Rapid experimentation: you also get access to deployment pipelines that deploy to all Alexa regions in under 15 seconds, enabling rapid experimentation.


If you choose to build an Alexa-hosted skill, you ensure that it scales as your usage grows.

At Alexa Live 2021, we announced the removal of the AWS Free Tier limits for skills on the service. You can learn about the extended free usage limits here.

Learn more about how to set up an Alexa-hosted skill.

Build for screens

More and more customers are choosing Alexa-enabled devices with a screen — in fact, Amazon Echo devices with visual experiences are the top-selling devices — and customers are expecting skills to include multimodal experiences. Customers want immersive experiences that use rich sounds and visuals to convey information, and to use both voice and touch for input. The Alexa Presentation Language (APL) is the technology that enables you to build these experiences.


With APL, you can create visual experiences — including animations, graphics, images, slideshows, and video — to integrate with your skill content. Customers can see and interact with these visual experiences on supported devices such as the Echo Show, Fire TV, some Fire tablets, and others.
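
Here’s a minimal sketch, using the ASK SDK for Python, of a launch-request handler that renders a simple APL document on devices with a screen. The document itself and the token name are illustrative, not a fixed convention:

    # A sketch of returning an APL document from a skill handler, using the
    # ASK SDK for Python. The APL document and token name are illustrative.
    from ask_sdk_core.dispatch_components import AbstractRequestHandler
    from ask_sdk_core.handler_input import HandlerInput
    from ask_sdk_core.utils import is_request_type, get_supported_interfaces
    from ask_sdk_model import Response
    from ask_sdk_model.interfaces.alexa.presentation.apl import (
        RenderDocumentDirective)

    # A hand-written APL document: one full-screen, centered text block.
    WELCOME_DOCUMENT = {
        "type": "APL",
        "version": "1.8",
        "mainTemplate": {
            "items": [{
                "type": "Text",
                "text": "Welcome!",
                "textAlign": "center",
                "textAlignVertical": "center",
                "width": "100vw",
                "height": "100vh",
            }]
        },
    }

    class LaunchRequestHandler(AbstractRequestHandler):
        def can_handle(self, handler_input: HandlerInput) -> bool:
            return is_request_type("LaunchRequest")(handler_input)

        def handle(self, handler_input: HandlerInput) -> Response:
            builder = handler_input.response_builder
            builder.speak("Welcome!")
            # Only attach the directive on devices that actually have a screen.
            if get_supported_interfaces(handler_input).alexa_presentation_apl:
                builder.add_directive(RenderDocumentDirective(
                    token="welcomeToken",
                    document=WELCOME_DOCUMENT))
            return builder.response

Note that you also need to enable the Alexa Presentation Language interface for your skill (under Interfaces in the developer console) before sending APL directives.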


In this video, Steven Arkonovich, Alexa Champion and creator of the Big Sky skill, explains how he implemented APL to build a visually enticing experience for his weather skill:

Visual cues can complement your voice skill in many ways. This is one more opportunity to be creative.

Resources:
  • APL in the technical documentation
  • Sample code

Ensure accuracy and quality of your skill

So what makes a “quality” skill?

A quality skill understands the intention behind a customer utterance and provides a satisfactory response. Under the hood, automatic speech recognition (ASR) understands what customers are saying, while natural language understanding (NLU) maps customer utterances to the right intents and slots.

Even when both the ASR and NLU components of a skill work perfectly, customers might still face friction. For instance, a skill might not support a particular request, might not handle variations in how a request is phrased, or users might simply be unfamiliar with it.
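
A common way to soften this friction is a fallback handler that catches unmatched requests and coaches the customer toward supported phrasing. Here’s a minimal sketch, using the ASK SDK for Python and the built-in AMAZON.FallbackIntent (the prompt wording is illustrative):

    # A sketch of a fallback handler using the ASK SDK for Python and the
    # built-in AMAZON.FallbackIntent; the prompt wording is illustrative.
    from ask_sdk_core.dispatch_components import AbstractRequestHandler
    from ask_sdk_core.handler_input import HandlerInput
    from ask_sdk_core.utils import is_intent_name
    from ask_sdk_model import Response

    class FallbackIntentHandler(AbstractRequestHandler):
        """Catches in-skill utterances that didn't match any other intent."""
        def can_handle(self, handler_input: HandlerInput) -> bool:
            return is_intent_name("AMAZON.FallbackIntent")(handler_input)

        def handle(self, handler_input: HandlerInput) -> Response:
            # Offering a concrete example helps unfamiliar users recover.
            speech = ("Sorry, I can't help with that yet. "
                      "Try asking for today's forecast, for example.")
            return (handler_input.response_builder
                    .speak(speech)
                    .ask("What would you like to do?")
                    .response)

Pairing the fallback response with one or two concrete sample requests gives unfamiliar users a path forward instead of a dead end.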

We have tools to help you close these feedback loops: tools to build, detect, diagnose, and fix skill quality issues.

Anticipating variations in customer interaction patterns can be challenging, especially for new skills. Oftentimes we hear developers say, “I don’t know if my skill has enough sample utterances to provide a voice user interface (VUI) for the skill,” or ask, “When do I stop adding sample utterances to my skill?”

To address this problem, we’ve launched AI-SURE (AI-based Sample Utterance Recommendation Engine), a self-serve recommendation tool that uses machine learning techniques to generate variations of the sample utterances provided by developers. The tool helps developers quickly bootstrap new skills with improved VUI coverage, and helps existing skills identify missing variations of sample utterances.


With Alexa Entities, you no longer have to source, acquire, or manage your own catalogues of general knowledge information for use in your skills.

Alexa Entities makes Built-in Slot Types more useful by linking entities to Alexa’s Knowledge Graph. Knowledge is automatically updated as Alexa learns new facts — when a new movie is released, for example — reducing the effort it takes to keep skills up to date. As a result, you can build more engaging and intelligent experiences for customers.

Connections between entities can be used to create more natural dialogues, like so:

“Alexa, add Alias Grace to my reading list.”
“Got it. Have you thought about reading The Testaments, also written by Margaret Atwood in 2019?”

You can also use Knowledge to help customers disambiguate between similarly named entities: “Did you mean Anne Hathaway, born in 1556, or the actress in films such as Les Misérables?” Or you can simply make skills more fun and engaging: “Did you know that tortoises can live to be 120 years old!”

Slot values in your custom intents are automatically resolved to Alexa Entity IDs for supported Built-in Slot Types. These IDs can be used to make additional calls to fetch data via our Linked Data API from within your skill’s code.

For example, suppose you have defined a custom intent CelebrityBirthdayIntent with a sample utterance “When was {celebrity} born?”, where the slot {celebrity} is assigned to the Built-in Slot Type AMAZON.Person. When a customer asks your skill “When was Beyoncé born?”, you will receive the Alexa Entity ID corresponding to Beyoncé in your intent request. This ID can then be used to fetch additional knowledge (Beyoncé’s birthday).
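
Sketched in code with the ASK SDK for Python, the flow looks like this. The sketch uses the requests library and the apiAccessToken that arrives with every request; the “AlexaEntities” authority check and the birthdate property lookup are simplified, so check the Alexa Entities documentation for the exact names and response shapes:

    # A sketch of resolving a slot to an Alexa Entity ID and fetching
    # linked data; the authority name and property lookup are simplified.
    import requests

    from ask_sdk_core.dispatch_components import AbstractRequestHandler
    from ask_sdk_core.handler_input import HandlerInput
    from ask_sdk_core.utils import is_intent_name
    from ask_sdk_model import Response

    def get_entity_id(handler_input, slot_name):
        """Return the Alexa Entity ID (a URL) resolved for a slot, if any."""
        slots = handler_input.request_envelope.request.intent.slots or {}
        slot = slots.get(slot_name)
        if slot is None or not slot.resolutions:
            return None
        for authority in slot.resolutions.resolutions_per_authority or []:
            # Alexa Entities resolutions come from the AlexaEntities
            # authority and carry a URL-shaped ID into the knowledge graph.
            if "AlexaEntities" in authority.authority and authority.values:
                return authority.values[0].value.id
        return None

    class CelebrityBirthdayIntentHandler(AbstractRequestHandler):
        def can_handle(self, handler_input: HandlerInput) -> bool:
            return is_intent_name("CelebrityBirthdayIntent")(handler_input)

        def handle(self, handler_input: HandlerInput) -> Response:
            entity_url = get_entity_id(handler_input, "celebrity")
            if entity_url is None:
                return (handler_input.response_builder
                        .speak("Sorry, I don't know that person.")
                        .response)
            # The Linked Data API is called with the request's apiAccessToken.
            system = handler_input.request_envelope.context.system
            locale = handler_input.request_envelope.request.locale
            data = requests.get(
                entity_url,
                headers={"Authorization": "Bearer " + system.api_access_token,
                         "Accept-Language": locale}).json()
            # "birthdate" is a simplified property lookup; see the docs for
            # the exact JSON-LD shape returned for Person entities.
            birthdate = data.get("birthdate", {}).get("@value", "unknown")
            return (handler_input.response_builder
                    .speak("They were born on {}.".format(birthdate))
                    .response)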


Furthermore, the new Custom Pronunciation feature allows you to ingest your own pronunciations into Alexa to improve speech recognition of rare or custom words (character names, game titles, pharmaceutical drugs, etc.) that are unique to your skill. Before this feature, developers had to build additional wrappers or add incorrect spellings for entities to mitigate speech recognition issues.

You can identify common failure cases through the quality dashboards. After you’ve done that, we recommend using the Evaluation tools to continuously monitor ASR and NLU performance.

The ASR Evaluation tool allows you to evaluate the ASR performance of your model. We’ve made it even simpler by launching 1-click testing frameworks that automatically create NLU and ASR test sets for batch testing. These testing frameworks learn from your interaction model, from frequent utterances to your skill, and from high-friction patterns identified in the quality dashboards.


1-click testing reduces the time it takes to get started by over 50% and includes a wide range of utterance variations with 100% intent coverage. In other words, it makes your testing more comprehensive and diverse.

To continuously iterate on skill models, it’s pivotal to keep track of flawed interaction patterns. Typically, skill builders use skill ratings and reviews as a proxy to learn about incorrect skill behavior. To give you a more direct view, our quality dashboard consists of Skill Performance and Customer Friction metrics.

Skill Performance provides an end-to-end view of accuracy for eligible skills, with Estimated System Error, insights into Intent Confidence, and endpoint health (Endpoint Latency, Endpoint Response).

If you have a high-traffic skill, you can further review the Customer Friction dashboard. It includes a Friction Summary that indicates overall customer friction.

Alexa uses various implicit and explicit customer signals to identify friction utterances, similar to how humans interpret an interaction. For instance, explicit signals include:

  • the customer interrupted your music stream (“Alexa, play the song not the movie”);
  • the customer left the skill prematurely (“Alexa, stop it”);
  • your skill wasn’t able to respond to a customer request (“Sorry, I don’t know that”);
  • the customer had to repeat their request (“Alexa, again, what’s the weather in Seattle?”).

Each dashboard includes a frequent utterance table that points to specific utterance patterns resulting in high friction.
Learn more about these new skill metrics here.

Check out the Developer console to ensure your skill is high quality.

Create new pathways for customers to discover your skill, naturally

Sometimes, customers don’t remember a skill’s name. The Name-Free Interaction (NFI) Toolkit is a self-service method that makes it easier for customers to open your skill without having to remember and say its name. When you add NFI launch phrases and intents to your skill, Alexa learns how to dynamically route customers’ name-free utterances to your skill, so customers can launch your skill and its intents without knowing its name. Engagement is different for every skill, but after implementing the NFI toolkit, some of the skills participating in the preview have seen significant increases in dialogs. For example:

  • Music & Audio: “Alexa, play campfire sounds
  • Games & Trivia: “Alexa, countdown to 20,” while playing games or working out
  • Education & Reference: “Alexa, how do I pronounce a e r o
  • Lifestyle: “Alexa, what’s my horoscope?

Benefits for Customers

  • Improves personalization and increases chances of repeat engagement with name-free requests.
  • Reduces friction for customers who can’t remember how to invoke a specific skill or who use incorrect invocation phrases.
  • Makes skill discovery more natural for users.

Benefits for Developers

  • Increases chances of repeat engagement with name-free requests.
  • Increases organic traffic to skills and improves revenue potential.
  • Creates relevant intents that directly serve the customers’ needs and requests.

If you want to add NFI and unlock new ways for customers to interact with your skill name-free, follow these steps to get started:

  • Navigate to the eligible skill in the Alexa Developer Console.
  • From Invocations in the left-hand menu, open Skill launch phrases and Intent launch phrases.
  • Add additional phrases that your customer might use to invoke your skill.

To learn more about implementing this functionality, check out the NFI step-by-step guide and technical documentation. If you have NFI toolkit questions or issues, please post them to the NFI Developer Forum.