Utterances Overview (VSK Fire TV)

Utterances are the phrases users say to Alexa. This page covers general concepts about utterances and localization. For a reference of all utterances by Alexa interface across locales, see the Utterances Reference.

Selecting Utterances by Locale
Locale Groupings
Statistical versus Deterministic Matching
Required Versus Optional Utterances
Implicit and Explicit Contexts

Selecting Utterances by Locale

On the Utterances Reference, you can select the language you want for the utterance examples. Although some locales aren't yet supported, the localized utterances are provided for early development for video skills in those regions. See Supported Countries for details on which locales are supported.

Note that the localized utterances also localize the "entities." Entities refer to the programs, channels, actors, and movies in that region. A "play by sports team" feature, for instance, will have different sports teams in English, German, and Italian:

English: Watch seahawks
German: Erster. FC Heidenheim, Erster FC Heidenheim, Erster FC Kaiserslautern
Italian: Aberdeen, ac milan

This is what it means to localize the entities. Alexa language teams gather a list of entities for each locale that Alexa supports. These entities help Alexa better recognize terms that fit into these entity slots in the utterances.

Locale Groupings

If you're selecting different locales with the locale selector, you'll notice that some locales are grouped together. Even though English has the locales en-US, en-GB, en-IN, en-NZ, and en-IE, the locale selector just says "English (all locales)." In this case, there aren't significant differences in the utterances between these locales, so they're grouped together.

The grouping logic is the same for Spanish. es-ES, es-MX, and es-US don't have notable differences, so they're grouped as "Spanish (all locales)." The same applies to German: de-DE and de-AT are treated the same, so the locale selector groups them as "German (all locales)."

The one exception is French. The utterances for fr-FR (France) and fr-CA (Canada) differ, so these locales are presented as separate options in the locale selector.
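The grouping described above can be pictured as a simple lookup from locale to selector label. The following is only an illustrative sketch: the mapping and the function name `selector_label` are hypothetical, and the labels for the two French entries are placeholders, because the source does not state how the selector words them.

```python
# Hypothetical mapping that mirrors the locale groupings described above.
# The French labels are placeholders; the real selector wording is not given here.
LOCALE_GROUPS = {
    "English (all locales)": ["en-US", "en-GB", "en-IN", "en-NZ", "en-IE"],
    "Spanish (all locales)": ["es-ES", "es-MX", "es-US"],
    "German (all locales)": ["de-DE", "de-AT"],
    "fr-FR": ["fr-FR"],
    "fr-CA": ["fr-CA"],
}


def selector_label(locale: str) -> str:
    """Return the selector entry under which a locale's utterances appear."""
    for label, locales in LOCALE_GROUPS.items():
        if locale in locales:
            return label
    raise KeyError(f"Locale not listed in this sketch: {locale}")


print(selector_label("en-IN"))  # grouped with the other English locales
print(selector_label("fr-CA"))  # listed separately from fr-FR
```

A lookup like this can help when planning tests: locales that share a label share one set of utterance examples, while fr-FR and fr-CA each need their own.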

Also note that even though the locale selector presents all available locales, some locales might not be supported. Consult Supported Countries for a list of which locales are officially supported. For example, es-ES, es-MX, and es-US are not all currently supported: as of July 2020, es-ES is supported, es-MX is in public beta, and es-US is not supported. Even so, the locale selector just says "Spanish (all locales)," because the localized utterances are provided for early development of video skills in those regions.

Some locales also have multiple versions of utterances, while others have just one. This reflects variation in how the utterances were collected across the different locales. To gather the utterances with which Alexa is trained, localization researchers interviewed people from the locale and collected the most common phrases for different scenarios. The most common phrases were then mapped into the natural language understanding for Alexa.

For locales that have multiple ways of communicating the same information, note that there isn't a popularity or most common index. If you can test your app with at least one phrase per locale, that is usually sufficient.

Statistical versus Deterministic Matching

Some utterances use deterministic matching and others use statistical matching. It's not important to know the difference, but essentially deterministic matching is similar to hard-coding phrases: Alexa listens for exact matches of specific phrases, such as "Alexa, stop."

In contrast, statistical matching is fuzzy matching, where Alexa picks out specific recognized terms and tries to guess the user's intent. Given the infinite permutations of phrases in natural conversation, it would be impossible to hard-code every phrase, so this is where natural language algorithms come into play to help Alexa decide on the intent of the utterance.
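To make the contrast concrete, here is a toy sketch of the two styles. It is not how Alexa works internally. The intent names, the keyword sets, and both functions are hypothetical, and simple keyword overlap stands in for the real natural language models.

```python
import re

# Deterministic: a fixed table of exact phrases (hypothetical intent name).
EXACT_PHRASES = {"alexa, stop": "Stop"}

# Statistical stand-in: keywords loosely associated with hypothetical intents.
INTENT_KEYWORDS = {
    "PlayVideo": {"watch", "play", "movie"},
    "ChangeChannel": {"channel", "tune"},
}


def normalize(utterance: str) -> str:
    return utterance.strip().lower()


def match_deterministic(utterance: str):
    """Return an intent only when the whole phrase matches exactly."""
    return EXACT_PHRASES.get(normalize(utterance))


def match_statistical(utterance: str):
    """Guess an intent from recognized terms, whatever the surrounding words."""
    tokens = set(re.findall(r"[a-z']+", normalize(utterance)))
    scores = {intent: len(tokens & words) for intent, words in INTENT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(match_deterministic("Alexa, stop"))         # exact phrase, so it matches
print(match_deterministic("Alexa, please stop"))  # any variation fails
print(match_statistical("I'd like to watch a movie tonight"))
```

The exact-match table breaks as soon as the user adds a word, while the keyword scorer tolerates free phrasing at the cost of occasionally guessing wrong.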

One input to the algorithm is the device. If a user says "Play Rocketman" (which is both a song and a movie), the algorithm takes into consideration whether the user is speaking to an audio speaker (Echo) or a Fire TV and uses the device type to inform the intent.

Tip: "Play" is an ambiguous action, as it can refer to either playing audio or playing a movie. For better results, use "watch" when referring to video content that you want to play.
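The effect of device context and verb choice can be sketched as a small decision function. This is a simplified illustration, not the actual algorithm: the function `resolve_media_type` and the device labels `"echo"` and `"fire_tv"` are hypothetical names chosen for this example.

```python
def resolve_media_type(verb: str, device_type: str) -> str:
    """Toy disambiguation: 'watch' implies video; 'play' depends on the device."""
    verb = verb.lower()
    if verb == "watch":
        # "Watch" is unambiguous, which is why the tip above recommends it.
        return "video"
    if verb == "play":
        # "Play" is ambiguous, so fall back on the device the user is speaking to.
        return "video" if device_type == "fire_tv" else "audio"
    raise ValueError(f"Verb not covered by this sketch: {verb}")


print(resolve_media_type("play", "echo"))      # audio
print(resolve_media_type("play", "fire_tv"))   # video
print(resolve_media_type("watch", "fire_tv"))  # video
```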

Required Versus Optional Utterances

If an utterance is required for certification, the words Required for certification appear below that utterance. If the utterance is optional, the word Optional appears instead.

Note that utterances that are Required for certification are only required if your app supports that functionality (through text input). Also, some utterances are only required if you support those directives. For example, if you don't support ChannelController, you don't need to support utterances related to ChannelController; however, if you do support ChannelController, you will need to support the utterances marked as Required for certification for ChannelController.
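The rule above amounts to a filter: an utterance belongs on your certification checklist only if it is marked Required for certification and its interface is one you support. The sketch below assumes a hypothetical `UtteranceRow` record and made-up sample phrases. Only the ChannelController name comes from the text; RemoteVideoPlayer is used as a second example interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UtteranceRow:
    phrase: str                       # placeholder phrase, not from the reference
    interface: str                    # interface the utterance belongs to
    required_for_certification: bool  # True if marked "Required for certification"


# Made-up rows for illustration only.
SAMPLE_ROWS = [
    UtteranceRow("Change channel to <channel name>", "ChannelController", True),
    UtteranceRow("Next channel", "ChannelController", False),
    UtteranceRow("Watch <title>", "RemoteVideoPlayer", True),
]


def utterances_to_certify(rows, supported_interfaces):
    """Keep rows that are required AND belong to an interface the app supports."""
    return [
        row
        for row in rows
        if row.required_for_certification and row.interface in supported_interfaces
    ]


for row in utterances_to_certify(SAMPLE_ROWS, {"ChannelController"}):
    print(row.phrase)
```

With only ChannelController supported, the required RemoteVideoPlayer row drops out, and the optional ChannelController row is excluded as well.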

For convenience, the tables have a "Complete" column with a check box. If desired, you can print this page out and mark the Complete check box to indicate your support for the requirements.

Implicit and Explicit Contexts

As you test your app's handling of utterances, you need to test both implicit and explicit contexts for each utterance. Explicit utterances include the app name in the utterance, whereas implicit utterances do not. With app-only integrations, if you're working with an app that has never been submitted to the Appstore or to Live App Testing (LAT) (so the catalog isn't recognized by Alexa), explicit utterances won't work. To simulate explicit utterances, you can make your Alexa requests with your app in the foreground. See Implicit versus Explicit Utterances for details.
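When drawing up a test plan, it can help to generate both variants of each phrase side by side. The helper below is a minimal sketch under two assumptions: the explicit form is built by appending "on <app name>" (a pattern assumed here, not confirmed by this page), and the app name "ACME Media" and the function name `build_test_utterances` are hypothetical.

```python
def build_test_utterances(base_phrase: str, app_name: str) -> dict:
    """Return the implicit and explicit variants of one test phrase."""
    return {
        # Implicit: no app name in the utterance.
        "implicit": base_phrase,
        # Explicit: app name included (assumed "on <app name>" pattern).
        "explicit": f"{base_phrase} on {app_name}",
    }


variants = build_test_utterances("Watch Rocketman", "ACME Media")
for context, phrase in variants.items():
    print(f"{context}: {phrase}")
```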