Utterances Overview (VSK Echo Show)
Utterances refer to the phrases users say to Alexa. General concepts about utterances and localization are provided here. For a reference of all utterances by Alexa interface across locales, see the Utterances Reference.
- Selecting Utterances by Locale
- Locale Groupings
- Multiple Utterances in Different Locales
- Statistical versus Deterministic Matching
- Version Support
Selecting Utterances by Locale
With the utterances below, you can select the language you want from the locale selector. Although some locales aren't yet supported, their localized utterances are provided to support early development of video skills in those regions. See Supported Countries for details on which locales are supported.
Note that the localized utterances also localize the "entities." Entities refer to the programs, channels, actors, and movies in that region. A "play by sports team" feature, for instance, will have different sports teams in English, German, and Italian. For example:
Erster. FC Heidenheim, Erster FC Heidenheim, Erster FC Kaiserslautern
Aberdeen, ac milan
This is what it means to localize the entities. Alexa language teams gather a list of entities for each locale that Alexa supports. These entities help Alexa better recognize terms that fit into these entity slots in the utterances.
Locale Groupings
If you're selecting different locales with the locale selector, you'll notice that some locales are grouped together. Even though English has the locales en-US, en-GB, en-IN, en-NZ, and en-IE, the locale selector just says "English (all locales)." In this case, there aren't significant differences in the utterances between these locales, so they're grouped together.
The grouping logic is the same for Spanish: es-ES, es-MX, and es-US don't have notable differences, so they're grouped as "Spanish (all locales)." The same applies to German: de-DE and de-AT are treated the same, so the locale selector groups them as "German (all locales)."
The exception is French. The utterances for fr-FR (France) and fr-CA (Canada) differ, so these locales are presented as separate options in the locale selector.
Also note that even though the locale selector presents all available locales, some locales might not be supported. Consult Supported Countries for a list of which locales are officially supported. For example, es-ES, es-MX, and es-US are not all currently supported: es-ES is supported, es-MX is in public beta, and es-US is not supported (as of July 2020). Even so, the locale selector just says "Spanish (all locales)," because the localized utterances are provided for early development of video skills in those regions.
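The grouping and support status described above can be sketched as a pair of lookup tables. This is only an illustration: the function names are hypothetical, the two French selector labels are assumed (the text only says they are separate options), and the support statuses are the July 2020 values quoted above, so check Supported Countries for current information.

```python
# Locale -> label shown in the locale selector, following the groupings above.
SELECTOR_GROUPS = {
    "en-US": "English (all locales)",
    "en-GB": "English (all locales)",
    "en-IN": "English (all locales)",
    "en-NZ": "English (all locales)",
    "en-IE": "English (all locales)",
    "es-ES": "Spanish (all locales)",
    "es-MX": "Spanish (all locales)",
    "es-US": "Spanish (all locales)",
    "de-DE": "German (all locales)",
    "de-AT": "German (all locales)",
    # Assumed label text: French locales are listed separately, not grouped.
    "fr-FR": "French (France)",
    "fr-CA": "French (Canada)",
}

# Support status as of July 2020, for the Spanish locales named above only.
SUPPORT_STATUS = {
    "es-ES": "supported",
    "es-MX": "public beta",
    "es-US": "not supported",
}


def selector_label(locale):
    """Return the selector option that covers a locale, or None if unlisted."""
    return SELECTOR_GROUPS.get(locale)


def support_status(locale):
    """Return the known status, or a pointer to the authoritative list."""
    return SUPPORT_STATUS.get(locale, "see Supported Countries")


if __name__ == "__main__":
    for loc in ("es-ES", "es-MX", "es-US", "fr-CA"):
        print(loc, "|", selector_label(loc), "|", support_status(loc))
```

The point of keeping the two tables separate is that a locale's selector label and its support status are independent: all three Spanish locales share one label even though their statuses differ.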
Multiple Utterances in Different Locales
Some locales have multiple versions of an utterance, while others have just one. Ideally, only the most popular utterance for each locale would be presented, and we are in the process of reducing the multiple instances to just one or two. For now, you can choose from among multiple utterances (if available) that might work in a locale.
Statistical versus Deterministic Matching
Some utterances use deterministic matching and others use statistical matching. It's not important to know the difference, but essentially deterministic matching is similar to hard-coding phrases. Alexa listens for exact matches for specific phrases, such as "Alexa, stop."
In contrast, statistical matching is fuzzy matching, where Alexa picks out specific recognized terms and tries to guess the user's intent. Given the infinite permutations of phrases in natural conversation, it would be impossible to hard-code every phrase, so this is where natural language algorithms come into play to help Alexa decide on the intent of the utterance.
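To make the contrast concrete, here is a toy sketch of the two approaches. It is not how Alexa's models work: the intent names, the phrase table, and the keyword-overlap scoring are all hypothetical stand-ins, with the scoring used only as a very rough proxy for statistical matching.

```python
# Deterministic matching: an exact phrase maps to exactly one intent.
EXACT_PHRASES = {
    "alexa, stop": "Stop",  # hypothetical intent name
}

# Stand-in for statistical matching: score intents by keyword overlap.
# Real natural language models are far more sophisticated than this.
KEYWORD_HINTS = {
    "PlayVideo": {"play", "watch", "movie"},      # hypothetical
    "SearchVideo": {"find", "search", "show"},    # hypothetical
}


def normalize(utterance):
    """Lowercase and collapse whitespace so comparisons are stable."""
    return " ".join(utterance.lower().split())


def match_deterministic(utterance):
    """Return an intent only if the whole phrase matches exactly."""
    return EXACT_PHRASES.get(normalize(utterance))


def match_statistical(utterance):
    """Guess the intent whose keywords overlap most with the utterance."""
    words = set(normalize(utterance).replace(",", " ").split())
    scores = {intent: len(words & hints) for intent, hints in KEYWORD_HINTS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def match(utterance):
    """Try the exact phrase table first, then fall back to the fuzzy guess."""
    return match_deterministic(utterance) or match_statistical(utterance)


if __name__ == "__main__":
    print(match("Alexa, stop"))
    print(match("Alexa, please play that movie about rockets"))
```

The exact-phrase table cannot grow to cover every possible wording, which is why the fallback scores recognized terms instead of requiring a full match.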
One input to the algorithm is the device the user is speaking to. If a user says "Play Rocketman" (which is both a song and a movie), the algorithm takes into consideration whether the user is speaking to an audio speaker (Echo) or a Fire TV and uses the device type to inform the intent.
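The "Play Rocketman" case can be sketched as a tie-breaker that uses the device type when a title is ambiguous. Everything here is an illustrative assumption: the catalog, the device type strings, and the preference rule are hypothetical and only mirror the example in the text, not Alexa's actual logic.

```python
# Hypothetical catalog: each title maps to the kinds of content that share it.
CATALOG = {
    "rocketman": {"song", "movie"},
    "abbey road": {"album"},
}

# Hypothetical preference per device type, mirroring the example above.
DEVICE_PREFERENCE = {
    "echo": "song",      # audio speaker
    "fire_tv": "movie",  # video device
}


def resolve_play_request(title, device_type):
    """Pick a content kind for a title, using the device to break ties."""
    kinds = CATALOG.get(title.lower(), set())
    if not kinds:
        return None  # unknown title
    if len(kinds) == 1:
        return next(iter(kinds))  # unambiguous, device does not matter
    preferred = DEVICE_PREFERENCE.get(device_type)
    if preferred in kinds:
        return preferred
    return sorted(kinds)[0]  # arbitrary but stable fallback


if __name__ == "__main__":
    print(resolve_play_request("Rocketman", "echo"))
    print(resolve_play_request("Rocketman", "fire_tv"))
```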
Version Support
The Utterances Reference also has information about versions. In your Lambda, where you respond to the Discover directive, you indicate the version you support for each interface. This version determines what utterances Alexa will send to your Lambda. If a new utterance is introduced in VSK version 3.0, but your Lambda indicates support for 2.0, Alexa won't send directives for the 3.0 utterance.
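As a rough sketch of where the version is declared, the handler below answers a Discover directive with a list of capabilities, each naming an interface and a version. Treat the details as assumptions: the response layout is modeled on the general Alexa.Discovery response shape and should be checked against the interface reference, the endpoint ID and version strings are placeholders that follow the 2.0 and 3.0 example above, and `would_send` is a hypothetical helper that only illustrates the version gating described in the text.

```python
import uuid

# Placeholder declaration: the version this skill supports per interface.
SUPPORTED_INTERFACES = {
    "Alexa.RemoteVideoPlayer": "2.0",
}


def build_discover_response():
    """Build a Discover response that declares each interface and version."""
    capabilities = [
        {"type": "AlexaInterface", "interface": name, "version": version}
        for name, version in SUPPORTED_INTERFACES.items()
    ]
    return {
        "event": {
            "header": {
                "namespace": "Alexa.Discovery",
                "name": "Discover.Response",
                "messageId": str(uuid.uuid4()),
            },
            "payload": {
                "endpoints": [
                    {
                        "endpointId": "example-endpoint",  # placeholder
                        "capabilities": capabilities,
                    }
                ]
            },
        }
    }


def lambda_handler(event, context):
    """Respond to Discover; other directives are out of scope for this sketch."""
    header = event["directive"]["header"]
    if header["namespace"] == "Alexa.Discovery" and header["name"] == "Discover":
        return build_discover_response()
    return None


def would_send(utterance_version, declared_version):
    """Illustrate the gating: an utterance needs a declared version at least as new."""
    def parse(version):
        return tuple(int(part) for part in version.split("."))
    return parse(utterance_version) <= parse(declared_version)
```

With the declaration above, `would_send("3.0", "2.0")` is false, which matches the case where a skill that declares 2.0 does not receive directives for an utterance introduced in 3.0.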