Latent Goal Discovery

Latent goal discovery enables Alexa users to complete multi-skill interactions without having to remember the name of a skill or having to repeat the same information from one skill to another. Multi-skill interactions are user interactions that require conversing with more than one skill. Latent goal discovery is available to Alexa users in English in the United States. The feature requires no additional effort to activate.

In another step toward natural-sounding conversations, latent goal discovery lets Alexa infer customers’ latent goals, that is, goals that are implicit in customer requests but not directly expressed. For instance, if a customer asks Alexa, “How long does it take to steep tea?”, the latent goal could be setting a timer for steeping a cup of tea.

With latent goal discovery, Alexa might answer that question with “Five minutes is a good place to start,” and then follow up by asking, “Would you like me to set a timer for five minutes?” In this way, Alexa can suggest other skills and then invoke them on the customer's behalf.

These transitions appear simple, but the Alexa service runs a number of sophisticated algorithms to detect latent goals, formulate them into actions that frequently span different skills, and surface them to users in a way that sounds fluent.

In another example, Alexa infers that a user who asks about the weather at the beach might be interested in other information that could be useful for planning a beach trip.

We have launched multi-skill experiences that provide an additional channel for skills to receive net-new traffic. These experiences are expected to drive a 15 percent increase in dialogs to the participating skills. In some cases, we are seeing an uplift of over 200 percent for the participating skill.

The trigger model

In the latent goal discovery workflow, Alexa first decides whether to anticipate a latent goal at all. To determine whether to suggest a latent goal, the Alexa service uses a deep-learning-based trigger model that factors in several aspects of the dialog context. These aspects include the text of the customer’s current session with Alexa and whether the customer has engaged with Alexa’s multi-skill suggestions in the past.

If the trigger model finds the context suitable, the Alexa service suggests a skill to handle the latent goal. Those suggestions are based on relationships learned by the latent-goal discovery model. For instance, the model may have discovered that users who ask how long tea should steep frequently follow up by asking Alexa to set a timer for a certain amount of time.

How latent goal discovery works

The latent-goal discovery model analyzes multiple features of user utterances, including pointwise mutual information. The model derives this information by measuring the likelihood of an interaction pattern in a given context relative to its likelihood across all Alexa traffic. Deep-learning-based sub-modules assess additional features, such as whether the user tries to rephrase a prior command or issue a new command, or whether the direct goal and the latent goal share common entities or values. For example, the time value required to steep tea is shared by the direct goal (learning the steeping time) and the latent goal (setting a timer).

Over time, the discovery model improves its predictions through active learning. Active learning identifies sample interactions that would be particularly informative during future fine-tuning.
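One common way to pick informative samples is uncertainty sampling, in which the interactions the model is least sure about are selected first. The source does not say which selection criterion Alexa uses, so the sketch below is an assumption for illustration; the `select_informative` function, the example interactions, and the probabilities are all hypothetical.

```python
def select_informative(predictions, k):
    """Return the k predictions whose probability is closest to 0.5.

    Each prediction is a (description, probability) tuple, where the
    probability is the model's confidence that the latent goal applies.
    """
    return sorted(predictions, key=lambda p: abs(p[1] - 0.5))[:k]

# Hypothetical model outputs for candidate direct-goal -> latent-goal pairs.
predictions = [
    ("steep tea -> set timer", 0.95),
    ("beach weather -> tide times", 0.52),
    ("play jazz -> order pizza", 0.04),
    ("pasta recipe -> add to shopping list", 0.41),
]

picked = select_informative(predictions, 2)
for description, probability in picked:
    print(description, probability)
```

Confident predictions, whether high or low, are skipped because labeling them would tell the model little that it does not already know.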

Next, the semantic-role labeling model looks for named entities and other arguments from the current conversation, including Alexa’s own responses. Alexa's context carryover models transform those entities into a structured format that the follow-on skill can consume. The skill can consume the format even if it's a third-party skill that uses its own ontology or concept hierarchy.
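As a rough illustration of the hand-off described above, the sketch below renames extracted entities into the slot names of a follow-on skill. It assumes that a simple per-skill mapping is enough, whereas the actual context carryover models are learned. The entity roles, the slot name `TimerLength`, and the `carry_over` function are hypothetical.

```python
def carry_over(entities, slot_mapping):
    """Rename entities into a skill's own slot names.

    Entities with no corresponding slot in the target skill are dropped.
    """
    return {
        slot_mapping[role]: value
        for role, value in entities.items()
        if role in slot_mapping
    }

# Hypothetical entities extracted from the conversation so far,
# including Alexa's own response ("Five minutes is a good place to start").
entities = {"duration": "five minutes", "beverage": "tea"}

# Hypothetical mapping from shared entity roles to a timer skill's slots.
timer_skill_slots = {"duration": "TimerLength"}

payload = carry_over(entities, timer_skill_slots)
print(payload)
```

In this sketch, the timer skill receives only the duration, because it has no slot for the beverage.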

Lastly, through bandit learning, in which machine-learning models track whether recommendations are helping or not, Alexa suppresses underperforming experiences automatically.
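The idea behind bandit learning can be sketched with an epsilon-greedy bandit, which mostly shows the suggestion with the best observed acceptance rate and occasionally explores the others. The source does not state which bandit algorithm Alexa uses, so this choice is an assumption. The suggestion names, acceptance rates, and the `EpsilonGreedyBandit` class are hypothetical, and customer responses are simulated.

```python
import random

class EpsilonGreedyBandit:
    """Tracks acceptance of each suggestion and favors the best one."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.pulls = {arm: 0 for arm in self.arms}
        self.accepts = {arm: 0 for arm in self.arms}

    def rate(self, arm):
        # Smoothed acceptance rate; an untried arm starts at 0.5.
        return (self.accepts[arm] + 1) / (self.pulls[arm] + 2)

    def choose(self):
        # Explore with probability epsilon, otherwise exploit the best arm.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=self.rate)

    def update(self, arm, accepted):
        self.pulls[arm] += 1
        self.accepts[arm] += int(accepted)

# Hypothetical true acceptance rates, used only to simulate customers.
true_accept_rate = {"suggest_timer": 0.6, "suggest_unrelated_skill": 0.05}

bandit = EpsilonGreedyBandit(true_accept_rate)
customers = random.Random(1)
for _ in range(2000):
    arm = bandit.choose()
    bandit.update(arm, customers.random() < true_accept_rate[arm])

print(bandit.pulls)
```

Over many rounds, the rarely accepted suggestion is shown far less often than the well-received one, which mirrors the automatic suppression of underperforming experiences.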

You can make your skills more visible to the discovery model by using the Name-Free Interaction Toolkit. The toolkit provides natural hooks for interactions between skills.

Related topics/blogs: Alexa Conversations and Context Carryover