Clear discussion of multi-agent product design depends on a consistent, shared vocabulary. This list is not meant to be comprehensive; it is intended to promote a widely adopted set of standard terms.


Active Agent

Any voice agent currently capable of responding to a customer invocation.

Agent (Voice Agent)

The digital “person” that the customer interacts with through conversation (turn taking). An agent has its own brand (voice, personality), method of invocation (custom wake word, Action button), and one or more unique capabilities.

Agent Arbitration

The process of determining which voice agent will participate in an interaction.

Agent Attribution

The action of clearly ascribing a response to the agent responsible for providing it. (See also Content Attribution.)

Agent Transfer

The handoff that occurs when a customer asks something of an agent that cannot directly fulfill the request, and that agent summons a second agent to assist. No data or context is passed between agents during a transfer: the customer repeats the request directly to the second agent, but does not have to invoke the second agent using its wake word.
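
The transfer behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the `Agent` class, the `handle` function, the capability names, and the prompt wording are all hypothetical. The point it shows is that the second agent receives none of the original request and must hear it again from the customer.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical agent with a set of named capabilities."""
    name: str
    capabilities: set = field(default_factory=set)

    def can_handle(self, intent):
        return intent in self.capabilities


def handle(first, others, intent):
    """Return (responding_agent_name, message) for a requested intent."""
    if first.can_handle(intent):
        return first.name, f"{first.name} handles '{intent}'"
    for other in others:
        if other.can_handle(intent):
            # Transfer: no data or context is forwarded. The second agent
            # simply opens its microphone and the customer repeats the request.
            return other.name, f"{other.name} is listening; please repeat your request"
    return first.name, "no agent available"


agent_a = Agent("agent_a", {"music"})
agent_b = Agent("agent_b", {"shopping"})
result = handle(agent_a, [agent_b], "shopping")
```

Here `result` names `agent_b` as the responder, and the returned message carries nothing from the original request.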

Assessed Agent Arbitration

A method of Agent Arbitration whereby a service or mechanism selects which agent will participate in an interaction based on an assessment of all relevant interaction factors.
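
As a rough illustration of assessed arbitration, the sketch below selects one agent from a set of candidates. The assessment factors (`wake_word_confidence`, `can_fulfill`) and the selection policy are assumptions made for the example; the glossary does not specify which interaction factors a real arbitration service weighs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A registered agent plus hypothetical assessment signals."""
    name: str
    wake_word_confidence: float  # assumed detector score in the range 0.0-1.0
    can_fulfill: bool            # assumed result of a capability check


def arbitrate(candidates):
    """Return the name of the agent chosen to participate, or None.

    Assumed policy: only agents able to fulfill the request are eligible,
    and the highest wake word confidence wins among those.
    """
    eligible = [c for c in candidates if c.can_fulfill]
    if not eligible:
        return None
    return max(eligible, key=lambda c: c.wake_word_confidence).name


candidates = [
    Candidate("agent_a", wake_word_confidence=0.9, can_fulfill=False),
    Candidate("agent_b", wake_word_confidence=0.6, can_fulfill=True),
    Candidate("agent_c", wake_word_confidence=0.4, can_fulfill=True),
]
chosen = arbitrate(candidates)
```

In this example `agent_a` has the strongest wake word score but cannot fulfill the request, so the assessment selects `agent_b`.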

Attention States

The stages of interaction with a customer that a voice agent can enter. At a minimum, these comprise the Listening, Thinking, and Speaking states; they can also include states such as Do Not Disturb or Notifications Pending. (See also Attention System.)
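
One way to model the minimal attention states is as a small state machine. The sketch below is illustrative only: the `IDLE` state and the transition table are assumptions added for the example, and optional states such as Do Not Disturb are left out.

```python
from enum import Enum, auto


class AttentionState(Enum):
    IDLE = auto()       # assumption: a resting state, not named in the glossary
    LISTENING = auto()
    THINKING = auto()
    SPEAKING = auto()


# Hypothetical transition table: which states may follow each state.
ALLOWED = {
    AttentionState.IDLE: {AttentionState.LISTENING},
    AttentionState.LISTENING: {AttentionState.THINKING, AttentionState.IDLE},
    AttentionState.THINKING: {AttentionState.SPEAKING, AttentionState.IDLE},
    AttentionState.SPEAKING: {AttentionState.LISTENING, AttentionState.IDLE},
}


def advance(current, target):
    """Return the new state, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.name} to {target.name}")
    return target


state = AttentionState.IDLE
for nxt in (AttentionState.LISTENING, AttentionState.THINKING, AttentionState.SPEAKING):
    state = advance(state, nxt)
```

Allowing Speaking to return directly to Listening reflects a multi-turn interaction, where the customer continues without invoking the agent again.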

Attention System

The combination of all visual and audible cues presented to a customer to communicate a voice agent’s attention state. The attention system is typically displayed through animated patterns of lights or colors, along with synchronized sound cues. (See also Attention States.)


Cloud AI

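Server-resident infrastructure supporting an agent’s automatic speech recognition (ASR), natural language understanding (NLU), natural language generation (NLG), text-to-speech (TTS), and neural network (NN) and machine learning (ML) based interactions. In a multi-agent scenario, multiple agents may use a single Cloud AI.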

Content Attribution

Content attribution informs the customer of the source of the information or content they receive from an agent. From the customer’s perspective, this can enhance the clarity and credibility of the information. For brands, it provides recognition for the services they supply. Attribution can be given either verbally (“According to…”) or visually. (See also Agent Attribution.)



Disambiguation

When the recognition system hypothesizes two or more possible resolutions to a user utterance, it may ask the user to choose which interpretation was meant.



A group of one or more customers who have agreed to share some aspect of their agent experience.



Intent

The specific action a user wishes to perform, expressed as the command derived from the range of natural language utterances users may speak to convey their intention. The capability needed to respond to a specific intent may determine which agent in a multi-agent scenario responds to the customer.


Invocation

The method whereby an agent or capability is initiated. This could be a spoken wake word or button press from the customer, or a contextual event such as a timer or geofence. Each voice agent will generally have its own unique wake word or other invocation method.



Landmarking

The practice of prepending a response with a sound or short spoken phrase to orient customers to where they are in the experience. The landmark can be spoken by either the sending or the receiving agent (e.g. “Alexa can help with that” or “It’s Alexa...”). Landmarking provides attribution and clarifies for the customer which agent is handling the request.


A person may use an agent across multiple stationary locations or on-the-go.


Multi-Agent Product or Device

A product designed to support multiple voice agents.

Multiple Simultaneous Wake Word (MSWW)

A configuration in which two or more wake words can invoke voice agents on the same device at all times.

Multi-turn Interaction

An interaction between a customer and an agent that includes more than one utterance or response. It is often used by an agent or domain to ask for additional information from the customer, or to continue an experience. It is characterized by not requiring the customer to invoke the agent beyond the initial start of the interaction.


Agent Orchestration

See Agent Arbitration.



Persona

The characteristics, or personality, of an agent, including its name, wake word, voice, accent, and visual appearance. Each agent has its own persona, and a single agent may also offer a range of identifiable or selectable personas.


Push-to-talk

An umbrella term that covers invocation of an agent by means of a physical or on-screen affordance such as a button. Push-to-talk includes both tap-to-talk and hold-to-talk implementations. Different agents may be invoked by a single, “overloaded” push-to-talk affordance.


Registered Agent

A voice agent which is available on a device and authenticated with customer credentials.


Simultaneous Agents

Two or more agents that are simultaneously available on a device.


Universal Device Commands

Commands or controls recognized across a range of voice agents on a device.


Voice Agent

See Agent.


Wake Word

A phrase or a word in a purposeful human-initiated utterance that can be detected to allow an associated agent to start acting on the utterance following the wake word. Otherwise known as “named invocation.” A multi-agent device will recognize different wake words for different agents or personas.
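
To illustrate how a multi-agent device might map different wake words to different agents, here is a simplified sketch. It operates on already-transcribed text, whereas real wake word detection runs on audio; the `WAKE_WORDS` mapping, the agent identifiers, and the `route` function are hypothetical.

```python
# Hypothetical mapping from wake word to agent identifier.
WAKE_WORDS = {"alexa": "agent_a", "computer": "agent_b"}


def route(utterance):
    """Return (agent, request) if the utterance starts with a known wake word.

    Returns None when no registered wake word leads the utterance.
    """
    words = utterance.strip().split(maxsplit=1)
    if not words:
        return None
    # Normalize the first token: lowercase and drop trailing punctuation.
    first = words[0].lower().strip(",.!?")
    agent = WAKE_WORDS.get(first)
    if agent is None:
        return None
    # Everything after the wake word is the request the agent acts on.
    request = words[1] if len(words) > 1 else ""
    return agent, request


routed = route("Alexa, what time is it?")
```

An utterance that does not begin with a registered wake word, such as "what time is it", returns `None` and invokes no agent.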