Baseline Guidance Summary

The following is a summary of the design guidelines included in the previous sections.

  • A customer should be able to choose from available voice agents for a particular interaction. They should have the option to use multiple simultaneous wake words when more than one agent is registered on a device.
  • Multiple simultaneously registered agents should be available to customers at all times, aside from the following exceptions:
    • When one agent has been invoked and is actively streaming a customer utterance to the cloud, no other agent’s wake word should be detectable. For example, if a customer says “Agent 1, tell me about Agent 2,” Agent 2 should not be invoked.
    • An agent should not be able to invoke any other agent by distributing the wake word via TTS. For example, one agent cannot wake up another agent by speaking its wake word.
  • When an agent is in Speaking state, responding to a customer, the customer should be able to interrupt that agent’s response with any other active agent’s wake word (barge in).
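
The wake word rules above can be read as a small arbitration routine. The following is a minimal sketch only, not a reference implementation: the names State, WakeWordArbiter, on_wake_word, and from_device_playback are hypothetical, and it assumes the device can tell when a detected wake word came from its own TTS playback. Accepting wake words in the Thinking state is also an assumption, taken from the "available at all times" rule.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    LISTENING = auto()   # an utterance is being streamed to the invoked agent
    THINKING = auto()
    SPEAKING = auto()


class WakeWordArbiter:
    """Hypothetical arbiter deciding whether a detected wake word invokes an agent."""

    def __init__(self, agents):
        self.agents = set(agents)
        self.active_agent = None
        self.state = State.IDLE

    def set_state(self, state):
        # Called by the device as the active agent changes attention state.
        self.state = state
        if state is State.IDLE:
            self.active_agent = None

    def on_wake_word(self, agent, from_device_playback=False):
        """Return True if the detection should invoke `agent`."""
        if agent not in self.agents:
            return False
        # An agent must not wake another agent by speaking its wake word via TTS.
        if from_device_playback:
            return False
        # While an utterance is streaming, no other wake word is detectable.
        if self.state is State.LISTENING:
            return False
        # Idle, Thinking, or Speaking (barge-in): the named agent takes over.
        self.active_agent = agent
        self.state = State.LISTENING
        return True
```

In this sketch, "Agent 1, tell me about Agent 2" invokes only Agent 1, because the second wake word arrives while the arbiter is in the Listening state. Once Agent 1 is Speaking, a customer saying Agent 2's wake word is accepted as a barge-in.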

  • Customers should easily be able to discover information about the primary uses, benefits, and capabilities of available agents.
  • Customers should be made aware of any multi-agent functionality supported by the device:
    • Customers should be informed of simultaneously available wake words.
    • Customers should be informed of Universal Device Command support.

  • Devices with multiple simultaneous agents should provide access to the device state information that agents need to implement Universal Device Commands, when invoked by the customer.
  • The data sent by the device to an invoked agent about ongoing activity states on the device should be minimal and specific to actions that UDCs allow the agent to take. For example, if a customer uses one agent to begin a timer but then invokes a second agent to stop that timer when it rings, the only information the second agent should receive about the timer is that there is a stoppable sounding timer on the device (and not, for example, details about the duration of the timer or which agent originally set it).
  • Agents invoked to take action on an ongoing activity should not use the device state information provided for any other purpose than to fulfill the UDC request.
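
One way to picture the data-minimization rule for UDCs is a projection from the device's full activity record to the small view an invoked agent receives. This is an illustrative sketch under stated assumptions: Activity, UdcActivityView, udc_view, and all field names are hypothetical and are not taken from any published interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    """Full record kept on the device (hypothetical fields)."""
    kind: str               # e.g. "timer"
    owner_agent: str        # agent that started the activity
    duration_seconds: int
    sounding: bool          # True while the timer is ringing


@dataclass(frozen=True)
class UdcActivityView:
    """The only information shared with an agent invoked for a UDC."""
    kind: str
    stoppable: bool


def udc_view(activity: Activity) -> UdcActivityView:
    # Expose only what the "stop" command needs; the owner and the
    # duration are deliberately left out of the view.
    return UdcActivityView(kind=activity.kind, stoppable=activity.sounding)
```

For a ringing timer set by one agent, a second agent asked to stop it would see only that a stoppable timer is sounding.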

The presence and use of multiple agents should not compromise a customer’s privacy.

  • Device makers should ensure that a customer’s voice recording (or “utterance”) is sent only to the agent that the customer intends to invoke (i.e., the agent whose wake word the customer uses).
  • Devices or agents should implement an attention system (e.g., LEDs or voice chrome) to ensure customers know that an agent is collecting a voice recording.
  • Customers should be able to easily understand when any voice recording is shared between agents, and have the ability to provide consent for experiences that require sharing recordings or other types of data.
  • Each voice agent should provide customers transparency by enabling them to see and understand which voice recordings were handled by that agent. If an agent provides a voice history, customers should be able to delete it.

  • All agents and devices should convey to customers the core attention states: Listening, Thinking (when applicable), and Speaking (e.g., displayed on the device, listed in Settings, or indicated in a companion app).
  • Agents should not use attention state colors and sound cues which conflict in meaning. For example, the same color should not be used as Listening for one agent and Mic Off for another.
  • It is very important for a product to convey a device’s Microphone On/Off state.
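
The rule against conflicting cues can be checked mechanically when agents are registered on a device. The sketch below assumes each agent declares its cues as a mapping from attention state to a color or sound name; the function find_conflicts and that mapping format are hypothetical.

```python
def find_conflicts(agent_cues):
    """Return cues that stand for more than one attention state.

    agent_cues maps agent name -> {state name: cue name}.
    The result maps each conflicting cue to the set of states it signals.
    """
    meaning = {}  # cue -> set of states it is used for
    for cues in agent_cues.values():
        for state, cue in cues.items():
            meaning.setdefault(cue, set()).add(state)
    return {cue: states for cue, states in meaning.items() if len(states) > 1}
```

With this sketch, a device where one agent uses blue for Listening and another uses blue for Mic Off would report blue as a conflict. It also flags a single agent that reuses one cue for two states.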

The presence and use of multiple agents should never compromise the security of the device or the customer's data.

  • A device should avoid storing personal customer information. Where storing personal data is required, the stored data should be minimized and encrypted.
  • All customer data in the cloud should be handled in a secure manner (e.g., access control, automatic logging, encryption, multi-factor authentication).
  • A device should have hardware and software security capabilities that include secure boot, a trusted compute boundary, an anti-rollback mechanism, and support for hardware-based cryptographic engines.
  • A device should implement sufficient hardening and access control techniques to limit system access to authorized users, processes, or applications.
  • A device should implement adequate authorization, authentication, and input sanitization mechanisms.
  • A device should implement a secure software update process to apply all security patches.
  • A device should transmit data to and from the cloud securely, for example by using the latest TLS version and validating the certificates of cloud endpoints.
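
As one illustration of the transport guideline, the sketch below builds a client-side TLS configuration with Python's standard ssl module. The function name make_cloud_tls_context is hypothetical, and the TLS 1.2 floor is an assumption chosen for portability; a device maker may prefer to require TLS 1.3 where the platform supports it.

```python
import ssl


def make_cloud_tls_context() -> ssl.SSLContext:
    # create_default_context() turns on certificate verification and
    # hostname checking against the system's trusted CA store.
    ctx = ssl.create_default_context()
    # Refuse protocol versions older than TLS 1.2 (assumed floor).
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```

A context built this way can be passed to standard-library clients, for example as the context argument of http.client.HTTPSConnection, so that connections to endpoints with invalid certificates fail instead of proceeding.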
