Last fall, Amazon and more than 30 leading technology companies announced the Voice Interoperability Initiative (VII), a new program with the mission to ensure voice-enabled products provide customers with choice and flexibility by supporting multiple voice agents simultaneously on a single device. Today, we’re excited to announce that Dolby, Facebook, Garmin, and Xiaomi have joined the initiative. We’re also releasing the first version of the Multi-Agent Design Guide, which provides recommendations and best practices for delivering delightful customer experiences on devices that support multiple voice agents.
VII is grounded in a shared belief that voice agents should work simultaneously alongside one another on a single device. It is built around four priorities:
With today’s addition of Dolby, Facebook, Garmin, Xiaomi, and 38 other new members in the past year, 77 member companies now support the VII effort, including consumer electronics brands, automotive manufacturers, telecommunications operators, hardware solutions providers, and systems integrators. We welcome the newest members to the initiative and are excited about the progress we have made over the last year. We remain committed to the long-term goals of VII and appreciate members’ support and their commitment to our shared vision.
“Portal from Facebook devices power hands-free video calling to connect people, and have Alexa built in to control smart home devices, use Skills, and more. By enabling multiple voice services to co-exist on Portal, we are offering more capabilities and a better experience. We hope to see this from more companies in the future,” said Micah Collins, Director of Portal Product Management at Facebook.
“Xiaomi is a world leader in connected devices that power people’s smart living. Voice is a critical interface to offer services to our customers,” said Paul Lin, Vice President of Corporate Business Development at Xiaomi Technology. “We are excited to join the Voice Interoperability Initiative to work with industry leaders to build voice-enabled products that offer customers choice and flexibility. This collaboration will enable us to bring more exciting and high-quality voice and AI-enabled products to everyone around the world.”
The Multi-Agent Design Guide, authored by Amazon and reflecting feedback from VII members, captures design recommendations that device makers can use to build products that support multiple voice agents. The guide covers three key topic areas: customer choice and agent invocation, multi-agent experiences, and privacy and security.
Customer choice is a bedrock principle of the VII. The guide recommends letting customers choose from available voice agents by enabling the use of multiple, simultaneous wake words when more than one agent is registered on a device. Further, the multi-agent experiences section of the guide addresses fundamental behaviors that agents can employ to provide engaging interactions for customers. The guide recommends that multi-agent products help customers find the agents that are available and explore their capabilities.
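The wake-word recommendation above can be pictured as a simple dispatch step: every registered agent's wake word is active at once, and the customer picks an agent just by saying its name. The sketch below is purely illustrative; the agent names and wake words are hypothetical, and real wake-word detection runs on audio, not text.

```python
from typing import Optional

# Hypothetical wake-word registry; in practice this would be populated
# when agents are registered on the device.
WAKE_WORDS = {
    "agent a": "AgentA",
    "agent b": "AgentB",
}

def select_agent(utterance: str) -> Optional[str]:
    """Return the agent whose wake word starts the utterance, if any.

    All registered wake words are listened for simultaneously, so the
    customer chooses an agent simply by invoking its wake word.
    """
    text = utterance.lower()
    for wake_word, agent in WAKE_WORDS.items():
        if text.startswith(wake_word):
            return agent
    return None  # no wake word detected; the device stays idle
```

The key property is that selection happens per-request: nothing prevents the customer from addressing a different agent on the very next utterance.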
Beyond discovery and education, the guide provides recommendations for agent transfer and universal device commands (UDCs). Agent transfer is an interoperability pattern for the scenario in which a customer makes a request that the first agent cannot directly fulfill: the first agent can summon another agent, without sharing data or context, to assist the customer. Agent transfer allows customers to take advantage of the unique capabilities and experiences of each agent. UDCs are commands and controls that a customer may use with any agent to control certain device functions, such as changing the device volume or stopping a timer, even if that agent was not used to initiate the experience.
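The two patterns can be sketched in a few lines of code. This is a minimal, hypothetical illustration of the behaviors described above, not an implementation from the guide; the class names, command strings, and capability sets are all invented for the example.

```python
from dataclasses import dataclass

# Universal device commands work regardless of which agent is addressed.
UNIVERSAL_COMMANDS = {"stop", "volume up", "volume down"}

@dataclass
class Agent:
    name: str
    capabilities: set

    def handle(self, request: str) -> str:
        return f"{self.name} handled '{request}'"

class MultiAgentDevice:
    def __init__(self, agents):
        self.agents = {a.name: a for a in agents}

    def route(self, agent_name: str, request: str) -> str:
        # UDCs: any agent can control core device functions, even if it
        # did not initiate the current experience.
        if request in UNIVERSAL_COMMANDS:
            return f"device executed UDC '{request}'"
        agent = self.agents[agent_name]
        if request in agent.capabilities:
            return agent.handle(request)
        # Agent transfer: summon another agent that can fulfill the
        # request, passing only the request itself, with no shared context.
        for other in self.agents.values():
            if other is not agent and request in other.capabilities:
                return other.handle(request)
        return f"{agent.name} could not fulfill '{request}'"
```

Note that the transferring agent hands over only the request, mirroring the guide's point that transfer happens without sharing data or context between agents.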
The guide describes some of the essential building blocks for earning and maintaining customer trust, including customer privacy, agent and device security, and attention state. The guide recommends that devices with multiple, simultaneously available voice agents provide transparent, predictable behaviors and experiences to customers. For example, it recommends that all coexisting agents convey at least three core attention states: listening, thinking, and speaking. It also recommends providing visual and sound cues for these three states, making it easy for customers to see and understand which agents are active and when the state changes.
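The three core attention states lend themselves to a small state enumeration with per-state cues. The sketch below is an assumed illustration: the cue descriptions and function names are hypothetical examples of how a device maker might surface state changes, not prescriptions from the guide.

```python
from enum import Enum

class AttentionState(Enum):
    IDLE = "idle"
    LISTENING = "listening"
    THINKING = "thinking"
    SPEAKING = "speaking"

# Hypothetical visual cues; a real device would also pair these with
# sound cues, per the guide's recommendation.
VISUAL_CUES = {
    AttentionState.LISTENING: "solid light in the agent's color",
    AttentionState.THINKING: "pulsing light",
    AttentionState.SPEAKING: "animated light",
}

def announce_state(agent_name: str, state: AttentionState) -> str:
    """Describe the cue shown when an agent changes attention state."""
    cue = VISUAL_CUES.get(state, "light off")
    return f"{agent_name}: {state.value} ({cue})"
```

Keying cues to the agent (for example, by color) lets customers tell at a glance not just that the device is listening, but which agent is listening.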
To get started, watch the video below, which demonstrates the Multi-Agent Design Guide's recommendations on a prototype device. You can also use the design guide as a reference while designing your own products that support multiple agents.
We will continue to update the guide based on VII member feedback. If you’re not already a member, you can sign up here to indicate your interest.