In the last installment of our series on How Building for Voice Differs from Building for the Screen, we covered how to make your voice-first interactions accessible. This final post in the series covers another key voice design principle: making your interactions relatable to your customers.
Graphical user interfaces (GUIs) are inherently inflexible; customers have to learn how to navigate them before they can use them to find what they need. A good GUI presents itself clearly in an understandable and usable way. It offers a consistent path each time, so users can perform a task habitually and achieve the same goal, every time, in minimal time. It is declarative and inherently non-cooperative.
A good voice-first UI is cooperative because conversations are cooperative. As the linguist Paul Grice stated in his cooperative principle, participants in a conversation cooperate in order to achieve mutual conversational ends. In other words, both participants look beyond the explicit statements to the implicit meaning in order to help advance the conversation.
This cooperative spirit carries over to voice user interfaces. If the user says, “I’d like to get a gigantic dog,” the voice user interface (VUI) might respond with, “I have several large dog breeds in mind. Would you prefer more of a family dog or guard dog?” That response indicates implicit confirmation that “gigantic” was understood. The response also asks a forward-moving question in an attempt to help the user narrow down the choices.
Behind the scenes here, we are looking to gather values like {size} and {temperament}. The sizes our database or API accepts are large, med, and small. But rather than saying, “You must now say large, med, or small. What size do you want?” the VUI understands that large has synonymous words or phrases like “huge,” “gigantic,” “waist-high,” “as big as a pony,” “that my daughter can ride,” and so on. In this way, you can match what a person actually says (“medium” or a synonym thereof) to what your API expects (“med”).
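As a minimal sketch of this synonym resolution, here is one way to map what the user says to the canonical values an API expects. The synonym lists and the `resolve_size` helper are illustrative assumptions, not a specific Alexa API:

```python
# Map canonical API values to the synonyms a user might actually say.
# These lists are illustrative; a real skill would define synonyms
# in its interaction model.
SIZE_SYNONYMS = {
    "large": ["large", "big", "huge", "gigantic", "waist-high", "as big as a pony"],
    "med": ["medium", "mid-sized", "average"],
    "small": ["small", "tiny", "little", "lap-sized"],
}

def resolve_size(utterance_value):
    """Return the canonical value the API expects, or None if unrecognized."""
    spoken = utterance_value.strip().lower()
    for canonical, synonyms in SIZE_SYNONYMS.items():
        if spoken in synonyms:
            return canonical
    return None  # no match; the VUI should re-prompt rather than fail

print(resolve_size("gigantic"))  # prints: large
print(resolve_size("medium"))    # prints: med
```

The key design point is that the canonical value stays stable for your backend while the synonym lists grow as you learn how real users phrase things.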
Additionally, voice-first experiences should account for situations where the user overanswers or answers a different question. For example, if the prompt is “Where would you like to go?” and the answer is “I'd like to go to Portland for some kayaking,” or even “kayaking,” the next prompt should not be, “What would you like to do there?” The user already provided “kayaking” as the activity.
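One way to handle overanswering is to track which slots are already filled and prompt only for what is missing. This is a sketch under assumed slot names ("destination", "activity") and prompts, not a specific framework's API:

```python
# The slots this interaction needs, in the order we'd ask for them,
# and the prompt for each. Names and wording are illustrative.
REQUIRED_SLOTS = ["destination", "activity"]
PROMPTS = {
    "destination": "Where would you like to go?",
    "activity": "What would you like to do there?",
}

def next_prompt(filled):
    """Ask only for slots the user has not already provided."""
    for slot in REQUIRED_SLOTS:
        if not filled.get(slot):
            return PROMPTS[slot]
    return None  # everything is filled; move the conversation forward

# "I'd like to go to Portland for some kayaking" fills both slots at once,
# so there is nothing left to ask:
print(next_prompt({"destination": "Portland", "activity": "kayaking"}))  # prints: None

# "Kayaking" alone fills only the activity, so we still ask where:
print(next_prompt({"activity": "kayaking"}))  # prints: Where would you like to go?
```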
In screen UIs, a wizard is a fairly common metaphor for asking users for information in a constrained and linear path. In voice, users will provide information in the way they think is most appropriate. Where you might use a wizard or a form in GUI, in voice, you’ll want to use dialog management to manage the state and flow of the conversation. In other words, instead of talking at users the same way each time whether they understand or not, voice-first UIs talk with them and cooperatively carry the interaction forward.
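To make the contrast with a wizard concrete, here is a minimal dialog-management sketch: it accepts slot values in any order and across turns, re-prompting only for what is still missing. The class, slot names, and prompts are hypothetical, not part of any SDK:

```python
class DialogManager:
    """Tracks conversation state across turns instead of forcing a fixed path."""

    def __init__(self, required_slots, prompts):
        self.required = required_slots
        self.prompts = prompts
        self.state = {}  # slots filled so far

    def handle_turn(self, new_slots):
        """Absorb whatever the user provided this turn, then ask for what's missing."""
        self.state.update({k: v for k, v in new_slots.items() if v})
        for slot in self.required:
            if slot not in self.state:
                return self.prompts[slot]
        return "Great, I have everything I need."

dm = DialogManager(
    ["size", "temperament"],
    {"size": "What size of dog?", "temperament": "More of a family dog or guard dog?"},
)
print(dm.handle_turn({"size": "large"}))          # asks the temperament question
print(dm.handle_turn({"temperament": "family"}))  # all slots filled; wraps up
```

Unlike a wizard, nothing here assumes the user answers one question per turn: a single utterance that fills both slots would skip the prompts entirely.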
Explore the Amazon Alexa Voice Design Guide to learn more about the fundamental design principles to build rich and compelling voice experiences. If you’re a graphical UI designer or developer, download the guide 4 Essential Design Patterns for Building Engaging Voice-First User Interface. Or watch our on-demand webinar and recorded Twitch stream on how building for voice differs from building for the screen.
Bring your big idea to life with Alexa and earn perks through our tiered rewards system. US developers: publish a skill in June and earn an AWS IoT button. Add in-skill purchasing to any skill in June and you can earn an Alexa-enabled device for the car. If you're not in the US, check out our promotions in Canada, the UK, Germany, Japan, France, Australia, and India. Learn more about our promotion and start building today.