When my team and I teach developers how to build voice experiences, we often start by emphasizing the importance of design. We tell developers to first take the time to design their interactions before they start building and writing code. If you start with the code instead of design, you likely won’t build a rich and compelling experience that customers will want to engage with.
Now, this isn’t to say you design once, build once, and you’re done; voice design is an iterative process that takes time to refine and get right. You start by writing a script—how you imagine your customer interaction will go. Then you read the script out loud and see if, in fact, it sounds the way you imagined. And based on what you hear, you should make tweaks to your script to make sure it’s natural, conversational, and punchy enough to catch the ear.
Oren Jacob, co-founder and CEO of PullString, understands that design is what separates the great voice experiences from the rest. His company offers the voice technology platform Converse that enables creatives and developers to collaboratively design, prototype, and publish Alexa skills. I recently chatted with Oren and was delighted to learn that he shares my view on the importance of conversational design. Here are a few points from our chat that have stayed with me.
When you start designing for voice, you first have to adopt a different writing process. In web or mobile design, once you write your copy, you likely don’t need to test and iterate many times. In voice design, testing and iterating is essential; it’s not sufficient to simply write the script and build the skill.
This is because we don’t speak the way we write, says Oren. Writing for the ear isn’t the same as writing for the eye. Anyone who has read and watched Harry Potter knows this, he says.
I love Oren’s Harry Potter example because it clearly shows how designing for voice isn’t the same as designing for web or mobile. There are subtle but potent differences to consider. One is consistency. For screen experiences, you want to keep your UI consistent so that your customers can easily learn their way around and quickly get to what they need each time. Think about how you use your banking website, for example. You know exactly which menu items to click on each time to quickly get to your balance. If your bank frequently changed its menu layout, you’d find it frustrating to have to relearn the UI each time.
But in voice, variety—the opposite of consistency—is king. Think about how your conversations flow, even as you perform repeat tasks and interactions each day, says Oren. You likely don’t say the same thing in the same way every day. If you had someone who forced you to have the same conversation in the same way every day, day after day (for example, with a chatbot that tells you your bank balance), would you look forward to that interaction? Would you find it delightful and engaging or dull and tedious?
We don’t speak in predefined phrases; we relish the unpredictable variety of conversation. And our UI needs to support this inclination if we are aiming to design a conversational experience.
This is one key reason why you can’t simply transcribe your mobile app and turn it into a voice experience. If you don’t reimagine the design, you’ll champion consistency over variety, which will result in a rigid and repetitive UI that forces customers to have the same conversation time and time again.
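One way to picture variety in practice is a small sketch like the one below. It assumes a skill that keeps several phrasings for the same answer and avoids repeating the one it used last time. The phrasings, the function names, and the idea of passing in `last_used` are all hypothetical illustrations, not part of any particular SDK.

```python
import random

# Hypothetical phrasings for a bank-balance reply; none come from a real skill.
BALANCE_PHRASINGS = [
    "You have {amount} in checking.",
    "Your checking balance is {amount}.",
    "Right now there's {amount} in your checking account.",
]


def pick_phrasing(phrasings, last_used=None, rng=random):
    """Pick a phrasing at random, skipping the last one used when possible."""
    candidates = [p for p in phrasings if p != last_used]
    # With a single phrasing there is nothing else to choose from.
    if not candidates:
        candidates = list(phrasings)
    return rng.choice(candidates)


def balance_reply(amount, last_used=None, rng=random):
    """Return (template, spoken_text) so the caller can remember the template."""
    template = pick_phrasing(BALANCE_PHRASINGS, last_used, rng)
    return template, template.format(amount=amount)
```

A caller would store the returned template wherever the skill keeps its session state and pass it back as `last_used` on the next turn, so two consecutive replies never sound identical.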
Another notable trait of voice is its wonderful unpredictability. When you design for the web or mobile, you define the so-called “happy path”—the path you want your customers to take—and expect that your customers will follow. There may be a few edge cases that your happy path doesn’t address, but you’re not too worried about them since you trust that your defined path will cover the major use cases. You also know that customers expect consistency and therefore are likely inclined to stay within the bounds of your path.
In voice, defining the happy path is just the start. The character and quality of your skill are reflected in how well it allows customers to take the lead and veer off the happy path. What happens if your happy path offers three choices (“Do you want small, medium, or large?”) and the customer picks a fourth (“Gigantic” or “How big is large?”)? If the response is “Please say small, medium, or large,” the customer will be forced to learn your rigid UI and lose agency in the experience. If the response rolls with the customer’s new direction (“Large is about the size of a refrigerator. Are you thinking more medium or large?”), it helps the customer clearly understand the decision they need to make without reverting to a menu system. Getting these non-happy-path experiences right is where the deep design happens, and it’s where customers grow to love or hate your experience. This is important because when we have conversations, we don’t look for the set happy path in the way that we might when using an app or a website; we say what we want, when we want, how we want.
In this way, there are no boundaries from the customer’s perspective. The beauty of conversations is that they can take many turns in many ways and lead us to new subjects and insights. In fact, unexpected turns can yield the most delightful and memorable moments of the entire exchange. That’s why it’s crucial to expect—and plan for—the unexpected, says Oren, because how you react in these moments can define your interactions.
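To make the small, medium, or large example concrete, here is a minimal sketch of a turn handler that rolls with off-path answers instead of repeating the menu. Everything in it is an assumption made for illustration: the keyword matching stands in for real language understanding, the "small" and "medium" descriptions and the `NEAREST_SIZE` table are invented, and only the refrigerator comparison comes from the example above.

```python
SIZES = ("small", "medium", "large")

# Only the "large" hint comes from the article; the others are hypothetical.
SIZE_HINTS = {
    "small": "Small is about the size of a microwave",
    "medium": "Medium is about the size of a dishwasher",
    "large": "Large is about the size of a refrigerator",
}

# Hypothetical off-menu words mapped to the closest size on offer.
NEAREST_SIZE = {"tiny": "small", "huge": "large", "gigantic": "large"}


def respond_to_size(utterance):
    """Return (chosen_size_or_None, spoken_reply) for one customer turn."""
    lowered = utterance.lower()
    words = lowered.replace("?", " ").replace(",", " ").split()
    named = [size for size in SIZES if size in words]

    # The customer asked about a size rather than choosing one: answer it.
    if "how big" in lowered and named:
        return None, f"{SIZE_HINTS[named[0]]}. Which size sounds right to you?"

    # Happy path: exactly one size was named.
    if len(named) == 1:
        return named[0], f"Great, {named[0]} it is."

    # Off-menu answer such as "gigantic": offer the closest match.
    for word in words:
        if word in NEAREST_SIZE:
            size = NEAREST_SIZE[word]
            return None, f"The closest we have to {word} is {size}. Would {size} work?"

    # Anything else: keep the conversation open instead of reciting the menu.
    return None, "No problem. Roughly how big were you thinking?"
```

The design choice worth noticing is that no branch falls back to “Please say small, medium, or large.” Each unexpected answer gets a reply that moves the customer closer to a decision.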
In talking about voice and how it compares to web and mobile, Oren put it this way: “Mobile is dominantly touch. Web was dominantly keyboard and mouse. Those are different by a good degree, but those two things are way closer than they are different to voice. Where those are cousins, voice is a different family unit.”
To build for voice, we have to make a bigger leap than we did when we shifted from web to mobile, because we are fundamentally changing the way people interact with technology. With voice, customers no longer bend to the technology and the rules of its UI; the technology bends to the customer. This is good news not only for customers but also for innovators of voice experiences. By leaning into the fundamentals of voice design, we can unlock countless new scenarios that bring imaginative ideas to life and meet our customers’ every need.
To get started with voice design, check out my guide, Situational Design: How Building for Voice Differs from Building for the Screen. Or register for our upcoming webinar to learn how to create engaging experiences that let customers speak in their own words and respond to them with individualized interactions. I can’t wait to see what you build.