In episode 19 of the Alexa Dev Chat podcast, I discussed the growing area of voice design and some major trends: building conversational interfaces, what voice means for your brand, and how you might design compelling new game experiences for it. This is not an exhaustive list, but these ideas surface some important topics you should think about while developing your own Alexa skills and conversational experiences this year.
With the launch of Echo Show and Echo Spot, we introduced a new way to build skills so you can display information to customers during conversations with Alexa. With touch interfaces, screens immediately draw you to developing an interface that is close up, to be “held” and tapped on. But Echo devices with screens present a new opportunity entirely. It’s about showing ancillary information while having a conversation. Think about how many times you’ve pointed in the air and drawn a map with your fingers. This is a visual cue that adds to the conversation, but it is not the main focal point.
How can you think about designing for voice in this visual scenario, especially when many of us have a touch-first mobile mindset? One way to think about it when creating intents for your Alexa skill is to start with your user and work backwards.
For example, in a mobile app or web page you might have an okay button. The label on the button says "okay," and the function is okay. When you click the okay button, the okay function happens, and your code is executed. With voice, it doesn’t work the same way: you have that same okay function, which you would map to an intent, but there is no visual element like a button.
Instead, your customer will say something like "okay," "ready," "yes," "next," or "carry on." All these different utterances mean "okay," and we map them to the okay intent.
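As a rough illustration of the mapping described above, the sketch below treats an intent as a name with a list of sample utterances and looks up what the customer said. The intent name `OkayIntent` and the helper `resolve_intent` are hypothetical, and the exact-match lookup is only a stand-in: a real skill would rely on Alexa's natural language understanding rather than string comparison.

```python
from typing import Optional

# Hypothetical intent name; not taken from any real skill.
OKAY_INTENT = "OkayIntent"

# Sample utterances that should all trigger the same "okay" function.
SAMPLE_UTTERANCES = {
    OKAY_INTENT: ["okay", "ready", "yes", "next", "carry on"],
}


def resolve_intent(spoken: str) -> Optional[str]:
    """Return the intent whose sample utterances include the spoken phrase."""
    normalized = spoken.strip().lower()
    for intent, utterances in SAMPLE_UTTERANCES.items():
        if normalized in utterances:
            return intent
    # Nothing matched: the skill would need to reprompt or offer help.
    return None
```

The point of the sketch is the shape of the data: one function, many ways to ask for it. Where a button has a single label, an intent has as many "labels" as your customers have ways of saying the same thing.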
The subtle difference is that in mobile or web, your goal is to figure out how to build an experience that is easy for people to learn. With a graphical user interface, a customer looks at your page and applies the natural language understanding in their head to what they are seeing on the page. They go, "I see all these different controls and buttons, and okay. I have the highest confidence that okay is the right thing for me to click right now, so I click okay."
With voice, there’s no learning curve. Your customer will simply talk naturally and have a conversation. The work falls back on you to set context for the customer and explain what kinds of things they can do, but you still want them to be able to speak freely and get the right intent. This requires developers to have a different mindset and brings into focus concepts like synonyms, entity resolution, and the newer Alexa skills tooling like the dialog models.
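To make the idea of synonyms concrete, here is a minimal sketch in which several spoken words resolve to one canonical value, which is roughly what entity resolution gives you. The slot values, the synonym lists, and the function names are all invented for illustration; in a real skill this mapping would typically be declared in the interaction model rather than written in your handler code.

```python
from typing import Dict, List, Optional

# Hypothetical slot type: each canonical value lists synonyms a customer might say.
SIZE_SYNONYMS: Dict[str, List[str]] = {
    "small": ["little", "tiny", "short"],
    "large": ["big", "huge", "grande"],
}


def build_lookup(synonyms: Dict[str, List[str]]) -> Dict[str, str]:
    """Flatten canonical values and their synonyms into one word -> canonical map."""
    lookup: Dict[str, str] = {}
    for canonical, alternatives in synonyms.items():
        lookup[canonical.lower()] = canonical
        for alternative in alternatives:
            lookup[alternative.lower()] = canonical
    return lookup


SIZE_LOOKUP = build_lookup(SIZE_SYNONYMS)


def resolve_slot(spoken: str) -> Optional[str]:
    """Return the canonical value for a spoken word, or None if it is unknown."""
    return SIZE_LOOKUP.get(spoken.strip().lower())
```

With this in place, your code only ever has to reason about the canonical values ("small" or "large"), no matter which word the customer actually used.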
With visual elements, voice should always remain the primary form of interaction. And as much as possible, the user should be able to navigate through the skill strictly by voice.
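One way to picture this voice-first rule in code, assuming a hypothetical `build_response` helper and a simplified response shape that is not the actual Alexa response format: speech is always present, and screen content is added only when a display is available.

```python
from typing import Optional


def build_response(speech: str, has_screen: bool,
                   display_text: Optional[str] = None) -> dict:
    """Build a simplified, hypothetical response: speech always, display as an extra."""
    if not speech:
        # The screen is ancillary, so a response without speech is not allowed here.
        raise ValueError("Every response needs speech.")
    response = {"speech": speech}
    # Only attach visual content when the device can actually show it.
    if has_screen and display_text:
        response["display"] = display_text
    return response
```

Because the display is optional, the same handler works on a device without a screen, and the customer never needs the visuals to move forward in the skill.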
Think about the types of interfaces you’ve built over the past decade. Chances are, you’ve been creating the same type of on-ramps of mobile and web visual elements: buttons, dropdowns, lists, tabs, etc. What choice have we had? Yes, some of those interfaces have created a unique way to “swipe” or “tap,” but they have all forced your customer to think and act in a certain way. For better or worse, you could argue we have all been trained just to click that “okay” button when we see it on a screen.
When it comes to conversational design, however, the goal is to have a real conversation. Your customer can interact with you much as they would with another human being, without being forced into the “on-ramps” of visual elements that mobile and web have created.
Voice-driven games are a great example of how voice can enable natural conversation. In my house, for example, we have been using Echo Buttons. They have brought back the communal feelings I used to get playing board games around the table with family and friends, back when everyone would engage in natural conversation rather than look down at their phones.
Voice-driven experiences create a different kind of interaction, and they have the ability to bring people back into a more natural conversation.
If you were raised in the digital age, you may have had most of your customer interactions through technology-driven experiences via web and mobile. How often are you talking directly with your customers? Do those conversations happen outside of an email or a call center? Now is the time to take a step back and reflect on how you interact with your customers. Let's use the retail industry as an example: when you walk into a brick-and-mortar store, you notice how the staff greet and interact with their customers. With the web and mobile, you immediately lose that direct human interaction. Voice creates many opportunities for brands to engage with customers in more delightful and personal ways.
After you’ve figured out how your brand will interact with your customers, you need to decide how it will sound when it speaks to them. For example, will you use Alexa’s voice to represent your brand, or will you use a different spokesperson? Many Alexa skills, such as those in the games category, use pre-recorded narration and content with a voice that’s tailored to the brand.
Brands also need to think about how their brand traits on web and mobile will transition to voice. The design language a brand uses within a graphical user interface will not translate to a voice user interface, so you need to think about how you will meet customers’ expectations for your brand through a conversation with Alexa.
In other words, your brand (for the first time) literally has a voice. And more and more brands are recognizing the opportunity they have to leverage voice to enable meaningful conversations with their customers. Conversational UI is creating a huge opportunity for brands to redefine how people shop, bank, travel, book hotels, everything.
Growing up, I played a lot of D&D as well as computer role-playing games, and I always found them to be very different experiences. Gary Gygax, one of the creators of D&D, famously said it was never intended to be a thing you play alone, which is what computer role-playing evolved into over time. It was always supposed to be played with a bunch of people, and that's really where the campaigns come to life.
If you’ve never experienced this yourself, you can check out one of my favorite shows on Twitch called Geek and Sundry. They do a great job showing all types of role-playing games in a communal setting. These experiences are great at enabling conversation. In my own observations with Echo Buttons, that's where the real fun comes into play. These devices enable everyone to have conversations with Alexa in the game. It involves a group, and it's communal. Mobile and even computer role-playing games in an online multiplayer scenario can feel very solo by comparison.
Your typical game today is also written with a single type of player in mind. Yes, you may be able to modify the gender or look of that character, but it doesn’t take into account the most important thing: your digital persona. We all have different personality types and view the world in different ways. Our emotions may even push us to different areas of intensity in that personality type depending on our mood. Conversational experiences, because they get to know you over time, have the ability to adapt to your personality and present information in the way you find most enjoyable to digest. As a player, you go through much the same narrative in existing visual gameplay across mobile, console, and PC. The gameplay is static because a visual experience requires the game developer to create all the assets on screen. Human imagination is infinite; voice-first experiences make it easier to paint those pictures because you don’t have to worry about creating every graphical asset.
When you start thinking about creating conversational experiences and communal gameplay today, think about these concepts. Throw out what you know about the limitations of mobile and go back to the games you played when you were a kid, sitting around a table laughing and enjoying that experience together. That’s what voice will enable, unlike anything we have ever played together before.
Be sure to check out my Alexa Dev Chat podcast to tune in to more discussions on various aspects of Alexa, natural language understanding, voice recognition, and stories of developers like you who are building innovative solutions with voice.
Ready to start building for Alexa? Check out these resources:
-Dave (@TheDaveDev)