For years, professional developer Ben Ursu has built technology that brings immersive visual and augmented reality experiences to life. But Ursu was curious how he might integrate such powerful visual interfaces with another engaging technology: voice. When Amazon announced the Alexa Skills Challenge: Multimodal, he knew it was the perfect opportunity to build his first Alexa skill. By combining an engaging voice-first experience with three-dimensional graphics, Ursu created Fork On The Road, a skill that helps users choose among multiple options. His efforts won him the Bonus Prize for Best Multimodal Kitchen Experience and a total of $8,000 in cash.
“A big driver for me—besides the contest itself—was the opportunity to build a skill connecting different types of applications, features, and content,” said Ursu. “The Alexa Skills Challenge: Multimodal was the perfect opportunity to work on it.”
Working for an agency that brings innovations to the table for big brands, Ursu is used to visual technologies, but found voice intriguing. When he dug into the Alexa Presentation Language (APL), he knew he’d found the way to marry his love of visual effects with the opportunity of voice. APL allows developers to add visual and touch elements that make their skills more delightful and engaging for customers using Alexa-enabled devices with screens of different sizes and shapes, such as Echo Spot, Echo Show, and Fire TV. The multimodal challenge gave him a reason to explore and create his first Alexa skill, and it has opened up opportunities he hadn’t previously imagined with either voice or visual interfaces alone.
“Winning in the Alexa Skills Challenge showed me the tremendous opportunity of combining voice with complex visual experiences,” said Ursu. “Looking at features alone, multimodal puts you in a different category. Alexa already allows developers to do more than other voice technologies, but with APL, you have the opportunity to create voice-first skills that are visually stunning.”
Alexa Introduces a Visual Technologist to the World of Voice
Ursu has been building user experiences with interactivity and animation for more than three decades. Starting in the 1990s with website development, he graduated to building visual application interfaces, 3D experiences for the web, and—most recently—to virtual and augmented reality. As one of the creators of Spark AR Studio, Ursu has built software that lets anyone create augmented reality effects in minutes, without writing any code. Today, he’s expanding his personal experience and fostering his team’s abilities and opportunities, especially in voice skills with visual interfaces.
Ursu’s curiosity about voice user interfaces began when Amazon introduced Echo Show, its first Alexa-enabled device with a screen. When APL came along, that curiosity grew. By the time the Alexa Skills Challenge: Multimodal was announced, Ursu knew it was time to get serious. He dove into AWS, networking, and the APIs available in the Alexa Skills Kit, and was convinced that he could add Alexa development techniques to his visual effects experience.
“It was the combination of many technologies that really allowed me to make Fork On The Road the way I did,” said Ursu. “Being able to piece together several different technologies like that, and understand how the underlying AWS structure works, allowed me to flex a bit and learn new things while creating this skill.”
Family Movie Night Inspires a Winning Voice-First Multimodal Skill
Inspiration for Fork On The Road struck Ursu when trying to solve an age-old family dilemma: What should we watch on TV tonight? He asked Alexa to help by flipping a coin, but found he often needed to decide between more than two options. That’s when Ursu had the idea to make a multimodal decision-making skill.
“I'm always looking towards real-life scenarios and problems to solve,” said Ursu. “I already let artificial intelligence help me make many important decisions in my life, so this was the perfect inspiration for my first Alexa skill. That’s how Fork On The Road was born.”
Because the objective of the multimodal challenge was to create a voice-first—but not voice-only—skill, Ursu could call on his visual development skills and bring his skill to life with dynamic 3D images and animation. Employing simple design with elegant execution, Fork On The Road prompts the user to name up to four different items from which they want to choose, which the skill displays at a “crossroads” on the screen. Alexa then prompts the user to “spin the fork,” displaying a 3D image of a fork which spins until it comes to rest on one of the options, making the decision for the user.
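The selection step described above can be sketched in a few lines of Python. This is an illustrative sketch only, not Ursu's implementation: the function name spin_fork, the two-option minimum, and the even spacing of options around the crossroads are assumptions made for the example. Only the four-option limit comes from the description of the skill.

```python
import random

MAX_OPTIONS = 4  # the skill accepts up to four items
MIN_OPTIONS = 2  # assumption: a decision needs at least two choices


def spin_fork(options, rng=None):
    """Pick one option uniformly at random.

    Returns the chosen option and the angle (in degrees) at which a
    hypothetical on-screen fork would come to rest, assuming the options
    are spaced evenly around the crossroads.
    """
    if not MIN_OPTIONS <= len(options) <= MAX_OPTIONS:
        raise ValueError(
            f"expected {MIN_OPTIONS} to {MAX_OPTIONS} options, got {len(options)}"
        )
    rng = rng or random.Random()
    index = rng.randrange(len(options))
    # Evenly spaced headings: option 0 at 0 degrees, option 1 at 360/n, and so on.
    rest_angle = index * 360 / len(options)
    return options[index], rest_angle


if __name__ == "__main__":
    choice, angle = spin_fork(["comedy", "documentary", "cartoon"])
    print(f"The fork points to {choice} at {angle:.0f} degrees")
```

Passing a seeded random.Random instance as rng makes the spin repeatable, which is convenient when testing the voice and visual responses together.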
Ursu used APL’s capabilities to combine multiple technologies and render the 3D scene for the skill, making many functions work together in a cohesive visual experience that appeals to a wide audience.
“A skill like Fork On The Road appeals to a wide, growing audience,” said Ursu. “From two-year-olds to grandmothers, people of all ages and backgrounds use Alexa as part of their daily lives.”
A Visual Developer Looks to Voice to Create Even More Engaging Experiences
Fork On The Road may have been Ursu’s first experiment in developing for voice, but it certainly will not be the last. After his win in the Alexa Skills Challenge: Multimodal, he’s more excited than ever by the opportunities for voice developers. With the ability to combine voice with complex visual experiences, Ursu intends to bring these elements together again in future projects for both his clients and himself. In developing Fork On The Road, Ursu found that the key to a rich multimodal experience is to develop the voice experience first. He added the visual elements only after he had an engaging voice-first skill. He developed the visuals for a small screen first, like the Echo Show, and worked his way up in size to the Fire TV. That way, his skill can reach the broadest audience without relying on one particular Alexa device.
“I’ve always focused on visual front ends, but building Fork On The Road was so interesting that now I want to build more Alexa skills,” said Ursu. “With Alexa, you have the ability to reach many different people and personalize the experience with both voice and visuals. The way I see it, by coupling voice interaction with great visuals, we can build richer, more engaging experiences for our customers.”
Check out the APL resources below and get started with building your own multimodal skills today.