Automatic speech recognition (ASR) is technology that converts spoken words into text. In short, it’s the first step in enabling voice technologies like Amazon Alexa to respond when we ask, “Alexa, what’s it like outside?”
With ASR, voice technology can detect spoken sounds and recognize them as words. ASR is the cornerstone of the entire voice experience, allowing computers to finally understand us through our most natural form of communication: speech.
Before ASR, our speech was simply an audio recording of peaks and valleys in a computer’s “mind.” With ASR, computers can detect patterns in audio waveforms, match them with the sounds in a given language, and ultimately identify which words we spoke. Like other forms of human-computer interaction, voice services began with only basic functionality, such as robotic call centers with a limited list of words they could understand (think: “say yes or no”).
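To make the pattern-matching idea concrete, here is a deliberately tiny sketch: each sound (phoneme) is represented by a hand-made feature vector, and an incoming audio frame is matched to the nearest template. The phoneme labels and numbers are invented for illustration; real systems use neural acoustic models over spectrogram frames, not three-number templates.

```python
import math

# Toy "acoustic model": each phoneme is one hand-made feature vector.
# These numbers are invented; real ASR learns features from audio data.
PHONEME_TEMPLATES = {
    "AH": [0.8, 0.2, 0.1],
    "S":  [0.1, 0.9, 0.7],
    "T":  [0.2, 0.3, 0.9],
}

def closest_phoneme(frame):
    """Match one audio-frame feature vector to the nearest phoneme template."""
    return min(
        PHONEME_TEMPLATES,
        key=lambda p: math.dist(frame, PHONEME_TEMPLATES[p]),
    )

print(closest_phoneme([0.75, 0.25, 0.15]))  # nearest template is "AH"
```

Stringing the recognized phonemes together, and then mapping phoneme sequences to dictionary words, is what turns raw waveform patterns into text.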
Today, voice services have grown by leaps and bounds. They can understand the way you speak, in a range of languages, and even in your accent. They can even tell when you’re just mumbling or thinking out loud with a few ‘um’s. And most importantly, today a computer can talk back to you.
Here are three ways ASR makes it possible to interact with technology via voice:
For a conversation to feel natural, responses must arrive within a fraction of a second. Modern voice technologies take advantage of cloud computing to convert recorded audio into text that computers can act on almost instantly.
Languages are full of homonyms. How can computers distinguish between to, two, and too? The leading technologies of today use statistical language models, which weigh the surrounding words to determine which word the speaker actually intended to say.
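Here is a minimal sketch of that idea using bigram scores: given the previous word, pick whichever spelling of the sound is most likely to follow it. The scores below are invented for illustration; real systems learn these probabilities from large text corpora and score the whole sentence, not just one word pair.

```python
# Toy language-model disambiguation for the homonyms "to", "two", "too".
# Bigram scores are invented; real ASR learns them from huge text corpora.
BIGRAM_SCORES = {
    ("want", "to"): 0.90, ("want", "two"): 0.05, ("want", "too"): 0.05,
    ("buy", "two"): 0.70, ("buy", "to"): 0.10,  ("buy", "too"): 0.20,
    ("me", "too"): 0.80,  ("me", "to"): 0.15,   ("me", "two"): 0.05,
}

def pick_homonym(previous_word, candidates=("to", "two", "too")):
    """Choose the candidate spelling most likely to follow previous_word."""
    return max(
        candidates,
        key=lambda w: BIGRAM_SCORES.get((previous_word, w), 0.0),
    )

print(pick_homonym("buy"))   # "two"  — as in "buy two tickets"
print(pick_homonym("want"))  # "to"   — as in "want to go"
```

The same sounds come out as different words purely because of what was said just before them.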
ASR is only the first step in voice user interfaces. With additional technology layered on top of ASR, such as natural language understanding, Alexa can also understand the context of what users mean. “Four miles” might refer to a distance, or it might be a reminder to buy a gift “for Miles.”
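A toy sketch of that layering: the ASR output is handed to a rule that uses the rest of the utterance to pick an interpretation of the ambiguous sound. The intent names and keyword rules here are invented for illustration; real NLU systems use trained models over the full utterance rather than keyword lookups.

```python
# Toy sketch of context disambiguation layered on top of ASR output.
# Intent names and keyword rules are invented; real NLU uses trained models.
def interpret(utterance):
    """Resolve the ambiguous sound "four miles" / "for Miles" from context."""
    words = utterance.lower().split()
    if "remind" in words or "gift" in words:
        return ("ReminderIntent", "for Miles")   # Miles is a person
    if "run" in words or "drive" in words or "far" in words:
        return ("DistanceIntent", "four miles")  # a distance
    return ("UnknownIntent", None)

print(interpret("remind me to buy a gift for miles"))
print(interpret("how far did i run today"))
```

The surrounding words, not the sound itself, decide which reading the assistant acts on.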
ASR has been making quiet advancements for decades. It’s been used in education to help people learn second languages, in accessibility tools for people who are hard of hearing, and even in hands-free computing.
Today, ASR enables us to have conversations with computers. We don’t have to learn how to use a mouse and keyboard or a touch-screen UI just to set a timer, to look up a sports score, or to call another person. We just have to speak the way we already do in everyday life.
This opens the door to so many other possibilities. Now that computers can understand our language, what else can we teach them to do? What other magical experiences can we build with voice? That part is still up to us.
Learn more: Alexa Design Guide
How can you start taking advantage of ASR to build for voice? You can start creating innovative voice experiences with the Alexa Skills Kit (ASK), which enables developers to leverage Amazon’s knowledge and pioneering work in the field of voice design for their Alexa skills. You don’t need a background in natural language understanding or speech recognition to build great voice experiences with Alexa. ASK is a collection of self-service APIs, tools, documentation, and code samples that makes it fast and easy to build for Alexa.
Start building for voice today and shape the customer experience of tomorrow.