An Amazon technical expert provides three acoustic certification-related tips for AVS device makers

Felipe Pinzon Aug 16, 2023
AVS Device SDK

Amazon recently announced that 500 million Alexa-enabled devices have been sold worldwide. These devices include Amazon devices like the Echo. They also include devices made by original equipment manufacturers, original design manufacturers, and systems integrators who leverage the Alexa Voice Service (AVS) to build voice recognition and processing capabilities into their smart speakers, headphones, PCs, TVs, and more.

The world's leading OEMs offer Alexa Built-in devices that let customers talk directly with Alexa through a microphone and speaker. Alexa turns your device into a smart device and expands its capabilities. Customers can ask Alexa for help with everyday tasks and entertainment, and control Alexa-connected devices or their own device.

To bring Alexa Built-in devices to market, device makers must first complete a crucial step: AVS testing and certification. All devices need to pass a series of tests to ensure that they meet Alexa’s functional, user experience, acoustic, music, and security standards.

In this blog post, AVS solutions architect Felipe Pinzon shares three tips to help new and experienced AVS device makers alike pass the acoustic certification process.

Pinzon’s journey into the world of audio technology began early. Pinzon’s parents were musicians, and as a child, he found himself drawn not only to the melodies but also to the audio quality itself. This fascination led him to pursue an undergraduate degree in Electrical Engineering, where he studied the nexus of technology and audio. 

At Amazon, Pinzon’s role as a solutions architect gives him the unique opportunity to blend his technical expertise with his passion for acoustics. In his role on the AVS team, he aids developers in integrating Alexa into their devices. He is also instrumental in helping partners comprehend how acoustics and the audio path can impact the user experience. 

“With AVS, there’s a highly technical aspect of integrating the AVS software development kit into a device,” Pinzon says. “At the same time, there’s this critical audio component. You need to understand the acoustics of a typical room or apartment and what you can do with signal processing and microphones to make sure the audio is as clear as possible.”

The acoustic testing process

AVS testing and certification validate that Alexa-supported devices comply with Amazon device requirements. First, the device maker runs Amazon-provided self-tests; once a device has successfully passed all self-tests, it is ready to be tested by Amazon or an Authorized Third-Party Lab for certification. 

The quality of audio captured by an Alexa-supported device is fundamental to the entire customer experience, which is why testing sound quality is particularly vital. 

Acoustic testing aims to validate the device’s Automatic Speech Recognition performance and assess both far-field and near-field performance. A near-field scenario is one where the user is expected to be within a meter or two of the device, while a far-field scenario envisions the user being farther away.

Near-field and far-field tests are conducted in various locations, which mimic real-world user-device interaction patterns. The device’s performance is evaluated under various conditions such as in silence, during audio playback from the device, or in environments with background noise. Such noise could be from music playing on another device, stationary noise such as an air conditioner, or other ambient noise.

In all of these conditions, three parameters are assessed:

False rejection rate (FRR): The FRR refers to how often a device fails to respond to a wake word. For example, when the test speaker says the wake word “Alexa,” the device should wake up. If it doesn’t, that’s a false rejection.

False acceptance rate (FAR): The FAR gauges how often a device wakes up when it’s not supposed to, such as during a conversation. For example, if people are talking near the device, but not addressing it directly, it should remain in its dormant state.

Response accuracy rate (RAR): The RAR measures the accuracy of the device’s responses. A classic example of an error would be when a user asks, “Where is the moon?” and Alexa starts playing “The Moon Song.”
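As a rough illustration of how these three parameters could be tallied from a self-test run, here is a minimal Python sketch. The class, field, and function names are hypothetical, the counts are made-up example values, and normalizing false acceptances per hour of audio is an assumption made for this sketch; the official test procedures define the actual formulas and pass thresholds.

```python
from dataclasses import dataclass


@dataclass
class TrialCounts:
    """Hypothetical tallies from one test condition (not an AVS data format)."""
    wake_attempts: int       # times the wake word was actually spoken
    missed_wakes: int        # wake word spoken, device did not wake
    false_wakes: int         # device woke although no wake word was spoken
    test_hours: float        # hours of audio played without the wake word
    responses: int           # commands the device responded to
    correct_responses: int   # responses that matched the request


def false_rejection_rate(c: TrialCounts) -> float:
    """Fraction of spoken wake words that the device missed."""
    return c.missed_wakes / c.wake_attempts


def false_acceptances_per_hour(c: TrialCounts) -> float:
    """Unintended wake-ups per hour of audio (normalization is an assumption)."""
    return c.false_wakes / c.test_hours


def response_accuracy_rate(c: TrialCounts) -> float:
    """Fraction of responses that were correct for the request."""
    return c.correct_responses / c.responses


# Made-up example values for illustration only.
counts = TrialCounts(
    wake_attempts=200, missed_wakes=10,
    false_wakes=3, test_hours=24.0,
    responses=190, correct_responses=171,
)
print(f"FRR: {false_rejection_rate(counts):.1%}")
print(f"False acceptances per hour: {false_acceptances_per_hour(counts):.3f}")
print(f"RAR: {response_accuracy_rate(counts):.1%}")
```

Tracking the same tallies separately for each condition (silence, device playback, background noise) makes it easier to see which scenario is pulling a result down.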

Three tips for AVS acoustic certification

To prepare for AVS acoustic certification, Pinzon recommends that makers first familiarize themselves with the testing requirements. 

“Before even starting with the device, it’s important to know every facet of the testing process,” he says. “For example, from a security perspective, they need to know what types of chips to use. Makers need to be aware of these requirements so they can design the most optimal customer experiences accordingly.”

Pinzon encourages makers to keep the following tips in mind to optimize their audio quality:

  1. Understand the device’s application and location: The device’s application and location directly impact performance. Device makers need to consider if the device will be used in the kitchen, living room, bedroom, or garage, or whether it will be used outdoors or indoors. By considering these factors, developers can design the device to optimize its acoustic characteristics for the intended environment. 

    Developers should also consider whether the device will be primarily used for playing music or audio. No matter the use case, device makers should implement Acoustic Echo Cancellation, which removes the device’s own audio output from the microphone signal so that the device can pick up the user’s voice and command even while it is playing sound.

  2. Pay attention to noise sources: When an Alexa device encounters external noise, certain techniques can prevent it from interfering with the device’s speech recognition capabilities. 

    “I always recommend using at least two or more microphones to improve the signal-to-noise ratio so that the device can cancel the external noise and capture the user’s speech,” Pinzon says. 

    He recommends using techniques like beamforming to enhance the device’s ability to capture clear speech in the presence of background noise. Beamforming is a signal processing technique for multi-microphone arrays that emphasizes user speech from a desired direction while suppressing audio interference from other directions. These algorithms increase the signal-to-noise ratio and reduce reverberation in the customer’s speech from the desired direction, which improves the accuracy of speech recognition systems, especially in far-field use.

  3. Optimize for near field and far field: Pinzon highlights the importance of optimizing the device for both near-field and far-field interactions. He explains, “If the user is going to be close to the device, then the signal-to-noise ratio is important, which is something that will be magnified if the user is going to be farther away. In these cases, the device manufacturer needs to make sure that they have a good audio front end that will be able to clean the signal.”
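To make the multi-microphone idea from the tips above more concrete, the following is a minimal sketch of delay-and-sum beamforming, one of the simplest beamforming techniques. It is not the audio front end used by any Alexa device. It assumes the arrival delay at each microphone is already known and is a whole number of samples, and it simulates the user's speech as a sine tone with independent noise at each microphone. All function names and parameter values are hypothetical.

```python
import math
import random


def delay_and_sum(channels, delays):
    """Advance each channel by its known integer delay, then average them."""
    n_out = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[i + d] for ch, d in zip(channels, delays)) / len(channels)
        for i in range(n_out)
    ]


def snr_db(clean, noisy):
    """Signal-to-noise ratio in dB over the overlapping samples."""
    pairs = list(zip(clean, noisy))
    signal_power = sum(s * s for s, _ in pairs)
    noise_power = sum((x - s) ** 2 for s, x in pairs)
    return 10 * math.log10(signal_power / noise_power)


def simulate(n_mics=4, n_samples=4000, noise_std=0.5, seed=0):
    """Return (single-mic SNR, beamformed SNR) in dB for a simulated array."""
    rng = random.Random(seed)
    # Stand-in for speech: a 200 Hz tone sampled at 16 kHz (assumed values).
    clean = [math.sin(2 * math.pi * 200 * t / 16000) for t in range(n_samples)]
    # Hypothetical arrival delays in samples for the desired direction.
    delays = [2 * m for m in range(n_mics)]
    channels = [
        [(clean[i - d] if i >= d else 0.0) + rng.gauss(0.0, noise_std)
         for i in range(n_samples)]
        for d in delays
    ]
    beamformed = delay_and_sum(channels, delays)
    n_out = len(beamformed)
    # Microphone 0 has zero delay, so it is already aligned with `clean`.
    return snr_db(clean[:n_out], channels[0][:n_out]), snr_db(clean, beamformed)


single_mic_snr, beamformed_snr = simulate()
print(f"Single microphone SNR: {single_mic_snr:.1f} dB")
print(f"Beamformed SNR:        {beamformed_snr:.1f} dB")
```

Because the noise in this simulation is independent at each microphone, averaging four aligned channels should raise the signal-to-noise ratio by roughly 10·log10(4), or about 6 dB. Real rooms add reverberation and directional noise, so a production audio front end typically needs direction estimation, fractional delays, or adaptive filtering rather than fixed integer shifts.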

Getting the sound pickup right is a big part of ensuring that your Alexa device delivers a great customer experience. As Pinzon points out, “It’s extremely important that you are testing for audio in a thorough way, because voice is how the whole Alexa experience starts. You say what you want and then that utterance goes to the Alexa Cloud. If the audio has good quality, Alexa will understand it, and perform her magic.”

Visit the Alexa hardware configurations and the Alexa testing and certification process pages to learn more.
