Major device makers are deeply invested in conversational UI: teaching their connected devices to understand, and respond to, natural human speech. At Amazon, we provide Alexa development toolkits that make it easier to build voice-forward, context-aware products. Choosing the right toolkit requires you to first determine what you want to build, and then learn the nuances of our various services, tools, and terminology.
We’re starting a new series on the Alexa Blog where we dive into the components of the Alexa Voice Service (AVS). The series will cover the ins and outs of the service, from integration best practices and how-to guides for new features to conceptual user experience (UX) guidelines and frequently asked questions from our community. For this introductory post, we’ll define some common products and terms you’ll see as you invest time in building with Alexa, starting with AVS and the Alexa Skills Kit (ASK), and give you a high-level overview of our interaction model.
A lot of device makers ask what the differences are between AVS and ASK.
AVS enables you to integrate Alexa directly into your physical connected products, like a smart speaker, thermostat, or car – extending Alexa beyond the Echo family of devices. Device makers can integrate with AVS without building a skill, adding cloud-based intelligence and voice control to their products. We define products with an AVS integration as “Alexa-enabled.” AVS also provides hardware development kits, software development kits, and technical documentation to make the integration process faster and easier.
ASK is all about making Alexa smarter by building new capabilities, or skills, that extend Alexa’s feature set. With ASK, you can build a skill that enables Alexa to control a third-party smart home device like a light switch (Smart Home API), a connected TV or set-top box from companies like DISH (Video Skill API), or to provide services relevant to your business. For example, the Domino’s skill makes it simple to order a pizza with Alexa—just ask! Best of all, skills work on every Alexa-enabled device.
Think about it this way: You can make Alexa smarter using ASK, and you can integrate Alexa into new devices with AVS.
Let’s Start with the Basics
For the remainder of this post, we’ll focus on the AVS Interaction Model. You’ll need to understand how to establish a connection with AVS and how an interaction happens between the device and AVS.
AVS is exposed to developers via a set of APIs, categorized broadly by the kind of interaction taking place. The back and forth between your device and AVS happens over a single HTTP/2 connection.
Establishing the Connection with AVS
The HTTP/2 stream’s strength and stability are what make or break magical Alexa experiences. We have detailed instructions for how to build and maintain this connection, but at a high level, there are some best practices to keep in mind:
- Build a connection for your device that’s stable enough to send ping traffic approximately every five minutes. This informs AVS that the device is still there.
- Send your audio payload in small pieces to avoid latency – approximately 10ms chunks of 320 bytes each. Larger payloads take the service longer to process, which can create a jarring experience for the user.
- Watch your connection. If it breaks for any reason, rebuild it. AVS requires an open downchannel for communication, and depending on the HTTP/2 client you use, you may experience a timeout after a period of no activity.
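To make the audio guidance above concrete, here's a minimal sketch of splitting a captured PCM buffer into ~10ms chunks before streaming. It assumes 16 kHz, 16-bit, mono audio (at that format, 10ms works out to exactly 320 bytes); the constant names and helper function are our own, not part of any AVS SDK:

```python
# Assumed format: 16 kHz sample rate, 16-bit (2-byte) samples, mono.
# 10 ms of audio = 16000 samples/s * 2 bytes * 0.010 s = 320 bytes.
CHUNK_SIZE_BYTES = 320        # ~10 ms of audio per chunk
PING_INTERVAL_SECONDS = 300   # ping roughly every five minutes (see above)

def chunk_audio(pcm_bytes: bytes, chunk_size: int = CHUNK_SIZE_BYTES):
    """Yield successive chunk_size pieces of an audio buffer for streaming."""
    for offset in range(0, len(pcm_bytes), chunk_size):
        yield pcm_bytes[offset:offset + chunk_size]

# Example: one second of silence (16,000 samples * 2 bytes = 32,000 bytes)
# splits into 100 chunks of 320 bytes each.
chunks = list(chunk_audio(bytes(32000)))
```

Streaming many small chunks like this lets the service start processing speech while the user is still talking, rather than waiting for one large upload to finish.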
Once an interaction is started between Alexa and your device, we require that you keep that interaction in focus on your client, so Alexa can properly respond as the conversation unfolds. An inconsistent connection can lead to low-quality customer experiences, where end users ask Alexa a question and the service is unable to process or respond to the request.
With a healthy connection, your device has the ability to follow the AVS interaction model and handle the various interactions associated with common use cases. In AVS terms, these interactions are known as Directives and Events. Events are messages shuttled up to AVS notifying Alexa that something has occurred, such as a user request, and Directives are messages sent from AVS to the device to perform a specific action, such as play music.
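As a rough illustration of what an Event looks like on the wire: AVS messages are JSON, with a header identifying the interface and carrying a unique message ID, plus a payload. The sketch below builds a `SpeechRecognizer.Recognize`-style Event; the field names follow the public AVS documentation, but treat the exact values (like the `profile` and `format` strings) as illustrative rather than authoritative:

```python
import json
import uuid

def build_recognize_event(dialog_request_id: str) -> dict:
    """Sketch of an AVS Event: a header (interface namespace, message name,
    unique messageId) plus a payload. Shape is illustrative, per AVS docs."""
    return {
        "event": {
            "header": {
                "namespace": "SpeechRecognizer",
                "name": "Recognize",
                "messageId": str(uuid.uuid4()),        # unique per message
                "dialogRequestId": dialog_request_id,  # ties the turn together
            },
            "payload": {
                "profile": "NEAR_FIELD",
                "format": "AUDIO_L16_RATE_16000_CHANNELS_1",
            },
        }
    }

event = build_recognize_event(str(uuid.uuid4()))
print(json.dumps(event, indent=2))
```

A Directive coming back from AVS has the same header-plus-payload shape, just flowing in the other direction.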
An Alexa-enabled device will regularly encounter Events and Directives, so it’s useful to spend some time understanding the differences between them. There are a few specific requirements for how the client on your device creates unique IDs, tracks the conversation, and processes the audio. You can dig deeper in our Interaction Model overview.
Now that you know the basics of AVS – how it’s different from building a skill, the unique terms you’ll encounter during integration, and best practices for establishing a connection with the Alexa cloud – it’s time to start building!
You can get started with AVS in two ways:
- Commercial developers looking to build Alexa-enabled products should use our AVS Device SDK and view available Development Kits for AVS.
- Developers interested in spinning up a working Alexa prototype on a Raspberry Pi in 60 minutes or less can use our Java-based sample app.
We’ll be publishing more technical deep dives into the various features, interfaces, and concepts that make up AVS over the coming weeks. You can subscribe to monthly email updates from us by visiting our Quick Start Guide.