Editor’s Note: We have changed the name of the Alexa skill in our beginner tutorial from Cake Walk to Cake Time given the term’s racially insensitive history.
Are you a web developer? It may surprise you to learn that developing Alexa skills isn’t all that different from developing a web application, and can be a natural extension of your skill set. Likewise, your Alexa skill can be a natural extension of your web application. When you build an Alexa custom skill, you can expand your reach to more than 100 million Alexa devices around the world. Although developing for voice may seem very different from developing for the web, in this blog post I’ll demonstrate the similarities between the two paradigms so that you can get ready to start building your own Alexa skill!
You may be wondering if you need to know machine learning (ML) or natural language processing (NLP) to build an Alexa skill. The good news is that when you develop an Alexa custom skill, the ML and NLP are abstracted away by the Alexa service, so you don’t need a deep understanding of these technologies. You can just focus on the creative and functional aspects of building great voice experiences. A great place to get started is by understanding the key concepts in voice design and how they relate to key web development concepts. For the rest of this post, I will make the following analogies that I hope will bridge the conceptual gap between web development and Alexa custom skill development.
| Web applications | Alexa skills |
|---|---|
| Web browsers | Alexa devices |
| HTML | Speech |
| CSS | Sound effects |
All applications—web, mobile, or voice—need a client for users to interface with. For web applications, this is a web browser, which provides a graphical user interface (GUI). Users interact by using a mouse and keyboard to point and click. With Alexa skills, customers can have a conversation through an Alexa device. The skill provides a voice user interface (VUI) instead of a mouse and keyboard.
For web applications, HTML adds elements and provides structure to a screen. For Alexa skills, the elements and structure are provided by the speech that is exchanged between Alexa and the user. Specifically, the speech portion that users interact with is designed through the interaction model.
The interaction model for a custom skill consists of an intent schema that defines the requests the skill can handle and a set of sample utterances that customers can say to invoke those requests. For this blog, I’d like to specifically highlight four components:
The invocation name is how Alexa identifies a specific custom skill. When combined with a launch phrase, as in “Alexa, open Cake Time” or “Alexa, launch Dynamic Heroes,” it launches the corresponding skill. In this way, an invocation name is similar to a domain name, and saying the launch phrase is similar to entering a URL in the browser.
Utterances are what you can say to express an intention or action that you’d like Alexa to help you with. Relating this to GUIs, this would be like clicking a button to perform a certain action on a webpage.
Utterances can also include slots, which are input arguments to an intent. This is similar to entering input arguments in an HTML form and submitting it.
An intent represents an action that fulfills the user’s spoken request. When an intent is triggered, data about the request is sent to the skill’s backend. This is like a webpage making an HTTP request after an HTML form is submitted.
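To make these four components concrete, here is an abbreviated sketch of an interaction model expressed as a Python dictionary. The invocation name, intent name, slots, and sample utterances below are illustrative examples, not the definitive model of any real skill, and a real model is authored as JSON in the Alexa developer console:

```python
# A sketch of a custom skill's interaction model. The intent, slots, and
# sample utterances are illustrative examples.
interaction_model = {
    "interactionModel": {
        "languageModel": {
            # Invocation name: analogous to a domain name.
            "invocationName": "cake time",
            "intents": [
                {
                    # Intent: the action the user wants fulfilled.
                    "name": "CaptureBirthdayIntent",
                    # Slots: input arguments, like HTML form fields.
                    "slots": [
                        {"name": "month", "type": "AMAZON.Month"},
                        {"name": "day", "type": "AMAZON.Ordinal"},
                    ],
                    # Sample utterances: what the user can say, like
                    # clicking a button or submitting a form.
                    "samples": [
                        "my birthday is {month} {day}",
                        "I was born on {month} {day}",
                    ],
                }
            ],
        }
    }
}

if __name__ == "__main__":
    model = interaction_model["interactionModel"]["languageModel"]
    print(model["invocationName"])      # cake time
    print(model["intents"][0]["name"])  # CaptureBirthdayIntent
```

Note how each piece maps back to the table above: the utterances and slots are the VUI equivalent of a form, and the intent is the request that form submission produces.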
For more information, check out our documentation about creating a custom interaction model.
Note: In order to handle intents (a separate topic not covered in this blog post), the skill needs a backend that’s set up with intent handlers to process and respond to intents. This is like how a web server’s router routes requests to the correct API endpoint.
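To illustrate the routing analogy, here is a minimal, hypothetical sketch in Python of a backend dispatching incoming intents to handler functions, much as a web router maps URL paths to endpoints. The request shape and handler names are simplified assumptions, not the actual ASK SDK API:

```python
# Hypothetical intent routing sketch; the real ASK SDKs provide richer
# handler and response-builder abstractions.

def handle_launch(request):
    return "Welcome! When is your birthday?"

def handle_capture_birthday(request):
    # Read a slot value, like reading a submitted form field.
    month = request["slots"].get("month", "sometime")
    return f"Got it, your birthday is in {month}."

# The router: intent name -> handler, like URL path -> endpoint.
INTENT_ROUTES = {
    "LaunchRequest": handle_launch,
    "CaptureBirthdayIntent": handle_capture_birthday,
}

def dispatch(request):
    handler = INTENT_ROUTES.get(request["intent"])
    if handler is None:
        return "Sorry, I didn't understand that."
    return handler(request)

if __name__ == "__main__":
    print(dispatch({"intent": "LaunchRequest", "slots": {}}))
    print(dispatch({"intent": "CaptureBirthdayIntent",
                    "slots": {"month": "July"}}))
```

The fallback branch plays the same role as a 404 handler in a web framework: an unrecognized request still gets a graceful response.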
In a webpage, CSS is what adds personality and style to the page’s content. In a VUI, sound effects fill this role. You can add a variety of sound effects to your skill using the Alexa Skills Kit Sound Library, or even make your own custom sound effects. You can use Speech Synthesis Markup Language (SSML) tags to add sound effects to your skill.
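As a rough sketch of how this works, SSML wraps the text Alexa speaks in markup, and an `audio` tag can mix in a sound clip before the speech. The helper below builds such a string in Python; the `build_ssml` helper and the soundbank URI shown are illustrative assumptions, not part of any SDK:

```python
# Hypothetical helper that wraps spoken text, and optionally a sound
# effect, in SSML. The soundbank URI below is only an example of the
# Sound Library URI format.

def build_ssml(text, audio_src=None):
    """Return an SSML string: optional sound effect, then spoken text."""
    audio = f'<audio src="{audio_src}"/>' if audio_src else ""
    return f"<speak>{audio}{text}</speak>"

if __name__ == "__main__":
    print(build_ssml(
        "Happy birthday!",
        audio_src="soundbank://soundlibrary/musical/amzn_sfx_trumpet_bugle_04",
    ))
```

Just as a stylesheet changes how the same HTML looks, the same spoken sentence can land very differently depending on the sounds and SSML effects around it.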
As you can see, developing an Alexa skill is not so different from building a web application. I hope this blog has provided a good conceptual analogy between web applications and Alexa skills. You can also view a video version of this blog post on YouTube.
As a next step, I recommend taking our beginner training course to learn how to design and build your first Alexa custom skill. We look forward to seeing what you build!