Understanding Alexa Skills
Welcome to Module 1 of the beginner workshop about how to build an engaging Alexa skill. In this module, you'll learn why you should build an Alexa skill.
Time required: 10 - 20 minutes
What you'll learn:
- Why you should build Alexa skills
- What type of Alexa skills you can create
- How an Alexa skill works
- The steps to build a skill
Why build Alexa skills?
Ease of access
Voice user interfaces (VUIs) are natural, conversational, and user-centric.
When you build a great voice experience, you allow for the many ways people express meaning and intent. VUIs are rich and flexible. Because of this, you don't build for voice the same way as you build graphical user interfaces (GUIs) for the web or mobile.
The easier a skill is to use, the more speed and efficiency it offers.
Speed and efficiency
Alexa skills make everyday tasks faster and more efficient by using voice instead of keyboards, mouse clicks, or touches.
Consider the kitchen timer. With Alexa, you can easily set a timer by saying, "Alexa, set a timer for 10 minutes." Who guessed that pushing a few buttons on the microwave would become the less convenient option?
Skill monetization
You can build skills to support existing businesses or build skills as businesses. Sell physical and digital products that range from home delivery of physical pizzas to in-game delivery of virtual plant food.
What type of skill do you want to build?
The functionality you want to implement determines how your skill integrates with the Alexa service and what you need to develop. Your skill idea might fit one of the Alexa pre-built voice interaction models, or your idea might require that you design your own custom voice interaction model.
Do you want to make money selling digital content in your skill? You can sell engaging content to customers through a subscription, one-time purchase, or consumables.
For example, you build a knowledge-sharing skill that helps teach the user a process or task. You could start with free introductory content to earn the user's trust that the skill is valuable. Then you could sell access to premium content that is more sophisticated and more valuable.
Want to build a custom skill?
For custom skills, you define the voice interaction model. The custom model gives you the most flexibility and control over the skill design and code.
The following examples show how a user might interact with a custom skill:
- "Alexa, order a pizza."
- "Alexa, book a taxi."
With a custom skill, you can engage the user in a game, such as word puzzles or trivia, or just about any other action you can imagine!
As the custom skill builder, you do the following things:
- Define the requests the skill can handle.
- Define the name Alexa uses to identify your skill.
- Write the code to fulfill the request.
In Module 3, you will learn how to develop a custom skill using the Alexa Skills Kit.
Want to use a pre-built model?
In the pre-built voice interaction model, the Alexa Skills Kit (ASK) defines the set of words users say to invoke a skill. For example, a user can say, "Alexa, turn on the light." or "Alexa, turn off the television." You simply define your skill to accept these predefined requests. Alexa offers a number of different pre-built skill types for you to choose from.
Smart home skills
This type of skill controls smart home devices, such as cameras, lights, locks, thermostats, and smart TVs. The Smart Home Skill API gives you less control over a user's experience but simplifies development because you don't need to create the VUI yourself.
When a user invokes the skill, it takes a single request. The following examples show things a user can say:
- "Alexa, turn on the living room lights."
- "Alexa, increase the temperature by two degrees."
- "Alexa, show the front door camera."
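To make the "predefined requests" idea concrete, here is a minimal sketch of how skill code might answer a power-control request. It assumes the request arrives as a directive in the Alexa.PowerController namespace and that the skill replies with a response event plus the new power state, which is how the Smart Home Skill API is generally documented; check the field names against the current API reference before relying on them. The function name, endpoint ID, and token values are hypothetical.

```python
import uuid
from datetime import datetime, timezone


def handle_power_directive(request: dict) -> dict:
    """Build a response for a TurnOn/TurnOff directive (illustrative sketch)."""
    directive = request["directive"]
    header = directive["header"]

    # Map the predefined directive name to the resulting power state.
    power_state = "ON" if header["name"] == "TurnOn" else "OFF"

    # A real skill would call the device maker's cloud here (omitted in this sketch).

    return {
        "event": {
            "header": {
                "namespace": "Alexa",
                "name": "Response",
                "payloadVersion": "3",
                "messageId": str(uuid.uuid4()),
                "correlationToken": header.get("correlationToken"),
            },
            "endpoint": {"endpointId": directive["endpoint"]["endpointId"]},
            "payload": {},
        },
        "context": {
            "properties": [
                {
                    "namespace": "Alexa.PowerController",
                    "name": "powerState",
                    "value": power_state,
                    "timeOfSample": datetime.now(timezone.utc).strftime(
                        "%Y-%m-%dT%H:%M:%SZ"
                    ),
                    "uncertaintyInMilliseconds": 500,
                }
            ]
        },
    }


# Hypothetical directive for "Alexa, turn on the living room lights."
sample_directive = {
    "directive": {
        "header": {
            "namespace": "Alexa.PowerController",
            "name": "TurnOn",
            "payloadVersion": "3",
            "messageId": "example-message-id",
            "correlationToken": "example-correlation-token",
        },
        "endpoint": {"endpointId": "living-room-lights"},
        "payload": {},
    }
}

response = handle_power_directive(sample_directive)
print(response["context"]["properties"][0]["value"])  # ON
```

Notice that the code never parses the user's words. The pre-built model has already turned the utterance into a structured directive, which is the trade-off described above: less control over the conversation, less to build.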
Flash briefing skills
Use the Flash Briefing Skill API to provide your users with news headlines and other short content, called flash briefings. The following examples show requests a user can make:
- "Alexa, give me my flash briefing."
- "Alexa, tell me the news."
As the skill developer, you define the content feeds for the requested flash briefing. These feeds can contain audio content played to the user or text content read to the user.
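As a rough illustration of what a content feed can look like, the sketch below builds one text item and one audio item and serializes them as JSON. The field names (uid, updateDate, titleText, mainText, streamUrl, redirectionUrl) follow the Flash Briefing feed format as commonly documented, but treat them as an assumption to verify against the API reference. The helper name, URLs, and content are hypothetical.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def make_feed_item(uid: str, title: str, text: str, updated: datetime,
                   link: str, stream_url: Optional[str] = None) -> dict:
    """Return one feed item; text items are read aloud, audio items are played."""
    item = {
        "uid": uid,
        "updateDate": updated.strftime("%Y-%m-%dT%H:%M:%S.0Z"),
        "titleText": title,
        "mainText": text,
        "redirectionUrl": link,
    }
    if stream_url is not None:
        # Audio content: Alexa plays the stream instead of reading mainText.
        item["streamUrl"] = stream_url
    return item


published = datetime(2024, 1, 15, 9, 0, tzinfo=timezone.utc)

feed = [
    make_feed_item(
        "item-001", "Morning headlines",
        "Here are the top stories for today.",
        published, "https://example.com/headlines",
    ),
    make_feed_item(
        "item-002", "Daily audio update", "",
        published, "https://example.com/audio",
        stream_url="https://example.com/audio/update.mp3",
    ),
]

print(json.dumps(feed, indent=2))
```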
Video skills
Use the Video Skill API to provide video content, such as TV shows and movies, for users. The following examples show requests a user can make:
- "Alexa, play Star Wars: Return of the Jedi."
- "Alexa, change the TV to channel four."
As the skill developer, you define the requests the skill can handle, such as searching for and playing video content, and how video content search results display on Alexa-enabled devices.
Music skills
Use the Music Skill API to provide audio content, such as songs, playlists, or radio stations, for users. For example, a user can make the following requests:
- "Alexa, play some music."
- "Alexa, play jazz."
This API handles the words a user can say to request and control audio content. These spoken words turn into requests that are sent to your skill. Your skill handles these requests, and then responds by sending back audio content for the user on an Alexa-enabled device.
Those are just a few examples of pre-built skills that could help speed up your development.
How an Alexa skill works
The following simple workflow demonstrates how an Alexa skill works. In this example, the user invokes a simple Alexa skill called "Hello World."
- To launch the skill, the user says, "Alexa, open Hello World." A complete statement or request to Alexa by the user is called an utterance.
- The Alexa-enabled device sends the utterance to the Alexa service in the cloud. The Alexa service processes the utterance through automatic speech recognition to convert the utterance to text, and then through natural language understanding to recognize the intent of the text.
- Alexa sends a JavaScript Object Notation (JSON) request that describes the intent to the compute service that hosts your skill (usually an AWS Lambda function). The compute service acts as the back end and executes your skill code to handle the intent. In this case, the skill code returns, "Welcome to the Hello World skill."
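The last step of the Hello World workflow can be sketched without any SDK. The example below assumes the skill is hosted as a Python AWS Lambda function and that the launch arrives as a request of type LaunchRequest; the sample event is heavily trimmed and its ID is a placeholder, since a real request carries more fields (session, context, timestamps).

```python
def lambda_handler(event, context):
    """Entry point that AWS Lambda calls with the JSON request as a dict."""
    request_type = event["request"]["type"]

    if request_type == "LaunchRequest":
        speech = "Welcome to the Hello World skill."
    else:
        # This sketch only covers the launch step from the workflow above.
        speech = "Sorry, this sketch only handles the launch request."

    # The JSON response that the Alexa service converts to speech.
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "shouldEndSession": True,
        },
    }


# Trimmed, hypothetical request for "Alexa, open Hello World."
sample_event = {
    "version": "1.0",
    "request": {
        "type": "LaunchRequest",
        "requestId": "example-request-id",
        "locale": "en-US",
    },
}

result = lambda_handler(sample_event, None)
print(result["response"]["outputSpeech"]["text"])
```

In practice you would more likely use the ASK SDK than build the response dictionary by hand, but the raw form shows what actually travels between the Alexa service and your code.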
The following diagram demonstrates what happens when a user interacts with an Alexa skill. It assumes that you are using AWS Lambda's serverless compute service to host your skill code.
Watch the following video to understand what happens when a user interacts with an Alexa skill.
- The user says the wake word, "Alexa."
- Alexa hears the wake word, and then listens.
- Alexa captures the audio, and then sends it to the Alexa service.
- The Alexa service uses the interaction model to figure out where to route the request.
- The Alexa service sends a JSON request to the skill's Lambda function.
- The Lambda function inspects the JSON request.
- The Lambda function determines how to respond.
- The Lambda function sends a JSON response to the Alexa service.
- The Alexa service receives the JSON response, and then converts the output text into an audio file.
- The Alexa service sends the audio file to the Alexa-enabled device.
- The Alexa-enabled device receives the audio file, and then plays the audio.
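The middle of this sequence, where the function inspects the JSON request and determines how to respond, usually comes down to routing on the request type and, for intent requests, on the intent name. The following is a minimal sketch of that routing in plain Python. HelloWorldIntent and all of the handler function names are hypothetical; AMAZON.HelpIntent is one of the built-in intents.

```python
from typing import Callable, Dict


def build_response(text: str, end_session: bool = False) -> dict:
    """Wrap plain text in the JSON response shape sent back to Alexa."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def on_launch(request: dict) -> dict:
    return build_response("Welcome. You can say hello.")


def on_hello(request: dict) -> dict:
    return build_response("Hello World!", end_session=True)


def on_help(request: dict) -> dict:
    return build_response("Try saying hello.")


# Intent name -> handler. Add an entry for each intent in your model.
INTENT_HANDLERS: Dict[str, Callable[[dict], dict]] = {
    "HelloWorldIntent": on_hello,
    "AMAZON.HelpIntent": on_help,
}


def route(event: dict) -> dict:
    """Inspect the request and pick the handler that builds the response."""
    request = event["request"]
    if request["type"] == "LaunchRequest":
        return on_launch(request)
    if request["type"] == "IntentRequest":
        handler = INTENT_HANDLERS.get(request["intent"]["name"])
        if handler is not None:
            return handler(request)
        return build_response("Sorry, I don't know how to help with that.")
    # For example SessionEndedRequest: there is nothing to say.
    return {"version": "1.0", "response": {}}


event = {
    "version": "1.0",
    "request": {"type": "IntentRequest", "intent": {"name": "HelloWorldIntent"}},
}
print(route(event)["response"]["outputSpeech"]["text"])  # Hello World!
```

A lookup table keeps the routing logic in one place, so supporting a new intent means adding one function and one dictionary entry.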
What are the steps to build a skill?
When you build a skill, you complete four major steps.
Step 1: Design
Begin by designing the voice interaction model of your skill. When you start to design, you quickly understand that designing for voice is different from designing mobile or web-based apps. You need to think about all the ways a user might interact with your voice skill. To provide a fluid and natural voice experience, have people role-play interactions with your skill. Understand how people naturally speak to Alexa, and then write down the interactions. Also, if you have a multi-modal experience, which takes advantage of a screen to combine voice and visuals, plan how both the voice-only and voice-plus-visuals experiences will work.
Step 2: Build
When you have your interaction model ready to implement, build the utterances, intents, and slots in the Alexa developer console.
The developer console saves your built interaction model in JSON format, and you can edit the model with any editing tool. After you have your JSON interaction model ready, build the compute service Lambda function in the AWS Management Console.
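For a sense of what that JSON contains, here is a small sketch of an interaction model expressed as a Python dictionary, together with a helper that checks that every slot used in a sample utterance is declared on its intent. The overall shape (interactionModel, languageModel, invocationName, intents, samples, slots, types) mirrors what the developer console exports; the invocation name, the OrderPizzaIntent intent, the PIZZA_SIZE slot type, and the helper function are hypothetical.

```python
import json
import re

interaction_model = {
    "interactionModel": {
        "languageModel": {
            # The name users say to open the skill.
            "invocationName": "pizza helper",
            "intents": [
                {
                    "name": "OrderPizzaIntent",
                    "slots": [{"name": "size", "type": "PIZZA_SIZE"}],
                    # Utterances; {size} marks where the slot value goes.
                    "samples": [
                        "order a pizza",
                        "order a {size} pizza",
                        "I want a {size} pizza",
                    ],
                },
                {"name": "AMAZON.HelpIntent", "samples": []},
                {"name": "AMAZON.StopIntent", "samples": []},
            ],
            "types": [
                {
                    "name": "PIZZA_SIZE",
                    "values": [
                        {"name": {"value": "small"}},
                        {"name": {"value": "medium"}},
                        {"name": {"value": "large"}},
                    ],
                }
            ],
        }
    }
}


def undeclared_slots(model: dict) -> list:
    """Return (intent, slot) pairs used in samples but not declared."""
    problems = []
    for intent in model["interactionModel"]["languageModel"]["intents"]:
        declared = {slot["name"] for slot in intent.get("slots", [])}
        for sample in intent.get("samples", []):
            for used in re.findall(r"\{(\w+)\}", sample):
                if used not in declared:
                    problems.append((intent["name"], used))
    return problems


print(undeclared_slots(interaction_model))  # []
print(json.dumps(interaction_model, indent=2)[:60])
```

Keeping the model as data makes it easy to review, version, and sanity-check outside the console before you import it.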
Select the programming language you want, the corresponding ASK software development kit (SDK), and then begin coding your skill. The ASK SDK and AWS Lambda support Node.js, Python, and Java.
You can build and host most skills for free with AWS Lambda. The service is free for the first one million calls per month. You can provision your own Lambda endpoint or use an Alexa-hosted skill. With an Alexa-hosted skill, Alexa provisions the resources to run your skill for you without the need to create an AWS account.
When the Lambda function is ready, connect it to your skill, and then test the skill in the Alexa developer console. Take modules 3, 4, 5, and 6 of this course to learn how to build your skill in the console and host your skill as an AWS Lambda function.
Step 3: Test
The Alexa developer console has a built-in Alexa simulator. Testing with the simulator is similar to testing on an actual Alexa-enabled device.
After you test your skill with the Alexa simulator, gather user feedback to resolve issues and make improvements before submitting your skill for certification.
Repeat steps 2 and 3 until you're ready to certify and publish your skill with Amazon.
Step 4: Certify and publish
After beta testing your skill, submit it for certification. After your skill passes certification, Amazon publishes it in the Alexa Skills Store for anyone to discover and use. Upon publication, you can start promoting your skill to reach more customers.
Wrap-up
These four steps are the main development phases for building Alexa skills.
You will dive deeper into each step in subsequent modules of this workshop.
Requirements to build a skill with this workshop
You need the following items to proceed with the workshop:
- An Amazon account on the Alexa developer console. To create your Amazon account, go to the Developer Console Sign-In page, and then, under New to Amazon, click Create your Amazon Account. The console is where you build and optimize your skill.
- (Optional) An Alexa-enabled device for testing. Skills work with all Alexa-enabled devices, such as the Amazon Echo, Echo Dot, Fire TV Cube, and devices that use the Alexa Voice Service (AVS). If you don't have a device, you can use the Alexa simulator in the developer console. Through the simulator, you can hear Alexa's voice responses and see any visual responses.
In the next module, you will learn about the design process and key concepts related to building an interaction model for a custom skill.