Cake Walk: Build an Engaging Alexa Skill

Why build Alexa skills?

What’s so appealing about Alexa?

As voice services like Amazon Alexa gain in popularity, consumers are adopting voice user interfaces (VUIs) to play games, get the latest news, and control their growing number of smart home devices.

Ease of access

VUIs are natural, conversational, and user-centric.

A great voice experience allows for the many ways people express meaning and intent. It is rich and flexible. Because of this, building for voice isn’t the same as building graphical user interfaces (GUIs) for the web or mobile.

The easier a skill is to use, the more speed and efficiency it offers.

Speed and efficiency

Alexa skills bring speed and efficiency to mundane or habitual tasks—which is why voice is poised to become ubiquitous.

Consider the kitchen timer. With Alexa, setting a timer is as easy as saying, “Alexa, set timer for 10 minutes.” Who would have guessed pushing a few buttons on the microwave would become the less convenient option?

Skill monetization

Make money selling digital content in your skill. You can sell engaging content to customers as in-skill products through a subscription, one-time purchase, or consumables.

For example, let's say you build a knowledge-sharing skill that helps teach the user a process or task. You could start with free introductory content to earn the user's trust that the skill is valuable. Then, you could sell access to premium content that is more sophisticated and valuable.

What type of skill do you want to create?

With the Alexa Skills Kit (ASK), you can create different types of skills. ASK offers four pre-built interaction models you can use, or you can build a completely custom skill. The pre-built interaction models include predefined requests and utterances to help you start building quickly. You can customize them to your liking.

Below are four examples of skills with pre-built interaction models.

Smart home skill

Use the Smart Home Skill API to build a smart home skill with a pre-built model. This type of skill controls smart home devices such as cameras, lights, locks, thermostats, and smart TVs. The Smart Home Skill API gives you less control over a user's experience but simplifies development because you don't need to create the VUI yourself.
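With a pre-built model, the skill receives an already-interpreted directive rather than raw speech. The following is a minimal sketch of how a backend might handle a "turn on" request, assuming the directive and response shapes of version 3 of the Smart Home Skill API as commonly documented. The endpoint ID, the DEVICE_STATE dictionary, and the set_power helper are hypothetical stand-ins for a real device cloud, so check the field names against the API reference before relying on them.

```python
import uuid
from datetime import datetime, timezone

# Hypothetical in-memory device state; a real skill would call the device maker's cloud.
DEVICE_STATE = {"living-room-lights": "OFF"}

# Maps the directive name to the power state it requests.
POWER_STATES = {"TurnOn": "ON", "TurnOff": "OFF"}


def set_power(endpoint_id, state):
    """Hypothetical helper standing in for a call to the device cloud."""
    DEVICE_STATE[endpoint_id] = state


def handle_directive(event):
    """Handle an Alexa.PowerController directive and build a response event."""
    directive = event["directive"]
    header = directive["header"]
    endpoint_id = directive["endpoint"]["endpointId"]

    if header["namespace"] != "Alexa.PowerController":
        raise ValueError("unsupported namespace: " + header["namespace"])
    if header["name"] not in POWER_STATES:
        raise ValueError("unsupported directive: " + header["name"])

    state = POWER_STATES[header["name"]]
    set_power(endpoint_id, state)

    # Assumed response shape: an Alexa.Response event plus the new property state.
    return {
        "event": {
            "header": {
                "namespace": "Alexa",
                "name": "Response",
                "payloadVersion": "3",
                "messageId": str(uuid.uuid4()),
                "correlationToken": header.get("correlationToken"),
            },
            "endpoint": {"endpointId": endpoint_id},
            "payload": {},
        },
        "context": {
            "properties": [{
                "namespace": "Alexa.PowerController",
                "name": "powerState",
                "value": state,
                "timeOfSample": datetime.now(timezone.utc).isoformat(),
                "uncertaintyInMilliseconds": 500,
            }]
        },
    }
```

Note that the code never parses the spoken words. The pre-built model has already turned the request into a namespace and a name, which is the trade-off described above: less control over the conversation, but no VUI to design.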

Invoking the skill is also very easy. A user can make requests such as the following:

"Alexa, turn on the living room lights"

"Alexa, increase the temperature by two degrees"

"Alexa, show the front door camera"

Flash briefing skill

Use the Flash Briefing Skill API to provide your customers with news headlines and other short content. A user can make requests such as the following:

"Alexa, give me my flash briefing"

"Alexa, tell me the news"

As the skill developer, you define the content feeds for the requested flash briefing. These feeds can contain audio content played to the user or text content read to the user.
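As a rough illustration of what such a feed might contain, the sketch below builds one text item and one audio item as JSON. The field names (uid, updateDate, titleText, mainText, streamUrl) follow the flash briefing feed format as commonly documented, but treat them as assumptions and confirm them against the API reference. The make_feed_item helper, the item contents, and the example.com URL are hypothetical.

```python
import json
from datetime import datetime, timezone


def make_feed_item(uid, title, text="", audio_url=None, now=None):
    """Build one feed item; streamUrl is only set for audio content."""
    now = now or datetime.now(timezone.utc)
    item = {
        "uid": uid,
        "updateDate": now.strftime("%Y-%m-%dT%H:%M:%S.0Z"),
        "titleText": title,
        "mainText": text,  # read aloud by Alexa when there is no audio
    }
    if audio_url:
        item["streamUrl"] = audio_url  # played to the user instead of mainText
    return item


# A text item that Alexa reads to the user.
text_item = make_feed_item(
    "item-001", "Morning headline", text="Here is today's top story."
)

# An audio item that Alexa plays to the user (placeholder URL).
audio_item = make_feed_item(
    "item-002", "Daily podcast", audio_url="https://example.com/audio/daily.mp3"
)

feed_json = json.dumps([text_item, audio_item], indent=2)
print(feed_json)
```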

Video skills

Use the Video Skill API to provide video content such as TV shows and movies for users. A user can make requests such as the following:

"Alexa, play Manchester by the Sea"

"Alexa, change the TV to channel 4"

As the skill developer, you define the requests the skill can handle, such as searching for and playing video content, and how video content search results display on Alexa-enabled devices.

Music skills

Use the Music Skill API to provide audio content such as songs, playlists, or radio stations for users. A user can make requests such as the following:

"Alexa, play some music"

"Alexa, play jazz"

This API handles the words a user can say to request and control audio content. These spoken words turn into requests that are sent to your skill. Your skill handles these requests and responds appropriately, sending back audio content for the user on an Alexa-enabled device.

Note: Currently, music skills are supported only in the United States.

These are just a few examples of pre-built skills that could help speed up your development.

Building a custom skill

For custom skills, you define the interaction model. Therefore, you have flexibility and control over the skill design and code.

Here are a few examples of how a user might interact with a custom skill:

  • “Alexa, order a pizza”
  • “Alexa, book a taxi”

With a custom skill, you can engage the user in a game, such as word puzzles or trivia, or just about any other action you can imagine!

As the skill developer, you:

  • Define the requests the skill can handle
  • Define the name Alexa uses to identify your skill, called the invocation name, which you will learn more about in the next module
  • Write the code to fulfill the request

Throughout this course, you will learn how to develop a custom skill using the ASK.

How an Alexa skill works

The following is a simple workflow that demonstrates how Alexa works. In this example, the user invokes a simple Alexa skill called Hello World.

1. To launch the skill, the user says, "Alexa, open Hello World."

2. The Alexa-enabled device sends the utterance to the Alexa service in the cloud. There, automatic speech recognition converts the utterance to text, and natural language understanding recognizes the intent of the text.

3. To handle the intent, Alexa sends a JavaScript Object Notation (JSON) request to an AWS Lambda function in the cloud. The Lambda function acts as a backend and executes code to handle the intent. In this case, the Lambda function returns, "Welcome to the Hello World skill."

The following sequence shows what happens when a user interacts with an Alexa skill. It assumes you are using AWS Lambda, a serverless compute service, to host your skill code.

  • The user says the wake word, Alexa.
  • Alexa hears the wake word and listens.
  • The Alexa service uses the interaction model to figure out where to route the request.
  • A JSON request is sent to the skill's Lambda function.
  • The Lambda function inspects the JSON request.
  • The Lambda function determines how to respond.
  • The Lambda function sends a JSON response to the Alexa service.
  • The Alexa service receives the JSON response and converts the output text to an audio file.
  • The Alexa-enabled device receives and plays the audio.
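The Lambda side of this sequence (inspect the request, decide how to respond, send back JSON) can be sketched as follows. This is a minimal illustration written against the raw request and response JSON rather than an ASK SDK. The intent name HelloWorldIntent and the build_response helper are hypothetical, and a production skill would also handle the built-in help, cancel, and stop intents.

```python
def build_response(speech_text, end_session=True):
    """Wrap plain text in the JSON envelope the Alexa service expects."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech_text},
            "shouldEndSession": end_session,
        },
    }


def lambda_handler(event, context):
    """Entry point: inspect the JSON request and decide how to respond."""
    request = event["request"]

    # "Alexa, open Hello World" arrives as a LaunchRequest.
    if request["type"] == "LaunchRequest":
        return build_response("Welcome to the Hello World skill.", end_session=False)

    # Recognized intents arrive as an IntentRequest carrying the intent name.
    if request["type"] == "IntentRequest":
        if request["intent"]["name"] == "HelloWorldIntent":  # hypothetical intent
            return build_response("Hello World!")
        return build_response("Sorry, I don't know how to help with that.")

    # SessionEndedRequest and anything else: return an empty response.
    return {"version": "1.0", "response": {}}


# Local usage example with a hand-built launch request.
launch_event = {"version": "1.0", "request": {"type": "LaunchRequest"}}
print(lambda_handler(launch_event, None)["response"]["outputSpeech"]["text"])
```

The text in outputSpeech is what the Alexa service converts to audio in the last two steps of the sequence.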

Steps to build a skill

Follow these steps to build your skill with the ASK.

Step 1: Design

Begin by designing the voice interaction model of your skill. Once you start designing, you will quickly see that designing for voice is different from designing mobile or web-based apps. You need to think about all the different ways a user might interact with your skill. To provide a fluid and natural voice experience, script and then act out the different ways a user might talk to Alexa. Also, if you have a multimodal experience (voice and visual), you need to think through the different workflows a user can follow to navigate your skill.

Step 2: Build

Once your interaction model is ready, build the utterances, intents, and slots in the Alexa developer console.

The interaction model is saved in JSON format, and you can edit the model with any text editor. After your JSON interaction model is ready, build the backend Lambda function in the AWS Management Console.
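To give a sense of what that JSON looks like, here is a small sketch of an interaction model with one custom intent, one slot, and a few sample utterances. The invocation name, the HelloWorldIntent intent, and the count slot are hypothetical examples, and the undefined_slots helper is just an illustrative sanity check, not part of ASK.

```python
import json
import re

interaction_model = {
    "interactionModel": {
        "languageModel": {
            "invocationName": "hello world",
            "intents": [
                {
                    "name": "HelloWorldIntent",  # hypothetical custom intent
                    "slots": [{"name": "count", "type": "AMAZON.NUMBER"}],
                    "samples": ["say hello", "say hello {count} times"],
                },
                # Built-in intents need no sample utterances of their own.
                {"name": "AMAZON.HelpIntent", "samples": []},
                {"name": "AMAZON.CancelIntent", "samples": []},
                {"name": "AMAZON.StopIntent", "samples": []},
            ],
            "types": [],
        }
    }
}


def undefined_slots(model):
    """Return (intent, slot) pairs used in samples but not declared as slots."""
    problems = []
    for intent in model["interactionModel"]["languageModel"]["intents"]:
        declared = {slot["name"] for slot in intent.get("slots", [])}
        for sample in intent.get("samples", []):
            for used in re.findall(r"\{(\w+)\}", sample):
                if used not in declared:
                    problems.append((intent["name"], used))
    return problems


print(json.dumps(interaction_model, indent=2))
print("Undefined slots:", undefined_slots(interaction_model))
```

Running a check like this before pasting the JSON into the developer console can catch a sample utterance that refers to a slot you forgot to declare.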

Select the programming language of your choice and the corresponding ASK software development kit (SDK), and begin coding your skill. Lambda supports the programming languages Java, Go, PowerShell, Node.js, C#, Python, and Ruby.

You can build and host most skills for free with AWS Lambda, which is free for the first one million requests per month. Once the backend Lambda function is ready, connect the Lambda function to your skill and test it in the Alexa developer console.

Step 3: Test

The Alexa developer console has a built-in Alexa simulator, which lets you test your skill much as you would on an actual Alexa-enabled device.

After testing your skill with the Alexa simulator, we recommend gathering user feedback to resolve issues and make improvements before submitting your skill for certification.

Step 4: Certification and launch

After beta testing your skill, submit it for certification. Once your skill passes certification, it will be published in the Alexa Skills Store for anyone to discover and use.

Summary

These are the fundamental steps for building Alexa skills.

You will dive deeper into each step in subsequent modules of this course.

Requirements to build a skill

The following are requirements to begin developing Alexa skills:

  • Account on the Alexa developer console
  • Internet-accessible endpoint for hosting your backend cloud-based service. Your backend skill code is usually a Lambda function. In this case, you need an account with Amazon Web Services (AWS), in addition to your Alexa developer console account. Alternatively, you can build and host an HTTPS web service. In this case, you will need a cloud hosting provider and a Secure Sockets Layer (SSL) certificate.
  • Development environment appropriate for the programming language you plan to use. Lambda natively supports Java, Go, PowerShell, Node.js, C#, Python, and Ruby and provides a runtime API, which allows you to use any additional programming languages to author your functions.
  • Publicly accessible website to host any images, audio files, or video files used in your skill. One possible solution is Amazon Simple Storage Service (Amazon S3). If you do not have files other than a skill icon, you do not need to host any resources.
  • (Optional) Alexa-enabled device for testing. Skills work with all Alexa-enabled devices, such as the Amazon Echo, Echo Dot, Fire TV Cube, and devices that use the Alexa Voice Service (AVS). If you don't have a device, you can use the Alexa simulator in the developer console. Through the simulator, you can see the display templates for Echo Show and Echo Spot, although the display is not interactive. If your skill includes display and touch interactions, you need an Alexa-enabled device with a screen to test the skill.

In the next module, you will learn about the design process and key concepts related to building an interaction model for a custom skill.