Amazon Alexa Voice Design Guide

How Alexa Responds

How to speak so that people can easily understand and respond

Alexa responds, informs, and asks questions in a natural and conversational way. Consider the following best practices when designing what Alexa will say to your customers.

Be brief

Users need Alexa to speak concisely without extra words. This helps them understand what Alexa is saying and feel confident about what is happening. Longer responses tend to be more difficult to follow and remember.

One-breath test

When writing what Alexa will say, read aloud what you’ve written. If you can say the words at a conversational pace with one breath, the length is probably good. If you need to take a breath, consider reducing the length.

For a response that includes successive ideas, such as steps in a task, read each idea separately. While the entire response may require more than one breath, make sure you only take breaths between and not during ideas.

Seven-foot test

Echo Show supplements the voice experience with on-screen details. Assume that the person will be seven feet away. Avoid duplicating the voice experience, and instead offer additional information within the graphical experience. Use visuals to provide feedback and enable the user to more quickly finish what he or she wants to do.

For information about implementing visual experiences on Echo Show, see “Choose the right template on Echo Show.”

Speak and write naturally

Inspire users to say what they want naturally. Don’t prompt with a menu of options. Instead, let the user know what’s possible and guide the user toward productive input.

Make sure that Alexa speaks like a person, for example by using contractions and avoiding jargon. This helps the user understand Alexa more easily and encourages the user to speak naturally in return.

Be sure to listen to how your prompts sound when spoken by Alexa. Sometimes, a written phrase doesn’t sound natural and needs to be reworded.

Do

Alexa, open Plan-a-Trip.

Let’s plan this trip. Where would you like to go?

Don’t

Let’s plan this trip. Say, “I’d like to go to a city name” to say where you’d like to go. Say, “I’ll leave on a specific date” to say when you’re leaving. Say, “I’m leaving from a city name” to say your departure city. What would you like to do?

Echo Show
Avoid simply reading what’s shown on screen, and instead have Alexa speak about the main idea and allow the user to look at the visuals for additional context or options.

Prompt with guidance for the user

Generally, end with a question before having the user respond. The question provides a cue to begin speaking and coaches the user on what to say next. End the prompt right after the question so that people don’t try to answer while Alexa is speaking. Be specific, but be ready for the user to answer in a different way or to over-answer.

Do

Let’s plan this trip. What city would you like to visit?

Don’t

Let’s plan this trip. I can help you once I know where you’re going and what you want to do there. Where would you like to go? You’ve been to Portland recently.

Next steps on Echo Show
With a screen, you can be more sparing with asking questions to keep the conversation going, and rely more on the screen to provide the next step for the user. The user can decide to touch the screen or speak to initiate the next step, for example by saying “Alexa, show more.”

Use conversation markers

When people converse, they use marker words and phrases to organize and direct topics, which help divide the conversation into more comprehensible chunks. Users of your skill will benefit from marker words and phrases, too.

Timeline markers
“First,” “halfway there,” “then,” and “finally” help set expectations about duration, sequence, and readiness for next steps. Use these words when multiple steps or significant time investment will be part of the experience. However, avoid using timeline markers for quick interactions.

Acknowledgements and feedback
“Thanks,” “got it,” “okay,” “great,” and “sure” let the user know that he or she has been understood or that information has been received.

Pointers
“This,” “that,” “here’s,” and “it” help to identify subjects that have been previously referenced or are about to be mentioned.

Transitions
“Now,” “so,” “all right,” and “next” help to introduce change when moving to a different topic.

Do

Okay, there are three steps to wash a sweater. (half-second pause)

First, turn the sweater inside out and wash it on the gentle cycle using regular detergent. (half-second pause)

Next, put the sweater in the dryer for 10 minutes at low temperature. (half-second pause)

Then, lay the sweater on a flat surface to finish drying. That’s it!
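As a rough sketch of how the markers and half-second pauses in this example could be assembled, the following Python builds an SSML string with a 500-ms break before each step. The helper names (build_steps_ssml, marker_for) and the rule for choosing “First,” “Next,” and “Then” are assumptions made for illustration, not part of any Alexa SDK.

```python
from xml.sax.saxutils import escape

# SSML pause of half a second, matching the "(half-second pause)" notes above.
HALF_SECOND_PAUSE = '<break time="500ms"/>'


def marker_for(index, total):
    """Pick a timeline marker for a step (hypothetical rule for this sketch)."""
    if index == 0:
        return "First"
    if index == total - 1:
        return "Then"
    return "Next"


def build_steps_ssml(intro, steps, closing):
    """Join an intro, marked steps, and a closing line into one SSML string."""
    parts = [escape(intro)]
    for i, step in enumerate(steps):
        # Pause before each step so that ideas are heard as separate chunks.
        parts.append(HALF_SECOND_PAUSE)
        parts.append(f"{marker_for(i, len(steps))}, {escape(step)}")
    parts.append(escape(closing))
    return "<speak>" + " ".join(parts) + "</speak>"


ssml = build_steps_ssml(
    "Okay, there are three steps to wash a sweater.",
    [
        "turn the sweater inside out and wash it on the gentle cycle.",
        "put the sweater in the dryer for 10 minutes at low temperature.",
        "lay the sweater on a flat surface to finish drying.",
    ],
    "That's it!",
)
print(ssml)
```

Listen to the result on a device and adjust the pause length if the steps sound rushed or stilted.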

Add variety

Use variety to inject a natural and less robotic feel into a conversation and make repeat interactions sound less rote or memorized, for example by randomly selecting from reasonable synonyms of the same prompt.

Introduce variety if the user will hear the same prompt frequently, for example in your opening and closing prompts. This kind of variety is a good way to add personality.

Adaptive prompts
As a person uses a skill more and more, he or she becomes increasingly comfortable and remembers what will happen. Consider making the prompts shorter and more direct, and even acknowledge the frequency of use.
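One possible way to adapt prompts is to key them off a stored visit count. The sketch below assumes your skill already tracks how many times the user has opened it; the function name, the thresholds (1 and 10), and the prompt wording are all illustrative assumptions.

```python
def opening_prompt(visit_count):
    """Return a shorter, more direct prompt as the user gains experience."""
    if visit_count <= 1:
        # First visit: explain what the skill does before asking.
        return ("Welcome to My Tutor. I can play today's lesson "
                "whenever you're ready. Would you like to start?")
    if visit_count < 10:
        # Returning user: skip the explanation.
        return "Welcome back. Ready for today's lesson?"
    # Frequent user: stay short and acknowledge the frequency of use.
    return f"Visit number {visit_count}, nice work! Ready for today's lesson?"


for count in (1, 5, 25):
    print(opening_prompt(count))
```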

Example

Alexa, tell My Tutor to play today’s lesson.

Variation 1

Okay, playing today’s lesson.

Variation 2

This is going to be fun! Enjoy today’s lesson.

Variation 3

Good luck! Today’s lesson is great!

Variation 4

Playing today’s lesson now. Have fun!
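A minimal sketch of random selection among the variations above follows. Skipping the most recently used variation is an added design choice for this sketch, not something the guidance requires, and pick_variation is a hypothetical helper name.

```python
import random

LESSON_CONFIRMATIONS = [
    "Okay, playing today's lesson.",
    "This is going to be fun! Enjoy today's lesson.",
    "Good luck! Today's lesson is great!",
    "Playing today's lesson now. Have fun!",
]


def pick_variation(variations, last_used=None, rng=random):
    """Pick a random variation, avoiding an immediate repeat when possible."""
    candidates = [v for v in variations if v != last_used]
    if not candidates:
        # Only one variation exists, so repeating it is unavoidable.
        candidates = list(variations)
    return rng.choice(candidates)


first = pick_variation(LESSON_CONFIRMATIONS)
second = pick_variation(LESSON_CONFIRMATIONS, last_used=first)
print(first)
print(second)
```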

Predictability on Echo Show
On the screen, it is okay to be repetitive and predictable, and your users will thank you for it. Use consistent terminology, graphics, and labeling throughout the visual experience so that it’s easy for users to quickly scan and navigate content.

Use parallel language

Use noun and verb forms consistently, especially for items in a series. See more on lists.

Do

I can help you get a ride, tip your driver, check surge pricing, or get a copy of a receipt.

Don’t

I can help you get a ride, tipping your driver, receipts for your last ride, or surge check.

Remember what was said

As in a conversation with a friend, users appreciate it when Alexa remembers what happened recently and what was said, especially for frequent actions and static information. For example, you could be in the middle of a game, walk away for an hour or two, and pick up right where you left off.

To build this in your skill, see the documentation for session attributes, the guidance in the Node.js SDK for session attributes, and the guidance for persistence.
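To illustrate the idea without an SDK, the sketch below reads session attributes from an incoming request and writes updated ones into the response, using plain dictionaries shaped like the skill request and response JSON. The stepIndex attribute and the function names are hypothetical. Session attributes last only for the current session, so resuming after a long break would need the persistence approach mentioned above.

```python
def build_response(speech, session_attributes):
    """Build a response body that carries session attributes forward."""
    return {
        "version": "1.0",
        "sessionAttributes": session_attributes,
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "shouldEndSession": False,
        },
    }


def next_step_turn(request_envelope, steps):
    """Speak the step the user is on, then remember where they are."""
    attrs = dict(request_envelope.get("session", {}).get("attributes") or {})
    index = attrs.get("stepIndex", 0)  # hypothetical attribute name
    speech = steps[index]
    # Advance, but stay on the last step once the end is reached.
    attrs["stepIndex"] = min(index + 1, len(steps) - 1)
    return build_response(speech, attrs)


steps = ["Preheat the oven.", "Mix the batter.", "Bake for 30 minutes."]
turn1 = next_step_turn({"session": {"new": True}}, steps)
turn2 = next_step_turn({"session": {"attributes": turn1["sessionAttributes"]}}, steps)
print(turn1["response"]["outputSpeech"]["text"])
print(turn2["response"]["outputSpeech"]["text"])
```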

Familiar images on Echo Show
Consider showing an image and a prompt related to what the user was last doing with the skill. For example, you might show “Would you like to resume baking the carrot cake?” with an image of the cake.

Provide definitive choices

Organize your responses and prompts so that the user has a clear choice to make. Open-ended questions can confuse the user or cause the user to answer in ways that you’re not expecting or supporting. For example, asking “What would you like?” is too open-ended. Even something like “Would you like Brie or Gouda?” opens up a likely response of “Yes.”

Do

We have Brie or Gouda. Which would you like?

Don’t

Would you like Brie or Gouda?

Which would you like? Brie or Gouda?

Choices on Echo Show

When the user asks for a list, your skill should reply conversationally by voice and more formally in a template on the screen. Create list items that are easy to choose from by using clear item names, images, and hint text that gives the user an idea of what to say. Remember that while people can tap on a list item to make a choice, they can also say the name of the item. See intents for navigating and scrolling through lists.

To show choices from a list, you’ll often want to use a List Template.

List Template 1 (vertical list)

List Template 2 (horizontal list)

Do

What kinds of cheeses are made with cow’s milk?

Variation 1

Here are a few kinds of cheeses made from cow’s milk.

Variation 2

Brie and Gorgonzola are popular. Here are a few others as well.

Titles on Echo Show
In the template, make the choices clear and establish the context with a title that explains what list is being shown. Use title case, for example: “Results for ‘Cow’s Milk Cheese’”. Vary the voice response, while making sure the title on the template is precise and consistent. This helps someone who’s viewing the screen quickly understand what to expect in the list.

Use brevity, arrangement, and pacing when listing options

Lists are longer and more complex than a simple response. If you need to give between two and five options, treat each item like a simple response, and clearly set expectations for what’s about to come.

Have Alexa say something to introduce the list, for example “Here are the popular quick meals,” and have her pause briefly between items in the list. Verify that you can comfortably read each item aloud at a conversational pace with one breath.

Brevity with lists

Have Alexa read the essential content in each list item; for example, always read titles, and only read secondary text if critical to the voice response. Generally, it shouldn’t take more than 20 seconds to read the first few items in the list.

Start with reading between two and five items, and adjust based on the following:

  • How familiar the user is with the list items.
  • How long and voice-friendly the item names are.
  • The total number of elements read and displayed per item, for example Alexa might read the item name while displaying elements for image, ratings, and distance.
  • Whether the count of items sounds like enough without sounding too long.

Arranging items into lists
To improve comprehension when reading a list, try to cluster items into sets of two or three. Also, don’t try to pack everything into the list items. Allow the user to tap the item to learn more.

Do

Here are cheeses that you may like. Cheddar and Gouda, as well as Gorgonzola, Parmesan, and Brie.

Don’t

The cheeses you may like are Cheddar, Gouda, Jarlsberg, Porter Cheddar, St. Agur Blue Cheese, Gorgonzola, Brie, Gruyere, Sharp Cheddar, and Reggiano Parmesan.
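The “Do” example above can be produced programmatically. The following sketch caps the spoken list at five items and groups them into a pair followed by a cluster of up to three. The function names and the “as well as” joining rule are assumptions chosen to reproduce that example.

```python
def join_cluster(items):
    """Join one small cluster of items into a spoken phrase."""
    if len(items) == 1:
        return items[0]
    if len(items) == 2:
        return f"{items[0]} and {items[1]}"
    return ", ".join(items[:-1]) + f", and {items[-1]}"


def speak_list(intro, items, max_items=5):
    """Read at most max_items, clustered into sets of two or three."""
    if not items:
        raise ValueError("items must not be empty")
    spoken = items[:max_items]
    first, rest = spoken[:2], spoken[2:]
    sentence = join_cluster(first)
    if rest:
        sentence += ", as well as " + join_cluster(rest)
    return f"{intro} {sentence}."


cheeses = ["Cheddar", "Gouda", "Gorgonzola", "Parmesan", "Brie", "Gruyere"]
print(speak_list("Here are cheeses that you may like.", cheeses))
```

Items beyond the cap are left for the screen or for a follow-up request rather than packed into the spoken response.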

Pacing with lists
Use pacing to help the listener distinguish where one list item ends and the next begins, for example:
  • Specify a comma plus a 350-ms pause using SSML after each item instead of a period or question mark. This makes the final item sound more similar to other items in the list.
  • Avoid adding an additional pause to list introductions that end with a period or question mark.
  • For lengthy list items or those that require the user to think more deeply, consider replacing the 350-ms pause with a 400-ms pause.
  • Always test the experience by listening and then adjusting until it sounds right.
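The pacing guidance above can be sketched as a small SSML builder: each item is followed by a comma and a break, and no extra pause is added after the introduction. The function name and the sample items are illustrative assumptions. Pass pause_ms=400 for lengthier items.

```python
from xml.sax.saxutils import escape


def list_ssml(intro, items, pause_ms=350):
    """Build SSML in which every item ends with a comma plus a short pause."""
    pause = f'<break time="{pause_ms}ms"/>'
    # A comma (not a period) after every item keeps the final item
    # sounding like the others.
    spoken_items = " ".join(f"{escape(item)}, {pause}" for item in items)
    # The intro already ends with a period, so no extra pause follows it.
    return f"<speak>{escape(intro)} {spoken_items}</speak>"


print(list_ssml("Here are the popular quick meals.", ["Tacos", "Pasta", "Stir-fry"]))
```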
Reading lists aloud

When Alexa reads only a few of the possible items in a list, provide a way for the user to tell Alexa to read more.

When you know that your customers are interested in more than the first few items, have Alexa prompt the user with a question like “Would you like to hear more cheeses?”

Echo Show: Have Alexa tell the user “Let me know if you’d like to hear more.” Then, wait for the user to jump in. This allows the user to take up to 30 seconds to review and tap items or ask for more information. If the user doesn’t take action within 30 seconds, the skill session ends.

Do

(Echo Show only) Let me know if you’d like to hear more.

Do

Would you like to hear more cheeses?
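One way to support “hear more” is to read the list a page at a time and ask the follow-up question only while items remain. In this sketch the page size of three, the function name, and the returned tuple are assumptions. A real skill would keep the next starting position in session attributes between turns.

```python
def read_page(items, start, page_size=3, noun="cheeses"):
    """Return (speech, next_start, has_more) for one page of a list."""
    page = items[start:start + page_size]
    next_start = start + len(page)
    has_more = next_start < len(items)
    speech = ", ".join(page) + "."
    if has_more:
        # End on the question so the user knows it is their turn to speak.
        speech += f" Would you like to hear more {noun}?"
    return speech, next_start, has_more


cheeses = ["Cheddar", "Gouda", "Gorgonzola", "Parmesan", "Brie"]
speech, position, more = read_page(cheeses, 0)
print(speech)
speech2, position2, more2 = read_page(cheeses, position)
print(speech2)
```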

List introductions on Echo Show

When introducing a list on Echo Show, it’s best to keep it simple and provide minimal information about the content and let the user peruse the list.

Avoid instructions like “You can say the name of the cheese or tap on the item.” Such instructions are redundant and have limited value to the user because interacting with lists is common across skills. Also, avoid stating the item count unless the count is important and changes from interaction to interaction.

Additionally, don’t ask a question before presenting the list. For example, asking “Which cheese do you want? Gouda, Cheddar, Brie,…” confuses the user about when to speak, and the user may try to answer the question while Alexa reads the options. Don’t use prompts that encourage the user to barge in, for example “When you hear the option you want, just say it.” Barge-ins are also discouraged because the user has to use the wake word to interrupt Alexa’s response.

List items on Echo Show

Usually, list items don’t need ending punctuation because the text isn’t a full sentence. However, a list of sample questions that you’re providing to the user should contain question marks. For example, a list item may contain “What happened on this day in 1918?”

Lists on Echo Show

To improve the user’s ability to scan the list, be selective about which information you show, and choose a layout that helps the user browse items quickly. For items the user might skim, consider showing more items on screen by using a narrower image for each item. When the user needs to study the details of each item, for example when choosing a recipe, consider using a wider image.

List Template 2 with portrait images

List Template 2 with landscape images

Vertical lists on Echo Show

Use vertical lists for lists without images and for lists where images are not unique to the list items. Also, use vertical lists for efficiency when small images are enough for a good user experience. Vertical lists are ideal for the following types of content:

  • Lists of sample utterances (phrases the user can say in your skill)
  • Numeric information, for example prices or calorie counts
  • Lists of stock quotes
  • Bank transaction history
  • Lists of food items
  • Lists of contacts
  • Table of contents
  • Time tables

List Template 1 (vertical list)

Variations on vertical lists

Text for a list item can wrap onto a second line. After the second line, text is truncated and does not extend to a third line. To specify text for the second line, use a line break or use the secondary text field. To place text in the far right column, use the tertiary text field.

List Template 1 with primary and secondary text fields

List Template 1 with primary and tertiary text fields

List Template 1 with primary, secondary, and tertiary text fields

Vertical list with thumbnail images

List Template 1 with thumbnail images and primary text field

List Template 1 with thumbnail images and primary and secondary text fields

List Template 1 with thumbnail images and primary and tertiary text fields

List Template 1 with thumbnail images and primary, secondary, and tertiary text fields

Lists with unique images on Echo Show

Use a horizontal list when you have unique images that help users recognize or choose items from the list. A horizontal list is also great for books, albums, movies, videos, destinations, unique establishments/businesses, and products.

List Template 2 with square images

Variations on image lists

List Template 2 can accommodate a variety of aspect ratios, and resizes your images to fit the template. The image height should be 280px and the image width should be between 192px and 498px. The template scales down images that are larger than the maximum width of 498px and the maximum height of 280px.

List Template 2 with portrait images (192x280)

List Template 2 with square images (1:1, 280x280)

List Template 2 with wide images (4:3, 372x280)

List Template 2 with landscape images (16:9, 498x280)
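If you prepare images ahead of time, a quick check against the limits above can catch problems early. This sketch assumes that oversized images are scaled down proportionally, which is a guess at how the template behaves rather than documented behavior. The function names are hypothetical.

```python
MIN_WIDTH, MAX_WIDTH, HEIGHT = 192, 498, 280  # pixels, from the guidance above


def is_recommended_size(width, height):
    """True when the image already matches the recommended dimensions."""
    return height == HEIGHT and MIN_WIDTH <= width <= MAX_WIDTH


def fit_within_limits(width, height):
    """Scale an image down (never up) to fit within 498x280, keeping its aspect ratio."""
    scale = min(MAX_WIDTH / width, HEIGHT / height, 1.0)
    return round(width * scale), round(height * scale)


print(is_recommended_size(372, 280))
print(fit_within_limits(996, 560))
print(fit_within_limits(560, 560))
```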

Handle problems

When Alexa doesn’t hear or understand the user, use natural and gentle phrasing to help get the conversation back on track.

Re-prompt
When Alexa receives no answer from the user, use a re-prompt with a slight rewording. This is an opportunity to add detail in case the customer did not understand.

Do

Alexa, open Plan a Trip.

Where would you like to go?

(No response)

I can help you plan a trip. To start, I’ll ask questions about where and when you’re going and what you’ll do once you’re there. What city would you like to visit?
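In the skill response JSON, the re-prompt travels alongside the main prompt and is spoken only if the user stays silent. The sketch below builds such a response as a plain dictionary. The ask helper is a hypothetical name, and the SDKs offer their own builders for the same structure.

```python
def ask(prompt, reprompt):
    """Build a response that asks a question and re-prompts on silence."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": prompt},
            "reprompt": {
                "outputSpeech": {"type": "PlainText", "text": reprompt}
            },
            # Keep the session open so that Alexa listens for an answer.
            "shouldEndSession": False,
        },
    }


response = ask(
    "Where would you like to go?",
    "I can help you plan a trip. To start, I'll ask questions about where "
    "and when you're going and what you'll do once you're there. "
    "What city would you like to visit?",
)
print(response["response"]["reprompt"]["outputSpeech"]["text"])
```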

Alexa doesn’t understand
If Alexa hears but cannot process what the user said, be upfront about it and try to get the conversation back on track. Use a straightforward request that helps the user know what he or she can do next. This will help prevent the user from feeling lost.

Do

Alexa, open Plan a Trip.

Where would you like to go?

I would like to go to horse.

(Your skill’s logic detects that “horse” is not a city you support.) I didn’t quite catch that. What city would you like to visit?

Alexa “didn’t understand” versus “didn’t hear”
If Alexa says that she didn’t hear, the user might try to speak more loudly, which will not resolve the issue. Alexa heard the user and didn’t understand what was said.

Don’t

I didn’t hear you. What city would you like to visit?

Alexa understands but can’t help yet
When the user asks for an unsupported function, use some form of “I can’t help you with X yet” to inform the user that the function is not available but may be in the future. To support this, you need to implement intents for planned features. You can then track when users request a feature that you do not support yet, which may also give you insight into how to prioritize features.

Do

I’d like to rent a car.

I can’t help you with that yet. I can help you plan a trip, though. What city would you like to visit?
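A simple way to track requests for planned features is to count them as they arrive. In this sketch the intent name RentCarIntent, the in-memory counter, and the handler name are assumptions. A production skill would log these requests to durable storage or analytics.

```python
from collections import Counter

# Intents defined in the interaction model for features not built yet
# (hypothetical names).
PLANNED_INTENTS = {"RentCarIntent", "BookHotelIntent"}

unsupported_requests = Counter()


def handle_planned_intent(intent_name):
    """Return a "can't help yet" prompt for planned features, else None."""
    if intent_name not in PLANNED_INTENTS:
        return None
    # Count the request to help prioritize future features.
    unsupported_requests[intent_name] += 1
    return ("I can't help you with that yet. I can help you plan a trip, "
            "though. What city would you like to visit?")


print(handle_planned_intent("RentCarIntent"))
handle_planned_intent("RentCarIntent")
print(unsupported_requests.most_common(1))
```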

Error messages

While errors are uncommon, they can be a source of confusion. When possible, tell the user what went wrong, and avoid technical jargon. If the error is likely to last only a few seconds, tell the user to try again. Otherwise, avoid encouraging the user to retry, because the user may encounter the same error. Consider a specific message like “Your smart lock isn’t responding right now.”

Cases not yet supported: Users can say anything in a voice interface, and it is important to gracefully handle errors and guide users back to the skill. For use cases that aren’t yet supported, say something like “The Trivia Mania challenge can’t help you with that yet.” When the user’s statement is unintelligible, say something like “Sorry, I didn’t get that.”

Repetition: After handling the error, prompt the user again with the most recent question that Alexa asked. Avoid telling the user you didn’t hear or didn’t understand, because this encourages users to repeat themselves more slowly or loudly instead of rewording the request.

Provide contextual help

When responding to a request for help, provide additional prompts to give more context to the immediate conversation. For example, if a user asks for help in the middle of confirming a pizza order, focus on completing the confirmation and avoid including information about selecting a topping. Design the conversation to ensure that help is not needed very often.

Do

Alexa, open Pick-Me-Up.

Would you like me to send a car to pick you up at home or work?

How do I set my address?

You change your work or home address in the Pick-Me-Up phone app. (half-second pause) If the address is already set, I can help you now. Would you like to be picked up at home or work?

Do

How do I use this skill?

Pick-Me-Up sends a car to you. You can say things like book a car, set my address, or rate my last driver. (half-second pause) We can pick you up at home or work. Which would you like?

The Node.js SDK includes an example of setting up help handlers for different states of a skill.
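Independent of any SDK, contextual help can be sketched as a lookup from the current conversation state to a help message that ends by returning to the question at hand. The state names and the first help message here are hypothetical. The default message reuses the general help example above.

```python
DEFAULT_HELP = (
    "Pick-Me-Up sends a car to you. You can say things like book a car, "
    "set my address, or rate my last driver. We can pick you up at home "
    "or work. Which would you like?"
)

# Help text keyed by where the user is in the conversation
# (hypothetical state names).
HELP_BY_STATE = {
    "CHOOSING_PICKUP": (
        "I can send a car to the home or work address saved in the "
        "Pick-Me-Up phone app. Would you like to be picked up at home or work?"
    ),
}


def help_response(state):
    """Return help that fits the current state, or general help otherwise."""
    return HELP_BY_STATE.get(state, DEFAULT_HELP)


print(help_response("CHOOSING_PICKUP"))
print(help_response(None))
```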

Help on Echo Show
In the help response, include information about the functionality of your skill and a few examples of phrases people can say. Then, have Alexa ask a question and listen for the user’s response, for example “The ABC skill can help you learn the English alphabet and practice the alphabet song. You can say things like, ‘What comes after W,’ or ‘I want to sing the alphabet song.’ So, how can I help you?” Use sentence case for sample utterances.
Body Template 1 with help phrases: Example of a list of utterances to display when the user asks for help
Hints: Some templates on Echo Show support a hint phrase at the bottom to help users quickly understand what they can do next. Populate this field wherever possible. When you ask a question and listen for the user’s response, the blue voice chrome at the bottom of the screen partially blocks the hint. Because of this, it’s best not to add hints when asking a question; only use hints when the screen is static.

Body Template 6 with hint phrase: Example of a hint phrase

Choose the right template on Echo Show

When designing your skill to work on Echo Show, pick templates based on the interaction pattern you plan to use. Each template maps to a pattern or scenario for an optimal customer experience. For each intent in your skill, choose a template that matches the response and allows appropriate voice and touch actions for selection, video control, scrolling, and navigation. The key scenarios that templates help with are:

  • Invocation/welcome to the skill
  • Lists
  • Content details
  • Full-screen images
  • Multi-turns (dialog with multiple turns, or questions/answers)
  • Clarifications
  • Help
  • Navigation
  • Close session/goodbye

Choose from the following six templates:

  • List Template 1 – Vertical list with optional thumbnail images
  • List Template 2 – Horizontal list with images and optional hint
  • Body Template 1 – Full-width text or images
  • Body Template 2 – Image right with short text on left with optional hint
  • Body Template 3 – Image left with long text on right and no hint
  • Body Template 6 – Multi-turn scenario with short text and optional hint
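The choice among these templates can be expressed as a small decision function. The sketch below is one reading of this guide, not an official mapping: the scenario labels are made up for illustration, and the returned identifiers (for example “ListTemplate1”) are assumed to match the template type names used by the display interface, which you should confirm in the reference documentation.

```python
MULTI_TURN_SCENARIOS = {"welcome", "multi_turn", "clarification",
                        "navigation", "goodbye"}


def choose_template(scenario, unique_images=False, long_text=False):
    """Suggest a template name for a scenario (illustrative mapping only)."""
    if scenario == "list":
        # Horizontal list when images help users recognize items.
        return "ListTemplate2" if unique_images else "ListTemplate1"
    if scenario == "content_details":
        # Image-left layout for long text, image-right for short text.
        return "BodyTemplate3" if long_text else "BodyTemplate2"
    if scenario == "full_screen_image":
        return "BodyTemplate1"
    if scenario in MULTI_TURN_SCENARIOS:
        return "BodyTemplate6"
    raise ValueError(f"No template suggestion for scenario: {scenario}")


print(choose_template("list", unique_images=True))
print(choose_template("content_details", long_text=True))
print(choose_template("clarification"))
```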
Consistent and easy-to-read content

Use title case for titles on templates, for example “Results for ‘Cow’s Milk Cheese’”. Also use bold, italics, and underline styling consistently across templates.

In general, use the default text size, <font size="7">, for body content because it is easiest to read from a few feet away. Additionally, don’t use all uppercase text for large blocks of text because it’s hard to read. Adjust font sizes to create visual hierarchy and use paragraph breaks to break up long blocks of text.

Full-width text and images

For long blocks of text, full-width images, and messages where there isn’t other content to show, use Body Template 1. The template accommodates shorter text that won’t scroll, and longer text that can scroll by touch. Use this template when you’re displaying content and don’t have an accompanying image, or when you’re presenting information without asking a question.

Body Template 1 with 1-2 lines of text

Body Template 1 with long text that scrolls by touch

Voice buttons (Actions)

Think of actions as voice buttons that need to stand alone on their own line. Don’t embed action links in the paragraph text. Ensure that the utterance exactly matches the action link so that users can say “Watch Video” as well as click the “Watch Video” action to watch a related video. Prominently display primary actions and keep these above the fold and distinct from the body copy. Common primary actions include: Watch Trailer, Showtimes, Share, Add to List, Reserve a Table, Get a Ride, and Buy Tickets. Don’t display more than three actions per template.

Body Template 3 with optional actions

Full-width images (Body Template 1)

You can use Body Template 1 to display a full-screen image as seen below. Use an image that’s easy to view from seven feet away. This template works best with one large image. If you need to provide multiple images, use one of the other templates.

Body Template 1 with a full-width inline image

Specific entities (Body Templates 2 & 3)

Use Body Templates 2 and 3 when Alexa’s response is a specific entity (person, place, or thing) or a property of an entity that a user requested directly or chose from a list. Make sure the user can easily view at least one element from a distance, ideally the title and image. Use Body Templates 2 and 3 when:

  • The user requested a specific entity, for example a recipe, an account, a restaurant, a stock, or a driver profile.
  • The user requested a property of a specific entity, for example the phone number of a restaurant, the balance on a specific account, or the 52-week high of a specific stock.
  • The user selected an item from a list in order to learn more.
  • The user’s search yields only one item.

Body Template 2 with non-scrolling text and optional hint

Body Template 3 with shorter text

Images on Body Templates 1 and 2

Use images that are beautiful and add value to your skill. Match the aspect ratio and size of the slot so that the image does not appear stretched. Minimize latency by using images smaller than 100 KB. Use a transparent background for images because the Echo Show background is gray, and white backgrounds look less polished.

Echo Show has dark and light modes, which change the default background color automatically. In dark mode, all-black images are difficult to view, and in light mode, all-white images are difficult to view. Add a contrasting outline to all-white or all-black images so that they are visible in either mode.

Content priority

Order attributes according to their importance to the user. Common attributes include: byline, rating, price, availability (in/out of stock), category/genre, address/location(s), phone number, and hours of operation.

Supplemental information on screen

Do not simply read out what is on the template. Keep the voice response conversational, natural, and brief. Use the template to provide additional detail that may not be voice friendly or as essential to hear. Make the text on the template similar to, though not necessarily the same as, the text-to-speech (TTS) output. Clear and succinct template text gives the best user experience.

Variations of Body Templates 2 and 3

These two detail templates can accommodate optional actions, hints, and long text.

Body Template 2 with optional actions

Body Template 3 with long text that scrolls by touch

Body Template 3 with optional actions and long text

Multi-turn (Body Template 6)

Body Template 6 is ideal for multi-turn situations, which are conversations with multiple turns in which Alexa asks questions and the user responds with answers. This template can be used in a variety of circumstances: welcome, navigation, clarification, and goodbye. It’s ideal for asking a question, making a clarification, or displaying search results with zero items.

Reminder: To let users know that it is their turn to talk, ask the user a question before listening for the user’s response. Many people interact with Alexa without looking at a device, and have no visual indication that the skill is waiting for a response. Asking the user a question follows conversational norms that let the user know when to speak.

Body Template 6 with optional hint

Multi-turn text: Use the default text size of <font size="7"> for body text. Use sentence case for optimal readability. It’s best to display Alexa’s question verbatim unless there’s a hint.

Background images: It’s best to use an image that doesn’t have text on it already because any text added to the template will clash. Be mindful of using background images that contain large areas of white or very light colors as this will diminish readability because the text on top will be white. Blur background images lightly and apply a black (#000000) layer set at 70% opacity for optimal legibility of text on top of background images.

Also, to decrease latency, use images that are 500 KB or smaller.

Opening the skill

A welcome prompt, for example “Welcome to Cat Facts,” reinforces that users are in a skill experience, and that they invoked the desired skill. It is also your first opportunity to establish your brand identity on Alexa.

Content display: We recommend a content-forward approach to welcoming users to your skill. In other words, immediately introduce them to content that they can engage with. For a recipe skill, a content-forward approach might be to use List Template 2 to display trending recipes. A movie ticketing skill might use the same template to show movies opening soon, or top-rated movies currently in theaters. A skill for getting to know dog breeds might open to a list of popular dog breeds or use Body Template 1 to share the Dog of the Day.

Introduction of skill functionality: You can use the spoken welcome message to suggest other intents so users know what is possible beyond interacting with what is currently on the screen. The more proactive your skill is at offering interactive content, the less pressure you put on the user to think of what to say. If you’re having a hard time deciding which content to lead with, consider the content that you feature in the hero spot on your website or the landing page of your app.

Do

Example message for “The Daily Cheese”

Welcome to The Daily Cheese. You can search by cheese firmness, wine and beer pairings, source of milk used, or say ‘Help’ for more options.

Example message for “Dog World”

Welcome to Dog World. The Dog of the Day is Fox Terrier, a playful, small breed. You can also search for breeds, watch training videos, or say ‘Help’ for more options.

Welcome message

If you don’t have a verbose message to lead with, consider using an attractive background image and minimal text, for example “Welcome to The Daily Cheese” with “Try ‘Alexa, what is today’s cheesy joke?’” beneath. Echo Show is a voice-forward device that also has a screen, so make sure that Alexa tells users what they can do rather than making users read the screen. If people need a list of sample phrases they can say, use List Template 1, because it’s optimized for lists of phrases and supports voice scrolling.

Body Template 6 used as a welcome

Navigation

Tell the user what the skill can do, and then support a variety of natural ways the user might access the functionality. Avoid presenting multiple questions at the end of a task, for example “Do you want to hear more?” or “OK, can I help you with anything else?” Users tend to be frustrated by having to answer questions after completing a task.

Clarification

Clarifications allow users to speak to Alexa naturally without having to provide all information at once, and without needing to know which information is required. If something is missing or ambiguous, Alexa asks questions to clarify. Keep in mind with clarifications that it’s best to ask the user a question immediately before listening for the user to respond. It can confuse the user if Alexa asks a question and continues to speak, because the user expects that Alexa is listening when she is in fact still speaking.

Do

Alexa, order a car.

From which address?

Do

Alexa, make an appointment.

For what date?

Body Template 6 with a clarification question

Null result: Use Body Template 6 when there are no results found. Use hints to help redirect the user to find results.

Body Template 6 showing a null result with optional hint

Close of the skill session: A thoughtful way to end the skill is with a goodbye message and an image. This helps users understand that the experience with the skill is now over. This template and message can be used as a response for the AMAZON.CancelIntent or the AMAZON.StopIntent.

Body Template 6 with a goodbye message
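As a sketch of the voice side of closing a session, the following returns a goodbye response for the two built-in intents named above and ends the session. The handler name and the goodbye wording are assumptions, and the on-screen template is omitted for brevity.

```python
END_INTENTS = {"AMAZON.CancelIntent", "AMAZON.StopIntent"}


def handle_end_intent(intent_name, goodbye="Thanks for stopping by. Goodbye!"):
    """Return a session-ending response for cancel or stop, else None."""
    if intent_name not in END_INTENTS:
        return None
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": goodbye},
            # Ending the session signals that the skill experience is over.
            "shouldEndSession": True,
        },
    }


closing = handle_end_intent("AMAZON.StopIntent")
print(closing["response"]["outputSpeech"]["text"])
```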

Use pre-recorded audio

Consider using pre-recorded audio when it is helpful, especially if you have access to recognizable voice talent. Try out skills like The Grand Tour and The Wayne Investigation to hear examples.

Short-form audio

Audio clips that are less than 90 seconds are considered short-form audio. Short-form audio allows the skill session to remain open, which means that the user does not have to re-invoke the skill by saying “Alexa” and the invocation name again. Use short-form audio when you expect additional interaction with the user after playing the audio clip.

  • File type: .mp3
  • Specification: 16000 Hz sample rate with a 48 kbps bit rate
  • Length: 90 seconds maximum

See the SSML Reference to learn more about implementing short-form audio.
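A minimal sketch of short-form audio follows: it wraps a clip in the SSML audio tag and appends a follow-up question so that the session stays interactive. The helper name, the example URL, and the follow-up text are placeholders. The 90-second check mirrors the limit above and relies on a duration you supply, since this code does not inspect the file.

```python
from xml.sax.saxutils import escape, quoteattr

SHORT_FORM_MAX_SECONDS = 90


def short_form_audio_ssml(url, duration_seconds, follow_up):
    """Build SSML that plays a short clip and then asks a question."""
    if duration_seconds > SHORT_FORM_MAX_SECONDS:
        raise ValueError("Clip is too long for short-form audio; "
                         "consider long-form audio instead.")
    # quoteattr adds the surrounding quotes and escapes the URL for XML.
    return f"<speak><audio src={quoteattr(url)}/> {escape(follow_up)}</speak>"


print(short_form_audio_ssml(
    "https://example.com/intro.mp3", 12, "Ready for the next clue?"))
```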

Long-form audio

If you have an audio-based skill like a podcast, you’ll be using long-form audio. Audio clips that are more than 90 seconds are considered long-form audio. When the audio starts playing, the skill closes. The user can control the audio by making requests without the invocation name, for example by saying “Alexa, next.” To interact with the skill again, the user would need to invoke the skill by saying “Alexa” and the invocation name. Use long-form audio when you expect the user interaction to consist of audio-control requests. Your skill can also add new audio files to the queue for continuous playback, such as with a playlist.

  • File types: .aac .mp4 .mp3 .hls .pls .m3u
  • Specification: Bitrates from 16 kbps to 384 kbps
  • Length: No limit

Learn more about implementing long-form audio in Audio Streaming in Alexa Skills and AudioPlayer Interface Reference.

Play videos on Echo Show

If you have video content, you can now add it to your skill to enhance the experience on Echo Show. You can either immediately launch the video, display a list template with video offerings, or lead users to a video via an action link.

Ensure that the volume level in the video is at about the same level as Alexa’s talking voice. During video playback, the audio must remain synced with the video.

Video app with title of video and hint