Voice

Writing for Alexa requires a knack for writing natural dialog, engaging the customer throughout your skill, and staying true to Alexa's personality.

Alexa's persona

Alexa's personality is friendly, upbeat, and helpful. She can handle daily tasks with ease and accuracy. She's honest about anything blocking her way, but also fun and personable, able to make small talk without being obtrusive or inappropriate.

Key elements across Alexa
There are some guiding principles used across all Alexa skill experiences. These include:

  • Speaking naturally and conversationally
  • Having variation in the responses given
  • Engaging the customer with questions
  • Staying true to Alexa's personality

Learn more about Alexa's personality.

Write for the ear, not the eye

Prompts for Alexa are heard, not read, so it's important to write them in a conversational tone. The way we speak is far less formal than the way we write. Sentence fragments, contractions, irregular comma placement, and ending sentences with a preposition are all acceptable if they sound natural in spoken dialog. Prompts that look correct when written often sound stilted and overly formal when spoken. Keep in mind that no matter how beautiful it sounds when you say your perfect prompt, it may sound off or odd in text-to-speech (TTS), so be sure to listen to prompts on your test device and plan to iterate them based on how they sound.

While you should write for the ear, in multimodal experiences the text on screen should be complementary but not identical to the voice prompt the customer hears. Visual elements should be minimal, easily scanned, and understood at a glance. You should not rely on visual elements alone in your skill.

Be informal

Alexa's tone tends towards informal. Think of the perfect personal assistant or your favorite coworker, not so much the intimate style of your closest friend. The degree of informality will be conveyed using simple, relaxed word choice with a respectful tone.

Do

Starting your car.

Getting your playlist.

I didn't quite get that.

Don't

Starting your automobile.

Acquiring your playlist.

Invalid response.

Engage customers with questions

Remember, the customer always starts the conversation with Alexa, telling her which type of skill they want to use. Once they do, you'll need to engage the customer so Alexa can determine how to assist them in the skill experience. Asking the customer a question is a natural cue for them to speak. Either lead with the question on its own or ask it at the end of your prompt so the customer knows to respond immediately. Customers tend to answer questions as soon as they are asked, so a rhetorical question, or a question in the middle of a prompt, may cause the customer to begin answering before the mic is opened or before the prompt is done playing, which results in a recognition error.

There are a few ways to elicit a customer response without telling the customer exactly what to say. Depending on the number of choices and how simple or complex those choices are, you may choose one of these formats when you want to elicit a response:

Yes/No questions
If the prompt choices are yes or no, simply ask the customer whether they want to do something. Do not prompt them with an additional “yes or no?”

Do

Do you want to keep shopping?

Don't

Do you want to keep shopping: Yes or no?

Either/or questions
If the options are short (one or two words each), and there are only two of them, either/or options may be acceptable.

Do

Would you like a story or a rhyme?

Don't

Would you like a super scary ghost story or a really funny nursery rhyme?

Short list of options
If there are more than two options and/or the options take several words or long phrases to describe, present the customer with their options and ask which they would like.

Do

I can tell you a story, recite a rhyme, or sing a song. Which would you like?

Don't

Do you want me to tell you a story, recite a rhyme, or sing you a song?
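
The choice between an either/or question and a short list can be sketched in code. The following is a minimal illustration, not an Alexa API: the function names and the two-word threshold (`max_words`) are assumptions drawn from the guidance above, and the caller is assumed to pass noun phrases for short options and verb phrases that can follow "I can" for longer ones.

```python
def join_with_or(items: list[str]) -> str:
    """Join options the way they would be spoken: 'a or b' / 'a, b, or c'."""
    if len(items) == 2:
        return f"{items[0]} or {items[1]}"
    return ", ".join(items[:-1]) + f", or {items[-1]}"


def choice_prompt(options: list[str], max_words: int = 2) -> str:
    """Pick a question format from the number and length of the options.

    Hypothetical helper: two short options become an either/or question;
    anything else is presented as a list followed by a separate question.
    """
    if len(options) < 2:
        raise ValueError("need at least two options")
    all_short = all(len(option.split()) <= max_words for option in options)
    if len(options) == 2 and all_short:
        return f"Would you like {join_with_or(options)}?"
    # Present the options first, then end on the question so the
    # customer's cue to speak comes last.
    return f"I can {join_with_or(options)}. Which would you like?"


print(choice_prompt(["a story", "a rhyme"]))
print(choice_prompt(["tell you a story", "recite a rhyme", "sing a song"]))
```

In both cases the question comes last, so the customer's cue to respond arrives when the prompt has finished playing.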

Use natural prosody

Alexa TTS should mimic the prosody of natural speech, appropriately modeling inflections and intonations. This requires matching a set structure Alexa follows for questions, statements, confirmations, disambiguation, lists of choices, and number sequences (for example, phone numbers and zip codes). Always check your TTS prompts and adjust the wording if it sounds unnatural or robotic.

When you're testing your TTS, use Speech Synthesis Markup Language (SSML) tags to fine-tune Alexa's reading. Using SSML, you can change the emphasis as well as adjust pitch, volume, and tonal shifts. Alexa also uses phonemes, or distinct sounds in a language (in English, for example, p, b, d, and t). Use phoneme tags to change the way Alexa pronounces your skill's name, company and other proper names, or terminology specific to your skill. Keep in mind that some words are spelled the same but pronounced differently, and the Alexa NLU doesn't always choose the correct pronunciation based on context (for example, “live”). Test all dialog and insert phoneme tags to correct this as needed.

Learn more about supported SSML tags.
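
As a rough illustration of the SSML tags mentioned above, the sketch below builds a response string that pins the pronunciation of “live” with a phoneme tag and stresses one word with an emphasis tag. The helper names (`phoneme`, `emphasis`, `speak`) and the sample sentence are made up for this example; the IPA string is an assumption to verify by ear on a test device.

```python
from xml.sax.saxutils import escape, quoteattr


def phoneme(text: str, ipa: str) -> str:
    """Wrap text in an SSML phoneme tag with an IPA pronunciation."""
    return f"<phoneme alphabet=\"ipa\" ph={quoteattr(ipa)}>{escape(text)}</phoneme>"


def emphasis(text: str, level: str = "moderate") -> str:
    """Wrap text in an SSML emphasis tag (level: strong, moderate, reduced)."""
    return f"<emphasis level=\"{level}\">{escape(text)}</emphasis>"


def speak(*parts: str) -> str:
    """Join already-formatted parts into one SSML document."""
    return "<speak>" + " ".join(parts) + "</speak>"


# "live" as in "live broadcast"; the IPA value here is an assumption.
ssml = speak(
    "The",
    phoneme("live", "laɪv"),
    "broadcast starts",
    emphasis("now", "strong") + ".",
)
print(ssml)
```

Building the markup through small helpers keeps escaping in one place, which matters when prompt text contains characters such as `&`.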

Do

“Alexa, ask Pizza Pro to re-order my last pizza”

One large pepperoni and olive pizza. Got it. The delivery driver can still call you at 555-867-5309, correct?

Uses SSML or natural pauses between parts of the phone number to signal correct prosody. The example above will be read with correct pauses.

“Alexa, ask Class Master to schedule me a yoga class today.”

In the morning, or in the afternoon?

“In the afternoon.”

Clear formulation of an either-or question to elicit a preference toward one item. The customer is unlikely to answer with an invalid yes/no.

Don't

“Alexa, ask Pizza Pro to re-order my last pizza”

One large pepperoni and olive pizza. Got it. The delivery driver can still call you at 5558675309 correct?

Does not use SSML or natural pauses between parts of the phone number to signal correct prosody. The example above will be read as one large number.

“Alexa, ask Class Master to schedule me a yoga class today.”

In the morning? Or the afternoon?

“Yes, that.”

Unclear formulation of an either-or question to elicit a preference toward one item. The customer may answer with an invalid “yes/no” in reference to the last item listed.
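
One way to avoid the "one large number" reading in the phone number example is to group the digits and mark them up before they reach TTS. This is a minimal sketch that assumes a 10-digit North American number; `phone_to_ssml` is a hypothetical helper name, and the result should still be checked by ear on a device.

```python
def phone_to_ssml(digits: str) -> str:
    """Format a 10-digit phone number so TTS reads it in groups.

    Assumption: the number is a 10-digit North American number. Any
    non-digit characters in the input (spaces, dashes) are ignored.
    """
    cleaned = "".join(ch for ch in digits if ch.isdigit())
    if len(cleaned) != 10:
        raise ValueError("expected a 10-digit phone number")
    grouped = f"{cleaned[:3]}-{cleaned[3:6]}-{cleaned[6:]}"
    # say-as with interpret-as="telephone" asks TTS to read the value
    # as a phone number rather than as a single large number.
    return f'<say-as interpret-as="telephone">{grouped}</say-as>'


prompt = (
    "<speak>The delivery driver can still call you at "
    + phone_to_ssml("5558675309")
    + ", correct?</speak>"
)
print(prompt)
```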

Be contextually relevant

List options in order of most to least contextually relevant. Avoid giving the customer options in an order that changes the subject of the conversation, then returns to it again. Always put the most relevant option related to the action they just took in the skill as the first item in the list. This helps customers understand and verbalize their choices better without spending mental time and energy figuring out what's most relevant to them.

Do

That show plays again tomorrow at 9 PM. I can tell you when a new episode is playing, when another show is playing, or you can do something else. Which would you like?

Don't

That show plays again tomorrow at 9 PM. You can find out when another show is playing, find out when a new episode of this show is playing, or do something else. What would you like to do?
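
The ordering rule can be sketched as a stable sort that moves options about the current topic to the front and leaves everything else in its original order. The `(label, topic)` pairs and the topic names below are hypothetical; a real skill would derive them from its own session state.

```python
def order_by_relevance(options: list[tuple[str, str]], current_topic: str) -> list[str]:
    """Return option labels with those matching the current topic first.

    sorted() is stable, so options that share a key keep their original
    relative order; a catch-all such as "do something else" stays last
    as long as it is listed last.
    """
    ranked = sorted(options, key=lambda option: option[1] != current_topic)
    return [label for label, _topic in ranked]


options = [
    ("when another show is playing", "other_show"),
    ("when a new episode is playing", "this_show"),
    ("do something else", "other"),
]
print(order_by_relevance(options, current_topic="this_show"))
```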

Use contractions

Skill dialogs should use contractions in most cases to sound more natural and mimic natural conversation.

Do

You’d like to hear the forecast for Seattle, right?

Okay, I’ve added it.

Don't

You would like to hear the forecast for Seattle, right?

I have added that.

Vary Alexa's responses

Because customers interact with Alexa so frequently, it's important to support a variety of responses for common or repeat interactions. This applies especially to discourse markers (words that separate statements, such as “well” or “okay”) and to escalating error prompting strategies (for low confidence and no speech). These responses can be randomly selected to prevent Alexa from sounding robotic.

Do

Customer answers a quiz question wrong:

That's not quite right. One more try: What year was the Bill of Rights signed?

“1986”

Shoot. That wasn't it. The correct answer was 1791.

Don't

Customer answers a quiz question wrong:

That's not correct. One more try: What year was the Bill of Rights signed?

“1986”

That's not correct. Let's move on.

Do

Which recipe?

Which recipe do you want?

Vary repetitive responses.

Don't

Which recipe?

Which recipe?

Same response every time.
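
Random selection on its own can still play the same prompt twice in a row. The sketch below, which is one possible approach rather than a prescribed one, picks a variant at random while excluding whichever one was used last. The class name `PromptVariants` and the third sample variant are illustrative assumptions.

```python
import random


class PromptVariants:
    """Pick a prompt at random, never repeating the previous pick."""

    def __init__(self, variants, rng=None):
        # Drop duplicates while keeping order; at least two distinct
        # variants are needed to avoid back-to-back repeats.
        self._variants = list(dict.fromkeys(variants))
        if len(self._variants) < 2:
            raise ValueError("need at least two distinct variants")
        self._rng = rng or random.Random()
        self._last = None

    def next(self) -> str:
        candidates = [v for v in self._variants if v != self._last]
        self._last = self._rng.choice(candidates)
        return self._last


recipe_prompts = PromptVariants([
    "Which recipe?",
    "Which recipe do you want?",
    "Which one should I open?",  # hypothetical extra variant
])
for _ in range(3):
    print(recipe_prompts.next())
```

In a real skill the last-used variant would need to live in session or persistent attributes, since a handler does not usually keep objects in memory between requests.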

Be brief

Simplify messages to their essence wherever possible. For functional or transactional intents, variety and brevity are both important. For example, “Which artist?” is the ideal prompt because it’s clear and brief. However, because some skills will use prompts like this hundreds of times, Alexa should vary her response with prompts that are still clear but a bit less terse in order to keep the experience fresh.

Avoid using long clauses to explain options. Ensure options offered aren’t labeled too similarly. If options are read as a list, they need to be distinct enough to be parsed, understood, and accurately selected by the customer. Be precise and make every word count.

  • Reduce the number of steps it takes to complete a task wherever possible.
  • Avoid stating the obvious, such as “incorrect answers earn zero points.”
  • Eliminate redundant information and words.
  • Use any given word only once per message where possible. This could be the skill name, brand name, or a command. When customers hear the same word or phrase stated several times in one message, they have trouble parsing what the skill is saying and differentiating between its options.
  • Do not include explanations of global Alexa controls like help, stop, exit, repeat, etc.
  • Avoid asking the customer whether or not they want to hear options and simply present them as a question.
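
Some of these checks can be automated during prompt review. As a small sketch, the helper below flags terms, such as a skill or brand name, that appear more than once in a single message. The function names and the sample prompt are assumptions for illustration, not part of any Alexa tooling.

```python
import re


def count_term(prompt: str, term: str) -> int:
    """Count whole-word, case-insensitive occurrences of a term."""
    pattern = rf"\b{re.escape(term)}\b"
    return len(re.findall(pattern, prompt, flags=re.IGNORECASE))


def repeated_terms(prompt: str, terms: list[str]) -> list[str]:
    """Return the terms that occur more than once in the prompt."""
    return [term for term in terms if count_term(prompt, term) > 1]


draft = "Welcome to Pizza Pro. Pizza Pro can re-order your last pizza."
print(repeated_terms(draft, ["Pizza Pro", "re-order"]))
```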

Do

Ready to start the game?

OK. We'll pick up right here when you come back. Bye!

There's one game left in your 'For the Dogs' Pack.

Don't

All right then, are you ready to get started on a new game?

Ok. I've saved your progress on your game so you can pick back up where you left off next time you play. Bye!

It looks like you're nearing the end of your current game pack.

Write for engagement

Alexa skills should be built to last and grow with the customer over time, which means you'll need to customize the experience for different types of use. Your skill should provide a delightful customer experience whether it's the first time a customer invokes it, or it's the 100th time they've asked it a question this month. The skill should be designed to phase out information that experienced customers will learn over time and have fresh dialog at the ready so it doesn't become tiresome or repetitive to them.

When possible, reward repeat customers by offering a more personalized experience. Successfully personalized skills give customers the information they need frequently right away (for example, giving an update in a welcome message) and use what they've learned about the customer over time to become more relevant to them.

Use varying prompt phrases to avoid sounding robotic or tiring to repeat customers using your skill. Vary the prompts customers will hear most often, such as “Which would you like?” or “What will you do?” Vary not only the language itself, but also the length and detail given in the message. For example, a customer doesn’t need to hear the skill name every time, but several different exit messages may be appropriate.

Consider that customers may engage with the skill several times in a row if they’re having trouble, or several times a day for high-use skills. Delivering overly verbose messages repeatedly may fatigue or frustrate customers already having a poor experience.

Components of engaging dialogs

There are several building blocks to a successful and engaging conversation that serves the customer with little friction. The following are dialog components that all skills should implement to increase the chances customers will discover and continue to use the features of your skill:

Welcome message
When a customer invokes a skill without a request ("Alexa, open [skill name]"), the skill should deliver a welcome message and then prompt the customer to respond. You will need several variations of the welcome message, including first-time, returning, and personalized versions. Immediately following the welcome message, the prompt tells the customer broadly what they can do or ask next and opens the mic for input. For returning customers, it is recommended to include a reminder about the skill's basic functions.

Do

First use:

Thanks for subscribing to ABC radio. You can ask for a live game with a team name, like Seattle Seahawks, by location, like New York, or for a league, like NFL. You can also ask me for a music station or genre. What would you like to listen to?

Return use:

Welcome back to ABC Radio. Want to keep listening to Kids Jam Radio?

Don't

First use:

Thanks for subscribing to ABC Radio. What do you want to listen to?

Return use:

Welcome back. What do you want to listen to?
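
Choosing among welcome variations usually comes down to a few pieces of stored state. The sketch below assumes the skill persists a visit count and the last station played; `welcome_message`, its parameters, and the shortened wording are hypothetical and modeled on the examples above.

```python
from typing import Optional


def welcome_message(skill_name: str, visit_count: int,
                    last_station: Optional[str] = None) -> str:
    """Pick a welcome message from stored customer state (assumed inputs)."""
    if visit_count == 0:
        # First use: explain broadly what the customer can ask for.
        return (
            f"Thanks for subscribing to {skill_name}. "
            "You can ask for a live game, a music station, or a genre. "
            "What would you like to listen to?"
        )
    if last_station:
        # Personalized return: offer to resume what they played last.
        return (
            f"Welcome back to {skill_name}. "
            f"Want to keep listening to {last_station}?"
        )
    # Returning customer with no history: short reminder, then the question.
    return (
        f"Welcome back to {skill_name}. "
        "You can ask for a game or a music station. "
        "What would you like to listen to?"
    )


print(welcome_message("ABC Radio", visit_count=0))
print(welcome_message("ABC Radio", visit_count=12, last_station="Kids Jam Radio"))
```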

Question prompts
Question prompts guide the customer through the conversation by asking questions which open the mic for their response. Alexa's responses to the customer strive to be brief, varied and natural.

Do

What would you like to listen to?

Don't

Tell me what you want to listen to.

Re-prompt
A skill should deliver a re-prompt when the customer responds to a prompt with silence or an utterance the skill doesn't support. It may need to disambiguate or elaborate on the kind of responses supported.

Error messages
Error messages tell the customer briefly what went wrong and should either explicitly or implicitly tell them how to fix it, then ask the question again.

Do

I can find a price range from $1000 to $100000. How much is your budget?

Don't

That's not an accepted value. Tell me the price range again.

Help messages
Your skill should deliver a help message when a customer either asks the skill for help or reaches an error message too many consecutive times. A “master” help message delivers high-level information about the skill and its features that includes similar information as the first-time welcome message. Contextual help messages deliver relevant content to the task or prompt at hand and are only delivered when the customer is trying to use a specific feature or is stuck on a certain task.
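
The escalation from question to error message to help message can be sketched as a function of how many consecutive errors the customer has hit. The threshold of two errors and the sample help text are assumptions for illustration; the right values depend on the skill.

```python
def next_prompt(consecutive_errors: int, question: str, hint: str,
                help_message: str, max_errors: int = 2) -> str:
    """Choose what to say based on consecutive recognition errors.

    0 errors: just the question.
    1..max_errors: a brief hint on what is accepted, then the question.
    more than max_errors: contextual help, then the question.
    """
    if consecutive_errors <= 0:
        return question
    if consecutive_errors <= max_errors:
        return f"{hint} {question}"
    return f"{help_message} {question}"


question = "How much is your budget?"
hint = "I can find a price range from $1000 to $100000."
# Hypothetical contextual help text for this step of the skill.
help_message = "To search, I need a budget in dollars, like 20 thousand."

for errors in range(4):
    print(errors, next_prompt(errors, question, hint, help_message))
```

Every branch ends on the question, so the customer always knows what to answer once the prompt finishes.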

Follow-up prompts
A prompt delivered either at a blocking point in the conversation (the customer didn’t provide enough information to complete a request) or at the end of a conversation. It offers additional opportunity for interaction with the skill by offering more features or options to engage with the skill. The follow-up options should be relevant to the previous dialog if possible.

Do

There are a few cities named Smith. Did you mean Smith, Illinois, or Smith, Washington?

I've booked that flight for you. I can help you book a hotel or find a car next. Which would you like?

Don't

There are a few cities named Smith.

I've booked that flight for you. Is there anything else I can help you with?

Greetings and salutations
A portion of a message read before or after the body of a message that welcomes the customer, says goodbye, thanks them for using the skill, or delivers a contextual greeting. Contextual greetings can be personalized, such as a happy birthday wish, or a reward for an accomplishment during their last session with the skill.

Exit message
When a customer asks the skill to stop or the conversation comes to a natural end, the skill should exit the conversation naturally and gracefully with few or no words. The exit message could also be a confirmation in some cases where data gathered during the session may be lost if the customer exits the skill.

Using other voices

While Alexa is usually the main voice narrating the interface, there is a range of other voices you can use in your skill. If you want to use a new voice for your skill, consider whether you have a good reason to depart from the standard experience, such as a unique persona or multiple characters interacting with each other. If you don't have a good reason to depart from the norm, then using Alexa's voice will be the best choice so the customer can focus on the content and functionality of your skill and not the voice it's conveyed in.

Some skills may choose to leverage a combination of voices in their skill at once. This may include:

  • Alexa's voice
  • Recorded voice-overs
  • Amazon Polly

While voice-over is ideal for longer messages and highly branded experiences such as storytelling and gaming, most skills are too dynamic to rely entirely on recorded prompts. It is difficult, expensive and time-consuming to record voice-over prompts for every conceivable message. Dynamic event prompting is easier to handle in TTS since additional recordings won't be needed.

  • You can use additional voices to help with character dialog, narration, and other elements that will enrich the story.
  • Consider assigning Alexa a role in the skill, and determine whether and how to explain that role to customers. If you are using other voices, Alexa is always the default voice of the interface itself, such as help and error messaging.
  • Introduce Alexa's role explicitly to customers, such as scorekeeper in a game, when they hear the instructions. Alexa can even introduce a role for herself. For example: “As you play the game, I'll be keeping score as your referee. May the best player win!”
  • Questions should be asked consistently by one voice or “actor” to maintain customer focus.
  • Alexa should be a neutral actor in the skill, but can engage in “conversation” with the skill's other actors.
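
Mixing Alexa's voice with an Amazon Polly voice in one response can be sketched with the SSML voice tag. In this illustration Alexa keeps the interface lines and a second voice delivers character dialog. The helper names and the character's line are made up for the example, and the voice name "Matthew" is an assumption to check against the voices supported for your skill's locale.

```python
from xml.sax.saxutils import escape


def in_voice(text: str, voice_name: str) -> str:
    """Wrap text in an SSML voice tag so another voice speaks it."""
    return f'<voice name="{voice_name}">{escape(text)}</voice>'


def speak(*parts: str) -> str:
    """Join parts (assumed already SSML-safe) into one SSML document."""
    return "<speak>" + " ".join(parts) + "</speak>"


ssml = speak(
    # Alexa's default voice introduces her own role in the game.
    escape("As you play the game, I'll be keeping score as your referee."),
    # A character voice delivers in-story dialog (hypothetical line).
    in_voice("Welcome, challengers. Round one begins!", "Matthew"),
)
print(ssml)
```

Keeping questions, help, and error messaging outside the voice tag leaves them in Alexa's default voice, in line with the guidance above.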
