Improve your audio experience

Key takeaways

There are several reasons you might want to use different voices in your skill. Different voices can play different roles in the experience. For a more immersive experience, consider adding sounds from the Alexa Sound Library or using SSML to change the way Alexa (or another voice) speaks your dialog.


Need quick advice?

View the Checklist for improving your skill’s speech with SSML to learn where SSML can make the greatest difference in your skill.


While you think about the kind of experiences you should create for Alexa (see Should You Use Alexa) and begin designing the conversations the customer will have, also consider the following ways to improve your skill’s sound and create a more natural, engaging, and immersive experience.


Use different voices in your skill

Alexa isn’t the only voice you can use in your skill. You can give your characters Amazon Polly voices, which include a range of male and female voices. You can also use your own voice-over recordings.

Using a different voice is a good choice if you need a variety of voices in a story or need multiple speakers to play different roles in an experience. Even when you use Polly voices, Alexa should remain the main voice of the interface unless you have a good reason to change it. For more information about how to incorporate Polly voices, see Customer Experience Guidelines for the Use of Amazon Polly Voices.

For example, read how a hypothetical trivia skill might use different voices to play the roles of host and scorekeeper:


Customer: Alexa, start Seattle Super Trivia.

Host: Welcome to Seattle Super Trivia. It’s OK, you’ve probably never heard of it. Think you know Seattle? Think again, transplant. I’m about to school you! Are you ready to take the Seattle Super Trivia challenge?

Customer: Yeah.

Host: I can see you’re a tourist here, so hold on to your umbrella. Each day I’ll have five fresh trivia questions for you. Answer them all correct, and earn a sudden-death bonus round. I’ll give you one hint each day. Alexa will keep score for us.

Alexa: Happy to help. Want to try a practice question?

Customer: Yeah!

Host: Ok. Let’s try one first. When was the city of Seattle founded? Was it … A. 1861, B. 1902, or C. 1792?

Customer: Uh? A?

Alexa: That’s correct! You’re 1 for 1.

Host: Looks like you’ve got the hang of it … let’s start a game.
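As a rough illustration of the dialog above, the following Python sketch builds one response in which the host speaks in a Polly voice and Alexa answers in her default voice. It assumes the SSML voice tag described in the SSML reference and uses "Matthew" only as an example voice name; the helper names in_voice and build_turn are hypothetical, not part of any Alexa SDK.

```python
from xml.sax.saxutils import escape


def in_voice(text: str, voice_name: str) -> str:
    """Wrap one line of dialog in an SSML <voice> tag (text is XML-escaped)."""
    return f'<voice name="{voice_name}">{escape(text)}</voice>'


def build_turn(lines, polly_voices):
    """Build one SSML response from (speaker, text) pairs.

    Speakers listed in polly_voices get a <voice> tag; every other
    speaker (here, Alexa) is left in the default voice.
    """
    parts = []
    for speaker, text in lines:
        if speaker in polly_voices:
            parts.append(in_voice(text, polly_voices[speaker]))
        else:
            parts.append(escape(text))
    return "<speak>" + " ".join(parts) + "</speak>"


# "Matthew" is an assumed example voice; check which voices your locale supports.
turn = build_turn(
    [
        ("Host", "I'll give you one hint each day. Alexa will keep score for us."),
        ("Alexa", "Happy to help. Want to try a practice question?"),
    ],
    polly_voices={"Host": "Matthew"},
)
```

Keeping the speaker-to-voice mapping in one dictionary makes it easier to swap in a backup voice later without touching the dialog text.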

Tip: If you will be creating your own audio clips or voice-overs to use in your skill, test them on an Alexa device to ensure they play at the same volume. Customers should not have to adjust the volume on their device more than once while using your skill. Don’t play some clips at low volume, and then blast your listeners with sound effects that are too loud.


Change the way Alexa speaks with SSML

Speech Synthesis Markup Language (SSML) is a standard language that affects the synthesis of speech, or, how your TTS voice will “deliver” its lines. Alexa’s voice and Amazon Polly voices support a range of SSML tags, which you can learn about in the Speech Synthesis Markup Language (SSML) Reference.
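Because SSML is XML-based, plain dialog text generally needs its reserved characters escaped before it is wrapped in a speak element. The sketch below shows one way to do that in Python and to place the result in an outputSpeech object of type SSML; the function name ssml_output_speech is a hypothetical helper, and the dictionary shape is a simplified assumption about the skill response format.

```python
from xml.sax.saxutils import escape


def ssml_output_speech(plain_text: str) -> dict:
    """Escape &, <, and > in plain text and wrap it in <speak> tags.

    Returns a dictionary shaped like an outputSpeech object of type SSML
    (assumed shape; confirm against the response format you target).
    """
    return {"type": "SSML", "ssml": f"<speak>{escape(plain_text)}</speak>"}


speech = ssml_output_speech("Think you know Seattle & its history?")
# speech["ssml"] is "<speak>Think you know Seattle &amp; its history?</speak>"
```

Escape only the plain-text portions; SSML tags you add yourself should be inserted after escaping so they are not turned into literal text.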

You can use SSML to …

  • Correct pronunciation errors
  • Correct inappropriate or unwanted emphases and intonations
  • Improve clarity of the message
  • Convey more meaning in the message; correct contradicting tone and meaning
  • Improve the chance the message will be understood by a certain audience (such as speaking slower, louder, with more pauses for young children)
  • Create character, differentiation, and familiarity in the voice(s)

Read the following examples of how SSML can change the way the customer hears the dialog.

Use amazon:emotion to make Alexa more expressive.



Why did the raven get kicked out of the tavern?

Because it was <amazon:emotion name="excited" intensity="high">a crow</amazon:emotion> bar!


Use breaks to manage timing.



Why did the raven get kicked out of the tavern?

<break time="800ms"/>

Because it was a crow bar!


Use emphasis for important phrases.



Why did the raven get kicked out of the tavern?

<break time="800ms"/>

Because it was a <emphasis level="strong">crow bar!</emphasis>
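If you generate many lines like the break and emphasis examples above, small helper functions can keep the tags consistent. This is a minimal Python sketch; the helper names pause, emphasize, and tell_joke are hypothetical, and the emphasis levels listed are the ones I understand the SSML reference to allow.

```python
EMPHASIS_LEVELS = {"strong", "moderate", "reduced"}


def pause(ms: int) -> str:
    """Return an SSML break of the given length in milliseconds."""
    return f'<break time="{ms}ms"/>'


def emphasize(text: str, level: str = "strong") -> str:
    """Wrap text in an SSML emphasis tag, rejecting unknown levels."""
    if level not in EMPHASIS_LEVELS:
        raise ValueError(f"unsupported emphasis level: {level}")
    return f'<emphasis level="{level}">{text}</emphasis>'


def tell_joke(setup: str, lead_in: str, key_phrase: str, pause_ms: int = 800) -> str:
    """Setup, a timed pause, then a punchline with an emphasized key phrase."""
    body = f"{setup} {pause(pause_ms)} {lead_in} {emphasize(key_phrase)}"
    return f"<speak>{body}</speak>"


joke = tell_joke(
    "Why did the raven get kicked out of the tavern?",
    "Because it was a",
    "crow bar!",
)
```

Exposing the pause length as a parameter makes it easy to try a few timings on a device and keep the one that sounds best.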


Use prosody to adjust speaking rate, pitch, and volume.



<prosody rate="110%">How much wood would a wood chuck chuck </prosody><prosody rate="80%">if a wood chuck could chuck wood?</prosody>




How much <prosody pitch="+20%">wood would a wood chuck chuck </prosody>if a <prosody pitch="-20%">wood chuck could chuck wood?</prosody>




<prosody volume="x-loud">How much wood would a wood chuck chuck </prosody><prosody volume="x-soft">if a wood chuck could chuck wood?</prosody>
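The three prosody attributes shown above can also be combined on one tag. The following Python sketch uses a hypothetical prosody helper that accepts any mix of rate, pitch, and volume; it does not validate the attribute values themselves, so check the SSML reference for the ranges each voice supports.

```python
ALLOWED_ATTRIBUTES = {"rate", "pitch", "volume"}


def prosody(text: str, **attributes: str) -> str:
    """Wrap text in an SSML prosody tag with the given attributes.

    Only rate, pitch, and volume are accepted. Attributes are written
    in alphabetical order so the output is predictable.
    """
    unknown = set(attributes) - ALLOWED_ATTRIBUTES
    if unknown:
        raise ValueError(f"unsupported prosody attributes: {sorted(unknown)}")
    attribute_text = "".join(
        f' {name}="{value}"' for name, value in sorted(attributes.items())
    )
    return f"<prosody{attribute_text}>{text}</prosody>"


# Speed up the first half of the line, then slow down and lower the second half.
first_half = prosody("How much wood would a wood chuck chuck", rate="110%")
second_half = prosody("if a wood chuck could chuck wood?", rate="80%", pitch="-20%")
line = f"<speak>{first_half} {second_half}</speak>"
```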


Checklist for improving your skill’s speech with SSML:

It can take a lot of time to write custom SSML throughout an entire skill dialog, but you don’t need to add SSML to your entire skill. The following time-saving steps help you write engaging SSML dialog where it makes the most impact:

▢  Finalize your skill design: Wait until your script is almost final, when you’re nearly ready to submit for certification, to avoid rewriting your SSML. You can work on SSML during certification if needed; changing SSML does not require re-certification.

▢  Choose a voice (or all your voices) and test them without SSML: Alexa, or a Polly voice? Many voices? What is each voice’s role? Choose backup options: Some Polly voices reproduce some SSML effects better than others. Test your skill using your chosen voice(s) and note any TTS that sounds awkward.

▢  Identify high-value messages: Make a good first impression. Other high-value messages include those heard most often, those that are part of the core experience, and those that need a high comprehension rate.

▢  Prioritize: Prioritize the items from the previous two steps. You don’t necessarily need SSML for all lines of dialog. (De-prioritize low-impact, edge-case dialogs if you’re short on time.)

▢  Work on one line at a time, one effect at a time: Write and test your SSML one dialog at a time, and test effects ONE change at a time so you can learn about how each one impacts the voice.

▢  Use like SSML on like dialogs: Look for dialogs that follow a pattern, such as messages that follow the same syntax, and work on them together, using the same SSML tags across similar messages.

▢  Test your SSML on-device: When you’re done fine-tuning your SSML, test it again on an Echo device to ensure it’s working as intended, the audio is crisp, and the effects sound consistent through a full conversation.
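One way to apply the "like SSML on like dialogs" step from the checklist above is to generate patterned messages from a single template, so a change to the SSML reaches every message that follows the pattern. The Python sketch below does this for the trivia skill's score announcements; the function name score_line and the specific pause length and emphasis level are illustrative assumptions, not recommendations.

```python
# One shared pause so every score announcement is timed the same way.
SCORE_PAUSE = '<break time="300ms"/>'


def score_line(correct: int, asked: int) -> str:
    """Return the SSML for a score announcement, such as "You're 1 for 1"."""
    score = f'<emphasis level="moderate">{correct} for {asked}</emphasis>'
    return f"<speak>That's correct! {SCORE_PAUSE} You're {score}.</speak>"


# Every announcement that follows the pattern gets identical SSML treatment.
announcements = [score_line(correct, correct) for correct in range(1, 4)]
```

Because the effect lives in one place, you can change a single value, listen to the result, and keep or revert it before trying the next effect.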


Use the sound library

You can use the Alexa Skills Kit Sound Library to add sound effects to your skill. Select from hundreds of sounds to better tell a story, punctuate important moments, and provide a richer experience.

You can also use speechcons – a collection of words and phrases that Alexa pronounces with special flair that SSML cannot achieve – to make dialogs more dynamic. For more information about using speechcons in your skill, see Speechcon Reference.

Read how our hypothetical trivia skill might punctuate a typical interaction using sounds from the sound library and speechcons.


Customer: Alexa, start Seattle Super Trivia.

 <audio src="soundbank://soundlibrary/ui/gameshow/amzn_ui_sfx_gameshow_intro_01"/>

<audio src="soundbank://soundlibrary/gameshow/gameshow_01"/>

Host: Stop the presses. You – yes you – are my new Seattle Superstar. You refused to let Seattle Freeze get you down and played for a week straight. Enjoy some bonus questions today on me. Ready to start your bonus game?

Customer: Yes.

Host: Let’s get to it then. First question: When was the city of Seattle founded? Was it … A. 1861, B. 1902, or C. 1792?

Customer: Uh … I don’t know.

<audio src="soundbank://soundlibrary/ui/gameshow/amzn_ui_sfx_gameshow_negative_response_01"/>

Host: Nice try, but you’re not getting out of this one. You can’t skip a question, just like you can’t skip rush hour traffic.

Alexa: (whisper mode) By the way, I can give you one hint daily for free. Do you want to use your free daily hint?

Customer: Yeah!

Alexa: Ok. Here’s a hint. The answer doesn’t rhyme with the first line of that song about Christopher Columbus sailing the ocean blue.

<audio src="soundbank://soundlibrary/ui/gameshow/amzn_ui_sfx_gameshow_bridge_01"/>

Host: So. When was the city of Seattle founded? Was it … A. 1861, B. 1902, or C. 1792?

Customer: I think the answer is A.

<audio src="soundbank://soundlibrary/ui/gameshow/amzn_ui_sfx_gameshow_positive_response_02"/>

Awesome! <Earcon> That’s correct! The City of Seattle was founded in A: 1861. It was a Wednesday. 
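The responses in the dialog above could be assembled roughly as follows. This Python sketch assumes that speechcons are written with a say-as tag using interpret-as="interjection" and that whispering uses the amazon:effect tag, as I understand the SSML and speechcon references to describe; the helper names are hypothetical, and "awesome" is assumed to be an available speechcon in your locale. The soundbank paths are copied from the dialog.

```python
GAMESHOW_SOUNDS = "soundbank://soundlibrary/ui/gameshow/"


def sound(clip_name: str) -> str:
    """Return an SSML audio tag for a game show clip from the sound library."""
    return f'<audio src="{GAMESHOW_SOUNDS}{clip_name}"/>'


def speechcon(word: str) -> str:
    """Mark a word as an interjection so it is spoken as a speechcon."""
    return f'<say-as interpret-as="interjection">{word}</say-as>'


def whisper(text: str) -> str:
    """Wrap text in the whispered effect."""
    return f'<amazon:effect name="whispered">{text}</amazon:effect>'


# The whispered hint offer from the dialog.
hint_offer = "<speak>" + whisper(
    "By the way, I can give you one hint daily for free. "
    "Do you want to use your free daily hint?"
) + "</speak>"

# The positive-response sound followed by a speechcon and the confirmation.
correct_answer = (
    "<speak>"
    + sound("amzn_ui_sfx_gameshow_positive_response_02")
    + " "
    + speechcon("awesome")
    + "! That's correct!</speak>"
)
```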


Use APL for Audio (APL-A) to create rich sounds

You can use APL-A to define a set of audio clips, then select, mix, and sequence them in real time. You build these audio clips from text-to-speech and audio files using APL components. You can learn more about APL-A and its components in the APL for Audio Reference.
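To make the idea of mixing and sequencing concrete, the following Python sketch builds a small APL-A document as a dictionary and serializes it to JSON. The component types (Sequencer, Mixer, Speech, Audio), the field names, and the version string reflect my understanding of the APL for Audio reference and should be treated as assumptions to verify there; the sound paths come from the trivia example earlier in this article.

```python
import json

# A Sequencer plays its items one after another; a Mixer plays them together.
apla_document = {
    "type": "APLA",
    "version": "0.91",  # assumed version string; confirm in the reference
    "mainTemplate": {
        "parameters": ["payload"],
        "item": {
            "type": "Sequencer",
            "items": [
                # 1. Play the intro sting on its own.
                {
                    "type": "Audio",
                    "source": "soundbank://soundlibrary/ui/gameshow/"
                              "amzn_ui_sfx_gameshow_intro_01",
                },
                # 2. Then play the welcome line over a second clip.
                {
                    "type": "Mixer",
                    "items": [
                        {
                            "type": "Speech",
                            "contentType": "PlainText",
                            "content": "Welcome to Seattle Super Trivia.",
                        },
                        {
                            "type": "Audio",
                            "source": "soundbank://soundlibrary/gameshow/gameshow_01",
                        },
                    ],
                },
            ],
        },
    },
}

apla_json = json.dumps(apla_document, indent=2)
```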