User Experience Guidelines for the Use of Amazon Polly Voices in Your Skills

You can use Amazon Polly voices in your skill, as described in the SSML Reference. Follow these guidelines to help ensure a good experience for your customers.

The following locales are supported for Alexa: en-US, en-GB, en-IN, en-AU, en-CA, de-DE, es-ES, it-IT, ja-JP, fr-FR.

Considerations about using Amazon Polly voices

  • Amazon Polly voices are especially useful for multi-character story and gaming skills where your skill can use different voices for characters.

  • Your skill can particularly benefit from Amazon Polly voices if its content is gender-specific, for example, if you want a response in your skill to be spoken in a male voice.

  • Use Amazon Polly voices in any scenario in which multiple voices will improve the interactivity and customer experience within your skill.

  • Apply the same voice design principles as you do when constructing a typical Alexa response. Be brief, speak and write naturally, prompt with guidance for the user, use conversation markers, and so forth. See the Voice Design Guide.

  • Test how your responses sound in the Alexa Simulator on the developer console, just as you would with any other SSML tags.

Use the voice tag

Refer to Speech Synthesis Markup Language Reference With Amazon Polly Voices for documentation about how to add Amazon Polly voices to your skills.

  • The voice tag supports all SSML tags supported by Alexa Skills Kit, including lang, say-as, break, and prosody, except that the speechcons tag is not supported with voice.

  • Nest any other SSML tags that you use inside the voice tags, rather than the other way around. Note that voice tags can also be nested within other voice tags.

In the following example, the Kendra voice speaks English and pronounces the foreign language names imperfectly.

<speak>
<voice name="Kendra">
 I am going to spell out Hello as <say-as interpret-as="spell-out">hello</say-as>. Now and then, I speak <lang xml:lang="de-DE">Deutsch</lang> and <lang xml:lang="fr-FR">français</lang> and <lang xml:lang="es-ES">español</lang>.
</voice>
</speak>

  • You can use the voice tag to use an Amazon Polly voice to construct your entire response, or as an accompaniment to an SSML audio file.

  • The name value in the voice tag is case-sensitive, so use standard name casing, such as "Matthew".

  • Just as with standard SSML TTS, consider combining the voice tag with other SSML tags supported by Alexa to get special effects:

<speak>
<voice name="Matthew">Can you call me at <say-as interpret-as="digits">8675309</say-as>?</voice>
<voice name="Kendra">Okay, let's be mindful and take a deep breath. <break time="3s"/> Now don't we feel better? </voice>
</speak>
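
If your skill assembles multi-voice responses in code, a small helper can keep the markup well-formed. The following is a minimal Python sketch, not an official Alexa Skills Kit API: the `voice` and `build_dialogue` function names are hypothetical, and the sample dialogue text is illustrative. It escapes XML special characters in the spoken text so that a stray `&` or `<` does not break the SSML.

```python
from xml.sax.saxutils import escape, quoteattr


def voice(name: str, text: str) -> str:
    """Wrap plain text in a voice tag, escaping XML special characters."""
    return f"<voice name={quoteattr(name)}>{escape(text)}</voice>"


def build_dialogue(lines: list[tuple[str, str]]) -> str:
    """Build one <speak> document from (voice_name, text) pairs."""
    body = "".join(voice(name, text) for name, text in lines)
    return f"<speak>{body}</speak>"


# Hypothetical two-character exchange; voice names come from the examples above.
ssml = build_dialogue([
    ("Matthew", "Can you call me back?"),
    ("Kendra", "Okay, let's take a deep breath & relax."),
])
print(ssml)
```

The helper only handles plain text. If you need nested tags such as say-as or break inside a voice, build that inner markup separately and skip the escaping for it.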

Use the lang tag

The lang tag can be used on its own or nested in the voice tag to control how Amazon Polly voices speak. Use the lang tag with a corresponding voice of the same language for the best results, as shown here. See lang tag.

<speak><voice name="Kendra">
 I am going to spell out Hello as <say-as interpret-as="spell-out">hello</say-as>. Now and then, I speak <voice name="Hans"><lang xml:lang="de-DE">Deutsch</lang></voice> and <voice name="Celine"><lang xml:lang="fr-FR">français</lang></voice> and <voice name="Enrique"><lang xml:lang="es-ES">español</lang></voice>.
</voice>
</speak>
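
One way to apply this pairing consistently is to look up a voice for each locale before emitting the lang tag. The Python sketch below is an illustration only: the `VOICE_FOR_LOCALE` table and the `speak_in` function are hypothetical names, and the table contains just the three pairs used in the example above rather than a complete list of supported voices.

```python
from xml.sax.saxutils import escape

# Voice/locale pairs taken from the example above; extend as needed.
VOICE_FOR_LOCALE = {
    "de-DE": "Hans",
    "fr-FR": "Celine",
    "es-ES": "Enrique",
}


def speak_in(locale: str, text: str) -> str:
    """Wrap text in a lang tag, plus a matching voice when one is known."""
    inner = f'<lang xml:lang="{locale}">{escape(text)}</lang>'
    name = VOICE_FOR_LOCALE.get(locale)
    if name is None:
        # No matching voice in the table: fall back to the lang tag alone.
        return inner
    return f'<voice name="{name}">{inner}</voice>'


print(speak_in("de-DE", "Deutsch"))
print(speak_in("it-IT", "italiano"))
```

Locales missing from the table still get a lang tag, so the surrounding voice speaks them with its own accent, as in the earlier Kendra-only example.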

Technical specifications

Refer to Speech Synthesis Markup Language Reference With Amazon Polly Voices for documentation about how to add Amazon Polly voices to your skills.

  • A TTS (text-to-speech) response in a skill is limited to 10,000 characters. With 10,000 characters, you can generate approximately 10 minutes of continuous audio with Amazon Polly and Alexa voices. However, keep responses brief for the best customer experience. See the one-breath test in the Voice Design Guide.

  • Optionally, adjust for acoustic differences between Alexa and Amazon Polly voices. Because they are different voices, they can vary in pitch, rate, timbre, and volume. You can adjust for these differences with SSML tags; consider using them to keep the customer experience consistent across the use cases in your skill. For example:

    • Pitch:
      <speak>I can speak in a <prosody pitch="high">higher pitched voice</prosody>, or I can speak <prosody pitch="low">in a lower pitched voice</prosody>.</speak>
    
    • Rate:
      <speak>I can speak <prosody rate="x-slow">really slowly</prosody>, or I can speak <prosody rate="x-fast">really fast</prosody>.</speak>
    
    • Volume:
      <speak>I can also speak <prosody volume="x-loud">very loudly</prosody>, or I can speak <prosody volume="x-soft">very quietly</prosody>. </speak>
    
    • Whisper:
      <speak>I have a secret to tell you. I will whisper it to you. <amazon:effect name="whispered"><prosody rate="x-slow" volume="loud">I am not human.</prosody></amazon:effect> Can you believe it?</speak>
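
The character limit and the prosody adjustments above can both be handled in code before a response is sent. The following Python sketch is illustrative, with hypothetical function names (`prosody`, `check_length`). It assumes the limit is measured against the full SSML string, tags included; the source does not say how the limit is counted, so treat that as a conservative guess rather than documented behavior.

```python
from xml.sax.saxutils import escape

MAX_TTS_CHARS = 10_000  # limit stated in the technical specifications above


def prosody(text: str, **attrs: str) -> str:
    """Wrap text in a prosody tag using only pitch, rate, and volume."""
    allowed = ("pitch", "rate", "volume")
    unknown = set(attrs) - set(allowed)
    if unknown:
        raise ValueError(f"unsupported prosody attributes: {sorted(unknown)}")
    # Emit attributes in a fixed order so the output is predictable.
    attr_text = "".join(f' {key}="{attrs[key]}"' for key in allowed if key in attrs)
    return f"<prosody{attr_text}>{escape(text)}</prosody>"


def check_length(ssml: str, limit: int = MAX_TTS_CHARS) -> int:
    """Return the remaining character budget, or raise if over the limit.

    Assumption: every character of the SSML string counts, tags included.
    """
    remaining = limit - len(ssml)
    if remaining < 0:
        raise ValueError(f"response exceeds the limit by {-remaining} characters")
    return remaining


ssml = "<speak>I can speak " + prosody("really slowly", rate="x-slow") + ".</speak>"
print(ssml)
print(check_length(ssml))
```

Combining several attributes in one call, such as `prosody("I am not human.", rate="x-slow", volume="loud")`, produces a single prosody tag instead of two nested ones.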
    

Best practices

  • The initial introduction of your skill must use Alexa's default, in-country voice. This guideline helps ensure that your skill's customers know when they are interacting with a skill rather than with Alexa directly.
  • Remember that Amazon Polly and Alexa are separate, and not all Amazon Polly features are available within Alexa, particularly some Amazon Polly SSML features. Ensure that you only use supported features in your skill.
  • Speechcons used in Alexa skills can only use Alexa's voice.
  • Alexa skills that use Amazon Polly voices must adhere to all other content policies and the Alexa Skills Kit developer contract.
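
Because speechcons can only use Alexa's voice, it can help to check a response for speechcons placed inside a voice tag before sending it. The Python sketch below is a hypothetical pre-flight check, not part of any Alexa tooling. It assumes speechcons are written as say-as tags with interpret-as="interjection", and it only parses SSML without amazon:-prefixed tags, since the standard-library parser rejects an undeclared namespace prefix. The sample speechcon word is illustrative.

```python
import xml.etree.ElementTree as ET


def speechcons_inside_voice(ssml: str) -> list[str]:
    """Return the text of every speechcon that sits inside a voice tag."""
    found: list[str] = []

    def walk(elem: ET.Element, in_voice: bool) -> None:
        # Once we enter a voice tag, everything below it is "inside a voice".
        in_voice = in_voice or elem.tag == "voice"
        is_speechcon = (
            elem.tag == "say-as" and elem.get("interpret-as") == "interjection"
        )
        if in_voice and is_speechcon:
            found.append(elem.text or "")
        for child in elem:
            walk(child, in_voice)

    walk(ET.fromstring(ssml), False)
    return found


bad = (
    '<speak><voice name="Matthew">'
    '<say-as interpret-as="interjection">bravo</say-as>'
    "</voice></speak>"
)
good = (
    '<speak><say-as interpret-as="interjection">bravo</say-as> '
    '<voice name="Matthew">Well done.</voice></speak>'
)
print(speechcons_inside_voice(bad))
print(speechcons_inside_voice(good))
```

An empty list means no speechcon was found under a voice tag. The walk tracks nesting with a flag, so a speechcon inside nested voice tags is reported once.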

Amazon Polly voices currently available to Alexa

You can use any of the supported Amazon Polly voices in your Alexa responses, for part or all of the response. Be mindful of the customer experience if you combine voices from different locales in your skill responses.