Speech Synthesis Markup Language (SSML) Reference

You can use Speech Synthesis Markup Language (SSML) in your output speech response to control how Alexa generates the speech. For example, you can add pauses and other speech effects.

About SSML

When your skill returns a response to a request, you provide text that the Alexa service converts to speech. Alexa automatically handles normal punctuation, such as pausing after a period, or speaking a sentence ending in a question mark as a question.

However, sometimes you might want additional control over how Alexa generates the speech from the text in your response. For example, you might want a longer pause within the speech, or you might want Alexa to read a string of digits as a standard telephone number. The Alexa Skills Kit provides this type of control with Speech Synthesis Markup Language (SSML) support.

SSML is a markup language that provides a standard way to mark up text for the generation of synthetic speech. The Alexa Skills Kit supports a subset of the tags defined in the SSML specification. For the list of supported tags, see Supported SSML Tags.

Use SSML in your response

To use SSML, construct your output speech with the supported SSML tags. When you send a response from your service, you must indicate that the speech is in SSML rather than plain text. If you construct the JSON response directly, provide the marked-up text in the outputSpeech property and set the type to SSML instead of PlainText. Use the ssml property instead of text for the marked-up text:

"outputSpeech": {
    "type": "SSML",
    "ssml": "<speak>This output speech uses SSML.</speak>"
}

You can use SSML with both the normal output speech response and any re-prompt included in the response.

If you use the Alexa Skills Kit SDK for Node.js or Alexa Skills Kit SDK for Java, the SDK wraps the SSML in the <speak> tag automatically.

The following example shows the SSML within <speak> tags.

In the JSON output for the SSML, either escape quotation marks within the output, or use an appropriate mix of single and double quotation marks. The following example wraps the response in double quotation marks and uses single quotation marks for attributes.

{
  "outputSpeech": {
    "type": "SSML",
    "ssml": "<speak>Here is a number <w role='amazon:VBD'>read</w> as a cardinal number: <say-as interpret-as='cardinal'>12345</say-as>. Here is a word spelled out: <say-as interpret-as='spell-out'>hello</say-as>.</speak>"
  }
}
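
As a minimal sketch of the quoting advice above, the following Python builds the same kind of response and lets the standard json module do the escaping. The helper name build_ssml_output is hypothetical and isn't part of any Alexa SDK.

```python
import json


def build_ssml_output(ssml_body: str) -> dict:
    """Wrap SSML markup in <speak> tags and return an outputSpeech object.

    build_ssml_output is a hypothetical helper used for illustration only.
    """
    return {
        "outputSpeech": {
            "type": "SSML",  # "SSML" instead of "PlainText"
            "ssml": f"<speak>{ssml_body}</speak>",  # "ssml" instead of "text"
        }
    }


# The attribute below uses double quotes; json.dumps escapes them as \" so
# the resulting JSON string stays valid.
response = build_ssml_output(
    "Here is a word spelled out: "
    '<say-as interpret-as="spell-out">hello</say-as>.'
)
encoded = json.dumps(response)
print(encoded)
```

Serializing with a JSON library avoids hand-escaping, so either quoting style can be used inside the SSML.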

Important: Unpronounceable Unicode characters aren't allowed in SSML.

If you use Alexa Presentation Language (APL) for audio, you can use the Speech component to render SSML. Set the content property to the SSML text, enclosed with <speak> tags. Set the contentType property to SSML.

The following example shows an APL for audio document. The first Speech component renders plain text. The second Speech component renders SSML.
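
A minimal sketch of such a document, expressed as a Python dictionary and serialized to JSON. The Speech component and its content and contentType properties come from the text above. The "APLA" document type, the version string, the Sequencer wrapper, and the "PlainText" value are assumptions that you should check against the APL for Audio Reference.

```python
import json

# Assumptions (not stated on this page): the "APLA" type, the version string,
# the Sequencer wrapper, and the "PlainText" contentType value.
apla_document = {
    "type": "APLA",
    "version": "0.91",
    "mainTemplate": {
        "item": {
            "type": "Sequencer",
            "items": [
                # First Speech component: plain text.
                {
                    "type": "Speech",
                    "contentType": "PlainText",
                    "content": "This speech is plain text.",
                },
                # Second Speech component: SSML enclosed in <speak> tags.
                {
                    "type": "Speech",
                    "contentType": "SSML",
                    "content": "<speak>This speech uses SSML.</speak>",
                },
            ],
        }
    },
}

print(json.dumps(apla_document, indent=2))
```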

For more about APL for audio, see APL for Audio Reference.

Supported SSML tags

The Alexa Skills Kit supports the following SSML tags, listed in alphabetical order.

<amazon:domain>
<amazon:effect>
<amazon:emotion>
<audio>
<break>
<emphasis>
<lang>
<p>
<phoneme>
<prosody>
<s>
<say-as>
<speak>
<sub>
<voice>
<w>

Note that the Alexa service strips out any unsupported SSML tags included in the text you provide.
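
Because unsupported tags are stripped rather than reported, it can help to check your markup before you send it. The following is a minimal sketch, assuming a simple regular-expression scan is good enough for your content; the function name unsupported_tags is hypothetical.

```python
import re

# Tag names copied from the supported-tags list above.
SUPPORTED_TAGS = {
    "amazon:domain", "amazon:effect", "amazon:emotion", "audio", "break",
    "emphasis", "lang", "p", "phoneme", "prosody", "s", "say-as", "speak",
    "sub", "voice", "w",
}

# Captures the name of each opening or self-closing tag. Closing tags such as
# </speak> are skipped because "/" is not a letter.
_TAG_NAME = re.compile(r"<([A-Za-z][\w:.-]*)")


def unsupported_tags(ssml: str) -> set[str]:
    """Return the tag names in the SSML that are not in SUPPORTED_TAGS."""
    return set(_TAG_NAME.findall(ssml)) - SUPPORTED_TAGS


print(unsupported_tags("<speak>Hi <mark name='x'/> <break time='1s'/></speak>"))
```

An empty result means every tag in the string appears in the supported list; it doesn't validate attributes or nesting.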

Apply multiple SSML tags to the same speech

You can combine most supported tags with each other to apply multiple effects on the speech. For instance, this example uses both the <say-as> and <amazon:emotion> tags. This tells Alexa to speak the entire string in an "excited" voice, and speak the provided number as individual digits:

<speak>
    <amazon:emotion name='excited' intensity='medium'>
        Five seconds till lift off!
        <say-as interpret-as='digits'>54321</say-as>.
        Lift off!
    </amazon:emotion>
</speak>

Incompatible tags

You can't combine all tags. The following tags can't be applied to the same speech:

<amazon:domain>
- You must combine <amazon:domain name="conversational"> with the <voice> tag and the Matthew or Joanna voice. The conversational style doesn't work with other voices, and it doesn't work on its own without <voice>.
- You can combine <amazon:domain name="news"> with the <voice> tag and the Matthew, Joanna, and Lupe voices. The news style doesn't work with other voices.
- You can't combine <amazon:domain name="long-form">, <amazon:domain name="music">, or <amazon:domain name="fun"> with <voice>.
<amazon:emotion>
speechcons

Speechcons use the <say-as> tag with interpret-as set to interjection, for example: <say-as interpret-as="interjection">wow</say-as>.

You can combine <say-as> with other tags when you use other values for the interpret-as attribute. For example, you could combine <amazon:emotion> or <emphasis> with <say-as interpret-as="ordinal">1</say-as>.
<voice>
- You can combine <voice> with the <amazon:domain> tag with the restrictions noted before.
- You can't combine <voice> with any of the other tags listed here.
<emphasis>
<prosody> with the pitch attribute (for example, <prosody pitch="x-low">…</prosody>)

For example, the following combinations don't work:

Invalid SSML: <voice> used within <amazon:emotion>

<speak>
    <amazon:emotion name="disappointed" intensity="medium">
        I want to tell you a secret.
        <voice name="Kendra">I am not a real human.</voice>.
        Can you believe it?
    </amazon:emotion>
</speak>

Invalid SSML: <amazon:emotion> used within <voice>

<speak>
    I want to tell you a secret.
    <voice name="Kendra">
        <amazon:emotion name="disappointed" intensity="medium">
            I am not a real human.
        </amazon:emotion>
    </voice>.
    Can you believe it?
</speak>

Invalid SSML: Incompatible voice used with the conversational or news style

<speak>
    <voice name="Kendra">
        <amazon:domain name="conversational">I really didn't know how this morning was going to start. And if I had known, I think I might have just stayed in bed.</amazon:domain>
    </voice>
</speak>

Invalid SSML: Conversational style used without the <voice> tag

<speak>
    <amazon:domain name="conversational">I really didn't know how this morning was going to start. And if I had known, I think I might have just stayed in bed.</amazon:domain>
</speak>

You can use the incompatible tags in the same <speak> string, as long as they're not applied to the same text string. For example, the following combination is valid:

<speak>
    <amazon:emotion name="disappointed" intensity="medium">
        I want to tell you a secret.
    </amazon:emotion>
    <voice name="Kendra">
        I am not a real human.
    </voice>
    <amazon:emotion name="disappointed" intensity="high">Can you believe it?</amazon:emotion>
</speak>

In this example, the first line is spoken in a disappointed voice, the second line is spoken in the Kendra voice, and the final line uses the disappointed voice again.
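
The difference between the invalid and valid examples is whether the two tags wrap the same text. The following is a minimal sketch of a check for that, assuming you only care about the <voice> and <amazon:emotion> pair; the other incompatible combinations are left out, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: only the <voice>/<amazon:emotion> pair is checked here.
CONFLICT = {"voice", "amazon_emotion"}


def has_voice_emotion_conflict(ssml: str) -> bool:
    """Return True if <voice> and <amazon:emotion> are nested in each other."""
    # ElementTree rejects the undeclared "amazon:" prefix, so rename it first.
    root = ET.fromstring(ssml.replace("amazon:", "amazon_"))

    def walk(node, ancestors):
        # Track which of the two conflicting tags enclose this node.
        seen = ancestors | ({node.tag} & CONFLICT)
        if seen == CONFLICT:
            return True
        return any(walk(child, seen) for child in node)

    return walk(root, frozenset())


nested = (
    '<speak><amazon:emotion name="disappointed" intensity="medium">'
    'I want to tell you a secret. <voice name="Kendra">I am not a real human.</voice>'
    "</amazon:emotion></speak>"
)
side_by_side = (
    '<speak><amazon:emotion name="disappointed" intensity="medium">'
    "I want to tell you a secret.</amazon:emotion>"
    '<voice name="Kendra">I am not a real human.</voice></speak>'
)
print(has_voice_emotion_conflict(nested), has_voice_emotion_conflict(side_by_side))
```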

amazon:domain

Note: The <amazon:domain> tag is available in the following locales: English (US), English (UK), English (CA), English (AU), German (DE), and Japanese (JP). Not all styles are available in all locales.

Applies different speaking styles to the speech. The styles are curated text-to-speech voices that use different variations of intonation, emphasis, pausing, and other techniques to match the speech to the content. For example, the news style makes Alexa's voice sound like what you might expect to hear in a TV or radio newscast, and was built primarily for customers to listen to news articles and other news-based content.

The <amazon:domain> tag takes the following required parameters:

name

The name of the speaking style to apply to the speech. Available styles:

conversational – Style voices to sound more conversational and less formal, more like how people sound when they speak to friends and family. The conversational style is available in English (US), Italian (IT), and Japanese (JP) skills. You can also use this style with Amazon Polly voices. For Amazon Polly, conversational requires the <voice> tag and the Matthew or Joanna voices.
long-form – Style the speech for long-form content such as podcasts, articles, and blogs. The long-form style can't be used with the <voice> tag. The long-form style is available in English (US) skills.
music – Style the speech for talking about music, video, or other multi-media content. The music style can't be used with the <voice> tag. The music style is available in English (US), English (CA), English (UK), and German (DE) skills.
news – Style the speech similar to what you hear when listening to the news on the radio or television. The news style can be combined with the <voice> tag and the Matthew, Joanna, and Lupe voices. The news style is available in English (US) and English (AU) skills.
fun – Style the speech to sound more friendly and animated in greetings, animation, or children's stories. The fun style is available in Japanese (JP) skills.

Examples

news

<speak>
    <amazon:domain name="news">
        Latest news: The conversational and news styles are now available for the Matthew or Joanna voices!
    </amazon:domain>
</speak>

music

<speak>
    <amazon:domain name="music">
        Sweet Child O' Mine by Guns N' Roses became one of their most
        successful singles topping the billboard Hot 100 in 1988. Slash's
        guitar solo on this song was ranked the 37th greatest solo of all
        time. Here's Sweet Child O' Mine.
    </amazon:domain>
</speak>

long-form

<speak>
    <amazon:domain name="long-form">
        Meet Echo Dot. Our most popular Echo is now even better.
        With a new speaker and design, Echo Dot is a voice-controlled smart speaker with Alexa,
        perfect for any room. Just ask for music, news, information, and more.
        You can also call almost anyone and control compatible smart home devices with your voice.
    </amazon:domain>
</speak>

fun

<speak>
    <amazon:domain name="fun">
        布団が、ふっとんだ。
    </amazon:domain>
</speak>

news combined with voice

This example uses two different voices in the same response.

<speak>
    <voice name="Matthew">
        <amazon:domain name="news">
            Latest news: The conversational and news styles are now available for the Matthew or Joanna voices!
        </amazon:domain>
    </voice>
    <voice name="Joanna">
        <amazon:domain name="conversational">
            That was all for today. Thank you.
        </amazon:domain>
    </voice>
</speak>

conversational combined with voice

The <amazon:domain name="conversational"> tag works with the <voice> tag and the Matthew and Joanna voices. You can't use conversational without the <voice> tag.

<speak>
    <voice name="Matthew">
        <amazon:domain name="conversational">
            I really didn't know how this morning was going to start. And if I had known, I think I might have just stayed in bed.
        </amazon:domain>
    </voice>
</speak>

You can combine <amazon:domain> with all other tags, except for those listed in incompatible tags.

Best practices for the amazon:domain tag

These recommendations can help you build a better experience with the <amazon:domain> tag:

Use the default Alexa voice without the <amazon:domain> tag in the intro to your skill. This sets a "baseline," so that the styled responses later have more impact.
Don't overuse the speaking styles, as this might create a poor or unpleasant user experience. For example, don't switch between different speaking styles frequently.
Test how your responses sound with a device or the simulator in the developer console, and verify that the speaking style is appropriate for the response.

amazon:effect

Applies Amazon-specific effects to the speech.

name

The name of the effect to apply to the speech.
Accepted values:

whispered: Applies a whispering effect to the speech.
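
A minimal sketch of applying the whispered effect from Python. The whisper helper is hypothetical, and the sentence is sample text only.

```python
from xml.sax.saxutils import escape


def whisper(text: str) -> str:
    """Wrap text in the whispered effect, escaping XML special characters."""
    return f'<amazon:effect name="whispered">{escape(text)}</amazon:effect>'


ssml = f"<speak>I want to tell you a secret. {whisper('I am not a real human.')}</speak>"
print(ssml)
```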

amazon:emotion

Note: The <amazon:emotion> tag is available in the following locales: English (US), English (UK), German (DE), and Japanese (JP).

The <amazon:emotion> tag causes Alexa to express emotion when speaking. The emotion effects are useful for stories, games, news updates and other narrative content. For instance, in a game, you might use the "excited" emotion for correct answers and the "disappointed" emotion for incorrect answers.

The <amazon:emotion> tag takes the following required parameters:

name

The name of the emotion to apply to the speech.
Accepted values:

excited
disappointed

intensity

The intensity or strength of the emotion to express.
Accepted values:

low
medium
high

Examples

<speak>
    <amazon:emotion name="excited" intensity="medium">
        Christina wins this round!
    </amazon:emotion>
</speak>

<speak>
    <amazon:emotion name="disappointed" intensity="high">
        Here I am with a brain the size of a planet
        and they ask me to pick up a piece of paper.
    </amazon:emotion>
</speak>

Examples of amazon:emotion combined with other SSML tags

You can combine <amazon:emotion> with all other tags, except for those listed in incompatible tags.

For example, this adds a three-second pause in the middle of speech with the "excited" emotion:

<speak>
    <amazon:emotion name="excited" intensity="medium">
        Okay, let's be mindful and take a deep breath.
        <break time="3s"/>
        Now don't we feel better?
    </amazon:emotion>
</speak>

This example uses <prosody> to increase the volume of the "disappointed" speech.

<speak>
    This is how I normally speak.
    <amazon:emotion name="disappointed" intensity="high">
        This is how I speak when I am disappointed.
        <prosody volume="x-loud">Now I am telling you I am disappointed very loudly!</prosody>
    </amazon:emotion>
</speak>

Best practices for the <amazon:emotion> tag

These recommendations can help you build a better experience with the <amazon:emotion> tag:

Use the default Alexa voice without the <amazon:emotion> tag in the intro to your skill. This sets a "baseline," so that the emotional responses later can have more impact.
Don't overuse emotional responses, as this might create a poor or unpleasant experience. Consider these guidelines:
- Don't switch between excited and disappointed frequently.
- Don't use the emotions in every response.
Try the medium intensity initially, and then adjust the intensity as needed. Using medium in most instances gives you more options for adjusting the intensity up or down depending on the situation.
Test how your responses sound with a device or the simulator in the developer console and make sure that the voice is appropriate for the response.

audio

The <audio> tag lets you provide the URL for an MP3 file that the Alexa service can play. Use the <audio> tag to embed short, pre-recorded audio within your response. For example, you could include sound effects alongside your text-to-speech responses, or provide a response that uses a voice associated with your brand.

Tip: For a library of sound effects you can use with the <audio> tag, see the Alexa Skills Kit Sound Library.

src

Specifies the URL for the MP3 file. Note the following requirements and limitations:

The MP3 must be hosted at an Internet-accessible HTTPS endpoint. HTTPS is required, and the domain hosting the MP3 file must present a valid, trusted SSL certificate. You can't use self-signed certificates.
The MP3 must not contain any customer-specific or other sensitive information.
The MP3 must be a valid MP3 file (MPEG version 2).
For your speech response, the audio file can't be longer than 240 seconds.
The combined total time for all audio files in the outputSpeech property of the response can't be more than 240 seconds.
The combined total time for all audio files in the reprompt property of the response can't be more than 90 seconds.
The bit rate must be 48 kbps. Note that this bit rate gives a good result when used with spoken content, but is generally not a high enough quality for music.
The sample rate must be 22050 Hz, 24000 Hz, or 16000 Hz.

Use converter software to convert your MP3 files to the required codec version (MPEG version 2) and bit rate (48 kbps).

Include the <audio> tag within your text-to-speech response within the <speak> tag. Alexa plays the MP3 at the specified point within the text to speech. For example:
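
A minimal sketch of such a response, built here as a Python string. The host in the URL is a hypothetical placeholder, and the file name is taken from the rendering described next.

```python
# Hypothetical placeholder URL; replace it with your own HTTPS-hosted MP3.
AUDIO_URL = "https://example.com/audio/amzn_sfx_car_accelerate_01.mp3"

ssml = (
    "<speak>"
    "Welcome to Ride Hailer. "
    f'<audio src="{AUDIO_URL}"/> '
    "You can order a ride, or request a fare estimate. "
    "Which do you want?"
    "</speak>"
)
print(ssml)
```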

When Alexa renders this response, it sounds like this:

Alexa: Welcome to Ride Hailer.
(the specified amzn_sfx_car_accelerate_01.mp3 audio file plays)
Alexa: You can order a ride, or request a fare estimate. Which do you want?

A single response sent by your service can include multiple <audio> tags according to the following limits:

No more than five audio files can be used in a single response.
The combined total time for all audio files in the outputSpeech property of the response can't be more than 240 seconds.
The combined total time for all audio files in the reprompt property of the response can't be more than 90 seconds.
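
Some of these limits can be checked before you send the response. The following is a minimal sketch, assuming you only reference MP3 files that you host yourself over HTTPS; it checks the tag count and the URL scheme, and it doesn't measure durations, which would require reading the audio files. The function name audio_problems is hypothetical.

```python
import re
from urllib.parse import urlparse

MAX_AUDIO_TAGS = 5  # no more than five audio files in a single response

# Captures the src attribute value of each <audio> tag.
_AUDIO_SRC = re.compile(r"<audio\b[^>]*\bsrc=(['\"])(.*?)\1")


def audio_problems(ssml: str) -> list[str]:
    """Return human-readable problems found in the <audio> tags."""
    sources = [match.group(2) for match in _AUDIO_SRC.finditer(ssml)]
    problems = []
    if len(sources) > MAX_AUDIO_TAGS:
        problems.append(f"{len(sources)} audio tags; the limit is {MAX_AUDIO_TAGS}")
    for src in sources:
        # Assumption: every file is self-hosted, so the scheme must be https.
        if urlparse(src).scheme != "https":
            problems.append(f"not an HTTPS URL: {src}")
    return problems


print(audio_problems('<speak><audio src="http://example.com/a.mp3"/></speak>'))
```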

Converting audio files to an Alexa-friendly format

You can use converter software to convert your MP3 files to the required codec version (MPEG version 2) and bit rate (48 kbps). One option is the command-line tool FFmpeg.

This sample command converts the provided <input-file> to an MP3 file that works with the <audio> tag. This version uses 16000 as the sample rate:

ffmpeg -i <input-file> -ac 2 -codec:a libmp3lame -b:a 48k -ar 16000 -write_xing 0 <output-file>

You might get better quality by increasing the sample rate to 24000 like this:

ffmpeg -i <input-file> -ac 2 -codec:a libmp3lame -b:a 48k -ar 24000 -write_xing 0 <output-file>

For more details about command line options, see the documentation for FFmpeg.
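
If you convert files as part of a build script, the same command can be assembled in code. The following is a minimal sketch that reuses the flags from the commands above and assumes the ffmpeg binary is on your PATH; the function names are hypothetical.

```python
import subprocess

# Sample rates accepted by the <audio> tag, in Hz.
ALLOWED_SAMPLE_RATES = {16000, 22050, 24000}


def ffmpeg_command(src: str, dst: str, sample_rate: int = 24000) -> list[str]:
    """Build the FFmpeg argument list shown above for the given files."""
    if sample_rate not in ALLOWED_SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    return [
        "ffmpeg", "-i", src,
        "-ac", "2",
        "-codec:a", "libmp3lame",
        "-b:a", "48k",
        "-ar", str(sample_rate),
        "-write_xing", "0",
        dst,
    ]


def convert(src: str, dst: str, sample_rate: int = 24000) -> None:
    """Run FFmpeg and raise CalledProcessError if the conversion fails."""
    subprocess.run(ffmpeg_command(src, dst, sample_rate), check=True)


print(" ".join(ffmpeg_command("input.wav", "output.mp3", 16000)))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.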

Another option is Audacity:

Open the file to convert.
Set the Project Rate in the lower-left corner to 16000.
Click File > Export Audio and change the Save as type to MP3 Files.
Click Options, set the Quality to 48 kbps and the Bit Rate Mode to Constant.

This requires the LAME library, which can be found at: http://lame.buanzo.org/#lamewindl.

Hosting the audio files for your skill

The MP3 files you use to provide audio must be hosted on an endpoint that uses HTTPS. The endpoint must provide an SSL certificate signed by an Amazon-approved certificate authority. Many content hosting services provide this. For example, you could host your files on Amazon Simple Storage Service (Amazon S3), an Amazon Web Services offering.

You aren't required to authenticate the requests for the audio files. Therefore, you must not include any customer-specific or sensitive information in these audio files. For example, building a custom MP3 file in response to a user's request, and including sensitive information within the audio, isn't allowed.

For optimal performance, Amazon recommends that you host your MP3 files for SSML responses in close proximity to where your skill is hosted. For example, if the Lambda function for your skill is hosted in the US West (Oregon) region, you will get better performance if you upload your MP3s to a US West (Oregon) S3 bucket.

In addition to using S3 for hosting, Amazon recommends that you use a content delivery network (CDN) such as Amazon CloudFront for hosting media assets to prevent throttling under high load.

HTTP Live Streaming (HLS) of audio files

Alexa supports SSML <audio> tags that point toward HTTP Live Streaming (HLS) streams, provided that the audio data conforms to the listed specifications. Due to the streaming approach that Alexa uses, there is no benefit to using HLS streams instead of statically served MP3 files. Furthermore, unlike with statically served MP3 files, an SSML response that contains an HLS stream that violates the 240-second duration limit fails silently. This silent failure means that the playback stops before the limit is hit, no error message is generated on the customer device, and the skill doesn't receive an error request. If your skill uses SSML responses that contain HLS streams, make sure that you take particular care to test the audio returned in its responses.

break

Represents a pause in the speech. Set the length of the pause with the strength or time attributes.

Important: Break tag silence can't exceed 10 seconds, including scenarios with consecutive break tags. SSML with more than 10 seconds of silence isn't rendered to the user.

strength

The strength or length to pause.
Accepted values:

none: Don't output a pause. Use this to remove a pause that would normally occur, such as after a period.
x-weak: Don't output a pause. Equivalent to none.
weak: Treat adjacent words as if separated by a single comma. Equivalent to medium.
medium: Treat adjacent words as if separated by a single comma.
strong: Make a sentence break. Equivalent to using the <s> tag.
x-strong: Make a paragraph break. Equivalent to using the <p> tag.

time

Duration of the pause; up to 10 seconds (10s) or 10000 milliseconds (10000ms). Include the unit with the time (s or ms).

The default strength is medium. Alexa uses this default if you don't specify any attributes, or if you provide an unsupported attribute value.
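
Because SSML with more than 10 seconds of break silence isn't rendered, it can be worth adding up the pauses before you send a response. The following is a minimal sketch, assuming that only breaks with an explicit time attribute count toward the total; breaks that use strength are ignored because this page doesn't give their durations. The function name is hypothetical.

```python
import re

MAX_SILENCE_MS = 10_000  # 10 seconds of break silence

# Captures the numeric value and unit of each <break time="..."> attribute.
_BREAK_TIME = re.compile(r"<break\b[^>]*\btime=['\"](\d+(?:\.\d+)?)(ms|s)['\"]")


def total_break_ms(ssml: str) -> float:
    """Sum the explicit time values of all <break> tags, in milliseconds."""
    total = 0.0
    for value, unit in _BREAK_TIME.findall(ssml):
        total += float(value) * (1000 if unit == "s" else 1)
    return total


ssml = "<speak>One<break time='3s'/>two<break time=\"500ms\"/>three</speak>"
print(total_break_ms(ssml), total_break_ms(ssml) <= MAX_SILENCE_MS)
```

Summing every break in the response is a conservative reading of the limit, which the text describes in terms of consecutive break tags.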

emphasis

Emphasize the tagged words or phrases. Emphasis changes rate and volume of the speech. More emphasis is spoken louder and slower. Less emphasis is quieter and faster.

level

Level of emphasis to apply.
Accepted values:

strong: Increase the volume and slow down the speaking rate so the speech is louder and slower.
moderate: Increase the volume and slow down the speaking rate, but not as much as when set to strong. This is used as a default when level isn't provided.
reduced: Decrease the volume and speed up the speaking rate. The speech is softer and faster.

You can combine <emphasis> with all other tags, except for those listed in incompatible tags.

Note: When you modify the speech with the <emphasis> tag, Alexa uses a legacy text-to-speech system, which might change the speech sound quality.
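
A minimal sketch of building an <emphasis> tag with a validated level. The emphasis helper is hypothetical, and the sentence is sample text only.

```python
from xml.sax.saxutils import escape

EMPHASIS_LEVELS = {"strong", "moderate", "reduced"}


def emphasis(text: str, level: str = "moderate") -> str:
    """Wrap text in an <emphasis> tag; "moderate" mirrors the documented default."""
    if level not in EMPHASIS_LEVELS:
        raise ValueError(f"unsupported emphasis level: {level}")
    return f'<emphasis level="{level}">{escape(text)}</emphasis>'


print(f"<speak>I {emphasis('really', 'strong')} like that song.</speak>")
```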

lang

Use <lang> to specify the language model and rules to speak the tagged content as if it were written in the language specified by the xml:lang attribute. Words and phrases in other languages usually sound better when enclosed with the <lang> tag. This is useful for short phrases in other languages, such as the names of restaurants or shops.

The following example shows the SSML to pronounce "Paris" using the language code fr-FR, which refers to the French language as spoken in France.

Alexa adapts the pronunciation to use the sounds available in the original language of the skill, so it might not sound exactly like a native speaker. To achieve a more natural voice than what you get with the <lang> tag alone, use the <lang> tag and the <voice> tag together. With the <voice>, you can select a voice customized for a specific language. Make sure that the language of the tagged text matches the <lang> attribute, and that the <voice> is specific to the language of the text.

For example, consider the French phrase "J'adore chanter" in an English (US) skill. The following examples show how Alexa speaks this phrase without the <lang> tag, with the <lang> tag alone, and with both <lang> and <voice>.

Without any tags, Alexa speaks the phrase with English-like pronunciation.

With the <lang xml:lang='fr-FR'> tag, Alexa uses French pronunciation with sounds available in English for a "French-like" pronunciation. A perfect French pronunciation would include a uvular trill (/R/) in the word "adore." The French-like English pronunciation achieved with the <lang> tag uses the corresponding /r/ sound instead.

For a better French pronunciation, use the <voice> tag with a French voice. The following example uses the Celine voice.

Supported locales for the xml:lang attribute

The <lang> tag supports the following locales:

de-DE
en-AU
en-CA
en-GB
en-IN
en-US
es-ES
es-MX
es-US
fr-CA
fr-FR
hi-IN
it-IT
ja-JP
pt-BR
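
A minimal sketch of building the <lang> markup described in this section, including the combination with <voice>. The helper names are hypothetical, and the locale set is copied from the list above.

```python
from xml.sax.saxutils import escape

SUPPORTED_LANG_LOCALES = {
    "de-DE", "en-AU", "en-CA", "en-GB", "en-IN", "en-US", "es-ES", "es-MX",
    "es-US", "fr-CA", "fr-FR", "hi-IN", "it-IT", "ja-JP", "pt-BR",
}


def lang(text: str, locale: str) -> str:
    """Wrap text in a <lang> tag after checking the locale."""
    if locale not in SUPPORTED_LANG_LOCALES:
        raise ValueError(f"unsupported locale for <lang>: {locale}")
    return f'<lang xml:lang="{locale}">{escape(text)}</lang>'


def voice(inner_ssml: str, name: str) -> str:
    """Wrap already-built SSML in a <voice> tag."""
    return f'<voice name="{name}">{inner_ssml}</voice>'


# <lang> alone, then <lang> inside a French voice for the same phrase.
phrase = lang("J'adore chanter", "fr-FR")
print(f"<speak>{phrase}</speak>")
print(f"<speak>{voice(phrase, 'Celine')}</speak>")
```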

p

Represents a paragraph. This tag provides extra-strong breaks before and after the tag. This is equivalent to specifying a pause with <break strength="x-strong"/>.

phoneme

Provides a phonemic/phonetic pronunciation for the contained text. For example, people might pronounce words like "pecan" differently.

alphabet

Set to the phonetic alphabet to use.
Accepted values:

ipa: The International Phonetic Alphabet (IPA).
x-sampa: The Extended Speech Assessment Methods Phonetic Alphabet (X-SAMPA).

ph

The phonetic pronunciation to speak. Use the symbols provided in supported symbols to define the pronunciation. The symbols are locale-specific.

When you use <phoneme>, Alexa uses the pronunciation provided in the ph attribute instead of the text contained within the tag. However, you should still provide human-readable text within the tags. In the following example, the word "pecan" shown within the tags is never spoken. Instead, Alexa speaks the pronunciation provided in the ph attribute:

<speak>
    You say, <phoneme alphabet='ipa' ph='pɪˈkɑːn'>pecan</phoneme>.
    I say, <phoneme alphabet='ipa' ph='ˈpi.kæn'>pecan</phoneme>.
</speak>

Additional examples of writing words with a phonetic alphabet:

Word	IPA	X-SAMPA
bottle	ˈbɑ.təl	"bA.t@l
frozen	ˈfɹoʊ.zən	"fr\oU.z@n
blossom	ˈblɑ.səm	"blA.s@m
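
X-SAMPA uses the double quotation mark as a stress symbol, so the ph attribute needs careful quoting. The following is a minimal sketch that lets the standard library's quoteattr choose the quoting; the phoneme helper is hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr

ALPHABETS = {"ipa", "x-sampa"}


def phoneme(text: str, ph: str, alphabet: str = "ipa") -> str:
    """Build a <phoneme> tag; quoteattr picks quotes that don't clash with ph."""
    if alphabet not in ALPHABETS:
        raise ValueError(f"unsupported alphabet: {alphabet}")
    return (
        f"<phoneme alphabet={quoteattr(alphabet)} ph={quoteattr(ph)}>"
        f"{escape(text)}</phoneme>"
    )


# The X-SAMPA value starts with a double quote, so the attribute is emitted
# in single quotes.
print(phoneme("bottle", '"bA.t@l', "x-sampa"))
print(phoneme("bottle", "ˈbɑ.təl"))
```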

Supported symbols

The following tables list the supported symbols for use with the <phoneme> tag. The symbols are specific to the skill language.

These symbols provide full coverage for the sounds of Arabic (SA). Other languages require the use of symbols not included in this list, which are not supported. Using symbols not included in this list for Arabic (SA) skills is discouraged, as it may result in suboptimal speech synthesis.

X-SAMPA	IPA	Description	Example	Pronunciation
?	ʔ	glottal stop	أَنَا	/ " ʔ a n a: /
b	b	voiced bilabial plosive	بَلَد	/ " b a l a d /
t	t	voiceless alveolar plosive	تَمَر	/ " t a m a r /
T	θ	voiceless interdental fricative	ثَلَاثَة	/ T a " l a: T a /
dZ	ʤ	voiced postalveolar affricate	جَمِيل	/ dZ a " m i: l /
X\	ħ	voiceless pharyngeal fricative	حَيَوَان	/ X\ a j a " w a: n /
x	x	voiceless velar fricative	خَرُوف	/ x a " r u: f /
d	d	voiced alveolar plosive	دَرْدَار	/ d a r " d a: r /
D	ð	voiced interdental fricative	ذَلِك	/ " D a: l i k a /
r	r	alveolar trill	رَمْل	/ " r a m l /
z	z	voiced alveolar fricative	زُهُور	/ z u " h u: r /
s	s	voiceless alveolar fricative	سَمْسَم	/ " s a m s a m /
S	ʃ	voiceless postalveolar fricative	شَمْس	/ " S a m s /
s_?\	sˤ	pharyngealised voiceless alveolar fricative	صَاحِب	/ " s_?\ A_?: X\ i b /
d_?\	dˤ	pharyngealised voiced alveolar plosive	ضَوْء	/ " d_?\ a w ? /
t_?\	tˤ	pharyngealised voiceless alveolar plosive	طَالِب	/ " t_?\ A_?: l i b /
D_?\	ðˤ	pharyngealised voiced interdental fricative	ظَلَام	/ " D_?\ A_?\ l a: m /
?\	ʕ	voiced pharyngeal fricative	عُمْر	/ " ?\ u m r /
G	ɣ	voiced velar fricative	غَرْب	/ " G a r b /
f	f	voiceless labiodental fricative	فَصْل	/ " f A_?\ s_?\ l /
q	q	voiceless uvular plosive	قَصْر	/ " q A_?\ s_?\ r /
k	k	voiceless velar plosive	كَامِل	/ " k a: m i l /
l	l	voiced alveolar lateral approximant	لَيْل	/ " l a j l /
l_G	lˤ	pharyngealised voiced alveolar lateral approximant	والله	/ w A_?\ " l_G l_G A_?: h /
m	m	bilabial nasal stop	مَصْر	/ " m A_?\ s_?\ r /
n	n	alveolar nasal stop	نُور	/ " n u: r /
h	ɦ	voiced glottal fricative	هِلَال	/ h i " l a: l /
w	w	voiced labiovelar approximant	وَلَد	/ " w a l a d /
j	j	voiced palatal approximant	يُسْر	/ " j u s r /
g	g	voiced velar plosive	إِنْجِلِتْرَا	/ ? i N " g l i t r a: /
v	v	voiced labiodental fricative	فِيتَامِين	/ v i: t A " m i: n /
p	p	voiceless bilabial plosive	أُوبِرَا	/ " ? O p e r a: /
N	ŋ	velar nasal stop	ْهُونْغْ كُونْغ	/ h O N " k O N g /
Z	ʒ	voiced postalveolar fricative	جاكيت	/ Z a " k e: t /
a	æ	mid-open front unrounded short vowel	لَوْن	/ " l a w n /
A_?\	ɑˤ	pharyngealised open back unrounded short vowel	صَلْب	/ " s_?\ A_?\ l b /
a:	æː	mid-open front unrounded long vowel	بَاب	/ " b a: b /
A_?:	ɑˤː	pharyngealised open back unrounded long vowel	نَاضِج	/ " n A_?: d_?\ i_?\ dZ /
u	u	close back rounded short vowel	شُرْب	/ " S u r b /
u_?\	uˤ	pharyngealised close back rounded short vowel	عُصْفُور	/ ?\ u_?\ s_?\ " f u: r /
u:	uː	close back rounded long vowel	تُوت	/ " t u: t /
u_?:	uˤː	pharyngealised close back rounded long vowel	صُور	/ " s_?\ u_?: r /
i	i	close front unrounded short vowel	بِنْت	/ " b i n t /
i_?\	iˤ	pharyngealised close front unrounded short vowel	طِفْل	/ " t_?\ i_?\ f l /
i:	iː	close front unrounded long vowel	سَبِيل	/ s a " b i: l /
i_?:	iˤː	pharyngealised close front unrounded long vowel	رَطِيب	/ r A_?\ " t_?\ i_?: b /
A	a	open central unrounded short vowel	wifi	/ " w A j f A j /
O	ɔ	open-mid back rounded short vowel	دُولَار	/ d O " l A r /
O:	ɔː	open-mid back rounded long vowel	تِلْفِزِيُون	/ t i l f i z " j O: n /
e	e	mid front unrounded short vowel	إِنْتَرْنِت	/ ? e n t a r " n a: t /
e:	eː	mid front unrounded long vowel	سِكْرِتِير	/ s i k r i " t e: r /

These symbols provide full coverage for the sounds of Dutch. Other languages require the use of symbols not included in this list, which are not supported. Using symbols not included in this list for Dutch skills is discouraged, as it may result in suboptimal speech synthesis.

Consonants

IPA	X-Amazon Symbol	Description	Keyword Token
p	p	Voiceless bilabial plosive	pen
b	b	Voiced bilabial plosive	biet
t	t	Voiceless alveolar plosive	tak
d	d	Voiced alveolar plosive	dak
k	k	Voiceless velar plosive	kat
g	g	Voiced velar plosive	goal
f	f	Voiceless labiodental fricative	fiets
v	v	Voiced labiodental fricative	vijf
s	s	Voiceless alveolar fricative	sok
z	z	Voiced alveolar fricative	zeep
ʃ	S	Voiceless postalveolar fricative	chef
ʒ	Z	Voiced postalveolar fricative	jury
ɣ	G	Voiced velar fricative	geeuw
x	x	Voiceless velar fricative	acht
ɦ	h\	Voiced glottal fricative	hoed
m	m	Bilabial nasal	mens
n	n	Alveolar nasal	nek
ŋ	N	Velar nasal	eng
l	l	Alveolar lateral approximant	land
r ʀ ɹ	r	Rhotic	rat maar
j	j	Palatal approximant	jas
ʋ	v\	Labiodental approximant	wind

Vowels

IPA	X-Amazon Symbol	Description	Keyword Token
i	i	Close front unrounded vowel	piet
y	y	Close front rounded vowel	fuut
u	u	Close back rounded vowel	boek
e:	e:	Long close-mid front unrounded vowel	pees
o:	o:	Long close-mid back rounded vowel	hoop
øː	2:	Long close-mid front rounded vowel	reus
a:	a:	Long open front unrounded vowel	kaas
ɛː	E:	Long open mid front unrounded vowel	creme
ɪ	I	Near close near front unrounded vowel	pit
ʏ	Y	Near close near front rounded vowel	hut
ɛ	E	Open mid front unrounded vowel	bed
ɔ	O	Open mid back rounded vowel	op
ɑ	A	Open back unrounded vowel	kat
ə	@	Schwa	geluk
œy	9y	Diphthong	huis
ɛi	Ei	Diphthong	tijd meid
ʌʊ	Vu	Diphthong	koud

Other symbols

Example	Meaning	Symbol	X-Amazon Transcription
land	primary stress	"	/ " l A n t /
jury	syllable boundary	.	/ " Z y . r i /
	word boundary	#

English Vowels

IPA	X-Amazon Symbol	Description	Keyword Token
i:	i:	long close front unrounded vowel	fleece
u:	u:	long close back rounded vowel	goose
oʊ	oU	diphthong	boat
ɑː o:	A:	long open back unrounded vowel	father tall
ʊ	U	near-close near-back rounded vowel	foot
æ	{	near-open front unrounded vowel	bat
ʌ	V	open-mid back unrounded vowel	cut
ɝ ɚ	Yr\	mid central r-colored vowel	work butter

English Consonants

IPA	X-Amazon Symbol	Description	Keyword Token
p	p_e	Aspirated voiceless bilabial plosive	path
b	b_e	devoiced voiced bilabial plosive	beetle
t	t_e	Aspirated voiceless alveolar plosive	task
d	d_e	Devoiced voiced alveolar plosive	day
k	k_e	aspirated voiceless velar plosive	cat
v	v_e	voiced labiodental fricative	violet
s	s_e	voiceless alveolar fricative	sock
z	z_e	voiced alveolar fricative	zero
θ	T	voiceless dental fricative	thin
ð	D	voiced dental fricative	brother
l	l_e	Darkened alveolar lateral approximant	bleating
ɹ	r\	alveolar approximant	red
w	w_e	Labial-velar Approximant	ware

These symbols provide full coverage for the sounds of English (AU). Other languages require the use of symbols not included in this list, which are not supported. Using symbols not included in this list for English (AU) skills is discouraged, as it may result in suboptimal speech synthesis.

Consonants

IPA	X-SAMPA	Description	Example
b	b	voiced bilabial plosive	bed
d	d	voiced alveolar plosive	dig
d͡ʒ	dZ	voiced postalveolar affricate	jump
ð	D	voiced dental fricative	then
f	f	voiceless labiodental fricative	five
g	g	voiced velar plosive	game
h	h	voiceless glottal fricative	house
j	j	palatal approximant	yes
k	k	voiceless velar plosive	cat
l	l	alveolar lateral approximant	lay
m	m	bilabial nasal	mouse
n	n	alveolar nasal	nap
ŋ	N	velar nasal	thing
p	p	voiceless bilabial plosive	speak
ɹ	r\	alveolar approximant	red
s	s	voiceless alveolar fricative	seem
ʃ	S	voiceless postalveolar fricative	ship
t	t	voiceless alveolar plosive	trap
t͡ʃ	tS	voiceless postalveolar affricate	chart
θ	T	voiceless dental fricative	thin
v	v	voiced labiodental fricative	vest
w	w	labial-velar approximant	west
z	z	voiced alveolar fricative	zero
ʒ	Z	voiced postalveolar fricative	vision

Vowels

IPA	X-SAMPA	Description	Example
ə	@	mid central vowel	arena
æ	{	near-open front unrounded vowel	trap
aɪ	aI	diphthong	price
aʊ	aU	diphthong	mouth
ɑ	A	long open back unrounded vowel	father
eɪ	eI	diphthong	face
ɜ	3	open-mid central unrounded vowel	nurse
ɛ	E	open-mid front unrounded vowel	dress
i	i	long close front unrounded vowel	fleece
ɪ	I	near-close near-front unrounded vowel	kit
əʊ	@U	diphthong	goat
ɔ	O	long open-mid back rounded vowel	thought
ɔɪ	OI	diphthong	choice
u	u	long close back rounded vowel	goose
ʊ	U	near-close near-back rounded vowel	foot
ʌ	V	open-mid back unrounded vowel	strut
ɒ	Q	open back rounded vowel	bother
ɛə	E@	diphthong	bear
ɪə	I@	diphthong	beer
ʊə	U@	diphthong	tour

Additional Symbols

IPA	X-SAMPA	Description	Example
ˈ	"	primary stress	Alabama
ˌ	%	secondary stress	Alabama
.	.	syllable boundary	A.la.ba.ma

These symbols provide full coverage for the sounds of English (Canada). Other languages require the use of symbols not included in this list, which are not supported. Using symbols not included in this list for English (Canada) skills is discouraged, as it may result in suboptimal speech synthesis.

Consonants

IPA	X-SAMPA	Description	Example
b	b	voiced bilabial plosive	bed
d	d	voiced alveolar plosive	dig
d͡ʒ	dZ	voiced postalveolar affricate	jump
ð	D	voiced dental fricative	then
f	f	voiceless labiodental fricative	five
g	g	voiced velar plosive	game
h	h	voiceless glottal fricative	house
j	j	palatal approximant	yes
k	k	voiceless velar plosive	cat
l	l	alveolar lateral approximant	lay
m	m	bilabial nasal	mouse
n	n	alveolar nasal	nap
ŋ	N	velar nasal	thing
p	p	voiceless bilabial plosive	speak
ɹ	r\	alveolar approximant	red
s	s	voiceless alveolar fricative	seem
ʃ	S	voiceless postalveolar fricative	ship
t	t	voiceless alveolar plosive	trap
t͡ʃ	tS	voiceless postalveolar affricate	chart
θ	T	voiceless dental fricative	thin
v	v	voiced labiodental fricative	vest
w	w	labial-velar approximant	west
z	z	voiced alveolar fricative	zero
ʒ	Z	voiced postalveolar fricative	vision

Vowels

IPA	X-SAMPA	Description	Example
ə	@	mid central vowel	arena
ɚ	@`	mid central r-colored vowel	reader
æ	{	near-open front unrounded vowel	trap
aɪ	aI	diphthong	price
aʊ	aU	diphthong	mouth
ɑ	A	long open back unrounded vowel	father
eɪ	eI	diphthong	face
ɝ	3`	open-mid central unrounded r-colored vowel	nurse
ɛ	E	open-mid front unrounded vowel	dress
i	i	long close front unrounded vowel	fleece
ɪ	I	near-close near-front unrounded vowel	kit
oʊ	oU	diphthong	goat
ɔ	O	long open-mid back rounded vowel	thought
ɔɪ	OI	diphthong	choice
u	u	long close back rounded vowel	goose
ʊ	U	near-close near-back rounded vowel	foot
ʌ	V	open-mid back unrounded vowel	strut

Additional Symbols

IPA	X-SAMPA	Description	Example
ˈ	"	primary stress	Alabama
ˌ	%	secondary stress	Alabama
.	.	syllable boundary	A.la.ba.ma

These symbols provide full coverage for the sounds of English (India). Other languages require the use of symbols not included in this list, which are not supported. Using symbols not included in this list for English (India) skills is discouraged, as it may result in suboptimal speech synthesis.

Consonants

IPA	X-SAMPA	Description	Example
b	b	voiced bilabial plosive	bed
d	d	voiced alveolar plosive	dig
d͡ʒ	dZ	voiced postalveolar affricate	jump
ð	D	voiced dental fricative	then
f	f	voiceless labiodental fricative	five
g	g	voiced velar plosive	game
h	h	voiceless glottal fricative	house
j	j	palatal approximant	yes
k	k	voiceless velar plosive	cat
l	l	alveolar lateral approximant	lay
m	m	bilabial nasal	mouse
n	n	alveolar nasal	nap
ŋ	N	velar nasal	thing
p	p	voiceless bilabial plosive	speak
ɹ	r\	alveolar approximant	red
s	s	voiceless alveolar fricative	seem
ʃ	S	voiceless postalveolar fricative	ship
t	t	voiceless alveolar plosive	trap
t͡ʃ	tS	voiceless postalveolar affricate	chart
θ	T	voiceless dental fricative	thin
v	v	voiced labiodental fricative	vest
w	w	labial-velar approximant	west
z	z	voiced alveolar fricative	zero
ʒ	Z	voiced postalveolar fricative	vision

Vowels

IPA	X-SAMPA	Description	Example
ə	@	mid central vowel	arena
æ	{	near-open front unrounded vowel	trap
aɪ	aI	diphthong	price
aʊ	aU	diphthong	mouth
ɑ	A	long open back unrounded vowel	father
eɪ	eI	diphthong	face
ɜ	3	open-mid central unrounded vowel	nurse
ɛ	E	open-mid front unrounded vowel	dress
i	i	long close front unrounded vowel	fleece
ɪ	I	near-close near-front unrounded vowel	kit
əʊ	@U	diphthong	goat
ɔ	O	long open-mid back rounded vowel	thought
ɔɪ	OI	diphthong	choice
u	u	long close back rounded vowel	goose
ʊ	U	near-close near-back rounded vowel	foot
ʌ	V	open-mid back unrounded vowel	strut
ɒ	Q	open back rounded vowel	bother
ɛə	E@	diphthong	bear
ɪə	I@	diphthong	beer
ʊə	U@	diphthong	tour

Additional Symbols

IPA	X-SAMPA	Description	Example
ˈ	"	primary stress	Alabama
ˌ	%	secondary stress	Alabama
.	.	syllable boundary	A.la.ba.ma

Hindi Consonants

IPA	X-SAMPA	Description	Example
pʰ	p_h	voiceless aspirated bilabial plosive	फूल (phool)
bʱ	b_h	voiced aspirated bilabial plosive	भारी (bhaari)
t̪	t_d	voiceless dental plosive	तापमान (taapmaan)
t̪ʰ	t_d_h	voiceless aspirated dental plosive	थोड़ा (thoda)
d̪	d_d	voiced dental plosive	दिल्ली (dilli)
d̪ʱ	d_d_h	voiced aspirated dental plosive	धोबी (dhobi)
ʈ	t`	voiceless retroflex plosive	कटोरा (katora)
ʈʰ	t`_h	voiceless aspirated retroflex plosive	ठंड (thand)
ɖ	d`	voiced retroflex plosive	डर (darr)
ɖʱ	d`_h	voiced aspirated retroflex plosive	ढाल (dhal)
tʃʰ	tS_h	voiceless aspirated palatal affricate	छाल (chaal)
dʒʱ	dZ_h	voiced aspirated palatal affricate	झाल (jhaal)
kʰ	k_h	voiceless aspirated velar plosive	खान (khan)
ɡʱ	g_h	voiced aspirated velar plosive	घान (ghaan)
ɳ	n`	retroflex nasal	क्षण (kshan)
ɾ	4	alveolar flap	राम (ram)
ɽ	r`	plain retroflex flap	बड़ा (bada)
ɽʱ	r`_h	voiced aspirated retroflex flap	बढ़ी (barhi)
ʋ	v\	labiodental approximant	वसूल (wasool)

Hindi Vowels

IPA	X-SAMPA	Description	Example
ə	@_o	mid central vowel	अच्छा (achhaa)
ə̃	@~	nasalised mid central vowel	हँसना (hansnaa)
a	A_o	open front unrounded vowel	आग (aag)
ã	A~	nasalised open front unrounded vowel	घड़ियाँ (ghariyaan)
ɪ	I_o	near-close near-front unrounded vowel	इक्कीस (ikkees)
ɪ̃	I~	nasalised near-close near-front unrounded vowel	सिंचाई (sinchai)
i	i_o	close front unrounded vowel	बिल्ली (billee)
ĩ	i~	nasalised close front unrounded vowel	नहीं (nahin)
ʊ	U_o	near-close near-back rounded vowel	उल्लू (ullu)
ʊ̃	U~	nasalised near-close near-back rounded vowel	मुँह (munh)
u	u_o	close back rounded vowel	फूल (phool)
ũ	u~	nasalised close back rounded vowel	ऊँट (oont)
ɔ	O_o	open-mid back rounded vowel	कौन (kaun)
ɔ̃	O~	nasalised open-mid back rounded vowel	भौं (bhaun)
o	o	close-mid back rounded vowel	सोना (sona)
õ	o~	nasalised close-mid back rounded vowel	क्यों (kyon)
ɛ	E_o	open-mid front unrounded vowel	पैसा (paisa)
ɛ̃	E~	nasalised open-mid front unrounded vowel	मैं (main)
e	e	close-mid front unrounded vowel	एक (ek)
ẽ	e~	nasalised close-mid front unrounded vowel	किताबें (kitabein)

These symbols provide full coverage for the sounds of English (UK). Other languages require the use of symbols not included in this list, which are not supported. Using symbols not included in this list for English (UK) skills is discouraged, as it may result in suboptimal speech synthesis.

Consonants

IPA	X-SAMPA	Description	Example
b	b	voiced bilabial plosive	bed
d	d	voiced alveolar plosive	dig
d͡ʒ	dZ	voiced postalveolar affricate	jump
ð	D	voiced dental fricative	then
f	f	voiceless labiodental fricative	five
g	g	voiced velar plosive	game
h	h	voiceless glottal fricative	house
j	j	palatal approximant	yes
k	k	voiceless velar plosive	cat
l	l	alveolar lateral approximant	lay
m	m	bilabial nasal	mouse
n	n	alveolar nasal	nap
ŋ	N	velar nasal	thing
p	p	voiceless bilabial plosive	speak
ɹ	r\	alveolar approximant	red
s	s	voiceless alveolar fricative	seem
ʃ	S	voiceless postalveolar fricative	ship
t	t	voiceless alveolar plosive	trap
t͡ʃ	tS	voiceless postalveolar affricate	chart
θ	T	voiceless dental fricative	thin
v	v	voiced labiodental fricative	vest
w	w	labial-velar approximant	west
z	z	voiced alveolar fricative	zero
ʒ	Z	voiced postalveolar fricative	vision

Vowels

IPA	X-SAMPA	Description	Example
ə	@	mid central vowel	arena
æ	{	near-open front unrounded vowel	trap
aɪ	aI	diphthong	price
aʊ	aU	diphthong	mouth
ɑ	A	long open back unrounded vowel	father
eɪ	eI	diphthong	face
ɜ	3	open-mid central unrounded vowel	nurse
ɛ	E	open-mid front unrounded vowel	dress
i	i	long close front unrounded vowel	fleece
ɪ	I	near-close near-front unrounded vowel	kit
əʊ	@U	diphthong	goat
ɔ	O	long open-mid back rounded vowel	thought
ɔɪ	OI	diphthong	choice
u	u	long close back rounded vowel	goose
ʊ	U	near-close near-back rounded vowel	foot
ʌ	V	open-mid back unrounded vowel	strut
ɒ	Q	open back rounded vowel	bother
ɛə	E@	diphthong	bear
ɪə	I@	diphthong	beer
ʊə	U@	diphthong	tour

Additional Symbols

IPA	X-SAMPA	Description	Example
ˈ	"	primary stress	Alabama
ˌ	%	secondary stress	Alabama
.	.	syllable boundary	A.la.ba.ma

These symbols provide full coverage for the sounds of English (US). Other languages require the use of symbols not included in this list, which are not supported. Using symbols not included in this list for English (US) skills is discouraged, as it may result in suboptimal speech synthesis.

Consonants

IPA	X-SAMPA	Description	Example
b	b	voiced bilabial plosive	bed
d	d	voiced alveolar plosive	dig
d͡ʒ	dZ	voiced postalveolar affricate	jump
ð	D	voiced dental fricative	then
f	f	voiceless labiodental fricative	five
g	g	voiced velar plosive	game
h	h	voiceless glottal fricative	house
j	j	palatal approximant	yes
k	k	voiceless velar plosive	cat
l	l	alveolar lateral approximant	lay
m	m	bilabial nasal	mouse
n	n	alveolar nasal	nap
ŋ	N	velar nasal	thing
p	p	voiceless bilabial plosive	speak
ɹ	r\	alveolar approximant	red
s	s	voiceless alveolar fricative	seem
ʃ	S	voiceless postalveolar fricative	ship
t	t	voiceless alveolar plosive	trap
t͡ʃ	tS	voiceless postalveolar affricate	chart
θ	T	voiceless dental fricative	thin
v	v	voiced labiodental fricative	vest
w	w	labial-velar approximant	west
z	z	voiced alveolar fricative	zero
ʒ	Z	voiced postalveolar fricative	vision

Vowels

IPA	X-SAMPA	Description	Example
ə	@	mid central vowel	arena
ɚ	@`	mid central r-colored vowel	reader
æ	{	near-open front unrounded vowel	trap
aɪ	aI	diphthong	price
aʊ	aU	diphthong	mouth
ɑ	A	long open back unrounded vowel	father
eɪ	eI	diphthong	face
ɝ	3`	open-mid central unrounded r-colored vowel	nurse
ɛ	E	open-mid front unrounded vowel	dress
i	i	long close front unrounded vowel	fleece
ɪ	I	near-close near-front unrounded vowel	kit
oʊ	oU	diphthong	goat
ɔ	O	long open-mid back rounded vowel	thought
ɔɪ	OI	diphthong	choice
u	u	long close back rounded vowel	goose
ʊ	U	near-close near-back rounded vowel	foot
ʌ	V	open-mid back unrounded vowel	strut

Additional Symbols

IPA	X-SAMPA	Description	Example
ˈ	"	primary stress	Alabama
ˌ	%	secondary stress	Alabama
.	.	syllable boundary	A.la.ba.ma

These symbols provide full coverage for the sounds of French (CA). Other languages require the use of symbols not included in this list, which are not supported. Using symbols not included in this list for French (CA) skills is discouraged, as it may result in suboptimal speech synthesis.

Consonants

IPA	X-SAMPA	Description	Example
b	b	voiced bilabial plosive	bon
d	d	voiced alveolar plosive	deux
f	f	voiceless labiodental fricative	faire
ɡ	g	voiced velar plosive	garçon
ɥ	H	labial-palatal approximant	huit
j	j	palatal approximant	travail
k	k	voiceless velar plosive	corps
l	l	alveolar lateral approximant	laisser
m	m	bilabial nasal	même
n	n	alveolar nasal	nous
ɲ	J	palatal nasal	gagner
ŋ	N	velar nasal	camping
p	p	voiceless bilabial plosive	père
ʁ	R	voiced uvular fricative	regarder
s	s	voiceless alveolar fricative	sans
ʃ	S	voiceless postalveolar fricative	chance
t	t	voiceless alveolar plosive	tout
tʃ	tS	voiceless postalveolar affricate	ciao
dʒ	dZ	voiced postalveolar affricate	Djakarta
v	v	voiced labiodental fricative	vous
w	w	labial-velar approximant	oui
z	z	voiced alveolar fricative	zéro
ʒ	Z	voiced postalveolar fricative	jamais

Vowels

IPA	X-SAMPA	Description	Example
i	i	close front unrounded vowel	si
y	y	close front rounded vowel	sûr
ɪ	I	near-close front unrounded vowel	pipe
ʏ	Y	near-close front rounded vowel	lutte
e	e	close-mid front unrounded vowel	clé
ø	2	close-mid front rounded vowel	ceux
ɛ	E	open-mid front unrounded vowel	mettre
ɛː	E:	long open-mid front unrounded vowel	maître
œ	9	open-mid front rounded vowel	sœur
a	a	open front unrounded vowel	patte
ə	@	mid central vowel	le
u	u	close back rounded vowel	roue
ʊ	U	near-close back rounded vowel	coupe
o	o	close-mid back rounded vowel	bureau
ɔ	O	open-mid back rounded vowel	minimum
ɑ	A	open back unrounded vowel	châle

Nasal Vowels

IPA	X-SAMPA	Description	Example
ɑ̃	A~	nasalized open back unrounded vowel	champ
ɛ̃	E~	nasalized open-mid front unrounded vowel	pain
œ̃	9~	nasalized open-mid front rounded vowel	parfum
ɔ̃	O~	nasalized open-mid back rounded vowel	nom

Foreign Phonemes

IPA	X-SAMPA	Description	Example
ɚ	@`	mid central r-colored vowel	reader
æ	{	near-open front unrounded vowel	trap
ʌ	V	open-mid back unrounded vowel	bus
m̩	m=	syllabic bilabial nasal	rhythm
n̩	n=	syllabic alveolar nasal	griffon
pʰ	p_h	aspirated voiceless bilabial plosive	power
tʰ	t_h	aspirated voiceless alveolar plosive	torn
kʰ	k_h	aspirated voiceless velar plosive	cage
θ	T	voiceless dental fricative	cloth
ð	D	voiced dental fricative	this
h	h	voiceless glottal fricative	hello
ɹ	r\	alveolar approximant	rice
ɫ	l_e	alveolar lateral approximant	feel

These symbols provide full coverage for the sounds of French (FR). Other languages require the use of symbols not included in this list, which are not supported. Using symbols not included in this list for French (FR) skills is discouraged, as it may result in suboptimal speech synthesis.

Consonants

IPA	X-SAMPA	Description	Example
b	b	voiced bilabial plosive	bon
d	d	voiced alveolar plosive	deux
f	f	voiceless labiodental fricative	faire
ɡ	g	voiced velar plosive	garçon
ɥ	H	labial-palatal approximant	huit
j	j	palatal approximant	travail
k	k	voiceless velar plosive	corps
l	l	alveolar lateral approximant	laisser
m	m	bilabial nasal	même
n	n	alveolar nasal	nous
ɲ	J	palatal nasal	gagner
ŋ	N	velar nasal	camping
p	p	voiceless bilabial plosive	père
ʁ	R	voiced uvular fricative	regarder
s	s	voiceless alveolar fricative	sans
ʃ	S	voiceless postalveolar fricative	chance
t	t	voiceless alveolar plosive	tout
tʃ	tS	voiceless postalveolar affricate	ciao
dʒ	dZ	voiced postalveolar affricate	Djakarta
v	v	voiced labiodental fricative	vous
w	w	labial-velar approximant	oui
z	z	voiced alveolar fricative	zéro
ʒ	Z	voiced postalveolar fricative	jamais

Vowels

IPA	X-SAMPA	Description	Example
a	a	open front unrounded vowel	patte
e	e	close-mid front unrounded vowel	clé
ɛ	E	open-mid front unrounded vowel	faite
ə	@	mid central vowel	le
i	i	close front unrounded vowel	si
œ	9	open-mid front rounded vowel	sœur
ø	2	close-mid front rounded vowel	ceux
o	o	close-mid back rounded vowel	bureau
ɔ	O	open-mid back rounded vowel	minimum
u	u	close back rounded vowel	roue
y	y	close front rounded vowel	sûr

Nasal Vowels

IPA	X-SAMPA	Description	Example
ɑ̃	A~	nasalized open back unrounded vowel	champ
ɛ̃	E~	nasalized open-mid front unrounded vowel	pain
œ̃	9~	nasalized open-mid front rounded vowel	parfum
ɔ̃	O~	nasalized open-mid back rounded vowel	nom

Foreign Phonemes

IPA	X-SAMPA	Description	Example
ð	D	voiced dental fricative	this
h	h	voiceless glottal fricative	hello
ɹ	r\	alveolar approximant	rice
θ	T	voiceless dental fricative	cloth

These symbols provide full coverage for the sounds of German. Other languages require the use of symbols not included in this list, which are not supported. Using symbols not included in this list for German skills is discouraged, as it may result in suboptimal speech synthesis.

Consonants

IPA	X-SAMPA	Description	Example
b	b	voiced bilabial plosive	Bier
d	d	voiced alveolar plosive	Dach
ç	C	voiceless palatal fricative	ich
d͡ʒ	dZ	voiced postalveolar affricate	Dschungel
f	f	voiceless labiodental fricative	Vogel
g	g	voiced velar plosive	Gabel
h	h	voiceless glottal fricative	Haus
j	j	palatal approximant	jemand
k	k	voiceless velar plosive	Kleid
l	l	alveolar lateral approximant	Loch
m	m	bilabial nasal	Milch
n	n	alveolar nasal	Natur
ŋ	N	velar nasal	klingen
p	p	voiceless bilabial plosive	Park
p͡f	pf	voiceless labiodental affricate	Apfel
ʀ	R	uvular trill	Regen
s	s	voiceless alveolar fricative	Messer
ʃ	S	voiceless postalveolar fricative	Fischer
t	t	voiceless alveolar plosive	Topf
t͡s	ts	voiceless alveolar affricate	Zahl
t͡ʃ	tS	voiceless postalveolar affricate	deutsch
v	v	voiced labiodental fricative	Wasser
x	x	voiceless velar fricative	kochen
z	z	voiced alveolar fricative	See
ʒ	Z	voiced postalveolar fricative	Orange

Vowels

IPA	X-SAMPA	Description	Example
a	a	open front unrounded vowel	Salz
aː	a:	long open front unrounded vowel	Sahne
aʊ	aU	diphthong	Augen
ə	@	mid central vowel	Rede
ɐ	6	near-open central vowel	besser
aɪ	aI	diphthong	nein
ɛ	E	open-mid front unrounded vowel	Kellner
eː	e:	long close-mid front unrounded vowel	Rede
øː	2:	long close-mid front rounded vowel	böse
ɪ	I	near-close near-front unrounded vowel	bitte
iː	i:	long close front unrounded vowel	Lied
ɔ	O	open-mid back rounded vowel	Koffer
œ	9	open-mid front rounded vowel	können
oː	o:	long close-mid back rounded vowel	Kohl
ɔʏ	OY	diphthong	neu
ʊ	U	near-close near-back rounded vowel	Wunder
ʏ	Y	near-close near-front rounded vowel	Küche
uː	u:	long close back rounded vowel	Bruder
yː	y:	long close front rounded vowel	kühl

Centralised Diphthongs

IPA	X-SAMPA	Example
aɐ̯	a6_^	hart
aːɐ̯	a:6_^	Haar
ɛɐ̯	E6_^	Berg
eːɐ̯	e:6_^	schwer
øːɐ̯	2:6_^	Nadelöhr
ɪɐ̯	I6_^	Wirtschaft
iːɐ̯	i:6_^	Tier
ɔɐ̯	O6_^	dort
œɐ̯	96_^	Wörter
oːɐ̯	o:6_^	Ohr
ʊɐ̯	U6_^	Gurke
ʏɐ̯	Y6_^	Türkei
uːɐ̯	u:6_^	Kur
yːɐ̯	y:6_^	Tür

English Phonemes

IPA	X-SAMPA	Description	Example
ð	D	voiced dental fricative	brother
ɹ	r\	alveolar approximant	ripe
θ	T	voiceless dental fricative	north
w	w	labial-velar approximant	well
ɔː	O:	long open-mid back rounded vowel	callcenter
eɪ	eI	diphthong	rating
oʊ	oU	diphthong	windows

French Phonemes

IPA	X-SAMPA	Description	Example
ã:	a~:	nasalized long open front unrounded vowel	Croissant
ɛ̃ː	E~:	nasalized long open-mid front unrounded vowel	Terrain
õ:	o~:	nasalized long close-mid back rounded vowel	Annonce

Additional Symbols

IPA	X-SAMPA	Description	Example
ˈ	"	primary stress	genau
.	.	syllable boundary	ver.stan.den

These symbols provide full coverage for the sounds of Hindi (IN). Other languages require the use of symbols not included in this list, which are not supported. Using symbols not included in this list for Hindi (IN) skills is discouraged, as it may result in suboptimal speech synthesis.

Hindi Consonants

IPA	X-SAMPA	Description	Example
p	p	voiceless bilabial plosive	परिंदा
pʰ	p_h	voiceless aspirated bilabial plosive	फूल (phool)
b	b	voiced bilabial plosive	बिस्तर
bʱ	b_h	voiced aspirated bilabial plosive	भारी (bhaari)
t̪	t_d	voiceless dental plosive	तापमान (taapmaan)
t̪ʰ	t_d_h	voiceless aspirated dental plosive	थोड़ा (thoda)
d̪	d_d	voiced dental plosive	दिल्ली (dilli)
d̪ʱ	d_d_h	voiced aspirated dental plosive	धोबी (dhobi)
ʈ	t`	voiceless retroflex plosive	कटोरा (katora)
ʈʰ	t`_h	voiceless aspirated retroflex plosive	ठंड (thand)
ɖ	d`	voiced retroflex plosive	डर (darr)
ɖʱ	d`_h	voiced aspirated retroflex plosive	ढाल (dhal)
tʃ	tS	voiceless postalveolar affricate	चार
tʃʰ	tS_h	voiceless aspirated palatal affricate	छाल (chaal)
dʒ	dZ	voiced postalveolar affricate	जंगल
dʒʱ	dZ_h	voiced aspirated palatal affricate	झाल (jhaal)
k	k	voiceless velar plosive	कमाल
kʰ	k_h	voiceless aspirated velar plosive	खान (khan)
g	g	voiced velar plosive	गाँव
ɡʱ	g_h	voiced aspirated velar plosive	घान (ghaan)
l	l	alveolar lateral approximant	लम्हा
m	m	bilabial nasal	मंत्र
n	n	alveolar nasal	नाग
ŋ	N	velar nasal	मंगल
ɳ	n`	retroflex nasal	क्षण (kshan)
s	s	voiceless alveolar fricative	साल
z	z	voiced alveolar fricative	ज़रूर
ʃ	S	voiceless postalveolar fricative	शर्मिंदा
f	f	voiceless labiodental fricative	फ़ारसी
ɾ	4	alveolar flap	राम (ram)
ɽ	r`	plain retroflex flap	बड़ा (bada)
ɽʱ	r`_h	voiced aspirated retroflex flap	बढ़ी (barhi)
h	h	voiceless glottal fricative	हार
j	j	palatal approximant	यार
ʋ	v\	labiodental approximant	वसूल (wasool)

Hindi Vowels

IPA	X-SAMPA	Description	Example
ə	@	mid central vowel	अच्छा (achhaa)
ə̃	@~	nasalised mid central vowel	हँसना (hansnaa)
a	A	open front unrounded vowel	आग (aag)
ã	A~	nasalised open front unrounded vowel	घड़ियाँ (ghariyaan)
ɪ	I	near-close near-front unrounded vowel	इक्कीस (ikkees)
ɪ̃	I~	nasalised near-close near-front unrounded vowel	सिंचाई (sinchai)
i	i	close front unrounded vowel	बिल्ली (billee)
ĩ	i~	nasalised close front unrounded vowel	नहीं (nahin)
ʊ	U	near-close near-back rounded vowel	उल्लू (ullu)
ʊ̃	U~	nasalised near-close near-back rounded vowel	मुँह (munh)
u	u	close back rounded vowel	फूल (phool)
ũ	u~	nasalised close back rounded vowel	ऊँट (oont)
ɔ	O	open-mid back rounded vowel	कौन (kaun)
ɔ̃	O~	nasalised open-mid back rounded vowel	भौं (bhaun)
o	o	close-mid back rounded vowel	सोना (sona)
õ	o~	nasalised close-mid back rounded vowel	क्यों (kyon)
ɛ	E	open-mid front unrounded vowel	पैसा (paisa)
ɛ̃	E~	nasalised open-mid front unrounded vowel	मैं (main)
e	e	close-mid front unrounded vowel	एक (ek)
ẽ	e~	nasalised close-mid front unrounded vowel	किताबें (kitabein)

English Consonants

IPA	X-SAMPA	Description	Example
b	b	voiced bilabial plosive	bed
d	d	voiced alveolar plosive	dig
d͡ʒ	dZ	voiced postalveolar affricate	jump
ð	D	voiced dental fricative	then
f	f	voiceless labiodental fricative	five
g	g	voiced velar plosive	game
h	h	voiceless glottal fricative	house
j	j	palatal approximant	yes
k	k	voiceless velar plosive	cat
l	l	alveolar lateral approximant	lay
m	m	bilabial nasal	mouse
n	n	alveolar nasal	nap
ŋ	N	velar nasal	thing
p	p	voiceless bilabial plosive	speak
ɹ	r\	alveolar approximant	red
s	s	voiceless alveolar fricative	seem
ʃ	S	voiceless postalveolar fricative	ship
t	t	voiceless alveolar plosive	trap
t͡ʃ	tS	voiceless postalveolar affricate	chart
θ	T	voiceless dental fricative	thin
v	v	voiced labiodental fricative	vest
w	w	labial-velar approximant	west
z	z	voiced alveolar fricative	zero
ʒ	Z	voiced postalveolar fricative	vision

English Vowels

IPA	X-SAMPA	Description	Example
ə	@_o	mid central vowel	arena
æ	{	near-open front unrounded vowel	trap
aɪ	aI	diphthong	price
aʊ	aU	diphthong	mouth
ɑ	A_o	long open back unrounded vowel	father
eɪ	eI	diphthong	face
ɜ	3	open-mid central unrounded vowel	nurse
ɛ	E_o	open-mid front unrounded vowel	dress
i	i_o	long close front unrounded vowel	fleece
ɪ	I_o	near-close near-front unrounded vowel	kit
əʊ	@U	diphthong	goat
ɔ	O_o	long open-mid back rounded vowel	thought
ɔɪ	OI	diphthong	choice
u	u_o	long close back rounded vowel	goose
ʊ	U_o	near-close near-back rounded vowel	foot
ʌ	V	open-mid back unrounded vowel	strut
ɒ	Q	open back rounded vowel	bother
ɛə	E@	diphthong	bear
ɪə	I@	diphthong	beer
ʊə	U@	diphthong	tour

Additional Symbols

IPA	X-SAMPA	Description	Example
ˈ	"	primary stress	Alabama
ˌ	%	secondary stress	Alabama
.	.	syllable boundary	A.la.ba.ma

These symbols provide full coverage for the sounds of Italian. Other languages require the use of symbols not included in this list, which are not supported. Using symbols not included in this list for Italian skills is discouraged, as it may result in suboptimal speech synthesis.

X-SAMPA	IPA	Example
b	b	problema
tS	tʃ	pancia
d	d	diretto
dz	dz	benzina
f	f	difesa
g	g	erogazione
j	j	votazione
dZ	dʒ	legislatura
k	k	cascata
l	l	polvere
L	ʎ	dettaglio
m	m	settimo
n	n	comune
N	ŋ	anche
J	ɲ	dignità
p	p	pasta
r	r	promozione
s	s	vestito
S	ʃ	disciplina
t	t	articolo
ts	ts	esistenza
v	v	tuttavia
w	w	delinquenza
z	z	musicista
Z	ʒ	peugeot
i	i	musica
e	e	vestito
E	ɛ	veste
a	a	mano
u	u	uva
o	o	polacco
O	ɔ	povero
.	syllable boundary	rapido (" r a . p i . d o)
"	primary stress	certo (" tS E r . t o)
%	secondary stress	alfabeto (% a l . f a . " b e . t o)

These symbols provide full coverage for the sounds of Japanese. Other languages require the use of symbols not included in this list, which are not supported. Using symbols not included in this list for Japanese skills is discouraged, as it may result in suboptimal speech synthesis.

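Consonants

IPA	X-SAMPA	Description	Example
		voiced bilabial plosive	ボート (booto)
		voiced alveolar plosive	電車 (densha)
		voiced velar plosive	学校 (gakkoo)
		voiceless glottal fricative	花火 (hanabi)
		palatal approximant	夢 (yume)
		voiceless velar plosive	会社 (kaisha)
		bilabial nasal	メガネ (megane)
		alveolar nasal	猫 (neko)
		voiceless bilabial plosive	ピアノ (piano)
		voiceless alveolar fricative	寿司 (sushi)
		voiceless alveolar plosive	テレビ (terebi)
		labial-velar approximant	話題 (wadai)
		voiced alveolar fricative	雑貨 (zakka)
		voiceless bilabial fricative	冬 (fuyu)
		voiceless palatal fricative	ヒント (hinto)
		alveolar flap	冷蔵庫 (reezooko)
t͡s		voiceless alveolar affricate	月 (tsuki)
		voiceless palatal plosive	天気 (tenki)
		voiced palatal plosive	将棋 (shoogi)
		voiceless alveolo-palatal fricative	紹介 (shookai)
d͡ʑ		voiced alveolo-palatal affricate	ジュース (juusu)
		palatal nasal	日本 (nihon)
		alveolar lateral flap	リンゴ (ringo)
t͡ɕ	ts\	voiceless alveolo-palatal affricate	宇宙 (uchuu)
		geminate consonant (sokuon)	ロボット (robotto)
		uvular nasal	パソコン (pasokon)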

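Vowels

IPA	X-SAMPA	Description	Example
		open central unrounded vowel	窓 (mado)
		close front unrounded vowel	イス (isu)
		close back unrounded vowel	クジラ (kujira)
		mid front unrounded vowel	世界 (sekai)
		mid back rounded vowel	お茶 (ocha)
ä:		long open central unrounded vowel	ギター (gitaa)
		long close front unrounded vowel	チーム (chiimu)
ɯ:		long close back unrounded vowel	算数 (sansuu)
		long mid front unrounded vowel	ケータイ (keetai)
		long mid back rounded vowel	飛行機 (hikooki)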

Portuguese Consonants

IPA	X-SAMPA	Description	Example
ɾ	4	alveolar flap	pira
b	b	voiced bilabial plosive	bato
d	d	voiced alveolar plosive	dato
d͡ʒ	dZ	voiced postalveolar affricate	idade
f	f	voiceless labiodental fricative	facto
g	g	voiced velar plosive	gato
j	j	palatal approximant	paraguay
k	k	voiceless velar plosive	cacto
l	l	alveolar lateral approximant	galo
ʎ	L	palatal lateral approximant	galho
m	m	bilabial nasal	mato
n	n	alveolar nasal	nato
ɲ	J	palatal nasal	pinha
p	p	voiceless bilabial plosive	pato
s	s	voiceless alveolar fricative	saca
ʃ	S	voiceless postalveolar fricative	chato
t	t	voiceless alveolar plosive	tacto
t͡ʃ	tS	voiceless postalveolar affricate	noite
v	v	voiced labiodental fricative	vaca
w	w	labial-velar approximant	mau
χ	X	voiceless uvular fricative	carro
z	z	voiced alveolar fricative	zaca
ʒ	Z	voiced postalveolar fricative	jacto

Vowels

IPA	X-SAMPA	Description	Example
a	a	open front unrounded vowel	parto
ã	a~	nasal open front unrounded vowel	pensamos
e	e	close-mid front unrounded vowel	pega
ẽ	e~	nasal close-mid front unrounded vowel	movem
ɛ	E	open-mid front unrounded vowel	café
i	i	close front unrounded vowel	lingueta
ĩ	i~	nasal close front unrounded vowel	cinto
o	o	close-mid back rounded vowel	poder
õ	o~	nasal close-mid back rounded vowel	compra
ɔ	O	open-mid back rounded vowel	cotó
u	u	close back rounded vowel	fui
ũ	u~	nasal close back rounded vowel	sunto

prosody

Modifies the volume, pitch, and rate of the tagged speech.

Attribute Possible values

rate

Modify the rate of the speech.
Accepted values:

x-slow, slow, medium, fast, x-fast: Set the rate to a predefined value.
n%: Specify a percentage to increase or decrease the speed of the speech:
- 100% indicates no change from the normal rate.
- Percentages greater than 100% increase the rate.
- Percentages below 100% decrease the rate.
- The minimum value you can provide is 20%.

pitch

Raise or lower the tone (pitch) of the speech.
Accepted values:

x-low, low, medium, high, x-high: Set the pitch to a predefined value.
+n%: Increase the pitch by the specified percentage. For example: +10%, +5%. The maximum value allowed is +50%. A value higher than +50% is rendered as +50%.
-n%: Decrease the pitch by the specified percentage. For example: -10%, -20%. The smallest value allowed is -33.3%. A value lower than -33.3% is rendered as -33.3%.

Note: When you modify the speech with the pitch attribute, Alexa uses a legacy text-to-speech system, which might change the speech sound quality.

volume

Change the volume for the speech.
Accepted values:

silent, x-soft, soft, medium, loud, x-loud: Set the volume to a predefined value for the current voice.
+ndB: Increase volume relative to the current volume level. For example, +0dB means no change of volume. +6dB is approximately twice the current amplitude. The maximum positive value is about +4.08dB.
-ndB: Decrease the volume relative to the current volume level. For example, -6dB means approximately half the current amplitude.

You can combine <prosody> with all other tags when you set the rate and/or volume attributes. When you use the pitch attribute, you can't combine <prosody> with the tags shown in incompatible tags.
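
The attribute rules above can be sketched as a small JavaScript helper. This is an illustrative sketch, not part of the Alexa Skills Kit SDK: the function names `clamp` and `prosody` and the option names `ratePercent`, `pitchPercent`, and `volume` are assumptions made for this example. The helper applies the documented limits (a minimum rate of 20%, and a pitch between -33.3% and +50%) before building the tag.

```javascript
// Hypothetical helpers for illustration; not part of the Alexa Skills Kit SDK.
function clamp(value, min, max) {
  return Math.min(max, Math.max(min, value));
}

// Wrap text in a <prosody> tag using percentage values for rate and pitch
// and a predefined or dB value for volume.
function prosody(text, { ratePercent, pitchPercent, volume } = {}) {
  const attrs = [];
  if (ratePercent !== undefined) {
    // 20% is the documented minimum rate.
    attrs.push(`rate="${Math.max(20, ratePercent)}%"`);
  }
  if (pitchPercent !== undefined) {
    // Pitch changes outside -33.3% .. +50% are rendered at the limit anyway.
    const pitch = clamp(pitchPercent, -33.3, 50);
    attrs.push(`pitch="${pitch >= 0 ? "+" : ""}${pitch}%"`);
  }
  if (volume !== undefined) {
    attrs.push(`volume="${volume}"`);
  }
  // With no attributes there is nothing to modify, so return the text as-is.
  return attrs.length === 0 ? text : `<prosody ${attrs.join(" ")}>${text}</prosody>`;
}

const slowAndLoud = prosody("Please listen carefully.", { ratePercent: 80, volume: "loud" });
const clampedPitch = prosody("Too high.", { pitchPercent: 80 });
```

Here `slowAndLoud` slows the speech to 80% of the normal rate, and `clampedPitch` requests +80% but is emitted as +50%, which matches how the documentation says an out-of-range value is rendered.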

s

Represents a sentence. This tag provides strong breaks before and after the tag.

This is equivalent to:

Ending a sentence with a period (.).
Specifying a pause with <break strength="strong"/>.
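
As a rough illustration, the following JavaScript sketch wraps each sentence of a response in the `<s>` tag. The helper name `sentence` and the sample text are assumptions made for this example.

```javascript
// Hypothetical helper: marks a piece of text as one sentence.
function sentence(text) {
  return `<s>${text}</s>`;
}

const parts = [
  "This is a sentence",
  "There should be a short pause before this second sentence",
];

// Each <s> element gets a strong break before and after it.
const sentenceSsml = `<speak>${parts.map(sentence).join("")}</speak>`;
```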

say-as

Describes how the text should be interpreted. This lets you provide additional context to the text and eliminate any ambiguity on how Alexa renders the text. Indicate how Alexa should interpret the text with the interpret-as attribute.

Attribute Possible values

interpret-as

Specify how to interpret the text.
Accepted values:

characters, spell-out: Spell out each letter.
cardinal, number: Interpret the value as a cardinal number.
ordinal: Interpret the value as an ordinal number.
digits: Spell each digit separately.
fraction: Interpret the value as a fraction. This works for both common fractions (such as 3/20) and mixed fractions (such as 1+1/2).
unit: Interpret a value as a measurement. The value should be either a number or fraction followed by a unit (with no space in between) or just a unit.
date: Interpret the value as a date. Specify the format with the format attribute.
time: Interpret a value such as 1'21" as duration in minutes and seconds.
telephone: Interpret a value as a 7-digit or 10-digit telephone number. This can also handle extensions (for example, 2025551212x345).
address: Interpret a value as part of street address.
interjection: Interpret the value as an interjection. Alexa speaks the text in a more expressive voice. For optimal results, only use the supported interjections and surround each speechcon with a pause. For example: <say-as interpret-as="interjection">Wow.</say-as>. Speechcons are supported for the languages listed below.
expletive: "Bleep" out the content inside the tag.

format

Applies when interpret-as is set to date. Set to one of the following to indicate format of the date:

mdy
dmy
ymd
md
dm
ym
my
d
m
y

Alternatively, if you provide the date in YYYYMMDD format, the format attribute is ignored.

Include question marks (?) for portions of the date to leave out. For instance, Alexa speaks <say-as interpret-as="date">????0922</say-as> as "September twenty-second." For an example, see Example: Provide a date without the year.

Alexa attempts to interpret the provided text correctly based on the formatting even without this tag. For example, if your output speech includes "202-555-1212", Alexa interprets the number as a phone number and speaks each individual digit, with a brief pause for each dash. The <say-as interpret-as="telephone"> tag isn't necessary. However, if you provided the text "2025551212", but you wanted Alexa to speak it as a phone number, you must use <say-as interpret-as="telephone">.

Example: Telephone number

The following example shows the telephone attribute for interpret-as.
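
A minimal JavaScript sketch of this case follows. The helper name `sayAsTelephone` is an assumption for this example, and the check for 7 or 10 bare digits is a simplification of the rule described above: text that already contains dashes is left alone because Alexa reads it as a phone number without the tag.

```javascript
// Hypothetical helper: adds the telephone tag only when the text is a bare
// run of 7 or 10 digits, which Alexa would otherwise read as a number.
function sayAsTelephone(text) {
  const bareDigits = /^(\d{7}|\d{10})$/.test(text);
  return bareDigits
    ? `<say-as interpret-as="telephone">${text}</say-as>`
    : text;
}

const tagged = sayAsTelephone("2025551212");
const untouched = sayAsTelephone("202-555-1212");
const phoneSsml = `<speak>You can reach us at ${tagged}.</speak>`;
```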

Example: Spell out words and numbers

The following example shows how to use interpret-as to spell out words and numbers. Note that Alexa interprets the number 12345 as a cardinal number automatically.
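
One way to build such a response in JavaScript is sketched below. The helper name `sayAs` and the sample sentences are assumptions for this example; the `spell-out` and `digits` values come from the list of accepted values above.

```javascript
// Hypothetical helper: wraps text in a <say-as> tag.
function sayAs(interpretAs, text) {
  return `<say-as interpret-as="${interpretAs}">${text}</say-as>`;
}

const spellOutSsml =
  "<speak>" +
  `Here is a word spelled out: ${sayAs("spell-out", "hello")}. ` +
  `Here is a number read digit by digit: ${sayAs("digits", "12345")}. ` +
  "Here is the same number without a tag: 12345." +
  "</speak>";
```

The last sentence carries no tag, so Alexa reads 12345 as a cardinal number.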

Example: Use an interjection (speechcon)

The following example shows the difference between normal speech and a speechcon.
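
The comparison can be sketched in JavaScript as follows. The variable names and the surrounding sentences are assumptions for this example; the interjection markup itself is the form shown in the `interpret-as` description above.

```javascript
// The same word, first as plain text and then as a speechcon.
const normalWord = "Wow.";
const speechcon = '<say-as interpret-as="interjection">Wow.</say-as>';

const interjectionSsml =
  `<speak>Here is the word in normal speech: ${normalWord} ` +
  `Here is the speechcon: ${speechcon}</speak>`;
```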

Example: Provide a date without the year

The following example shows how Alexa interprets a date with question marks (?).
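
A minimal JavaScript sketch of the question-mark form is shown below. The helper name `dateWithoutYear` and the sample sentence are assumptions for this example. The helper pads the month and day to two digits and puts four question marks in place of the year.

```javascript
// Hypothetical helper: formats a month and day as ????MMDD inside a date tag.
function dateWithoutYear(month, day) {
  const mm = String(month).padStart(2, "0");
  const dd = String(day).padStart(2, "0");
  return `<say-as interpret-as="date">????${mm}${dd}</say-as>`;
}

const dateSsml = `<speak>Your appointment is on ${dateWithoutYear(9, 22)}.</speak>`;
```

The tag built here is the same `????0922` value that the format description says Alexa speaks as "September twenty-second."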

Supported speechcons

Speechcons are language specific. For more details about the available speechcons for each skill language, see the following references:

speak

The root element of an SSML document. When you use SSML with the Alexa Skills Kit, surround the text to be spoken with the <speak> tag.

sub

Pronounce the specified word or phrase as a different word or phrase. Specify the pronunciation to substitute with the alias attribute.

Attribute	Possible values
`alias`	The word or phrase to speak in place of the tagged text.

The following example replaces the abbreviated chemical elements with the full words.
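
As a sketch of how this might look when built in JavaScript, the following code substitutes element symbols with their names. The lookup table, the helper name `subElement`, and the sample sentence are assumptions for this example.

```javascript
// Hypothetical lookup table of abbreviations and the words to speak instead.
const elementNames = new Map([
  ["Al", "aluminum"],
  ["Mg", "magnesium"],
]);

// Wrap a known symbol in a <sub> tag; leave unknown text unchanged.
function subElement(symbol) {
  const alias = elementNames.get(symbol);
  return alias ? `<sub alias="${alias}">${symbol}</sub>` : symbol;
}

const subSsml = `<speak>The sample contains ${subElement("Al")} and ${subElement("Mg")}.</speak>`;
```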

voice

Speak the text with the specified Amazon Polly voice. Each listed voice has its own individual character. For advice about how to use different voices in your skill, see Best Practices for Using Amazon Polly Voices.

You can combine <voice> with all other tags, except for those listed in incompatible tags.

Attribute Possible values

name

The name of a supported Amazon Polly voice. Voices are specific to locale. To speak content in the same language as your skill, choose a voice supported for the locale of your skill. To speak content in a different language, combine the <voice> tag with the <lang> tag.

For example, in an English (US) skill, use an en-US voice for English content. For French content, you can use an fr-FR voice, but combine the <voice> tag with <lang> to speak the content properly in French.

For the list of supported voices for each locale, see Supported Amazon Polly voices.

Supported Amazon Polly voices

This table lists the Amazon Polly voices supported by Alexa. Voice names don't contain accented characters. Use a voice supported for the skill locale or use the voice with the <lang> tag.

To comply with Alexa skill policies, don't expose the Amazon-assigned name of an Amazon Polly voice to users.

Locale	Supported voices
English, American (en-US)	`Ivy`, `Joanna`, `Joey`, `Justin`, `Kendra`, `Kimberly`, `Matthew`, `Salli`
English, Australian (en-AU)	`Nicole`, `Russell`
English, British (en-GB)	`Amy`, `Brian`, `Emma`
English, Indian (en-IN)	`Aditi`, `Raveena`
English, Welsh (en-GB-WLS)	`Geraint`
French, Canadian (fr-CA)	`Chantal`
French, France (fr-FR)	`Celine`, `Lea`, `Mathieu`
German (de-DE)	`Hans`, `Marlene`, `Vicki`
Hindi (hi-IN)	`Aditi`
Italian (it-IT)	`Carla`, `Giorgio`, `Bianca`
Japanese (ja-JP)	`Mizuki`, `Takumi`
Portuguese, Brazilian (pt-BR)	`Vitoria`, `Camila`, `Ricardo`
Spanish, American (es-US)	`Penelope`, `Lupe`, `Miguel`
Spanish, Castilian (es-ES)	`Conchita`, `Enrique`, `Lucia`
Spanish, Mexican (es-MX)	`Mia`

Example: Standard Alexa voice and a specified Amazon Polly voice

This example assumes an en-US skill. Because Kendra is an en-US voice, no <lang> tag is required. If the sample were from a skill that doesn't have an en-US locale, add the <lang> tag and set it to en-US.
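
The two cases can be sketched in JavaScript as follows. The helper name `voice`, its optional `lang` parameter, and the sample sentences are assumptions for this example. When a locale is passed, the helper nests a `<lang>` tag inside the `<voice>` tag.

```javascript
// Hypothetical helper: wraps text in a <voice> tag, and in a nested <lang>
// tag when the voice locale differs from the skill locale.
function voice(name, text, lang) {
  const inner = lang ? `<lang xml:lang="${lang}">${text}</lang>` : text;
  return `<voice name="${name}">${inner}</voice>`;
}

// In an en-US skill, Kendra needs no <lang> tag.
const inEnUsSkill =
  `<speak>I want to tell you a secret. ${voice("Kendra", "I am not a real human.")} Can you believe it?</speak>`;

// In a skill without an en-US locale, set <lang> to en-US.
const inOtherLocale =
  `<speak>${voice("Kendra", "I am not a real human.", "en-US")}</speak>`;
```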

Example: Different voices in a dialog

The following example provides a dialog between an en-US voice and an en-GB voice, such as might occur if a story with two different characters were being read. The standard Alexa voice, which varies by locale, speaks the first and last sentence.
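
A dialog like this could be assembled in JavaScript as sketched below. The script lines, the helper name `renderLine`, and the choice of Joanna (en-US) and Brian (en-GB) from the supported voices table are assumptions for this example, as is the en-US skill locale. Lines without a voice are spoken by the standard Alexa voice.

```javascript
// Hypothetical script: each entry is one line of the dialog.
const script = [
  { text: "Here is a short story." },
  { voice: "Joanna", text: "Did you hear the news?" },
  { voice: "Brian", lang: "en-GB", text: "I did, and I can hardly believe it." },
  { text: "The end." },
];

// Wrap a line in <voice>, and in <lang> when the voice locale differs
// from the assumed en-US skill locale.
function renderLine({ voice, lang, text }) {
  if (!voice) {
    return text;
  }
  const inner = lang ? `<lang xml:lang="${lang}">${text}</lang>` : text;
  return `<voice name="${voice}">${inner}</voice>`;
}

const dialogSsml = `<speak>${script.map(renderLine).join(" ")}</speak>`;
```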

Example: French content in an English skill

In the following example, assume the locale of this skill is for an English-speaking region. Because Celine is an fr-FR voice, and you want this content spoken in French, set the <lang> tag to fr-FR.

<speak>
    Welcome to Ride Hailer. <voice name='Celine'><lang xml:lang='fr-FR'>Bienvenue à Ride Hailer</lang></voice>
    You can order a ride, or request a fare estimate.
    Which will it be?
</speak>

Tips for using Amazon Polly voices

Although all Amazon Polly voices use approximately the same volume, users might perceive some voices as louder or quieter than Alexa voices. Use the prosody tag to modify the volume, rate, and pitch of the voice you have chosen. Use other SSML tags to modify the spoken output.

You can enhance your skills with responses that include one or more Amazon Polly voices, as well as the default Alexa voice, and you can choose specific voices for specific responses. Refer to User Experience Guidelines for the Use of Amazon Polly Voices in Your Skills for guidance on using Amazon Polly voices in your skills.

There is no charge for Alexa developers to use Amazon Polly voices.

The locale of a skill refers to a combination of region and language, and all of the Amazon Polly voices are tagged with a locale. For example, the en-AU locale refers to the English language in Australia, whereas en-IN refers to the English language in India. You select the locale of your skill when you first create it.

To achieve the best results, if the voice you select is for a different locale than that specified by your skill, use the <lang> tag to specify the language. For more details, see lang tag.

Be mindful of the user experience if you combine voices from different locales in your skill responses.

Node.js sample code for voice

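If you build a Node.js skill, you can use the following switchVoice function to wrap speech output in <voice> tags and get a specific voice. If you use the Alexa Skills Kit SDK for Node.js, the SDK automatically wraps speech output in the <speak> tag.

// Wrap text in a <voice> tag so that Alexa speaks it with the named Amazon Polly voice.
function switchVoice(text, voiceName) {
  // Return an empty string for missing text so that concatenation never yields "undefined".
  if (!text) {
    return "";
  }
  return "<voice name='" + voiceName + "'>" + text + "</voice>";
}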

Here is some sample speech output from a skill using multiple voices with the switchVoice function.

const speechOutput = "I am Alexa. " + switchVoice("I am Matthew. ", "Matthew") + switchVoice("I am Kendra. ", "Kendra") + switchVoice("and I am Ivy. ", "Ivy") + "Don't we make a great team?";

If you want all of the skill responses to be in a particular voice, make sure that all speech outputs from the skill are specified as SSML and are wrapped with the appropriate <voice> tag.
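
For a skill that constructs the JSON response directly, one way to apply a single voice everywhere is to route every response through one function. The following sketch assumes a hypothetical helper named `toOutputSpeech`; it returns an `outputSpeech` object with the type set to SSML. If you use the SDK, omit the `<speak>` wrapper, because the SDK adds it.

```javascript
// Hypothetical helper: builds an outputSpeech object in which the whole
// response is spoken by one Amazon Polly voice.
function toOutputSpeech(text, voiceName) {
  return {
    type: "SSML",
    ssml: `<speak><voice name='${voiceName}'>${text}</voice></speak>`,
  };
}

const outputSpeech = toOutputSpeech("Welcome back.", "Matthew");
```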

w

Similar to <say-as>, the <w> tag customizes the pronunciation of words by specifying the word's part of speech.

Attribute Possible values

role

Specify the part of speech for the word.
Accepted values:

amazon:VB: Interpret the word as a verb (present simple).
amazon:VBD: Interpret the word as a past participle.
amazon:NN: Interpret the word as a noun.
amazon:SENSE_1: Use the non-default sense of the word. For example, the noun "bass" is pronounced differently depending on meaning. The "default" meaning is the lowest part of the musical range. The alternate sense, which is still a noun, is a freshwater fish. Specifying <speak><w role="amazon:SENSE_1">bass</w></speak> renders the non-default pronunciation (freshwater fish).

The following example shows the amazon:VB and amazon:VBD values for role.
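
A minimal JavaScript sketch of these two roles follows. The helper name `w`, the role check, and the sample sentence are assumptions for this example; the word "read" is used because its spelling is the same in both forms while its pronunciation differs.

```javascript
// Roles listed in the attribute description above.
const supportedRoles = new Set(["amazon:VB", "amazon:VBD", "amazon:NN", "amazon:SENSE_1"]);

// Hypothetical helper: wraps a word in a <w> tag after checking the role.
function w(role, word) {
  if (!supportedRoles.has(role)) {
    throw new RangeError(`Unsupported role: ${role}`);
  }
  return `<w role="${role}">${word}</w>`;
}

const verbSsml =
  `<speak>The present simple form is ${w("amazon:VB", "read")}, ` +
  `and the past form is ${w("amazon:VBD", "read")}.</speak>`;
```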

The following example shows the amazon:SENSE_1 value for role.
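
As a rough sketch in JavaScript, the string below places the default and the alternate sense of "bass" side by side. The variable names and the surrounding sentence are assumptions for this example.

```javascript
// Untagged, "bass" uses the default sense (the musical range).
const defaultSense = "bass";
// With amazon:SENSE_1, Alexa uses the alternate sense (the freshwater fish).
const alternateSense = '<w role="amazon:SENSE_1">bass</w>';

const senseSsml =
  `<speak>The musician played the ${defaultSense}. The angler caught a ${alternateSense}.</speak>`;
```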

Note that these tags previously used the ivona namespace in the attribute names. The tags are backwards compatible, so existing SSML written with the ivona namespace continues to work.

Last updated: Aug 28, 2025
