Speech Synthesis Markup Language (SSML) Reference
You can use Speech Synthesis Markup Language (SSML) in your output speech response to control how Alexa generates the speech. For example, you can add pauses and other speech effects.
About SSML
When your skill returns a response to a user's request, you provide text that the Alexa service converts to speech. Alexa automatically handles normal punctuation, such as pausing after a period, or speaking a sentence ending in a question mark as a question.
However, in some cases you may want additional control over how Alexa generates the speech from the text in your response. For example, you may want a longer pause within the speech, or you may want a string of digits read back as a standard telephone number. The Alexa Skills Kit provides this type of control with Speech Synthesis Markup Language (SSML) support.
SSML is a markup language that provides a standard way to mark up text for the generation of synthetic speech. The Alexa Skills Kit supports a subset of the tags defined in the SSML specification. The specific tags supported are listed in Supported SSML Tags.
Use SSML in your response
To use SSML, construct your output speech using the supported SSML tags. When sending back a response from your service, you must indicate that it is using SSML rather than plain text. If you construct the JSON response directly, provide the marked-up text in the outputSpeech property, but set the type to SSML instead of PlainText. Use the ssml property instead of text for the marked-up text:
"outputSpeech": {
"type": "SSML",
"ssml": "<speak>This output speech uses SSML.</speak>"
}
You can use SSML with both the normal output speech response and any re-prompt included in the response.
If you use the Alexa Skills Kit SDK for Node.js or Alexa Skills Kit SDK for Java, you do not have to include the speak tag for the SSML you provide, as that is handled by the SDK. Otherwise, the SSML you provide must be wrapped within <speak> tags. For example:
<speak>
Here is a number <w role="amazon:VBD">read</w>
as a cardinal number:
<say-as interpret-as="cardinal">12345</say-as>.
Here is a word spelled out:
<say-as interpret-as="spell-out">hello</say-as>.
</speak>
In the JSON output for the SSML, either escape the quotation marks within the output or use an appropriate mix of single and double quotation marks. Here, single quotation marks are used for attributes, and the entire SSML string is wrapped in double quotation marks.
{
"outputSpeech": {
"type": "SSML",
"ssml": "<speak>Here is a number <w role='amazon:VBD'>read</w> as a cardinal number: <say-as interpret-as='cardinal'>12345</say-as>. Here is a word spelled out: <say-as interpret-as='spell-out'>hello</say-as>.</speak>"
}
}
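If you build the response in code, a JSON serializer can take care of the escaping described above. The following is a minimal Python sketch under stated assumptions: the helper name build_output_speech is hypothetical and is not part of the Alexa Skills Kit. It wraps the markup in <speak> tags when they are missing and relies on json.dumps to escape the double quotation marks in attributes.

```python
import json


def build_output_speech(ssml_body: str) -> dict:
    """Return an outputSpeech object of type SSML (hypothetical helper)."""
    body = ssml_body.strip()
    # Wrap the markup in <speak> tags unless the caller already did.
    if not body.startswith("<speak>"):
        body = f"<speak>{body}</speak>"
    return {"outputSpeech": {"type": "SSML", "ssml": body}}


response = build_output_speech(
    "Here is a word spelled out: "
    '<say-as interpret-as="spell-out">hello</say-as>.'
)
# json.dumps escapes the double quotation marks inside the SSML string.
print(json.dumps(response, indent=2))
```

Because the serializer handles the escaping, the SSML can keep double quotation marks for attributes without producing invalid JSON.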
Supported SSML tags
The Alexa Skills Kit supports the following SSML tags (listed in alphabetic order):
amazon:domain
amazon:effect
amazon:emotion
audio
break
emphasis
lang
p
phoneme
prosody
s
say-as
speak
sub
voice
w
Note that the Alexa service strips out any unsupported SSML tags included in the text you provide.
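Because unsupported tags are stripped rather than rejected, it can help to check your markup before sending it. The following Python sketch is an illustration only, not an Alexa tool: the function name unsupported_tags is hypothetical, and the simple regular expression assumes reasonably well-formed markup. It compares the tag names found in a string with the list above.

```python
import re

# Tag names taken from the supported-tags list in this reference.
SUPPORTED_TAGS = {
    "amazon:domain", "amazon:effect", "amazon:emotion", "audio", "break",
    "emphasis", "lang", "p", "phoneme", "prosody", "s", "say-as", "speak",
    "sub", "voice", "w",
}

# Matches the name of every opening or self-closing tag, for example "say-as".
# Closing tags such as </speak> are skipped because "/" is not a letter.
TAG_NAME = re.compile(r"<\s*([A-Za-z][\w:.-]*)")


def unsupported_tags(ssml: str) -> set:
    """Return the tag names in ssml that are not in SUPPORTED_TAGS."""
    return {name for name in TAG_NAME.findall(ssml) if name not in SUPPORTED_TAGS}


sample = "<speak>Hello <mark name='here'/> <break time='1s'/> world</speak>"
print(unsupported_tags(sample))  # {'mark'}
```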
Apply multiple SSML tags to the same speech
You can combine most supported tags with each other to apply multiple effects on the speech. For instance, this example uses both the say-as and amazon:emotion tags. This tells Alexa to speak the entire string in an "excited" voice, and speak the provided number as individual digits:
<speak>
<amazon:emotion name="excited" intensity="medium">
Five seconds till lift off!
<say-as interpret-as="digits">54321</say-as>.
Lift off!
</amazon:emotion>
</speak>
Incompatible tags
Not all tags can be combined. The following tags cannot be applied to the same speech:
- amazon:domain
  - You must combine <amazon:domain name="conversational"> with the <voice> tag and the Matthew or Joanna voice. The conversational style doesn't work with other voices, and it doesn't work on its own without <voice>.
  - You can combine <amazon:domain name="news"> with the <voice> tag and the Matthew, Joanna, and Lupe voices. The news style doesn't work with other voices.
  - You can't combine <amazon:domain name="long-form">, <amazon:domain name="music">, or <amazon:domain name="fun"> with <voice>.
- amazon:emotion
- speechcons. Speechcons are implemented as the say-as tag with interpret-as set to interjection, for example: <say-as interpret-as="interjection">wow</say-as>. Note that say-as can be combined with other tags when you use other values for the interpret-as attribute. For example, you could combine amazon:emotion or emphasis with <say-as interpret-as="ordinal">1</say-as>.
- voice
  - You can combine voice with the amazon:domain tag with the restrictions noted previously.
  - You can't combine voice with any of the other tags listed here.
- emphasis
- prosody with the pitch attribute (for example, <prosody pitch="x-low">…</prosody>)
For example, the following combinations don't work:
Invalid SSML: voice used within amazon:emotion
<speak>
<amazon:emotion name="disappointed" intensity="medium">
I want to tell you a secret.
<voice name="Kendra">I am not a real human.</voice>.
Can you believe it?
</amazon:emotion>
</speak>
Invalid SSML: amazon:emotion used within voice
<speak>
I want to tell you a secret.
<voice name="Kendra">
<amazon:emotion name="disappointed" intensity="medium">
I am not a real human.
</amazon:emotion>
</voice>.
Can you believe it?
</speak>
Incompatible voice used with conversational or news style
<speak>
<voice name="Kendra">
<amazon:domain name="conversational">I really didn't know how this morning was going to start. And if I had known, I think I might have just stayed in bed.</amazon:domain>
</voice>
</speak>
Invalid: Conversational style used without <voice> tag
<speak>
<amazon:domain name="conversational">I really didn't know how this morning was going to start. And if I had known, I think I might have just stayed in bed.</amazon:domain>
</speak>
You can use the incompatible tags in the same speak string, as long as they are not applied to the same text string. For example, the following combination is valid:
<speak>
<amazon:emotion name="disappointed" intensity="medium">
I want to tell you a secret.
</amazon:emotion>
<voice name="Kendra">
I am not a real human.
</voice>
<amazon:emotion name="disappointed" intensity="high">Can you believe it?</amazon:emotion>
</speak>
In this case, the first line is spoken in a disappointed voice, the second line is spoken in the "Kendra" voice, and the final line uses the disappointed voice again.
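The difference between the invalid and valid examples above is nesting: the tags conflict only when one is opened while the other is still open. The following Python sketch illustrates that check for the voice and amazon:emotion pair only. It is an assumption-laden illustration, not how Alexa validates SSML: the function name nested_conflicts is hypothetical, and the regular expression assumes attribute values that contain no ">" character.

```python
import re

# Pairs this sketch treats as incompatible when nested (from the examples above).
CONFLICTS = {frozenset({"voice", "amazon:emotion"})}

# Captures: optional "/" (closing tag), tag name, optional "/" (self-closing tag).
TAG = re.compile(r"<(/?)([A-Za-z][\w:.-]*)[^>]*?(/?)>")


def nested_conflicts(ssml: str) -> list:
    """Return (outer, inner) pairs of conflicting tags applied to the same text."""
    stack, found = [], []
    for closing, name, self_closing in TAG.findall(ssml):
        if closing:
            # Pop back to the matching opening tag, if there is one.
            if name in stack:
                while stack.pop() != name:
                    pass
        elif not self_closing:
            for outer in stack:
                if frozenset({outer, name}) in CONFLICTS:
                    found.append((outer, name))
            stack.append(name)
    return found


invalid = (
    '<speak><amazon:emotion name="disappointed" intensity="medium">'
    'I want to tell you a secret. <voice name="Kendra">I am not a real human.</voice>'
    "</amazon:emotion></speak>"
)
valid = (
    '<speak><amazon:emotion name="disappointed" intensity="medium">'
    "I want to tell you a secret.</amazon:emotion>"
    '<voice name="Kendra">I am not a real human.</voice></speak>'
)
print(nested_conflicts(invalid))  # [('amazon:emotion', 'voice')]
print(nested_conflicts(valid))    # []
```

Other incompatible combinations listed above, such as emphasis or prosody with the pitch attribute, would need additional entries and attribute checks.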
amazon:domain
The amazon:domain tag is available in the following locales: English (US), English (UK), English (CA), English (AU), German (DE), and Japanese (JP). Not all styles are available in all locales.
Applies different speaking styles to the speech. The styles are curated text-to-speech voices that use different variations of intonation, emphasis, pausing, and other techniques to match the speech to the content. For example, the news style makes Alexa's voice sound like what you might expect to hear in a TV or radio newscast, and was built primarily for customers to listen to news articles and other news-based content.
The amazon:domain tag takes the following required parameters:
Attribute | Possible Values |
---|---|
name | The name of the speaking style to apply to the speech. Available styles: conversational, fun, long-form, music, news. Availability varies by locale. |
Examples
news
<speak>
<amazon:domain name="news">
Latest news: The conversational and news styles are now available for the Matthew or Joanna voices!
</amazon:domain>
</speak>
music
<speak>
<amazon:domain name="music">
Sweet Child O' Mine by Guns N' Roses became one of their most
successful singles topping the billboard Hot 100 in 1988. Slash's
guitar solo on this song was ranked the 37th greatest solo of all
time. Here's Sweet Child O' Mine.
</amazon:domain>
</speak>
long-form
<speak>
<amazon:domain name="long-form">
Meet Echo Dot. Our most popular Echo is now even better.
With a new speaker and design, Echo Dot is a voice-controlled smart speaker with Alexa,
perfect for any room. Just ask for music, news, information, and more.
You can also call almost anyone and control compatible smart home devices with your voice.
</amazon:domain>
</speak>
fun
<speak>
<amazon:domain name="fun">
布団が、ふっとんだ。
</amazon:domain>
</speak>
news combined with voice
This example uses two different voices in the same response.
<speak>
<voice name="Matthew">
<amazon:domain name="news">
Latest news: The conversational and news styles are now available for the Matthew or Joanna voices!
</amazon:domain>
</voice>
<voice name="Joanna">
<amazon:domain name="conversational">
That was all for today. Thank you.
</amazon:domain>
</voice>
</speak>
conversational combined with voice
The <amazon:domain name="conversational"> tag works with the <voice> tag and the Matthew and Joanna voices. You can't use conversational without the <voice> tag.
<speak>
<voice name="Matthew">
<amazon:domain name="conversational">I really didn't know how this morning was going to start. And if I had known, I think I might have just stayed in bed.
</amazon:domain>
</voice>
</speak>
You can combine amazon:domain with all other tags, except for those listed in incompatible tags.
Best practices for the amazon:domain tag
These recommendations can help you build a better customer experience with the amazon:domain tag:
- Use Alexa's default voice without the amazon:domain tag in the intro to your skill. This sets a "baseline", so the responses that use a specialized speaking style later have more impact.
- Don't overdo the use of Alexa's speaking styles, as this might create a poor or unpleasant user experience. For example, don't switch between different speaking styles frequently.
- Test how your responses sound with a device or the simulator in the developer console and verify that Alexa's voice and speaking style are appropriate for the response.
amazon:effect
Applies Amazon-specific effects to the speech.
Attribute | Possible Values |
---|---|
name | The name of the effect to apply to the speech. Available effects: whispered (applies a whispering effect to the speech). |
<speak>
I want to tell you a secret.
<amazon:effect name="whispered">I am not a real human.</amazon:effect>.
Can you believe it?
</speak>
amazon:emotion
The amazon:emotion tag is available in the following locales: English (US), English (UK), German (DE), and Japanese (JP).
The amazon:emotion tag causes Alexa to express emotion when speaking. This can be useful for stories, games, news updates and other narrative content. For instance, in a game, you might use the "excited" emotion for correct answers and the "disappointed" emotion for incorrect answers.
The amazon:emotion tag takes the following required parameters:
Attribute | Possible Values |
---|---|
name | The name of the emotion to apply to the speech. Available emotions: excited, disappointed. |
intensity | The intensity or strength of the emotion to express. Possible values: low, medium, high. |
Examples
<speak>
<amazon:emotion name="excited" intensity="medium">
Christina wins this round!
</amazon:emotion>
</speak>
<speak>
<amazon:emotion name="disappointed" intensity="high">
Here I am with a brain the size of a planet
and they ask me to pick up a piece of paper.
</amazon:emotion>
</speak>
Examples of amazon:emotion combined with other SSML tags
You can combine amazon:emotion with all other tags, except for those listed in incompatible tags.
For example, this adds a three-second pause in the middle of speech with the "excited" emotion:
<speak>
<amazon:emotion name="excited" intensity="medium">
Okay, let's be mindful and take a deep breath.
<break time="3s"/>
Now don't we feel better?
</amazon:emotion>
</speak>
This example uses prosody to increase the volume of the "disappointed" speech.
<speak>
This is how I normally speak.
<amazon:emotion name="disappointed" intensity="high">
This is how I speak when I am disappointed.
<prosody volume="x-loud">Now I am telling you I am disappointed very loudly!</prosody>
</amazon:emotion>
</speak>
Best practices for the amazon:emotion tag
These recommendations can help you build a better customer experience with the amazon:emotion tag:
- Use Alexa's default voice without the amazon:emotion tag in the intro to your skill. This sets a "baseline", so the emotional responses later can have more impact.
- Don't overuse emotional responses, as this can create a poor or unpleasant customer experience. Consider these guidelines:
  - Don't switch between excited and disappointed extremely frequently.
  - Don't use the emotions in every response.
- Try the medium intensity initially, then adjust the intensity as needed. Using medium in most instances gives you more options for adjusting the intensity up or down depending on the situation.
- Test how your responses sound with a device or the simulator in the developer console and ensure that Alexa's voice is appropriate for the response.
audio
The audio tag lets you provide the URL for an MP3 file that the Alexa service can play while rendering a response. You can use this to embed short, pre-recorded audio within your service's response. For example, you could include sound effects alongside your text-to-speech responses, or provide responses using a voice associated with your brand.
For sound effects that you can use with the <audio> tag, see the Alexa Skills Kit Sound Library.
Attribute | Possible Values |
---|---|
src | Specifies the URL for the MP3 file. The file must meet the format, hosting, and duration requirements described in this section. You may need to use converter software to convert your MP3 files to the required codec version (MPEG version 2) and bit rate (48 kbps). |
Include the audio tag within your text-to-speech response within the speak tag. Alexa plays the MP3 at the specified point within the text to speech. For example:
<speak>
Welcome to Ride Hailer.
<audio src="soundbank://soundlibrary/transportation/amzn_sfx_car_accelerate_01" />
You can order a ride, or request a fare estimate.
Which will it be?
</speak>
When Alexa renders this response, it would sound like this:
Alexa: Welcome to Ride Hailer.
(the specified amzn_sfx_car_accelerate_01.mp3 audio file plays)
Alexa: You can order a ride, or request a fare estimate. Which will it be?
A single response sent by your service can include multiple audio tags according to the following limits:
- No more than five audio files can be used in a single response.
- The combined total time for all audio files in the outputSpeech property of the response cannot be more than 240 seconds.
- The combined total time for all audio files in the reprompt property of the response cannot be more than 90 seconds.
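These limits can be checked before you send a response. The following Python sketch is a rough illustration: the function name check_audio_limits is hypothetical, it inspects a single SSML string rather than a whole response, and it assumes you already know the length of each clip in seconds, because measuring MP3 duration is outside its scope.

```python
import re

MAX_AUDIO_TAGS = 5          # audio files per response
MAX_OUTPUT_SECONDS = 240    # all audio in outputSpeech
MAX_REPROMPT_SECONDS = 90   # all audio in reprompt


def check_audio_limits(ssml, durations, is_reprompt=False):
    """Return a list of limit violations; durations are clip lengths in seconds."""
    problems = []
    tag_count = len(re.findall(r"<audio\b", ssml))
    if tag_count > MAX_AUDIO_TAGS:
        problems.append(f"{tag_count} audio tags (limit {MAX_AUDIO_TAGS})")
    limit = MAX_REPROMPT_SECONDS if is_reprompt else MAX_OUTPUT_SECONDS
    total = sum(durations)
    if total > limit:
        problems.append(f"{total} seconds of audio (limit {limit})")
    return problems


ssml = '<speak>Welcome. <audio src="https://example.com/intro.mp3"/></speak>'
print(check_audio_limits(ssml, [100]))                    # []
print(check_audio_limits(ssml, [100], is_reprompt=True))  # ['100 seconds of audio (limit 90)']
```

The example.com URL is a placeholder, not a real audio file.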
Converting audio files to an Alexa-friendly format
You can use converter software to convert your MP3 files to the required codec version (MPEG version 2) and bit rate (48 kbps). One option is a command-line tool, FFmpeg.
This sample command converts the provided <input-file> to an MP3 file that works with the audio tag. This version uses 16000 as the sample rate:
ffmpeg -i <input-file> -ac 2 -codec:a libmp3lame -b:a 48k -ar 16000 -write_xing 0 <output-file>
You might get better quality by increasing the sample rate to 24000 like this:
ffmpeg -i <input-file> -ac 2 -codec:a libmp3lame -b:a 48k -ar 24000 -write_xing 0 <output-file>
See the documentation for FFmpeg for details about command line options.
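If you convert many files, you might wrap the command in a script. The following Python sketch builds the same argument list as the commands above and runs it with subprocess. The function names and the file names input.wav and output.mp3 are placeholders chosen for illustration, and the sketch assumes that ffmpeg is installed and available on your PATH.

```python
import subprocess


def ffmpeg_command(input_file, output_file, sample_rate=16000):
    """Build the argument list for the FFmpeg command shown above."""
    return [
        "ffmpeg", "-i", input_file,
        "-ac", "2",
        "-codec:a", "libmp3lame",
        "-b:a", "48k",
        "-ar", str(sample_rate),
        "-write_xing", "0",
        output_file,
    ]


def convert(input_file, output_file, sample_rate=16000):
    """Run FFmpeg; check=True raises CalledProcessError if the conversion fails."""
    subprocess.run(ffmpeg_command(input_file, output_file, sample_rate), check=True)


# Print the command instead of running it, so this example has no side effects.
print(" ".join(ffmpeg_command("input.wav", "output.mp3", 24000)))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.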
Another option is Audacity:
- Open the file to convert.
- Set the Project Rate in the lower-left corner to 16000.
- Click File > Export Audio and change the Save as type to MP3 Files.
- Click Options, set the Quality to 48 kbps and the Bit Rate Mode to Constant.
This requires the Lame library, which can be found at: http://lame.buanzo.org/#lamewindl.
Hosting the audio files for your skill
The MP3 files you use to provide audio must be hosted on an endpoint that uses HTTPS. The endpoint must provide an SSL certificate signed by an Amazon-approved certificate authority. Many content hosting services provide this. For example, you could host your files at a service such as Amazon Simple Storage Service (Amazon S3) (an Amazon Web Services offering).
We don't require that you authenticate the requests for the audio files. Therefore, you must not include any customer-specific or sensitive information in these audio files. For example, building a custom MP3 file in response to a user's request, and including sensitive information within the audio, is not allowed.
For optimal performance, we recommend that you host your MP3 files for SSML responses in close proximity to where your skill is hosted. For example, if the Lambda function for your skill is hosted in the US West (Oregon) region, you will get better performance if you upload your MP3s to a US West (Oregon) S3 bucket.
HTTP Live Streaming (HLS) of audio files
Alexa supports SSML audio tags that point toward HTTP Live Streaming (HLS) streams, provided that the audio data conforms to the listed specifications. Due to the streaming approach that Alexa uses, there is no benefit to using HLS streams instead of statically served MP3 files. Furthermore, unlike with statically served MP3 files, an SSML response that contains an HLS stream that violates the 240-second duration limit will fail silently. This silent failure means that the playback is stopped before the limit is hit, no error message is generated on the customer device, and the skill does not receive an error request. If your skill uses SSML responses that contain HLS streams, ensure that you take particular care to test the audio returned in its responses.
break
Represents a pause in the speech. Set the length of the pause with the strength or time attributes.
Attribute | Possible Values |
---|---|
strength | The strength of the pause: none, x-weak, weak, medium, strong, or x-strong. |
time | Duration of the pause; up to 10 seconds. |
The default is medium. This is used if you don't specify any attributes, or if you provide any unsupported attribute values.
<speak>
There is a three second pause here <break time="3s"/>
then the speech continues.
</speak>
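When you generate pauses in code, a small helper can enforce the 10-second limit. This Python sketch is illustrative only: the function name break_tag is hypothetical, and it accepts whole seconds to keep the example simple.

```python
MAX_BREAK_SECONDS = 10  # upper bound stated in this reference


def break_tag(seconds: int) -> str:
    """Return a <break> tag that pauses for a whole number of seconds."""
    if not 0 <= seconds <= MAX_BREAK_SECONDS:
        raise ValueError(f"pause must be between 0 and {MAX_BREAK_SECONDS} seconds")
    return f'<break time="{seconds}s"/>'


ssml = (
    "<speak>There is a three second pause here "
    f"{break_tag(3)} then the speech continues.</speak>"
)
print(ssml)
```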
emphasis
Emphasize the tagged words or phrases. Emphasis changes rate and volume of the speech. More emphasis is spoken louder and slower. Less emphasis is quieter and faster.
Attribute | Possible Values |
---|---|
level | The amount of emphasis to apply: strong, moderate, or reduced. |
<speak>
I already told you I
<emphasis level="strong">really like</emphasis>
that person.
</speak>
You can combine emphasis with all other tags, except for those listed in incompatible tags.
When you use the emphasis tag, Alexa uses a legacy text-to-speech system, which might change the speech sound quality.
lang
Use lang to specify the language model and rules to speak the tagged content as if it were written in the language specified by the xml:lang attribute. Words and phrases in other languages usually sound better when enclosed with the lang tag. This is useful for short phrases in other languages, such as the names of restaurants or shops.
For example, here is how to pronounce "Paris" using the language code fr-FR (which refers to the French language as spoken in France).
<speak>
In Paris, they pronounce it <lang xml:lang="fr-FR">Paris</lang>
</speak>
Alexa adapts the pronunciation to use the sounds available in the original language of the skill, so it may not sound exactly like a native speaker. To achieve a more natural voice than what you get with the lang tag alone, use the lang tag and the voice tag together. With the voice tag, you can select a voice customized for a specific language. Ensure that the language of the tagged text matches the xml:lang attribute, and that the selected voice also matches the language of the text.
For example, consider the French phrase "J'adore chanter" in an English (US) skill, using the lang tag without the voice tag. Alexa speaks the phrase with English-like pronunciation.
With the lang tag set to French, Alexa uses French pronunciation with sounds available in English for a "French-like" pronunciation. A perfect French pronunciation would include a uvular trill (/R/) in the word "adore."
Supported locales for the xml:lang attribute
The following locales are supported:
de-DE
en-AU
en-CA
en-GB
en-IN
en-US
es-ES
es-MX
es-US
fr-CA
fr-FR
hi-IN
it-IT
ja-JP
pt-BR
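If you generate lang tags in code, you can validate the locale against the list above before building the tag. This Python sketch is an illustration; the function name lang_tag is hypothetical.

```python
from xml.sax.saxutils import escape

# Locales copied from the list above.
SUPPORTED_LOCALES = {
    "de-DE", "en-AU", "en-CA", "en-GB", "en-IN", "en-US", "es-ES", "es-MX",
    "es-US", "fr-CA", "fr-FR", "hi-IN", "it-IT", "ja-JP", "pt-BR",
}


def lang_tag(text: str, locale: str) -> str:
    """Wrap text in a <lang> tag after checking the locale."""
    if locale not in SUPPORTED_LOCALES:
        raise ValueError(f"unsupported locale: {locale}")
    # escape() protects characters such as & and < in the spoken text.
    return f'<lang xml:lang="{locale}">{escape(text)}</lang>'


print(lang_tag("Paris", "fr-FR"))  # <lang xml:lang="fr-FR">Paris</lang>
```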
p
Represents a paragraph. This tag provides extra-strong breaks before and after the tag. This is equivalent to specifying a pause with <break strength="x-strong"/>.
<speak>
<p>This is the first paragraph. There should be a pause after this text is spoken.</p>
<p>This is the second paragraph.</p>
</speak>
phoneme
Provides a phonemic/phonetic pronunciation for the contained text. For example, people may pronounce words like "pecan" differently.
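Attribute | Possible Values |
---|---|
alphabet | Set to the phonetic alphabet to use: ipa (the International Phonetic Alphabet) or x-sampa (the Extended Speech Assessment Methods Phonetic Alphabet). |
ph | The phonetic pronunciation to speak. See below for a list of supported symbols in each of the supported skill languages. |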
When using this tag, Alexa uses the pronunciation provided in the ph attribute rather than the text contained within the tag. However, you should still provide human-readable text within the tags. In the following example, the word "pecan" shown within the tags is never spoken. Instead, Alexa speaks the text provided in the ph attribute:
<speak>
You say, <phoneme alphabet="ipa" ph="pɪˈkɑːn">pecan</phoneme>.
I say, <phoneme alphabet="ipa" ph="ˈpi.kæn">pecan</phoneme>.
</speak>
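X-SAMPA transcriptions use the double quotation mark as the primary stress symbol, which complicates attribute quoting when you build phoneme tags in code. The following Python sketch shows one way to handle this with the standard library's quoteattr function, which switches to single quotation marks when the value contains a double one. The function name phoneme_tag is hypothetical, and the alphabet values ipa and x-sampa are assumed to be the accepted spellings.

```python
from xml.sax.saxutils import escape, quoteattr


def phoneme_tag(text: str, ph: str, alphabet: str = "ipa") -> str:
    """Return a <phoneme> tag; quoteattr picks quotes that suit the value."""
    if alphabet not in ("ipa", "x-sampa"):
        raise ValueError("alphabet must be 'ipa' or 'x-sampa'")
    return (
        f"<phoneme alphabet={quoteattr(alphabet)} ph={quoteattr(ph)}>"
        f"{escape(text)}</phoneme>"
    )


print(phoneme_tag("pecan", "pɪˈkɑːn"))
# <phoneme alphabet="ipa" ph="pɪˈkɑːn">pecan</phoneme>
print(phoneme_tag("bottle", '"bA.t@l', alphabet="x-sampa"))
# <phoneme alphabet="x-sampa" ph='"bA.t@l'>bottle</phoneme>
```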
Additional examples of writing words with a phonetic alphabet:
Word | IPA | X-SAMPA |
---|---|---|
bottle | ˈbɑ.təl | "bA.t@l |
frozen | ˈfɹoʊ.zən | "fr\oU.z@n |
blossom | ˈblɑ.səm | "blA.s@m |
Supported symbols
The following tables list the supported symbols for use with the phoneme tag. The symbols are specific to the skill's language.
These symbols provide full coverage for the sounds of Arabic (SA). Other languages require the use of symbols not included in this list, which are not supported. Using symbols not included in this list for Arabic (SA) skills is discouraged, as it may result in suboptimal speech synthesis.
X-SAMPA | IPA | Description | Example | Pronunciation |
---|---|---|---|---|
? | ʔ | glottal stop | أَنَا | / " ʔ a n a: / |
b | b | voiced bilabial plosive | بَلَد | / " b a l a d / |
t | t | voiceless alveolar plosive | تَمَر | / " t a m a r / |
T | θ | voiceless interdental fricative | ثَلَاثَة | / T a " l a: T a / |
dZ | ʤ | voiced postalveolar affricate | جَمِيل | / dZ a " m i: l / |
X\ | ħ | voiceless pharyngeal fricative | حَيَوَان | / X\ a j a " w a: n / |
x | x | voiceless velar fricative | خَرُوف | / x a " r u: f / |
d | d | voiced alveolar plosive | دَرْدَار | / d a r " d a: r / |
D | ð | voiced interdental fricative | ذَلِك | / " D a: l i k a / |
r | r | alveolar trill | رَمْل | / " r a m l / |
z | z | voiced alveolar fricative | زُهُور | / z u " h u: r / |
s | s | voiceless alveolar fricative | سَمْسَم | / " s a m s a m / |
S | ʃ | voiceless postalveolar fricative | شَمْس | / " S a m s / |
s_?\ | sˤ | pharyngealised voiceless alveolar fricative | صَاحِب | / " s_?\ A_?: X\ i b / |
d_?\ | dˤ | pharyngealised voiced alveolar plosive | ضَوْء | / " d_?\ a w ? / |
t_?\ | tˤ | pharyngealised voiceless alveolar plosive | طَالِب | / " t_?\ A_?: l i b / |
D_?\ | ðˤ | pharyngealised voiced interdental fricative | ظَلَام | / " D_?\ A_?\ l a: m / |
?\ | ʕ | voiced pharyngeal fricative | عُمْر | / " ?\ u m r / |
G | ɣ | voiced velar fricative | غَرْب | / " G a r b / |
f | f | voiceless labiodental fricative | فَصْل | / " f A_?\ s_?\ l / |
q | q | voiceless uvular plosive | قَصْر | / " q A_?\ s_?\ r / |
k | k | voiceless velar plosive | كَامِل | / " k a: m i l / |
l | l | voiced alveolar lateral approximant | لَيْل | / " l a j l / |
l_G | lˤ | pharyngealised voiced alveolar lateral approximant | والله | / w A_?\ " l_G l_G A_?: h / |
m | m | bilabial nasal stop | مَصْر | / " m A_?\ s_?\ r / |
n | n | alveolar nasal stop | نُور | / " n u: r / |
h | ɦ | voiced glottal fricative | هِلَال | / h i " l a: l / |
w | w | voiced labiovelar approximant | وَلَد | / " w a l a d / |
j | j | voiced palatal approximant | يُسْر | / " j u s r / |
g | g | voiced velar plosive | إِنْجِلِتْرَا | / ? i N " g l i t r a: / |
v | v | voiced labiodental fricative | فِيتَامِين | / v i: t A " m i: n / |
p | p | voiceless bilabial plosive | أُوبِرَا | / " ? O p e r a: / |
N | ŋ | velar nasal stop | ْهُونْغْ كُونْغ | / h O N " k O N g / |
Z | ʒ | voiced postalveolar fricative | جاكيت | / Z a " k e: t / |
a | æ | mid-open front unrounded short vowel | لَوْن | / " l a w n / |
A_?\ | ɑˤ | pharyngealised open back unrounded short vowel | صَلْب | / " s_?\ A_?\ l b / |
a: | æː | mid-open front unrounded long vowel | بَاب | / " b a: b / |
A_?: | ɑˤː | pharyngealised open back unrounded long vowel | نَاضِج | / " n A_?: d_?\ i_?\ dZ / |
u | u | close back rounded short vowel | شُرْب | / " S u r b / |
u_?\ | uˤ | pharyngealised close back rounded short vowel | عُصْفُور | / ?\ u_?\ s_?\ " f u: r / |
u: | uː | close back rounded long vowel | تُوت | / " t u: t / |
u_?: | uˤː | pharyngealised close back rounded long vowel | صُور | / " s_?\ u_?: r / |
i | i | close front unrounded short vowel | بِنْت | / " b i n t / |
i_?\ | iˤ | pharyngealised close front unrounded short vowel | طِفْل | / " t_?\ i_?\ f l / |
i: | iː | close front unrounded long vowel | سَبِيل | / s a " b i: l / |
i_?: | iˤː | pharyngealised close front unrounded long vowel | رَطِيب | / r A_?\ " t_?\ i_?: b / |
A | a | open central unrounded short vowel | wifi | / " w A j f A j / |
O | ɔ | open-mid back rounded short vowel | دُولَار | / d O " l A r / |
O: | ɔː | open-mid back rounded long vowel | تِلْفِزِيُون | / t i l f i z " j O: n / |
e | e | mid front unrounded short vowel | إِنْتَرْنِت | / ? e n t a r " n a: t / |
e: | eː | mid front unrounded long vowel | سِكْرِتِير | / s i k r i " t e: r / |
These symbols provide full coverage for the sounds of English (AU). Other languages require the use of symbols not included in this list, which are not supported. Using symbols not included in this list for English (AU) skills is discouraged, as it may result in suboptimal speech synthesis.
IPA | X-SAMPA | Description | Examples |
---|---|---|---|
b | b | voiced bilabial plosive | bed |
d | d | voiced alveolar plosive | dig |
d͡ʒ | dZ | voiced postalveolar affricate | jump |
ð | D | voiced dental fricative | then |
f | f | voiceless labiodental fricative | five |
g | g | voiced velar plosive | game |
h | h | voiceless glottal fricative | house |
j | j | palatal approximant | yes |
k | k | voiceless velar plosive | cat |
l | l | alveolar lateral approximant | lay |
m | m | bilabial nasal | mouse |
n | n | alveolar nasal | nap |
ŋ | N | velar nasal | thing |
p | p | voiceless bilabial plosive | speak |
ɹ | r\ | alveolar approximant | red |
s | s | voiceless alveolar fricative | seem |
ʃ | S | voiceless postalveolar fricative | ship |
t | t | voiceless alveolar plosive | trap |
t͡ʃ | tS | voiceless postalveolar affricate | chart |
θ | T | voiceless dental fricative | thin |
v | v | voiced labiodental fricative | vest |
w | w | labial-velar approximant | west |
z | z | voiced alveolar fricative | zero |
ʒ | Z | voiced postalveolar fricative | vision |
IPA | X-SAMPA | Description | Examples |
---|---|---|---|
ə | @ | mid central vowel | arena |
æ | { | near-open front unrounded vowel | trap |
aɪ | aI | diphthong | price |
aʊ | aU | diphthong | mouth |
ɑ | A | long open back unrounded vowel | father |
eɪ | eI | diphthong | face |
ɜ | 3 | open-mid central unrounded vowel | nurse |
ɛ | E | open-mid front unrounded vowel | dress |
i | i | long close front unrounded vowel | fleece |
ɪ | I | near-close near-front unrounded vowel | kit |
əʊ | @U | diphthong | goat |
ɔ | O | long open-mid back rounded vowel | thought |
ɔɪ | OI | diphthong | choice |
u | u | long close back rounded vowel | goose |
ʊ | U | near-close near-back rounded vowel | foot |
ʌ | V | open-mid back unrounded vowel | strut |
ɒ | Q | open back rounded vowel | bother |
ɛə | E@ | diphthong | bear |
ɪə | I@ | diphthong | beer |
ʊə | U@ | diphthong | tour |
IPA | X-SAMPA | Description | Examples |
---|---|---|---|
ˈ | " | primary stress | Alabama |
ˌ | % | secondary stress | Alabama |
. | . | syllable boundary | A.la.ba.ma |
These symbols provide full coverage for the sounds of English (Canada). Other languages require the use of symbols not included in this list, which are not supported. Using symbols not included in this list for English (Canada) skills is discouraged, as it may result in suboptimal speech synthesis.
IPA | X-SAMPA | Description | Examples |
---|---|---|---|
b | b | voiced bilabial plosive | bed |
d | d | voiced alveolar plosive | dig |
d͡ʒ | dZ | voiced postalveolar affricate | jump |
ð | D | voiced dental fricative | then |
f | f | voiceless labiodental fricative | five |
g | g | voiced velar plosive | game |
h | h | voiceless glottal fricative | house |
j | j | palatal approximant | yes |
k | k | voiceless velar plosive | cat |
l | l | alveolar lateral approximant | lay |
m | m | bilabial nasal | mouse |
n | n | alveolar nasal | nap |
ŋ | N | velar nasal | thing |
p | p | voiceless bilabial plosive | speak |
ɹ | r\ | alveolar approximant | red |
s | s | voiceless alveolar fricative | seem |
ʃ | S | voiceless postalveolar fricative | ship |
t | t | voiceless alveolar plosive | trap |
t͡ʃ | tS | voiceless postalveolar affricate | chart |
θ | T | voiceless dental fricative | thin |
v | v | voiced labiodental fricative | vest |
w | w | labial-velar approximant | west |
z | z | voiced alveolar fricative | zero |
ʒ | Z | voiced postalveolar fricative | vision |
IPA | X-SAMPA | Description | Examples |
---|---|---|---|
ə | @ | mid central vowel | arena |
ɚ | @` | mid central r-colored vowel | reader |
æ | { | near-open front unrounded vowel | trap |
aɪ | aI | diphthong | price |
aʊ | aU | diphthong | mouth |
ɑ | A | long open back unrounded vowel | father |
eɪ | eI | diphthong | face |
ɝ | 3` | open-mid central unrounded r-colored vowel | nurse |
ɛ | E | open-mid front unrounded vowel | dress |
i | i | long close front unrounded vowel | fleece |
ɪ | I | near-close near-front unrounded vowel | kit |
oʊ | oU | diphthong | goat |
ɔ | O | long open-mid back rounded vowel | thought |
ɔɪ | OI | diphthong | choice |
u | u | long close back rounded vowel | goose |
ʊ | U | near-close near-back rounded vowel | foot |
ʌ | V | open-mid back unrounded vowel | strut |
IPA | X-SAMPA | Description | Examples |
---|---|---|---|
ˈ | " | primary stress | Alabama |
ˌ | % | secondary stress | Alabama |
. | . | syllable boundary | A.la.ba.ma |
These symbols provide full coverage for the sounds of English (India). Other languages require the use of symbols not included in this list, which are not supported. Using symbols not included in this list for English (India) skills is discouraged, as it may result in suboptimal speech synthesis.
IPA | X-SAMPA | Description | Examples |
---|---|---|---|
b | b | voiced bilabial plosive | bed |
d | d | voiced alveolar plosive | dig |
d͡ʒ | dZ | voiced postalveolar affricate | jump |
ð | D | voiced dental fricative | then |
f | f | voiceless labiodental fricative | five |
g | g | voiced velar plosive | game |
h | h | voiceless glottal fricative | house |
j | j | palatal approximant | yes |
k | k | voiceless velar plosive | cat |
l | l | alveolar lateral approximant | lay |
m | m | bilabial nasal | mouse |
n | n | alveolar nasal | nap |
ŋ | N | velar nasal | thing |
p | p | voiceless bilabial plosive | speak |
ɹ | r\ | alveolar approximant | red |
s | s | voiceless alveolar fricative | seem |
ʃ | S | voiceless postalveolar fricative | ship |
t | t | voiceless alveolar plosive | trap |
t͡ʃ | tS | voiceless postalveolar affricate | chart |
θ | T | voiceless dental fricative | thin |
v | v | voiced labiodental fricative | vest |
w | w | labial-velar approximant | west |
z | z | voiced alveolar fricative | zero |
ʒ | Z | voiced postalveolar fricative | vision |
IPA | X-SAMPA | Description | Examples |
---|---|---|---|
ə | @ | mid central vowel | arena |
æ | { | near-open front unrounded vowel | trap |
aɪ | aI | diphthong | price |
aʊ | aU | diphthong | mouth |
ɑ | A | long open back unrounded vowel | father |
eɪ | eI | diphthong | face |
ɜ | 3 | open-mid central unrounded vowel | nurse |
ɛ | E | open-mid front unrounded vowel | dress |
i | i | long close front unrounded vowel | fleece |
ɪ | I | near-close near-front unrounded vowel | kit |
əʊ | @U | diphthong | goat |
ɔ | O | long open-mid back rounded vowel | thought |
ɔɪ | OI | diphthong | choice |
u | u | long close back rounded vowel | goose |
ʊ | U | near-close near-back rounded vowel | foot |
ʌ | V | open-mid back unrounded vowel | strut |
ɒ | Q | open back rounded vowel | bother |
ɛə | E@ | diphthong | bear |
ɪə | I@ | diphthong | beer |
ʊə | U@ | diphthong | tour |
IPA | X-SAMPA | Description | Examples |
---|---|---|---|
ˈ | " | primary stress | Alabama |
ˌ | % | secondary stress | Alabama |
. | . | syllable boundary | A.la.ba.ma |
IPA | X-SAMPA | Description | Examples |
---|---|---|---|
pʰ | p_h | voiceless aspirated bilabial plosive | फूल (phool) |
bʱ | b_h | voiced aspirated bilabial plosive | भारी (bhaari) |
t̪ | t_d | voiceless dental plosive | तापमान (taapmaan) |
t̪ʰ | t_d_h | voiceless aspirated dental plosive | थोड़ा (thoda) |
d̪ | d_d | voiced dental plosive | दिल्ली (dilli) |
d̪ʱ | d_d_h | voiced aspirated dental plosive | धोबी (dhobi) |
ʈ | t` | voiceless retroflex plosive | कटोरा (katora) |
ʈʰ | t`_h | voiceless aspirated retroflex plosive | ठंड (thand) |
ɖ | d` | voiced retroflex plosive | डर (darr) |
ɖʱ | d`_h | voiced aspirated retroflex plosive | ढाल (dhal) |
tʃʰ | tS_h | voiceless aspirated palatal affricate | छाल (chaal) |
dʒʱ | dZ_h | voiced aspirated palatal affricate | झाल (jhaal) |
kʰ | k_h | voiceless aspirated velar plosive | खान (khan) |
ɡʱ | g_h | voiced aspirated velar plosive | घान (ghaan) |
ɳ | n` | retroflex nasal | क्षण (kshan) |
ɾ | 4 | alveolar flap | राम (ram) |
ɽ | r` | plain retroflex flap | बड़ा (bada) |
ɽʱ | r`_h | voiced aspirated retroflex flap | बढ़ी (barhi) |
ʋ | v\ | bilabial approximant | वसूल (wasool) |
IPA | X-SAMPA | Description | Examples |
---|---|---|---|
ə | @_o | mid central vowel | अच्छा (achhaa) |
ə̃ | @~ | nasalised mid central vowel | हँसना (hansnaa) |
a | A_o | open front unrounded vowel | आग (aag) |
ã | A~ | nasalised open front unrounded vowel | घड़ियाँ (ghariyaan) |
ɪ | I_o | near-close near-front unrounded vowel | इक्कीस (ikkees) |
ɪ̃ | I~ | nasalised near-close near front unrounded vowel | सिंचाई (sinchai) |
i | i_o | close front unrounded vowel | बिल्ली (billee) |
ĩ | i~ | nasalised close front unrounded vowel | नहीं (nahin) |
ʊ | U_o | near-close near-back rounded vowel | उल्लू (ullu) |
ʊ̃ | U~ | nasalised near-close near-back rounded vowel | मुँह (munh) |
u | u_o | close back rounded vowel | फूल (phool) |
ũ | u~ | nasalised close back rounded vowel | ऊँट (oont) |
ɔ | O_o | open-mid back rounded vowel | कौन (kaun) |
ɔ̃ | O~ | nasalised open-mid back rounded vowel | भौं (bhaun) |
o | o | close-mid back rounded vowel | सोना (sona) |
õ | o~ | nasalised close-mid back rounded vowel | क्यों (kyon) |
ɛ | E_o | open-mid front unrounded vowel | पैसा (paisa) |
ɛ̃ | E~ | nasalised open-mid front unrounded vowel | मैं (main) |
e | e | close-mid front unrounded vowel | एक (ek) |
ẽ | e~ | nasalised close-mid front unrounded vowel | किताबें (kitabein) |
These symbols provide full coverage for the sounds of English (UK). Other languages require the use of symbols not included in this list, which are not supported. Using symbols not included in this list for English (UK) skills is discouraged, as it may result in suboptimal speech synthesis.
IPA | X-SAMPA | Description | Examples |
---|---|---|---|
b | b | voiced bilabial plosive | bed |
d | d | voiced alveolar plosive | dig |
d͡ʒ | dZ | voiced postalveolar affricate | jump |
ð | D | voiced dental fricative | then |
f | f | voiceless labiodental fricative | five |
g | g | voiced velar plosive | game |
h | h | voiceless glottal fricative | house |
j | j | palatal approximant | yes |
k | k | voiceless velar plosive | cat |
l | l | alveolar lateral approximant | lay |
m | m | bilabial nasal | mouse |
n | n | alveolar nasal | nap |
ŋ | N | velar nasal | thing |
p | p | voiceless bilabial plosive | speak |
ɹ | r\ | alveolar approximant | red |
s | s | voiceless alveolar fricative | seem |
ʃ | S | voiceless postalveolar fricative | ship |
t | t | voiceless alveolar plosive | trap |
t͡ʃ | tS | voiceless postalveolar affricate | chart |
θ | T | voiceless dental fricative | thin |
v | v | voiced labiodental fricative | vest |
w | w | labial-velar approximant | west |
z | z | voiced alveolar fricative | zero |
ʒ | Z | voiced postalveolar fricative | vision |
IPA | X-SAMPA | Description | Examples |
---|---|---|---|
ə | @ | mid central vowel | arena |
æ | { | near-open front unrounded vowel | trap |
aɪ | aI | diphthong | price |
aʊ | aU | diphthong | mouth |
ɑ | A | long open back unrounded vowel | father |
eɪ | eI | diphthong | face |
ɜ | 3 | open-mid central unrounded vowel | nurse |
ɛ | E | open-mid front unrounded vowel | dress |
i | i | long close front unrounded vowel | fleece |
ɪ | I | near-close near-front unrounded vowel | kit |
əʊ | @U | diphthong | goat |
ɔ | O | long open-mid back rounded vowel | thought |
ɔɪ | OI | diphthong | choice |
u | u | long close back rounded vowel | goose |
ʊ | U | near-close near-back rounded vowel | foot |
ʌ | V | open-mid back unrounded vowel | strut |
ɒ | Q | open back rounded vowel | bother |
ɛə | E@ | diphthong | bear |
ɪə | I@ | diphthong | beer |
ʊə | U@ | diphthong | tour |
IPA | X-SAMPA | Description | Examples |
---|---|---|---|
ˈ | " | primary stress | Alabama |
ˌ | % | secondary stress | Alabama |
. | . | syllable boundary | A.la.ba.ma |
These symbols provide full coverage for the sounds of English (US). Other languages require the use of symbols not included in this list, which are not supported. Using symbols not included in this list for English (US) skills is discouraged, as it may result in suboptimal speech synthesis.
IPA | X-SAMPA | Description | Examples |
---|---|---|---|
b | b | voiced bilabial plosive | bed |
d | d | voiced alveolar plosive | dig |
d͡ʒ | dZ | voiced postalveolar affricate | jump |
ð | D | voiced dental fricative | then |
f | f | voiceless labiodental fricative | five |
g | g | voiced velar plosive | game |
h | h | voiceless glottal fricative | house |
j | j | palatal approximant | yes |
k | k | voiceless velar plosive | cat |
l | l | alveolar lateral approximant | lay |
m | m | bilabial nasal | mouse |
n | n | alveolar nasal | nap |
ŋ | N | velar nasal | thing |
p | p | voiceless bilabial plosive | speak |
ɹ | r\ | alveolar approximant | red |
s | s | voiceless alveolar fricative | seem |
ʃ | S | voiceless postalveolar fricative | ship |
t | t | voiceless alveolar plosive | trap |
t͡ʃ | tS | voiceless postalveolar affricate | chart |
θ | T | voiceless dental fricative | thin |
v | v | voiced labiodental fricative | vest |
w | w | labial-velar approximant | west |
z | z | voiced alveolar fricative | zero |
ʒ | Z | voiced postalveolar fricative | vision |
IPA | X-SAMPA | Description | Examples |
---|---|---|---|
ə | @ | mid central vowel | arena |
ɚ | @` | mid central r-colored vowel | reader |
æ | { | near-open front unrounded vowel | trap |
aɪ | aI | diphthong | price |
aʊ | aU | diphthong | mouth |
ɑ | A | long open back unrounded vowel | father |
eɪ | eI | diphthong | face |
ɝ | 3` | open-mid central unrounded r-colored vowel | nurse |
ɛ | E | open-mid front unrounded vowel | dress |
i | i | long close front unrounded vowel | fleece |
ɪ | I | near-close near-front unrounded vowel | kit |
oʊ | oU | diphthong | goat |
ɔ | O | long open-mid back rounded vowel | thought |
ɔɪ | OI | diphthong | choice |
u | u | long close back rounded vowel | goose |
ʊ | U | near-close near-back rounded vowel | foot |
ʌ | V | open-mid back unrounded vowel | strut |
IPA | X-SAMPA | Description | Examples |
---|---|---|---|
ˈ | " | primary stress | Alabama |
ˌ | % | secondary stress | Alabama |
. | . | syllable boundary | A.la.ba.ma |
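The symbols in these tables are meant to be passed to a phoneme tag. One practical detail is that the X-SAMPA primary stress mark is a double quote, so it has to be escaped when it is placed inside an XML attribute. The following Python sketch shows one way to do this. It assumes the phoneme tag takes alphabet and ph attributes with the alphabet values ipa and x-sampa; the helper name phoneme and the Alabama transcription are illustrative only and are not part of any SDK.

```python
from xml.sax.saxutils import escape


def phoneme(text, ph, alphabet="ipa"):
    """Wrap text in a phoneme tag (hypothetical helper, not an SDK function)."""
    if alphabet not in ("ipa", "x-sampa"):
        raise ValueError(f"unsupported alphabet: {alphabet}")
    # The X-SAMPA primary stress mark is a double quote, so escape it
    # before placing the transcription inside a double-quoted attribute.
    ph_attr = escape(ph, {'"': "&quot;"})
    return f'<phoneme alphabet="{alphabet}" ph="{ph_attr}">{escape(text)}</phoneme>'


# Illustrative transcription built from the English (US) symbols above.
tag = phoneme("Alabama", '%{l.@."b{m.@', alphabet="x-sampa")
```

Escaping the stress mark as an entity keeps the attribute well-formed regardless of which quoting style the surrounding response uses.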
These symbols provide full coverage for the sounds of French (CA). Other languages require the use of symbols not included in this list, which are not supported. Using symbols not included in this list for French (CA) skills is discouraged, as it may result in suboptimal speech synthesis.
IPA | X-SAMPA | Description | Examples |
---|---|---|---|
b | b | voiced bilabial plosive | bon |
d | d | voiced alveolar plosive | deux |
f | f | voiceless labiodental fricative | faire |
ɡ | g | voiced velar plosive | garçon |
ɥ | H | labial-palatal approximant | huit |
j | j | palatal approximant | travail |
k | k | voiceless velar plosive | corps |
l | l | alveolar lateral approximant | laisser |
m | m | bilabial nasal | même |
n | n | alveolar nasal | nous |
ɲ | J | palatal nasal | gagner |
ŋ | N | velar nasal | camping |
p | p | voiceless bilabial plosive | père |
ʁ | R | voiced uvular fricative | regarder |
s | s | voiceless alveolar fricative | sans |
ʃ | S | voiceless postalveolar fricative | chance |
t | t | voiceless alveolar plosive | tout |
tʃ | tS | voiceless postalveolar affricate | ciao |
dʒ | dZ | voiced postalveolar affricate | Djakarta |
v | v | voiced labiodental fricative | vous |
w | w | labial-velar approximant | oui |
z | z | voiced alveolar fricative | zéro |
ʒ | Z | voiced postalveolar fricative | jamais |
IPA | X-SAMPA | Description | Examples |
---|---|---|---|
i | i | close front unrounded vowel | si |
y | y | close front rounded vowel | sûr |
ɪ | I | near-close front unrounded vowel | pipe |
ʏ | Y | near-close front rounded vowel | lutte |
e | e | close-mid front unrounded vowel | clé |
ø | 2 | close-mid front rounded vowel | ceux |
ɛ | E | open-mid front unrounded vowel | mettre |
ɛː | E: | long open-mid front unrounded vowel | maître |
œ | 9 | open-mid front rounded vowel | sœur |
a | a | open front unrounded vowel | patte |
ə | @ | mid central vowel | le |
u | u | close back rounded vowel | roue |
ʊ | U | near-close back rounded vowel | coupe |
o | o | close-mid back rounded vowel | bureau |
ɔ | O | open-mid back rounded vowel | minimum |
ɑ | A | open back unrounded vowel | châle |
IPA | X-SAMPA | Description | Examples |
---|---|---|---|
ɑ̃ | A~ | nasalized open back unrounded vowel | champ |
ɛ̃ | E~ | nasalized open-mid front unrounded vowel | pain |
œ̃ | 9~ | nasalized open-mid front rounded vowel | parfum |
ɔ̃ | O~ | nasalized open-mid back rounded vowel | nom |
IPA | X-SAMPA | Description | Examples |
---|---|---|---|
ɚ | @` | mid central r-colored vowel | reader |
æ | { | near-open front unrounded vowel | trap |
ʌ | V | open-mid back unrounded vowel | bus |
m̩ | m= | syllabic bilabial nasal | rhythm |
n̩ | n= | syllabic alveolar nasal | griffon |
pʰ | p_h | aspirated voiceless bilabial plosive | power |
tʰ | t_h | aspirated voiceless alveolar plosive | torn |
kʰ | k_h | aspirated voiceless velar plosive | cage |
θ | T | voiceless dental fricative | cloth |
ð | D | voiced dental fricative | this |
h | h | voiceless glottal fricative | hello |
ɹ | r\ | alveolar approximant | rice |
ɫ | l_e | velarized alveolar lateral approximant | feel |
These symbols provide full coverage for the sounds of French (FR). Other languages require the use of symbols not included in this list, which are not supported. Using symbols not included in this list for French (FR) skills is discouraged, as it may result in suboptimal speech synthesis.
IPA | X-SAMPA | Description | Examples |
---|---|---|---|
b | b | voiced bilabial plosive | bon |
d | d | voiced alveolar plosive | deux |
f | f | voiceless labiodental fricative | faire |
ɡ | g | voiced velar plosive | garçon |
ɥ | H | labial-palatal approximant | huit |
j | j | palatal approximant | travail |
k | k | voiceless velar plosive | corps |
l | l | alveolar lateral approximant | laisser |
m | m | bilabial nasal | même |
n | n | alveolar nasal | nous |
ɲ | J | palatal nasal | gagner |
ŋ | N | velar nasal | camping |
p | p | voiceless bilabial plosive | père |
ʁ | R | voiced uvular fricative | regarder |
s | s | voiceless alveolar fricative | sans |
ʃ | S | voiceless postalveolar fricative | chance |
t | t | voiceless alveolar plosive | tout |
tʃ | tS | voiceless postalveolar affricate | ciao |
dʒ | dZ | voiced postalveolar affricate | Djakarta |
v | v | voiced labiodental fricative | vous |
w | w | labial-velar approximant | oui |
z | z | voiced alveolar fricative | zéro |
ʒ | Z | voiced postalveolar fricative | jamais |
IPA | X-SAMPA | Description | Examples |
---|---|---|---|
a | a | open front unrounded vowel | patte |
e | e | close-mid front unrounded vowel | clé |
ɛ | E | open-mid front unrounded vowel | faite |
ə | @ | mid central vowel | le |
i | i | close front unrounded vowel | si |
œ | 9 | open-mid front rounded vowel | sœur |
ø | 2 | close-mid front rounded vowel | ceux |
o | o | close-mid back rounded vowel | bureau |
ɔ | O | open-mid back rounded vowel | minimum |
u | u | close back rounded vowel | roue |
y | y | close front rounded vowel | sûr |
IPA | X-SAMPA | Description | Examples |
---|---|---|---|
ɑ̃ | A~ | nasalized open back unrounded vowel | champ |
ɛ̃ | E~ | nasalized open-mid front unrounded vowel | pain |
œ̃ | 9~ | nasalized open-mid front rounded vowel | parfum |
ɔ̃ | O~ | nasalized open-mid back rounded vowel | nom |
IPA | X-SAMPA | Description | Examples |
---|---|---|---|
ð | D | voiced dental fricative | this |
h | h | voiceless glottal fricative | hello |
ɹ | r\ | alveolar approximant | rice |
θ | T | voiceless dental fricative | cloth |
These symbols provide full coverage for the sounds of German. Other languages require the use of symbols not included in this list, which are not supported. Using symbols not included in this list for German skills is discouraged, as it may result in suboptimal speech synthesis.
IPA | X-SAMPA | Description | Examples |
---|---|---|---|
b | b | voiced bilabial plosive | Bier |
d | d | voiced alveolar plosive | Dach |
ç | C | voiceless palatal fricative | ich |
d͡ʒ | dZ | voiced postalveolar affricate | Dschungel |
f | f | voiceless labiodental fricative | Vogel |
g | g | voiced velar plosive | Gabel |
h | h | voiceless glottal fricative | Haus |
j | j | palatal approximant | jemand |
k | k | voiceless velar plosive | Kleid |
l | l | alveolar lateral approximant | Loch |
m | m | bilabial nasal | Milch |
n | n | alveolar nasal | Natur |
ŋ | N | velar nasal | klingen |
p | p | voiceless bilabial plosive | Park |
p͡f | pf | voiceless labiodental affricate | Apfel |
ʀ | R | uvular trill | Regen |
s | s | voiceless alveolar fricative | Messer |
ʃ | S | voiceless postalveolar fricative | Fischer |
t | t | voiceless alveolar plosive | Topf |
t͡s | ts | voiceless alveolar affricate | Zahl |
t͡ʃ | tS | voiceless postalveolar affricate | deutsch |
v | v | voiced labiodental fricative | Wasser |
x | x | voiceless velar fricative | kochen |
z | z | voiced alveolar fricative | See |
ʒ | Z | voiced postalveolar fricative | Orange |
IPA | X-SAMPA | Description | Examples |
---|---|---|---|
a | a | open front unrounded vowel | Salz |
aː | a: | long open front unrounded vowel | Sahne |
aʊ | aU | diphthong | Augen |
ə | @ | mid central vowel | Rede |
ɐ | 6 | near-open central vowel | besser |
aɪ | aI | diphthong | nein |
ɛ | E | open-mid front unrounded vowel | Kellner |
eː | e: | long close-mid front unrounded vowel | Rede |
øː | 2: | long close-mid front rounded vowel | böse |
ɪ | I | near-close near-front unrounded vowel | bitte |
iː | i: | long close front unrounded vowel | Lied |
ɔ | O | open-mid back rounded vowel | Koffer |
œ | 9 | open-mid front rounded vowel | können |
oː | o: | long close-mid back rounded vowel | Kohl |
ɔʏ | OY | diphthong | neu |
ʊ | U | near-close near-back rounded vowel | Wunder |
ʏ | Y | near-close near-front rounded vowel | Küche |
uː | u: | long close back rounded vowel | Bruder |
yː | y: | long close front rounded vowel | kühl |
IPA | X-SAMPA | Examples |
---|---|---|
aɐ̯ | a6_^ | hart |
aːɐ̯ | a:6_^ | Haar |
ɛɐ̯ | E6_^ | Berg |
eːɐ̯ | e:6_^ | schwer |
øːɐ̯ | 2:6_^ | Nadelöhr |
ɪɐ̯ | I6_^ | Wirtschaft |
iːɐ̯ | i:6_^ | Tier |
ɔɐ̯ | O6_^ | dort |
œɐ̯ | 96_^ | Wörter |
oːɐ̯ | o:6_^ | Ohr |
ʊɐ̯ | U6_^ | Gurke |
ʏɐ̯ | Y6_^ | Türkei |
uːɐ̯ | u:6_^ | Kur |
yːɐ̯ | y:6_^ | Tür |
IPA | X-SAMPA | Description | Examples |
---|---|---|---|
ð | D | voiced dental fricative | brother |
ɹ | r\ | alveolar approximant | ripe |
θ | T | voiceless dental fricative | north |
w | w | labial-velar approximant | well |
ɔː | O: | long open-mid back rounded vowel | callcenter |
eɪ | eI | diphthong | rating |
oʊ | oU | diphthong | windows |
IPA | X-SAMPA | Description | Examples |
---|---|---|---|
ãː | a~: | nasalized long open front unrounded vowel | Croissant |
ɛ̃ː | E~: | nasalized long open-mid front unrounded vowel | Terrain |
õː | o~: | nasalized long close-mid back rounded vowel | Annonce |
IPA | X-SAMPA | Description | Examples |
---|---|---|---|
ˈ | " | primary stress | genau |
. | . | syllable boundary | ver.stan.den |
These symbols provide full coverage for the sounds of Hindi (IN). Other languages require the use of symbols not included in this list, which are not supported. Using symbols not included in this list for Hindi (IN) skills is discouraged, as it may result in suboptimal speech synthesis.
IPA | X-SAMPA | Description | Examples |
---|---|---|---|
p | p | voiceless bilabial plosive | परिंदा |
pʰ | p_h | voiceless aspirated bilabial plosive | फूल (phool) |
b | b | voiced bilabial plosive | बिस्तर |
bʱ | b_h | voiced aspirated bilabial plosive | भारी (bhaari) |
t̪ | t_d | voiceless dental plosive | तापमान (taapmaan) |
t̪ʰ | t_d_h | voiceless aspirated dental plosive | थोड़ा (thoda) |
d̪ | d_d | voiced dental plosive | दिल्ली (dilli) |
d̪ʱ | d_d_h | voiced aspirated dental plosive | धोबी (dhobi) |
ʈ | t` | voiceless retroflex plosive | कटोरा (katora) |
ʈʰ | t`_h | voiceless aspirated retroflex plosive | ठंड (thand) |
ɖ | d` | voiced retroflex plosive | डर (darr) |
ɖʱ | d`_h | voiced aspirated retroflex plosive | ढाल (dhal) |
tʃ | tS | voiceless postalveolar affricate | चार |
tʃʰ | tS_h | voiceless aspirated palatal affricate | छाल (chaal) |
dʒ | dZ | voiced postalveolar affricate | जंगल |
dʒʱ | dZ_h | voiced aspirated palatal affricate | झाल (jhaal) |
k | k | voiceless velar plosive | कमाल |
kʰ | k_h | voiceless aspirated velar plosive | खान (khan) |
g | g | voiced velar plosive | गाँव |
ɡʱ | g_h | voiced aspirated velar plosive | घान (ghaan) |
l | l | alveolar lateral approximant | लम्हा |
m | m | bilabial nasal | मंत्र |
n | n | alveolar nasal | नाग |
ŋ | N | velar nasal | मंगल |
ɳ | n` | retroflex nasal | क्षण (kshan) |
s | s | voiceless alveolar fricative | साल |
z | z | voiced alveolar fricative | ज़रूर |
ʃ | S | voiceless postalveolar fricative | शर्मिंदा |
f | f | voiceless labiodental fricative | फ़ारसी |
ɾ | 4 | alveolar flap | राम (ram) |
ɽ | r` | plain retroflex flap | बड़ा (bada) |
ɽʱ | r`_h | voiced aspirated retroflex flap | बढ़ी (barhi) |
h | h | voiceless glottal fricative | हार |
j | j | palatal approximant | यार |
ʋ | v\ | bilabial approximant | वसूल (wasool) |
IPA | X-SAMPA | Description | Examples |
---|---|---|---|
ə | @ | mid central vowel | अच्छा (achhaa) |
ə̃ | @~ | nasalised mid central vowel | हँसना (hansnaa) |
a | A | open front unrounded vowel | आग (aag) |
ã | A~ | nasalised open front unrounded vowel | घड़ियाँ (ghariyaan) |
ɪ | I | near-close near-front unrounded vowel | इक्कीस (ikkees) |
ɪ̃ | I~ | nasalised near-close near front unrounded vowel | सिंचाई (sinchai) |
i | i | close front unrounded vowel | बिल्ली (billee) |
ĩ | i~ | nasalised close front unrounded vowel | नहीं (nahin) |
ʊ | U | near-close near-back rounded vowel | उल्लू (ullu) |
ʊ̃ | U~ | nasalised near-close near-back rounded vowel | मुँह (munh) |
u | u | close back rounded vowel | फूल (phool) |
ũ | u~ | nasalised close back rounded vowel | ऊँट (oont) |
ɔ | O | open-mid back rounded vowel | कौन (kaun) |
ɔ̃ | O~ | nasalised open-mid back rounded vowel | भौं (bhaun) |
o | o | close-mid back rounded vowel | सोना (sona) |
õ | o~ | nasalised close-mid back rounded vowel | क्यों (kyon) |
ɛ | E | open-mid front unrounded vowel | पैसा (paisa) |
ɛ̃ | E~ | nasalised open-mid front unrounded vowel | मैं (main) |
e | e | close-mid front unrounded vowel | एक (ek) |
ẽ | e~ | nasalised close-mid front unrounded vowel | किताबें (kitabein) |
IPA | X-SAMPA | Description | Examples |
---|---|---|---|
b | b | voiced bilabial plosive | bed |
d | d | voiced alveolar plosive | dig |
d͡ʒ | dZ | voiced postalveolar affricate | jump |
ð | D | voiced dental fricative | then |
f | f | voiceless labiodental fricative | five |
g | g | voiced velar plosive | game |
h | h | voiceless glottal fricative | house |
j | j | palatal approximant | yes |
k | k | voiceless velar plosive | cat |
l | l | alveolar lateral approximant | lay |
m | m | bilabial nasal | mouse |
n | n | alveolar nasal | nap |
ŋ | N | velar nasal | thing |
p | p | voiceless bilabial plosive | speak |
ɹ | r\ | alveolar approximant | red |
s | s | voiceless alveolar fricative | seem |
ʃ | S | voiceless postalveolar fricative | ship |
t | t | voiceless alveolar plosive | trap |
t͡ʃ | tS | voiceless postalveolar affricate | chart |
θ | T | voiceless dental fricative | thin |
v | v | voiced labiodental fricative | vest |
w | w | labial-velar approximant | west |
z | z | voiced alveolar fricative | zero |
ʒ | Z | voiced postalveolar fricative | vision |
IPA | X-SAMPA | Description | Examples |
---|---|---|---|
ə | @_o | mid central vowel | arena |
æ | { | near-open front unrounded vowel | trap |
aɪ | aI | diphthong | price |
aʊ | aU | diphthong | mouth |
ɑ | A_o | long open back unrounded vowel | father |
eɪ | eI | diphthong | face |
ɜ | 3 | open-mid central unrounded vowel | nurse |
ɛ | E_o | open-mid front unrounded vowel | dress |
i | i_o | long close front unrounded vowel | fleece |
ɪ | I_o | near-close near-front unrounded vowel | kit |
əʊ | @U | diphthong | goat |
ɔ | O_o | long open-mid back rounded vowel | thought |
ɔɪ | OI | diphthong | choice |
u | u_o | long close back rounded vowel | goose |
ʊ | U_o | near-close near-back rounded vowel | foot |
ʌ | V | open-mid back unrounded vowel | strut |
ɒ | Q | open back rounded vowel | bother |
ɛə | E@ | diphthong | bear |
ɪə | I@ | diphthong | beer |
ʊə | U@ | diphthong | tour |
IPA | X-SAMPA | Description | Examples |
---|---|---|---|
ˈ | " | primary stress | Alabama |
ˌ | % | secondary stress | Alabama |
. | . | syllable boundary | A.la.ba.ma |
These symbols provide full coverage for the sounds of Italian. Other languages require the use of symbols not included in this list, which are not supported. Using symbols not included in this list for Italian skills is discouraged, as it may result in suboptimal speech synthesis.
X-SAMPA | IPA | Examples |
---|---|---|
b | b | problema |
tS | tʃ | pancia |
d | d | diretto |
dz | dz | benzina |
f | f | difesa |
g | g | erogazione |
j | j | votazione |
dZ | dʒ | legislatura |
k | k | cascata |
l | l | polvere |
L | ʎ | dettaglio |
m | m | settimo |
n | n | comune |
N | ŋ | anche |
J | ɲ | dignità |
p | p | pasta |
r | r | promozione |
s | s | vestito |
S | ʃ | disciplina |
t | t | articolo |
ts | ts | esistenza |
v | v | tuttavia |
w | w | delinquenza |
z | z | musicista |
Z | ʒ | peugeot |
i | i | musica |
e | e | vestito |
E | ɛ | veste |
a | a | mano |
u | u | uva |
o | o | polacco |
O | ɔ | povero |
X-SAMPA | Description | Examples |
---|---|---|
. | syllable boundary | rapido (" r a . p i . d o) |
" | primary stress | certo (" tS E r . t o) |
% | secondary stress | alfabeto (% a l . f a . " b e . t o) |
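Because symbols outside the list may result in suboptimal speech synthesis, it can help to check a transcription before sending it. The following Python sketch copies the Italian X-SAMPA symbols from the table above into a set and reports any character that no listed symbol can match. The function name and the greedy longest-match strategy are assumptions made for illustration; this is not an official validator.

```python
# X-SAMPA symbols copied from the Italian table above.
ITALIAN_XSAMPA = {
    "b", "tS", "d", "dz", "f", "g", "j", "dZ", "k", "l", "L", "m", "n", "N",
    "J", "p", "r", "s", "S", "t", "ts", "v", "w", "z", "Z",
    "i", "e", "E", "a", "u", "o", "O", ".", '"', "%",
}


def unsupported_symbols(ph, supported=ITALIAN_XSAMPA):
    """Return the characters in ph that no supported symbol matches."""
    longest = max(len(symbol) for symbol in supported)
    unknown = []
    i = 0
    while i < len(ph):
        if ph[i].isspace():
            i += 1
            continue
        # Try the longest candidate first so "tS" wins over "t".
        for size in range(longest, 0, -1):
            if ph[i:i + size] in supported:
                i += size
                break
        else:
            unknown.append(ph[i])
            i += 1
    return unknown


problems = unsupported_symbols('"TEr.to')  # "T" is not an Italian symbol
```

The same function can be reused for another locale by passing a different set as the supported argument.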
These symbols provide full coverage for the sounds of Japanese. Other languages require the use of symbols not included in this list, which are not supported. Using symbols not included in this list for Japanese skills is discouraged, as it may result in suboptimal speech synthesis.
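IPA | X-SAMPA | Description | Examples |
---|---|---|---|
b | b | voiced bilabial plosive | ボート(booto) |
d | d | voiced alveolar plosive | 電車(densha) |
g | g | voiced velar plosive | 学校(gakkoo) |
h | h | voiceless glottal fricative | 花火(hanabi) |
j | j | palatal approximant | 夢(yume) |
k | k | voiceless velar plosive | 会社(kaisha) |
m | m | bilabial nasal | メガネ(megane) |
n | n | alveolar nasal | 猫(neko) |
p | p | voiceless bilabial plosive | ピアノ(piano) |
s | s | voiceless alveolar fricative | 寿司(sushi) |
t | t | voiceless alveolar plosive | テレビ(terebi) |
w | w | labial-velar approximant | 話題(wadai) |
z | z | voiced alveolar fricative | 雑貨(zakka) |
ɸ | p\ | voiceless bilabial fricative | 冬(fuyu) |
ç | C | voiceless palatal fricative | ヒント(hinto) |
ɾ | 4 | alveolar flap | 冷蔵庫(reezooko) |
t͡s | ts | voiceless alveolar affricate | 月(tsuki) |
c | c | voiceless palatal plosive | 天気(tenki) |
ɟ | j\ | voiced palatal plosive | 将棋(shoogi) |
ɕ | s\ | voiceless alveolo-palatal fricative | 紹介(shookai) |
d͡ʑ | z\ | voiced alveolo-palatal affricate | ジュース(juusu) |
ɲ | J | palatal nasal | 日本(nihon) |
ɺ | l | alveolar lateral flap | リンゴ(ringo) |
t͡ɕ | ts\ | voiceless alveolo-palatal affricate | 宇宙(uchuu) |
 | Q | geminate consonant (sokuon) | ロボット(robotto) |
ɴ | NN | uvular nasal | パソコン(pasokon) |
IPA | X-SAMPA | Description | Examples |
---|---|---|---|
ä | a | open central unrounded vowel | 窓(mado) |
i | i | close front unrounded vowel | イス(isu) |
ɯ | M | close back unrounded vowel | クジラ(kujira) |
e | e | mid front unrounded vowel | 世界(sekai) |
o | o | mid back rounded vowel | お茶(ocha) |
ä: | a: | long open central unrounded vowel | ギター(gitaa) |
i: | i: | long close front unrounded vowel | チーム(chiimu) |
ɯ: | M: | long close back unrounded vowel | 算数(sansuu) |
e: | e: | long mid front unrounded vowel | ケータイ(keetai) |
o: | o: | long mid back rounded vowel | 飛行機(hikooki) |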
IPA | X-SAMPA | Description | Example |
---|---|---|---|
ɾ | 4 | alveolar flap | pira |
b | b | voiced bilabial plosive | bato |
d | d | voiced alveolar plosive | dato |
d͡ʒ | dZ | voiced postalveolar affricate | idade |
f | f | voiceless labiodental fricative | facto |
g | g | voiced velar plosive | gato |
j | j | palatal approximant | paraguay |
k | k | voiceless velar plosive | cacto |
l | l | alveolar lateral approximant | galo |
ʎ | L | palatal lateral approximant | galho |
m | m | bilabial nasal | mato |
n | n | alveolar nasal | nato |
ɲ | J | palatal nasal | pinha |
p | p | voiceless bilabial plosive | pato |
s | s | voiceless alveolar fricative | saca |
ʃ | S | voiceless postalveolar fricative | chato |
t | t | voiceless alveolar plosive | tacto |
t͡ʃ | tS | voiceless postalveolar affricate | noite |
v | v | voiced labiodental fricative | vaca |
w | w | labial-velar approximant | mau |
χ | X | voiceless uvular fricative | carro |
z | z | voiced alveolar fricative | zaca |
ʒ | Z | voiced postalveolar fricative | jacto |
a | a | open front unrounded vowel | parto |
ã | a~ | nasal open front unrounded vowel | pensamos |
e | e | close-mid front unrounded vowel | pega |
ẽ | e~ | nasal close-mid front unrounded vowel | movem |
ɛ | E | open-mid front unrounded vowel | café |
i | i | close front unrounded vowel | lingueta |
ĩ | i~ | nasal close front unrounded vowel | cinto |
o | o | close-mid back rounded vowel | poder |
õ | o~ | nasal close-mid back rounded vowel | compra |
ɔ | O | open-mid back rounded vowel | cotó |
u | u | close back rounded vowel | fui |
ũ | u~ | nasal close back rounded vowel | sunto |
prosody
Modifies the volume, pitch, and rate of the tagged speech.
Attribute | Possible Values |
---|---|
rate | Modify the rate of the speech. |
pitch | Raise or lower the tone (pitch) of the speech. Note: When you modify the speech with the pitch attribute, Alexa uses a legacy text-to-speech system, which might change the speech sound quality. |
volume | Change the volume for the speech. |
<speak>
Normal volume for the first sentence.
<prosody volume="x-loud">Louder volume for the second sentence</prosody>.
When I wake up, <prosody rate="x-slow">I speak quite slowly</prosody>.
I can speak with my normal pitch,
<prosody pitch="x-high"> but also with a much higher pitch </prosody>,
and also <prosody pitch="low">with a lower pitch</prosody>.
</speak>
You can combine prosody with all other tags when you set the rate and/or volume attributes. When you use the pitch attribute, you cannot combine prosody with the tags shown in incompatible tags.
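If you assemble SSML in code, a small helper can keep the prosody attributes consistent. The following Python sketch accepts only the three attributes described in this section and emits them in a fixed order. The helper name prosody is hypothetical, and the sketch does not check attribute values or the incompatible tags restriction for pitch.

```python
from xml.sax.saxutils import escape

# The three attributes described in this section.
PROSODY_ATTRIBUTES = ("rate", "pitch", "volume")


def prosody(text, **attrs):
    """Wrap text in a prosody tag (hypothetical helper, not an SDK function)."""
    unknown = set(attrs) - set(PROSODY_ATTRIBUTES)
    if unknown:
        raise ValueError(f"unknown prosody attributes: {sorted(unknown)}")
    # Emit attributes in a fixed order so the output is predictable.
    rendered = "".join(
        f' {name}="{attrs[name]}"' for name in PROSODY_ATTRIBUTES if name in attrs
    )
    return f"<prosody{rendered}>{escape(text)}</prosody>"


slow = prosody("I speak quite slowly", rate="x-slow")
```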
s
Represents a sentence. This tag provides strong breaks before and after the tag.
This is equivalent to:
- Ending a sentence with a period (.).
- Specifying a pause with <break strength="strong"/>.
<speak>
<s>This is a sentence</s>
<s>There should be a short pause before this second sentence</s>
This sentence ends with a period and should have the same pause.
</speak>
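When your skill already holds its sentences as separate strings, you can mark each one explicitly instead of relying on punctuation. This is a minimal Python sketch of that idea; the function name sentences is hypothetical.

```python
from xml.sax.saxutils import escape


def sentences(*parts):
    """Wrap each string in an s tag inside a single speak element."""
    body = "".join(f"<s>{escape(part)}</s>" for part in parts)
    return f"<speak>{body}</speak>"


ssml = sentences("This is a sentence", "This is a second sentence")
```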
say-as
Describes how the text should be interpreted. This lets you provide additional context to the text and eliminate any ambiguity on how Alexa should render the text. Indicate how Alexa should interpret the text with the interpret-as attribute.
Attribute | Possible Values |
---|---|
|
|
|
Only used when
Alternatively, if you provide the date in YYYYMMDD format, the |
Note that the Alexa service attempts to interpret the provided text correctly based on the text's formatting even without this tag. For example, if your output speech includes "202-555-1212", Alexa speaks each individual digit, with a brief pause for each dash. You don't need to use <say-as interpret-as="telephone"> in this case. However, if you provide the text "2025551212" but want Alexa to speak it as a phone number, you need to use <say-as interpret-as="telephone">.
<speak>
Here is a number spoken as a cardinal number:
<say-as interpret-as="cardinal">12345</say-as>.
Here is the same number with each digit spoken separately:
<say-as interpret-as="digits">12345</say-as>.
Here is a word spelled out: <say-as interpret-as="spell-out">hello</say-as>
</speak>
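The telephone case described above can be expressed as a simple rule: a string of bare digits needs the tag, while a number that already contains dashes does not. The following Python sketch applies that rule. The function names are hypothetical, and the digits-only check is a simplification of how Alexa interprets formatting.

```python
from xml.sax.saxutils import escape


def say_as(text, interpret_as):
    """Wrap text in a say-as tag with the given interpret-as value."""
    return f'<say-as interpret-as="{interpret_as}">{escape(text)}</say-as>'


def speak_phone_number(number):
    """Tag a phone number only when its formatting is ambiguous."""
    # Bare digits would otherwise be read as a single large number.
    if number.isdigit():
        return say_as(number, "telephone")
    # A dashed number is already read digit by digit, so leave it alone.
    return escape(number)


tagged = speak_phone_number("2025551212")
untagged = speak_phone_number("202-555-1212")
```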
Supported speechcons
Speechcons are language specific. See the following pages for the available speechcons for each skill language:
- English (AU)
- English (CA)
- English (IN)
- English (UK)
- English (US)
- French (CA)
- French (FR)
- German (DE)
- Hindi (IN)
- Italian (IT)
- Japanese (JP)
- Portuguese (BR)
- Spanish (ES)
- Spanish (MX)
- Spanish (US)
speak
This is the root element of an SSML document. When using SSML with the Alexa Skills Kit, surround the text to be spoken with this tag.
<speak>
This is what Alexa sounds like without any SSML.
</speak>
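If you construct the JSON response yourself, you can add the speak wrapper at the last step so the rest of your code works with plain fragments. The following Python sketch builds the outputSpeech object with the SSML type and the ssml property. The function name to_output_speech is hypothetical, and the check for an existing wrapper is deliberately simple.

```python
import json


def to_output_speech(ssml_body):
    """Build an outputSpeech object, adding the speak tag only when missing."""
    body = ssml_body.strip()
    if not (body.startswith("<speak>") and body.endswith("</speak>")):
        body = f"<speak>{body}</speak>"
    return {"type": "SSML", "ssml": body}


# json.dumps escapes the double quotes inside SSML attributes.
response = {"outputSpeech": to_output_speech('Wait <break strength="strong"/> for it.')}
print(json.dumps(response))
```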
sub
Pronounces the specified word or phrase as a different word or phrase. Specify the pronunciation to substitute with the alias attribute.
Attribute | Possible Values |
---|---|
alias | The word or phrase to speak in place of the tagged text. |
This example replaces the abbreviated chemical elements with the full words:
<speak>
My favorite chemical element is <sub alias="aluminum">Al</sub>,
but Al prefers <sub alias="magnesium">Mg</sub>.
</speak>
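In the example, only the first "Al" is substituted, because the second one is a name. A blanket search and replace would get this wrong, so it is usually safer to tag each occurrence deliberately. The following Python sketch uses a hypothetical helper named sub to rebuild the sentence that way.

```python
from xml.sax.saxutils import escape


def sub(text, alias):
    """Wrap text in a sub tag that is spoken as alias."""
    alias_attr = escape(alias, {'"': "&quot;"})
    return f'<sub alias="{alias_attr}">{escape(text)}</sub>'


# Only the chemical symbols are tagged; the name "Al" is left as plain text.
line = (
    f"My favorite chemical element is {sub('Al', 'aluminum')}, "
    f"but Al prefers {sub('Mg', 'magnesium')}."
)
```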
voice
Use the voice tag to speak the text with the specified Amazon Polly voice. Each listed voice has its own individual character. See Best Practices for Using Amazon Polly Voices for advice on how to use different voices in your skill to create a good customer experience.
You can combine voice with all other tags, except for those listed in incompatible tags.
Attribute | Possible Values |
---|---|
name | The name of a supported Amazon Polly voice. Voices are specific to locale. To speak content in the same language as your skill, choose a voice supported for the locale of your skill. To speak content in a different language, combine the voice tag with the lang tag. For the list of supported voices for each locale, see Supported Amazon Polly voices. |
Supported Amazon Polly voices
This table lists the Amazon Polly voices supported by Alexa. Voice names don't contain accented characters. Use a voice supported for the skill locale or use the voice with the lang tag.
To comply with Alexa skill policies, don't expose the Amazon-assigned name of a Polly voice to users.
Locale | Supported voices |
---|---|
English, American (en-US) | Ivy, Joanna, Joey, Justin, Kendra, Kimberly, Matthew, Salli |
English, Australian (en-AU) | Nicole, Russell |
English, British (en-GB) | Amy, Brian, Emma |
English, Indian (en-IN) | Aditi, Raveena |
English, Welsh (en-GB-WLS) | Geraint |
French, Canadian (fr-CA) | Chantal |
French, France (fr-FR) | Celine, Lea, Mathieu |
German (de-DE) | Hans, Marlene, Vicki |
Hindi (hi-IN) | Aditi |
Italian (it-IT) | Carla, Giorgio, Bianca |
Japanese (ja-JP) | Mizuki, Takumi |
Portuguese, Brazilian (pt-BR) | Vitoria, Camila, Ricardo |
Spanish, American (es-US) | Penelope, Lupe, Miguel |
Spanish, Castilian (es-ES) | Conchita, Enrique, Lucia |
Spanish, Mexican (es-MX) | Mia |
Example–Standard Alexa voice and a specified Amazon Polly voice
In this example, assume this sample is from an en-US skill. Because "Kendra" is an en-US voice, no lang tag is required. If this sample were from a skill that does not have an en-US locale, the lang tag should be added and set to en-US.
<speak>
I want to tell you a secret.
<voice name="Kendra">I am not a real human.</voice>
Can you believe it?
</speak>
Example–Two voices used in a dialog
This example provides a dialog between an en-US voice and an en-GB voice, such as might occur if a story with two different characters were being read. The standard Alexa voice, which varies by locale, speaks the first and last sentence.
<speak>
Here's a surprise you did not expect.
<voice name="Kendra"><lang xml:lang="en-US">I want to tell you a secret.</lang></voice>
<voice name="Brian"><lang xml:lang="en-GB">Your secret is safe with me!</lang></voice>
<voice name="Kendra"><lang xml:lang="en-US">I am not a real human.</lang></voice>
Can you believe it?
</speak>
In the following example, the default Alexa voice is for an en-US skill. If the skill were for en-GB, the default Alexa voice would correspond to that.
Example–French content in an English skill
In this example, assume the skill's locale is for an English-speaking region. Because "Celine" is an "fr-FR" voice, and you want Celine's content spoken in French, set lang to "fr-FR".
<speak>
Welcome to Ride Hailer. <voice name="Celine"><lang xml:lang="fr-FR">Bienvenue à Ride Hailer</lang></voice>
You can order a ride, or request a fare estimate.
Which will it be?
</speak>
Tips for using Amazon Polly voices
Although all Amazon Polly voices use approximately the same volume, some voices may be perceived as louder or quieter than Alexa voices. Use the prosody tag to modify the volume, rate, and pitch of the voice you have chosen. Other SSML tags supported by Alexa may also be used to modify the spoken output.
Developers can enhance their skills with responses that include one or more Amazon Polly voices, as well as the default Alexa voice, and can choose specific voices for specific responses. Refer to User Experience Guidelines for the Use of Amazon Polly Voices in Your Skills for guidance on using Amazon Polly voices in your skills.
There is no charge for Alexa developers to use Amazon Polly voices.
The locale of a skill refers to a combination of region and language, and all of the Amazon Polly voices are tagged with a locale. For example, the "en-AU" locale refers to the English language in Australia, whereas "en-IN" refers to the English language in India. You select the locale of your skill when you first create it.
To achieve the best results, if the voice you select is for a different locale than that specified by your skill, use the lang tag to specify the language in which the content will be spoken. See more about the lang tag.
Be mindful of the customer experience if you combine voices from different locales in your skill responses.
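The locale tip above can be sketched as a small helper that adds a lang tag only when the voice's locale differs from the skill's locale. This is an illustrative sketch under stated assumptions: speakWithVoice and its parameters are hypothetical names, not part of the Alexa Skills Kit, and the caller is assumed to know the locale of each voice.

```javascript
// Hypothetical helper (not an ASK SDK function): wraps text in a voice tag and,
// when the voice locale differs from the skill locale, in a lang tag as well.
function speakWithVoice(text, voiceName, voiceLocale, skillLocale) {
  const inner =
    voiceLocale === skillLocale
      ? text
      : `<lang xml:lang="${voiceLocale}">${text}</lang>`;
  return `<voice name="${voiceName}">${inner}</voice>`;
}

// An en-US voice in an en-US skill: no lang tag is added.
const sameLocale = speakWithVoice("I am not a real human.", "Kendra", "en-US", "en-US");

// An fr-FR voice in an en-US skill: the text is also wrapped in a lang tag.
const otherLocale = speakWithVoice("Bienvenue à Ride Hailer", "Celine", "fr-FR", "en-US");
```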
Node.js sample code for voice
If you are building a Node.js skill, you can use this switchVoice function to wrap speech output in voice tags to get a specific voice. If you use the Alexa Skills Kit SDK for Node.js, you do not need to wrap the speech output in <speak> tags, as that is handled by the SDK.
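// Wrap text in a voice tag for the given Amazon Polly voice name.
function switchVoice(text, voiceName) {
  if (!text) {
    return ""; // avoid concatenating "undefined" into the speech output
  }
  return "<voice name='" + voiceName + "'>" + text + "</voice>";
}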
Here is some sample speech output from a skill using multiple voices with the switchVoice function.
const speechOutput = "I am Alexa. " + switchVoice("I am Matthew.", "Matthew") + " " + switchVoice("I am Kendra.", "Kendra") + " " +
  switchVoice("and I am Ivy.", "Ivy") + " Don't we make a great team?";
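Because switchVoice concatenates its text argument directly into markup, characters that are reserved in XML (&, <, >, and quotation marks) could produce invalid SSML if they appear in dynamic text such as a user-supplied name. The following is a minimal sketch of escaping before wrapping. The escapeSsml and safeSwitchVoice names are hypothetical and are not part of the sample above or of the SDK.

```javascript
// Hypothetical helper: replaces the five XML-reserved characters with entities.
// The ampersand is replaced first so that later entities are not double-escaped.
function escapeSsml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Variant of switchVoice that escapes the text and the voice name before wrapping.
function safeSwitchVoice(text, voiceName) {
  if (!text) {
    return "";
  }
  return `<voice name="${escapeSsml(voiceName)}">${escapeSsml(text)}</voice>`;
}

const wrapped = safeSwitchVoice("Salt & pepper", "Matthew");
```

Escaping suits plain text only. Text that already contains intentional SSML tags should not be passed through escapeSsml, because the tags themselves would be escaped.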
If you want all of the skill's responses to use a particular voice, specify every speech output as SSML and wrap it in the appropriate voice tag.
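One way to apply a single voice to every response is to route all speech text through one wrapper. The sketch below assumes the JSON response is built by hand, so it adds the <speak> tags itself. The makeVoiceWrapper and buildOutputSpeech names are hypothetical and are not SDK functions.

```javascript
// Hypothetical factory: returns a function that wraps any speech text in one fixed voice.
function makeVoiceWrapper(voiceName) {
  return function wrap(text) {
    return text ? `<voice name="${voiceName}">${text}</voice>` : "";
  };
}

const asMatthew = makeVoiceWrapper("Matthew");

// Builds the outputSpeech object directly; <speak> is added here because
// no SDK is involved in this sketch.
function buildOutputSpeech(text) {
  return { type: "SSML", ssml: `<speak>${asMatthew(text)}</speak>` };
}

const response = buildOutputSpeech("Welcome back.");
```

Keeping the voice name in one place makes it straightforward to change the voice for the whole skill later.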
w
Similar to say-as, this tag customizes the pronunciation of words by specifying the word's part of speech.
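Attribute | Possible Values |
---|---|
role | Set to one of the following: amazon:VB (interpret the word as a verb, present simple), amazon:VBD (interpret the word as a past participle), amazon:NN (interpret the word as a noun), amazon:SENSE_1 (use the non-default sense of the word). |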
<speak>
The word <say-as interpret-as="characters">read</say-as> may be interpreted
as either the present simple form <w role="amazon:VB">read</w>,
or the past participle form <w role="amazon:VBD">read</w>.
</speak>
Note that these tags previously used the ivona namespace in the attribute names. The tags are backwards compatible, so existing SSML written with the ivona namespace continues to work.
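In a Node.js skill, the w tag can be produced with a small helper in the same style as the voice sample. This is a sketch under stated assumptions: withRole and W_ROLES are hypothetical names, the amazon:VB and amazon:VBD values come from the example above, and amazon:NN and amazon:SENSE_1 are assumed from Amazon's published SSML reference. The sketch does not accept legacy ivona-prefixed values.

```javascript
// amazon:VB and amazon:VBD appear in the example above; amazon:NN and
// amazon:SENSE_1 are assumptions taken from Amazon's published SSML reference.
const W_ROLES = new Set(["amazon:VB", "amazon:VBD", "amazon:NN", "amazon:SENSE_1"]);

// Hypothetical helper: wraps a word in a w tag, rejecting roles outside the set above.
function withRole(word, role) {
  if (!W_ROLES.has(role)) {
    throw new Error(`Unsupported w role: ${role}`);
  }
  return `<w role="${role}">${word}</w>`;
}

const sentence = `I ${withRole("read", "amazon:VBD")} the book yesterday.`;
```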
Last updated: Oct 05, 2022