On a relatively regular basis, I find myself building skills that use uncommon words. Most of the time, Alexa pronounces them as I would expect. Other times, the pronunciation is different enough that I want to change it. Today I’ll share my process for identifying those words, and how I go about changing the pronunciation to create a more natural voice experience. Throughout this post I’ll use the Dev Tips skill as an example to illustrate this process.
The first step is identifying the words whose pronunciation you’d like to change. The best way to do this is to listen to them. The way you pronounce a word can vary greatly depending on the region of the world you’re from, or even the part of the country you’re in. (Ask your friends how they pronounce “pecan,” for example.) To listen to Alexa’s pronunciation, I prefer to use the testing tools in the developer portal. There are three tools available: Alexa Simulator, Manual JSON, and Voice & Tone.
The Alexa Simulator is what you’d expect. It allows you to type or speak commands to Alexa, and see the responses that are received. It’s a powerful testing tool that can show you the JSON that is passed to and from your skills as you execute your commands, and you can do it as a complete conversation.
Manual JSON allows you to submit a specific JSON request, and shows you what your skill will respond with. This is another very useful tool for testing, especially if you already have the JSON generated.
Voice & Tone is different in that it isn’t actually testing your skill. Instead, it lets you hear how Alexa says a specific block of text. It is also a great place to experiment with SSML. This is the tool I will be doing all of my testing with for the rest of this post. For the “pecan” example above, try this block of SSML to hear both pronunciations.
<speak>
<phoneme alphabet='ipa' ph='pɪˈkɑːn'>pecan</phoneme>
<break time='1s'/>
<phoneme alphabet='ipa' ph='ˈpi.kæn'>pecan</phoneme>
</speak>
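If you have several candidate pronunciations to compare, typing that block by hand gets tedious. Here is a minimal Python sketch that generates the same kind of comparison SSML to paste into Voice & Tone. The function name `compare_pronunciations` and its parameters are illustrative assumptions, not part of any Alexa SDK.

```python
from xml.sax.saxutils import escape, quoteattr


def compare_pronunciations(word, ipa_variants, pause="1s"):
    """Return SSML that speaks `word` once per IPA variant, with a pause between each."""
    spoken = [
        f"<phoneme alphabet={quoteattr('ipa')} ph={quoteattr(ph)}>{escape(word)}</phoneme>"
        for ph in ipa_variants
    ]
    # One <break> between each pair of variants, none at the start or end.
    separator = f"<break time={quoteattr(pause)}/>"
    return "<speak>" + separator.join(spoken) + "</speak>"


# The two "pecan" pronunciations from the SSML block above.
ssml = compare_pronunciations("pecan", ["pɪˈkɑːn", "ˈpi.kæn"])
print(ssml)
```

The helper only builds a string. You still need to listen to the result to judge which variant sounds right.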
We built the Dev Tips skill to help Alexa developers easily find answers to common questions. Every week we compile a list of those questions and write a concise answer for each one.
One of the scenarios that we identified for Dev Tips was that our users would want to be able to get the contact information of our evangelists. Perhaps they recently attended a Dev Days event, or read a blog post similar to this one and had additional questions for the author. We wanted to make sure that the Dev Tips skill could answer any of those questions about the Alexa evangelism team. This worked really well for most of us, which was surprising, because most people don’t pronounce “Blankenburg” correctly on their first try.
However, there were a couple of exceptions:
To update these, we added a pronunciation property to our data for these answers. In Paul’s case, we took the simple route and spelled his last name as two words instead of one: “Paul Cut Singer.”
Respelling a word like this is often an easy way to get Alexa to pronounce it the way you prefer, but it has its limitations. For example, this spelling works perfectly with US English pronunciation, but it does not carry over to Japanese pronunciation.
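The idea of a separate pronunciation property can be sketched in a few lines of Python. This is an illustration only: the record layout, the `EVANGELISTS` list, and the `spoken_name` helper are hypothetical, not the Dev Tips skill’s actual data or code, and the one-word display spelling of Paul’s last name is assumed from the respelling above.

```python
# Hypothetical answer records. Only entries Alexa mispronounces
# carry a "pronunciation" override.
EVANGELISTS = [
    {"name": "Jeff Blankenburg"},
    {"name": "Paul Cutsinger", "pronunciation": "Paul Cut Singer"},
]


def spoken_name(record):
    """Return the respelled override when present, otherwise the display name."""
    return record.get("pronunciation", record["name"])


for record in EVANGELISTS:
    print(f'{record["name"]} -> {spoken_name(record)}')
```

Keeping the override next to the display name means cards and logs can still show the correct spelling, while only the speech output uses the respelled version.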
There is, however, a more precise way to modify pronunciations for Alexa. Using SSML, we can spell our word out using phonemes. Phonemes are the individual phonetic sounds that you assemble to make words. (You can read more about phonemes in our developer documentation.)
In Andrea Muttoni’s case, we recreated his entire name with phonemes. Here’s a quick look at the phonemes for his last name. (You can see the entire list of supported symbols for phonemes here.)
| Letter to Pronounce | IPA Symbol | Sounds Like |
| --- | --- | --- |
| M | m | mouse |
| U | u | goose |
| T | t | trap |
| T | t | trap |
| O | oʊ | goat |
| N | n | nap |
| I | i | fleece |
<phoneme alphabet="ipa" ph="muttoʊni">muttoni</phoneme>
As you can see, there are several elements included in this SSML. The first is the <phoneme> tag, indicating that this is a phonetic pronunciation.
Second is the alphabet indication. Alexa supports IPA (International Phonetic Alphabet) and X-SAMPA (Extended Speech Assessment Methods Phonetic Alphabet). The two alphabets use different symbols, but both map to the same phonetic sounds. IPA tends to use specialized symbols that more closely represent each sound, while X-SAMPA uses common characters that you can easily find on your keyboard. You can use either alphabet, but you can’t combine them within a single tag. This is why you must indicate an alphabet in your SSML tag.
The last two values are ph, the sequence of phonemes to be pronounced, and the actual spelling of the word. The spelling is not a required value, so you could write your phoneme with this syntax instead:
<phoneme alphabet="ipa" ph="muttoʊni" />
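To make the difference between the two alphabets concrete, here is a small Python sketch that writes the same name in both. The mapping covers only the symbols used in “Muttoni” and is not a complete IPA to X-SAMPA converter. The names `IPA_TO_XSAMPA` and `to_xsampa` are made up for this illustration.

```python
# IPA -> X-SAMPA, limited to the symbols used in "Muttoni".
# The only symbol that differs here is the "goat" diphthong.
IPA_TO_XSAMPA = {"m": "m", "u": "u", "t": "t", "oʊ": "oU", "n": "n", "i": "i"}


def to_xsampa(ipa_symbols):
    """Translate a list of IPA symbols to one X-SAMPA string."""
    return "".join(IPA_TO_XSAMPA[symbol] for symbol in ipa_symbols)


ipa_symbols = ["m", "u", "t", "t", "oʊ", "n", "i"]
ipa = "".join(ipa_symbols)
xsampa = to_xsampa(ipa_symbols)

print(f'<phoneme alphabet="ipa" ph="{ipa}">muttoni</phoneme>')
print(f'<phoneme alphabet="x-sampa" ph="{xsampa}">muttoni</phoneme>')
```

Both tags should describe the same sounds. The X-SAMPA version is simply easier to type on a standard keyboard.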
That is all it takes to create your own phonetic pronunciations using phonemes and SSML. Identify each of the phonetic sounds you want Alexa to make, and put them together as one value inside a <phoneme> tag.
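If your skill builds its responses in code, that assembly step can be automated. The following is a minimal Python sketch, assuming you keep the phonetic symbols as a list. `phoneme_tag` and `SUPPORTED_ALPHABETS` are hypothetical names for this example, not an Alexa SDK API.

```python
from xml.sax.saxutils import escape, quoteattr

# The two alphabet values discussed in this post.
SUPPORTED_ALPHABETS = {"ipa", "x-sampa"}


def phoneme_tag(symbols, word=None, alphabet="ipa"):
    """Join phonetic symbols into one ph value and wrap it in a <phoneme> tag."""
    if alphabet not in SUPPORTED_ALPHABETS:
        raise ValueError(f"unsupported alphabet: {alphabet!r}")
    attrs = f"alphabet={quoteattr(alphabet)} ph={quoteattr(''.join(symbols))}"
    if word is None:
        # The spelling is optional, so emit the self-closing form.
        return f"<phoneme {attrs} />"
    return f"<phoneme {attrs}>{escape(word)}</phoneme>"


muttoni = ["m", "u", "t", "t", "oʊ", "n", "i"]
print(phoneme_tag(muttoni, word="muttoni"))
print(phoneme_tag(muttoni))
```

The attribute values go through `quoteattr` rather than plain string formatting because X-SAMPA marks primary stress with a double-quote character, which would otherwise break the attribute.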
And, in the spirit of reaching out to your evangelists, I’m on Twitter as @jeffblankenburg. Let me know what kinds of phonemes you’ve created, or just tell me about your latest skill! I’d love to try it.
Every month, developers can earn money for eligible skills that drive some of the highest customer engagement. Developers can increase their level of skill engagement and potentially earn more by improving their skill, building more skills, and making their skills available in the US, UK, and Germany. Learn more about our rewards program and start building today.