Internationalization

Internationalization is the process of designing features or skills so that they can be adapted to different languages, regional differences, and the technical requirements of different target markets; localization is the work of adapting them for a specific market. Building your experience with internationalization in mind reduces the resources others need to localize your product for new marketplaces. Localization is not just simple language translation; it's a cultural shift in the way Alexa converses with different customers.

When building an experience, you'll need to think beyond your own native culture and language(s). As you design your feature or skill, consider: Which markets are you planning to ship to? Which languages will you need to support? What level of translation or localization is required? We've all read instructions for a product made in another country that were translated poorly. It is obvious that the text was not written for your language or culture, and that hurts your experience with, and trust in, the product. Mishandling translation and cultural differences not only hurts your skill; over time it erodes trust in Alexa across markets as well.

Best practices for internationalization include using Unicode (UTF-8), externalizing string resources, adapting design and controls for different languages, handling data based on the customer's locale, and handling input in different languages.
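The first two practices can be made concrete with a minimal Python sketch. It stores externalized string resources as UTF-8 JSON and uses a lookup that falls back from a regional locale (en-GB) to its base language (en), and then to a default. The bundle contents, key names, and fallback order are illustrative assumptions, not an Alexa API.

```python
import json

# Hypothetical resource bundles. In a real skill each would live in its own
# file (for example strings/en-GB.json), always read and written as UTF-8.
RAW_BUNDLES = {
    "en": '{"welcome": "Welcome back!", "favorite": "Your favorite color is {color}."}',
    "en-GB": '{"favorite": "Your favourite colour is {color}."}',
    "de": '{"welcome": "Willkommen zurück!"}',
}

# json.loads accepts UTF-8 bytes, as you would get when reading the files.
BUNDLES = {loc: json.loads(raw.encode("utf-8")) for loc, raw in RAW_BUNDLES.items()}
DEFAULT_LOCALE = "en"


def fallback_chain(locale: str) -> list:
    """Most specific first, e.g. 'de-DE' -> ['de-DE', 'de', 'en']."""
    chain = [locale]
    language = locale.split("-")[0]
    if language != locale:
        chain.append(language)
    if DEFAULT_LOCALE not in chain:
        chain.append(DEFAULT_LOCALE)
    return chain


def get_string(key: str, locale: str) -> str:
    """Return the first bundle entry for `key` along the fallback chain."""
    for candidate in fallback_chain(locale):
        bundle = BUNDLES.get(candidate, {})
        if key in bundle:
            return bundle[key]
    raise KeyError(f"missing string resource {key!r} for {locale}")
```

Falling back to the default language keeps the skill working, but a real pipeline would also report such keys as untranslated.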

Voice design

Write short, complete sentences
Long, complex sentences are difficult to translate, and difficult to understand. Be mindful of long strings of nouns or adjectives, or very long sentences that work better as short ones. Read them aloud to get a true sense of length and ease of comprehension.

Avoid colloquialisms, puns, or local jargon when not critical to content
This general rule is especially important for localization. Colloquialisms, puns, and jargon may have no equivalent in other languages. They may also be misunderstood in a specific region where the language is spoken, or even by a certain generational demographic.

If it is critical to your skill, you may want to work with translators to find equivalent jargon in the specific locale. For example, acronyms like “FAQ” won't translate well and neither will “9 to 5”, since other countries and regions have different concepts on how long a work day is.

Define and use terms consistently
If your source text is inconsistent in how you present certain terms, or if you don’t give proper term definitions to those translating your content, it is very difficult for them to provide quality translations. Providing glossaries and pre-populating terminology databases will force you to define the important terms for your product and use those terms consistently in the source text.
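A glossary pays off most when it is checked automatically. The Python sketch below scans source strings for discouraged variants of approved terms. The glossary entries and string IDs are hypothetical examples.

```python
import re

# Hypothetical glossary: approved term -> variants that should not appear.
GLOSSARY = {
    "shopping list": ["shop list", "grocery list"],
    "sign in": ["log in", "login"],
}


def find_inconsistent_terms(source_strings: dict) -> list:
    """Return (string_id, found_variant, approved_term) for each discouraged variant."""
    problems = []
    for string_id, text in source_strings.items():
        for approved, variants in GLOSSARY.items():
            for variant in variants:
                # Whole-word, case-insensitive match of the variant.
                if re.search(rf"\b{re.escape(variant)}\b", text, re.IGNORECASE):
                    problems.append((string_id, variant, approved))
    return problems
```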

Review all content for geopolitical inaccuracies or sensitivities
Being geopolitically accurate and sensitive means making sure that the content in your skill does not offend your customers. For example, an image or graphic that is harmless in one culture can have a totally different meaning, or be very offensive, in another. Failing to check for this may offend customers, expose you to legal and regulatory risk, or lead to negative media coverage, boycotts, or public demonstrations.

Both text and non-text content should be reviewed for geopolitical issues. Text includes UI strings, documents, help, marketing communications, etc. Non-text content includes videos, audio, games, and images.

Images carry special risks because they can evoke emotional reactions, influence beliefs, and recall memories, and they are not always globally appropriate. Other information that needs to be checked includes colors, numbers, names, and references to current events.

Here is the list of image types that need to be checked:

  • Flags (flags are recognized differently by some countries)
  • Maps (boundaries are recognized differently by countries)
  • People (including gestures, hands, feet, and race)
  • Symbols/Icons (national, political, religious, spiritual)
  • Mature/risqué images

When determining whether to use an image for a locale, consider: Does this image have the same meaning in other locales, or will it be offensive there? Do we need to find a culturally appropriate metaphor for those locales?

Development best practices

Don't build prompt structure into code
Different languages have different word orders. A noun phrase (NP) is a noun plus, optionally, adjectives and articles. The grammatical rules of each language dictate the order in which these words must appear. When a noun phrase contains dynamic content (variables), it should not be built by concatenation.

The following examples show how word order differs across a few languages. The code should not prescribe the word order; instead, those translating the skill should provide the correct word order for each locale. In addition, adjectives and articles need to agree in gender and case with the noun and with its function in the sentence (subject or direct/indirect object).

Example of word order by language

Language | Subject | Verb | Object | Comments
English | The big man | eats | a red delicious apple. | English nouns have no grammatical gender, and adjectives do not change form depending on case (the function of the noun in the sentence).
German | Der große Mann (masculine article; adjective in nominative case, masculine) | isst | einen roten, köstlichen Apfel. (masculine, accusative case) | Same NP word order as in English, but in German it doesn't hurt to put a comma between adjectives. The article and adjectives must agree in gender and case.
Swedish | Den stora mannen | äter | ett gott rött äpple. | The definite article is reflected in two places (den ... mannen), and the adjectives agree with the noun. In the second NP, the order of the adjectives changes to sound natural.
French | Le gros homme (the big man) | mange (eats) | une pomme rouge et délicieuse. (an apple red and delicious) | Different word order.
Japanese | 大きくて毛むくじゃらな男の人が (Ookikute kemukujara na otoko no hito ga) | 食べています。 (tabete imasu) | おいしそうな赤いりんごを (oishisou na akai ringo o) | The word order in Japanese is different: the verb comes at the end. Also, it isn't necessarily clear whether "red delicious" refers to the specific apple variety or to the generic adjectives "red and delicious".
Hindi | बड़े बालों वाला आदमी (bade baalon wala aadmi) | खाता है (khata hai) | एक लाल स्वादिष्ट सेब (ek lal swadisht seb) | As in Japanese, the word order is different and the verb comes at the end. Also, Hindi has no articles (the, a, an).
Spanish | El hombre grande (the man big) | come (eats) | una manzana roja y deliciosa (an apple red and delicious) | Different word order.

The lesson here is that neither the subject nor the object should be built by concatenation, because then the code dictates the word order. Code the whole sentence as one string and let the translator move the variables within it, whether the variables are nouns or adjectives.
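A minimal Python sketch of "one string, movable variables" uses named placeholders that each locale's translator can place freely. The catalog, its keys, and the German and Japanese sentences are hypothetical examples, not strings from a real skill.

```python
# Hypothetical message catalog. Each locale owns the complete sentence, and
# translators can move the named placeholders wherever their grammar needs them.
MESSAGES = {
    "en-US": "{person} ordered {item}.",
    "de-DE": "{person} hat {item} bestellt.",        # verb split around the object
    "ja-JP": "{person}さんが{item}を注文しました。",  # verb at the end
}


def render(locale: str, **args: str) -> str:
    # format_map fills fields by name, so the code never fixes their order.
    return MESSAGES[locale].format_map(args)
```

The caller still passes "einen Apfel" already inflected for the accusative case. The template frees the word order but does not solve agreement, which is why the visual design guidance later recommends rephrasing around dynamic text.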

Avoid string concatenation in code
Concatenating strings affects both internationalization and localization. Concatenating several strings to form a sentence is a common bad coding practice. Although the concatenated strings may form an acceptable English sentence, many languages have different grammatical rules. Translators may need to change the order of the words in a sentence, which is impossible if you split the sentence into multiple strings.
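Once sentences are single templates, you can also check each translation mechanically. The Python sketch below contrasts the concatenation anti-pattern with a template check based on the standard library's `string.Formatter().parse`. It flags translations that drop or invent placeholders. The function names are hypothetical.

```python
from string import Formatter


def concatenated_message(person: str, item: str) -> str:
    # Anti-pattern: the code fixes the word order, so a translator can only
    # translate the fragments, never rearrange them.
    return person + " ordered " + item + "."


def placeholders(template: str) -> set:
    # Formatter.parse yields (literal_text, field_name, format_spec, conversion);
    # field_name is None for trailing literal text.
    return {field for _, field, _, _ in Formatter().parse(template) if field is not None}


def check_translation(source: str, translation: str) -> list:
    """List placeholder mismatches between a source template and its translation."""
    problems = []
    missing = placeholders(source) - placeholders(translation)
    extra = placeholders(translation) - placeholders(source)
    if missing:
        problems.append(f"missing placeholders: {sorted(missing)}")
    if extra:
        problems.append(f"unexpected placeholders: {sorted(extra)}")
    return problems
```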

Visual Design

When an experience is localized, the customer interface and any other text typically require more room than the English source. Many languages have longer words and longer sentences than English, and some scripts (non-Roman characters) need larger characters. Any data, such as dates or numbers, may take more room as well.

Images

Use overlays, layers, or callouts with graphics
Just like the strings in the UI, any text in a graphic needs to be translated. If text is embedded in a graphic, updating the graphic with translated text later is much harder and more expensive. The best way to avoid localizing graphics is to use fewer graphics in the first place, or to keep text out of them.

Review all images
Use general images that are appropriate and easily understood in your intended countries and marketplaces. Cultural references may not be global. Even literal imagery can feel foreign. For example, mailboxes and tractors look very different in Japan. If you use general images that can be used for a worldwide audience, you don't have to replace them when you launch globally or across different countries and marketplaces.

Beyond checking that images are appropriate for each culture or country, you also need to verify that images display properly in each locale and that the right images are displayed at all. For example, are the translated text overlays displaying properly on the images?

Text

30% Rule
Account for at least 30% extra space on your UI components beyond what the English source required. For example, some strings translated from English to German take 50% more space.
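A rough first pass at the 30% rule can be automated by comparing translated string lengths with the English source. In the Python sketch below, character count is only a proxy (a real check would measure rendered width with the actual font), and the sample translations are illustrative.

```python
def over_budget(source: str, translations: dict, budget: float = 0.30) -> list:
    """Return the locales whose translation is more than `budget` longer than the source."""
    limit = len(source) * (1 + budget)
    return sorted(loc for loc, text in translations.items() if len(text) > limit)
```

For example, "Einstellungen" (13 characters) exceeds the 30% budget for "Settings" (8 characters), while "設定" passes despite each glyph being roughly twice as wide. That is why character counts alone are not enough.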

Some guidelines:

  • Use pseudo-localization to test your content strings as early as possible; a rough machine translation (e.g. from Google Translate) can also give an early sense of real text length.
  • Controls should grow to additional lines unless this would break the customer experience.
  • Specify where truncations should occur.
  • Allow for at least 30% text expansion in translation.
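Pseudo-localization replaces each string with a readable but visibly "foreign" version before any real translation exists. The Python sketch below accents vowels and a few consonants, pads each string by about 30%, and wraps it in brackets so that truncation and hard-coded (unexternalized) strings stand out. The accent map, the "~" padding, and the brackets are arbitrary choices, and {named} placeholders are left untouched.

```python
import re

# Arbitrary accent map: both arguments must have the same length.
ACCENTED = str.maketrans("aeiouAEIOUcnCN", "áéíóúÁÉÍÓÚçñÇÑ")
PLACEHOLDER = re.compile(r"(\{[^{}]*\})")


def pseudo_localize(text: str, expansion: float = 0.30) -> str:
    # With a capturing group, re.split puts the placeholders at odd indexes,
    # so only the literal text at even indexes gets accented.
    parts = PLACEHOLDER.split(text)
    accented = "".join(p if i % 2 else p.translate(ACCENTED) for i, p in enumerate(parts))
    padding = "~" * max(1, round(len(text) * expansion))
    return f"[{accented}{padding}]"
```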

Line wrap & truncation
Define line wrap and truncation behavior for all layouts using the text component. Text in layouts should be allowed to wrap and flow to as many lines as needed. Use the maxLines property in the text component to call out text components that limit the number of lines – single line, two lines, etc.

Text that is used for navigating the experience must not be truncated. For such text components, if the visual design limits the number of lines available, the text needs to be wide enough to accommodate string expansion. (See 30% Rule above.) If there are translations that don't fit, the design may need non-trivial changes and it's better to know and to make such changes early.
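The wrap-and-truncate rule can be prototyped before layouts exist. The Python sketch below wraps text at a character width with the standard `textwrap` module and compares the line count with a maximum. This only mirrors the idea of a maxLines limit; it is not APL, and measuring in characters rather than pixels is a simplifying assumption.

```python
import textwrap
from typing import Optional


def layout_status(text: str, width: int, max_lines: Optional[int] = None,
                  navigation: bool = False) -> str:
    """Return 'ok', 'truncated', or 'blocked' for text wrapped at `width` characters."""
    lines = textwrap.wrap(text, width=width)
    if max_lines is None or len(lines) <= max_lines:
        return "ok"
    # The text would be cut off; navigation text must never be truncated.
    return "blocked" if navigation else "truncated"
```

"Settings" fits a one-line, 12-character label, but "Einstellungen" wraps to two lines and would be blocked as navigation text. That is the kind of result you want to find early.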

Text labels on buttons & controls
Button labels should be short and concise. A label that fits the design in English often does not fit once translated: either the text overflows the button, or the button grows until it and other controls no longer fit in the available space. The same problem often occurs for customers who enlarge the UI font size for accessibility. Instead of using a wordy label, move the message into body text above the button so the label can simply say "Okay", "Agree", or another short, concise action. Avoid placing multiple buttons on a single row.

Font styles & capitalization for emphasis/differentiation
Font styles like bold, italics, underline, strikethrough, and capitalization (such as making a label all-caps) are often used in English and other Roman-script languages for emphasis or visual differentiation. However, this may break in scripts that don't use bold, italics, or uppercase/lowercase, e.g. Chinese, Japanese, Arabic, Hindi, and Korean. Define other ways to differentiate, such as foreground/background color, quotation marks, or a visible control border and fill.

When using capitalization or all-caps, allocate more vertical clearance (line height and spacing). In some languages with heavy diacritics, uppercase letters with accents need more room than the same letters in lowercase.
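A quick experiment with Python's built-in `str.upper` shows why all-caps is an unreliable emphasis device. Caseless scripts such as Japanese come back unchanged, so the emphasis silently disappears. Uppercasing can also change string length, as German "ß" does when it becomes "SS". The helper name is hypothetical.

```python
def caps_emphasis_report(label: str) -> dict:
    """Show what uppercasing actually does to a label."""
    upper = label.upper()
    return {
        "upper": upper,
        # False for caseless scripts: the "emphasis" is invisible.
        "visible_change": upper != label,
        # Non-zero when case mapping expands characters, e.g. "ß" -> "SS".
        "length_change": len(upper) - len(label),
    }
```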

Spaces between words
In certain languages, such as Japanese, words are not separated by spaces. This makes tasks like preventing substring matches difficult. Because there are no spaces between words and no hyphenation, there is no easy way to programmatically prevent "widows" in GUI displays; for example, the last line of a string may contain only one character.
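Detecting widows at least requires measuring display width rather than character count. The Python sketch below assumes that characters classed as East Asian Wide or Fullwidth by `unicodedata.east_asian_width` occupy two columns. It wraps a string with no spaces at a column limit and flags a one-character last line. Real Japanese line breaking also follows kinsoku rules (certain punctuation may not start or end a line), which this sketch ignores.

```python
import unicodedata


def display_width(ch: str) -> int:
    # Wide and Fullwidth characters take two columns in most fixed-width layouts.
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def wrap_by_width(text: str, max_cols: int) -> list:
    """Break text into lines of at most max_cols columns, character by character."""
    lines, current, cols = [], "", 0
    for ch in text:
        w = display_width(ch)
        if cols + w > max_cols and current:
            lines.append(current)
            current, cols = "", 0
        current += ch
        cols += w
    if current:
        lines.append(current)
    return lines


def has_widow(lines: list, min_chars: int = 2) -> bool:
    """True when a multi-line text ends with a very short last line."""
    return len(lines) > 1 and len(lines[-1]) < min_chars
```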

Multi-line clause
Do not purposely divide a single clause across multiple lines in the visual layout. Allow natural line wrap instead, or break the clause so that the parts are grammatically independent. When a single clause is broken across lines, developers have only two ways to implement it, and neither allows for a grammatically correct translation:

  1. Externalize the single phrase as multiple string resources, one per line of text. Translators translate each resource independently, and the translations, when visually combined, may not make sense or may be ungrammatical.
  2. Externalize a single string resource with embedded newline ('\n') characters to force the line breaks. Translators will not know what to do with the embedded newlines. Sometimes they omit them altogether; other times they place the newlines where they do not produce the expected visual result, because translators do not see the on-screen placement of the translated text while translating.
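Both patterns can be caught before strings go to translators. This Python sketch lints a resource set for embedded newlines and for keys that look like one clause split across lines. The `_line1`, `_line2` key suffix is a hypothetical naming convention used only for illustration.

```python
import re

# Hypothetical convention: keys like "promo_line1", "promo_line2" hold one clause.
SPLIT_KEY = re.compile(r"^(?P<base>.+)_line\d+$")


def lint_resources(resources: dict) -> list:
    """Report embedded newlines and clauses split across several resources."""
    issues = []
    groups = {}
    for key, text in resources.items():
        if "\n" in text:
            issues.append(f"{key}: embedded newline")
        match = SPLIT_KEY.match(key)
        if match:
            groups.setdefault(match.group("base"), []).append(key)
    for base, keys in groups.items():
        if len(keys) > 1:
            issues.append(f"{base}: clause split across {len(keys)} resources")
    return issues
```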

In addition, if a multi-line clause includes an argument that is substituted dynamically, the translated string as a whole may not make grammatical sense, because the argument is forced into a certain position in the clause. To produce grammatically correct translations of such clauses, the position of the argument within the clause needs to be adjustable. Consider the following example:

The placement of the "<n> days" argument in other languages could be at another location in the clause. Translators will translate each line individually and the clause as a whole over two lines will not have the proper grammar and will read strangely in another language.

Do not force line breaks. Or, if you do want "<n> days" on its own line, break the single clause into independent parts in a "subject:predicate-nominal" form (note that the colon replaces the verb "to be"):

Dynamic text elements in a clause
Don't design messages that have an argument for dynamically substituting another piece of text. For example, consider a message like "Remove <personal name>'s profile from this household". In some languages, translating this correctly requires knowing the grammatical gender and case of the substituted text, which may not be available. It would also require a different message template for each potential gender and case. If the message can be rephrased to omit the argument, that is preferred. If that's not acceptable, try rephrasing into a "subject:predicate-nominal" form, e.g. "Profile to remove from this household: <personal name>".

Another example is the message "Sort by <title|author|recent>", which often appears as a text-labeled control that lets the customer select one of the sort options. In some languages, the words for "sort" and "by" need to change form depending on the gender of the dynamic text. Rephrasing into the "subject:predicate-nominal" form "Sort Order: <title|author|recent>" simplifies the implementation for proper localization.
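In the label-value form, the label is translated once and never has to agree with the value. The hypothetical resources below also externalize the "label: value" pattern itself, because typography differs by locale. French, for example, puts a no-break space before the colon. The keys and French strings are illustrative.

```python
# Hypothetical resources in "subject:predicate-nominal" (label: value) form.
RESOURCES = {
    "en-US": {"label_value": "{label}: {value}", "sort_order": "Sort order",
              "title": "Title", "author": "Author"},
    "fr-FR": {"label_value": "{label}\u00a0: {value}", "sort_order": "Ordre de tri",
              "title": "Titre", "author": "Auteur"},
}


def sort_label(locale: str, option: str) -> str:
    # The label never changes with the option, so no gender or case agreement is needed.
    r = RESOURCES[locale]
    return r["label_value"].format(label=r["sort_order"], value=r[option])
```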

Numbers, dates, times
For date and time, phone number, file size and other general number formatting, follow local custom.

Examples of number, date, and time formats:

Locale | Number | Date | Month | Weekday | Time
en-US | -123,456,789.988 | September 7, 2018 | Sep | Sat | 2:32:55 PM
en-GB | -123,456,789.988 | 7 September 2018 | Sep | Sat | 14:32:55
fr-FR | -123 456 789,988 | 7 septembre 2018 | sept. | sam. | 14:32:55
de-DE | -123.456.789,988 | 7. September 2018 | Sep. | Sa. | 14:32:55
ja-JP | -123,456,789.988 | 2018年9月7日 | 9月 | 土曜日 | 14:32:55
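The following Python sketch shows how locale-driven formatting keeps these conventions out of the calling code. It uses a small hand-copied subset of conventions that matches the examples above. Production code should take this data from CLDR, for example through ICU or Babel, rather than from hard-coded tables like these.

```python
from datetime import date, time

# Hand-copied subset of locale conventions, for illustration only.
NUMBER_SYMBOLS = {  # locale -> (grouping separator, decimal separator)
    "en-US": (",", "."),
    "en-GB": (",", "."),
    "fr-FR": ("\u202f", ","),  # narrow no-break space, as in recent CLDR data
    "de-DE": (".", ","),
    "ja-JP": (",", "."),
}

MONTHS = {
    "en": ["January", "February", "March", "April", "May", "June", "July",
           "August", "September", "October", "November", "December"],
    "fr": ["janvier", "février", "mars", "avril", "mai", "juin", "juillet",
           "août", "septembre", "octobre", "novembre", "décembre"],
    "de": ["Januar", "Februar", "März", "April", "Mai", "Juni", "Juli",
           "August", "September", "Oktober", "November", "Dezember"],
}


def format_decimal(value: float, locale: str, digits: int = 3) -> str:
    group, decimal = NUMBER_SYMBOLS[locale]
    text = f"{value:,.{digits}f}"  # e.g. "-123,456,789.988"
    # Swap both symbols in a single pass so "," and "." cannot collide.
    return text.translate(str.maketrans({",": group, ".": decimal}))


def format_long_date(d: date, locale: str) -> str:
    if locale == "ja-JP":
        return f"{d.year}年{d.month}月{d.day}日"
    month = MONTHS[locale.split("-")[0]][d.month - 1]
    if locale == "en-US":
        return f"{month} {d.day}, {d.year}"
    if locale == "de-DE":
        return f"{d.day}. {month} {d.year}"
    return f"{d.day} {month} {d.year}"  # en-GB, fr-FR


def format_time(t: time, locale: str) -> str:
    if locale == "en-US":  # 12-hour clock with AM/PM
        hour = t.hour % 12 or 12
        suffix = "AM" if t.hour < 12 else "PM"
        return f"{hour}:{t.minute:02d}:{t.second:02d} {suffix}"
    # Simplification: zero-padded 24-hour clock for the other locales here.
    return f"{t.hour:02d}:{t.minute:02d}:{t.second:02d}"
```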

Addresses
Different countries have different address field labels and field orders. For example, Japanese addresses start with the postal code, then go from the largest region to the smallest (country, prefecture, city, address).
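Treating the address layout as a per-country template, like any other localized string, avoids hard-coding one field order. The templates and field names below are hypothetical. The Japanese one follows the postal-code-first, largest-to-smallest order described above, and the US one follows common US convention.

```python
# Hypothetical per-country address templates; field names are illustrative.
ADDRESS_TEMPLATES = {
    "JP": "〒{postal_code}\n{prefecture}{city}{street}",  # postal code, then largest to smallest
    "US": "{street}\n{city}, {state} {postal_code}",      # smallest to largest, ZIP last
}


def format_address(country: str, fields: dict) -> str:
    return ADDRESS_TEMPLATES[country].format_map(fields)
```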

Best practices by locale

Japan

Names

  • If your prompt contains names of people, Japanese customers will expect to hear them with the honorific title "san" appended. A prompt that says "Message from <name>" should be read as "Message from Tanaka-san." However, the <name> field sometimes already contains "san" or a variant, so simply embedding "san" in the Japanese prompt can produce duplicate titles. For example, the Japanese word for "mother" is "okaasan," so if your contacts list contains the word "Mom," automatically appending "san" will read out as "Message from Okaasan-san."
  • If your prompt or utterance uses first and last names, don't build dependencies on US English naming conventions. For example, if you have a prompt that says "Call <name>," Japanese customers are most likely to say <last name><first name> (+ "san"). If they use only one name, they are more likely to use the last name than the first name for casual acquaintances and co-workers.
  • If you use an utterance to input names, Alexa may pronounce the names incorrectly or convert them to the wrong kanji characters, because the ASR pipeline does not store pronunciation metadata (the "homograph" problem). For example, you may say "What is the weather in Misato?" and Alexa might respond "The weather in Sango is…" (because both place names are written with the same kanji characters).
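Name handling can be sketched in Python as follows. The first function appends さん only when the name does not already end with a title. The second orders names by locale. The list of existing titles is a hypothetical starting point, and the suffix check is a heuristic: a name that merely ends in "san", such as "Hassan", would wrongly be left without an honorific.

```python
# Hypothetical list of endings that already carry an honorific or kinship title.
EXISTING_TITLES = ("さん", "さま", "様", "ちゃん", "くん", "san", "sama")


def with_san(name: str) -> str:
    """Append the honorific さん unless the name already ends with a title."""
    if name.casefold().endswith(EXISTING_TITLES):
        return name
    return name + "さん"


def spoken_full_name(given: str, family: str, locale: str) -> str:
    # Japanese order is family name first; English order is given name first.
    if locale == "ja-JP":
        return f"{family} {given}"
    return f"{given} {family}"
```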

English words in data catalogs

If your skill uses a catalog that contains Western words, entity matching accuracy may be a challenge. Japanese liberally intersperse English phrases in movie and book titles, musician and band names, restaurant and building names, company names, etc. These words may be entered in romanized letters (romaji) or in a phonetic alphabet called "katakana" (sometimes "kana" for short). Unfortunately, some phrases have multiple accepted kana spellings. Alexa does not yet have a generic kana-to-English conversion algorithm, so you may need to append metadata to your catalogs.
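Catalog metadata usually takes the form of alias lists plus a normalization step. In the Python sketch below, Unicode NFKC normalization (`unicodedata.normalize`) folds half-width katakana to full-width and full-width Latin to ASCII. Spaces and the katakana middle dot are then dropped, so trivial variants collapse. Genuinely different kana spellings still have to be listed as aliases. The catalog entry and ID format are hypothetical.

```python
import unicodedata


def normalize(text: str) -> str:
    # NFKC: half-width katakana -> full-width, full-width Latin -> ASCII.
    # casefold handles romaji capitalization; spaces and "・" are dropped.
    folded = unicodedata.normalize("NFKC", text).casefold()
    return folded.replace("・", "").replace(" ", "")


# Hypothetical catalog entry with alternative spellings stored as metadata.
CATALOG = [
    {"id": "artist-001", "canonical": "Taylor Swift", "aliases": ["テイラー・スウィフト"]},
]


def build_index(catalog: list) -> dict:
    """Map every normalized canonical name and alias to its entry ID."""
    index = {}
    for entry in catalog:
        for form in [entry["canonical"], *entry["aliases"]]:
            index[normalize(form)] = entry["id"]
    return index


def match(text: str, index: dict):
    """Return the matching entry ID, or None."""
    return index.get(normalize(text))
```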

Lists
If your visual response contains a list that you sort alphabetically, Japanese words will not sort properly unless you append pronunciation metadata. Sorting in Japanese has been described as an unsolved problem. There are tens of thousands of kanji characters that do not have an intuitive visual sorting methodology, and sorting by sound is unreliable because kanji characters can have multiple pronunciations. The Japanese version of Excel, for example, sorts by unicode number, which presents no intuitive value to a human trying to scan a list. To solve this problem, databases provide additional fields to append pronunciation metadata.
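A sketch of sorting by pronunciation metadata, assuming each hypothetical entry carries a hiragana "reading" field. Sorting kanji titles by code point gives an order that means nothing to a reader. Sorting by the reading roughly follows the familiar gojūon (a, i, u, e, o, ka, ki, ...) order, because hiragana code points are laid out that way. Full Japanese collation has more rules, so production code should use an ICU collator on the readings.

```python
# Hypothetical list entries carrying pronunciation (reading) metadata.
ENTRIES = [
    {"title": "名古屋", "reading": "なごや"},
    {"title": "東京", "reading": "とうきょう"},
    {"title": "大阪", "reading": "おおさか"},
]


def sort_by_title(entries: list) -> list:
    # Naive: orders kanji by Unicode code point.
    return [e["title"] for e in sorted(entries, key=lambda e: e["title"])]


def sort_by_reading(entries: list) -> list:
    # Uses the reading metadata, which approximates gojūon order.
    return [e["title"] for e in sorted(entries, key=lambda e: e["reading"])]
```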

Counters

If your prompts contain numeric variables, Japanese TTS may make errors. Japanese appends specific suffixes (counters) to numbers depending on the word being counted; this is analogous to saying "1 loaf of bread" in English. The pronunciation of the number can vary depending on the counter and even the context (for example, 1日 is pronounced "tsuitachi" when referring to the first day of the month, or "ichinichi" when referring to a duration of one day). In general, TTS should handle these issues for you, with one exception: if a space is inserted in your prompt between the number variable and the counter, the TTS will likely make an error, because it will try to read the number separately from the counter.
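The space-before-counter mistake is easy to lint for in Japanese prompt templates. The Python sketch below flags templates where an ASCII or ideographic space (U+3000) follows a numeric placeholder. The template keys and the set of numeric field names are hypothetical inputs.

```python
import re

# A {placeholder} immediately followed by an ASCII or ideographic space.
PLACEHOLDER_THEN_SPACE = re.compile(r"\{(\w+)\}[ \u3000]")


def counter_spacing_issues(templates: dict, numeric_fields: set) -> list:
    """Return keys of templates that put a space between a number and its counter."""
    issues = []
    for key, template in templates.items():
        for m in PLACEHOLDER_THEN_SPACE.finditer(template):
            if m.group(1) in numeric_fields:
                issues.append(key)
                break
    return issues
```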

Things to look out for:

  1. Japanese names are ordered differently from English names. The order is last name + first name, the opposite of English, and Japanese append suffixes to names, most commonly "san." Automating this concatenation for names pulled from contact lists or databases generally works, but can fail in various cases.
  2. Different hierarchy of location in Japan. Although Tokyo is considered a city, it is technically a prefecture, and therefore will map to a "state" in skills that use location. As a result, "what is the weather in Tokyo?" will generate the same response as an American asking "what is the weather in Texas?"
  3. Utterance variants. From a localization perspective, Japanese utterances translated literally from English may not cover how customers actually respond to the corresponding prompts. For example, for the utterance "Call the kitchen," you need both "daidokoro ni kakete" and "daidokoro ni denwa shite" for Alexa to make the call.
  4. Utterance disambiguation. Similarity of golden utterances in Japanese may overlap in different ways than in English. For example, in Japanese, the same verb is used for "play an artist" and "call a contact" ("Taylor Swift wo kakete" = "Play Taylor Swift," while "Taylor Swift ni kakete" = "Call Taylor Swift").
  5. Prompts that expect yes / no answers. For confirmation prompts such as "xxx, (is that) right?", make sure the Japanese prompt lets customers reply yes or no. The literal translation "attemasuka" invites them to say "attemasu" (that's right) instead of "hai" (yes) / "iie" (no), which causes error prompts.
  6. Multiple options. If you provide several choices to the customers, avoid having customers reply in the middle of the options readout.
  7. Error prompts. Because of Japanese cultural norms, every error prompt in Japanese (unlike other languages) should begin with "sumimasen" (sorry), regardless of whether the English has it. The literal translation of "I'm not quite sure how to help you with that" sounds unfriendly or odd. Error messages are particularly difficult because many different cases trigger them, but they need to be appropriate in all of them.
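The "sumimasen" rule is simple enough to enforce automatically on the Japanese prompt set. The Python sketch below lists error prompts that do not start with すみません. The prompt keys and texts are hypothetical examples.

```python
APOLOGY = "すみません"


def missing_apology(error_prompts: dict) -> list:
    """Return keys of Japanese error prompts that don't open with すみません."""
    return [key for key, text in error_prompts.items()
            if not text.lstrip().startswith(APOLOGY)]
```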

Addresses

If your voice output or visuals include the ability to enter addresses, note that Japanese address order starts from postal code, then largest unit to smallest unit. In addition Tokyo is technically a prefecture (equivalent to state) due to its size. As a result, prompts that use city and state information in the US may have unintended results in Japanese (for example, asking for the weather in Tokyo will generate an error prompt asking you to be more specific, in the same way that asking for the weather in Texas would).

Units of measure and time

  • Japanese use the metric system.
  • Japanese use a combination of 12-hour and 24-hour time, depending on the use case. Train times, for example, are typically listed in 24-hour time, but colloquially you are more likely to say "meet me at 3 PM" than "15:00."
  • Japanese represent months and dates with a number and counter, e.g., months are 1月、2月、3月… and dates are 1日、2日、3日…
  • Japan has a parallel year calendar based on the Emperor's reign. For example, 2018 is often represented as "Heisei 30" in official news sources or documentation. When this guide was written (2018), the Emperor was due to retire the following year; the new era name, Reiwa, took effect on May 1, 2019. That said, although iOS has a preference setting to support the Japanese calendar system, Japanese are very comfortable with the Western calendar.
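Converting a Gregorian date to a Japanese era year requires the era start dates, because eras change on a specific day rather than on January 1. The Python sketch below covers the Showa (from December 25, 1926), Heisei (from January 8, 1989), and Reiwa (from May 1, 2019) eras. Each era begins at year 1.

```python
from datetime import date

# Era start dates, newest first.
ERAS = [
    ("Reiwa", date(2019, 5, 1)),
    ("Heisei", date(1989, 1, 8)),
    ("Showa", date(1926, 12, 25)),
]


def japanese_era(d: date) -> tuple:
    """Return (era name, era year) for a Gregorian date."""
    for name, start in ERAS:
        if d >= start:
            # The first (partial) calendar year of an era is year 1.
            return name, d.year - start.year + 1
    raise ValueError("dates before the Showa era are not covered by this sketch")
```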

Germany

Alexa uses “Du” to address the customer. She will speak Hochdeutsch and avoid local dialectal expressions. This is the same in Austria as in Germany.

Alexa should pronounce foreign words “like an educated German” would, or as someone who is “clever” and knows modern contemporary language.

Alexa pronounces the following words:

  • Amazon (German pronunciation)
  • Echo (German pronunciation)
  • Echo Go … German pronunciation

