Synchronize Spoken Text with Text on the Screen


Your skill response can associate speech with an APL Text component and issue a command that highlights each line of text as its audio plays. This creates a "karaoke" effect that shows which lines of a block of text are currently in focus.

To use this feature, you must provide speech data as plain text or as text marked up with Speech Synthesis Markup Language (SSML). Before an Alexa-enabled device can consume this data, it must be transformed into speech. To enable this transformation, use the ssmlToSpeech transformer to convert the text to speech, and the ssmlToText transformer to strip the SSML tags from an SSML expression. These transformers cannot be used with the audio tag.

ssmlToSpeech and ssmlToText transformers

Property | Type | Required | Description
--- | --- | --- | ---
transformer | enum (ssmlToSpeech, ssmlToText) | Yes | The type of transformation required. Initially, two transformers will be available: 1) ssmlToSpeech converts a data source value to a text-to-speech URL, and 2) ssmlToText converts an SSML expression to plain text by stripping out any SSML tags.
inputPath | string | Yes | The path of the data source value that needs to be transformed.
outputName | string | No | The name of the data source property where the transformed output will be stored. This output property will always be a sibling of the input property. If an outputName isn't provided, the value in the inputPath will be replaced with the output of the transformer.
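
To make the outputName rule concrete, the following Python sketch mimics locally what the two transformers do to the properties of an object data source. It is an illustration only: Alexa performs the real transformation before the data reaches the device, the function name apply_transformers is hypothetical, the speech URL is a made-up placeholder, and inputPath is assumed to be a simple top-level property name.

```python
import re


def apply_transformers(data_source):
    """Return a copy of an object data source with its transformers applied.

    Local illustration only. Alexa performs the real transformation, and the
    speech URL produced here is a placeholder, not a real endpoint.
    """
    original = data_source["properties"]
    properties = dict(original)
    for t in data_source.get("transformers", []):
        # Assumption: inputPath names a top-level property, and every
        # transformer reads the original (untransformed) input value.
        value = original[t["inputPath"]]
        if t["transformer"] == "ssmlToText":
            # Strip anything that looks like an SSML tag.
            result = re.sub(r"<[^>]+>", "", value)
        elif t["transformer"] == "ssmlToSpeech":
            # Placeholder standing in for the generated text-to-speech URL.
            result = "https://example.com/placeholder-speech.mp3"
        else:
            raise ValueError("unknown transformer: " + t["transformer"])
        # outputName is optional; without it the input value is replaced.
        properties[t.get("outputName", t["inputPath"])] = result
    return {"type": data_source["type"], "properties": properties}
```

With an outputName, the result is stored in a new sibling property and the SSML input is kept. Without one, the SSML input itself is overwritten by the transformer output.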

The following sample APL document shows a version of a "Cat Facts" skill that associates speech with a Text component bound to a cat fact. The Text component is wrapped in a ScrollView component. This means the device will automatically scroll to the parts of the cat fact that aren't visible on screen as they are spoken.

Part of an APL document that shows a Text component that binds to speech

{
    "type": "ScrollView",
    "item": {
        "type": "Text",
        "id": "catFactText",
        "text": "${catFactData.properties.catFact}",
        "speech": "${catFactData.properties.catFactSpeech}"
    }
}

The following sample shows the corresponding object data source and transformers that the skill sends with the document.

Object data source and transformer bound to the APL document

{
 "datasources": {
  "catFactData": {
   "type": "object",
   "properties": {
    "backgroundImage": "https://.../catfacts.png",
    "title": "Cat Fact #9",
    "logoUrl": "https://.../logo.png",
    "image": "https://.../catfact9.png",
    "catFactSsml": "<speak>Not all cats like <emphasis level='strong'>catnip</emphasis>.</speak>"
   },
   "transformers": [{
     "inputPath": "catFactSsml",
     "outputName": "catFactSpeech",
     "transformer": "ssmlToSpeech"
    },
    {
     "inputPath": "catFactSsml",
     "outputName": "catFact",
     "transformer": "ssmlToText"
    }
   ]
  }
 }
}
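
In skill code, a data source like this is usually assembled at runtime rather than written by hand. The sketch below shows one way to do that with plain Python dictionaries. The helper name build_cat_fact_datasource is hypothetical, and the image and logo properties are left out because their URLs are elided in the sample above.

```python
def build_cat_fact_datasource(title, ssml):
    """Build the datasources object for a single cat fact.

    Both transformers read the same SSML input and write their results to
    sibling properties, matching the names the Text component binds to.
    """
    return {
        "catFactData": {
            "type": "object",
            "properties": {
                "title": title,
                "catFactSsml": ssml,
            },
            "transformers": [
                {
                    "inputPath": "catFactSsml",
                    "outputName": "catFactSpeech",
                    "transformer": "ssmlToSpeech",
                },
                {
                    "inputPath": "catFactSsml",
                    "outputName": "catFact",
                    "transformer": "ssmlToText",
                },
            ],
        }
    }
```

Keeping the SSML in a single property and deriving both the speech and the display text from it means the spoken and displayed versions of the fact cannot drift apart.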

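The following snippet shows the transformed data source that is sent to the device. The transformers array has been consumed, and each transformer output appears as a sibling of catFactSsml.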

Transformed data source received by the device

{
    "datasources": {
        "catFactData": {
            "type": "object",
            "properties": {
                "backgroundImage": "https://.../catfacts.png",
                "title": "Cat Fact #9",
                "logoUrl": "https://.../logo.png",
                "image": "https://.../catfact9.png",
                "catFactSsml": "<speak>Not all cats like <emphasis level='strong'>catnip</emphasis>.</speak>",
                "catFactSpeech": "https://tinyurl.amazon.com/aaaaaa/catfact.mp3",
                "catFact": "Not all cats like catnip."
            }
        }
    }
}

To read the cat fact aloud, send the Alexa.Presentation.APL.ExecuteCommands directive with a SpeakItem command, as the next snippet shows. The token supplied in the ExecuteCommands directive is required and must match the token that the skill provided in the RenderDocument directive used to render the APL document.

An Alexa.Presentation.APL.ExecuteCommands skill directive with a SpeakItem command

{
    "type" : "Alexa.Presentation.APL.ExecuteCommands",
    "token": "[SkillProvidedToken]",
    "commands": [{
        "type": "SpeakItem",
        "componentId" : "catFactText"
    }]
}
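
Because the two tokens must match, it can help to build both directives from a single token value. The following Python sketch does this with plain dictionaries and no SDK. The function name build_directives and the example token string are hypothetical; the document and datasources arguments stand for the APL document and data source shown earlier.

```python
def build_directives(document, datasources, token, component_id):
    """Return a RenderDocument directive and a matching ExecuteCommands directive.

    The same token is placed in both directives so that the SpeakItem
    command targets the document that was just rendered.
    """
    render = {
        "type": "Alexa.Presentation.APL.RenderDocument",
        "token": token,
        "document": document,
        "datasources": datasources,
    }
    speak = {
        "type": "Alexa.Presentation.APL.ExecuteCommands",
        "token": token,
        "commands": [
            {"type": "SpeakItem", "componentId": component_id},
        ],
    }
    return [render, speak]
```

For example, build_directives(document, datasources, "catFactToken", "catFactText") yields a pair of directives whose SpeakItem command refers to the Text component by its id.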

Last updated: Nov 28, 2023