Integrate Visual and Audio Responses

With Alexa Presentation Language (APL), you can build a visual response for devices with screens. With APL for audio, you can build a rich audio response that mixes speech with sound effects and other audio. Combine your visual response and your audio response to provide an engaging user experience.

About integrating a visual and audio response

You can combine a visual response and an audio response in two different ways:

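  • Return two RenderDocument directives in the same response – Your skill returns two separate directives in the same response: Alexa.Presentation.APL.RenderDocument for the visual response and Alexa.Presentation.APLA.RenderDocument for the audio response.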
  • Embed the audio in the visual response – Your skill returns a single Alexa.Presentation.APL.RenderDocument that includes both the APL document for the visual response and the APL for audio document for the audio response. You then manually invoke the audio with an APL command.

For details, see the following sections:

Return two RenderDocument directives in the same response

When you provide the RenderDocument directives for both visual and audio responses in a single response, Alexa displays the content on the screen, speaks any provided output speech, and then plays the APL for audio response. The normal speech output and the audio response never overlap.

To return the RenderDocument directive with both an audio and visual response

  1. Build the APL document for your visual response and the APL for audio document for your audio response.
  2. In your skill code, add both the Alexa.Presentation.APL.RenderDocument and Alexa.Presentation.APLA.RenderDocument to the directives array.

    The order of the directives in the directives array doesn't matter.

For example, the following APL document and data source display the AlexaHeadline template with a "hello world" welcome message.

An APL document that displays a welcome message

Assume you want to display this content and also play audio that combines speech and background sound effects. The following APL for audio document and data source provide this audio.

To send both of these examples at the same time, include the RenderDocument directive from both the APL and the APLA interfaces in your skill response.

For example, a response can add the two RenderDocument directives to the directives array and also set the outputSpeech property. In this case, Alexa displays the content on the screen, speaks the outputSpeech, and then plays the APL for audio response.
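As a rough sketch of that response, assuming a Node.js skill written in TypeScript that builds the raw response JSON by hand instead of using the ASK SDK, the following function puts both RenderDocument directives in the directives array next to outputSpeech. The tokens and the document names helloWorldDocument and helloWorldAudio are hypothetical; substitute the names of the documents you saved in the authoring tool.

```typescript
// Minimal shapes for the parts of a skill response this sketch uses.
interface LinkDocument {
  type: "Link";
  src: string;
}

interface RenderDocumentDirective {
  type: "Alexa.Presentation.APL.RenderDocument" | "Alexa.Presentation.APLA.RenderDocument";
  token: string;
  document: LinkDocument;
  datasources?: Record<string, unknown>;
}

interface SkillResponse {
  version: "1.0";
  response: {
    outputSpeech?: { type: "PlainText"; text: string };
    directives: RenderDocumentDirective[];
  };
}

// Sends the visual (APL) and audio (APLA) RenderDocument directives together.
function buildVisualAndAudioResponse(speech: string): SkillResponse {
  const visual: RenderDocumentDirective = {
    type: "Alexa.Presentation.APL.RenderDocument",
    token: "helloWorldToken", // hypothetical token
    // hypothetical document name saved in the authoring tool
    document: { type: "Link", src: "doc://alexa/apl/documents/helloWorldDocument" },
  };
  const audio: RenderDocumentDirective = {
    type: "Alexa.Presentation.APLA.RenderDocument",
    token: "helloWorldAudioToken", // hypothetical token
    // hypothetical APL for audio document name
    document: { type: "Link", src: "doc://alexa/apla/documents/helloWorldAudio" },
  };
  return {
    version: "1.0",
    response: {
      // Alexa speaks this first, then plays the APL for audio response.
      outputSpeech: { type: "PlainText", text: speech },
      // The order of the two directives doesn't matter.
      directives: [visual, audio],
    },
  };
}

const exampleResponse = buildVisualAndAudioResponse("Welcome to APL!");
console.log(JSON.stringify(exampleResponse, null, 2));
```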

Embed the audio in the visual response

You can embed the audio response in the APL document for the visual response. Your skill returns a single Alexa.Presentation.APL.RenderDocument directive, and you manually invoke the audio with the APL SpeakItem or SpeakList command.

This option lets you more precisely synchronize the audio with your visual content. For example, Alexa can play your audio for each item in a list and automatically scroll and highlight each list item. Embedding the audio is also useful if you want to play the audio in response to a user event, such as the user tapping a button on the screen.

Because the audio plays in response to the command, the audio can start playing before Alexa finishes saying the normal output speech. You can avoid this overlap in different ways. For example, you can include all the relevant speech for the response in the APL for audio document and omit the outputSpeech property.

Embedding the audio requires multiple steps. You use the aplAudioToSpeech transformer to convert the APL for audio document to an audio file, and then you bind a reference to that audio file to a component in your APL document.

To embed the audio in the visual response and invoke it from the document

  1. Build an APL document for your visual response and an APL for audio document for your audio response.
  2. In the data source for the APL document, configure the aplAudioToSpeech transformer.

    The transformer converts the APL for audio document to an audio file you can invoke with an APL command. For details, see Configure the aplAudioToSpeech transformer.

  3. Bind the transformer output to the speech property on a component in your APL document. For details, see Bind the transformer output to a component.
  4. Run either the SpeakItem or SpeakList APL command and target the component with the speech property set.

    You can start these commands in several different ways, such as in response to the user tapping the screen, or when the document initially displays. For details, see Run the SpeakItem or SpeakList command to invoke the audio.

  5. In your skill code, return the Alexa.Presentation.APL.RenderDocument directive. Include the APL for audio document in the same directive in the sources property.

    For details, see Include the audio response as part of the RenderDocument directive.

The following sections provide more details about each of these steps.

Build the APL and APL for audio documents

Build the APL and APL for audio documents. You can save the documents in the authoring tool and then link to them from your code when you send the RenderDocument directive, or you can include the full JSON of each document in your skill code.

Configure the aplAudioToSpeech transformer

The aplAudioToSpeech transformer converts your APL for audio document into an audio file you can reference within your APL document. You include the transformer in a transformers array in the data source for your APL document.

A transformer converts data you provide in your data source and then writes the output to a property in the same data source.

For the aplAudioToSpeech transformer, you provide:

  • template – The name of an APL for audio document to convert.
  • outputName – The name of the property where the transformer stores the URL to the converted audio file.
  • inputPath – (optional) A property in the data source that contains data to use in the APL for audio document. Use this property to create audio for each item in an array. For an example, see Play the audio for each item in a list.


{
  "transformers": [
    {
      "template": "helloWorldEmbedAPLAudioExampleAudio",
      "transformer": "aplAudioToSpeech",
      "outputName": "welcomeSpeech"
    }
  ]
}

To use a transformer, you must define the data source as an object data source by setting the type property to object. Define any properties you want to convert with the transformer within a properties object.

The following example shows a valid data source that you could use with the "hello world" document shown earlier. The aplAudioToSpeech transformer converts the APL for audio document called helloWorldEmbedAPLAudioExampleAudio to an audio file and stores the URL for this audio file in the property helloWorldData.properties.welcomeSpeech.url.


{
  "helloWorldData": {
    "type": "object",
    "objectId": "helloWorldSample",
    "properties": {
      "headerTitle": "Example: Invoke an audio response from the visual response",
      "primaryText": {
        "type": "PlainText",
        "text": "Welcome to APL, with an audio response!"
      },
      "secondaryText": {
        "type": "PlainText",
        "text": "This example embeds the APL for audio response in an APL document."
      },
      "welcomeText": {
        "contentType": "SSML",
        "textToSpeak": "<speak><amazon:emotion name='excited' intensity='medium'>Welcome to APL!</amazon:emotion> This example integrates the APL for audio response in the APL document. To do this, use the APL audio to speech transformer to create the audio clip. Next, bind the output of the transformer to the speech property on a component in the APL document. Finally, invoke the audio with the SpeakItem command. This example runs the command from the onMount handler, so the command runs when the document displays.</speak>"
      }
    },
    "transformers": [
      {
        "template": "helloWorldEmbedAPLAudioExampleAudio",
        "transformer": "aplAudioToSpeech",
        "outputName": "welcomeSpeech"
      }
    ]
  }
}
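Before sending a data source like the one above, you could run a quick local check. The following TypeScript sketch encodes only the rules stated in this section (an object type, a properties object, and the template and outputName fields on aplAudioToSpeech). The function name checkTransformerDataSource is hypothetical, and passing the check doesn't guarantee that Alexa accepts the data source.

```typescript
// One entry in a data source's transformers array.
interface TransformerConfig {
  transformer: string;
  template?: string;
  outputName?: string;
  inputPath?: string;
}

interface DataSource {
  type?: string;
  objectId?: string;
  properties?: Record<string, unknown>;
  transformers?: TransformerConfig[];
}

// Checks only the rules described in this section. An empty result means
// no problems were found, not that Alexa will accept the data source.
function checkTransformerDataSource(ds: DataSource): string[] {
  const problems: string[] = [];
  if (ds.type !== "object") {
    problems.push('type must be "object" to use a transformer');
  }
  if (ds.properties === undefined) {
    problems.push("properties object is missing");
  }
  for (const t of ds.transformers ?? []) {
    if (t.transformer !== "aplAudioToSpeech") continue;
    if (!t.template) problems.push("aplAudioToSpeech needs a template");
    if (!t.outputName) problems.push("aplAudioToSpeech needs an outputName");
  }
  return problems;
}

const helloWorldData: DataSource = {
  type: "object",
  objectId: "helloWorldSample",
  properties: { headerTitle: "Example: Invoke an audio response from the visual response" },
  transformers: [
    {
      template: "helloWorldEmbedAPLAudioExampleAudio",
      transformer: "aplAudioToSpeech",
      outputName: "welcomeSpeech",
    },
  ],
};

console.log(checkTransformerDataSource(helloWorldData)); // []
```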

After the aplAudioToSpeech transformer runs, the properties object in the data source contains an additional property, welcomeSpeech, with the results. This object has a url property that contains the URL of the generated audio file.

Bind the transformer output to a component

To use the transformer output in your APL document, use an APL data-binding expression to bind the output to the speech property of a component. Bind to the url property of the object named by outputName, such as welcomeSpeech.url in this example.


{
  "type": "AlexaHeadline",
  "id": "helloWorldHeadline",
  "headerTitle": "${payload.helloWorldData.properties.headerTitle}",
  "primaryText": "${payload.helloWorldData.properties.primaryText.text}",
  "secondaryText": "${payload.helloWorldData.properties.secondaryText.text}",
  "speech": "${payload.helloWorldData.properties.welcomeSpeech.url}"
}
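The binding expression follows the pattern payload.<data source name>.properties.<outputName>.url. The following TypeScript sketch builds that string and then resolves it against a hypothetical transformed payload with a deliberately simplified path lookup. APL's real data-binding evaluator does much more than plain property access, and the URL shown is a placeholder, not real transformer output.

```typescript
// Builds the binding expression for the URL written by aplAudioToSpeech.
function speechUrlBinding(dataSourceName: string, outputName: string): string {
  // Plain string concatenation keeps "${" literal instead of
  // triggering TypeScript template-literal interpolation.
  return "${payload." + dataSourceName + ".properties." + outputName + ".url}";
}

// Simplified dotted-path lookup, for illustration only.
function resolvePath(root: unknown, path: string): unknown {
  let current: unknown = root;
  for (const key of path.split(".")) {
    if (current === null || typeof current !== "object") return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

const payload = {
  helloWorldData: {
    properties: {
      // Hypothetical URL standing in for the transformer output.
      welcomeSpeech: { url: "https://example.com/hypothetical-welcome.mp3" },
    },
  },
};

const expression = speechUrlBinding("helloWorldData", "welcomeSpeech");
// Strip the leading "${payload." and the trailing "}" to get the path.
const path = expression.slice("${payload.".length, -1);
console.log(expression); // ${payload.helloWorldData.properties.welcomeSpeech.url}
console.log(resolvePath(payload, path)); // https://example.com/hypothetical-welcome.mp3
```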

Run the SpeakItem or SpeakList command to invoke the audio

To play the audio, run the SpeakItem or SpeakList command and target the component with the speech property.

The following list describes how to start the command for each way of playing the audio:

  • When the document displays – Invoke SpeakItem or SpeakList from the onMount handler of the document or component.

    In this scenario, the audio begins to play when the document displays, even if Alexa is still speaking the outputSpeech in the response. To prevent overlapping speech, don't include outputSpeech in the response.

  • When the user selects a component on the screen – Invoke the SpeakItem or SpeakList command from a handler on the component the user can tap:
    • For a responsive component, such as AlexaButton, set the primaryAction property to the SpeakItem or SpeakList command.
    • For a custom layout built with APL components, set the onPress handler on a touchable component, such as TouchWrapper, to the SpeakItem or SpeakList command.

    For an example, see Play the audio in response to user interactions. For a good user experience, also let users select buttons and other touchable items by voice.

  • When the user makes a request by voice – Create an intent in your interaction model to capture the request. In the handler for this intent, return the ExecuteCommands directive with the SpeakItem or SpeakList command.

    For details, see Play the audio in response to user interactions.

The following example plays the speech bound to the speech property on the component with the ID helloWorldHeadline when the document displays.


{
  "onMount": [
    {
      "type": "SpeakItem",
      "componentId": "helloWorldHeadline"
    }
  ]
}

Include the audio response as part of the RenderDocument directive

To use the embedded audio, return the Alexa.Presentation.APL.RenderDocument directive that includes both the APL document and the APL for audio document:

  • Set the document property to the APL document. You can set document to either a link to the document saved in the authoring tool, or to the full JSON for the document.
  • Set the sources property to a string/object map. Within this map, set the string to a name for the APL for audio document and set the object to the document.
    • The name must match the string you used for the template property in the aplAudioToSpeech transformer.
    • You can provide the document as either a link to the document saved in the authoring tool, or the full JSON for the document.

      The following example shows the sources map with one source called helloWorldEmbedAPLAudioExampleAudio. This example assumes you saved the APL for audio document in the authoring tool with the name "helloWorldEmbedAPLAudioExampleAudio".

      {
        "sources": {
          "helloWorldEmbedAPLAudioExampleAudio": {
            "src": "doc://alexa/apla/documents/helloWorldEmbedAPLAudioExampleAudio",
            "type": "Link"
          }
        }
      }
      
  • Set the datasources property to a string/object map containing the data sources you use in both the APL and APL for audio documents.
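Because each sources key must match a transformer template name, it can help to compare the two before returning the directive. The following TypeScript sketch, with the hypothetical helper findMissingSources, lists every aplAudioToSpeech template that has no matching entry in sources.

```typescript
interface TransformerEntry {
  transformer: string;
  template?: string;
}

interface DataSourceWithTransformers {
  transformers?: TransformerEntry[];
}

// Returns every aplAudioToSpeech template name that has no matching key
// in the RenderDocument sources map.
function findMissingSources(
  datasources: Record<string, DataSourceWithTransformers>,
  sources: Record<string, unknown>,
): string[] {
  const missing: string[] = [];
  for (const name of Object.keys(datasources)) {
    for (const t of datasources[name].transformers ?? []) {
      const isAudio = t.transformer === "aplAudioToSpeech";
      if (isAudio && t.template !== undefined && !(t.template in sources)) {
        missing.push(t.template);
      }
    }
  }
  return missing;
}

const datasources: Record<string, DataSourceWithTransformers> = {
  helloWorldData: {
    transformers: [{ transformer: "aplAudioToSpeech", template: "helloWorldEmbedAPLAudioExampleAudio" }],
  },
};

const sources: Record<string, unknown> = {
  helloWorldEmbedAPLAudioExampleAudio: {
    src: "doc://alexa/apla/documents/helloWorldEmbedAPLAudioExampleAudio",
    type: "Link",
  },
};

console.log(findMissingSources(datasources, sources)); // []
```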

The response for this example sends a single RenderDocument directive that includes both the APL document and the APL for audio document. The response doesn't include outputSpeech because the APL document uses the onMount handler to start the audio. If the response did include outputSpeech, the speech and audio would overlap.
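A minimal TypeScript sketch of such a response follows, assuming the skill builds the response JSON by hand. The token and the APL document name helloWorldEmbedAPLAudioExample are hypothetical; the sources key and link reuse the names from the sources example above.

```typescript
interface LinkDocument {
  type: "Link";
  src: string;
}

interface AplRenderDocument {
  type: "Alexa.Presentation.APL.RenderDocument";
  token: string;
  document: LinkDocument;
  // Maps each APL for audio document name to the document itself.
  sources: Record<string, LinkDocument>;
  datasources: Record<string, unknown>;
}

type EmbeddedAudioResponse = {
  version: "1.0";
  response: { directives: AplRenderDocument[] };
};

function buildEmbeddedAudioResponse(datasources: Record<string, unknown>): EmbeddedAudioResponse {
  const directive: AplRenderDocument = {
    type: "Alexa.Presentation.APL.RenderDocument",
    token: "helloWorldEmbedToken", // hypothetical token
    // hypothetical APL document name saved in the authoring tool
    document: { type: "Link", src: "doc://alexa/apl/documents/helloWorldEmbedAPLAudioExample" },
    sources: {
      // Key must match the template property of the aplAudioToSpeech transformer.
      helloWorldEmbedAPLAudioExampleAudio: {
        type: "Link",
        src: "doc://alexa/apla/documents/helloWorldEmbedAPLAudioExampleAudio",
      },
    },
    datasources,
  };
  // outputSpeech is deliberately omitted: the document's onMount handler
  // starts the audio, so speech here would overlap with it.
  return { version: "1.0", response: { directives: [directive] } };
}

const embedded = buildEmbeddedAudioResponse({ helloWorldData: { type: "object", properties: {} } });
console.log(JSON.stringify(embedded, null, 2));
```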

Play audio for each item in a list

The APL SpeakList command plays speech associated with each item in a list, such as a Sequence. The list automatically scrolls each item into view and can highlight the item by changing the item appearance for a karaoke effect. You can use SpeakList with APL for audio to play an audio clip for each item in a list.

The overall steps are the same as described in Embed the audio in the visual response. You configure the aplAudioToSpeech transformer, bind the transformer output to the list items, and then invoke the SpeakList command. When you configure the transformer, you configure it to process an array of items and create a clip for each item instead of generating a single clip.

To play audio for each item in the list

  1. Build an APL document that uses a multi-child component, such as Sequence or GridSequence, to display your list. Include the id property on the component. For an example, see Build an APL document and data source to display a list.
  2. Create a data source with an array of the list items to display.
    • Use an object data source, and put the data array within the properties object.
    • In your document, bind the data array to the data property of the Sequence or GridSequence.

    For an example, see Build an APL document and data source to display a list.

  3. Set the inputPath property on the aplAudioToSpeech transformer to refer to the path to your data array with your list items. For details, see Configure the aplAudioToSpeech transformer to process an array of items.
  4. Build an APL for audio document that plays the audio you want for a single list item. Use data binding to access the data from the array of list items. For details, see Build the APL for audio document.
  5. Bind the output of the transformer to the child component of the Sequence or GridSequence. For details, see Bind the transformer output to the Sequence child component.
  6. Run the SpeakList command and target the Sequence or GridSequence. For details, see Run the SpeakList command and configure the karaoke style.
  7. In your skill code, return the Alexa.Presentation.APL.RenderDocument directive. Include the APL for audio document in the same directive in the sources property.

Build an APL document and data source to display a list

The following examples show a Sequence that displays a list of sound names. The data to display in the list comes from the array listOfSounds in the listOfSoundsData data source. When this document displays, the Text component in the Sequence displays one time for each item in data.


{
  "type": "Sequence",
  "height": "100%",
  "width": "100%",
  "id": "listOfSoundsSequence",
  "numbered": true,
  "padding": [
    "@marginHorizontal",
    "@spacingLarge"
  ],
  "items": [
    {
      "type": "Text",
      "text": "${ordinal}. ${data.name}",
      "spacing": "@spacingLarge"
    }
  ],
  "data": "${payload.listOfSoundsData.properties.listOfSounds}"
}

Use an object type data source and put the array within the properties object in the data source so that the transformer can access the array later. The following example shows this data source with the first four list items.


{
  "listOfSoundsData": {
    "type": "object",
    "objectId": "speakListAplForAudioExample",
    "properties": {
      "listOfSounds": [
        {
          "audioUrl": "soundbank://soundlibrary/animals/amzn_sfx_bird_chickadee_chirp_1x_01",
          "duration": 0.63,
          "name": "Bird Chickadee Chirp 1x (1)"
        },
        {
          "audioUrl": "soundbank://soundlibrary/animals/amzn_sfx_bird_forest_01",
          "duration": 2.9,
          "name": "Bird Forest (1)"
        },
        {
          "audioUrl": "soundbank://soundlibrary/animals/amzn_sfx_bird_robin_chirp_1x_01",
          "duration": 0.67,
          "name": "Bird Robin Chirp 1x (1)"
        },
        {
          "audioUrl": "soundbank://soundlibrary/animals/amzn_sfx_raven_caw_1x_01",
          "duration": 0.63,
          "name": "Raven Caw 1x (1)"
        }
      ]
    }
  }
}
A Sequence component displays a list of items from a data array

Configure the aplAudioToSpeech transformer to process an array of items

To create an audio clip for each item in an array, you use the inputPath property on the aplAudioToSpeech transformer. The inputPath property specifies the path to an object in the data source that contains data to use in the APL for audio document.

Set the inputPath to the same array you're displaying in the Sequence. Set the template to the name of the APL for audio document, and the outputName to the property to hold the transformer output.

In the following example, Alexa uses the provided template (soundItemToPlay) to generate an audio clip for each item in the listOfSounds array.

{
  "transformers": [
    {
      "transformer": "aplAudioToSpeech",
      "template": "soundItemToPlay",
      "outputName": "speech",
      "inputPath": "listOfSounds.*"
    }
  ]
}

Build the APL for audio document

The APL for audio document that you build should play the audio for a single item in the list. The transformer generates a separate audio clip for each item in the inputPath array using the APL for audio document as a template.

In the APL for audio document, you can use data binding to access the data for an array item. Use the expression ${payload.data} to access this data.

The following example shows a document with a Sequencer component that speaks the sound name, followed by audio of the sound effect. The values for ${payload.data.name} and ${payload.data.audioUrl} come from the array referenced in the inputPath property — in this example, the listOfSounds property shown earlier.


{
  "type": "APLA",
  "version": "0.91",
  "mainTemplate": {
    "parameters": [
      "payload"
    ],
    "item": {
      "type": "Sequencer",
      "items": [
        {
          "type": "Speech",
          "contentType": "SSML",
          "content": "${payload.data.name}"
        },
        {
          "type": "Silence",
          "duration": 500
        },
        {
          "type": "Audio",
          "source": "${payload.data.audioUrl}"
        }
      ]
    }
  }
}
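To make the per-item binding concrete, here is a deliberately simplified TypeScript stand-in for what data binding does with one array item: it replaces each ${payload.data.<field>} placeholder with the matching field of the item. This is not how Alexa evaluates APL for audio expressions, which supports full expressions; the bindItem helper is hypothetical and models only the plain-property case used in this document.

```typescript
type SoundItem = { name: string; audioUrl: string; duration: number };

// Simplified substitution: replaces ${payload.data.<field>} with the
// item's field and leaves unknown fields untouched.
function bindItem(template: string, item: SoundItem): string {
  return template.replace(/\$\{payload\.data\.(\w+)\}/g, (match: string, field: string) =>
    field in item ? String(item[field as keyof SoundItem]) : match,
  );
}

// First item of the listOfSounds array shown earlier.
const firstSound: SoundItem = {
  audioUrl: "soundbank://soundlibrary/animals/amzn_sfx_bird_chickadee_chirp_1x_01",
  duration: 0.63,
  name: "Bird Chickadee Chirp 1x (1)",
};

// The Speech content and the Audio source from the APL for audio document.
console.log(bindItem("${payload.data.name}", firstSound)); // Bird Chickadee Chirp 1x (1)
console.log(bindItem("${payload.data.audioUrl}", firstSound));
```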

When the aplAudioToSpeech transformer runs, it does the following for each item in the listOfSounds array:

  1. Replaces the expressions ${payload.data.name} and ${payload.data.audioUrl} with the values from the item in listOfSounds.
  2. Creates an audio clip based on the APL for audio document. In this example, the clip speaks the name of the sound, followed by a sample of the sound itself.
  3. Adds a new object to the array item, in a property named by outputName (speech in this example). This object has a url property with the URL of the generated sound clip. The following example shows the transformer output for the first item in the list:

     {
       "duration": 0.63,
       "audioUrl": "soundbank://soundlibrary/animals/amzn_sfx_bird_chickadee_chirp_1x_01",
       "speech": {
         "url": "https://tinyaudio.amazon.com/ext/v1/apl/audio/AYADeIg.../resource.mp3"
       },
       "name": "Bird Chickadee Chirp 1x (1)"
     }
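The following TypeScript sketch models the shape of the transformer output described in these steps, assuming outputName is speech as in this example. The real transformer renders audio and returns Alexa-hosted URLs; hypotheticalClipUrl is a made-up placeholder so the example can run locally.

```typescript
type SoundItem = { name: string; audioUrl: string; duration: number };
// outputName is "speech" in this example, so each item gains a speech object.
type SoundItemWithSpeech = SoundItem & { speech: { url: string } };

// Hypothetical placeholder for the clip URL the real transformer returns.
function hypotheticalClipUrl(index: number): string {
  return `https://example.com/hypothetical-clip-${index}.mp3`;
}

// One clip per array item; the original fields are kept unchanged.
function attachSpeech(items: SoundItem[], makeUrl: (index: number) => string): SoundItemWithSpeech[] {
  return items.map((item, index) => ({ ...item, speech: { url: makeUrl(index) } }));
}

const listOfSounds: SoundItem[] = [
  {
    audioUrl: "soundbank://soundlibrary/animals/amzn_sfx_bird_chickadee_chirp_1x_01",
    duration: 0.63,
    name: "Bird Chickadee Chirp 1x (1)",
  },
  {
    audioUrl: "soundbank://soundlibrary/animals/amzn_sfx_bird_forest_01",
    duration: 2.9,
    name: "Bird Forest (1)",
  },
];

const transformed = attachSpeech(listOfSounds, hypotheticalClipUrl);
console.log(transformed[0].speech.url); // https://example.com/hypothetical-clip-0.mp3
```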
    

Bind the transformer output to the Sequence child component

In the APL document, bind the speech property of the Sequence or GridSequence child component to the transformer output. Make sure you set the speech property on the child component, not on the Sequence itself.

The following example shows the earlier Sequence component. The data property is bound to the listOfSounds array in the data source. The speech property of the Text component is bound to the output of the aplAudioToSpeech transformer.


{
  "type": "Sequence",
  "height": "100%",
  "width": "100%",
  "id": "listOfSoundsSequence",
  "numbered": true,
  "padding": [
    "@marginHorizontal",
    "@spacingLarge"
  ],
  "items": [
    {
      "type": "Text",
      "text": "${ordinal}. ${data.name}",
      "spacing": "@spacingLarge",
      "speech": "${data.speech.url}"
    }
  ],
  "data": "${payload.listOfSoundsData.properties.listOfSounds}"
}
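The following TypeScript sketch checks a list component against the requirements in this section: the list component needs an id so that SpeakList can target it, and the speech binding belongs on the child component rather than on the Sequence. The AplComponent type and the checkListSpeechBinding function are hypothetical and cover only the properties used here.

```typescript
type AplComponent = {
  type: string;
  id?: string;
  speech?: string;
  data?: string;
  items?: AplComponent[];
};

// Returns problems with where speech is bound on a Sequence or GridSequence.
function checkListSpeechBinding(list: AplComponent): string[] {
  const problems: string[] = [];
  if (!list.id) {
    problems.push("list component needs an id so SpeakList can target it");
  }
  if (list.speech !== undefined) {
    problems.push(`${list.type} has speech; bind it on the child component instead`);
  }
  const child = list.items?.[0];
  if (!child || child.speech === undefined) {
    problems.push("child component has no speech property");
  }
  return problems;
}

const listOfSoundsSequence: AplComponent = {
  type: "Sequence",
  id: "listOfSoundsSequence",
  data: "${payload.listOfSoundsData.properties.listOfSounds}",
  items: [{ type: "Text", speech: "${data.speech.url}" }],
};

console.log(checkListSpeechBinding(listOfSoundsSequence)); // []
```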

Run the SpeakList command and configure the karaoke style

To play the audio, run the SpeakList command and target the Sequence or GridSequence component. As described in Run the SpeakItem or SpeakList command to invoke the audio, you can invoke the SpeakList command in several different ways. Make sure you target the component for your list (the Sequence or GridSequence).

The following example defines a button that plays the audio for each item in the Sequence.


{
  "type": "AlexaButton",
  "buttonText": "Listen to these sample sounds",
  "alignSelf": "center",
  "spacing": "@spacingMedium",
  "primaryAction": [
    {
      "type": "SpeakList",
      "componentId": "listOfSoundsSequence",
      "start": 0,
      "count": "${payload.listOfSoundsData.properties.listOfSounds.length}",
      "align": "center"
    }
  ]
}
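If you build commands in skill code rather than in the document, small helpers keep start and count consistent. This TypeScript sketch, with hypothetical helper names, creates the same SpeakList command as the button above, plus a variant that plays a single item by setting start to its zero-based index and count to 1.

```typescript
type SpeakListCommand = {
  type: "SpeakList";
  componentId: string;
  start: number;
  // A number, or a data-binding expression string like the one in the button above.
  count: number | string;
  align?: string;
};

// Plays every item in the list, like the AlexaButton primaryAction.
function speakWholeList(componentId: string, count: number | string): SpeakListCommand {
  return { type: "SpeakList", componentId, start: 0, count, align: "center" };
}

// Plays one item; index is zero-based, so the first item is 0.
function speakOneListItem(componentId: string, index: number): SpeakListCommand {
  return { type: "SpeakList", componentId, start: index, count: 1, align: "center" };
}

const playAll = speakWholeList(
  "listOfSoundsSequence",
  "${payload.listOfSoundsData.properties.listOfSounds.length}",
);
const playThird = speakOneListItem("listOfSoundsSequence", 2);
console.log(JSON.stringify(playThird));
// {"type":"SpeakList","componentId":"listOfSoundsSequence","start":2,"count":1,"align":"center"}
```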

Users expect to be able to select individual items within a list. To play the audio for a single list item when the user selects that item, wrap the Text component in a TouchWrapper and set the onPress property to the SpeakItem command. Because the speech property must be on the Sequence child component, which is now the TouchWrapper, set the property on the TouchWrapper instead of the Text component. In this example, the SpeakItem command doesn't need the componentId property because the command targets the component that runs it, the TouchWrapper itself.


{
  "type": "TouchWrapper",
  "spacing": "@spacingLarge",
  "speech": "${data.speech.url}",
  "items": [
    {
      "type": "Text",
      "text": "${ordinal}. ${data.name}"
    }
  ],
  "onPress": [
    {
      "type": "SpeakItem"
    }
  ]
}

The SpeakList command can also highlight each item during the audio. To enable this, add a style that changes the visual appearance of the Sequence child component based on the karaoke state. A component has the karaoke state during the time Alexa plays its speech. Assign the style to the style property on the component.

For example, the following style changes the color of a component to blue when Alexa plays the speech for the component:


{
  "styles": {
    "textStyleListItem": {
      "values": [
        {
          "when": "${state.karaoke}",
          "color": "blue"
        }
      ]
    }
  }
}

The following example shows the TouchWrapper with the Text component for a list item, now with the text style. The karaoke state applies to the component with the speech property, which is the TouchWrapper in this example. To apply this state to the Text component, set inheritParentState to true.


{
  "type": "TouchWrapper",
  "spacing": "@spacingLarge",
  "speech": "${data.speech.url}",
  "items": [
    {
      "type": "Text",
      "text": "${ordinal}. ${data.name}",
      "style": "textStyleListItem",
      "inheritParentState": true
    }
  ],
  "onPress": [
    {
      "type": "SpeakItem"
    }
  ]
}
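The interaction between the karaoke state, inheritParentState, and the style can be modeled in a few lines. The following TypeScript sketch is a simplified model, not APL's style engine: a predicate function stands in for the "${state.karaoke}" expression, the helper names are hypothetical, and the default text color "white" is an assumption for illustration.

```typescript
// A predicate stands in for APL's "when" expression string.
type ComponentState = { karaoke: boolean };
type StyleValue = { when?: (state: ComponentState) => boolean; color?: string };

// Applies each value block whose condition holds, in order;
// a later matching block overrides an earlier one.
function resolveColor(values: StyleValue[], state: ComponentState, fallback: string): string {
  let color = fallback;
  for (const v of values) {
    const applies = v.when === undefined || v.when(state);
    if (applies && v.color !== undefined) {
      color = v.color;
    }
  }
  return color;
}

// The Text component has no speech property of its own, so it sees the
// TouchWrapper's karaoke state only when inheritParentState is true.
function textComponentState(parent: ComponentState, inheritParentState: boolean): ComponentState {
  return inheritParentState ? parent : { karaoke: false };
}

const textStyleListItem: StyleValue[] = [{ when: (s) => s.karaoke, color: "blue" }];
const touchWrapperState: ComponentState = { karaoke: true }; // while Alexa plays this item
const defaultColor = "white"; // hypothetical default text color

console.log(resolveColor(textStyleListItem, textComponentState(touchWrapperState, true), defaultColor)); // blue
console.log(resolveColor(textStyleListItem, textComponentState(touchWrapperState, false), defaultColor)); // white
```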

The following examples show the complete APL document and data source that displays a list of items. The user can select the button to hear all items on the list, or select an individual item to hear a single item.

A Sequence component with an AlexaButton to start the SpeakList command

Return the RenderDocument directive

To use the embedded audio, return the Alexa.Presentation.APL.RenderDocument directive that includes both the APL document and the APL for audio document as described in Include the audio response as part of the RenderDocument directive.

Play the audio in response to user interactions

You can run the SpeakItem or SpeakList commands in response to user interactions, such as when the user taps the screen or makes a request by voice.

The overall steps are the same as described in Embed the audio in the visual response:

  1. Configure the aplAudioToSpeech transformer and bind the transformer output to the speech property of a component.
  2. For a tap event, such as tapping a button, run the SpeakItem or SpeakList command from a handler on the component the user can tap.
  3. For a voice request, create an intent in your interaction model to capture the request. In the handler for this intent, return the ExecuteCommands directive with the SpeakItem or SpeakList command and the ID of the component with the speech property set.

For the best user experience, make your skill respond to both tap events and voice requests. The user can then choose how to interact with your skill.

Respond to tap events with the audio

The following APL document displays a list of sounds, similar to the example in Play audio for each item in a list. This example improves the overall look of the content by using the AlexaTextListItem responsive component instead of a custom component. The primaryAction property on an AlexaTextListItem specifies the command to run when the user taps the list item.

An improved Sequence that uses AlexaTextListItem for the list items. Alexa highlights the item being spoken

Respond to spoken requests with the audio

Users expect to interact with Alexa by voice, even when viewing content on the screen. For example, when viewing a list of items, the user might want to ask for a specific item with an utterance like "select the second one."

Create intents to capture the voice request relevant to your APL content. To respond to an intent with audio embedded in the document, invoke SpeakItem or SpeakList with the Alexa.Presentation.APL.ExecuteCommands directive. When your skill returns the ExecuteCommands directive, Alexa speaks any outputSpeech in the response first, and then invokes the specified commands.
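A minimal TypeScript sketch of an ExecuteCommands response follows, assuming a hand-built response object. The token must match the token of the RenderDocument directive that displayed the document; helloWorldEmbedToken here is a hypothetical value, and the component ID helloWorldHeadline comes from the earlier example.

```typescript
type SpeakItemCommand = { type: "SpeakItem"; componentId: string };

type ExecuteCommandsDirective = {
  type: "Alexa.Presentation.APL.ExecuteCommands";
  // Must match the token of the RenderDocument directive that displayed the document.
  token: string;
  commands: SpeakItemCommand[];
};

type ExecuteCommandsResponse = {
  version: "1.0";
  response: {
    outputSpeech?: { type: "PlainText"; text: string };
    directives: ExecuteCommandsDirective[];
  };
};

// Plays the audio bound to the speech property of componentId. Alexa speaks
// any outputSpeech first and then runs the commands.
function playComponentAudio(token: string, componentId: string, speech?: string): ExecuteCommandsResponse {
  const res: ExecuteCommandsResponse = {
    version: "1.0",
    response: {
      directives: [
        { type: "Alexa.Presentation.APL.ExecuteCommands", token, commands: [{ type: "SpeakItem", componentId }] },
      ],
    },
  };
  if (speech !== undefined) {
    res.response.outputSpeech = { type: "PlainText", text: speech };
  }
  return res;
}

const reply = playComponentAudio("helloWorldEmbedToken", "helloWorldHeadline", "Here it is again.");
console.log(JSON.stringify(reply, null, 2));
```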

Continuing the previous example with the list of sounds, add multiple intents to fully voice-enable the visual content:

  • Use the built-in intent AMAZON.SelectIntent to let the user ask for a particular item on the list. The user can say phrases like "select the fourth item." The AMAZON.SelectIntent sends an IntentRequest to your skill that includes information about the selected item.
  • Create a custom intent to let the user ask to read the entire list. For example, this intent might include the utterances "listen to these sample sounds" and "read me this list."
  • Create a custom intent to let the user ask for an item by name, with utterances like "play the bird forest one sound." Use a slot on this intent to collect the name of the sound to play.

For each of these intents, your skill responds with the ExecuteCommands directive and either the SpeakList or SpeakItem command.

Example: Respond to AMAZON.SelectIntent with audio for a single list item

The built-in AMAZON.SelectIntent intent includes a slot called ListPosition, which captures the ordinal position of the list item and converts it to a number. When the user says "select the third one," the ListPosition slot value is 3.

The ListPosition slot can also provide the component ID of the list item when both of the following are true:

  • The item the user asks about is visible on the screen.
  • The component for the item has an ID defined.

You can use this intent and slot to capture the specific item the user requests and respond with ExecuteCommands to play the speech associated with that item.

The following handler runs when both of the following conditions are true:

The handler attempts to get the item the user asked about from the ListPosition slot and responds with ExecuteCommands to play the audio associated with the selected item.
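As an alternative to using the component ID from the slot, the following TypeScript sketch reads the numeric ListPosition value and plays only that item by running SpeakList on the Sequence with start set to the zero-based position and count set to 1. The request types model only the fields this code reads, the helper names are hypothetical, and the token listOfSoundsToken is a placeholder.

```typescript
// Only the parts of an incoming request envelope that this sketch reads.
type Slot = { name: string; value?: string };
type RequestEnvelope = {
  request: {
    type: string;
    intent?: { name: string; slots?: { [slotName: string]: Slot } };
  };
};

type SpeakListCommand = { type: "SpeakList"; componentId: string; start: number; count: number };
type ExecuteCommandsDirective = {
  type: "Alexa.Presentation.APL.ExecuteCommands";
  token: string;
  commands: SpeakListCommand[];
};

function isSelectIntent(envelope: RequestEnvelope): boolean {
  return envelope.request.type === "IntentRequest" && envelope.request.intent?.name === "AMAZON.SelectIntent";
}

// Turns the 1-based ListPosition value into a SpeakList over one item.
// Returns undefined when the slot is empty, not a whole number, or outside the list.
function selectItemDirective(
  envelope: RequestEnvelope,
  token: string,
  listLength: number,
): ExecuteCommandsDirective | undefined {
  const raw = envelope.request.intent?.slots?.["ListPosition"]?.value;
  const position = Number(raw);
  if (raw === undefined || Math.floor(position) !== position || position < 1 || position > listLength) {
    return undefined;
  }
  return {
    type: "Alexa.Presentation.APL.ExecuteCommands",
    token,
    commands: [{ type: "SpeakList", componentId: "listOfSoundsSequence", start: position - 1, count: 1 }],
  };
}

const sample: RequestEnvelope = {
  request: {
    type: "IntentRequest",
    intent: { name: "AMAZON.SelectIntent", slots: { ListPosition: { name: "ListPosition", value: "3" } } },
  },
};

if (isSelectIntent(sample)) {
  console.log(JSON.stringify(selectItemDirective(sample, "listOfSoundsToken", 4)));
}
```

Using SpeakList here also scrolls the selected item into view and applies the karaoke style, because the command targets the same Sequence as the full-list button.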