Alexa Multi-Modal API Overview


The Alexa Voice Service (AVS) provides a collection of APIs for Alexa Built-in devices with visual displays, such as smart TVs, which allow those devices to receive and present Alexa visual responses. Some of these APIs help device makers to publish the visual characteristics and functionality of their visual displays to enable Alexa Skills and services to respond with appropriate visual content for a device display.

To add visual Alexa experiences to your device, you have several options for different levels of implementation:

  • Full APL integration – To add a visual Alexa experience to your device, integrate the APL Core Library and implement an APL View Host in your code. See the APL Core Library repository and documentation to learn how to implement these two requirements.
  • Receive and render APL on a device – To receive and render APL documents on your device screen, implement the Alexa.Presentation.APL namespace.
  • Publish device characteristics to Alexa – To publish information about the visual characteristics of your device and its display for Alexa Skill developers, you can implement Alexa.Display, Alexa.Display.Window, and Alexa.InteractionMode.
  • AVS Device SDK – Simplify your APL implementation by integrating the AVS Device SDK, which includes a sample app for voice-only devices and an IPC Server sample app for smart screen devices. See the AVS Device SDK repository.

Alexa APIs for publishing device characteristics

Use the following set of APIs to publish your device characteristics to Alexa so that Alexa Skill developers can deliver appropriate content experiences to your device. See the reference for each API to learn more about the individual interfaces and their parameters:

  • Alexa.Display: Describes the raw properties of a device display.
  • Alexa.Display.Window: Describes the properties of a window that a device can display.
  • Alexa.Display.Window.WindowState: The Alexa.Display.Window.WindowState event reports the available windows to Alexa. A device then receives APL directives targeted for one of the reported active windows.
  • Alexa.InteractionMode: Describes the supported interaction modes for a device, such as "touch," "keyboard," or "video."

Alexa APIs for rendering visual experiences on devices

Use the following set of APIs to enable your device to render multimodal APL documents. See the reference for each API to learn more about the individual APL interfaces and their parameters:

  • Alexa.Presentation: The Alexa.Presentation namespace extends the Alexa Voice Service (AVS) to include rendering interactive visual experiences on a display.
  • Alexa.Presentation.APL: The Alexa.Presentation.APL namespace contains APIs that render an APL document and report user interactions from the device.
  • Alexa.Presentation.APL.Video: Describes the video codecs that the APL runtime supports for a device.

APL vs. TemplateRuntime

APL replaces the TemplateRuntime Interface for visual experiences on Alexa Built-in devices.

If you already implemented support for the TemplateRuntime API, you also had to implement your own layouts and rendering solutions for presenting TemplateRuntime directive metadata for visual experiences. APL simplifies this implementation by including UI markup with metadata so that you no longer have to implement your own layouts and rendering solutions. For more details, see the APL Core Library repository and documentation.

About viewports

Keep the following guidelines in mind when planning viewport support for your device:

  • TV devices must support a full screen viewport and can optionally support one overlay viewport.
  • Responders aren't required to support non-standard viewports with APL directives.
  • APL responses on non-standard viewports scale to fit the viewport.

APL-enabled Alexa Skills must support all required viewport types as described in the following sections.

TV-based viewport support

APL-enabled Alexa Skills targeting TV screens must support the following required viewport. The overlay viewport is optional:

  • TV Fullscreen (Required): Fullscreen rectangle display, such as all Fire TV devices. 1080p - 960 x 540 (dp), 16 : 9.
  • TV Overlay Landscape (Optional): Rectangle display. Not yet supported by any skill or domain. 1080p - 960 x 200 (dp), 24 : 5.
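
The dp sizes listed above follow from a panel's pixel resolution and its reported density. As a minimal sketch, assuming the common convention that 1 dp equals 1 pixel at 160 dpi and a 1080p panel reported at 320 dpi (the density used in the smart TV example on this page), the following Python snippet reproduces the TV viewport sizes and aspect ratios. The helper names px_to_dp and aspect_ratio are illustrative and aren't part of any Alexa API.

```python
from fractions import Fraction


def px_to_dp(pixels: int, dpi: int) -> int:
    """Convert a pixel length to dp, assuming 1 dp == 1 px at 160 dpi."""
    return round(pixels * 160 / dpi)


def aspect_ratio(width: int, height: int) -> str:
    """Reduce width:height to lowest terms, formatted like the lists above."""
    ratio = Fraction(width, height)
    return f"{ratio.numerator} : {ratio.denominator}"


# Assumed panel: 1920 x 1080 pixels reported at 320 dpi.
fullscreen_dp = (px_to_dp(1920, 320), px_to_dp(1080, 320))
overlay_dp = (px_to_dp(1920, 320), px_to_dp(400, 320))

print(fullscreen_dp, aspect_ratio(*fullscreen_dp))  # (960, 540) 16 : 9
print(overlay_dp, aspect_ratio(*overlay_dp))        # (960, 200) 24 : 5
```

If your device reports a different pixel density, the same pixel resolution maps to a different dp size, so check the resulting dp values against the viewports listed above.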

Hub-based viewport support

APL-enabled Alexa Skills targeting hub displays must support at least the following three required viewports:

  • Hub Round (Required): Round display, such as the Echo Spot. 480 x 480 (dp).
  • Hub Landscape Medium (Required): Rectangle display, such as the first generation Echo Show. 1024 x 600 (dp), ~16 : 9.
  • Hub (Rectangle) Landscape Large (Required): Rectangle display, such as the second generation Echo Show. 1280 x 800 (dp), 16 : 10.
  • Hub (Rectangle) Landscape Small (Optional): Rectangle display, such as the Echo Show 5. 960 x 480 (dp), 2 : 1.
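
To make the required and optional viewports easier to check, you could keep them in a small lookup table. The following Python sketch transcribes the TV and hub viewports from the lists above and reports which required viewports are missing from a set of supported ones. The Viewport class, the VIEWPORTS table, and the missing_required function are hypothetical names chosen for illustration; they aren't part of any Alexa SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Viewport:
    name: str
    width_dp: int
    height_dp: int
    required: bool


# Viewport profiles transcribed from the lists on this page.
VIEWPORTS = {
    "TV": [
        Viewport("TV Fullscreen", 960, 540, True),
        Viewport("TV Overlay Landscape", 960, 200, False),
    ],
    "HUB": [
        Viewport("Hub Round", 480, 480, True),
        Viewport("Hub Landscape Medium", 1024, 600, True),
        Viewport("Hub (Rectangle) Landscape Large", 1280, 800, True),
        Viewport("Hub (Rectangle) Landscape Small", 960, 480, False),
    ],
}


def missing_required(device_class, supported):
    """Return the required viewport names that are absent from `supported`."""
    return [
        viewport.name
        for viewport in VIEWPORTS[device_class]
        if viewport.required and viewport.name not in supported
    ]


print(missing_required("HUB", {"Hub Round", "Hub (Rectangle) Landscape Large"}))
# ['Hub Landscape Medium']
print(missing_required("TV", {"TV Fullscreen"}))
# []
```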

Visual characteristic examples

The following examples show how to communicate visual characteristics to Alexa for two common devices: a Smart Screen device and a smart TV. The first example is for a Smart Screen device, which has a viewing distance interaction mode similar to an Echo Show. The second example is for a smart TV, which has two windows with different interaction modes.

Smart Screen example

Consider an 8.6 inch Smart Screen device with a standard seven foot viewing distance interaction mode, similar to an Echo Show. The following example describes one display, one window, and a single interaction mode for that window:

[
  {
    "type": "AlexaInterface",
    "interface": "Alexa.InteractionMode",
    "version": "1.0",
    "configurations": {
      "interactionModes": [
        {
          "id": "smart_screen",
          "uiMode": "HUB",
          "interactionDistance": {
            "unit": "INCHES",
            "value": 84
          },
          "touch": "SUPPORTED",
          "keyboard": "UNSUPPORTED",
          "video": "SUPPORTED",
          "dialog": "SUPPORTED"
        }
      ]
    }
  },
  {
    "type": "AlexaInterface",
    "interface": "Alexa.Presentation.APL.Video",
    "version": "1.0",
    "configurations": {
      "video": {
        "codecs": [
          "H_264_42",
          "H_264_41"
        ]
      }
    }
  },
  {
    "type": "AlexaInterface",
    "interface": "Alexa.Display.Window",
    "version": "1.0",
    "configurations": {
      "templates": [
        {
          "id": "smartScreenLandscape",
          "type": "STANDARD",
          "configuration": {
            "sizes": [
              {
                "type": "DISCRETE",
                "id": "fullscreen",
                "value": {
                  "unit": "PIXEL",
                  "value": {
                    "width": 1280,
                    "height": 800
                  }
                }
              }
            ],
            "interactionModes": [
              "smart_screen"
            ]
          }
        }
      ]
    }
  },
  {
    "type": "AlexaInterface",
    "interface": "Alexa.Display",
    "version": "1.0",
    "configurations": {
      "display": {
        "type": "PIXEL",
        "touch": [
          "SINGLE"
        ],
        "shape": "RECTANGLE",
        "dimensions": {
          "resolution": {
            "unit": "PIXEL",
            "value": {
              "width": 1280,
              "height": 800
            }
          },
          "physicalSize": {
            "unit": "INCHES",
            "value": {
              "width": 8.6,
              "height": 5.4
            }
          },
          "pixelDensity": {
            "unit": "DPI",
            "value": 160
          },
          "densityIndependentResolution": {
            "unit": "DP",
            "value": {
              "width": 1280,
              "height": 800
            }
          }
        }
      }
    }
  }
]
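
In the example above, the window template refers to its interaction mode by ID ("smart_screen"), so a typo in either place would leave the reference dangling. The following Python sketch shows one way to check that cross-reference before you publish your capabilities. It uses a trimmed copy of the example that keeps only the fields the check reads. The function name undefined_interaction_modes is hypothetical, and this sketch doesn't claim to reproduce any validation that Alexa itself performs.

```python
# Trimmed copy of the Smart Screen example; only the fields the check reads.
smart_screen_capabilities = [
    {
        "interface": "Alexa.InteractionMode",
        "configurations": {
            "interactionModes": [{"id": "smart_screen", "uiMode": "HUB"}]
        },
    },
    {
        "interface": "Alexa.Display.Window",
        "configurations": {
            "templates": [
                {
                    "id": "smartScreenLandscape",
                    "type": "STANDARD",
                    "configuration": {"interactionModes": ["smart_screen"]},
                }
            ]
        },
    },
]


def undefined_interaction_modes(capabilities):
    """Return (template id, mode id) pairs whose mode id is never declared."""
    by_interface = {c["interface"]: c["configurations"] for c in capabilities}
    declared = {
        mode["id"]
        for mode in by_interface["Alexa.InteractionMode"]["interactionModes"]
    }
    return [
        (template["id"], mode_id)
        for template in by_interface["Alexa.Display.Window"]["templates"]
        for mode_id in template["configuration"]["interactionModes"]
        if mode_id not in declared
    ]


print(undefined_interaction_modes(smart_screen_capabilities))  # []
```

An empty list means every window template refers to a declared interaction mode.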

Smart TV example

Consider a smart TV with two different types of windows for displaying content: a partial overlay window covering roughly the lower third of the screen, and a standard, full screen window. Assume a viewing distance of about 10 feet (the example reports 130 inches) with remote keyboard support for both windows. The full screen window can display video, but the overlay can't. In this example, the partial screen window, marked as OVERLAY, overlays other active content on the device, and the full screen window, marked as STANDARD, replaces active content:

[
  {
    "type": "AlexaInterface",
    "interface": "Alexa.InteractionMode",
    "version": "1.0",
    "configurations": {
      "interactionModes": [
        {
          "id": "tv",
          "uiMode": "TV",
          "interactionDistance": {
            "unit": "INCHES",
            "value": 130
          },
          "touch": "UNSUPPORTED",
          "keyboard": "SUPPORTED",
          "video": "SUPPORTED"
        },
        {
          "id": "tv_overlay",
          "uiMode": "TV",
          "interactionDistance": {
            "unit": "INCHES",
            "value": 130
          },
          "touch": "UNSUPPORTED",
          "keyboard": "SUPPORTED",
          "video": "UNSUPPORTED",
          "dialog": "SUPPORTED"
        }
      ]
    }
  },
  {
    "type": "AlexaInterface",
    "interface": "Alexa.Presentation.APL.Video",
    "version": "1.0",
    "configurations": {
      "video": {
        "codecs": [
          "H_264_42",
          "H_264_41"
        ]
      }
    }
  },
  {
    "type": "AlexaInterface",
    "interface": "Alexa.Display.Window",
    "version": "1.0",
    "configurations": {
      "templates": [
        {
          "id": "tvFullscreen",
          "type": "STANDARD",
          "configuration": {
            "sizes": [
              {
                "type": "DISCRETE",
                "id": "fullscreen",
                "value": {
                  "unit": "PIXEL",
                  "value": {
                    "width": 1920,
                    "height": 1080
                  }
                }
              }
            ],
            "interactionModes": [
              "tv"
            ]
          }
        },
        {
          "id": "tvOverlayLandscape",
          "type": "OVERLAY",
          "configuration": {
            "sizes": [
              {
                "type": "DISCRETE",
                "id": "landscapePanel",
                "value": {
                  "unit": "PIXEL",
                  "value": {
                    "width": 1920,
                    "height": 400
                  }
                }
              }
            ],
            "interactionModes": [
              "tv_overlay"
            ]
          }
        }
      ]
    }
  },
  {
    "type": "AlexaInterface",
    "interface": "Alexa.Display",
    "version": "1.0",
    "configurations": {
      "display": {
        "type": "PIXEL",
        "touch": [
          "UNSUPPORTED"
        ],
        "shape": "RECTANGLE",
        "dimensions": {
          "resolution": {
            "unit": "PIXEL",
            "value": {
              "width": 1920,
              "height": 1080
            }
          },
          "physicalSize": {
            "unit": "INCHES",
            "value": {
              "width": 56.7,
              "height": 31.9
            }
          },
          "pixelDensity": {
            "unit": "DPI",
            "value": 320
          },
          "densityIndependentResolution": {
            "unit": "DP",
            "value": {
              "width": 960,
              "height": 540
            }
          }
        }
      }
    }
  }
]
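
Because each window template points to its own interaction mode, you can work out which windows are suitable for a given kind of content. The following Python sketch, using a trimmed copy of the smart TV example, lists the window templates whose interaction modes mark a feature as "SUPPORTED". The function name templates_supporting is hypothetical, and treating a missing field as not supported is an assumption of this sketch rather than documented Alexa behavior.

```python
# Trimmed copy of the smart TV example; only the fields the lookup reads.
tv_capabilities = [
    {
        "interface": "Alexa.InteractionMode",
        "configurations": {
            "interactionModes": [
                {"id": "tv", "keyboard": "SUPPORTED", "video": "SUPPORTED"},
                {
                    "id": "tv_overlay",
                    "keyboard": "SUPPORTED",
                    "video": "UNSUPPORTED",
                    "dialog": "SUPPORTED",
                },
            ]
        },
    },
    {
        "interface": "Alexa.Display.Window",
        "configurations": {
            "templates": [
                {
                    "id": "tvFullscreen",
                    "type": "STANDARD",
                    "configuration": {"interactionModes": ["tv"]},
                },
                {
                    "id": "tvOverlayLandscape",
                    "type": "OVERLAY",
                    "configuration": {"interactionModes": ["tv_overlay"]},
                },
            ]
        },
    },
]


def templates_supporting(capabilities, feature):
    """Return ids of window templates with a mode that marks `feature` SUPPORTED."""
    by_interface = {c["interface"]: c["configurations"] for c in capabilities}
    modes = {
        mode["id"]: mode
        for mode in by_interface["Alexa.InteractionMode"]["interactionModes"]
    }
    return [
        template["id"]
        for template in by_interface["Alexa.Display.Window"]["templates"]
        # Assumption: a missing field or unknown mode id counts as unsupported.
        if any(
            modes.get(mode_id, {}).get(feature) == "SUPPORTED"
            for mode_id in template["configuration"]["interactionModes"]
        )
    ]


print(templates_supporting(tv_capabilities, "video"))     # ['tvFullscreen']
print(templates_supporting(tv_capabilities, "keyboard"))  # ['tvFullscreen', 'tvOverlayLandscape']
```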

Last updated: Nov 27, 2023