APL Visual Context in the Skill Request

The Alexa Presentation Language (APL) visual context provides your skill with information about the content displayed on the screen when the user invokes an intent or triggers a user event. Your skill can use the context to determine the state of on-screen elements, such as which parts of a list are visible on the screen.

About the visual context

The APL visual context is information sent to your skill about the content the user sees on the device. The context provides both structural and semantic information about the content the user sees:

  • Structural: How visual components appear on the screen – For example, there's a picture on the left, some scrolling text on the right, and two buttons under the text.
  • Semantic: What the components represent – For example, the picture is a picture of a box of a particular brand of protein bars, the text describes these protein bars, the left button is a "more information" button and the right button is a "buy now" button.

The APL runtime constructs and reports the structural context. To make the semantic context useful, you provide information in your APL document that describes the meaning of your components. You can define the semantic data for a component in two places:

  • Component id property – The visual context includes the id you provide for a component.
  • Component entities property – The visual context includes the entities you provide for a component.

In the earlier example, the entities for the protein bar picture might contain an identifier for the product. The id values for the two buttons might be buttonTellMeMore and buttonBuyNow.

Visual context in the skill request

A request sent to your skill includes the visual context when the user's device has a screen and the screen is displaying an APL document your skill sent with the RenderDocument directive.

The visual context is available in the Alexa.Presentation.APL property within the top-level context property in the request. The Alexa.Presentation.APL object has the following properties:

  • token (String): The token identifying the document displayed on the device. You define the token in the RenderDocument directive you return to display the document.
  • version (String): The version of the APL runtime that reported the visual context.
  • componentsVisibleOnScreen (Array): Contains the elements that were visible on the screen when the user triggered the request to your skill. For details about the properties within an element, see Context core properties.
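In a skill backend, you typically read these properties from the parsed request JSON. The following Python sketch assumes the request envelope is available as plain dictionaries (for example, from json.loads) rather than as ASK SDK model objects. get_apl_visual_context and find_element_by_id are hypothetical helper names. The sample request is a trimmed-down version of the request shown in this section.

```python
import json
from typing import Any, Optional


def get_apl_visual_context(envelope: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the Alexa.Presentation.APL object from a parsed request, or None
    when the request carries no APL visual context."""
    return envelope.get("context", {}).get("Alexa.Presentation.APL")


def find_element_by_id(elements: list[dict[str, Any]], element_id: str) -> Optional[dict[str, Any]]:
    """Depth-first search of the element hierarchy for an element with this id."""
    for element in elements:
        if element.get("id") == element_id:
            return element
        # An element omits "children" when it has no reported children.
        match = find_element_by_id(element.get("children", []), element_id)
        if match is not None:
            return match
    return None


raw_request = """{"context": {"Alexa.Presentation.APL": {
  "token": "helloworldWithButtonToken",
  "componentsVisibleOnScreen": [{"uid": ":1000", "children": [
    {"id": "fadeHelloTextButton", "uid": ":1002",
     "tags": {"focused": false, "clickable": true}}]}]}}}"""

apl = get_apl_visual_context(json.loads(raw_request))
button = find_element_by_id(apl["componentsVisibleOnScreen"], "fadeHelloTextButton") if apl else None
print(apl["token"] if apl else None, button["uid"] if button else None)
```

The search falls back to an empty list for children because the context omits that property when an element has no reported children.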

The following example shows the visual context when the screen was displaying a touchable element with the ID fadeHelloTextButton.

{
  "version": "1.0",
  "session": {},
  "context": {
    "Viewports": [],
    "Alexa.DataStore.PackageManager": {
      "installedPackages": []
    },
    "Viewport": {},
    "Extensions": {},
    "System": {},
    "Alexa.Presentation.APL": {
      "token": "helloworldWithButtonToken",
      "version": "AriaRuntimeLibrary-2023.2.449.0",
      "componentsVisibleOnScreen": [
        {
          "uid": ":1000",
          "position": "960x480+0+0:0",
          "type": "text",
          "tags": {
            "viewport": {}
          },
          "children": [
            {
              "id": "fadeHelloTextButton",
              "uid": ":1002",
              "position": "273x76+344+360:0",
              "type": "text",
              "tags": {
                "focused": false,
                "clickable": true
              },
              "entities": []
            }
          ],
          "entities": []
        }
      ]
    }
  },
  "request": {}
}

Context core properties

The componentsVisibleOnScreen array organizes the visual elements into a hierarchy. A given element can contain child elements. For example, an element representing a scrolling region of the screen might include one or more child elements that represent the list items displayed within the scrolling region.

Each element in the hierarchy corresponds to a single component in your APL document. However, the context doesn't include all components defined in your APL document. For details, see Rules for generating the element hierarchy.

The following list defines the core properties for an element in the context.

  • children (Array): Array of child elements.
  • entities (Array): Array of entity data copied from the component. Source: the component entities property.
  • id (String): The id of the component. Source: the component id property.
  • position (String): Global position of the element, as seen by a user.
  • role (String): The role or purpose of the component. Source: the component role property.
  • tags (Map): Any number of valid element tags.
  • transform (Array): A visual transformation applied against the position. Source: the component transform property.
  • type (one of: graphic, text, mixed, video, empty): Describes the visual appearance of the element.
  • uid (String): The unique runtime-generated id for the component. Source: generated by the runtime.
  • visibility (Number, default 1): Relative visibility of the element.

To save space, the context omits properties that contain default values. For example, assume the device is displaying the following TouchWrapper component that contains a Text component.

{
  "type": "TouchWrapper",
  "id": "idForTheTouchWrapper",
  "item": [
    {
      "type": "Text",
      "id": "idForTextWrappedInTouchWrapper",
      "text": "Text component wrapped in the TouchWrapper",
      "inheritParentState": true,
      "style": "touchableText"
    }
  ],
  "onPress": []
}

The TouchWrapper is fully visible on the screen when the user invokes an intent that sends a request to your skill. The context reports the element as shown in the following example.

{
  "id": "idForTheTouchWrapper",
  "uid": ":1022",
  "position": "952x51+35+224:0",
  "type": "text",
  "tags": {
    "focused": false,
    "clickable": true
  },
  "entities": []
}

Because the TouchWrapper is fully visible, its visibility property has the default value 1, so Alexa omits the property. The TouchWrapper has no value for the transform property, so Alexa omits that property too. The Text child component for this TouchWrapper doesn't meet any of the requirements for inclusion in the context, so Alexa also omits the children property.


children

An array containing the elements that fall logically under this element. For example, a scrolling list might contain multiple child elements in the list. The element defines the order of the children in the array.

An element in the context omits the children property when the element has no reported children.

When generating the element hierarchy, the children of a component that isn't reported are attached to the parent of the component. For details on when a component isn't reported, see Rules for generating the element hierarchy.

For example, a document might have the following hierarchy of components.

Container "A" - reported
  Container "B" - not reported
    Text "C" - reported
    Image "D" - not reported
    Container "F" - not reported
      Text "G" - reported
      Text "H" - reported

This set of components produces the following element hierarchy in the context.

Element "A", type "mixed"
  Element "C", type "text"
  Element "G", type "text"
  Element "H", type "text"

In this hierarchy, the children of "Container B" and "Container F" are within "A" because the context doesn't include "B" or "F."
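The following Python sketch reproduces the A to H example. It models only how the children of an unreported component move up to the nearest reported ancestor. Whether each component is reported is supplied as a hypothetical reported flag rather than computed from the reporting rules.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    name: str
    reported: bool  # hypothetical flag: would this component become an element?
    children: list["Component"] = field(default_factory=list)


def to_elements(component: Component) -> list[dict]:
    """Return the elements this component contributes to its parent's children.

    A reported component becomes a single element. An unreported component
    passes its own (already flattened) children up to its parent instead.
    """
    flattened = [element for child in component.children for element in to_elements(child)]
    if not component.reported:
        return flattened
    element: dict = {"name": component.name}
    if flattened:  # the children property is omitted when nothing is reported
        element["children"] = flattened
    return [element]


document = Component("A", True, [
    Component("B", False, [
        Component("C", True),
        Component("D", False),
        Component("F", False, [Component("G", True), Component("H", True)]),
    ]),
])
print(to_elements(document))
```

Flattening bottom-up keeps the children in document order, so C, G and H appear under A in the same order as in the APL document.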


entities

An array of entity data copied from the component. This data is opaque. You can provide data in the entities property for the component to describe the meaning of the component.

When you set the entities property for a component, provide an array of objects. The object can have the properties id, type, and value. Any other properties aren't included in the visual context.
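Because entities are copied into the context, a skill can map what's on screen back to its own data. The following Python sketch collects every entity in the element hierarchy, paired with the uid of the element that reported it. The function name and the entity values (bar-123, product) are hypothetical.

```python
from typing import Any, Iterator


def iter_entities(elements: list[dict[str, Any]]) -> Iterator[tuple[str, dict[str, Any]]]:
    """Yield (uid, entity) for every entity in the element hierarchy, parents first."""
    for element in elements:
        for entity in element.get("entities", []):
            yield element["uid"], entity
        yield from iter_entities(element.get("children", []))


# Hypothetical hierarchy: a page entity plus a product entity on one child.
elements = [{
    "uid": ":10",
    "entities": [{"id": "mainPage"}],
    "children": [
        {"uid": ":11", "entities": [{"id": "bar-123", "type": "product", "value": "Protein bar"}]},
        {"uid": ":12", "entities": []},
    ],
}]
for uid, entity in iter_entities(elements):
    print(uid, entity.get("id"), entity.get("type"))
```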


id

The id property for the component as specified in the APL document. An element in the context omits the id property when the corresponding component doesn't have an id property.


uid

An identifier generated by the APL runtime. Each component is assigned a uid. The value is an opaque string and is guaranteed to be unique in the scope of the document and not clash with any assigned id value. Each element in the context always includes the uid property.


position

Specifies the position of the element on the screen, in the form of a 5-tuple of width, height, x-position, y-position, and layer. These values are in global coordinates, and aren't relative to the parent element. The values are the default or resting position of the items before applying any transformations. For details about transformations, see the component transform property.

For compactness, the position is reported as a single string:

"position": "<WIDTH>x<HEIGHT>[+-]<XPOSITION>[+-]<YPOSITION>:<LAYER>"

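The numeric values reported are dimensionless integers. The width, height, and layer are non-negative. The x- and y-positions carry an explicit sign and are measured from the top-left corner of the viewport, so they can be negative, for example for content scrolled above the top of the viewport. The layer value meets the following requirements: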

  • No two elements have the same layer
  • When two elements overlap, the element with the larger layer value is drawn on top of the element with the lower layer value.

The reported position is always in global coordinates. Global coordinates reflect the perspective of the user, and because every element uses the same coordinate system, you can compare the relative positions of any two elements.

The following example shows how the position value represents different positions on the screen.

1280x800+0+0:0     // Top-level element on a 1280x800 dp screen
620x780+10+10:1    // The left column of the above top-level element
620x780+650+10:2   // The right column of the above top-level element

Each element in the context always includes the position property.
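A skill that compares element positions first needs to split the string into numbers. The following Python sketch parses the format shown above with a regular expression. The Position type and the parse_position name are hypothetical.

```python
import re
from typing import NamedTuple


class Position(NamedTuple):
    width: int
    height: int
    x: int
    y: int
    layer: int


# <WIDTH>x<HEIGHT>[+-]<XPOSITION>[+-]<YPOSITION>:<LAYER>
_POSITION_RE = re.compile(r"^(\d+)x(\d+)([+-]\d+)([+-]\d+):(\d+)$")


def parse_position(value: str) -> Position:
    match = _POSITION_RE.match(value)
    if match is None:
        raise ValueError(f"not a position string: {value!r}")
    # int() accepts a leading '+' or '-', so the signed x and y groups convert directly.
    return Position(*(int(group) for group in match.groups()))


left = parse_position("620x780+10+10:1")
right = parse_position("620x780+650+10:2")
# Global coordinates make the comparison direct: the left column ends before the right begins.
print(left.x + left.width <= right.x)
```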


role

The role property is the value of the role property for the component. The role property is omitted if the component doesn't have a role property set to a value.


tags

A map of attributes and data about those attributes. An element in the context includes the tags property when the element has at least one tag.

For details about the possible tags, see Element tags.


transform

A 6-element array containing the 2D homogeneous transformation matrix applied against this element. The center of the transformation coordinate system is the center of the component. The transformation array is ordered as [A,B,C,D,Tx,Ty].

The transform property is reported only if the transformation isn't the identity transformation.
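The context reports the matrix and the untransformed position, but not the transformed position. The following Python sketch estimates the on-screen bounding box of a transformed element by transforming its four corners about the component center. It assumes the common 2D affine convention x' = A·x + C·y + Tx and y' = B·x + D·y + Ty, the same ordering as the CSS matrix() function. This section doesn't state that convention explicitly, so treat it as an assumption. The function name is hypothetical.

```python
def transformed_bounds(x: float, y: float, width: float, height: float,
                       matrix: list[float]) -> tuple[float, float, float, float]:
    """Axis-aligned bounds (x, y, width, height) of a box after applying a
    [A, B, C, D, Tx, Ty] matrix about the center of the box.

    ASSUMPTION: x' = A*x + C*y + Tx and y' = B*x + D*y + Ty (CSS matrix() order).
    """
    a, b, c, d, tx, ty = matrix
    cx, cy = x + width / 2, y + height / 2  # transformation origin: component center
    corners = [(x, y), (x + width, y), (x, y + height), (x + width, y + height)]
    xs, ys = [], []
    for px, py in corners:
        rx, ry = px - cx, py - cy  # corner relative to the center
        xs.append(a * rx + c * ry + tx + cx)
        ys.append(b * rx + d * ry + ty + cy)
    return (min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys))


# A 2x scale about the center of a 100x100 box at the origin.
print(transformed_bounds(0, 0, 100, 100, [2, 0, 0, 2, 0, 0]))
```

Because the origin is the component center, a pure scale grows the box equally in every direction, and a pure translation just shifts it by Tx and Ty.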


type

An enumerated value that describes how the user perceives the element. The type property has one of the following values:

  • empty: An empty component with no visible content.
  • graphic: A bitmap image or vector graphic.
  • mixed: A blend of graphics, video, and text.
  • text: Human-readable text.
  • video: A video player.

Alexa uses the rules shown in the following table to generate the type for an element.

Component Rules


The combination of all visible children. For example, if all the visible child components map to text, the type for the Container is text. If the visible children map to different types of elements, the type for the Container is mixed.




The child type.


The combination of all visible children.


The combination of all visible children.




The child type of the current page.


The combination of all visible children.




The child type.


The combination of all visible children.





The combination of any two of text, graphic, or video is mixed. The type property defaults to empty if the component has no valid content and has no children. An element in the context always includes the type property.
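The combination rule for multi-child components can be expressed as a small function. The following Python sketch illustrates the rule and is not the runtime's implementation. It assumes that children whose type is empty don't contribute to the result, because this section doesn't say how empty children are treated. The function name is hypothetical.

```python
def combine_types(child_types: list[str]) -> str:
    """Combine the types of a component's visible children into one type."""
    # ASSUMPTION: "empty" children contribute nothing to the combination.
    kinds = {t for t in child_types if t != "empty"}
    if not kinds:
        return "empty"  # no content and no (non-empty) children
    if len(kinds) == 1:
        return next(iter(kinds))  # for example, all children are text -> text
    return "mixed"  # any two of text, graphic, video (or mixed) -> mixed


print(combine_types(["text", "text"]), combine_types(["text", "graphic"]))
```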


visibility

The visibility property is an approximate calculation of how well the user can see the element. The visibility is the fraction of the element's bounding box that's visible within its parent, multiplied by the opacity of the element.

For example, assume a vertically scrolling list where the last item in the list is 50 percent off the screen and has an 80 percent opacity. The visibility for this item is 40%, which is reported as 0.4.

The visibility calculations don't consider applied component transform values. The visibility calculation also doesn't consider that a component might be obscured by a child component on top of it.

Components with a display property of invisible or none have zero visibility.

An element in the context omits the visibility property when the value is 1 (the default, fully visible). A reported visibility value is therefore normally greater than zero and less than 1. In certain circumstances, the context also reports elements with zero visibility. For details on when items with zero visibility are reported, see Rules for generating the element hierarchy.
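The following Python sketch applies this definition to two bounding boxes given as hypothetical (x, y, width, height) tuples in global coordinates. Like the runtime calculation, it ignores transforms and children drawn on top of the element. It also treats a zero-area box as invisible, which is an assumption. The function name is hypothetical.

```python
Box = tuple[float, float, float, float]  # (x, y, width, height), global coordinates


def element_visibility(child: Box, parent: Box, opacity: float = 1.0,
                       display: str = "normal") -> float:
    """Approximate visibility: the fraction of the child's box inside the
    parent's box, multiplied by the child's opacity."""
    if display in ("invisible", "none"):
        return 0.0
    cx, cy, cw, ch = child
    px, py, pw, ph = parent
    if cw <= 0 or ch <= 0:
        return 0.0  # assumption: a zero-area box counts as not visible
    overlap_w = max(0.0, min(cx + cw, px + pw) - max(cx, px))
    overlap_h = max(0.0, min(cy + ch, py + ph) - max(cy, py))
    return (overlap_w * overlap_h) / (cw * ch) * opacity


# Last list item: half of it is below the bottom of the list, opacity 0.8.
print(element_visibility((0, 50, 100, 100), (0, 0, 100, 100), opacity=0.8))
```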

Element tags

An element tag provides additional information about the element. Most elements included in the context contain at least one tag. An element can have multiple tags.

The following list describes the available tags.

  • checked (Boolean): The checked state of a component that has two states. Created by any component with the checked state set to true.
  • clickable (Boolean): A button or item that the user can press. Created by a touchable component.
  • disabled (Boolean): True when this component is disabled. Created by a component with the disabled state.
  • focused (Boolean): The focused state of a component that can take focus. Created by the following components:
    • EditText
    • GridSequence
    • Pager
    • ScrollView
    • Sequence
    • TouchWrapper
    • VectorGraphic
  • list (Object): An ordered list of items. Created by Sequence and GridSequence.
  • listItem (Object): Information about a Sequence child. Created by a Sequence or GridSequence child component.
  • media (Object): Media player. Created by Video.
  • ordinal (Integer): A visibly numbered element. Created by a Sequence, GridSequence, or Container child component.
  • pager (Object): A collection of objects displayed one at a time. Created by Pager.
  • scrollable (Object): A region of the screen that can scroll. Created by ScrollView, Sequence, or GridSequence.
  • spoken (Boolean): A region of the screen that can be read by text-to-speech. Created by a component with the speech property.
  • viewport (Object): The entire screen in which a document is rendered. Created by the top-level component.

Each tag is either a basic data type (Boolean, String, or Integer) or an Object data type containing more granular information.


checked

Boolean tag indicating that the checked state for the component is true. Because all components can have a checked state, any type of component might report the checked tag.

{
  "id": "XXXX",
  "uid": ":1234",
  "position": "10x10+0+0:0",
  "type": "graphic",
  "tags": {
    "checked": true
  }
}

A component with the inheritParentState property set to true doesn't report the checked tag. To save space, an element in the context reports the checked tag only when its value is true.


clickable

Boolean tag indicating that this component can be "clicked." This means that the user can activate the component by touch, from a keyboard, or with a remote. All touchable components are clickable.

{
  "id": "XXXX",
  "uid": ":1234",
  "position": "10x10+0+0:0",
  "type": "mixed",
  "tags": {
    "clickable": true,
    "focused": false
  }
}

An element in the context includes the clickable tag if it's a touchable component. The tag is true even for touchable components in the disabled state.


disabled

Boolean tag indicating that the user can't interact with this component. All components can set the disabled state, including components that don't receive clicks or focus. The following example shows the element reported for a disabled Text component with the checked state.

{
  "id": "XYZZY",
  "uid": ":1235",
  "position": "100x50+10+10:5",
  "type": "text",
  "tags": {
    "disabled": true,
    "checked": true
  }
}

Unlike the checked state, the disabled state is reported for components that have the inheritParentState property set. To save space, the disabled tag is reported only when it's true.


focused

Boolean tag indicating that a component can take keyboard focus. The value of the tag indicates the current state of the control. For example, a touchable item that doesn't have focus reports the focused tag as false.

{
  "id": "XXXX",
  "uid": ":1236",
  "position": "10x10+0+0:0",
  "type": "text",
  "tags": {
    "focused": false,
    "clickable": true
  }
}

The context includes the focused tag when the component can take focus. The following components can take focus:

  • EditText
  • GridSequence
  • Pager
  • ScrollView
  • Sequence
  • TouchWrapper
  • VectorGraphic

The focused tag reports true if the component has focus and false if the component doesn't have focus.


list

An object with a collection of properties reported for a Sequence, FlexSequence, or GridSequence. The list tag object contains the following properties:

  • itemCount (Integer): Total number of items in the list.
  • highestIndexSeen (Integer): The index of the highest item seen.
  • highestOrdinalSeen (Integer): The ordinal of the highest ordinal-equipped item seen.
  • lowestIndexSeen (Integer): The index of the lowest item seen.
  • lowestOrdinalSeen (Integer): The ordinal of the lowest ordinal-equipped item seen.

Lists track the lowest and highest index and ordinal seen so that you can make informed inferences about what the user might have observed on the screen. For example, if a new list contains ordinals 10 through 20, but only 10 through 12 have been visible on the screen, it's reasonable to reject a request such as "pick number 18" because the user doesn't know what item 18 contains.
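A minimal Python sketch of such a check follows. It treats every ordinal between lowestOrdinalSeen and highestOrdinalSeen as seen. That's an assumption, because the list tag records only the two bounds and not each individual item. The function name and the sample tag values are hypothetical.

```python
from typing import Any


def ordinal_was_seen(list_tag: dict[str, Any], ordinal: int) -> bool:
    """True if the ordinal lies within the range of ordinals the user has seen."""
    low = list_tag.get("lowestOrdinalSeen")
    high = list_tag.get("highestOrdinalSeen")
    if low is None or high is None:
        return False  # no ordinal-equipped child has been seen yet
    return low <= ordinal <= high


# Ordinals 10 through 20 exist, but only 10 through 12 have been on screen.
list_tag = {"itemCount": 11, "lowestOrdinalSeen": 10, "highestOrdinalSeen": 12}
for requested in (11, 18):
    print(requested, ordinal_was_seen(list_tag, requested))
```

Returning False when the bounds are missing matches the reporting rule: the ordinal bounds appear only after at least one child with an ordinal has been seen.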

The following example shows a list tag.

{
  "id": "myListOfDogs",
  "uid": ":138",
  "position": "1280x800+0+0:0",
  "type": "mixed",
  "tags": {
    "list": {
      "itemCount": 190,
      "lowestIndexSeen": 0,
      "highestIndexSeen": 3,
      "lowestOrdinalSeen": 1,
      "highestOrdinalSeen": 4
    },
    "scrollable": {
      "direction": "vertical",
      "allowForward": true,
      "allowBackwards": true
    },
    "focused": false
  },
  "children": [
    {
      "position": "800x600+20+33:0",
      "uid": ":2352",
      "type": "mixed",
      "tags": {
        "clickable": true,
        "ordinal": 2,
        "listItem": {
          "index": 2
        }
      }
    },
    {
      "position": "800x600+20+633:0",
      "uid": ":23112",
      "visibility": 0.16,
      "type": "mixed",
      "tags": {
        "clickable": true,
        "ordinal": 3,
        "listItem": {
          "index": 3
        }
      }
    }
  ]
}

The list tag isn't reported for an empty Sequence or GridSequence.


itemCount

The total number of items in the list. If the length of the list is unknown, the itemCount is -1.


highestIndexSeen

The highest index of any child seen for this list. An item is "seen" if any part of the item was displayed on the screen, even if it was just a few pixels.

The highestIndexSeen value is zero-based. For example, if a list contains three items and all of them are displayed on the screen, the highestIndexSeen is 2.


highestOrdinalSeen

The highest ordinal value of any child seen for this list. An item is "seen" if any part of the item was displayed on the screen, even if it was just a few pixels. This tag applies to list items with an ordinal value. A list item has an ordinal value when the component for the item has the ordinal property set.

The highestOrdinalSeen value is reported when at least one child with an ordinal value has been seen, either currently or in the past.


lowestIndexSeen

The lowest index of any child seen for this list. An item is "seen" if any part of the item was displayed on the screen, even if it was just a few pixels.

The lowestIndexSeen value is zero-based. An APL list is commonly first displayed with the lowest item in the list visible on the screen. Therefore, lowestIndexSeen usually returns zero.


lowestOrdinalSeen

The lowest ordinal value of any child seen for this list. An item is "seen" if any part of the item was displayed on the screen, even if it was just a few pixels. This tag applies to list items with an ordinal value. A list item has an ordinal value when the component for the item has the ordinal property set.

The lowestOrdinalSeen value is reported when at least one child with an ordinal value has been seen, either currently or in the past.


listItem

Information about the child of a Sequence or GridSequence. The listItem object has the following property:

  • index (Integer): Zero-based index of this element in its parent.

The listItem is reported as an object to reserve space for reporting the row and column of a list item displayed in a grid.


index

The index of this list item in its parent. The index is zero-based. You can compare the index with the lowestIndexSeen and highestIndexSeen values in the list tag.


media

An object with a collection of properties reported for a media player, such as a video player. The media tag describes the current state of the media player and what operations are possible on the media player. The media tag object has the following properties:

  • allowAdjustSeekPositionForward (Boolean): Can seek forward relative to the current position.
  • allowAdjustSeekPositionBackwards (Boolean): Can seek backwards relative to the current position.
  • allowNext (Boolean): Can move forward to the next track.
  • allowPrevious (Boolean): Can move backward to the previous track.
  • audioTrack (one of: foreground, background, none): The audio track the media plays on.
  • entities (Array, default []): Current track entity data.
  • muted (Boolean): Whether audio playback is currently muted.
  • positionInMilliseconds (Integer): Current position of the play head from the start of the track.
  • state (one of: idle, playing, paused): The current operating state.
  • url (String): Current track source URL.

The media tag is reported if there is at least one media track available for playing.

The following example shows the media tag.

{
  "id": "myVideoPreview",
  "uid": ":1138",
  "position": "1024x600+0+0:0",
  "type": "video",
  "tags": {
    "media": {
      "allowAdjustSeekPositionForward": true,
      "allowAdjustSeekPositionBackwards": true,
      "allowNext": true,
      "allowPrevious": false,
      "audioTrack": "foreground",
      "muted": false,
      "entities": [ ... ],
      "positionInMilliseconds": 34214,
      "state": "playing",
      "url": "https://myvideolocationhere"
    }
  }
}


allowAdjustSeekPositionForward

When true, this media track supports seeking forward in time to a new position. Live media streams normally report false for this property.


allowAdjustSeekPositionBackwards

When true, this media track supports seeking backwards in time to a new position. Live media streams normally report false for this property.


allowNext

When true, the media player can advance forward to the next media track. The allowNext property is false if the media player is on the final track.


allowPrevious

When true, the media player can move back to the previous media track. The allowPrevious property is false if the media player is on the first track.


audioTrack

The audio track the media player uses when playing.


entities

Entity data associated with the current media track, as set with the entity property of the video source. The media object omits the entities property when the media track doesn't have any entity data associated with it.


muted

When true, audio playback for the media is currently muted.


positionInMilliseconds

The media player head position within the current media track, measured in milliseconds.


state

The current playing state of the media track. The playing state is one of the following values:

  • idle: The media player hasn't played any content.
  • paused: The media player has played some content, but is now paused.
  • playing: The media player is actively playing content.


url

The URL of the current media track.


ordinal

Reported if the element has a defined ordinal value. The ordinal value is a natural number (a positive integer). The ordinal tag is assigned to children of a multi-child component with the numbered property set to true.

{
  "id": "myListItem8",
  "uid": ":1231",
  "position": "200x100+23+26:1",
  "type": "text",
  "tags": {
    "ordinal": 6,    // Ordinal is not always equal to index + 1.
    "clickable": true,
    "focused": false
  }
}

You can also compare the ordinal value of an element that has a listItem tag against the highestOrdinalSeen and lowestOrdinalSeen values in the list tag of its parent.


pager

An object with a collection of properties reported for a Pager component if it has at least two pages. The pager tag object has the following properties:

  • index (Integer): Index of the current page. The index for a Pager is zero-based. Therefore, the first page has an index of 0.
  • pageCount (Integer): Total number of pages.
  • allowForward (Boolean): When true, the user can move the pager forward.
  • allowBackwards (Boolean): When true, the user can move the pager backwards.

The allowForward and allowBackwards properties indicate what the user can do, based on the navigation property for the Pager and the current page. These properties don't consider what you can do programmatically with the SetPage command.

For example, assume navigation is normal, which lets the user navigate freely back and forth in the Pager.

  • When the Pager is on the first page, allowForward reports true and allowBackwards reports false.
  • When the Pager is on a page that's neither the first nor the last page, both properties report true.
  • When the Pager is on the last page, allowForward reports false and allowBackwards reports true.

When navigation is none, the user can't navigate the Pager at all. In this scenario, both properties report false, regardless of the page displayed.

When navigation is wrap, the user can always navigate forward or backwards. When the user is on the last page, navigating forward wraps back to the first page. In this scenario, both properties report true, regardless of the page displayed.
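The following Python sketch encodes the rules for the three navigation values described in this section. It can be handy for checking the values your skill receives against what you expect. Pager might accept other navigation values that this section doesn't describe, so the sketch raises an error for them rather than guessing. The function name is hypothetical.

```python
def pager_flags(navigation: str, index: int, page_count: int) -> dict[str, bool]:
    """Expected allowForward/allowBackwards values for a Pager.

    index is the zero-based current page; page_count is the total number of pages.
    """
    if navigation == "none":
        return {"allowForward": False, "allowBackwards": False}
    if navigation == "wrap":
        return {"allowForward": True, "allowBackwards": True}
    if navigation == "normal":
        return {"allowForward": index < page_count - 1, "allowBackwards": index > 0}
    # Other navigation values aren't described in this section, so don't guess.
    raise ValueError(f"navigation value not covered by this sketch: {navigation!r}")


for page in range(4):
    print(page, pager_flags("normal", page, 4))
```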

The following example shows the pager tag.

{
  "id": "weatherPager",
  "uid": ":111",
  "position": "1024x600+0+0:0",
  "type": "mixed",
  "tags": {
    "pager": {
      "index": 0,
      "pageCount": 4,
      "allowForward": true,
      "allowBackwards": false
    },
    "focused": false
  }
}


scrollable

An object indicating that a region can scroll forward or backwards. The following components can scroll content:

  • ScrollView
  • Sequence
  • GridSequence

These components report the scrollable tag when the component contains enough content to require scrolling. When all the content within the component is fully visible, the visual context doesn't include the scrollable tag.

The scrollable tag object has the following properties:

  • direction (one of: horizontal, vertical): Direction of scrolling.
  • allowForward (Boolean): When true, the content in the scrolling area can scroll forward.
  • allowBackwards (Boolean): When true, the content in the scrolling area can scroll backwards.

For example, assume a Sequence contains 10 items and is large enough to display 5 items at a time.

  • When the component shows the items with the index zero through four, allowForward is true and allowBackwards is false.
  • When the user scrolls down to show items with the index two through six, both allowForward and allowBackwards are true.
  • When the user scrolls all the way to the end of the list, allowForward is false and allowBackwards is true.

The scrollable tag is reported when either allowForward or allowBackwards is true. When both properties are false, the tag isn't included.
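The following Python sketch reproduces the 10-item example. It is a simplification that assumes equal-size items and whole items on screen. The runtime works with actual scroll positions, not item counts. The function name and parameters are hypothetical.

```python
from typing import Any, Optional


def scrollable_tag(first_visible: int, visible_count: int, item_count: int,
                   direction: str = "vertical") -> Optional[dict[str, Any]]:
    """Expected scrollable tag for a list of equal-size items, or None when omitted."""
    allow_forward = first_visible + visible_count < item_count
    allow_backwards = first_visible > 0
    if not (allow_forward or allow_backwards):
        return None  # all content fits, so the context omits the tag
    return {"direction": direction, "allowForward": allow_forward, "allowBackwards": allow_backwards}


for first in (0, 2, 5):
    print(first, scrollable_tag(first, 5, 10))
```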

The following example shows a scrollable tag.

{
  "id": "todoList",
  "uid": ":211",
  "position": "1024x550+0+50:0",
  "type": "mixed",
  "tags": {
    "scrollable": {
      "direction": "horizontal",
      "allowForward": true,
      "allowBackwards": false
    },
    "focused": false
  }
}


spoken

Boolean tag indicating that this element has content that Alexa can read out loud. Any component that sets the speech property reports the spoken tag.

The following example shows the spoken tag.

{
  "id": "myListItem",
  "uid": ":444",
  "position": "800x80+72+437:0",
  "type": "text",
  "tags": {
    "clickable": true,
    "focused": true,
    "spoken": true
  }
}


viewport

The viewport tag is reserved for the top-level element on a screen. The viewport tag holds metadata about actions on that screen. The following properties are reported for the viewport tag:

  • utcTime (Number): UTC time in milliseconds. For details, see the utcTime property in the data-binding context.
  • elapsedTime (Number): Run time in milliseconds of this document. For details, see the elapsedTime property in the data-binding context.
  • trackedChanges (Array of changes): Temporal changes to requested component states.

The following example shows the viewport tag.

{
  "id": "top",
  "uid": ":101",
  "position": "480x480+0+0:0",
  "type": "mixed",
  "tags": {
    "viewport": {
      "utcTime": 1728414842835,
      "elapsedTime": 500,
      "trackedChanges": [
        { "uid": ":1001", "name": "playingState", "from": "idle", "to": "playing", "utcTime": 1728414842430 },
        { "uid": ":1002", "name": "playingState", "from": "idle", "to": "playing", "utcTime": 1728414842630 },
        { "uid": ":1001", "name": "playingState", "from": "playing", "to": "paused", "utcTime": 1728414842830 }
      ]
    }
  }
}


trackedChanges

An array of changes that occurred for specific components, as configured with the trackChanges component property.

Each change in the array has the following properties:

  • uid (String): Unique ID of the component.
  • name (String): Changed property.
  • from: Old value of the property.
  • to: New value of the property.
  • utcTime (Number): Timestamp of the change in UTC time.

The system guarantees to keep the last change record for every tracked property. More historical data might be available as long as there is still free capacity (configured by the runtime) in the trackedChanges array. The system removes the oldest records first.
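The last change for every tracked property is always kept. A skill that only needs the current state can therefore reduce the array to the newest value for each component and property. The following Python sketch does that. The function name is hypothetical, and the sample data comes from the viewport example in this section.

```python
from typing import Any


def latest_tracked_values(tracked_changes: list[dict[str, Any]]) -> dict[tuple[str, str], Any]:
    """Map (uid, property name) to the most recent "to" value."""
    latest: dict[tuple[str, str], Any] = {}
    # Sort by timestamp so that a later change always overwrites an earlier one.
    for change in sorted(tracked_changes, key=lambda c: c["utcTime"]):
        latest[(change["uid"], change["name"])] = change["to"]
    return latest


changes = [
    {"uid": ":1001", "name": "playingState", "from": "idle", "to": "playing", "utcTime": 1728414842430},
    {"uid": ":1002", "name": "playingState", "from": "idle", "to": "playing", "utcTime": 1728414842630},
    {"uid": ":1001", "name": "playingState", "from": "playing", "to": "paused", "utcTime": 1728414842830},
]
print(latest_tracked_values(changes))
```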

Rules for generating the element hierarchy

The following rules apply when generating the element hierarchy for the visual context:

  • The top-level component always generates an element with the viewport tag.
  • A component is reported as an element if it's visible on the screen and has at least one of the following attributes:
    • Non-empty entities property.
    • True clickable tag.
    • A media tag.
    • A pager tag.
    • A scrollable tag.
    • A spoken tag.
  • A component that isn't visible might still be reported as an element if both of the following conditions are true:
    • The component has a non-empty entities property
    • The component was visible on the screen at some time in the past

The intent of reporting non-visible components is to allow the user to refer to an item that was visible on the screen and might have scrolled out of view. The context reporting system doesn't guarantee that every previously visible item is reported. The system keeps a window of recently visible items and reports back the most recently seen elements.
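The following Python sketch encodes these rules as a predicate for a single component. Its inputs (top_level, visible, seen_before, entities, tags) are hypothetical. It doesn't model the runtime's window of recently visible items, so seen_before stands in for "was visible recently enough to still be inside that window."

```python
from typing import Any

# Tags that cause a visible component to be reported (clickable is checked separately).
REPORTING_TAGS = ("media", "pager", "scrollable", "spoken")


def should_report(*, top_level: bool, visible: bool, seen_before: bool,
                  entities: list[Any], tags: dict[str, Any]) -> bool:
    """Apply the reporting rules to one component."""
    if top_level:
        return True  # always reported, with the viewport tag
    has_entities = len(entities) > 0
    if visible:
        return (has_entities
                or tags.get("clickable") is True
                or any(name in tags for name in REPORTING_TAGS))
    # Not visible now: needs entities and an earlier appearance on screen.
    return has_entities and seen_before


# A visible title with no entities and no qualifying tags isn't reported.
print(should_report(top_level=False, visible=True, seen_before=True, entities=[], tags={}))
```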

Nested element example

The following example shows how components nest in normal reporting. The example shows an APL document that displays a background image, a text element corresponding to the title of an article, and a scrolling element holding the content. The content is further divided into a labeled image element and a text element.

The top-level Container has a single entity, and the text content has the speech property set.

For this document, the rules to generate the hierarchy therefore produce the following:

  • The top-level Container is reported because it has entity data assigned.
  • The background image isn't reported because it has no entity data and no other tags apply.
  • The title isn't reported because it has no entity data and no other tags apply.
  • The scrolling region is reported because it has a valid scrollable tag.
  • The small picture isn't reported because it doesn't have entity data and no other tags apply.
  • The large text is reported because it has a spoken tag.

The resulting element hierarchy might look like the following.

{
  "id": "top-level",
  "uid": ":9549",
  "position": "960x480+0+0:0",
  "type": "mixed",
  "tags": {
    "viewport": {}
  },
  "children": [
    {
      "id": "scrollingRegion",
      "uid": ":9552",
      "position": "960x349+0+131:1",
      "type": "mixed",
      "tags": {
        "focused": false,
        "scrollable": {
          "direction": "vertical",
          "allowForward": false,
          "allowBackwards": true
        }
      },
      "children": [
        {
          "id": "articleId",
          "uid": ":9555",
          "position": "832x1410+64-1002:1",
          "type": "text",
          "tags": {
            "spoken": true
          },
          "visibility": 0.20000000298023224,
          "entities": []
        }
      ],
      "entities": []
    }
  ],
  "entities": [
    {
      "id": "mainPage"
    }
  ]
}

The example shows allowForward as false because the user scrolled to the bottom of the content and then made an utterance that sent a request to the skill.


Last updated: Dec 18, 2024