APL Visual Context in the Skill Request


The Alexa Presentation Language (APL) visual context provides your skill with information about the content displayed on the screen when the user invokes an intent or triggers a user event. Your skill can use the context to determine the state of on-screen elements, such as which parts of a list are visible on the screen.

About the visual context

The APL visual context is information sent to your skill about the content the user sees on the device. The context provides both structural and semantic information about that content:

  • Structural: How visual components appear on the screen – For example, there's a picture on the left, some scrolling text on the right, and two buttons under the text.
  • Semantic: What the components represent – For example, the picture is a picture of a box of a particular brand of protein bars, the text describes these protein bars, the left button is a "more information" button and the right button is a "buy now" button.

The APL runtime constructs and reports the structural context. To make the semantic context useful, you provide information in your APL document that describes the meaning of your components. You can define the semantic data for a component in two places:

  • Component id property – The visual context includes the id you provide for a component.
  • Component entities property – The visual context includes the entities you provide for a component.

In the earlier example, the entities for the protein bar picture might contain an identifier for the product, and the id values for the two buttons might be buttonTellMeMore and buttonBuyNow.

Visual context in the skill request

A request sent to your skill includes the visual context when the user's device has a screen and the screen is displaying an APL document your skill sent with the RenderDocument directive.

The visual context is available in the Alexa.Presentation.APL property within the top-level context property in the request. The Alexa.Presentation.APL object has the properties shown in the following table.

Property | Type | Description
--- | --- | ---
token | String | The token identifying the document displayed on the device. You define the token in the RenderDocument directive you return to display the document.
version | String | The version of the APL runtime that reported the visual context.
componentsVisibleOnScreen | Array | Contains the elements that were visible on the screen when the user triggered the request to your skill. For details about the properties within an element, see Context core properties.

The following example shows the visual context when the screen was displaying a touchable element with the ID fadeHelloTextButton.

{
  "version": "1.0",
  "session": {},
  "context": {
    "Viewports": [],
    "Alexa.DataStore.PackageManager": {
      "installedPackages": []
    },
    "Viewport": {},
    "Extensions": {},
    "System": {},
    "Alexa.Presentation.APL": {
      "token": "helloworldWithButtonToken",
      "version": "AriaRuntimeLibrary-2023.2.449.0",
      "componentsVisibleOnScreen": [
        {
          "uid": ":1000",
          "position": "960x480+0+0:0",
          "type": "text",
          "tags": {
            "viewport": {}
          },
          "children": [
            {
              "id": "fadeHelloTextButton",
              "uid": ":1002",
              "position": "273x76+344+360:0",
              "type": "text",
              "tags": {
                "focused": false,
                "clickable": true
              },
              "entities": []
            }
          ],
          "entities": []
        }
      ]
    }
  },
  "request": {}
}
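A skill backend usually starts by pulling this object out of the request and walking the element tree. The following minimal Python sketch assumes the request body has already been parsed into a dict, for example with json.loads. The helper names get_apl_context and iter_elements are hypothetical and aren't part of the Alexa Skills Kit SDK.

```python
import json
from typing import Any, Dict, Iterator, List, Optional


def get_apl_context(envelope: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the Alexa.Presentation.APL visual context, or None if absent.

    The property is missing when the device has no screen or isn't showing
    an APL document that this skill sent with RenderDocument.
    """
    return envelope.get("context", {}).get("Alexa.Presentation.APL")


def iter_elements(elements: List[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield each element and then its reported children, depth first."""
    for element in elements:
        yield element
        # The children property is omitted when there are no reported children.
        yield from iter_elements(element.get("children", []))


if __name__ == "__main__":
    # Trimmed-down version of the request shown above.
    raw = """
    {"context": {"Alexa.Presentation.APL": {
      "token": "helloworldWithButtonToken",
      "componentsVisibleOnScreen": [{
        "uid": ":1000", "position": "960x480+0+0:0", "type": "text",
        "tags": {"viewport": {}},
        "children": [{
          "id": "fadeHelloTextButton", "uid": ":1002",
          "position": "273x76+344+360:0", "type": "text",
          "tags": {"focused": false, "clickable": true}
        }]
      }]
    }}}
    """
    apl = get_apl_context(json.loads(raw))
    if apl is not None:
        for el in iter_elements(apl["componentsVisibleOnScreen"]):
            print(el["uid"], el.get("id", "(no id)"))
```

Always check for None before using the context. Requests from devices without a screen, and requests made when the screen isn't showing one of your documents, don't include it.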

Context core properties

The componentsVisibleOnScreen array organizes the visual elements into a hierarchy. A given element can contain child elements. For example, an element representing a scrolling region of the screen might include one or more child elements that represent the list items displayed within the scrolling region.

Each element in the hierarchy corresponds to a single component in your APL document. However, the context doesn't include all components defined in your APL document. For details, see Rules for generating the element hierarchy.

The following table defines the core properties for an element in the context.

Property | Type | Default | Source | Description
--- | --- | --- | --- | ---
children | Array | [] | Calculated | Array of child elements
entities | Array | [] | Component entities property | Array of entity data copied from the component
id | String | "" | Component id property | The id of the component
position | String | REQUIRED | Calculated | Global position of the element, as seen by the user
tags | Map | REQUIRED | Calculated | Any number of valid element tags
transform | Transform | [] | Component transform property | A visual transformation applied against the position
type | One of: graphic, text, mixed, video, empty | REQUIRED | Calculated | Describes the visual appearance of the element
uid | String | REQUIRED | Generated by the runtime | The unique runtime-generated id for the component
visibility | Number | 1.0 | Calculated | Relative visibility of the element

To save space, the context omits properties that contain default values. For example, assume the device is displaying the following TouchWrapper component that contains a Text component.

{
  "type": "TouchWrapper",
  "id": "idForTheTouchWrapper",
  "item": [
    {
      "type": "Text",
      "id": "idForTextWrappedInTouchWrapper",
      "text": "Text component wrapped in the TouchWrapper",
      "inheritParentState": true,
      "style": "touchableText"
    }
  ],
  "onPress": []
}

The TouchWrapper is fully visible on the screen when the user invokes an intent that sends a request to your skill. The context reports the element as shown in the following example.

{
  "id": "idForTheTouchWrapper",
  "uid": ":1022",
  "position": "952x51+35+224:0",
  "type": "text",
  "tags": {
    "focused": false,
    "clickable": true
  },
  "entities": []
}

Because the TouchWrapper is fully visible, its visibility property has the default value 1, so Alexa omits the property. The TouchWrapper has no transform value, so Alexa omits the transform property as well. The Text child component doesn't meet any of the requirements for inclusion in the context, so Alexa also omits the children property.
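Omitted properties can make the reported elements awkward to process. The following Python sketch restores the defaults from the core-properties table before the elements are used. The function name with_defaults is hypothetical. Treating a missing tags map as empty is an assumption, based on the note that an element includes tags only when it has at least one.

```python
import copy
from typing import Any, Dict

# Default values from the core-properties table. Filling in an empty "tags"
# map is an assumption: the context omits tags when an element has none.
CORE_DEFAULTS: Dict[str, Any] = {
    "children": [],
    "entities": [],
    "id": "",
    "tags": {},
    "transform": [],
    "visibility": 1.0,
}


def with_defaults(element: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of an element with omitted core properties restored."""
    filled = copy.deepcopy(element)
    for key, default in CORE_DEFAULTS.items():
        # deepcopy so that callers never share the default list/dict objects.
        filled.setdefault(key, copy.deepcopy(default))
    # Apply the same treatment to every reported child.
    filled["children"] = [with_defaults(child) for child in filled["children"]]
    return filled


touch_wrapper = {
    "id": "idForTheTouchWrapper",
    "uid": ":1022",
    "position": "952x51+35+224:0",
    "type": "text",
    "tags": {"focused": False, "clickable": True},
    "entities": [],
}
full = with_defaults(touch_wrapper)
print(full["visibility"], full["children"], full["transform"])  # prints: 1.0 [] []
```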

children

An array containing the elements that fall logically under this element. For example, a scrolling list might contain multiple child elements in the list. The element defines the order of the children in the array.

An element in the context omits the children property when the element has no reported children.

When generating the element hierarchy, the children of a component that isn't reported are attached to the nearest reported ancestor of that component. For details on when a component isn't reported, see Rules for generating the element hierarchy.

For example, a document might have the following hierarchy of components.

Container "A" - reported
  Container "B" - not reported
    Text "C" - reported
    Image "D" - not reported
    Container "F" - not reported
      Text "G" - reported
      Text "H" - reported

This set of components produces the following element hierarchy in the context.

Element "A", type "mixed"
  Element "C", type "text"
  Element "G", type "text"
  Element "H", type "text"

In this hierarchy, the children of "Container B" and "Container F" are within "A" because the context doesn't include "B" or "F."
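The collapsing rule can be modeled as a short recursive function. In the following Python sketch, Component, Element, and build_elements are hypothetical names. The sketch assumes each component's reported flag has already been decided by the reporting rules. The example rebuilds the A through H hierarchy above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    """Hypothetical stand-in for an APL component in the rendered document."""
    name: str
    reported: bool  # assumed to be decided already by the reporting rules
    children: List["Component"] = field(default_factory=list)


@dataclass
class Element:
    """An element as it would appear in componentsVisibleOnScreen."""
    name: str
    children: List["Element"] = field(default_factory=list)


def build_elements(component: Component) -> List[Element]:
    """Return the elements this component contributes to its nearest reported ancestor."""
    collected: List[Element] = []
    for child in component.children:
        collected.extend(build_elements(child))
    if component.reported:
        return [Element(component.name, collected)]
    # Not reported: hand the reported descendants up to the caller unchanged.
    return collected


tree = Component("A", True, [
    Component("B", False, [
        Component("C", True),
        Component("D", False),
        Component("F", False, [Component("G", True), Component("H", True)]),
    ]),
])
(root,) = build_elements(tree)
print(root.name, [child.name for child in root.children])  # A ['C', 'G', 'H']
```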

entities

An array of entity data copied from the component. This data is opaque. You can provide data in the entities property for the component to describe the meaning of the component.

When you set the entities property for a component, provide an array of objects. Each object can have the properties id, type, and value. The visual context doesn't include any other properties.
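Entities let a skill resolve requests such as "buy this" against what's on screen. The following Python sketch collects entity objects of a given type from the element tree. The function names, the entity type "product", and the value "SKU-123" are hypothetical and only illustrate the protein bar example. Entity data is opaque, so the sketch skips entries that aren't objects.

```python
from typing import Any, Dict, Iterator, List, Tuple


def entities_on_screen(
    elements: List[Dict[str, Any]],
) -> Iterator[Tuple[str, Any]]:
    """Yield (uid, entity) for every entity on every element, depth first."""
    for element in elements:
        for entity in element.get("entities", []):
            yield element["uid"], entity
        yield from entities_on_screen(element.get("children", []))


def find_entities(elements: List[Dict[str, Any]], entity_type: str) -> List[Dict[str, Any]]:
    """Return entity objects whose type matches; the data is opaque, so skip non-objects."""
    return [
        entity
        for _, entity in entities_on_screen(elements)
        if isinstance(entity, dict) and entity.get("type") == entity_type
    ]


# Hypothetical screen: a product picture tagged with an entity by the skill.
screen = [{
    "uid": ":1", "position": "960x480+0+0:0", "type": "mixed",
    "tags": {"viewport": {}},
    "children": [{
        "uid": ":2", "position": "400x400+0+40:0", "type": "graphic",
        "entities": [{"id": "proteinBarImage", "type": "product", "value": "SKU-123"}],
    }],
}]
print(find_entities(screen, "product"))
```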

id

The id property for the component as specified in the APL document. An element in the context omits the id property when the corresponding component doesn't have an id property.

uid

An identifier generated by the APL runtime. Each component is assigned a uid. The value is an opaque string and is guaranteed to be unique in the scope of the document and not clash with any assigned id value. Each element in the context always includes the uid property.

position

Specifies the position of the element on the screen, in the form of a 5-tuple of width, height, x-position, y-position, and layer. These values are in global coordinates, and aren't relative to the parent element. The values are the default or resting position of the items before applying any transformations. For details about transformations, see the component transform property.

For compactness and ease of parsing, the position is a single string:

"position": "<WIDTH>x<HEIGHT>[+-]<XPOSITION>[+-]<YPOSITION>:<LAYER>"

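The reported values are dimensionless integers. The width, height, and layer are non-negative. The x- and y-positions carry an explicit + or - sign, so they can be negative, for example when scrolled content extends above the top of the viewport. The x- and y-positions are measured from the top-left of the viewport. The layer value meets the following requirements: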

  • No two elements have the same layer
  • When two elements overlap, the element with the larger layer value is drawn on top of the element with the lower layer value.

The reported position is always in global coordinates. Global coordinates ensure that the position reflects the perspective of the user. They also let you compare the relative positions of any two elements.

The following example shows how the position value represents different positions on the screen.

1280x800+0+0:0     // Top-level element on a 1280x800 dp screen
620x780+10+10:1    // The left column of the above top-level element
620x780+650+10:2   // The right column of the above top-level element

Each element in the context always includes the position property.
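Because the position is packed into one string, most skills need a small parser. The following Python sketch follows the format string above. The names Position, parse_position, and is_left_of are hypothetical. The regular expression accepts the signed x- and y-values and rejects malformed strings.

```python
import re
from typing import NamedTuple


class Position(NamedTuple):
    width: int
    height: int
    x: int
    y: int
    layer: int


# <WIDTH>x<HEIGHT>[+-]<XPOSITION>[+-]<YPOSITION>:<LAYER>
_POSITION_RE = re.compile(r"(\d+)x(\d+)([+-]\d+)([+-]\d+):(\d+)")


def parse_position(value: str) -> Position:
    """Parse a visual-context position string into its five components."""
    match = _POSITION_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"malformed position string: {value!r}")
    # int() accepts a leading "+" or "-", so the sign is kept for x and y.
    return Position(*(int(group) for group in match.groups()))


def is_left_of(a: Position, b: Position) -> bool:
    """True when a ends at or before the point where b starts horizontally."""
    return a.x + a.width <= b.x


left = parse_position("620x780+10+10:1")
right = parse_position("620x780+650+10:2")
print(is_left_of(left, right))
print(parse_position("832x1410+64-1002:1"))  # Position(width=832, height=1410, x=64, y=-1002, layer=1)
```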

tags

A map of attributes and data about those attributes. An element in the context includes the tags property when the element has at least one tag.

For details about the possible tags, see Element tags.

transform

A 6-element array containing the 2D homogeneous transformation matrix applied against this element. The center of the transformation coordinate system is the center of the component. The transformation array is ordered as [A,B,C,D,Tx,Ty].

The transform property is reported if the transformation isn't the identity transformation.
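To find where a point on an element ends up after its transform, apply the matrix around the component center. The following Python sketch assumes that [A, B, C, D, Tx, Ty] follows the CSS/SVG matrix() convention, x' = A*x + C*y + Tx and y' = B*x + D*y + Ty. This section doesn't state the convention, so verify it against the APL transform documentation before relying on rotations or skews. The function name transform_point is hypothetical.

```python
from typing import List, Tuple

IDENTITY = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]


def transform_point(transform: List[float], box: Tuple[float, float, float, float],
                    point: Tuple[float, float]) -> Tuple[float, float]:
    """Map a point in global coordinates through an element's transform.

    box is (x, y, width, height) from the element's untransformed position.
    ASSUMPTION: [A, B, C, D, Tx, Ty] follows the CSS/SVG matrix() convention
    x' = A*x + C*y + Tx and y' = B*x + D*y + Ty.
    """
    a, b, c, d, tx, ty = transform or IDENTITY  # omitted or empty means identity
    x, y, width, height = box
    # The transformation coordinate system is centered on the component.
    cx, cy = x + width / 2, y + height / 2
    rx, ry = point[0] - cx, point[1] - cy
    return (a * rx + c * ry + tx + cx, b * rx + d * ry + ty + cy)


# A 180-degree rotation moves the top-left corner to the bottom-right corner.
print(transform_point([-1, 0, 0, -1, 0, 0], (0, 0, 100, 50), (0, 0)))
```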

type

An enumerated value that describes how the user perceives the element. The following table shows the valid type values.

Type | Description
--- | ---
empty | An empty component with no visible content
graphic | A bitmap image or vector graphic
mixed | A blend of graphics, video, and text
text | Human-readable text
video | A video player

Alexa uses the rules shown in the following table to generate the type for an element.

Component | Resulting type
--- | ---
Container | The combination of all visible children. For example, if all the visible child components map to text, the type for the Container is text. If the visible children map to different types of elements, the type for the Container is mixed.
EditText | text
Frame | The child type
GridSequence | The combination of all visible children
Image | graphic
Pager | The child type of the current page
ScrollView | The combination of all visible children
Sequence | The combination of all visible children
Text | text
TouchWrapper | The child type
VectorGraphic | graphic
Video | video

The combination of any two of text, graphic, or video is mixed. The type property defaults to empty if the component has no valid content and has no children. An element in the context always includes the type property.
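The combination rule for Container, GridSequence, ScrollView, and Sequence can be sketched as follows. The caller passes only the types of the visible children. The function name combine_types is hypothetical. Ignoring children of type empty is an assumption, because this section doesn't say how they combine.

```python
from typing import Iterable

CONTENT_TYPES = {"text", "graphic", "video"}


def combine_types(child_types: Iterable[str]) -> str:
    """Combine the types of a component's visible children."""
    seen = set()
    for child_type in child_types:
        if child_type == "mixed":
            return "mixed"  # a mixed child already blends several kinds
        if child_type in CONTENT_TYPES:
            seen.add(child_type)
        # ASSUMPTION: "empty" children contribute nothing to the result.
    if not seen:
        return "empty"  # no valid content and no contributing children
    if len(seen) == 1:
        return seen.pop()
    return "mixed"  # any two of text, graphic, or video


print(combine_types(["text", "text"]), combine_types(["text", "graphic"]))
```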

visibility

The visibility property is an approximate calculation of how well the user can see the object. The visibility is defined as the percentage of the bounding box of the element that's visible in its parent multiplied by the opacity of the element.

For example, assume a vertically scrolling list where the last item in the list is 50 percent off the screen and has an 80 percent opacity. The visibility for this item is 40%, which is reported as 0.4.
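The following Python sketch reproduces this calculation from two bounding boxes and an opacity. The name estimate_visibility and the (x, y, width, height) box layout are hypothetical. Like the runtime calculation described here, the sketch ignores transforms and overlapping components. It also doesn't model the display property, so handle invisible and none separately.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


def estimate_visibility(element: Box, parent: Box, opacity: float = 1.0) -> float:
    """Fraction of the element's box inside its parent, scaled by opacity."""
    ex, ey, ew, eh = element
    px, py, pw, ph = parent
    area = ew * eh
    if area <= 0:
        return 0.0
    overlap_w = max(0.0, min(ex + ew, px + pw) - max(ex, px))
    overlap_h = max(0.0, min(ey + eh, py + ph) - max(ey, py))
    return (overlap_w * overlap_h / area) * opacity


# Last list item: 200 units tall, with its bottom half below an 800-unit-tall list.
print(estimate_visibility((0, 700, 800, 200), (0, 0, 1280, 800), opacity=0.8))  # 0.4
```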

The visibility calculations don't consider applied component transform values. The visibility calculation also doesn't consider that a component might be obscured by a child component on top of it.

Components with a display property of invisible or none have zero visibility.

An element in the context includes the visibility property when the visibility is greater than zero and less than 1. The element omits the property when the value is 1 (the default, fully visible). In certain circumstances, the context also reports an element with zero visibility and includes the property with the value 0. For details on when items with zero visibility are reported, see Rules for generating the element hierarchy.

Element tags

An element tag provides additional information about the element. Most elements included in the context contain at least one tag. An element can have multiple tags.

The following table lists the available tags.

Tag | Type | Description | Created by
--- | --- | --- | ---
checked | Boolean | The checked state of a component that has two states | Any component with the checked state set to true
clickable | Boolean | A button or item that the user can press | Touchable component
disabled | Boolean | True when this component is disabled | Component with the disabled state
focused | Boolean | The focused state of a component that can take focus | EditText, GridSequence, Pager, ScrollView, Sequence, TouchWrapper, VectorGraphic
list | Object | An ordered list of items | Sequence, GridSequence
listItem | Object | Information about a Sequence or GridSequence child | Sequence or GridSequence child component
media | Object | Media player | Video
ordinal | Integer | A visibly numbered element | Sequence, GridSequence, or Container child component
pager | Object | A collection of objects displayed one at a time | Pager
scrollable | Object | A region of the screen that can scroll | ScrollView, Sequence, or GridSequence
spoken | Boolean | A region of the screen that can be read by text-to-speech | Component with the speech property
viewport | Object | The entire screen in which a document is rendered | Top-level component

Each tag is either a basic data type (Boolean, String, or Integer) or an Object data type containing more granular information.

checked

Boolean tag indicating that the checked state for the component is true. Because all components can have a checked state, any type of component might report the checked tag.

{
  "id": "XXXX",
  "uid": ":1234",
  "position": "10x10+0+0:0",
  "type": "graphic",
  "tags": {
    "checked": true
  }
}

A component with the inheritParentState property set to true doesn't report the checked tag. To save space, an element in the context reports the checked tag only when its value is true.

clickable

Boolean tag indicating that this component can be "clicked." This means that the user can activate the component by touch, from a keyboard, or with a remote. All touchable components are clickable.

{
  "id": "XXXX",
  "uid": ":1234",
  "position": "10x10+0+0:0",
  "type": "mixed",
  "tags": {
    "clickable": true,
    "focused": false
  }
}

An element in the context includes the clickable tag if it's a touchable component. The tag reports true even for touchable components in the disabled state.

disabled

Boolean tag indicating that the user can't interact with this component. All components can set the disabled state, including components that don't receive clicks or focus. The following example shows the element reported for a disabled Text component with the checked state.

{
  "id": "XYZZY",
  "uid": ":1235",
  "position": "100x50+10+10:5",
  "type": "text",
  "tags": {
    "disabled": true,
    "checked": true
  }
}

Unlike the checked state, the disabled state is reported for components that have the inheritParentState property set. To save space, the disabled tag is reported only when it's true.

focused

Boolean tag indicating that a component can take keyboard focus. The value of the tag indicates the current state of the control. For example, a touchable item that doesn't have focus reports the focused tag as false.

{
  "id": "XXXX",
  "uid": ":1236",
  "position": "10x10+0+0:0",
  "type": "text",
  "tags": {
    "focused": false,
    "clickable": true
  }
}

The context includes the focused tag when the component can take focus. The following components can take focus:

  • EditText
  • GridSequence
  • Pager
  • ScrollView
  • Sequence
  • TouchWrapper
  • VectorGraphic

The focused tag reports true if the component has focus and false if it doesn't.

list

An object with a collection of properties reported for a Sequence or GridSequence. The list tag object contains properties shown in the following table.

Property | Type | Description
--- | --- | ---
itemCount | Integer | Total number of items in the list
highestIndexSeen | Integer | The index of the highest item seen
highestOrdinalSeen | Integer | The ordinal of the highest ordinal-equipped item seen
lowestIndexSeen | Integer | The index of the lowest item seen
lowestOrdinalSeen | Integer | The ordinal of the lowest ordinal-equipped item seen

Lists track the lowest and highest index and ordinal seen so that you can make informed inferences about what the user might have observed on the screen. For example, if a new list contains ordinals 10 through 20, but only items 10 through 12 have been visible on the screen, it's reasonable to reject a request such as "pick number 18" because the user doesn't know what item 18 contains.
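A skill might apply this check before it acts on a spoken ordinal. The following Python sketch tests a requested ordinal against the list tag. The function name ordinal_was_seen is hypothetical. Treating every ordinal between the lowest and highest seen values as seen assumes the user scrolled continuously through that range.

```python
from typing import Any, Dict


def ordinal_was_seen(list_tag: Dict[str, Any], ordinal: int) -> bool:
    """True when the ordinal falls inside the range of ordinals the user has seen.

    Treating everything between lowestOrdinalSeen and highestOrdinalSeen as
    seen assumes the user scrolled continuously through that range.
    """
    low = list_tag.get("lowestOrdinalSeen")
    high = list_tag.get("highestOrdinalSeen")
    if low is None or high is None:
        return False  # no ordinal-equipped item has been seen yet
    return low <= ordinal <= high


dogs = {"itemCount": 190, "lowestIndexSeen": 0, "highestIndexSeen": 3,
        "lowestOrdinalSeen": 1, "highestOrdinalSeen": 4}
for requested in (3, 18):
    print(requested, ordinal_was_seen(dogs, requested))
```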

The following example shows a list tag.

{
  "id": "myListOfDogs",
  "uid": ":138",
  "position": "1280x800+0+0:0",
  "type": "mixed",
  "tags": {
    "list": {
      "itemCount": 190,
      "lowestIndexSeen": 0,
      "highestIndexSeen": 3,
      "lowestOrdinalSeen": 1,
      "highestOrdinalSeen": 4
    },
    "scrollable": {
      "direction": "vertical",
      "allowForward": true,
      "allowBackwards": true
    },
    "focused": false
  },
  "children": [
    {
      "position": "800x600+20+33:0",
      "uid": ":2352",
      "type": "mixed",
      "tags": {
        "clickable": true,
        "ordinal": 2,
        "listItem": {
          "index": 2
        }
      }
    },
    {
      "position": "800x600+20+633:0",
      "uid": ":23112",
      "visibility": 0.16,
      "type": "mixed",
      "tags": {
        "clickable": true,
        "ordinal": 3,
        "listItem": {
          "index": 3
        }
      }
    }
  ]
}

The list tag isn't reported for an empty Sequence or GridSequence.

itemCount

The total number of items in the list. If the length of the list is unknown, the itemCount is -1.

highestIndexSeen

The highest index of any child seen for this list. An item is "seen" if any part of the item was displayed on the screen, even if only a few pixels.

The highestIndexSeen value is zero-based. For example, if a list contains three items and all three were displayed on the screen, highestIndexSeen is 2.

highestOrdinalSeen

The highest ordinal value of any child seen for this list. An item is "seen" if any part of the item was displayed on the screen, even if only a few pixels. This tag applies to list items with an ordinal value. A list item has an ordinal value when the component for the item has the ordinal property set.

The highestOrdinalSeen value is reported when at least one child with an ordinal value has been seen, either currently or in the past.

lowestIndexSeen

The lowest index of any child seen for this list. An item is "seen" if any part of the item was displayed on the screen, even if only a few pixels.

The lowestIndexSeen value is zero-based. An APL list is commonly first displayed with the lowest item in the list visible on the screen. Therefore, lowestIndexSeen usually returns zero.

lowestOrdinalSeen

The lowest ordinal value of any child seen for this list. An item is "seen" if any part of the item was displayed on the screen, even if it was just a few pixels. This tag applies to list items with an ordinal value. A list item has an ordinal value when the component for the item has the ordinal property set.

The lowestOrdinalSeen value is reported when at least one child with an ordinal value has been seen, either currently or in the past.

listItem

Information about the child of a Sequence or GridSequence. The listItem tag object has the properties shown in the following table.

Property | Type | Description
--- | --- | ---
index | Integer | Zero-based index of this element in its parent

The listItem is reported as an object to reserve space for reporting the row and column of a list item displayed in a grid.

index

The index of this list item in its parent. The index is zero-based. You can compare the index with the lowestIndexSeen and highestIndexSeen values in the list tag.

media

An object with a collection of properties reported for a media player, such as a video player. The media tag describes the current state of the media player and what operations are possible on the media player. The media tag object has the properties shown in the following table.

Property | Type | Description
--- | --- | ---
allowAdjustSeekPositionForward | Boolean | Can seek forward relative to the current position.
allowAdjustSeekPositionBackwards | Boolean | Can seek backwards relative to the current position.
allowNext | Boolean | Can move forward to the next track.
allowPrevious | Boolean | Can move backward to the previous track.
entities | Array (default []) | Current track entity data.
positionInMilliseconds | Integer | Current position of the play head from the start of the track.
state | One of: idle, playing, paused | The current operating state.
url | String | Current track source URL.

The media tag is reported if there is at least one media track available for playing.

The following example shows the media tag.

{
  "id": "myVideoPreview",
  "uid": ":1138",
  "position": "1024x600+0+0:0",
  "type": "video",
  "tags": {
    "media": {
      "allowAdjustSeekPositionForward": true,
      "allowAdjustSeekPositionBackwards": true,
      "allowNext": true,
      "allowPrevious": false,
      "entities": [
        "MY_ENTITY_DATA"
      ],
      "positionInMilliseconds": 34214,
      "state": "playing",
      "url": "https://myvideolocationhere"
    }
  }
}

allowAdjustSeekPositionForward

When true, this media track supports seeking forward in time to a new position. Live media streams normally report false for this property.

allowAdjustSeekPositionBackwards

When true, this media track supports seeking backwards in time to a new position. Live media streams normally report false for this property.

allowNext

When true, the media player can advance forward to the next media track. The allowNext property is false if the media player is on the final track.

allowPrevious

If true, the media player can move back to the previous media track. The allowPrevious property is false if the media player is on the first track.

entities

Entity data associated with the current media track, taken from the entities property of the video source. The media object omits the entities property when the media track doesn't have any entity data associated with it.

positionInMilliseconds

The media player head position within the current media track, measured in milliseconds.

state

The current playing state of the media track. The playing state is one of the following values:

Name | Description
--- | ---
idle | The media player hasn't played any content.
paused | The media player has played some content, but is now paused.
playing | The media player is actively playing content.

url

The URL of the current media track.

ordinal

Reported if the element has a defined ordinal value. The ordinal value is a natural number (a positive integer). The ordinal tag is assigned to children of a Multi-child component with the numbered property set to true.

{
  "id": "myListItem8",
  "uid": ":1231",
  "position": "200x100+23+26:1",
  "type": "text",
  "tags": {
    "ordinal": 6,     // Ordinal isn't always equal to index + 1.
    "clickable": true,
    "focused": false
  }
}

For an element with a listItem tag, you can also compare its ordinal value against the highestOrdinalSeen and lowestOrdinalSeen values of the parent list tag.

pager

An object with a collection of properties reported for a Pager component if it has at least two pages. The pager tag object has the properties shown in the following table.

Property | Type | Description
--- | --- | ---
index | Integer | Index of the current page. The index for a Pager is zero-based. Therefore, the first page has an index of 0.
pageCount | Integer | Total number of pages
allowForward | Boolean | When true, the user can move the pager forward.
allowBackwards | Boolean | When true, the user can move the pager backwards.

The allowForward and allowBackwards properties indicate what the user can do, based on the navigation property for the Pager and the current page. These properties don't consider what you can do programmatically with the SetPage command.

For example, assume navigation is normal, which lets the user navigate freely back and forth in the Pager.

  • When the Pager is on the first page, allowForward reports true and allowBackwards reports false.
  • When the Pager is on a page that's neither the first nor the last page, both properties report true.
  • When the Pager is on the last page, allowForward reports false and allowBackwards reports true.

When navigation is none, the user can't navigate the Pager at all. In this scenario, both properties report false, regardless of the page displayed.

When navigation is wrap, the user can always navigate forward or backwards. When the user is on the last page, navigating forward wraps back to the first page. In this scenario, both properties report true, regardless of the page displayed.
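The three navigation cases above reduce to a small lookup. The following Python sketch returns the (allowForward, allowBackwards) pair for a given page. The function name pager_allowances is hypothetical. Navigation values other than the three described here raise ValueError instead of being guessed.

```python
from typing import Tuple


def pager_allowances(navigation: str, index: int, page_count: int) -> Tuple[bool, bool]:
    """Return (allowForward, allowBackwards) for the navigation values described here."""
    if navigation == "none":
        return False, False
    if navigation == "wrap":
        return True, True
    if navigation == "normal":
        # Forward is possible before the last page, backwards after the first.
        return index < page_count - 1, index > 0
    raise ValueError(f"navigation value not covered by this sketch: {navigation!r}")


for page in range(4):
    print(page, pager_allowances("normal", page, 4))
```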

The following example shows the pager tag.

{
  "id": "weatherPager",
  "uid": ":111",
  "position": "1024x600+0+0:0",
  "type": "mixed",
  "tags": {
    "pager": {
      "index": 0,
      "pageCount": 4,
      "allowForward": true,
      "allowBackwards": false
    },
    "focused": false
  }
}

scrollable

An object indicating that a region can scroll forward or backwards. The following components can scroll content:

  • GridSequence
  • ScrollView
  • Sequence

These components report the scrollable tag when the component contains enough content to require scrolling. When all the content within the component is fully visible, the visual context doesn't include the scrollable tag.

The scrollable tag object has the properties shown in the following table.

Property | Type | Description
--- | --- | ---
direction | One of: horizontal, vertical | Direction of scrolling.
allowForward | Boolean | When true, the content in the scrolling area can scroll forward.
allowBackwards | Boolean | When true, the content in the scrolling area can scroll backwards.

For example, assume a Sequence contains 10 items and is large enough to display 5 items at a time.

  • When the component shows the items with indexes zero through four, allowForward is true and allowBackwards is false.
  • When the user scrolls down to show the items with indexes two through six, both allowForward and allowBackwards are true.
  • When the user scrolls all the way to the end of the list, allowForward is false and allowBackwards is true.

The scrollable tag is reported when either allowForward or allowBackwards is true. When both properties are false, the tag isn't included.
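The Sequence example above can be expressed as a function of the first and last visible index. The following Python sketch is an index-based illustration only. The runtime presumably works from scroll offsets rather than item indexes. The function name scrollable_tag is hypothetical. Returning None mirrors the rule that the tag is omitted when both properties are false.

```python
from typing import Dict, Optional, Union


def scrollable_tag(first_visible: int, last_visible: int, item_count: int,
                   direction: str = "vertical") -> Optional[Dict[str, Union[str, bool]]]:
    """Approximate the scrollable tag from the window of visible item indexes."""
    allow_forward = last_visible < item_count - 1
    allow_backwards = first_visible > 0
    if not (allow_forward or allow_backwards):
        return None  # all content fits, so the tag is omitted
    return {"direction": direction,
            "allowForward": allow_forward,
            "allowBackwards": allow_backwards}


for window in ((0, 4), (2, 6), (5, 9)):
    print(window, scrollable_tag(*window, item_count=10))
```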

The following example shows a scrollable tag.

{
  "id": "todoList",
  "uid": ":211",
  "position": "1024x550+0+50:0",
  "type": "mixed",
  "tags": {
    "scrollable": {
      "direction": "horizontal",
      "allowForward": true,
      "allowBackwards": false
    },
    "focused": false
  }
}

spoken

Boolean tag indicating that this element has content that Alexa can read out loud. Any component that sets the speech property reports the spoken tag.

The following example shows the spoken tag.

{
  "id": "myListItem",
  "uid": ":444",
  "position": "800x80+72+437:0",
  "type": "text",
  "tags": {
    "clickable": true,
    "focused": true,
    "spoken": true
  }
}

viewport

The viewport tag is reserved for the top-level element on a screen. The viewport tag has no defined properties.

Example:

{
  "id": "top",
  "uid": ":101",
  "position": "480x480+0+0:0",
  "type": "mixed",
  "tags": {
    "viewport": {}
  }
}

Rules for generating the element hierarchy

The following rules apply when generating the element hierarchy for the visual context:

  • The top-level component always generates an element with the viewport tag.
  • A component is reported as an element if it's visible on the screen and has at least one of the following attributes:
    • Non-empty entities property.
    • True clickable tag.
    • A media tag.
    • A pager tag.
    • A scrollable tag.
    • A spoken tag.
  • A component that isn't visible might still be reported as an element if both of the following conditions are true:
    • The component has a non-empty entities property
    • The component was visible on the screen at some time in the past
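The rules above can be written as a single predicate. In the following Python sketch, the input is a hypothetical summary of a component, and the keys is_top_level, visible, seen_before, entities, and tags are assumptions rather than anything the runtime exposes. For off-screen components, True means only "eligible". The runtime keeps a limited window of recently seen items and doesn't guarantee that they're reported.

```python
from typing import Any, Dict

REPORTING_TAGS = ("media", "pager", "scrollable")


def should_report(component: Dict[str, Any]) -> bool:
    """Apply the element-hierarchy rules to a hypothetical component summary.

    Expected keys (all hypothetical): is_top_level, visible, seen_before,
    entities (list), and tags (dict of tags the component would carry).
    """
    if component.get("is_top_level"):
        return True  # always reported, with the viewport tag
    has_entities = bool(component.get("entities"))
    tags = component.get("tags", {})
    if component.get("visible"):
        return (
            has_entities
            or tags.get("clickable") is True
            or tags.get("spoken") is True
            or any(name in tags for name in REPORTING_TAGS)
        )
    # Off screen: only previously visible components with entities are eligible.
    return has_entities and bool(component.get("seen_before"))


# Components from the nested element example (summaries are hypothetical).
article = [
    ("background image", {"visible": True}),
    ("title", {"visible": True}),
    ("scrolling region", {"visible": True, "tags": {"scrollable": {}}}),
    ("small picture", {"visible": True}),
    ("large text", {"visible": True, "tags": {"spoken": True}}),
]
for name, summary in article:
    print(name, should_report(summary))
```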

The intent of reporting non-visible components is to allow the user to refer to an item that was visible on the screen and might have scrolled out of view. The context reporting system doesn't guarantee that every previously visible item is reported. The system keeps a window of recently visible items and reports back the most recently seen elements.

Nested element example

The following example shows how components nest in normal reporting. The example shows an APL document that displays a background image, a text element corresponding to the title of an article, and a scrolling element holding the content. The content is further divided into a labeled image element and a text element.

The top-level Container has a single entity, and the text content has the speech property set.


For this document, the rules to generate the hierarchy therefore produce the following:

  • The top-level Container is reported because it has entity data assigned.
  • The background image isn't reported because it has no entity data and no other tags apply.
  • The title isn't reported because it has no entity data and no other tags apply.
  • The scrolling region is reported because it has a valid scrollable tag.
  • The small picture isn't reported because it doesn't have entity data and no other tags apply.
  • The large text is reported because it has a spoken tag.

The resulting element hierarchy might look like the following.

{
  "id": "top-level",
  "uid": ":9549",
  "position": "960x480+0+0:0",
  "type": "mixed",
  "tags": {
    "viewport": {}
  },
  "children": [
    {
      "id": "scrollingRegion",
      "uid": ":9552",
      "position": "960x349+0+131:1",
      "type": "mixed",
      "tags": {
        "focused": false,
        "scrollable": {
          "direction": "vertical",
          "allowForward": false,
          "allowBackwards": true
        }
      },
      "children": [
        {
          "id": "articleId",
          "uid": ":9555",
          "position": "832x1410+64-1002:1",
          "type": "text",
          "tags": {
            "spoken": true
          },
          "visibility": 0.20000000298023224,
          "entities": []
        }
      ],
      "entities": []
    }
  ],
  "entities": [
    {
      "id": "mainPage"
    }
  ]
}

The example shows allowForward as false because the user scrolled to the bottom of the content and then made an utterance that sent a request to the skill.



Last updated: Nov 28, 2023