Focus Management Overview

Focus management improves the accuracy of Alexa's responses, specifically when a user makes an ambiguous request. For example:

  • The product is playing music
  • User: "Alexa, pause."
  • User: "Alexa, what's the weather in Seattle?"
  • User: "Alexa, resume."
  • The product resumes playback of music

Focus is managed in the cloud. A client simply informs Alexa which interface has focus of an audio or visual channel and, when applicable, reports the idle time for each. This state information is sent in the context container under the AudioActivityTracker and VisualActivityTracker namespaces.

Why Do I Need This?

With products that support multi-modal experiences, such as local and offline playback or screen-based interactions unrelated to Alexa, the cloud may not be able to accurately determine what's happening on a product at a given time. By reporting audio and visual activity to Alexa, the product becomes the source of truth for all ongoing activities. This allows Alexa to determine what's in focus and accurately respond to each user request.

Use Cases

These use cases highlight the benefits of focus management.

Audio

Music Playback + Sounding Timer

This example illustrates how Alexa uses activity state to determine what content a user is attempting to stop:

  1. Music is playing
  2. A timer/alarm goes off
  3. User: "Alexa, stop."
  4. In the context of the Recognize event, the product informs Alexa that the timer/alarm has focus of the audio channel (see the sketch after this list).
  5. The timer/alarm is stopped and music resumes at the previously set volume
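
For illustration, the AudioActivityTracker entry sent in the context of the Step 4 Recognize event might look like the following sketch. The interface names follow the channel mapping in the Channels section below; the idle times are illustrative, and both channels are shown as active (0 milliseconds idle) on the assumption that music continues, attenuated, while the timer sounds. The dialog channel entry is omitted for brevity:

{
    "header": {
        "namespace": "AudioActivityTracker",
        "name": "ActivityState"
    },
    "payload": {
        "alert": {
            "interface": "Alerts",
            "idleTimeInMilliseconds": 0
        },
        "content": {
            "interface": "AudioPlayer",
            "idleTimeInMilliseconds": 0
        }
    }
}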

Bluetooth

This example illustrates how Alexa uses activity state to determine what directive is sent to stop music:

  1. A user connects their phone to a paired Alexa-enabled product using Bluetooth: "Alexa, connect my phone".
  2. Music playback is initiated from the phone and output from the Alexa-enabled product.
  3. The user says, "Alexa, stop." The product receives a Bluetooth.Stop directive. This command is communicated to the phone via Bluetooth.
  4. The user says, "Alexa, play artist on Amazon Music." This results in an AudioPlayer.Play directive being sent to the Alexa-enabled product. This is because the content originates from an Alexa music provider rather than the paired device.
  5. The user says, "Alexa, stop."
  6. In the context of the Recognize event, the product informs Alexa that the AudioPlayer interface has focus of the audio channel (see the sketch below). The Alexa-enabled product receives an AudioPlayer.Stop directive.
  7. Music is stopped

Without focus management, the Alexa-enabled product might instead have received a Bluetooth.Stop directive.
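
This sketch shows the AudioActivityTracker entry the product might send in the context of the Step 6 Recognize event (other channel entries omitted for brevity; the idle time is illustrative):

{
    "header": {
        "namespace": "AudioActivityTracker",
        "name": "ActivityState"
    },
    "payload": {
        "content": {
            "interface": "AudioPlayer",
            "idleTimeInMilliseconds": 0
        }
    }
}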

Visual

Display Cards

The key takeaway is that visual focus in the cloud expires after 8 seconds. Therefore, if a user makes a request after those 8 seconds have elapsed, Alexa may be unaware of the client's visual activity state. Here's what can occur without focus management:

  1. "Alexa, show me movie times for movie title."
  2. The user waits 25 seconds, then says: "Alexa, next page."
  3. Alexa responds that she doesn't know how to respond. This is because visual focus in the cloud expired after 8 seconds, and Alexa is unaware that the display card still has visual focus on your product.

With focus management enabled, a client reports the activity state for each audio and/or visual channel that its product supports as part of context. Because context is required in Recognize events, it is present in all speech requests. Therefore, when the user says "Alexa, next page" in Step 2, Alexa is aware that the TemplateRuntime interface has focus of the visual channel and sends the correct directive.
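
In that case, the context sent with the "Alexa, next page" request would include a VisualActivityTracker entry like this (see the full sample context later in this document):

{
    "header": {
        "namespace": "VisualActivityTracker",
        "name": "ActivityState"
    },
    "payload": {
        "focused": {
            "interface": "TemplateRuntime"
        }
    }
}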

Channels

Audio and visual data handled by your AVS client are organized into channels: Dialog, Alerts, Content, and Visual. Channels govern how your client should prioritize inputs and outputs. Each channel is associated with one or more AVS interfaces, and each channel can be active or inactive.

For example, SpeechSynthesizer is associated with the Dialog channel, and when Alexa returns a Speak directive to your client, the Dialog channel is active and remains active until a SpeechFinished event is sent to Alexa. Similarly, when a timer goes off, the Alerts channel becomes active and remains active until the timer is cancelled.
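
For reference, this is a sketch of the SpeechFinished event that marks the end of Alexa's speech, following the SpeechSynthesizer interface's schema:

{
    "event": {
        "header": {
            "namespace": "SpeechSynthesizer",
            "name": "SpeechFinished",
            "messageId": "{{STRING}}"
        },
        "payload": {
            "token": "{{STRING}}"
        }
    }
}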

This table provides an interface-to-channel mapping:

Channel | Interface(s)           | State
Dialog  | SpeechSynthesizer      | The Dialog channel is active when either a user or Alexa is speaking.
Alerts  | Alerts                 | The Alerts channel is active when a timer or alarm is sounding.
Content | AudioPlayer, Bluetooth | The Content channel is active when your client is playing media, such as audio streams.
Visual  | TemplateRuntime        | The Visual channel is active whenever your client is displaying Alexa-provided visual data to the user, such as Now Playing information for a song or book.

It is possible for multiple channels to be active at once. For instance, if a user is listening to music and asks Alexa a question, the Content and Dialog channels are concurrently active as long as the user or Alexa is speaking.
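
In that situation, the AudioActivityTracker entry might report both channels as active, as in this sketch (values illustrative; SpeechSynthesizer holds the Dialog channel while Alexa speaks, and AudioPlayer holds the Content channel):

{
    "header": {
        "namespace": "AudioActivityTracker",
        "name": "ActivityState"
    },
    "payload": {
        "dialog": {
            "interface": "SpeechSynthesizer",
            "idleTimeInMilliseconds": 0
        },
        "content": {
            "interface": "AudioPlayer",
            "idleTimeInMilliseconds": 0
        }
    }
}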

The Visual channel is active only when the client is actively displaying Alexa-provided content to the user.

Report ActivityState

Both the AudioActivityTracker and VisualActivityTracker namespaces have an ActivityState that needs to be reported as part of Context.

  • AudioActivityTracker - Specifies which interface is active for each audio channel and the time elapsed since an activity occurred for each channel.
  • VisualActivityTracker - Indicates that visual metadata from the TemplateRuntime interface is currently being displayed to the user.

Idle Time

For each channel in AudioActivityTracker, idleTimeInMilliseconds is required. If a channel is active at the time that context is reported, idleTimeInMilliseconds must be empty or set to 0.

VisualActivityTracker does not track idle time. The TemplateRuntime interface must be reported as in focus while the product is displaying visual metadata from Alexa, for example, a display card.
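
For example, if a timer is currently sounding and music playback stopped 30 seconds earlier, the AudioActivityTracker entry might look like this sketch (values illustrative):

{
    "header": {
        "namespace": "AudioActivityTracker",
        "name": "ActivityState"
    },
    "payload": {
        "alert": {
            "interface": "Alerts",
            "idleTimeInMilliseconds": 0
        },
        "content": {
            "interface": "AudioPlayer",
            "idleTimeInMilliseconds": 30000
        }
    }
}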

Sample Context

This is a sample message that includes context for AudioActivityTracker and VisualActivityTracker:

{
    "context": [
        {
            "header": {
                "namespace": "AudioPlayer",
                "name": "PlaybackState"
            },
            "payload": {
                "token": "{{STRING}}",
                "offsetInMilliseconds": {{LONG}},
                "playerActivity": "{{STRING}}"
            }
        },
        {
            "header": {
                "namespace": "SpeechRecognizer",
                "name": "RecognizerState"
            },
            "payload": {
                "wakeword": "ALEXA"
            }
        },
        {
            "header": {
                "namespace": "Notifications",
                "name": "IndicatorState"
            },
            "payload": {
                "isEnabled": {{BOOLEAN}},
                "isVisualIndicatorPersisted": {{BOOLEAN}}
            }
        },
        {
            "header": {
                "namespace": "Alerts",
                "name": "AlertsState"
            },
            "payload": {
                "allAlerts": [
                    {
                        "token": "{{STRING}}",
                        "type": "{{STRING}}",
                        "scheduledTime": "{{STRING}}"
                    }
                ],
                "activeAlerts": [
                    {
                        "token": "{{STRING}}",
                        "type": "{{STRING}}",
                        "scheduledTime": "{{STRING}}"
                    }
                ]
            }
        },
        {
            "header": {
                "namespace": "Speaker",
                "name": "VolumeState"
            },
            "payload": {
                "volume": {{LONG}},
                "muted": {{BOOLEAN}}
            }
        },
        {
            "header": {
                "namespace": "SpeechSynthesizer",
                "name": "SpeechState"
            },
            "payload": {
                "token": "{{STRING}}",
                "offsetInMilliseconds": {{LONG}},
                "playerActivity": "{{STRING}}"
            }
        },
        {
            "header": {
                "namespace": "AudioActivityTracker",
                "name": "ActivityState"
            },
            "payload": {
               "dialog": {
                    "interface": "{{STRING}}",
                    "idleTimeInMilliseconds": {{LONG}}
               },
               "alert": {
                    "interface": "{{STRING}}",
                    "idleTimeInMilliseconds": {{LONG}}
               },
               "content": {
                    "interface": "{{STRING}}",
                    "idleTimeInMilliseconds": {{LONG}}
               }
            }
        },
        {
            "header": {
                "namespace": "VisualActivityTracker",
                "name": "ActivityState"
            },
            "payload": {
                "focused": {
                    "interface": "{{STRING}}",
                }
            }
        }
    ],
    "event": {
        "header": {
            "namespace": "SpeechRecognizer",
            "name": "Recognize",
            "messageId": "{{STRING}}",
            "dialogRequestId": "{{STRING}}"
        },
        "payload": {
            "profile": "{{STRING}}",
            "format": "{{STRING}}",
            "initiator": {
                "type": "{{STRING}}",
                "payload": {
                    "wakeWordIndices": {
                        "startIndexInSamples": {{LONG}},
                        "endIndexInSamples": {{LONG}}
                    }   
                }
            }
        }
    }
}