New developer tools to build LLM-powered experiences with Alexa

Mark Yoshitake Sep 20, 2023

 

Today, we previewed the future of Alexa, one that’s powered by a large language model (LLM) specifically optimized for voice interactions. We believe this milestone will make engaging with Alexa more natural, conversational, and intuitive.

Like we’ve done from the beginning, we want developers on this journey with us, helping define and build new ambient experiences for our shared customers. That’s why we’re excited to preview new tools that leverage our advances in generative AI, making building with Alexa simpler and faster. With these tools, developers will be able to create experiences that let customers access real-time data, enjoy immersive generative AI-enhanced games, and perform tasks like booking a restaurant reservation or getting a succinct summary of a trending news story, and more.

You will be able to integrate content and APIs with Alexa’s LLM to create conversational experiences for Alexa-enabled devices, or alternatively, integrate with an LLM of your choice. We are also fundamentally simplifying the tools to build intuitive smart home controls. Whichever path you choose, you will be able to build with Alexa without having to write complex code or train specific interaction models.

At build time, developers will provide the skill manifest, API specifications, content sources, and natural language descriptions. At runtime, Alexa will find the right provider, orchestrate API calls, and retrieve content based on user context, device context, and memory (which includes conversation history and event timeline).

 

Building with Alexa's LLM: Conceptual Architecture

 

Whenever a user request or a device interaction (e.g. touch) occurs, Alexa orchestrates a series of actions, interfacing with Alexa’s LLM to construct prompts and iteratively make calls until the completion of the task. Alexa will enhance the prompt with additional signals and data such as memory, context and user preferences before executing API/actions based on LLM predictions.  
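
For illustration, the loop below sketches this orchestration pattern in TypeScript. It is purely conceptual, not Alexa's actual implementation, and every name in it (LlmPrediction, buildPrompt, callAlexaLlm, executeApi) is hypothetical.

// Conceptual sketch only: hypothetical types and functions, not Alexa's implementation.
interface LlmPrediction {
  type: "api_call" | "final_response";
  apiName?: string;        // which provider API the model wants to invoke
  apiArguments?: unknown;  // arguments predicted by the model
  responseText?: string;   // final text to speak back to the customer
}

// Stand-ins for prompt construction, the model call, and API execution.
declare function buildPrompt(utterance: string, memory: string[], context: Record<string, unknown>, steps: string[]): string;
declare function callAlexaLlm(prompt: string): Promise<LlmPrediction>;
declare function executeApi(name: string, args: unknown): Promise<unknown>;

async function handleTurn(utterance: string, memory: string[], context: Record<string, unknown>): Promise<string> {
  const steps: string[] = [];
  for (let turn = 0; turn < 10; turn++) {
    // Enhance the prompt with memory, context, and intermediate results.
    const prompt = buildPrompt(utterance, memory, context, steps);
    const prediction = await callAlexaLlm(prompt);
    if (prediction.type === "final_response") {
      return prediction.responseText ?? "";
    }
    // Execute the predicted API/action and feed the result back into the loop.
    const result = await executeApi(prediction.apiName ?? "", prediction.apiArguments);
    steps.push(`${prediction.apiName} -> ${JSON.stringify(result)}`);
  }
  return "Sorry, I wasn't able to complete that.";
}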

 

Connect your APIs with Alexa’s LLM

To integrate APIs with Alexa’s LLM, you will provide: 1) your API endpoints and definitions, 2) natural language annotations and descriptions of your APIs and business logic, and 3) CX examples for when to call the APIs. This new integration model allows you to build experiences that enable customers to participate in multi-turn, cross-skill conversations. For example, a customer planning a trip might ask about the best time of year to visit, places to see, and hotel reservations in the same conversation. To fulfill this task, relevant APIs will be called based on the API definitions and descriptions you provide. As Alexa orchestrates and chains together intermediate natural language reasoning steps, it decomposes complex tasks into smaller, more manageable steps and constructs the response to the customer. Alexa is able to complete these complex tasks in real time, in line with customer expectations for voice interactions.

We have also enhanced the skill manifest with additional configuration fields for building LLM-based experiences.

The modelFacingName and modelFacingDescription fields are used by the model to understand the functionality the Alexa Skill provides, while the new actionProvider API type enables your APIs to be integrated with Alexa.

 

{
    "manifest": {
        "modelFacingName": "Restaurant Reservations",
        "modelFacingDescription": "Use this to book restaurant reservations",
        "publishingInformation": {
          // same as before
        },
        "apis": {
            "actionProvider": {
                "endpoint": { 
                    "uri": "arn:aws:lambda:us-east-1:452493640596:function:restaurantReservationsHandler"
                }
            }
        }
    }
}

 

To make it simpler for developers to author and maintain API specifications and logic, we will support API declarations using our SDKs for the most common interface languages, starting with TypeScript. 

 

@Service
export class RestaurantReservationService {
  @Description("Use this method to find restaurants by the zipCode")
  public findRestaurants(zipCode: string) : Promise<Restaurant> {
    // restaurant search logic
  }
  
  @Description("Use this to reserve a restaurant, confirm with the user before reserving")
  public reserveRestaurant(restaurant: Restaurant, partySize: number, date: string): Promise<string> {
    // reservation logic 
  }
}

@Description("Restaurant schema")
export interface Restaurant {
    restaurantId: string;
    name: string;
    address: string;
    address2: string;
    city: string;
    state: string;
    country: string;
}
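
For illustration, the hypothetical driver below shows how the two annotated methods might be exercised once Alexa's LLM has extracted a zip code, party size, and date from a conversation; the bookDinner function and the hard-coded values are assumptions for the example, not part of the SDK.

// Hypothetical driver code, not part of the SDK: shows the two annotated methods
// being exercised with values the LLM extracted from a conversation.
async function bookDinner(service: RestaurantReservationService): Promise<void> {
  // "Book me a table for four near 98109 on the 30th" -> findRestaurants("98109")
  const restaurant = await service.findRestaurants("98109");

  // Per the @Description, the model confirms with the user before reserving.
  const confirmation = await service.reserveRestaurant(restaurant, 4, "2023-09-30");
  console.log(`Reservation result: ${confirmation}`);
}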

 

Connect your content with Alexa’s LLM 

You will also be able to build content-driven, ambient experiences and deliver them to over half a billion Alexa-enabled devices. You can use unstructured data like product manuals or websites, or structured data from existing databases to enable customers to find the perfect show on Prime Video, get summaries of earnings reports, access recipes from an expert chef and more. These experiences will be powered by techniques such as retrieval augmented generation without needing to train core LLM models on your data. 

To integrate content, create a content spec that includes 1) a short natural language description of the content (e.g., FAQs on amenities and guest services), 2) instructions on the desired use cases to support, and 3) the unstructured content or structured data, and then ingest it using our new tools. These tools include managed instances of vector databases, which eliminate the need for you to provision and maintain your own services. When a customer asks for information related to your content, Alexa searches the content store and forms a response using the most relevant information.

 


{
    "manifest": {
        "modelFacingName": "Food Recipes",
        "modelFacingDescription": "Use this to find food recipes.",
        "publishingInformation": {
          // same as before
        },
        "apis": {
            "contentProvider": {
                "sources": [
                    {
                        "location": "skillpackage://content/food-recipes.pdf"
                    }
                ]
            }
        }
    }
}
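
As a rough mental model of the retrieval step, the sketch below shows a generic retrieval-augmented generation flow over ingested content chunks. It is not Alexa's content store or API; embed, generateAnswer, and the Chunk type are hypothetical stand-ins for an embedding model, an LLM call, and the vectors produced at ingestion time.

// Generic retrieval-augmented generation sketch; this is not Alexa's content
// store or API. embed() and generateAnswer() are hypothetical stand-ins for an
// embedding model and an LLM call.
declare function embed(text: string): Promise<number[]>;
declare function generateAnswer(prompt: string): Promise<string>;

// A chunk of ingested content (e.g., a section of food-recipes.pdf) and the
// vector computed for it at ingestion time.
interface Chunk { text: string; vector: number[]; }

function cosine(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

async function answerFromContent(question: string, chunks: Chunk[]): Promise<string> {
  const queryVector = await embed(question);
  // Retrieve the most relevant chunks from the vector store.
  const topChunks = chunks
    .map(chunk => ({ chunk, score: cosine(queryVector, chunk.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, 3)
    .map(scored => scored.chunk.text);
  // Form a response grounded in the retrieved content.
  const prompt = `Answer using only this content:\n${topChunks.join("\n---\n")}\n\nQuestion: ${question}`;
  return generateAnswer(prompt);
}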

 

Connect your own LLM with Alexa devices and experiences

In addition, we understand that you may have already purpose-built your own LLMs or leveraged LLMs offered through services like Amazon Bedrock. When you want to bring your LLM-based experiences to life through expressive voice or ambient visuals, and distribute them to Alexa devices, a new tool will let you integrate your Alexa Skill with one or more LLMs of your choice. To integrate third-party LLMs, all you’ll have to do is consume the automatic speech recognition (ASR) output or utterance text through a simple SDK that supports the entire range of multimodal experiences enabled by Alexa.
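
As a rough sketch of what such an integration could look like, the handler below forwards recognized utterance text to your own model and returns speech plus optional visual content. Every type and function here (RecognizedUtterance, AlexaResponse, callMyLlm, onUtterance) is an assumption for illustration; the actual SDK surface has not been published.

// Hypothetical handler, not the actual SDK surface (which has not been published):
// forwards recognized utterance text to your own LLM and returns speech and visuals.
interface RecognizedUtterance {
  text: string;          // utterance text produced by Alexa's ASR
  locale: string;
  sessionId: string;
}

interface AlexaResponse {
  speech: string;        // text to render with expressive voice
  displayText?: string;  // optional content for ambient visuals
}

// Your own model, or one hosted through a service like Amazon Bedrock.
declare function callMyLlm(prompt: string): Promise<string>;

async function onUtterance(utterance: RecognizedUtterance): Promise<AlexaResponse> {
  const reply = await callMyLlm(utterance.text);
  return { speech: reply, displayText: reply };
}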

We’re excited about the experiences being created by developers, and we’ll have more to share soon from companies like Character.AI, Splash, BMW, Volley, Sleep Jar, Voice Apps, and OpenTable who are enabling customers to do things like talk to fictional characters, create their own songs, play single and multi-agent driven games on Alexa-enabled devices, learn about the features of their vehicle and so much more. 

Leverage differentiated offerings for smart home developers

Today, customers have connected over 400 million smart home devices to Alexa. Our partners have been critical in helping build smart home experiences with devices from all brands working seamlessly together to make customers’ lives easier. We will be leveraging generative AI to allow customers to control their smart home devices for a variety of actions through flexible and natural conversations, which Alexa’s LLM will map to the unique and differentiated features of their products. To cover the broadest set of use cases, we will be providing developers with a path for both common integrations such as on/off or start/stop actions, and more complex ones such as actions related to dynamic lighting or custom cleaning operations for a vacuum cleaner.

Smart home Generic Controller Interfaces such as our existing Alexa.RangeController and Alexa.ModeController remain a simple yet effective way to integrate devices with Alexa for common use cases. A new interface called Action Controller extends this further by enabling device makers to model simple actions, targets, and states to control their devices, giving customers the ability to interact naturally with their devices. This means customers using Alexa’s LLM can say “Alexa, the floor is dirty,” and Alexa will be able to infer that the intended action is ‘vacuum’ and the state is ‘start.’ Action Controller makes it simple for device makers to model new supported actions and targets. To take advantage of Action Controller, you need to 1) define your supported device actions, such as clean, mop, or dock, 2) define a list of targets, such as ‘kitchen’ or ‘living room,’ 3) choose the state(s) to support (one or more of start/stop/pause/resume), and 4) implement the required directive and state updates when Alexa sends the requested actions to your skill.

Below is an example of a multi-functional device modeled using an Action Controller instance. In this example, the device supports cleaning-type actions (such as ‘vacuum’), a list of target zones, and device action states (start/stop/pause/resume).

{ "capabilities": [{
    "type": "AlexaInterface",
    "interface": "Alexa.ActionController",
    "instance": "Robot.Clean",
    "configuration": {
        "supportedActions": [{ //An identifier device maker defined for the actions reported.
            "id": "Vacuum-device-maker-defined-identifier",
            "friendlyNames": [{
                "@type": "asset",
                "value": {
                  "assetId": "Alexa.Actions.Vacuum"
        }}]}],
        "supportedStates": [
          "START",
          "STOP",
          "PAUSE",
          "RESUME"
        ],
        "target": {
          "supportedTargets": [{ //A list of identifiers device maker defined for the targets reported
             "id": "space-id-1",
             "friendlyNames": [
              {
                "@type": "asset",
                "value": {
                "assetId": "Alexa.Space.Kitchen"
}}]}]}}}]}

 

Below is an example of the directive a device maker will receive for the start action:

 

{
  "directive": {
    "header": {
      "namespace": "Alexa.ActionController",
      "instance": "Robot.Clean",
      "name": "PerformAction",
      "messageId": "message id"
    },
    "payload": {
      "actionId": "Vacuum-device-maker-defined-identifier",
      "targetIds": ["<the target identifier>"],
      "state": "START"
    }
  }
}
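
A skill-side handler for this directive might look roughly like the sketch below. The ActionDirective type simply mirrors the example payload above; startVacuum and handleActionDirective are hypothetical names, and a real handler would also report the resulting state change back to Alexa.

// Hypothetical skill-side handler for the PerformAction directive shown above.
// ActionDirective mirrors the example payload; startVacuum() stands in for the
// device maker's own cloud call.
interface ActionDirective {
  directive: {
    header: { namespace: string; instance: string; name: string; messageId: string };
    payload: { actionId: string; targetIds: string[]; state: "START" | "STOP" | "PAUSE" | "RESUME" };
  };
}

declare function startVacuum(targetIds: string[]): Promise<void>;

async function handleActionDirective(event: ActionDirective): Promise<void> {
  const { actionId, targetIds, state } = event.directive.payload;
  if (actionId === "Vacuum-device-maker-defined-identifier" && state === "START") {
    // Start cleaning the requested target zones; the real handler would then
    // report the resulting state update back to Alexa.
    await startVacuum(targetIds);
  }
}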

 

For device control that goes beyond a simple action, or that involves more complex device states or parameters, we are introducing Dynamic Controller, a new tool that allows developers to define a custom interface. Dynamic Controller enables customers to use a wider range of utterances to control the unique features of their smart home devices in a more conversational way (e.g., a customer can ask a lighting solution provider to “make the lights like a misty forest,” whereas in the past they would only have been able to select from a list of predefined options). Dynamic Controller makes it simpler for smart home developers to integrate their unique device features with Alexa without relying on pre-defined APIs or creating complex voice models.

To take advantage of Dynamic Controller, you need to 1) define the payload describing the unique device capability, 2) define the directive that you would like Alexa to send, and 3) add instructions on the desired use cases to support, and Alexa’s LLM will take care of the rest. With Dynamic Controller, you’ll have support for a wider range of utterances (for example, “make the lights like a misty forest” for a smart lighting solution) so that customers can take advantage of your unique features.

Below is an example of a smart lighting device maker using Dynamic Controller to fulfill the customer utterance “make the lights like a misty forest.” The LLM will infer the colors for the misty forest, the corresponding energy level and the correct capability interface (CreateTemporaryDynamicScene) from the utterance before sending a directive to the corresponding Skill. The directive will contain the device endpoints and the requested payload that indicate how best to set the dynamic lighting scene for the customer.

 

// Lighting partner Skill Manifest and API spec
{
  "name": "Smart Lighting Partner name",
  "namespace": "Smart Lighting Partner Skill name",
  "publishingInformation": {
    "locales": {
      "en-US": {
        "name": "",
        "summary": "This skill allows users to control and interact with Smart light devices",
        "description": "This skill has basic and advanced smart devices control features.",
        "examplePhrases": [
          "Alexa, set the lights to a Misty forest.",
          "Alexa, make my lights look like a Sunset"
        ],
        "keywords": ["Smart Home", "Lights", "Smart Devices"]
      }
    },
    "category": "SMART_HOME"
  },
  "capabilities": [
    // Dynamic Controller capabilities
    {
      "name": "CreateTemporaryDynamicScene",
      "description": "Creates a custom dynamic lighting scene based on the desired color palette",
      "payload": {
        "name": "primaryColor",
        "name": "secondaryColor",
        "name": "tertiaryColor",
        "name": "energyLevel"
      }
    }
  ]
}

 

Below is an example of the discovery response the device maker needs to return for Dynamic Controller. In this example, a smart lighting device reports support for the CreateTemporaryDynamicScene operation.

 

{
  "event": {
    "header": {
      "namespace": "Alexa.Discovery",
      "name": "Discover.Response",
      "payloadVersion": "3",
      "messageId": "Unique identifier, preferably a version 4 UUID"
    },
    "payload": {
      "endpoints": [{
        "endpointId": "office_endpoint_id",
        "manufacturerName": "Lighting manufacturer name",
        "description": "A lighting product",
        "friendlyName": "Office Light",
        "displayCategories": ["Light"],
        "capabilities": [{
            "type": "AlexaSkillInterface",
            "interface": "Lighting partner Skill",
            "configurations": {
              "operations": ["CreateTemporaryDynamicScene"]
}}]}]}}}
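
To illustrate the skill side, the sketch below handles a hypothetical CreateTemporaryDynamicScene directive. The directive shape is assumed from the payload fields declared in the manifest above (primaryColor, secondaryColor, tertiaryColor, energyLevel); the actual wire format for Dynamic Controller has not been published, and applyScene is a placeholder for the device maker's own cloud call.

// Hypothetical handler for a Dynamic Controller request. The directive shape is
// assumed from the CreateTemporaryDynamicScene payload declared in the manifest;
// the real wire format has not been published.
interface DynamicSceneDirective {
  endpointId: string;      // e.g., "office_endpoint_id" from the discovery response
  operation: "CreateTemporaryDynamicScene";
  payload: {
    primaryColor: string;  // palette inferred by the LLM from "misty forest"
    secondaryColor: string;
    tertiaryColor: string;
    energyLevel: number;
  };
}

// Placeholder for the device maker's own call to set the scene on the light.
declare function applyScene(endpointId: string, colors: string[], energyLevel: number): Promise<void>;

async function handleDynamicScene(directive: DynamicSceneDirective): Promise<void> {
  const { primaryColor, secondaryColor, tertiaryColor, energyLevel } = directive.payload;
  await applyScene(directive.endpointId, [primaryColor, secondaryColor, tertiaryColor], energyLevel);
}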

 

GE Cync, GE Appliances, Philips Hue, iRobot, Roborock and Xiaomi will take advantage of Action Controller and Dynamic Controller to launch their new features and ship updates in the coming months. We look forward to working with many more brands to bring these experiences to life for customers.

You can learn more about the new tools by visiting the Alexa developer portal. Developers can sign up to receive updates and be among the first to access these new tools as they become available. Smart home developers can sign up via the preview interest form.

 

Learn More

Integrate with Alexa's LLM

Integrate common device actions

Integrate unique device features
