This is the final post in a three-part series about how we created the Space Explorer sample skill. In the first post, I discussed the design process and how we created the flows and visuals for the skill. In the second post, I walked through using the Alexa Developer Portal and the Alexa Presentation Language (APL) to bring our designs to life.
Today, I'll dig into AWS Lambda, the thread we used to tie it all together, and how we built our skill’s back end to route intents, handle user interaction, and extend the flexibility of APL.
Choosing a back-end server for your skill is largely up to preference or individual project requirements. For our skill's back end, we chose AWS Lambda over a traditional HTTPS server for a number of reasons. First, it is the recommended technology within the developer portal, with AWS Lambda-specific fields provided in the Service Endpoint Type configuration. Second, the serverless nature of Lambda means we don’t have to spin up a new host, worry about availability and scaling, or perform any of the other manual tasks required to make an endpoint ready to receive requests from our skill. Finally, taking advantage of Alexa Skills Kit triggers from your Lambda function is straightforward, with support built directly into the console.
Alexa Developer Console showing how to configure AWS Lambda as your skill's endpoint
Before we could really dive into the back end, we had to first decide how to organize our code. We wanted to maintain a file structure that was easy to navigate and search, while also enabling us to expand the codebase as new features became available. First, we took a look at what we had already created just to get things working. When we were building out the interaction model, the basic scaffolding we created was essentially one giant file that handled all our intents. This had already become bloated and unwieldy, so creating separate modules for all of our request handlers was an obvious next step. Those individual files were stored in the handlers directory. Similarly, we created a documents directory to house all of our APL documents.
Since we were hosting all of our sample data internally, we created a dedicated data directory. Further, it became clear fairly quickly that our touch events and voice interactions would both be using a lot of the same code, so we added a multimodal_responses directory to consolidate that logic. Here is the structure we finally settled on:
Skill
├── data
├── documents
├── handlers
├── helpers
├── multimodal_responses
└── index.js
With the new structure in place, we added an index file to the handlers directory to make all of our handlers available from a single file reference. The primary index file would then import that single file, to add the handlers to our main handler export:
const Alexa = require('ask-sdk-core');
const {
  LaunchRequestHandler,
  HelpRequestHandler,
  EventHandler,
  FallbackHandler,
  CancelIntentHandler,
  ExploreZoneRequestHandler,
  ExploreObjectRequestHandler,
  ObjectAboutHandler,
  ObjectAtmosphereHandler,
  ObjectSizeHandler,
  ObjectDistanceHandler,
  ObjectRingHandler,
  ObjectSatellitesHandler,
  RandomImageRequestHandler,
  MoreInfoRequestHandler,
  SolarSystemRequestHandler,
  BackRequestHandler,
  OrdinalRequestHandler,
  PlanetRequestHandler,
  PlutoRequestHandler,
  CometsRequestHandler,
  OtherRegionRequestHandler,
  TheMoonRequestHandler
} = require('./handlers');
...
exports.handler = Alexa.SkillBuilders.custom()
  .addRequestHandlers(
    LaunchRequestHandler,
    HelpRequestHandler,
    EventHandler,
    FallbackHandler,
    CancelIntentHandler,
    ExploreZoneRequestHandler,
    ExploreObjectRequestHandler,
    ObjectAboutHandler,
    ObjectAtmosphereHandler,
    ObjectSizeHandler,
    ObjectDistanceHandler,
    ObjectRingHandler,
    ObjectSatellitesHandler,
    RandomImageRequestHandler,
    MoreInfoRequestHandler,
    SolarSystemRequestHandler,
    BackRequestHandler,
    OrdinalRequestHandler,
    PlanetRequestHandler,
    PlutoRequestHandler,
    CometsRequestHandler,
    OtherRegionRequestHandler,
    TheMoonRequestHandler,
    SessionEndedRequestHandler
  )
  .lambda();
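For reference, the index file inside the handlers directory just gathers each handler module and re-exports it from one place. Here's a minimal sketch of what that file might look like; the individual file names are assumptions (eventHandler.js is the one discussed below):

// handlers/index.js — a hypothetical re-export file; the file names are assumptions.
module.exports = {
  LaunchRequestHandler: require('./launchRequestHandler'),
  EventHandler: require('./eventHandler'),
  ExploreObjectRequestHandler: require('./exploreObjectRequestHandler'),
  // ...and so on for the remaining handlers
};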
With APL, touch events are sent to your endpoint as a new request type, Alexa.Presentation.APL.UserEvent. These events are triggered by the APL SendEvent command. Here's an example of what the SendEvent command looks like in our APL:
{
  "type": "SendEvent",
  "arguments": [
    "compositionEvent",
    "jupiter"
  ]
}
The arguments array is flexible and allows for passing arbitrary data. For consistency, we established the convention of passing the event type as the first item and any associated data as the second. This allowed us to handle touch events with a single module called eventHandler.js. It uses the same handler structure as our other requests, and its handle method uses a switch statement to determine which multimodal response should handle the request:
// The response modules (ExploreResponse, MoonResponse, and so on) are
// required from the multimodal_responses directory described above.
const EventHandler = {
  canHandle: handlerInput =>
    handlerInput.requestEnvelope.request.type === 'Alexa.Presentation.APL.UserEvent',
  handle: handlerInput => {
    // By convention, the first argument is the event type and the second is its data.
    const args = handlerInput.requestEnvelope.request.arguments;
    const event = args[0];
    const data = args[1];

    switch (event) {
      case 'exploreEvent':
        return data === 'the moon'
          ? MoonResponse(handlerInput, false)
          : ExploreResponse(handlerInput, data, false);
      case 'exploreZoneEvent':
        return ExploreZoneResponse(handlerInput, data, false);
      case 'aboutEvent':
        return AboutResponse(handlerInput, data, false);
      case 'distanceEvent':
        return DistanceResponse(handlerInput, data, 'the sun', null, false);
      case 'sizeEvent':
        return SizeResponse(handlerInput, data, 'earth', null, false);
      case 'compositionEvent':
        return AtmosphereResponse(handlerInput, data, false);
      case 'satellitesEvent':
        return SatellitesResponse(handlerInput, data, false);
      case 'backEvent':
      case 'goBack':
        return BackResponse(handlerInput);
      case 'imageEvent':
        return RandomImageResponse(handlerInput, '');
      default:
        return SolarSystemResponse(handlerInput);
    }
  }
};
As I mentioned earlier, our voice responses and touch responses were going to share a lot of code. Pulling this shared code into separate modules meant we didn't have to duplicate effort, made updates and bug fixes easier, and let us deliver identical experiences while still differentiating between input types.
For example, when customers navigate to a planet, the ExploreResponse is called. This is true for both voice and touch. However, when calling it from a touch event, we set the optional speak parameter to false, which tells the skill to mute the spoken response.
module.exports = (handlerInput, destination, speak = true) => {
  ...
  if (handlerInput.requestEnvelope.context.System.device.supportedInterfaces['Alexa.Presentation.APL']) {
    return handlerInput.responseBuilder
      .speak(speak && text)
      .addDirective(planetDirective(destination))
      .getResponse();
  } else {
    return handlerInput.responseBuilder
      .speak(`${text} What would you like to learn? You can ask me how big it is, how far away it is, what its atmosphere is made of, and how many moons it has. Or, you can just ask me to tell you about it.`)
      .reprompt('What would you like to learn?')
      .getResponse();
  }
};
Differentiating between voice and touch input using an optional parameter
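To make the sharing concrete, here's a hedged sketch of what a voice-driven handler like ExploreObjectRequestHandler might look like when it delegates to that same ExploreResponse. The intent and slot names are our own assumptions for illustration; the important part is that the voice path leaves speak at its default of true, while the touch path shown earlier passes false:

// Hypothetical voice handler that reuses the shared ExploreResponse module.
// 'ExploreObjectIntent' and the 'celestialObject' slot are illustrative names only.
const ExploreObjectRequestHandler = {
  canHandle: handlerInput =>
    handlerInput.requestEnvelope.request.type === 'IntentRequest' &&
    handlerInput.requestEnvelope.request.intent.name === 'ExploreObjectIntent',
  handle: handlerInput => {
    const destination =
      handlerInput.requestEnvelope.request.intent.slots.celestialObject.value;

    // speak defaults to true, so the voice path keeps the spoken response.
    return ExploreResponse(handlerInput, destination);
  }
};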
Our responses also handle the business of setting session attributes. We use session attributes to maintain a customer's context as they navigate throughout the skill. These attributes set the location and manage the history stack for the skill.
const attributes = handlerInput.attributesManager.getSessionAttributes();

if (
  attributes.previousLocation &&
  attributes.previousLocation[attributes.previousLocation.length - 1] !== attributes.location
) {
  attributes.previousLocation.push(attributes.location);
}

handlerInput.attributesManager.setSessionAttributes(attributes);
An example of setting the location context and managing the history stack using session attributes
Putting this logic in the responses means touch events also trigger session attribute updates, ensuring context is maintained regardless of interaction type.
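It also makes back navigation straightforward, because going back is just a matter of popping the stack. Here's a minimal sketch, assuming the location and previousLocation attribute names shown above (the fallback value is our own placeholder):

// Hypothetical back-navigation logic built on the same session attributes.
const attributes = handlerInput.attributesManager.getSessionAttributes();

// If there's no history to return to, fall back to a default view.
const previous =
  (attributes.previousLocation && attributes.previousLocation.pop()) || 'solar system';

attributes.location = previous;
handlerInput.attributesManager.setSessionAttributes(attributes);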
APL is a very flexible language. Natively, it offers several ways to create reusable layouts. Our skill has several layouts that are shared across multiple entry points. Each layout accommodates a varying number of elements, different data, and potential empty states. We wanted to make sure we weren't rewriting the same layouts multiple times, so we made the choice to build the layout documents into modules.
Each module exports a function with parameters. The data we pass through those parameters shapes how the layout is constructed, determines which datasources are used, and provides additional context. Beyond that, we use the modules to extend some of APL's native features and even reshape our data to suit each use.
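As a rough illustration of the pattern (the module name, parameters, and document contents here are placeholders, not the skill's exact code), each layout module looks something like this:

// documents/atmosphere.js — a hypothetical layout module.
// It accepts parameters that shape the layout and returns a RenderDocument directive.
module.exports = (planet, comp) => {
  const pagerItems = buildPagerItems(comp); // placeholder for the JavaScript assembly shown below

  return {
    type: 'Alexa.Presentation.APL.RenderDocument',
    token: 'atmosphereView',
    document: {
      type: 'APL',
      version: '1.0',
      mainTemplate: {
        parameters: ['payload'],
        items: [{ type: 'Pager', width: '100vw', height: '100vh', items: pagerItems }]
      }
    },
    datasources: {
      data: { properties: { elements: {} } } // element colors and notations for this planet
    }
  };
};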
Both the Atmosphere and Planet Details views are good examples of how we used our JavaScript to enable modularity that wasn't possible with APL alone.
The designs for atmosphere on hubs called for a variable number of elements to be laid out either in a row centered in the viewport or as individual pager items. This layout didn't suit a list pattern, so we couldn't use the built-in layout provided by the Sequence component. To make it work, we used JavaScript to loop through the data and build the layout for each element. In the snippet below, comp is the atmospheric composition, a parameter passed to the module.
if (comp) {
  comp.forEach(item => {
    elements.push({
      type: 'Element',
      element: item.element,
      title: item.element.toUpperCase(),
      // The escaped \${...} expressions stay literal so APL resolves them against the
      // datasource at render time, while ${item.element} is interpolated in JavaScript now.
      notation: `\${payload.data.properties.elements['${item.element}'].notation}`,
      color: `\${payload.data.properties.elements['${item.element}'].color}`,
      percentage: item.percentage,
      spacing: '32dp'
    });
  });
}

elements.forEach(item => {
  pagerItems.push({
    type: 'Container',
    alignItems: 'center',
    justifyContent: 'center',
    width: '100%',
    height: '100%',
    paddingLeft: '6vw',
    paddingRight: '6vw',
    paddingTop: 50,
    direction: 'row',
    item
  });
});
Instead of using a bunch of when properties on elements, we simply create an array of elements and pass that to the items property of the container. This means that the number of elements is not limited to what we initially include, nor do we need to rewrite the same APL multiple times.
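Concretely, when the module assembles the document, the generated arrays simply become the items of the surrounding layout. A sketch of that step, with the surrounding properties assumed:

// Hypothetical assembly step: the dynamically built arrays become the items
// of a centered row (on larger viewports) or of a Pager (on smaller ones).
const rowLayout = {
  type: 'Container',
  direction: 'row',
  alignItems: 'center',
  justifyContent: 'center',
  items: elements // however many elements this planet's composition contains
};

const pagerLayout = {
  type: 'Pager',
  width: '100vw',
  height: '100vh',
  items: pagerItems
};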
Some screens serve as launchpads for information- and image-rich experiences, such as the PlanetDetails view. We want customers to navigate smoothly into any of the options on the page, but we noticed that latency could cause screens to initially load without images. To mitigate that, we dynamically build some hidden image preloads into the document. The snippet below creates those preloads in JavaScript.
...
const imagePreloads = [];

data[planet].satellites.interesting.forEach(item => {
  imagePreloads.push({
    type: 'Image',
    height: 0,
    width: 0,
    position: 'absolute',
    zIndex: 0,
    opacity: 0,
    source: item.image
  });
});

imagePreloads.push({
  type: 'Image',
  height: 0,
  width: 0,
  position: 'absolute',
  zIndex: 0,
  opacity: 0,
  source: data[planet].atmosphereImage
});
...
We added a hidden container to the layout, then passed the imagePreloads array to its item property. This gave the renderer an opportunity to cache the images, preventing the customer from seeing an empty list when viewing a planet's moons.
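The hidden container itself is simple; here's a sketch of what it might look like when appended to the document (the exact property values are assumptions):

// Hypothetical hidden container appended to the layout. The zero-sized,
// fully transparent images let the renderer cache them without affecting the view.
const preloadContainer = {
  type: 'Container',
  width: 0,
  height: 0,
  opacity: 0,
  position: 'absolute',
  item: imagePreloads
};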
Throughout our development, we operated under the assumption that devices without screens, such as Echo and Echo Dot, would simply speak the voice response from the skill and ignore the APL. Unfortunately for us, we learned late in the development process that this wasn't the case: those devices throw an error when sent APL directives. We had to go back through our responses and add checks to make sure screens and APL were supported. Here's an example if statement from the SolarSystemResponse that detects APL support:
if (handlerInput.requestEnvelope.context.System.device.supportedInterfaces['Alexa.Presentation.APL']) {
  return handlerInput.responseBuilder
    .addDirective(directive)
    .speak('Heading to the solar system. Where would you like to explore next?')
    .getResponse();
} else {
  return handlerInput.responseBuilder
    .speak('Heading to the solar system. Where would you like to explore next?')
    .reprompt('Where would you like to explore next?')
    .getResponse();
}
It's important to remember that not all devices currently support APL, so checking specifically for that support is the safest way to avoid errors.
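Because the check appears in every response, it's worth pulling into a small helper. A hypothetical version (the function name is ours, not part of the SDK):

// Hypothetical helper: returns true only when the requesting device supports APL.
const supportsAPL = handlerInput => {
  const supportedInterfaces =
    handlerInput.requestEnvelope.context.System.device.supportedInterfaces;

  return Boolean(supportedInterfaces && supportedInterfaces['Alexa.Presentation.APL']);
};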
I've mentioned that our data is hosted as part of the skill's code. In the case of Space Explorer, we made this decision specifically because we intended to use the skill as an example. We wanted to make it as easy as possible for developers to replicate this skill for skill-building practice, so we deliberately avoided any APIs that would require updating keys or endpoints. Under normal circumstances, we wouldn’t recommend this, as it reduces flexibility and prevents easy updates. Skills you intend to publish to the Alexa Skills Store should rely on external data sources to keep content fresh without requiring constant manual updates to your skill.
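In practice, bundling the data just means the responses read from a local module instead of calling a service; something along these lines, with the file name assumed:

// Hypothetical local data load: the JSON lives in the skill's data directory,
// so running the sample requires no API keys or external endpoints.
const data = require('./data/planets.json');

const jupiter = data['jupiter']; // look up a planet's properties directly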
I hope our look inside Space Explorer has helped you better understand the process of building a multimodal skill, clarified how everything fits together, and inspired you to build (or enhance) a multimodal skill of your own. I’m excited to see what you build with APL.