Alexa Blogs Alexa Developer Blogs /blogs/alexa/post/896a5310-4189-4b8c-bc33-5610728019da/how-to-get-started-with-amazon-pay-to-sell-goods-and-services-from-your-alexa-skills How to Get Started with Amazon Pay to Sell Goods and Services from Your Alexa Skills Kristin Fritsche 2019-05-22T08:30:00+00:00 2019-05-22T15:24:58+00:00 <p>With Amazon Pay for Alexa Skills, you can sell real-world goods and services such as tickets for movies or concerts, car pick-up services, food, and more. You can reach customers around the world through an interaction as natural as voice, powered by a seamless payment processing flow handled by Amazon Pay.</p> <p>Developers are already using <a href="" target="_blank">Amazon Pay</a> to bring a variety of real-world products to voice. For example, the British rail operator <a href="" target="_blank">Virgin Trains</a> is able to sell train tickets to customers directly through their Alexa-enabled device.</p> <p>After building an engaging voice experience, you’re ready to learn more about monetizing your Alexa skill using Amazon Pay for Alexa Skills. This post will show you how to add Amazon Pay to your skill in just a few simple steps. Before you start, sign up as an Amazon Pay merchant. 
Learn more in <a href="" target="_blank">our guide</a>.</p> <p>The Amazon Pay for Alexa Skills APIs consist of only two operations - <em>Setup</em> and <em>Charge</em>. We will walk you through both below.</p> <h2>Setup</h2> <p><em>Setup</em> will create an agreement between your merchant account and the buyer, called a <em>BillingAgreement</em>, which will be used to charge the customer in a later step. Amazon Pay uses <a href="">Alexa Skill Connections</a> to have your skill interact with the Amazon Pay services. To initiate the creation of the agreement, we create a matching <em>Connections directive</em> to call the setup operation.</p> <pre> <code>let setupDirective = {
    'type': 'Connections.SendRequest',
    'name': 'Setup',
    'payload': {
        '@type': 'SetupAmazonPayRequest',
        '@version': '2',
        'sellerId': 'AEMGQXXXKD154',
        'countryOfEstablishment': 'US',
        'ledgerCurrency': 'USD',
        'checkoutLanguage': 'en-US',
        'needAmazonShippingAddress': true,
        'billingAgreementAttributes': {
            '@type': 'BillingAgreementAttributes',
            '@version': '2',
            'sellerNote': 'Thanks for shaving with No Nicks',
            'sellerBillingAgreementAttributes': {
                '@type': 'SellerBillingAgreementAttributes',
                '@version': '2'
            }
        }
    },
    'token': 'IK1yRWd8VWfF'
};</code></pre> <p>First, we define the <em>Connections.SendRequest</em> directive for the Amazon Pay <em>Setup</em> operation. The payload inside the directive defines all the Amazon Pay-relevant information. The most essential fields are the <em>sellerId</em>, which defines <em>who</em> is initiating the charge, and the <em>countryOfEstablishment</em> and <em>ledgerCurrency</em>, which define <em>how</em> to charge the customer. For definitions of all other fields, refer to our <a href="" target="_blank">comprehensive guide</a> linked in the resources section.</p> <p>You’ll notice we have not yet defined <em>how much</em> to charge. 
Specifying the amount is the job of the <em>Charge</em> operation if you charge inside your skill; if you charge through any other service using our backend APIs, you are charging “offline”.</p> <p>Adding the directive to your response is fairly simple:</p> <pre> <code>return handlerInput.responseBuilder
    .addDirective(setupDirective)
    .withShouldEndSession(true)
    .getResponse();</code></pre> <p>Note: we end the session because the Connections request will terminate your skill session and invoke it again with a <em>Connections.Response</em>. If you do not end your session, or if you add a re-prompt, it will result in an error.</p> <p>To catch the response, simply define a handler for the <em>Connections.Response</em> request:</p> <pre> <code>canHandle(handlerInput) {
    return handlerInput.requestEnvelope.request.type === &quot;Connections.Response&quot;
        &amp;&amp; === &quot;Setup&quot;;
}</code></pre> <p>The payload of the response will contain the <em>billingAgreementId</em> needed to charge the customer.</p> <h2>Charge</h2> <p>Amazon Pay can help you with a variety of use cases. We classify them into two payment workflows: <em>Charge Now</em> and <em>Charge Later</em>.</p> <p><a href="" target="_blank">Charge Now</a> allows you to sell real-world goods (e.g. tickets, clothing, etc.) and charge the buyer while they are still interacting with your skill. It's a perfect match for one-time purchases where you know the exact charge amount. The <em>starter kit</em> in the “No Nicks” demo skill is an example of Charge Now.</p> <p><a href="" target="_blank">Charge Later</a> allows you to set up a <em>BillingAgreement</em>, which represents the buyer's payment and delivery address preferences, if available, and use this agreement to charge the customer at a later point in time via the Amazon Pay <a href="" target="_blank">backend APIs</a>. It's the perfect match when you don't know the exact order total yet - e.g. 
for up-sell opportunities, pay-as-you-go scenarios, or subscriptions, where a buyer will be charged at regular intervals.</p> <p>In the <strong><em>chargeNow</em></strong> workflow, you can similarly execute a <em>charge</em> request, using the <em>billingAgreementId</em> received in the <em>setup</em> response.</p> <pre> <code>const billingAgreementId = responsePayload.billingAgreementDetails.billingAgreementId;

let directiveObject = {
    'type': 'Connections.SendRequest',
    'name': 'Charge',
    'payload': {
        '@type': 'ChargeAmazonPayRequest',
        '@version': '2',
        'sellerId': 'AEMGQXXXKD154',
        'billingAgreementId': billingAgreementId,
        'paymentAction': 'AuthorizeAndCapture',
        'authorizeAttributes': {
            '@type': 'AuthorizeAttributes',
            '@version': '2',
            'authorizationReferenceId': 'ml3qPJG3nC6c65UE',
            'authorizationAmount': {
                '@type': 'Price',
                '@version': '2',
                'amount': '9',
                'currencyCode': 'USD'
            },
            'transactionTimeout': 0,
            'sellerAuthorizationNote': '',
            'softDescriptor': 'No Nicks'
        },
        'sellerOrderAttributes': {
            '@type': 'SellerOrderAttributes',
            '@version': '2',
            'storeName': 'No Nicks',
            'sellerNote': 'Thanks for shaving with No Nicks'
        }
    },
    'token': 'WASv2lk4pdfI'
};</code></pre> <p>The <em>charge</em> operation requires you to specify at least the total amount and currency to request from the customer. 
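</p>

<p>Note that the amount travels as a string (<code>'9'</code> above). When you later present the total to the customer, for instance on a confirmation card, a small formatting helper keeps the displayed price consistent with the charged amount. This helper is our own illustrative sketch, not part of the Amazon Pay API.</p>

```javascript
// Sketch: format a numeric-string amount and ISO currency code for display.
// Illustrative helper, not part of the Amazon Pay API; it guards against
// malformed amounts and localizes the value for the en-US locale.
function formatAmount(amount, currencyCode) {
  const value = Number(amount);
  if (!Number.isFinite(value) || value <= 0) {
    throw new Error('amount must be a positive numeric string');
  }
  return new Intl.NumberFormat('en-US', {
    style: 'currency',
    currency: currencyCode
  }).format(value);
}
```

<p>For example, <code>formatAmount('9', 'USD')</code> yields <code>$9.00</code>, matching the total shown on the order-confirmation card later in this post.</p>

<p>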
For a full reference, refer to the <a href="" target="_blank">comprehensive guide</a> in the resources below.</p> <p>Just like with the <em>setup</em> phase, we'll add the directive to the <em>responseBuilder</em> when preparing the response.</p> <pre> <code>return handlerInput.responseBuilder
    .addDirective(directiveObject)
    .withShouldEndSession(true)
    .getResponse();</code></pre> <p>Once again, define a handler for the <em>Connections.Response</em> request:</p> <pre> <code>canHandle(handlerInput) {
    return handlerInput.requestEnvelope.request.type === &quot;Connections.Response&quot;
        &amp;&amp; === &quot;Charge&quot;;
}</code></pre> <p>The response of the Connections request will tell you if the charge was successful or if there was an issue taking payment.</p> <p>After a successful purchase, you should send a card to the customer’s Alexa app as an order confirmation, including the order details.</p> <pre> <code>var confirmationCardResponse = 'Your order has been placed.\n' +
    'Products: 1 Starter Kit\n' +
    'Total amount: $9.00\n' +
    'Thanks for shaving with No Nicks\n';

return handlerInput.responseBuilder
    .speak( config.confirmationIntentResponse )
    .withStandardCard( 'Order Confirmation Details', confirmationCardResponse, config.logoURL )
    .withShouldEndSession( true )
    .getResponse( );</code></pre> <p>With just a few simple steps, you’re able to take payments for real-world products or services in an Alexa skill.</p> <p>Get started today with integrating Amazon Pay into your Alexa skill and join the growing list of voice-first merchants. 
We can’t wait to see what you build!</p> <h2>Resources</h2> <ul> <li><a href="">Best Practices to Create a Delightful Voice Commerce Experience for Your Customers</a></li> <li><a href="">Amazon Pay for Alexa Skills</a></li> <li><a href="" target="_blank">Technical Documentation: Integrate a Skill with Amazon Pay</a></li> <li><a href="" target="_blank">Amazon Pay FAQs</a></li> <li><a href="" target="_blank">Amazon Pay API Reference Guide</a></li> <li><a href="" target="_blank">Amazon Pay Sample Skill</a></li> </ul> /blogs/alexa/post/f91afab2-22e8-44cb-8c34-5d9aaaf55463/how-to-leverage-presets-with-alexa-cooking-apis How to Leverage Presets with Alexa Cooking APIs Ahmed El Araby 2019-05-21T21:16:51+00:00 2019-05-21T21:16:51+00:00 <p>If your business offers a connected microwave, this blog post will help you create an easy-to-consume food preset catalog that you can associate with your microwave Alexa skill.</p> <p>Amazon Alexa now appears on over 30,000 compatible smart home devices, and Alexa is also helping families do more in the kitchen. With new, innovative microwave products, Alexa can control the appliance from anywhere in the house with simple voice commands. This new functionality is available in an expanded Alexa Smart Home Skill API and helps customers prepare meals by replacing cooking controls like defrost, popcorn mode, time, and power, which would normally require 5 to 10 button presses, with a simple voice command. 
Additionally, the hands-free ability for a consumer to pause and resume cooking in an oven while they take a call or handle another event is exceptionally useful.</p> <p>In 2018, Amazon released its first <a href="" target="_blank">voice-controlled microwave</a>, and <a href="" target="_blank">GE</a> followed suit. Both microwaves utilize Alexa voice capabilities and provide consumers with easy-to-remember voice commands to prepare common food items like popcorn and frozen pizza.</p> <h2>The Cooking Interface</h2> <p>To understand how to implement commands for cooking, we will share the steps and best practices for adding voice support to cooking devices. As developers integrate cooking-centered voice commands into connected microwaves, one of the first challenges in providing a great user experience is that packaged food items have complicated names. Variations in sub-brands, sizes, and flavors all lead to voice commands that might be challenging for both the customer and Alexa. To help simplify and standardize this interface for developers, Alexa defines the Alexa.Cooking interface. This interface is common to all cooking endpoints and describes the available operations for the device.</p> <p>The basic voice operation of a microwave would be something like “Alexa, two minutes on my microwave.” This command assumes that the customer has already placed a food item inside the microwave and knows the cooking time required. If the customer doesn’t specify the time, Alexa will ask how long the item needs to cook.</p> <p>What if customers don’t know the correct mode and the right amount of time to cook an item? In this case, Alexa preset cooking comes in handy. 
If the microwave manufacturer has created a preset catalog, customers can simply ask Alexa to cook by preset name, without needing to know the mode or the time required. In some cases, preset cooking requires a weight, volume, or quantity (count) to perfectly cook the food item. This is determined by the preset catalog author, who can specify that one or more of these food properties are required to fulfill the request. If required, Alexa will ask the customer for the count, volume, or weight if they didn’t specify it in the request.</p> <p>For cooking with preset settings, the Alexa.Cooking.PresetController helps developers define custom cooking settings appropriate for a manufacturer's appliance.</p> <h2>Using a Preset Catalog</h2> <p>If a microwave has an often-used or common preset, the developer should consider making it controllable with voice commands. Specifically, Alexa-enabled microwaves can provide customers with the ability to cook most commonly used recipes and packaged foods by simply providing the name of the food. (The food name will be resolved to a slot value, or catalog item, and will be sent to your skill within the Alexa directive.) Using voice means fewer buttons for the customer to press and convenient cooking control while their hands are busy or messy. Beyond stopping and starting cooking, adjustments to power levels and duration are also available. For example, a customer can stop cooking and then instruct Alexa to set the microwave at 80% power for three minutes.</p> <p>To understand how a preset is used for a cooking device, let’s look at an example. 
To support a preset for “PRIMO Mango Chicken with Coconut Rice,” the flow from configuration to handling a PresetController directive from Alexa looks like the following:</p> <p style="text-align:center"><img alt="" src="" style="display:block; height:343px; margin-left:auto; margin-right:auto; width:400px" /></p> <ol> <li>The developer provides Amazon with a <a href="">Preset Catalog</a> of supported, custom cooking settings. This includes an entry for “PRIMO Mango Chicken with Coconut Rice.”</li> <li>After the catalog is ingested by Amazon, the developer receives a unique preset catalog ID to be used in the discovery response of a cooking device.</li> <li>The developer builds an Alexa skill that supports the cooking endpoint.</li> <li>The discovery response returned by the cooking endpoint skill defines the required <a href="">presetCatalogId</a> (received from catalog ingestion) and <a href="">supportedCookingModes</a>.</li> <li>A customer enables the cooking endpoint skill through account linking.</li> <li>The customer says, “Alexa, microwave the Primo mango chicken with coconut rice.”</li> <li>Alexa interprets the food and cooking verb from the customer and sends a <a href="">CookByPreset</a> directive to the cooking endpoint skill.</li> <li>Using the preset directive information, the skill instructs the endpoint to cook on High for two minutes.</li> </ol> <p>To support cooking an item by name, Amazon must train Alexa to understand the items in the provided preset catalog; this training is what makes for the best customer experience.</p> <h2>Using the Supported Cooking Modes</h2> <p>A required element of a cooking endpoint is the Supported Cooking Modes. 
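</p>

<p>Step 4 in the list above, the discovery response, can be sketched in code. The fragment below is illustrative only: the shape follows the Smart Home discovery format, but the exact schema should be taken from the Alexa.Cooking documentation, and the <em>presetCatalogId</em> value is a placeholder for the ID received after catalog ingestion.</p>

```javascript
// Sketch of the cooking-related capabilities a discovery response might
// declare. Field names follow the Smart Home discovery format; consult the
// Alexa.Cooking documentation for the exact schema. The presetCatalogId is a
// placeholder for the ID received after Amazon ingests your preset catalog.
const cookingCapabilities = [
  {
    type: 'AlexaInterface',
    interface: 'Alexa.Cooking',
    version: '3',
    configuration: {
      supportsRemoteStart: true,
      supportedCookingModes: ['TIMECOOK', 'REHEAT', 'DEFROST', 'OFF']
    }
  },
  {
    type: 'AlexaInterface',
    interface: 'Alexa.Cooking.PresetController',
    version: '3',
    configuration: {
      presetCatalogId: 'PLACEHOLDER_CATALOG_ID', // received from catalog ingestion
      supportedCookingModes: ['PRESET']
    }
  }
];
```

<p>At discovery time, capability objects like these would be included in the endpoint's capabilities array in the discovery response.</p>

<p>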
These modes describe the configuration settings for a defined mode.</p> <p>The current CookingMode values are as follows:</p> <ul> <li>Defrost - Automatically configures the cooking appliance for defrost mode</li> <li>Off - Switches off the device</li> <li>Preset - Returns the device to its automated, preset-based cooking</li> <li>Reheat - Sets the device to reheating mode</li> <li>TimeCook - Sets the time and power level for cooking</li> </ul> <p>For example, an endpoint with the Defrost cooking mode defined could support the following user utterance: “Alexa, defrost two pounds of chicken.” In this example, the preset is chicken, while the cooking mode will be set to DEFROST. Alexa supports food quantities by weight, volume, or count when using the preset cooking functionality.</p> <h2>Best Practices for Authoring the Preset Catalog</h2> <p>For most customers, saying the full name of the item they want to cook in the microwave can be tedious - for instance, having to say, “Alexa, microwave PRIMO Frozen Sandwiches Four Meat and Four Cheese Pizza” or “Alexa, cook ALWAYS-FRESH Frozen Sandwiches Pepperoni and Sausage Pizza.” Those two items have the same cooking instructions, and customers end up omitting the brand name. This omission might lead to preset-name failures.</p> <p>To overcome this problem, avoid repeating the same words in more than one item, since repetition makes detection more difficult. Understanding that only the Preset Name and Cooking Mode are required, it is recommended to group similar items by the mode of cooking and not the item name. 
For example, the following two items share the same cooking mode and time settings, as well as the general item name, but are from different brands:</p> <p style="text-align:center"><img alt="" src="" style="display:block; height:288px; margin-left:auto; margin-right:auto; width:600px" /></p> <p>In this scenario, group these two items into one preset record: HERB ROASTED CHICKEN. Users can say, “Alexa, cook Herb Roasted Chicken” or “Alexa, microwave Herb Roasted Chicken.” In this case, there is a preset on this microwave for Herb Roasted Chicken, and Alexa sends a Cook By Preset directive with details about the food.</p> <h2>Conclusion</h2> <p>Developers should understand the challenges of voice recognition when undertaking preset cooking and should ensure that Preset Catalog item names reflect the most common way people identify or describe the cooked item. This is not necessarily the actual name on the box label. You should research how your customers name or refer to the supported food items to be used by your endpoint. Additionally, focus on how an item is cooked using your device, not on the preset name.</p> <h2>Additional Resources</h2> <ul> <li><a href="">Alexa.Cooking Interface</a></li> <li><a href="">Introducing Cooking Capabilities in the Alexa Smart Home Skill API</a></li> </ul> /blogs/alexa/post/2d8c2128-eec9-44cc-9274-444940eb0a4d/using-adversarial-training-to-recognize-speakers-emotions Using Adversarial Training to Recognize Speakers’ Emotions Larry Hardesty 2019-05-21T13:20:57+00:00 2019-05-21T14:22:53+00:00 <p>The combination of an autoencoder, which is trained to output the same data it takes as input, and adversarial training, which pits two neural networks against each other, confers modest performance gains but opens the door to extensive training with unannotated data.</p> <p>A person’s tone of voice can tell you a lot about how they’re feeling. 
Not surprisingly, emotion recognition is an increasingly popular conversational-AI research topic.&nbsp;</p> <p>Emotion recognition has a wide range of applications: it can aid in health monitoring; it can make conversational-AI systems more engaging; and it can provide implicit customer feedback that could help voice agents like Alexa learn from their mistakes.</p> <p>Typically, emotion classification systems are neural networks trained in a supervised fashion: training data is labeled according to the speaker’s emotional state, and the network learns to predict the labels from the data. At this year’s International Conference on Acoustics, Speech, and Signal Processing, my colleagues and I <a href="" target="_blank">presented</a> an alternative approach, in which we used a publicly available data set to train a neural network known as an adversarial autoencoder.</p> <p>An adversarial autoencoder is an encoder-decoder neural network: one component of the network, the encoder, learns to produce a compact representation of input speech; the decoder reconstructs the input from the compact representation. The adversarial learning forces the encoder’s representations to conform to a desired probability distribution.</p> <p>The compact representation — or “latent” representation — encodes all properties of the training example. In our model, we explicitly dedicate part of the latent representation to the speaker’s emotional state and assume that the remaining part captures all other input characteristics.&nbsp;</p> <p>Our latent emotion representation consists of three network nodes, one for each of three emotional measures: <em>valence</em>, or whether the speaker’s emotion is positive or negative; <em>activation</em>, or whether the speaker is alert and engaged or passive; and <em>dominance</em>, or whether the speaker feels in control of the situation. 
The remaining part of the latent representation is much larger, 100 nodes.</p> <p style="text-align:center"><img alt="Adversarial_autoencoder.jpg" src="" style="display:block; height:309px; margin-left:auto; margin-right:auto; width:600px" />&nbsp;<br /> <em><sup>The architecture of our adversarial autoencoder. The latent representation has two components (emotion classes and style), whose outputs feed into two adversarial discriminators.</sup></em></p> <p>We conduct training in three phases. In the first phase, we train the encoder and decoder using data without labels. In the second phase, we use adversarial training to tune the encoder.</p> <p>Each latent representation — the three-node representation and the 100-node representation — passes to an adversarial discriminator. The adversarial discriminators are neural networks that attempt to distinguish real data representations, produced by the encoder, from artificial representations generated in accord with particular probability distributions. The encoder, in turn, attempts to fool the adversarial discriminator.&nbsp;</p> <p>In so doing, the encoder learns to produce representations that fit the probability distributions. This ensures that it will not overfit the training data, or rely too heavily on statistical properties of the training data that don’t represent speech data in general.</p> <p>In the third phase, we tune the encoder to ensure that the latent emotion representation predicts the emotional labels of the training data. We repeat all three training phases until we converge on the model with the best performance.&nbsp;</p> <p>For training, we used a public data set containing 10,000 utterances from 10 different speakers, labeled according to valence, activation, and dominance. 
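</p>

<p>The three training phases can be written schematically. The formulation below is a standard adversarial-autoencoder objective, included for intuition; it is our summary, not the paper’s exact loss. Let $x$ be an input, $z_e$ and $z_s$ the emotion and style parts of the latent representation produced by the encoder, and $\hat{x}$ the decoder’s reconstruction.</p>

```latex
% Phase 1: reconstruction (no labels needed).
\mathcal{L}_{\mathrm{rec}} = \mathbb{E}_{x}\big\lVert x - \hat{x}(z_e, z_s) \big\rVert^{2}

% Phase 2: adversarial regularization (no labels needed). For each latent
% part i in {e, s}, a discriminator D_i distinguishes samples from the
% target prior p_i(z) from encoder outputs z_i(x), while the encoder tries
% to fool it:
\min_{\mathrm{enc}} \max_{D_i}\;
\mathbb{E}_{z \sim p_i(z)}\big[\log D_i(z)\big]
+ \mathbb{E}_{x}\big[\log\big(1 - D_i(z_i(x))\big)\big]

% Phase 3: supervised tuning. On labeled examples (x, y), the emotion
% part z_e is trained to predict valence, activation, and dominance:
\mathcal{L}_{\mathrm{sup}} = \mathbb{E}_{(x,y)}\big[\ell\big(y, z_e(x)\big)\big]
```

<p>Because only phase 3 uses labels, the first two phases can in principle consume unlabeled speech, which is the property discussed at the end of this article.</p>

<p>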
We compared the performance of the proposed learning method and the fully supervised learning baseline and observed marginal improvements.</p> <p>In tests in which the inputs to our network were sentence-level feature vectors hand-engineered to capture relevant information about a speech signal, our network was 3% more accurate than a conventionally trained network in assessing valence.</p> <p>When the input to the network was a sequence of vectors representing the acoustic characteristics of 20-millisecond <em>frames</em>, or audio snippets, the improvement was 4%. This suggests that our approach could be useful for end-to-end spoken-language-understanding systems, which dispense with hand-engineered features and rely entirely on neural networks.</p> <p>Moreover, unlike conventional neural nets, adversarial autoencoders can benefit from training with unlabeled data. In our tests, for purposes of benchmarking, we used the same data sets to train both our network and the baseline network. 
But it’s likely that using additional unlabeled data in the first and second training phases can improve the network’s performance.</p> <p><em>Viktor Rozgic is a senior applied scientist in the Alexa Speech group.</em></p> <p><a href="" target="_blank"><strong>Paper</strong></a>: “Improving Emotion Classification through Variational Inference of Latent Variables”</p> <p><a href="" target="_blank"><strong>Alexa science</strong></a></p> <p><strong>Acknowledgments</strong>: Srinivas Parthasarathy, Ming Sun, Chao Wang</p> <p><strong>Related</strong>:</p> <ul> <li><a href="" target="_blank">Two New Papers Discuss How Alexa Recognizes Sounds</a></li> <li><a href="" target="_blank">Adversarial Training Produces Synthetic Data for Machine Learning</a></li> <li><a href="" target="_blank">To Correct Imbalances in Training Data, Don’t Oversample: Cluster</a></li> <li><a href="" target="_blank">How Alexa Is Learning to Ignore TV, Radio, and Other Media Players</a></li> </ul> /blogs/alexa/post/e7d13044-cb20-4e78-8b3e-260a54034287/alexa-fund-invests-in-unruly-studios-and-zoobean-to-boost-learning-with-alexa Alexa Fund invests in Unruly Studios and Zoobean to boost learning with Alexa Brian Adams 2019-05-21T13:00:00+00:00 2019-05-21T15:33:36+00:00 <p>Since launching Alexa in 2014, we have seen more and more customers make Alexa part of their daily lives. The number of customers interacting with Alexa on a daily basis more than doubled last year, and we’re encouraged by the ways in which voice is making their lives easier, more productive, and more entertaining.</p> <p>Education and learning are a great example of that, and we see lots of innovation in this category. 
Families, in particular, love interacting with Alexa because she introduces new ways to learn about the world around them – from animals and science to math, spelling, and more. And new skills allow them to put a new twist on game night by offering fun, educational games the whole family can enjoy. Learning and education come to life in this communal setting, and we’ve seen a number of developers introduce new skills that allow parents and kids to share in the learning experience. In fact, there are already thousands of education and reference skills in the Alexa Skills Store.</p> <p>The Alexa Fund has helped support this category by investing in several promising edtech startups. Last fall, the Alexa Fund invested in <a href="">Bamboo Learning</a>, a voice-based software and services company with a mission to provide interactive teaching through voice-first education applications and content. Sphero, another Alexa Fund investment, has continued to see positive momentum for <a href="">Sphero Edu</a>, which combines coding and robotics to make STEM education even more engaging for students.</p> <p>Today, we are excited to announce two more investments in education companies exploring integrations with Alexa – Unruly Studios and Zoobean.</p> <p><a href="">Unruly Studios</a> is an alum of the 2018 Alexa Accelerator, and we are thrilled to be reinvesting in the company as part of its seed round. Unruly is led by Bryanne Leeming, who founded the company with a compelling mission: to get more kids involved with STEM by combining coding with active, physical play through their first product, Unruly Splats. 
Unruly is exploring ways to connect Splats with Alexa to make the entire experience even more fun and engaging, while giving kids a glimpse into the basics of programming and voice design.</p> <p>Zoobean is the company behind <a href="">Beanstack</a>, software that allows schools and libraries to facilitate reading, and for people of all ages to track their reading progress. The company was founded by Jordan Lloyd Bookey and Felix Brandon Lloyd, who got their start in 2014 with an appearance on Shark Tank. Mark Cuban invested in Zoobean following that appearance, and has continued to back the company in the time since. Jordan and Felix share <a href="">Mark’s optimism</a> about voice technology and its potential to make learning easier and more fun, and they’re exploring ways to integrate Alexa into Beanstack, allowing readers to ask Alexa to track their progress or send reminders about reading time.</p> <p>“One of the reasons I’m so optimistic about voice technology is because it creates this communal experience where multiple people can share in the interaction,” said Mark Cuban. “Every startup founder should be looking at how voice services like Alexa fit into their business model, and it’s great to see companies like Zoobean and Unruly take that to heart. I’m excited to see them evolve their products and use voice to make reading and STEM accessible to more people.”</p> <p>Like us, the founders of Unruly and Zoobean see voice as a way to make learning easier, more fun and more engaging for people of all ages. 
As part of the Alexa Fund portfolio, they’ll continue to explore opportunities to integrate Alexa into their products and services -- we can’t wait to see what they build in the future!</p> /blogs/alexa/post/fc82ccb8-c204-46d9-a4e0-5fc22a84e040/voice-expert-q-a-how-discovery-designs-multimodal-alexa-skills Voice Expert Q&amp;A: How Discovery Designs Multimodal Alexa Skills Jennifer King 2019-05-20T14:00:00+00:00 2019-05-20T14:00:00+00:00 <p>We recently spoke with Tim McElreath, director of technology for mobile and emerging platforms at Discovery, to learn how Discovery is leveraging voice, explore his team’s process for building multimodal skills, and dive deep into their Food Network Alexa skill.</p> <p>Customers embrace voice because it’s simple, natural, and conversational. Adding visual elements and touch to deliver multimodal, voice-first experiences can make your Alexa skill even more engaging and easy to use. Developers are already building <a href="">multimodal skills</a> using the <a href="">Alexa Presentation Language (APL)</a>, creating immersive visuals with information that complements the voice experience.</p> <p>We had the opportunity to speak with one voice leader—<a href="" target="_blank">Tim McElreath</a>, director of technology for mobile and emerging platforms at Discovery, Inc.—to learn more about how Discovery is leveraging voice, explore his team’s process for building multimodal skills, and dive deep into their Food Network Alexa skill.</p> <p><a href="" target="_blank">Senior Solutions Architect Akersh Srivastava</a> and I sat down with Tim during <a href="">Alexa Live</a>, a free online conference for voice developers. Below is a recap of our discussion, which has been edited for brevity and clarity. 
You can also watch the full 45-minute interview below.</p> <p style="text-align:center"><a href=""><iframe allowfullscreen="" frameborder="0" height="360" src="//" width="640"></iframe></a></p> <p><strong>Akersh Srivastava: </strong>Tim, tell us about what you do. If you had to pitch yourself to the community, how would you do it?</p> <p><strong>Tim McElreath:</strong> I come from both a design and an engineering background. I'm a graduate of an art and design school but I also grew up around computers. The way I see myself is trying to bridge that gap between product design and engineering, and developing a user experience with a focus on how users really want to interact with digital interfaces. Now, at Discovery, I work very closely with the Food Network and HGTV. We also have brands like Motor Trend and Animal Planet, brands that allow people to build their lives around the things that are important, like how they eat, how they create their home, and their pastimes. There's a lot of content and experiences to play with.</p> <p><strong>Akersh: </strong>How did you discover Alexa and the Alexa community?</p> <p><strong>Tim:</strong> I started working with Alexa back in 2016, so fairly early on. We have a great product team at Discovery and they recognized, because of the rate of adoption of Amazon Echo devices, that voice was going to be much more than just a novelty. This was a new way that customers could engage with our content, and we didn’t want to wait around to see how the technology would evolve and jump in later. We wanted to start exploring how we could use voice interfaces and conversational interfaces to deliver our content, our experiences, our personalities, and our information in a more direct way to our customers. 
We started building a Food Network skill back in 2016 and we've been expanding on that ever since.</p> <p><strong>Cami Williams: </strong>Here at Alexa, we’re spearheading a voice-first initiative, but many skills also include some sort of component that would require you to think about multimodal experiences. I think it depends on the brand, the brand’s content, and how their customers typically engage. It's important to not only consider your voice-first approach but also previous generations of technology, like web and mobile, and recognize their influence within the voice community. With that in mind, what makes you most excited about voice?</p> <p><strong>Tim:</strong> We're in the beginning of a shift in the way humans interact with digital interfaces. We went from the early days of PC into the web into mobile 10 years ago. When you see that shift, we have to re-teach ourselves how to interact with digital interfaces. The expectation is that digital interfaces are going to understand us. But as engineers and designers, we're going to do the heavy lifting so that users can talk in their most natural language. For me, it's really an entirely new way of connecting with customers and users, and we're still figuring it out. That's really the exciting part. We don't know exactly what those expectations are going to be in the future, so being involved in it now feels very exploratory and very innovative.</p> <p><strong>Cami: </strong>Interacting with touch- and screen-based devices has become second nature. With the Alexa Presentation Language, we're excited to see how developers marry touch, screens, and voice, bringing conversation to this second-natured touch and screen experience. When you think about developing multimodal skills for Discovery, how can you marry the voice experience with the visual experience? 
And what's your perspective on multiple modalities for voice interfaces?</p> <p><strong>Tim:</strong> I think it's a fascinating challenge because one of the shifts in application design is that you're creating a single application that is meant to be delivered on anything, from an Echo Dot, to a small speaker, to a smart screen on your counter, to a connected TV, to auto, to headphones, and the list goes on. It's all the same experience but you have to tailor that to not only the device capabilities and the device modality, but the way the users are expected to be using that device in their current situation.</p> <p>When you're thinking about delivering a response through Alexa to a customer on a particular device, how do you change that response to make it fit their situation if it's on their night table or if they're standing six feet away from it on a kitchen counter? And how much attention are they going to be paying to that screen? For example, if you're delivering a response to a connected TV, you can expect that they're going to be actually paying attention to that screen because they're in &quot;lean back&quot; mode. However, if it's a smart screen on a kitchen counter, they may not be looking at that screen at all. You have to make sure that you're giving the information through your speech response, just in case they're not fully engaged with that screen in that particular context. If there's no screen at all, you have to be able to give them the complete information of what they're looking for via voice alone. You have to pay attention to what the user is asking for and what the device is capable of presenting. It's about adapting your interface to the user to make it as easy as possible for the user to get what they need.</p> <p><strong>Cami:</strong> What’s the skill-building process like for you and your team?</p> <p><strong>Tim:</strong> We start by approaching every interface as a conversational interface. 
Meaning, if we’re building a system, we think of every interaction as part of an ongoing conversation with context and history. We start by designing every interaction from that point of view, rather than starting with the visual UI or system design. We actually get people into a room and we role play. One person will be the application and knows a certain set of information and can communicate it. How would you talk to that application, that person, in a way that most naturally gives you that information using the minimum visual feedback that's necessary to give you what you need? With the minimal text input and the minimal haptic input, what is the easiest way to use people's natural language to fulfill some utility, entertainment, or need?</p> <p>Our engineers participate in the process as well. They're closest to how the technology can actually work and how we can design it from a technical point of view. They have more insight on some of the features that could assist with some of those conversational patterns. It's a combination of engineering, interaction design, the language being used in order to fulfill requests, and how we break those requests up into intents and slot values.</p> <p>During the second half of the interview (starting at <a href=";;t=1402" target="_blank">23:22 in the video</a>), we asked Tim to walk us through how his team designed the voice, visual, and touch experience for the Food Network skill. 
We loved having a chance to chat with Tim and enjoyed learning how a large brand is getting in early with voice to further engage and delight customers.</p> <p>If you’re excited to start building multimodal voice experiences, check out our resources below.</p> <h2>Related Content</h2> <ul> <li><a href="">See What Others Have Built with APL</a></li> <li><a href="">Hear It from a Skill Builder: Going from Voice-Only to Voice-First with Multimodal Alexa Skills</a></li> <li><a href="">How to Design with the Alexa Presentation Language Components to Create New Voice-First Experiences in Your Alexa Skill</a></li> <li><a href="">10 Tips for Designing Alexa Skills with Visual Responses</a></li> <li><a href="">4 Tips for Designing Voice-First Alexa Skills for Different Alexa-Enabled Devices</a></li> <li><a href="">How to Design Visual Components for Voice-First Alexa Skills</a></li> <li><a href="">How to Quickly Update Your Existing Multimodal Alexa Skills with the Alexa Presentation Language</a></li> <li><a href="">New Alexa Skill Sample: Learn Multimodal Skill Design with Space Explorer</a></li> <li><a href="">New Alexa Skill Sample: Learn Multimodal Skill Design with Sauce Boss</a></li> </ul> /blogs/alexa/post/7e3376cf-97d7-41d6-86a6-afcdf1ca1379/new-alexa-skills-training-course-build-your-first-alexa-skill-with-cake-walk New Alexa Skills Training Course: Build an Engaging Alexa Skill with Cake Walk Jennifer King 2019-05-17T16:38:24+00:00 2019-05-17T16:38:24+00:00 <p><img alt="" src="" /></p> <p>We’re excited to introduce our new self-paced skill-building course called Cake Walk: Build an Engaging Alexa Skill. This free course offers step-by-step guidance on how to build a high-quality Alexa skill from start to finish. Learn about the course and dive in.</p> <p><img alt="" src="" /></p> <p>We’re excited to introduce our new self-paced skill-building course called <a href="">Cake Walk: Build an Engaging Alexa Skill</a>. 
This free course offers step-by-step guidance on how to build a high-quality Alexa skill from start to finish. New skill builders will learn how to build their first skill in 5 minutes. Experienced developers will learn how to add advanced features like memory to deliver a more personalized and conversational voice experience. When you complete the course, you'll have the foundational knowledge of voice design, skill programming, and development tools to help you build high-quality Alexa skills your customers will enjoy.</p> <h2>Learn How to Design and Implement a Voice Experience</h2> <p>While anyone can quickly build an Alexa skill, there’s a lot to consider to build an engaging voice experience. Having a compelling voice idea is important, but so is implementation. A great skill idea implemented poorly will make it challenging for your skill to gain traction and retain customers. Before you start turning your voice idea into an Alexa skill, we recommend taking the time to learn how to design a voice experience, how to build a voice user interface, and how to leverage skill-building tools. We designed the Cake Walk course to teach you these concepts so you can design and implement an engaging skill.&nbsp;</p> <p>Cake Walk is a simple sample skill that enables Alexa to count down the days until your birthday. Cake Walk will also deliver a happy birthday message on your special day. 
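<p>A sketch of the countdown logic at the heart of such a skill, assuming the birth month and day have already been collected as slots and the device's time zone fetched via the Alexa Settings API (the function name and arguments here are illustrative, not taken from the course):</p>

```javascript
// Days until the user's next birthday, computed in the user's own time zone.
// `timeZone` would come from the Alexa Settings API (in the ASK SDK for
// Node.js, the UPS service client's getSystemTimeZone call).
function daysUntilBirthday(birthMonth, birthDay, timeZone, now = new Date()) {
  // Resolve "today" in the user's time zone as a year/month/day triple.
  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone, year: 'numeric', month: 'numeric', day: 'numeric',
  }).formatToParts(now).reduce((acc, p) => {
    acc[p.type] = Number(p.value);
    return acc;
  }, {});

  const today = Date.UTC(parts.year, parts.month - 1, parts.day);
  let next = Date.UTC(parts.year, birthMonth - 1, birthDay);
  if (next < today) {
    // This year's birthday has already passed; count down to next year's.
    next = Date.UTC(parts.year + 1, birthMonth - 1, birthDay);
  }
  return Math.round((next - today) / (24 * 60 * 60 * 1000));
}
```

<p>On the user's birthday the function returns 0, which is the cue for the happy birthday message.</p>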
Throughout the course, you’ll learn how to build your own version of Cake Walk, from the basic voice design and implementation to adding advanced features like persistence and memory.</p> <h2>Course Components</h2> <p>The course offers an introduction to voice design concepts and four skill-programming modules:</p> <ul> <li><a href="" target="_blank">Create a skill in 5 minutes</a></li> <li><a href="" target="_blank">Collect slots turn by turn</a></li> <li><a href="" target="_blank">Add memory to Cake Walk</a></li> <li><a href="" target="_blank">Use the Settings API to get the time zone</a></li> </ul> <p>If you’re new to skill building, we recommend starting from the <a href="" target="_blank">introduction</a>. If you already know the basics and want to add memory to your skill, you can skip ahead to <a href="" target="_blank">the section on using persistent attributes.</a> Each module includes the code you need to get started and step-by-step instructions to apply the code.</p> <h2>What You'll Learn</h2> <p>By completing the course, you’ll understand the components of voice design, skill programming, and tooling to help you build engaging skills. You’ll learn how to use the <a href="">Alexa Developer Console</a> to create and test your skill. You’ll also learn how to use <a href="">Alexa-hosted skills</a> to host your skill’s back end. 
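<p>The persistence mentioned above boils down to a read-modify-write cycle against the SDK's attributes manager. Here is a hedged sketch written against that interface (in the ASK SDK v2 for Node.js the object is <code>handlerInput.attributesManager</code>; the function name is illustrative):</p>

```javascript
// Remember the user's birthday across sessions using persistent attributes.
// `attributesManager` is the object the ASK SDK hands your request handler;
// with Alexa-hosted skills the data is stored in Amazon S3.
async function rememberBirthday(attributesManager, birthMonth, birthDay) {
  // Load whatever we already know about this user ({} on first use).
  const attrs = await attributesManager.getPersistentAttributes();
  attrs.birthMonth = birthMonth;
  attrs.birthDay = birthDay;
  // Stage the change in memory, then write it through to storage.
  attributesManager.setPersistentAttributes(attrs);
  await attributesManager.savePersistentAttributes();
  return attrs;
}
```

<p>On later launches the skill calls <code>getPersistentAttributes</code> again and can skip re-asking for the birthday, which is exactly the "memory" the course module walks through.</p>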
The course introduces the core concepts of voice design and how to program your back end using the <a href="">Alexa Skills Kit Software Development Kit for Node.js</a>.</p> <p>You’ll also learn how to leverage important Alexa Skills Kit (ASK) features like:</p> <ul> <li><a href="">Intents, utterances, and slots</a> to build a voice user interface</li> <li><a href="">Auto delegation</a> to have the skill automatically prompt for missing information</li> <li><a href="" target="_blank">ASK Software Development Kit for Node.js</a> to handle requests sent to your skill</li> <li><a href="" target="_blank">Persistent attributes</a> with <a href="" target="_blank">Amazon S3</a> to remember information</li> <li><a href="">Alexa Settings API</a> to look up the time zone</li> </ul> <h2>More Training Opportunities to Enhance Your Alexa Skills</h2> <p>Once you’ve completed this course, we recommend you continue your learning by checking out these additional training materials:</p> <ul> <li><a href="" target="_blank">Designing for Conversation Course</a>: Learn how to design more dynamic and conversational experiences.</li> <li><a href="">Alexa Design Guide</a>: Learn the principles of situational voice design so that you can create voice-first skills that are natural and user-centric.</li> <li><a href=";sort=time" target="_blank">How to Shift from Screen-First to Voice-First Design</a>: Learn about the four design patterns that make voice-first experiences engaging.</li> </ul> <h2>Get Started with Cake Walk</h2> <p>The self-paced course is free and available for anyone ready to build Alexa skills. <a href="">Click here</a> to get started. And please tell us what you think! 
Reach out to me on Twitter at <a href="" target="_blank">@SleepyDeveloper</a> to share your comments and feedback.</p> /blogs/alexa/post/80c551eb-5303-4ade-9942-e83d55d1904f/best-practices-to-create-a-delightful-voice-commerce-experience-for-your-customers Best Practices to Create a Delightful Voice Commerce Experience for Your Customers Kristin Fritsche 2019-05-17T08:30:00+00:00 2019-05-17T12:10:51+00:00 <p><img alt="b895dd2c0d1ae02f997ffbec94e9f036cf943a3f171b8a7911e538623f37de8b_c95fe012-f03e-4fd7-8add-70cf1b8958d4.png" src="" /></p> <p>Today, developers and businesses can leverage Alexa to reach customers across over 100 million Alexa-enabled devices, engage with customers, and sell products and services using <a href="">in-skill purchasing (ISP)</a> and <a href="">Amazon Pay for Alexa Skills</a>.</p> <p><img alt="b895dd2c0d1ae02f997ffbec94e9f036cf943a3f171b8a7911e538623f37de8b_c95fe012-f03e-4fd7-8add-70cf1b8958d4.png" src="" /></p> <p>Voice is the next frontier for developers and merchants to reach new customers, extend their brand presence, and generate revenue. Today, developers and businesses can leverage Alexa to reach customers across over 100 million Alexa-enabled devices, engage with customers, and sell products and services using <a href="">in-skill purchasing (ISP)</a> and <a href="">Amazon Pay for Alexa Skills</a>.</p> <p>If you are offering goods or services through your website or your mobile app, you might think about using the same approach for Alexa. But building and designing for voice technology is different than for screen-based devices. 
While selling your product or service might be your ultimate goal, you first have to build a valuable and convenient voice experience for your customers.</p> <p>If you’re ready to learn how you can leverage voice to build your business, follow these best practices for creating a delightful voice commerce experience for your customers.</p> <h2>Think Voice-First</h2> <p>As you are designing your voice experience, think about how voice can help customers solve a problem or simplify a task. Which components of the customer journey through your current digital channels are cumbersome or tedious, and how can voice make the experience better? For example, how many clicks or taps does it take to check an order status on desktop or mobile, respectively? How can you use Alexa to make that task more convenient—faster, easier, and more natural—for customers? Here are some other related questions to consider:</p> <ul> <li>How can you enhance your current offerings via voice? What is the value proposition of the voice-first purchasing flow? Example: Make your FAQs accessible via voice and help customers get their questions answered in the most natural way.</li> <li>What are the top reasons customers reach out to your customer support? Example: If one of the most frequent questions for your support team is “When will my order arrive?” consider supporting this utterance in your skill.</li> <li>Are there habitual tasks you can simplify, like subscription renewals? Example: Habitual tasks more often than not come with habitual products. Make it easy for customers to reorder them via voice. Use your order history to identify the right products automatically and simplify the checkout experience.</li> <li>What about special deals for Alexa on a daily, weekly, or monthly basis? 
Example: A “Deal of the Day” is a nice way to put the most interesting products into focus and curate the selection for your customers.</li> </ul> <h2>Consider Supporting Multimodal Experiences</h2> <p>With the <a href="" target="_blank">Alexa Presentation Language (APL)</a>, you can build multimodal voice experiences that are compatible with Alexa-enabled devices with screens. Customers embrace voice because it’s simple, natural, and conversational. When you build a <a href="">multimodal</a> experience, you combine voice, touch, text, images, graphics, audio, and video in a single user interface. The result is a voice-first experience complemented by visuals. You can provide customers with complementary information that’s easily glanceable from across the room. You can build immersive experiences that customers can sit back and watch, or lean into to get things done. And, you can optimize skills to deliver the best experience on whatever device a customer is using. Example: If you are selling apparel, a multimodal experience will help customers see the product before they buy it.</p> <h2>Keep It Simple</h2> <p>Simplicity and convenience are critical for experiences on Alexa. Don’t try to do everything with your skill; instead, create a seamless customer interaction. Your voice user interface should be simple and easy to interact with, and so should the items you are selling via voice. Example: Product searches can result in a high number of results. Instead of reading a long list of results to the customer, only provide a smaller selection (e.g. 3 to 5) of the results at once. This makes it easier for the customer to follow. A multimodal experience can complement the voice experience by providing a visual list for the search results. 
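<p>A minimal sketch of that pattern (the function name, wording, and default limit of three are illustrative choices):</p>

```javascript
// Speak only a handful of search results at a time instead of the full list.
function speakSearchResults(results, limit = 3) {
  const top = results.slice(0, limit);
  const followUp = results.length > top.length
    ? ' Would you like to hear more?'
    : '';
  return `I found ${results.length} matching products. ` +
    `The top ${top.length} are: ${top.join(', ')}.${followUp}`;
}
```

<p>The follow-up question keeps the customer in control of paging through the rest, and pairs naturally with a visual list on devices with screens.</p>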
Read more about building voice-first experiences for Alexa-enabled devices with screens below.</p> <h2>Limit Your Selection</h2> <p>While your first intention might be to offer as wide a selection as possible within your skill, constrain and curate what you offer at first. For example, only offer customers their most frequent purchases or your business’s most popular products. This will help reduce the paradox of choice for customers. You can widen your selection over time and learn from customer feedback and by leveraging <a href="" target="_blank">skill usage analytics</a>.<em> </em>Example: Offer one to two products at first and refer to the skill usage analytics dashboard to see how customers are interacting with your upsell. Use this data to determine which products to add and where in the customer journey is the optimal time to upsell them.</p> <h2>Think Multichannel</h2> <p>When designing your voice experience, the customer experience doesn’t need to start and end with voice. Enable your customers to start checkout on your website or mobile app and complete the purchase via Alexa, or vice versa. To help you make this vision a reality, we created <a href="" target="_blank">Amazon Pay Buyer ID</a>, which lets you identify your customers across channels and personalize the experience for them. With the help of the Amazon Pay <a href="" target="_blank">Automatic Payments API</a>, customers can pre-authorize payments for future purchases. This enables you to charge a customer's Amazon account on a regular basis for subscriptions and usage-based billing without requiring the customer to start a new voice checkout each time. Example: Use the knowledge you gained about your customers over time. Any time customers interact with you via a new channel, delight them with a pleasant, personalized experience. For example, let’s say you have a candy subscription service that sends a care package to subscribers every month. 
By leveraging &nbsp;the Automatic Payments API in combination with Amazon Pay Buyer ID, you can create an Alexa skill to allow customers to manage their care packages (e.g. change the size, order an extra two for a month) and bill accordingly.</p> <h2>Resources</h2> <ul> <li><a href=";sc_category=Owned&amp;sc_channel=WB&amp;sc_campaign=DELaunch&amp;sc_publisher=ASK&amp;sc_content=Content&amp;sc_funnel=Publish&amp;sc_country=DE&amp;sc_medium=Owned_WB_DELaunch_ASK_Content_Publish_DE_DEDevs&amp;sc_segment=DEDevs">Amazon Pay for Alexa Skills</a></li> <li><a href="" target="_blank">Technical Documentation: Integrate a Skill with Amazon Pay</a></li> <li><a href="" target="_blank">Amazon Pay FAQs</a></li> <li><a href="" target="_blank">Amazon Pay API Reference Guide</a></li> <li><a href="" target="_blank">Amazon Pay Sample Skill</a></li> </ul> /blogs/alexa/post/4c24b9ac-8da8-47c0-a639-002e46c85932/alexa-auto-finalist-for-the-tu-automotive-best-auto-mobility-product-service-award Alexa Auto: Finalist for the TU-Automotive Best Auto Mobility Product/Service Award Arianne Walker 2019-05-16T15:43:42+00:00 2019-05-16T15:43:42+00:00 <p><a href="" target="_self"><img alt="" src="" style="height:240px; width:954px" /></a></p> <p>Amazon Alexa’s in-vehicle voice-first experience is a finalist for the Best Auto Mobility Product/Service TU-Automotive Award.</p> <p><img alt="Alexa Auto Hero" src="" style="height:240px; width:954px" /></p> <p>Amazon Alexa’s in-vehicle voice-first experience is a finalist for the Best Auto Mobility Product/Service TU-Automotive Award. We’re humbled by the recognition, which serves as a testament to how the auto industry has already begun to embrace Alexa, and customer demand for a voice-first experience in the vehicle.&nbsp;</p> <p>Many automakers are bringing Alexa into their newer <a href="" target="_blank">vehicles</a> to improve and augment the voice-first experience for drivers. 
If you’re not in the market for a brand-new car, you can choose from a number of aftermarket <a href="" target="_blank">devices</a> with Alexa built-in to bring Alexa with you on the go.&nbsp;</p> <p>Bringing Alexa into the vehicle means staying productive, connected and entertained while on the go. Alexa allows you to voice control popular music services, including Amazon Music Unlimited, iHeartRadio, Pandora, TuneIn, and more. Drive time becomes can’t-stop-listening time as you immerse yourself in an audiobook with Audible. With a cloud of information at the tip of your tongue, just ask Alexa for the latest sports scores, stock prices, weather forecast, traffic updates, and answers to general questions.</p> <p>To maintain productivity while on the go, Alexa can help manage your calendar, to-do lists, grocery lists, shopping cart, orders, and more. Alexa can help you join a conference call on your commute to work or drop in on your family to let them know you are headed home.</p> <p>Whether you know exactly where you want to go or are looking to try some place new, Alexa is always ready to make recommendations and get you directions to keep you moving throughout your day.</p> <p>Alexa can also keep you connected to your <a href=";node=6563140011" target="_blank">smart home devices</a> while on the go. Have Alexa lock your front door, turn on your porch light, turn on your alarm system, and set your thermostat to away mode—just in case you forgot. Use <a href="" target="_blank">connected car Alexa skills</a> to remotely control your vehicle from the comfort of your home. 
You can remotely lock/unlock your car doors, start/stop the engine, find out how<a href="" target="_blank"><img alt="TU-Automotive Awards Finalist" src="" style="float:right; height:240px; margin:5px; width:480px" /></a> much fuel you have left, and more—just&nbsp;ask Alexa.</p> <p>All of this functionality (and <a href="" target="_blank">more</a>) allows you to simply use your voice while keeping your hands on the wheel and eyes on the road.&nbsp;</p> <p>On June 6, 2019, at TU-Automotive Detroit, <a href="" target="_blank">Chris Wenneman</a>, Director of Alexa Automotive, will be sharing more about integrating voice into the vehicle and how it’s enhancing the user experience.&nbsp;We hope to see you there.&nbsp;In the meantime, see what <a href="" target="_blank">vehicles</a> have Alexa built-in and how easy it is to bring Alexa into more vehicles with the <a href="" target="_blank">Alexa Auto SDK</a>.</p> /blogs/alexa/post/9c5f66b0-9514-423a-827e-a2e71a68b456/how-to-write-great-dialogs-for-alexa-skills How to Write Engaging Dialogs for Alexa Skills Jennifer King 2019-05-16T14:00:00+00:00 2019-05-16T14:00:00+00:00 <p><img alt="" src="" style="height:480px; width:1908px" /></p> <p>In much the same way the layout, color, and animations on a screen can impact usability, wording can signal subtle meaning to users. Learn how to write skill dialog that's focused, natural, and simple.</p> <p><img alt="" src="" style="height:480px; width:1908px" /></p> <p>For designers and developers looking to build <a href="">conversational Alexa skills</a>, you have to think differently about how you design the experience. The hard-won design patterns we've found success with on other interfaces like the web and mobile haven’t always transferred seamlessly to voice. Voice experiences are best when they're adaptable, letting customers speak in their own words. They're personal, speaking directly to customers. They're available, allowing customers to set direction. 
And they're relatable, cooperating with customers to get something done. Each of these voice patterns boils down to carefully choosing what we say and how we say it.</p> <p>In much the same way the layout, color, and animations on a screen can impact usability, wording can signal subtle meaning to users. After you've written a script for your Alexa skill, charted the edge cases, and mapped dialog flows, it's time to turn your attention to wording. These small adjustments to your skill wording can have major impact on how focused, natural, and simple the voice experience is for your customers.</p> <p>I recently spoke with <a href="" target="_blank">Alexa Senior Solutions Architect Justin Jeffress</a> on this topic during <a href="">Alexa Live</a>, a free online conference for voice developers. During our session on <a href="" target="_blank">How to Write Great Dialogs for Alexa Skills</a>, Justin and I shared best practices for writing dialog that can elevate a voice experience from a transaction to an experience. In today’s post, I share a recap of what we shared during the session. You can also watch the full 45-minute session below.</p> <p style="text-align:center"><a href=""><iframe allowfullscreen="" frameborder="0" height="360" src="//" width="640"></iframe></a></p> <h2>Start with the Right Foundation Using Situational Design</h2> <p>In order to write great dialogs, you need to start with the right foundation. Conversations ebb and flow, and the responses depend on the previous answer to a question. When designing your dialog for a skill, it might be very tempting to use a flow chart (pictured at <a href="" target="_blank">2:47 in the video</a> above). We’ve found that flow charts drastically limit the interactions customers can have with your skill.</p> <p>The best skills are dynamic. They adjust to what they learn about the customer and flex based on the different ways a customer might interact with a skill. 
Instead of using flow charts to design your dialog, use <a href="">situational design</a> to build your foundation. Situational design is a voice-first method to design a <a href="">voice user interface (VUI)</a>. Using situational design, you start with a simple dialog that helps you focus on the conversation and how customers interact with your skill. Learn more about situational design <a href="">here</a>.</p> <h2>Be Adaptable – Don’t Onboard Users Again and Again</h2> <p>When you’re getting to know another person, you get to know more and more about that person over time. Every time you see them, you don’t have to re-introduce yourself. You pick up where you left off and continue the conversation. Your dialog should adapt to the customer and their familiarity with your skill. For example, how long have they used your skill? Is this their first time? Is this the second time? How long have they been away from your skill? Has it been five seconds? Five days? Five months? If it's been five seconds since the last time that they used your skill, then you probably don't need to onboard them again right away. But if it's been five months, maybe it's time to let them know that you’ve added a new feature instead of treating them like they've never used the skill before.</p> <h2>Be Brief – Be Mindful of How Long Alexa Speaks</h2> <p>With screen-based experiences, the customer can look at a screen to get the information they need and know what to do next. With voice-first experiences, customers have to rely on your dialog. They have to listen, they have to process, and they have to respond. You need to make sure that you're writing things to be heard, not read. That means being brief in your responses so customers can digest what Alexa is saying. It also means pacing your dialog in a way that feels natural.</p> <p>I always like to tell developers to “break the grammar rule” and don’t be afraid to use punctuation. 
Alexa respects commas and grammatical punctuation marks with pauses in her responses. If you use a comma in your dialog, she'll naturally pause. In a natural conversation, we frequently pause and take breaths. Putting commas in your dialog will help Alexa slow down so the customer can better digest the information. We cover several examples starting at <a href="" target="_blank">2:11 in the video</a>.</p> <h2>More Tips for Writing Engaging Dialogs</h2> <p>In addition to the above, our video covers tips for:</p> <ul> <li><strong>Handling confirmations:</strong> Know when to be implicit vs. explicit.</li> <li><strong>Avoiding voice menus:</strong> How to avoid writing dialog in a phone tree (starting at <a href=";;t=1189" target="_blank">19:49 in the video</a>).</li> <li><strong>Speaking to lists:</strong> Limit options for users and add pauses (starting at <a href=";;t=1309" target="_blank">21:49 in the video</a>).</li> <li><strong>Recovering errors:</strong> How to avoid them and bounce back when they happen (starting at <a href=";;t=1440" target="_blank">24:00 in the video</a>).</li> </ul> <p>Watch the<a href="" target="_blank"> full session</a> to get all the best practices.</p> <h2>Related Content</h2> <ul> <li><a href="">About Situational Design</a></li> <li><a href=";sc_category=Owned&amp;sc_channel=WB&amp;sc_campaign=wb_acquisition&amp;sc_publisher=ASK&amp;sc_content=Content&amp;sc_detail=Guide&amp;sc_funnel=Convert&amp;sc_country=WW&amp;sc_medium=Owned_WB_wb_acquisition_ASK_Content_Guide_Convert_WW_visitors_guide-page_text-link&amp;sc_segment=visitors&amp;sc_place=guide-page&amp;sc_trackingcode=text-link">Guide: How to Shift from Screen-First to Voice-First Design</a></li> <li><a href="">Alexa Design Guide</a></li> <li><a href="" target="_blank">Guide: 10 Things Every Alexa Skill Should Do</a></li> </ul> /blogs/alexa/post/d5f108c5-d292-47c0-a02e-89fac3d29476/should-alexa-read-2-3-as-two-thirds-or-february-third-the-science-of-text-normalization Should Alexa 
Read “2/3” as “Two-Thirds” or “February Third”?: The Science of Text Normalization Larry Hardesty 2019-05-16T13:00:00+00:00 2019-05-16T13:00:00+00:00 <p>Text normalization&nbsp;is the process of converting particular words of a sentence into a standard format so that software can handle them. Breaking inputs into component parts and factoring in syntactic information reduces the error rate of a neural text normalization system by 98%.</p> <p><sup><em>Yuzong Liu cowrote this blog post with Ming Sun</em></sup></p> <p>Text normalization is an important process in conversational AI. If an Alexa customer says, “book me a table at 5:00 p.m.”, the automatic speech recognizer will transcribe the time as “five p m”. Before a skill can handle this request, “five p m” will need to be converted to “5:00PM”. Once Alexa has processed the request, it needs to synthesize the response — say, “Is 6:30 p.m. okay?” Here, 6:30PM will be converted to “six thirty p m” for the text-to-speech synthesizer. We call the process of converting “5:00PM” to “five p m” text normalization and its counterpart — converting “five p m” to “5:00PM” — inverse text normalization.</p> <p><img alt="TokenizerInSDS.png" src="" style="display:block; height:375px; margin-left:auto; margin-right:auto; width:500px" /></p> <p style="text-align:center"><sub><em>ASR = automatic speech recognition; NLU = natural-language understanding; DM = dialogue management;<br /> NLG = natural-language generation; and TTS = text-to-speech synthesis</em></sub><br /> &nbsp;</p> <p>In the example above, time expressions live two lives inside Alexa, to meet an individual skill’s needs and to optimize the system’s performance, even though end users are unaware of such internal format switches. 
There are many other types of expressions that receive similar treatment, such as date, e-mail address, numbers, and abbreviations.</p> <p>To do text normalization and inverse text normalization in English, Alexa currently relies on thousands of handwritten rules. As the range of possible interactions with Alexa increases, authoring rules becomes an intrinsically error-prone process. Moreover, as Alexa continues to move into new languages, we would rather not rewrite all those rules from scratch.</p> <p>Consequently, at this year’s meeting of the North American Chapter of the Association for Computational Linguistics (NAACL), my colleagues and I will&nbsp;<a href="" target="_blank">report</a> a set of experiments in using recurrent neural networks to build a text normalization system.</p> <p>By breaking words in our network’s input and output streams into smaller strings of characters (called subword units), we demonstrate a 75% reduction in error rate relative to the best-performing neural system previously reported. We also show a 63% reduction in latency, or the time it takes to receive a response to a single request.</p> <p>By factoring in additional information, such as words’ parts of speech and their capitalizations, we demonstrate a further error rate reduction of 81%.</p> <p>What makes text normalization nontrivial is the ambiguity of its inputs: depending on context, for instance, “Dr.” could mean “doctor” or “Drive”, and “2/3” could mean “two-thirds” or “February third”. A text normalization system needs to consider context when determining how to handle a given word.</p> <p>To that end, the best previous neural model adopted a window-based approach to textual analysis. With every input sentence it receives, the model slides a “window” of fixed length — say, five words — along the sentence. Within each window, the model decides only what to do with the central word; the words on either side are there for context.&nbsp;</p> <p>But this is time consuming. 
In principle, it would be more efficient to process the words of a sentence individually, rather than in five-word chunks. In the absence of windows, the model could gauge context using an attention mechanism. For each input word, the attention mechanism would determine which previously seen words should influence its interpretation.</p> <p><img alt="attns-date.png" src="" style="display:block; height:281px; margin-left:auto; margin-right:auto; width:450px" /></p> <p style="text-align:center"><sub><em>The activation pattern of an attention mechanism during the normalization of the input “archived from the original on 2011/11/11”</em></sub></p> <p>In our experiments, however, a sentence-based text normalization system with an attention mechanism performed poorly compared to a window-based model, making about 2.5 times as many errors. Our solution: break inputs into their subword components before passing them to the neural net and, similarly, train the model to output subword units. A separate algorithm then stitches the network’s outputs into complete words.</p> <p>The big advantage of subword units is that they reduce the number of inputs that a neural network must learn to handle. A network that operates at the word level would, for instance, treat the following words as distinct inputs: crab, crabs, pine, pines, apple, apples, crabapple, crabapples, pineapple, and pineapples. A network that uses subwords might treat them as different sequences of four inputs: crab, pine, apple, and the letter s.</p> <p>Using subword units also helps the model decide what to do with input words it hasn’t seen before. Even if a word isn’t familiar, it may have subword components that are, and that could be enough to help the model decide on a course of action.</p> <p>To produce our inventory of subword units, we first break all the words in our training set into individual characters. 
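</p>

<p>To make the crabapple example above concrete: given such an inventory, a greedy longest-match segmenter could recover the shared units (a toy sketch; the paper's actual segmentation and output-stitching algorithms may differ):</p>

```python
def segment(word, inventory):
    """Greedily split a word into the longest subword units found in
    the inventory, falling back to single characters."""
    units, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in inventory or len(piece) == 1:
                units.append(piece)
                i = j
                break
    return units

inventory = {"crab", "pine", "apple", "s"}
segment("crabapples", inventory)  # ['crab', 'apple', 's']
segment("pineapples", inventory)  # ['pine', 'apple', 's']
```

<p>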
An algorithm then combs through the data, identifying the most commonly occurring two-character units, three-character units, and so on, adding them to our inventory until it reaches capacity.</p> <p>We tested six different inventory sizes, starting with 500 subword units and doubling the size until we reached 16,000. We found that an inventory of 2,000 subwords worked best.&nbsp;</p> <p>We trained our model using 500,000 examples from a public data set, and we compared its performance to that of a window-based model and a sentence-based model that does not use subword units.</p> <p>The baseline sentence-based model had a word error rate (WER) of 9.3%, meaning that 9.3% of its word-level output decisions were wrong. With a WER of 3.8%, the window-based model offered a significant improvement. But the model with subword units reduced the error rate still further, to 0.9%. It was also the fastest of the three models.</p> <p>Once we had benchmarked our system against the two baselines, we re-trained it to use not only subword units but additional linguistic data that could be algorithmically extracted from the input, such as parts of speech, position within the sentence, and capitalization.</p> <p>That data can help the system resolve ambiguities. For instance, if the word “resume” is tagged as a verb, it should simply be copied verbatim to the output stream. If, however, it’s tagged as a noun, it’s probably supposed to be the word “r&eacute;sum&eacute;,” and accents should be added. 
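</p>

<p>A toy version of this kind of disambiguation, with hand-written rules standing in for distinctions the model learns from POS and casing features (the rules and tag names below are illustrative assumptions, not the system's behavior):</p>

```python
def normalize(token, pos):
    """Illustrative hand-written rules; the actual system learns
    these distinctions from POS tags and capitalization."""
    if token.lower() == "resume":
        # verb: copy verbatim; noun: probably "résumé", so add accents
        return "r\u00e9sum\u00e9" if pos == "NOUN" else token
    if token in ("US", "ID"):
        # capitalized: read out letter by letter as an abbreviation
        return " ".join(token)          # "U S", "I D"
    return token                        # lowercase "us"/"id": ordinary words

normalize("resume", "VERB")  # 'resume'
normalize("resume", "NOUN")  # 'résumé'
normalize("US", "PROPN")     # 'U S'
```

<p>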
Similarly, the character strings “us” and “id” are more likely to be one-syllable nouns if lowercase, two-syllable abbreviations if capitalized.</p> <p>With the addition of the linguistic data, the model’s WER dropped to just 0.2%.&nbsp;</p> <p><em>Ming Sun is an applied scientist in the Alexa AI group, and Yuzong Liu is a machine learning scientist in the Alexa Speech group.</em></p> <p><a href="" target="_blank"><strong>Paper</strong></a>: “Neural Text Normalization with Subword Units”</p> <p><a href="" target="_blank"><strong>Alexa science</strong></a></p> <p><strong>Acknowledgments</strong>: Courtney Mansfield, <a href="" target="_blank">Ankur Gandhe</a>, Bj&ouml;rn Hoffmeister, Ryan Thomas, Denis Filimonov, D. K. Joo, Siyu Wang, Gavrielle Lent</p> <p><strong>Related</strong>:</p> <ul> <li><a href="" target="_blank">Training a Machine Learning Model in English Improves Its Performance in Japanese</a></li> <li><a href="" target="_blank">Automatic Transliteration Can Help Alexa Find Data Across Language Barriers</a></li> </ul>