Alexa Blogs Alexa Developer Blogs /blogs/alexa/feed/entries/atom 2019-04-26T03:58:41+00:00 Apache Roller /blogs/alexa/post/83dd06f2-d7d6-4a55-8b4f-1c443c1e483c/training-speech-synthesizers-on-data-from-multiple-speakers-improves-performance-stability Training Speech Synthesizers on Data from Multiple Speakers Improves Performance, Stability Larry Hardesty 2019-04-25T13:22:59+00:00 2019-04-25T13:49:26+00:00 <p>The ability to build new text-to-speech models with relatively little speaker-specific training data could enable a wide variety of customizable speaker styles.</p> <p>When a customer asks Alexa to play “Hey Jude”, and Alexa responds, “Playing 'Hey Jude' by the Beatles,” that response is generated by a text-to-speech (TTS) system, which converts textual inputs into synthetic-speech outputs.</p> <p>Historically, TTS systems used either concatenative approaches, which string together ultrashort snippets of recorded speech, or classical statistical parametric speech synthesis (SPSS) methods, which generate speech waveforms from scratch by fitting them to statistical models.</p> <p>Recently, however, voice agents such as Alexa have begun moving to the neural TTS paradigm, in which neural networks <a href="" target="_blank">synthesize speech</a>. Like all neural networks, neural TTS systems learn from large bodies of training examples. In user studies, subjects tend to rate the speech produced by neural TTS (or NTTS) as much more natural than speech produced through earlier methods.&nbsp;</p> <p>In general, NTTS models require more data than SPSS models. But recent work suggests that training NTTS systems on examples from several different speakers yields better results with less data. 
This opens the prospect that voice agents could offer a wide variety of customizable speaker styles, without requiring voice performers to spend days in the recording booth.</p> <p>At the International Conference on Acoustics, Speech, and Signal Processing, my colleagues and I will present what we believe is the first <a href="" target="_blank">systematic study</a> of the advantages of training NTTS systems on data from multiple speakers. In tests involving 70 listeners, we found that a model trained on 5,000 utterances from seven different speakers yielded more-natural-sounding speech than a model trained on 15,000 utterances from a single speaker.</p> <p><img alt="NTTS_model.png" src="" style="display:block; height:365px; margin-left:auto; margin-right:auto; width:500px" /></p> <p style="text-align:center"><br /> <em><sup>The architecture of our neural TTS system, which takes a phoneme sequence — a series of short, phonetically rendered word fragments — as an input and outputs a sequence of mel-spectrograms, or snapshots of the power in different frequency bands. The attention mechanism indicates which elements of the input sequence the network should concentrate on when producing each element of the output sequence.</sup></em></p> <p>An NTTS system trained on data from seven different speakers doesn't sound like an average of seven different voices. When we train our neural network on multiple speakers, we use a one-hot vector — a string of 0's with one 1 among them — to indicate which speaker provided each sample. At run time, we can select an output voice by simply passing the network the corresponding one-hot vector.</p> <p>In our user study, we also presented listeners with live recordings of a human speaker and synthetic speech modeled on the same speaker and asked them whether the speaker was the same. On this test, the NTTS system trained on multiple speakers fared just as well as the one trained on a single speaker. 
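The one-hot speaker conditioning described above can be sketched in a few lines of Python. This is an illustrative toy, not Amazon's actual model code; the frame dimensions and speaker count are made up:

```python
def one_hot(speaker_index, num_speakers):
    """A string of 0's with one 1 among them, identifying the speaker."""
    return [1.0 if i == speaker_index else 0.0 for i in range(num_speakers)]

def condition_on_speaker(encoder_frames, speaker_index, num_speakers):
    """Append the speaker's one-hot vector to every encoder frame so the
    decoder knows which of the training voices to produce."""
    speaker_vec = one_hot(speaker_index, num_speakers)
    return [list(frame) + speaker_vec for frame in encoder_frames]

# Three 4-dimensional encoder frames, conditioned on speaker 2 of 7:
frames = [[0.1, 0.2, 0.3, 0.4]] * 3
conditioned = condition_on_speaker(frames, 2, 7)
```

At run time, selecting a different output voice is just a matter of passing a different one-hot vector; the rest of the network is shared across all speakers.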
Nor did we observe any statistical difference between the naturalness of models trained on data from speakers of different genders and models trained on data from speakers of the same gender as the target speaker.</p> <table align="center" border="0" cellpadding="0" cellspacing="5" style="width:400px"> <tbody> <tr> <td><strong>&nbsp; &nbsp;<a href="" target="_blank">Single-gender model</a></strong></td> <td><a href="" target="_blank"><strong>Mixed-gender model</strong></a></td> </tr> </tbody> </table> <p style="text-align:center"><sup><em>The single-gender model was trained on 5,000 utterances from four female speakers;<br /> the mixed-gender model was trained on 5,000 utterances from four female and three male speakers</em></sup></p> <p>Finally, we also found the models trained on multiple speakers to be more <em>stable</em> than models trained on single speakers. NTTS systems sometimes drop words, mumble, or produce “heavy glitches,” where they get stuck repeating a single sound. In our study, the multi-speaker models exhibited these types of errors less frequently than the single-speaker models.&nbsp;</p> <p>NTTS systems typically consist of two neural networks. The first converts phonetic renderings of text into mel-spectrograms, or 50-millisecond snapshots of the power in a series of frequency bands chosen to emphasize frequencies to which humans are particularly attuned. Because humans can perceive acoustic features shorter than 50 milliseconds in duration, the second network — the <em>vocoder</em> — converts the mel-spectrograms into a finer-grained audio signal.</p> <p>Like NTTS systems, SPSS systems learn to synthesize mel-spectrograms from phonetic data. But with SPSS systems, the vocoders have traditionally been hand-engineered. 
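To make the idea of perceptually spaced frequency bands concrete, here is a small sketch using the common HTK mel-scale formula. The formula choice, band count, and frequency range are assumptions for illustration; the article does not specify the exact filterbank used:

```python
import math

def hz_to_mel(f_hz):
    """HTK-style mel-scale formula (an assumed variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# 80 band edges equally spaced in mel between 0 and 8 kHz:
num_bands = 80
top = hz_to_mel(8000.0)
edges_hz = [mel_to_hz(i * top / num_bands) for i in range(num_bands + 1)]

# Bands get wider as frequency increases, so low frequencies
# (where human hearing is most discriminating) get finer resolution:
widths = [edges_hz[i + 1] - edges_hz[i] for i in range(num_bands)]
```

Equal spacing on the mel axis translates into progressively wider bands in hertz, which is exactly the emphasis on perceptually important low frequencies described above.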
The versatility and complexity of neural vocoders account for much of the difference in performance between SPSS and NTTS.&nbsp;</p> <p>Our experiments suggest that, beyond 15,000 training examples, single-speaker NTTS models will start outperforming multi-speaker models. To be sure, the NTTS version of Alexa's current voice was trained on more than 15,000 examples. But mixed models could make it significantly easier to get new voices up and running for targeted applications.</p> <p><em>Jakub Lachowicz is an applied scientist in the Alexa Speech group</em>.</p> <p><a href="" target="_blank"><strong>Paper</strong></a>: “Effect of Data Reduction on Sequence-to-Sequence Neural TTS”</p> <p><a href="" target="_blank"><strong>Alexa science</strong></a></p> <p><strong>Acknowledgments</strong>:&nbsp;Javier Latorre, Jaime Lorenzo-Trueba, Thomas Merritt, Thomas Drugman,<br /> Srikanth Ronanki, Klimkov Viacheslav</p> <p><strong>Related</strong>:</p> <ul> <li><a href="">Cross-Lingual Transfer Learning for Bootstrapping AI Systems Reduces New-Language Data Requirements</a></li> <li><a href="">Leveraging Unannotated Data to Bootstrap Alexa Functions More Quickly</a></li> <li><a href="">Varying Speaking Styles with Neural Text-to-Speech</a></li> <li><a href="">Amazon Scientists Use Transfer Learning to Accelerate Development of New Alexa Capabilities</a></li> </ul> /blogs/alexa/post/712136c2-2290-40e5-be5b-9bf7d53af8ed/how-to-monetize-alexa-skills-using-the-alexa-skills-kit-software-development-kit-for-java Code Deep Dive: How to Monetize Alexa Skills Using the Alexa Skills Kit Software Development Kit for Java Suhem Parack 2019-04-24T14:00:00+00:00 2019-04-24T14:00:00+00:00 <p>Learn how to implement in-skill purchasing in an Alexa skill using the Alexa Skills Kit (ASK) Software Development Kit (SDK) for Java.</p> <p style="text-align:justify"><strong><em>Editor’s Note:</em></strong><em> In a previous </em><a href=""><em>code deep dive series post</em></a><em>, we provided an
end-to-end walkthrough of how to implement </em><a href=""><em>in-skill purchasing</em></a><em> (ISP) in an Alexa skill using Node.js. The deep dive references the <strong>Premium Hello World Skill</strong>, which is a sample skill that demonstrates how to use ISP features by offering a “Greetings Pack” and a “Premium Subscription” that greets the customer in a variety of languages in different accents using Amazon Polly. In today’s tutorial, we will walk through how to implement ISP in an Alexa skill using the <a href="">Alexa Skills Kit (ASK) Software Development Kit (SDK) for Java</a>, based on the Premium Hello World sample skill. The complete Premium Hello World sample skill is available on </em><a href="" target="_blank"><em>GitHub</em></a><em>.</em></p> <h2 style="text-align:justify">About the Premium Hello World Sample Skill</h2> <p style="text-align:justify">The <a href="" target="_blank">Premium Hello World sample skill</a> offers simple greetings in English. The skill offers the Greetings Pack as a one-time purchase as well as a subscription. This provides customers access to greetings in a variety of languages in different accents using <a href="" target="_blank">Amazon Polly</a>. For details on the various ISP scenarios in this skill, please refer to <a href="">this code deep dive post</a>.</p> <h2 style="text-align:justify">Using the MonetizationServiceClient</h2> <p style="text-align:justify">The ASK SDK for Java provides the MonetizationServiceClient, which is used to get in-skill products.
It can be obtained from the handler input as follows:</p> <pre> <code class="language-java">MonetizationServiceClient client = input.getServiceClientFactory().getMonetizationService();</code></pre> <p style="text-align:justify">This client allows us to obtain a single in-skill product, using the product ID:</p> <pre> <code class="language-java">InSkillProduct product = client.getInSkillProduct(locale, productId);</code></pre> <p>It also allows us to obtain a list of in-skill products:</p> <pre> <code class="language-java">InSkillProductsResponse response = client.getInSkillProducts(locale, null, null, null, null, null);
List&lt;InSkillProduct&gt; inSkillProducts = response.getInSkillProducts();</code></pre> <p style="text-align:justify">Using this list of in-skill products, we can obtain a list of entitled products and purchasable products. For example, we can get a list of entitled products as shown here:</p> <pre> <code class="language-java">public static List&lt;String&gt; getAllEntitledProducts(List&lt;InSkillProduct&gt; inSkillProducts) {
    return
        .filter(product -&gt; product.getEntitled().toString().equalsIgnoreCase(&quot;ENTITLED&quot;))
        .map(product -&gt; product.getName())
        .collect(Collectors.toList());
}</code></pre> <h2 style="text-align:justify">Making the Upsell</h2> <p style="text-align:justify">Once we decide to offer an upsell to the customer, we will add a SendRequestDirective to the response:</p> <pre> <code class="language-java">if(shouldUpsell(input)) { String upsellMessage = String.format(&quot;By the way, you can now get greetings in more languages. %s.
%s&quot;, greetingsPackProduct.get().getSummary(), getRandomObject(SkillData.LEARN_MORE_STRINGS));
    //Upsell Greetings Pack
    return input.getResponseBuilder()
        .addDirective(getUpsellDirective(greetingsPackProduct.get().getProductId(), upsellMessage))
        .build();
}</code></pre> <p style="text-align:justify">The SendRequestDirective for the upsell can be built using the upsell message, product ID, and token as shown below. We set the name of the SendRequestDirective to “Upsell” using .withName(&quot;Upsell&quot;):</p> <pre> <code class="language-java">public static SendRequestDirective getUpsellDirective(String productId, String upsellMessage) {
    // Prepare the directive payload
    Map&lt;String,Object&gt; mapObject = new HashMap&lt;&gt;();
    Map&lt;String, Object&gt; inskillProduct = new HashMap&lt;&gt;();
    inskillProduct.put(&quot;productId&quot;, productId);
    mapObject.put(&quot;upsellMessage&quot;, upsellMessage);
    mapObject.put(&quot;InSkillProduct&quot;, inskillProduct);
    return SendRequestDirective.builder()
        .withPayload(mapObject)
        .withName(&quot;Upsell&quot;)
        .withToken(&quot;correlationToken&quot;)
        .build();
}</code></pre> <p style="text-align:justify">Similarly, we can build the SendRequestDirective for when the customer wants to buy an in-skill product or cancel the purchase by passing the corresponding type, such as “Buy” or “Cancel”:</p> <pre> <code class="language-java">public static SendRequestDirective getDirectiveByType(String productId, String type) {
    // Prepare the directive payload
    Map&lt;String, Object&gt; payload = new HashMap&lt;&gt;();
    Map&lt;String, Object&gt; inskillProduct = new HashMap&lt;&gt;();
    inskillProduct.put(&quot;productId&quot;, productId);
    payload.put(&quot;InSkillProduct&quot;, inskillProduct);
    // Prepare the directive request
    SendRequestDirective directive = SendRequestDirective.builder()
        .withPayload(payload)
        .withName(type)
        .withToken(&quot;correlationToken&quot;)
        .build();
    return directive;
}</code></pre> <h2 style="text-align:justify">Handling the Purchase Experience Flow</h2> <p 
style="text-align:justify">When a customer says “yes” to buying a pack, Alexa’s Purchase Experience Flow takes over and responds to the customer with more details about the product, along with the pricing information (provided when the product is created).</p> <p style="text-align:justify">When the customer accepts the upsell offer by responding with a “Yes” to “Would you like to buy it?” Alexa responds with a Connections.Response, which includes the purchaseResult property indicating the result of the purchase transaction: ACCEPTED, DECLINED, ALREADY_PURCHASED, or ERROR.</p> <p style="text-align:justify">After the purchase is completed, Alexa sends a Connections.Response directive back to our skill with a purchaseResult of “ACCEPTED.” To handle the Connections.Response, we will implement the BuyResponseHandler. In the canHandle() method of this handler, we can check the request name as shown below:</p> <pre> <code class="language-java">public boolean canHandle(HandlerInput input, ConnectionsResponse connectionsResponse) {
    String name = input.getRequestEnvelopeJson().get(&quot;request&quot;).get(&quot;name&quot;).asText();
    return (name.equalsIgnoreCase(&quot;Buy&quot;) || name.equalsIgnoreCase(&quot;Upsell&quot;));
}</code></pre> <p style="text-align:justify">We will check if the status code for this request was “200 OK.”</p> <pre> <code class="language-java">String code = input.getRequestEnvelopeJson().get(&quot;request&quot;).get(&quot;status&quot;).get(&quot;code&quot;).asText();</code></pre> <p style="text-align:justify">If the status code is 200, then we can handle the various scenarios of the purchaseResult (ACCEPTED, DECLINED, ALREADY_PURCHASED, or ERROR) as shown below:</p> <pre> <code class="language-java">if (inSkillProduct.isPresent() &amp;&amp; code.equalsIgnoreCase(SUCCESS_CODE)) {
    String preSpeechText;
    final String purchaseResult = handlerInput.getRequestEnvelopeJson().get(&quot;request&quot;).get(&quot;payload&quot;)
.get(&quot;purchaseResult&quot;).asText();
    switch (purchaseResult) {
        case &quot;ACCEPTED&quot;: {
            preSpeechText = IspUtil.getBuyResponseText(inSkillProduct.get().getReferenceName(), inSkillProduct.get().getName());
            break;
        }
        case &quot;DECLINED&quot;: {
            preSpeechText = &quot;No Problem.&quot;;
            break;
        }
        case &quot;ALREADY_PURCHASED&quot;: {
            preSpeechText = IspUtil.getBuyResponseText(inSkillProduct.get().getReferenceName(), inSkillProduct.get().getName());
            break;
        }
        default:
            preSpeechText = String.format(&quot;Something unexpected happened, but thanks for your interest in the %s.&quot;, inSkillProduct.get().getName());
    }
    return IspUtil.getResponseBasedOnAccessType(handlerInput, inSkillProducts, preSpeechText);
}
// Something failed
System.out.println(String.format(&quot;Connections.Response indicated failure. error: %s&quot;,
    handlerInput.getRequestEnvelopeJson().get(&quot;request&quot;).get(&quot;status&quot;).get(&quot;message&quot;).toString()));
return handlerInput.getResponseBuilder()
    .withSpeech(&quot;There was an error handling your purchase request. Please try again or contact us for help.&quot;)
    .build();</code></pre> <p style="text-align:justify">In the case of ACCEPTED, we inform the customer that they have purchased the in-skill product and provide them with a premium greeting. In case of DECLINED, we offer them the simple greeting.
In case of ALREADY_PURCHASED, we offer them the premium greeting.</p> <p style="text-align:justify">You can find the complete implementation of this skill in Java on <a href="" target="_blank">GitHub</a>.</p> <h2 style="text-align:justify">Resources and Related Content</h2> <ul> <li style="text-align:justify"><a href="">Make Money with Alexa Skills: An Introduction</a></li> <li style="text-align:justify"><a href="">In-Skill Purchase Certification Guide</a></li> <li style="text-align:justify"><a href="">Add In-Skill Purchasing Directly from the Alexa Developer Console</a></li> <li style="text-align:justify"><a href="">Which Type of In-Skill Product Is Right for Your Alexa Skill?</a></li> <li style="text-align:justify"><a href="" target="_blank">Alexa Skills Kit SDK for Java</a></li> </ul> /blogs/alexa/post/af4b0637-c473-4768-bdf5-cc2b56eec0d1/now-available-test-multi-turn-conversations-beta-using-the-ask-cli-and-smapi Now Available: Test Multi-Turn Conversations Using the ASK CLI and SMAPI BJ Haberkorn 2019-04-23T18:00:00+00:00 2019-04-24T15:44:15+00:00 <p><img src="" /></p> <p>Now you can test multi-turn conversations using the new dialog command in the ASK CLI and the updated simulation API (beta) in SMAPI.</p> <p>&nbsp;</p> <p><img src="" /></p> <p><em><strong>Editor's note:</strong>&nbsp;We updated this blog on April 23, 2019, to reflect the General Availability (GA) of Multi-Turn Conversations in the ASK CLI and SMAPI.</em></p> <p>We’re excited to announce that Alexa developers worldwide can now test multi-turn conversations using the new dialog command in the Alexa Skills Kit (ASK) Command Line Interface (CLI) and the updated simulation API in the Skill Management API (SMAPI). Previously, you could only simulate multi-turn conversations using the Alexa Developer Console.
If you use the CLI, SMAPI, or the ASK plugin for Visual Studio Code for skill development, you can use this new testing capability to improve your skill conversations and deliver a better experience to your customers.</p> <h2>Simulate a Conversation on the Command Line</h2> <p>Using the new dialog command in the CLI, you can simulate a conversation with your skill on the command line. As shown below, the simulation supports multi-turn conversations. You can test multiple paths through your skill, and vary your responses to confirm how your skill will respond.</p> <p><img alt="" src="" style="display:block; margin-left:auto; margin-right:auto" /></p> <h2>Capture Utterances and Skill Responses for Debugging and to Streamline Future Testing</h2> <p>The output option of the dialog command captures all utterances, skill responses, and the associated JSON elements to a file. You can use the information in these output files to help you debug your backend service logic. At any time during a session, you can use the record command to capture to a file the text utterances entered since the start of the session or since your last recording point. You can use the recorded files to automate the execution of frequently used tests. Simply record the sets of utterances needed to test your skill, and play these conversations back in the future using the replay function.</p> <h2>Use the Development Option That Works Best for You</h2> <p>The <a href="">dialog command in the CLI</a> takes advantage of updates to the <a href="">simulation API</a> in SMAPI, and is available in the <a href="">ASK Toolkit for Visual Studio Code</a>.
Now, you can test multi-turn conversations regardless of which development tool you prefer.</p> <p>&nbsp;</p> /blogs/alexa/post/75ee61df-8365-44bb-b28f-e708000891ad/how-to-use-interceptors-to-simplify-handler-code-and-cache-product-and-purchase-information-in-monetized-alexa-skills How to Use Interceptors to Simplify Handler Code and Cache Product and Purchase Information in Monetized Alexa Skills Jennifer King 2019-04-23T14:00:00+00:00 2019-04-23T14:00:00+00:00 <p>In this blog, I’ll use an interceptor to fetch and cache details about in-skill products for a monetized Alexa skill.</p> <p>We’ve all been in situations where we need to update code that we haven’t touched in a long time. Or we have to troubleshoot code that someone else wrote. Simpler and more streamlined code is typically much easier to maintain, even more so when the same code is not repeated in multiple places. The interceptors available in the <a href="">Alexa Skills Kit (ASK) Software Development Kit (SDK)</a> are a great tool for simplifying and streamlining your skill code.</p> <p>The ASK SDK includes both <a href="">request and response interceptors</a>. These interceptors are executed every time your skill code is invoked. That makes them ideal for completing tasks that you want to be performed every time your skill is invoked. Request interceptors are executed prior to the main handler being selected and executed, while response interceptors are executed after the selected handler completes its work. You can see this flow in this architecture diagram. Every request interceptor is executed in order, as are the response interceptors, while only one of the handlers is executed for each invocation.&nbsp;</p> <p><img alt="" src="" /></p> <p>A common task performed in a request interceptor is logging the request payload. By doing it there, you don’t need to repeat that code in every handler. A request interceptor is also a good place to load persistent attributes.
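The execution order just described (every request interceptor in order, then exactly one handler, then every response interceptor) can be modeled with a toy dispatcher in Python. This is purely illustrative and not the real ASK SDK classes:

```python
class ToyDispatcher:
    """Toy model of the ASK SDK dispatch flow: all request interceptors run,
    then exactly one matching handler, then all response interceptors."""
    def __init__(self):
        self.request_interceptors = []
        self.handlers = []          # list of (can_handle, handle) pairs
        self.response_interceptors = []

    def dispatch(self, request):
        for interceptor in self.request_interceptors:
            interceptor(request)                 # e.g. log the request payload
        can_handle, handle = next(
            pair for pair in self.handlers if pair[0](request))
        response = handle(request)
        for interceptor in self.response_interceptors:
            interceptor(request, response)       # e.g. save persistent attributes
        return response

trace = []
d = ToyDispatcher()
d.request_interceptors.append(lambda req: trace.append("logged " + req))
d.handlers.append((lambda req: req == "HelloWorldIntent",
                   lambda req: "Hello, world!"))
d.response_interceptors.append(lambda req, res: trace.append("saved attributes"))
result = d.dispatch("HelloWorldIntent")
```

Running the dispatcher shows why interceptors suit cross-cutting tasks: the logging and saving steps fire on every invocation without appearing in any handler.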
A good use of a response interceptor would be to save the persistent attributes.</p> <p>In this blog, I’ll use an interceptor to fetch and cache details about in-skill products for a <a href="">monetized Alexa skill</a>. This will include not only the product details, but also whether the customer is eligible to purchase each product and whether the customer has already purchased it. This way, no matter what intent is invoked, my code will have access to this data and I don’t have to worry about including that code in every possible path. For this blog, I am using Python for the sample code; however, interceptors and the techniques used also apply to skills that use the ASK SDK for Node.js and the ASK SDK for Java.</p> <h2>Planning When to Call the Alexa Monetization Service</h2> <p>More and more skills are including <a href="">in-skill purchasing</a> (ISP) in the functionality they offer. As with any application, as you add more functionality, you must weigh various tradeoffs relating to user experience, performance, resources, and more. One of those decisions is how to manage calling the Alexa Monetization Service (AMS) for details about your product catalog and your customers’ purchase statuses.</p> <p>One approach is to call the AMS every time you need either product or purchase information. This is fine if you only need to call the service rarely, say once per session. Call it any more often than that and, since the information is unlikely to change during a session, you will make more calls than necessary. This will introduce additional latency, albeit only a small amount for each call. This additional latency will have two impacts: first and foremost, the customer will wait longer for Alexa’s response. Again, this is likely to be minimal, but even small amounts of latency can add up.
The other impact is to the AWS Lambda function’s run duration.</p> <h2>How Latency Relates to AWS Lambda Function Billing</h2> <p>You can build and host most skills for free with <a href="">AWS Lambda</a>, which is free for the first one million calls per month through the <a href="">AWS Free Tier</a>. You can also apply for <a href="">AWS promotional credits</a> if you incur AWS charges related to your skill. Regardless of what charges are incurred, the compute time portion of the <a href="">Lambda function pricing</a> is based on GB-seconds (also referred to as Duration). Therefore, the more latency the function experiences (i.e., the longer the function runs), the more billable duration is incurred. Even though you might not incur charges in your skill, reducing your function run time will lead to a smaller Lambda billable duration.</p> <h2>Why Caching Product and Purchase Data Works</h2> <p>For most skills, the details about both a skill’s products and a customer’s purchase status will not change over the course of any given skill session. As a result, a single call to the AMS will suffice to obtain the product details, what the customer has purchased, and what is available for the customer to purchase. This information can be stored in session attributes, and then it will be passed with each request without having to query AMS. When the session ends, the data will be discarded. When a new session starts, the process will start over with the call to AMS to get the data.</p> <h2>What Happens to the Session When a Purchase Is Made</h2> <p>As with any caching solution, it is important to invalidate data when it is no longer current/correct. This is true with this solution as well, and the point at which the data logically changes is when the customer makes a purchase. From the customer’s point of view, the purchase flow is part of the same skill session that precedes and follows it. These are, in fact, different sessions.
You may recall that you need to save any pertinent context information as persistent attributes prior to sending the Connections.SendRequest directive, and then reconstitute the data/attributes after control is returned to your skill. When the Connections.Response event is received by your skill, this starts a new session, and this is represented in the request payload.</p> <h2>When Should Caching Occur</h2> <p>Ideally the caching should occur once during the session, and the best time is at the start of the session so that the data is available for every request in the session. Your skill code should examine the request payload and, if it finds the new session flag, then it’s a new session and it should fetch the data from AMS.</p> <p>Rather than attempt to identify all the possible Intent Handlers that could be triggered when a session is initiated, the approach described in this blog uses a RequestInterceptor to watch for a new session. When a request initiates a new session, the interceptor calls AMS and caches the data in a session attribute. Unlike the handle function of a handler, which is only called if its canHandle returns true, the process method of every RequestInterceptor is called for every request.</p> <p>This approach also works for the cases where a customer buys a product. When returning from the purchase flow (initiated by the Connections.SendRequest directive, which results in a Connections.Response event being sent to your skill), the Connections.Response event request payload has the new session flag set to true. Since a customer might have purchased a new product, this is definitely an appropriate time to call AMS and get fresh data.</p> <h2>How to Create the Interceptor</h2> <p>Before we get to adding the main interceptor to our Python code, we need to set up a few things. First, we need to import the DefaultSerializer from the ASK SDK for Python.
This will serve a key role in interceptor logic.</p> <pre> <code class="language-python">from ask_sdk_core.serialize import DefaultSerializer</code></pre> <p>In addition, if you haven’t already, you’ll need to add the following imports as well (json is used later when deserializing the cached list):</p> <pre> <code class="language-python">import json

from ask_sdk_core.dispatch_components import AbstractRequestInterceptor
from ask_sdk_model.services.monetization import EntitledState</code></pre> <p>Next, we’ll create a few helper functions. The first helper function checks the request for the new session flag.</p> <pre> <code class="language-python">def is_new_session(request_envelope):
    &quot;&quot;&quot;Checks to see if the request is the first of a session&quot;&quot;&quot;
    # type: (RequestEnvelope) -&gt; bool
    if request_envelope.session is not None and
        return True
    return False</code></pre> <p>The next one filters the list of products to only include the products the customer has already purchased or is ‘entitled’ to. Depending on your use case, you might leave the list unfiltered or break the list up into separate lists based on purchase status or product type.</p> <pre> <code class="language-python">def get_all_entitled_products(in_skill_product_list):
    &quot;&quot;&quot;Get list of in-skill products in ENTITLED state.&quot;&quot;&quot;
    # type: (List[InSkillProduct]) -&gt; List[InSkillProduct]
    entitled_product_list = [
        l for l in in_skill_product_list
        if l.entitled == EntitledState.ENTITLED]
    return entitled_product_list</code></pre> <p>Now that we have those helper functions in place, let’s add the main interceptor. We’ll name it “load_isp_data_interceptor”. (I like to end my interceptors with “interceptor” so I can quickly distinguish them; you can use whatever naming convention you like.)</p> <h2>Main Interceptor Code</h2> <p>Unlike Intent Handlers, Request Interceptors have only one method, “process”, since the interceptor processes every request.
Here is the load_isp_data_interceptor class:</p> <pre> <code class="language-python">class load_isp_data_interceptor(AbstractRequestInterceptor):
    &quot;&quot;&quot;queries monetization service&quot;&quot;&quot;

    def process(self, handler_input):
        # type: (HandlerInput) -&gt; None
        print(&quot;Starting Entitled Product Check&quot;)</code></pre> <p>Here is where the helper function is called to check if the session is new or not.</p> <pre> <code class="language-python">        if is_new_session(handler_input.request_envelope):
            # new session, check to see what products are already owned.
            try:
                print(&quot;new session, so see what is entitled&quot;)</code></pre> <p>This code looks up the locale in the request. ISP is currently only available in the en-US locale, but looking it up will help future-proof your code.</p> <pre> <code class="language-python">                locale = handler_input.request_envelope.request.locale</code></pre> <p>Now the interceptor creates a client for the AMS and then gets all the in-skill product details for the given locale. This call gets the product details and the purchase status of each product.</p> <pre> <code class="language-python">                ms = handler_input.service_client_factory.get_monetization_service()
                result = ms.get_in_skill_products(locale)</code></pre> <p>After we have the results, we filter the result set to only include the products which the customer has purchased.
As noted earlier, your use case may have different needs, so be sure to adjust this as needed.</p> <pre> <code class="language-python">                entitled_products = get_all_entitled_products(result.in_skill_products)</code></pre> <p>If there are any products left after the filtering is complete, then we will store the result set in a session attribute named “entitledProducts.”</p> <pre> <code class="language-python">                if entitled_products:
                    session_attributes = handler_input.attributes_manager.session_attributes
                    session_attributes[&quot;entitledProducts&quot;] = entitled_products
            except Exception as error:
                print(&quot;Error calling InSkillProducts API: {}&quot;.format(error))
                raise error
        else:
            print(&quot;not a new session, deserialize if needed&quot;)</code></pre> <p>If the session is not new, there is a little bit of work we need to do so that we can write code that interacts with the entitledProducts session attribute in a consistent manner. Initially when we save the entitled product list, it is stored as InSkillProduct objects. When it is serialized/deserialized during subsequent requests, the data is deserialized as standard Python objects, which means we can’t interact with them in the same way. If there is at least one entitled product, we’ll fix this by using the DefaultSerializer to deserialize the list as the original InSkillProduct object type.</p> <pre> <code class="language-python">            session_attributes = handler_input.attributes_manager.session_attributes
            entitled_products = session_attributes.get(&quot;entitledProducts&quot;, None)
            if entitled_products:
                d = DefaultSerializer()
                entitled_products = json.dumps(entitled_products)
                session_attributes[&quot;entitledProducts&quot;] = d.deserialize(
                    entitled_products,
                    'list[ask_sdk_model.services.monetization.in_skill_product.InSkillProduct]')</code></pre> <p>The last step to enabling this interceptor is to add it to the SkillBuilder object. In the below code, “sb” is a StandardSkillBuilder, so if you named your SkillBuilder object differently, adjust accordingly.
Check out the <a href="">Skill Builder Objects – To Customize or Not To Customize</a> blog for more information on the difference between standard and custom Skill Builder objects.</p> <pre> <code class="language-python">sb.add_global_request_interceptor(load_isp_data_interceptor())</code></pre> <p>That’s it! You now have an interceptor in your python skill that caches ISP data. If you’d rather use Node.js to do the same thing, keep reading. The Node.js code is at the end of this blog.</p> <p>We’re excited to see what you build with ISP!&nbsp; Tweet me <a href="">@franklinlobb</a> and I’d be happy to check it out!</p> <h2>Resources and Related Content</h2> <ul> <li><a href="">Make Money with Alexa Skills: An Introduction</a></li> <li><a href="">Add In-Skill Purchasing Directly from the Alexa Developer Console</a></li> <li><a href="">Which Type of In-Skill Product Is Right for Your Alexa Skill?</a></li> <li><a href="">Alexa Developer Forums</a></li> <li><a href="">Premium Facts Sample Skill</a> (Python)</li> <li><a href="">Name the Show Sample Skill</a></li> </ul> <h2>Node.js Version</h2> <p>Helper Function:</p> <pre> <code class="language-javascript">function getAllEntitledProducts(inSkillProductList) { const entitledProductList = inSkillProductList.filter(record =&gt; record.entitled === 'ENTITLED'); console.log(`Currently entitled products: ${JSON.stringify(entitledProductList)}`); return entitledProductList; }</code></pre> <p>Interceptor Code:</p> <pre> <code class="language-javascript">const loadISPDataInterceptor = { async process(handlerInput) { if (handlerInput.requestEnvelope.session.new === true) { // new session, check to see what products are already owned.
try { const locale = handlerInput.requestEnvelope.request.locale; const ms = handlerInput.serviceClientFactory.getMonetizationServiceClient(); const result = await ms.getInSkillProducts(locale); const entitledProducts = getAllEntitledProducts(result.inSkillProducts); const sessionAttributes = handlerInput.attributesManager.getSessionAttributes(); sessionAttributes.entitledProducts = entitledProducts; handlerInput.attributesManager.setSessionAttributes(sessionAttributes); } catch (error) { console.log(`Error calling InSkillProducts API: ${error}`); } } }, };</code></pre> <p>Add Interceptor to Skill Builder:</p> <pre> <code class="language-javascript">.addRequestInterceptor(loadISPDataInterceptor)</code></pre> /blogs/alexa/post/94539149-2d4f-4033-9a7d-7d77ae87b1f3/using-wake-word-acoustics-to-filter-out-background-speech-improves-speech-recognition-by-15 Using Wake Word Acoustics to Filter Out Background Speech Improves Speech Recognition by 15% Larry Hardesty 2019-04-22T13:00:00+00:00 2019-04-22T13:09:59+00:00 <p>The wake word provides an acoustic profile that can be used to identify&nbsp;utterances from the same speaker.</p> <p>One of the ways that we’re always trying to improve Alexa’s performance is by teaching her to <a href="" target="_blank">ignore</a> speech that <a href="" target="_blank">isn’t intended</a> for her.&nbsp;</p> <p>At this year’s International Conference on Acoustics, Speech, and Signal Processing, my colleagues and I will present a <a href="" target="_blank">new technique</a> for doing this, which could complement the techniques that Alexa already uses.</p> <p>We assume that the speaker who activates an Alexa-enabled device by uttering its “wake word” — usually “Alexa” — is the one Alexa should be listening to. Essentially, our technique takes an acoustic snapshot of the wake word and compares subsequent speech to it.
Speech whose acoustics match those of the wake word is judged to be intended for Alexa, and all other speech is treated as background noise.</p> <p>Rather than training a separate neural network to make this discrimination, we integrate our wake-word-matching mechanism into a standard automatic-speech-recognition system. We then train the system as a whole to recognize only the speech of the wake word utterer. In tests, this approach reduced speech recognition errors by 15%.</p> <p>We implemented our technique using two different neural-network architectures. Both were variations of a sequence-to-sequence encoder-decoder network with an attention mechanism. A sequence-to-sequence network is one that processes an input sequence — here, a series of “frames”, or millisecond-scale snapshots of an audio signal — in order and produces a corresponding output sequence — here, phonetic renderings of speech sounds.</p> <p>In an encoder-decoder network, the encoder summarizes the input as a vector — a sequence of numbers — of fixed length. Typically, the vector is more compact than the original input. The decoder then converts the vector into an output. The entire network is trained together, so that the encoder learns to produce summary vectors well suited to the decoder’s task.</p> <p>Finally, the attention mechanism tells the decoder which elements of the encoder’s summary vector to focus on when producing an output. 
In a sequence-to-sequence model, the attention mechanism’s decision is typically based on the current states of both the encoder and decoder networks.<br /> <img alt="Seq2Seq_encoder-decoder_with_attention.png" src="" style="display:block; height:262px; margin-left:auto; margin-right:auto; width:500px" /></p> <p style="text-align:center"><sup><em>Our baseline sequence-to-sequence encoder-decoder model with attention, which we modified to<br /> emphasize speech inputs with the same acoustic features as the “wake word” that activates Alexa</em></sup></p> <p>Our first modification to this baseline network was simply to add an input to the attention mechanism. In addition to receiving information about the current states of the encoder and decoder networks, our modified attention mechanism also receives the raw frame data corresponding to the wake word. During training, the attention mechanism automatically learns which acoustic characteristics of the wake word to look for in subsequent speech.</p> <p>In another experiment, we trained the network more explicitly to emphasize input speech whose acoustic profile matches that of the wake word. First, we added a mechanism that directly compares the wake word acoustics with those of subsequent speech. Then we used the result of that comparison as an input to a mechanism that learns to suppress — or “mask” — some elements of the encoder’s summary vector before they even pass to the attention mechanism. Otherwise, the attention mechanism is the same as in the baseline model.</p> <p>We expected the masking approach to outperform the less explicitly supervised attention mechanism, but in fact it fared slightly worse, reducing the error rate of the baseline model by only 13%, rather than 15%. We suspect that this is because the decision to mask encoder outputs is based solely on the state of the encoder network, whereas the modified attention mechanism factored in the state of the decoder network, too. 
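As a rough illustration of the attention computation described above, here is a generic dot-product attention sketch (our own simplification, not the authors' model, which additionally conditions on wake-word acoustics): the decoder scores each encoder output against its current state and takes a softmax-weighted average.

```python
import numpy as np

def dot_product_attention(encoder_outputs, decoder_state):
    # Score each encoder output against the decoder state, normalize
    # with a softmax, and return the weighted sum (the "context" the
    # decoder focuses on). Illustrative sketch only.
    scores = encoder_outputs @ decoder_state          # shape (T,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                          # attention weights
    context = weights @ encoder_outputs               # shape (d,)
    return weights, context

# Toy example: 4 encoder frames with 3-dimensional features.
enc = np.array([[1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0],
                [0.0, 0.0, 1.0],
                [1.0, 1.0, 0.0]])
w, ctx = dot_product_attention(enc, np.array([2.0, 0.0, 0.0]))
```

Frames that align with the decoder state (here, the first and fourth) receive the largest weights, so their features dominate the context vector.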
In future work, we plan to explore a masking mechanism that also considers the decoder state.</p> <p><em>Xing Fan is a senior applied scientist in the Alexa AI group.</em></p> <p><a href="" target="_blank"><strong>Paper</strong></a>: “End-to-End Anchored Speech Recognition”</p> <p><a href="" target="_blank"><strong>Alexa science</strong></a></p> <p><strong>Acknowledgments</strong>: Yiming Wang, I-Fan Chen, Yuzong Liu, Tongfei Chen, Bj&ouml;rn Hoffmeister</p> <p><strong>Related</strong>:</p> <ul> <li><a href="">Signal Processor Improves Echo’s Bass Response, Loudness, and Speech Recognition Accuracy</a></li> <li><a href="">Joint Training on Speech Signal Isolation and Speech Recognition Improves Performance</a></li> <li><a href="" target="_blank">Machine-Labeled Data + Artificial Noise = Better Speech Recognition</a></li> <li><a href="" target="_blank">Alexa, Do I Need to Use Your Wake Word? How About Now?</a></li> <li><a href="" target="_blank">Amazon at ICASSP</a></li> </ul> /blogs/alexa/post/42a69522-ea56-4ad3-ae8b-8bfa24258491/4-tips-for-implementing-device-discovery-in-your-smart-home-skills 4 Tips for Implementing Device Discovery in Your Smart Home Skills Ben Porter 2019-04-19T15:56:41+00:00 2019-04-19T21:09:23+00:00 <p><img alt="" src="" style="height:480px; width:1908px" /></p> <p>Building an Alexa smart home skill enables you to link Alexa with your existing smart home device cloud. Follow these tips to resolve any device discovery errors you may encounter.</p> <p>Building an Alexa smart home skill enables you to link Alexa with your existing smart home device cloud. This gives your customers the ability to control and query their compatible devices through Alexa.
When creating a smart home skill, one of the first steps is to implement the <a href="">Alexa.Discovery interface</a>. The Device Discovery process allows you to send Alexa a list of your customer’s devices and capabilities, so that customers can interact with those devices through voice or the Alexa app.</p> <p>When testing your implementation of the Alexa.Discovery interface, you first need to check that you've enabled the skill on your Alexa account and successfully performed account linking. The next step is to run device discovery from your Alexa account. If no endpoints from your skill are found, then you may have encountered a device discovery error. Here are four common culprits you can look into to resolve these errors.</p> <h2>1. Consider the Location of Your Customers, Your AWS Lambda Functions, and Supported Languages</h2> <p>In order for your customers to successfully discover devices in their Alexa accounts, the following three regional factors will need to align:</p> <h3>Languages Available on the Skill</h3> <p>When creating a smart home skill, like other skill types, you first choose the default language. <a href="">Additional languages can be added to the skill</a> later.</p> <h3>Regions the Skill’s AWS Lambda Functions Are Deployed To</h3> <p>For each language that you add to the skill, you will need to ensure that you have an AWS Lambda function deployed to the corresponding region. For a full list of these mappings, please see the table in the following tech doc: <a href="">Deploy Your Lambda Function to Multiple Regions</a>.</p> <h3>Region of the Customer’s Amazon Account</h3> <p>Amazon accounts are associated with a particular region that determines the skills that are available within the Alexa Skills Store.
For example, if an Amazon account is in the US region, then the skills available from the Alexa Skills Store for that account will be those with an English (US) locale.</p> <h3>Avoiding Availability Issues When Testing</h3> <p>When testing the development version of your smart home skill, you will need to take note of the region of the Amazon account that you are testing with, as this can determine whether or not device discovery will be successful. <em>Please note</em>: When developing a skill, the development version will be available for testing on your corresponding Amazon account, regardless of the languages of the model.</p> <p>If your Amazon account’s region is different from what your skill’s current configuration supports, then the Device Discovery request will not make it to your skill’s Lambda function. In this case, the skill’s Lambda function would not have received the discovery request, so no corresponding request log will be found in Amazon CloudWatch. This will not be an issue once the skill is live in the Alexa Skills Store, as it will only be available to Amazon accounts that are set to the corresponding region.</p> <p><em><strong>CloudWatch Logs Tip</strong>: Availability issues = no discover request logged</em></p> <h2>2. Check the Latency of Your Skill’s Lambda Function, and Related Async Calls</h2> <p>Once your skill’s Lambda function has received a device discovery request, the skill has a maximum of 8 seconds in which to return a response. However, we recommend that responses be sent as quickly as possible.</p> <p>When first creating a Lambda function, the default timeout is set to 3 seconds. Updating this timeout to 7 seconds will help avoid the potential for the Lambda function to time out before returning the final Discover.Response within Alexa's timeout window. If you are planning on building a smart home camera skill, then please note that camera skills have a timeout window of 6 seconds.
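The latency advice above can be sketched in skill code. A minimal, hypothetical illustration (the function name and the 5-second budget are ours, not part of any SDK) that times a device-cloud call and enforces a budget so slow upstream responses show up clearly in your logs:

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

# Hypothetical budget: leave headroom inside Alexa's 8-second window
# (6 seconds for camera skills) and your configured Lambda timeout.
DEVICE_CLOUD_BUDGET_SECONDS = 5.0

def call_device_cloud_with_budget(fetch_devices, budget=DEVICE_CLOUD_BUDGET_SECONDS):
    # Run the device-cloud call with a timeout, and log its latency so
    # that slow upstream responses are easy to spot in CloudWatch.
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fetch_devices)
        try:
            result = future.result(timeout=budget)
        except TimeoutError:
            print("Device cloud exceeded {:.1f}s budget".format(budget))
            raise
    print("Device cloud responded in {:.3f}s".format(time.monotonic() - start))
    return result
```

`fetch_devices` stands in for whatever call retrieves your customer's endpoint list; logging both the request and the elapsed time is what makes latency problems diagnosable.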
Please see the <a href="">Build Smart Home Camera Skills</a> documentation for additional information. For changing the default timeout of a Lambda function, please take a look at <a href="">Basic AWS Lambda Function Configuration</a>.</p> <p><em><strong>CloudWatch Logs Tip:</strong> Lambda timeout issues = discover request logged, 'task timed out' message logged</em></p> <p>Another common culprit of timeout issues is asynchronous calls being made within your skill code. Is your device cloud taking too long to return a response to your skill's Lambda function? Logging any asynchronous requests and responses made within your skill’s Lambda function is very important in helping to pinpoint the source of the latency.</p> <h2>3. Ensure That Your Skill Is Not Returning Malformed JSON</h2> <p>Another common issue regarding device discovery failures is malformed Discover.Response JSON. The first step in troubleshooting this is to copy the complete JSON response that your skill returned from your Amazon CloudWatch logs and check it with a JSON validator to ensure that it is valid.</p> <p>If valid JSON was returned, the next step would be to compare it with a known working sample. A general Discover.Response sample that includes various types of endpoints can be found in the Alexa smart home GitHub, <a href="" target="_blank">here</a>.</p> <p>For more specific examples, please see the device templates in our <a href="">Get Started with Device Templates</a> documentation.</p> <p><em><strong>CloudWatch Logs Tip:</strong> Malformed JSON response = discover request and return response both logged.</em></p> <h2>4. Check Access Token Refresh Requests From Your Authorization Server</h2> <p>If you've successfully completed device discovery, and then notice that later discover requests are failing, the culprit could be that the Alexa service could not successfully refresh and retrieve a valid access token from the skill's authorization service.
You can check the logs of your authorization service to ensure that refresh requests are successfully completing.</p> <p><em><strong>CloudWatch Logs Tip:</strong> Token refresh issues = the first discovery flow after account linking completes successfully, but subsequent device discovery requests are not logged.</em></p> <h2>Conclusion</h2> <p>Come join the discussion in the <a href="" target="_blank">Alexa Smart Home Skill API Forum</a>. New to the forum and smart home skill building? Check out our <a href="" target="_blank">welcome post</a>, which has helpful links to get you started. You can also find me over on Twitter <a href="" target="_blank">@roycodes</a>.</p> <h2>Related Content</h2> <ul> <li><a href="">Help Customers to Seamlessly Update and Maintain Their Smart Devices with Proactive Discovery and Endpoint Management</a></li> <li><a href="">Resource Roundup: Top Alexa Tips and Tutorials for Smart Home Skill Builders</a></li> <li><a href="">Beyond the Basics: Best Practices for Adding Account Linking to Your Alexa Skills</a></li> </ul> /blogs/alexa/post/c4396c3c-ecf6-45c8-a3f7-e6e026b240af/musicplode-media-uses-in-skill-purchasing-to-turn-its-beat-the-intro-voice-game-into-a-hit-for-alexa-customers Musicplode Media Uses In-Skill Purchasing to Turn Its “Beat the Intro” Voice Game into a Hit for Alexa Customers Jennifer King 2019-04-18T14:00:00+00:00 2019-04-18T14:00:00+00:00 <p><img alt="" src="" style="height:480px; width:1908px" /></p> <p>Learn how the Beat the Intro game skill is helping media and entertainment marketing company Musicplode reach more customers, build brand awareness, and generate revenue with in-skill purchasing (ISP).</p> <p>According to founder and CEO Kevin Deakin, <a href="" target="_blank">Musicplode Media Ltd</a> is a media and entertainment marketing company—not a game maker.
But the creator of the popular UK radio music quiz show, Beat the Intro, saw voice-first gaming as a new frontier and an opportunity to further grow his company. To Deakin and managing director Dave Brown, a streaming music quiz game seemed a natural fit for Alexa. Today, the <a href="" target="_blank">Beat the Intro</a> game skill is helping Musicplode reach more customers, build brand awareness, and generate revenue with <a href="">in-skill purchasing (ISP)</a>.</p> <p>“Many people use Alexa devices for streaming music, so we felt voice was the perfect medium for a music-based game like Beat the Intro,” says Brown. “And with over 100 million Alexa-enabled devices out there, it gave us an opportunity to grow our brand, engage more customers, and monetize our efforts by offering premium content that customers love.&quot;</p> <p><a href="">Beat the Intro</a> tests Alexa customers’ music knowledge with a variety of free gameplay rounds. The skill also offers players a monthly subscription with unlimited play and a library of categorized music packs. Trying to guess the names and artists of the music tracks is so engaging that customers are eager to extend their playtime by signing up for the monthly subscription. Today, Brown and Deakin are seeing a 45% offer-to-purchase conversion rate and a 4-star rating in the Alexa Skills Store.</p> <p>“With Alexa and in-skill purchasing, we have the opportunity to engage audiences with our own brand and build a revenue stream at the same time,” says Brown.</p> <h2>Alexa Provides an Opportunity to Breathe New Life into a Popular Music Game</h2> <p>The original idea for Beat the Intro came from a marketing campaign Deakin created for a London radio broadcaster. The radio DJs would play snippets of songs and invite listeners to guess the song title and artist. Deakin eventually developed the music quiz into a DVD-based game and was planning to create a mobile version in 2015 when he first discovered Alexa. 
Given the potential, he immediately pivoted Musicplode’s game development strategy to focus on voice games.</p> <p>“I rushed off to the Amazon store in Seattle and bought an Echo to bring back to London,” says Brown. “We felt from the outset Alexa was such a natural medium for games like Beat the Intro, so we immediately started scoping out how we could use Alexa—not mobile devices—to bring our game to life.”</p> <p>The Beat the Intro game skill starts off by presenting the Daily Challenge, which provides three new song tracks for players to guess every day. After that, customers can play another round of seven musical questions per day. Players can choose from three play modes: solo (against a simulated challenger), team mode for two groups of participants, or multiplayer mode, which allows up to four players with Echo Buttons to battle it out.</p> <p>Besides its engaging game design with fun music content, Musicplode uses <a href="">SSML</a> to delight customers with a blend of human voices (like Moozzo, the game’s wacky host) and a variety of unusual sound effects in addition to Alexa, who is the voice of the announcer and scorekeeper.</p> <p>“To engage customers for long periods of time, we used radio production-quality techniques throughout the skill, like human voices, layering voiceovers on the music tracks, and clever audio effects like a ‘sting’ at the end of a track,” says Brown. “Techniques like that make Beat the Intro stand out from other skills.”</p> <h2>Offering Premium Content Ups Gameplay Even Further</h2> <p>Given its popularity, Beat the Intro quickly started earning money through <a href="">Alexa Developer Rewards</a>, a program that pays developers for eligible skills with some of the highest customer engagement. But the Musicplode team wanted to deepen engagement by giving customers more of what they love, not to mention creating a more predictable revenue stream for the game.
For this, they turned to monetizing their skill using ISP.</p> <p>“Getting rewards checks in the post every month was nice, but these weren't at a level that would sustain a business like ours,” says Brown. “With in-skill purchasing, we knew we’d found a perfect way to monetize our game, while giving players a richer experience.”</p> <p>When Beat the Intro launched its premium content in August 2018, it offered players the features they most requested: to extend their gameplay and customize the game experience with music categories of their choice. To show customers the value of the premium content, Beat the Intro offers a free 14-day trial that customers can cancel any time. If they opt to purchase Beat the Intro Unlimited—a subscription for $2.99 a month and a discount for Amazon Prime members—customers can play an unlimited number of game rounds and select music from a particular decade or genre.</p> <p>With an impressive 45% conversion rate and a 4.0-star rating in the Alexa Skills Store—not to mention being named one of Amazon’s top skills of 2018—Musicplode has a winning skill to engage customers and build its brand.</p> <p>With in-skill purchasing, Beat the Intro is building a base of engaged customers who are more than willing to purchase premium content to extend their gameplay. Players describe the skill as a great game for those who love music with “lots of genres and songs.” Some have left reviews describing how the skill is “challenging and fun” and like the experience of seeing where they rank against other players.</p> <p>“We're constantly updating our content by adding new tracks and genres on a regular basis, as well as special packs for major sporting events and awards,” says Brown. 
“This allows us to keep the game fresh, which helps us delight and retain our customers.”</p> <h2>Voice-First Games with In-Skill Purchasing Offer a Ripe Opportunity for Business</h2> <p>Voice is the next frontier in gaming, as evidenced by the popularity of hit game skills like Beat the Intro. With the opportunities for developers to create engaging voice-first games for Alexa and make money with in-skill purchasing, Brown says Musicplode plans to continue investing in voice and building their brand presence through Alexa.</p> <p>“The combination of Alexa, voice-first games, and in-skill purchasing gives companies like ours the ability to create engaging voice-first games, reach more customers than ever, and build a sustainable revenue stream for their voice business,” says Brown.</p> <h2>Related Content</h2> <ul> <li><a href="">Sell Premium Content to Enrich Your Skill Experience</a></li> <li><a href="">Hypno Therapist Alexa Skill Uses In-Skill Purchasing to Reach More Customers and Build a Business</a></li> <li><a href="">In-Skill Purchasing Takes Volley’s Thriving Voice Business to the Next Level</a></li> <li><a href="">With In-Skill Purchasing, Gal Shenar Sets His Growing Voice Business Up for Long-Term Success</a></li> <li><a href="">Alexa Game Skill “Would You Rather for Family” Adds In-Skill Purchasing and Sees Revenue Growth</a></li> <li><a href="">The Vortex, an Alexa Game Skill from Doppio, Delivers a Double Shot of Customer Engagement with In-Skill Purchasing</a></li> </ul> <h2>Make Money by Creating Engaging Voice Games Customers Love</h2> <p>With ISP, you can sell premium content to enrich your Alexa skill experience. ISP supports one-time purchases for entitlements that unlock access to features or content in your skill, subscriptions that offer access to premium features or content for a period of time, and consumables which can be purchased and depleted. You define your premium offering and price, and we handle the voice-first purchasing flow.
<a href=";sc_category=Owned&amp;sc_channel=WB&amp;sc_campaign=wb_acquisition&amp;sc_publisher=ASK&amp;sc_content=Content&amp;sc_detail=vod-webinar&amp;sc_funnel=Convert&amp;sc_country=WW&amp;sc_medium=Owned_WB_wb_acquisition_ASK_Content_vod-webinar_Convert_WW_visitors_makemoney-page_CTA-graphic&amp;sc_segment=visitors&amp;sc_place=makemoney-page&amp;sc_trackingcode=CTA-graphic" target="_blank">Download our introductory guide</a> to learn more.</p> /blogs/alexa/post/9436a0fd-34d1-4121-8479-074e6a8c7c0f/two-new-papers-discuss-how-alexa-recognizes-sounds Two New Papers Discuss How Alexa Recognizes Sounds Larry Hardesty 2019-04-18T13:08:39+00:00 2019-04-18T13:08:39+00:00 <p>Alexa scientists use semi-supervised learning and &quot;pyramidal&quot; neural networks to address the problems of sound identification and media detection.</p> <p>Last year, Amazon announced the beta release of Alexa Guard, a new service that lets customers who are leaving the house instruct their Echo devices to listen for glass breaking or smoke and carbon monoxide alarms going off.</p> <p>At this year’s International Conference on Acoustics, Speech, and Signal Processing, our team is presenting several papers on sound detection. I wrote about <a href="" target="_blank">one of them</a> a few weeks ago, a new method for doing machine learning with unbalanced data sets.</p> <p>Today I’ll briefly discuss two others, both of which, like the first, describe machine learning systems. <a href="" target="_blank">One paper</a> addresses the problem of media detection, or recognizing when the speech captured by a digital-assistant device comes from a TV or radio rather than a human speaker. In particular, we develop a way to better characterize media audio by examining longer-duration audio streams rather than merely classifying short audio snippets.
Media detection helps filter a particularly deceptive type of background noise out of speech signals.&nbsp;</p> <p>For our <a href="" target="_blank">other paper</a>, we used semi-supervised learning to train a system developed from an external dataset to do audio event detection. Semi-supervised learning uses small sets of annotated training data to leverage larger sets of unannotated data. In particular, we use tri-training, in which three different models are trained to perform the same task, but on slightly different data sets. Pooling their outputs corrects a common problem in semi-supervised training, in which a model’s errors end up being amplified.</p> <p>Our media detection system is based on the observation that the audio characteristics we would most like to identify are those common to all instances of media sound, regardless of content. Our network design is an attempt to abstract away from the properties of particular training examples.</p> <p>Like many machine learning models in the field of spoken-language understanding, ours uses recurrent neural networks (RNNs). An RNN processes sequenced inputs in order, and each output factors in the inputs and outputs that preceded it.&nbsp;</p> <p>We use a convolutional neural network (CNN) as feature extractor, and stack RNN layers on top of it. But each RNN layer has only a fraction as many nodes as the one beneath it. That is, only every third or fourth output from the first RNN provides an input to the second, and only every third or fourth output of the second RNN provides an input to the third.</p> <p><img alt="Pyramidal.jpg" src="" style="display:block; height:68px; margin-left:auto; margin-right:auto; width:550px" /></p> <p style="text-align:center">&nbsp;<em><sup>A standard stack of recurrent neural networks (left) and the “pyramidal” stack we use instead</sup></em></p> <p>Because the networks are recurrent, each output we pass contains information about the outputs we skip. 
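The subsampling behind the pyramidal stack can be sketched in a few lines (an illustrative reduction of sequence length, not the actual network):

```python
def pyramidal_subsample(sequence, factor=3):
    # Keep only every `factor`-th output of an RNN layer, as in the
    # pyramidal stack described above: the next layer then sees a
    # sequence `factor` times shorter. Illustrative sketch only.
    return sequence[factor - 1::factor]

# A 12-step sequence of (toy, scalar) RNN outputs shrinks to 4 steps,
# then to 1 step after a second 3x reduction.
layer1_out = list(range(12))
layer2_in = pyramidal_subsample(layer1_out)   # 4 elements
layer3_in = pyramidal_subsample(layer2_in)    # 1 element
```

Because each retained output of a recurrent layer already summarizes the skipped steps, the shorter sequences lose little information while smoothing over short-term variation.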
But this “pyramidal” stacking encourages the model to ignore short-term variations in the input signal.</p> <p>For every five-second snippet of audio processed by our system, the pyramidal RNNs produce a single output vector, representing the probabilities that the snippet belongs to any of several different sound categories.</p> <p>But our system includes still another RNN, which tracks relationships between five-second snippets. We experimented with two different ways of integrating that higher-level RNN with the pyramidal RNNs. In the first, the output vector from the pyramidal RNN simply passes to the higher-level RNN, which makes the final determination about whether media sound is present.</p> <p>In the other, however, the higher-level RNN lies <em>between</em> the middle and top layers of the pyramidal RNN. It receives its input from the middle layer, and its output, along with that of the middle layer, passes to the top layer of the pyramidal RNN.</p> <p><img alt="contextual_2.jpg" src="" style="display:block; height:218px; margin-left:auto; margin-right:auto; width:550px" /></p> <p style="text-align:center"><em><sub>In the second of our two contextual models, a high-level RNN (red circles) receives inputs from one layer of a<br /> pyramidal RNN (groups of five blue circles), and its output passes to the next layer (groups of two blue circles).</sub></em><br /> &nbsp;</p> <p>This was our best-performing model. When compared to a model that used the pyramidal RNNs but no higher-level RNN, it offered a 24% reduction in equal error rate, which is the error rate that results when the system parameters are set so that the false-positive rate equals the false-negative rate.</p> <p>Our other ICASSP paper presents our semi-supervised approach to audio event detection (AED). 
One popular and simple semi-supervised learning technique is self-training, in which a machine learning model is trained on a small amount of labeled data and then itself labels a much larger set of unlabeled data. The machine-labeled data is then sorted according to confidence score — the system’s confidence that its labels are correct — and data falling in the right confidence window is used to fine-tune the model.</p> <p>The model, that is, is retrained on data that it has labeled itself. Remarkably, this approach tends to improve the model’s performance.</p> <p>But it also poses a risk. If the model makes a systematic error, and if it makes it with high confidence, then that error will feed back into the model during self-training, growing in magnitude.</p> <p>Tri-training is intended to mitigate this kind of self-reinforcement. In our experiments, we created three different training sets, each the size of the original — 39,000 examples — by randomly sampling data from the original. There was substantial overlap between the sets, but in each, some data items were oversampled, and some were undersampled.</p> <p>We trained neural networks on all three data sets and saved copies of them, which we might call initial models. Then we used each of those networks to label another 5.4 million examples. For each of the initial models, we used machine-labeled data to re-train it only if both of the other models agreed on the labels with high confidence. In all, we retained only 5,000 examples out of the more than five million in the unlabeled data set.</p> <p>Finally, we used six different models to classify the examples in our test set: the three initial models and the three retrained models. 
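The agreement filter at the heart of tri-training can be sketched as follows. This is our own simplification with hypothetical interfaces: the "models" are callables returning a `(label, confidence)` pair, whereas the real systems are neural classifiers.

```python
def select_agreed_examples(other_models, unlabeled, threshold=0.9):
    # Tri-training selection: keep a machine-labeled example for
    # retraining one model only when the two *other* models agree on
    # the label, each with confidence above the threshold.
    # ASSUMPTION: models are callables returning (label, confidence).
    m1, m2 = other_models
    selected = []
    for example in unlabeled:
        label1, conf1 = m1(example)
        label2, conf2 = m2(example)
        if label1 == label2 and conf1 >= threshold and conf2 >= threshold:
            selected.append((example, label1))
    return selected
```

Because a single model's confident mistake is unlikely to be reproduced by both of the other models, this agreement check is what keeps systematic errors from feeding back into retraining.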
On samples of three sounds — dog sounds, baby cries, and gunshots — pooling the results of all six models led to reductions in equal-error rate (EER) of 16%, 26%, and 19%, respectively, over a standard self-trained model.</p> <p>Of course, using six different models to process the same input is impractical, so we also trained a seventh neural network to mimic the aggregate results of the first six. On the test set, that network was not quite as accurate as the six-network ensemble, but it was still a marked improvement over the standard self-trained model, reducing EER on the same three sample sets by 11%, 18%, and 6%, respectively.</p> <p><em>Ming Sun is a senior speech scientist in the Alexa Speech group.</em></p> <p><strong>Papers</strong>:<br /> “<a href="" target="_blank">Hierarchical Residual-Pyramidal Model for Large Context Based Media Presence Detection</a>”<br /> “<a href="" target="_blank">Semi-Supervised Acoustic Event Detection Based on Tri-Training</a>”</p> <p><a href=""><strong>Alexa science</strong></a></p> <p><strong>Acknowledgments</strong>: Qingming Tang, <a href="" target="_blank">Chieh-Chi Kao</a>, <a href="" target="_blank">Viktor Rozgic</a>, Bowen Shi, Spyros Matsoukas, Chao Wang</p> <p><strong>Related</strong>:</p> <ul> <li><a href="" target="_blank">How Alexa Learns</a></li> <li><a href="" target="_blank">Audio Watermarking Algorithm Is First to Solve &quot;Second-Screen Problem&quot; in Real Time</a></li> <li><a href="" target="_blank">To Correct Imbalances in Training Data, Don’t Oversample: Cluster</a></li> <li><a href="" target="_blank">Why Alexa Won't Wake Up When She Hears Her Name in Amazon's Super Bowl Ad</a></li> <li><a href="" target="_blank">Identifying Sounds in Audio Streams</a></li> <li><a href="" target="_blank">Amazon at ICASSP</a><br /> &nbsp;</li> </ul> /blogs/alexa/post/642879ef-aa5d-40bc-bca8-bc4e7da1ba05/use-dynamic-entities-to-create-personalized-voice-experiences1 Use Dynamic Entities to Create Personalized Voice Experiences Chisato Hiroki
2019-04-18T07:21:49+00:00 2019-04-18T07:21:49+00:00 <p>Every morning on my way to work, I buy a green tea latte at the same coffee shop. The barista knows from experience what I want and starts making my order as soon as I walk in. I’m sometimes tempted to change my order just to be difficult, but I wouldn’t want the delicious matcha latte already being made to go to waste.</p> <p>Because the experience is so simply personalized, I keep coming back to this shop. In the same way, a great Alexa skill offers a personalized user experience, one that makes users want to come back to the skill again and again.</p> <p>&nbsp;</p> <p>With the new <a href="">dynamic entities</a> feature, you don’t need to edit your interaction model, rebuild it, and go through recertification. You can adjust the interaction model at run time to personalize your Alexa skill’s experience. By applying data structures retrieved from your code, a database, or a RESTful API call to existing slot values at run time, your skill can adapt to the user, the context, and the flow of the conversation. Tailoring slot values and synonyms to the context lets you personalize the experience based on the user’s preferences and past interactions, just like that coffee shop. Today you can use <a href="">persistent attributes</a> to code your skill to remember events across skill sessions, but that approach involves updating the interaction model and the recertification that comes with it. Implementing dynamic entities requires no model rebuild and no recertification: as soon as the directive is returned from your <a href="">AWS Lambda</a> code, your skill’s slot customization takes effect. Adding dynamic entities is easy.</p> <p>&nbsp;</p> <h2>How dynamic entities work</h2> <p>Let’s return to the coffee shop example. To order a drink in the skill, you need a drink slot with the slot type <strong>drinkType</strong>. Initially, you defined the drink slot with two values, coffee and tea. Later, you expanded the menu to add green tea and oolong tea. With dynamic entities, you can make these changes at run time, with no manual model update and no resubmission for certification.</p> <p>Your skill needs to return the <strong>Dialog.UpdateDynamicEntities</strong> directive. You set the <strong>updateBehavior</strong> value to either CLEAR or REPLACE and configure the dynamic entities using the types field. The Alexa service registers the new slot values and synonyms for <strong>drinkType</strong> when the directive is returned. This is a silent process; the user doesn’t need to do anything beyond interacting with the skill.</p> 
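<p>Here is a sketch of how such a directive might be assembled in skill code. This is illustrative only: the helper function and its parameters are not part of the Alexa API, and with the ASK SDK v2 for Node.js the resulting object would typically be attached to the response with responseBuilder.addDirective().</p> <pre> <code class="language-javascript">// Illustrative helper (not an Alexa API): builds a Dialog.UpdateDynamicEntities
// directive that replaces the dynamic values of one slot type.
function buildUpdateDirective(slotTypeName, entities) {
  return {
    type: 'Dialog.UpdateDynamicEntities',
    updateBehavior: 'REPLACE', // CLEAR would drop previously registered entities
    types: [
      {
        name: slotTypeName,
        values:{ id, value, synonyms }) => ({
          id,
          name: { value: value, synonyms: synonyms },
        })),
      },
    ],
  };
}

// Example: register green tea (with synonym matcha) for the drinkType slot type.
const directive = buildUpdateDirective('drinkType', [
  { id: 'grnTea', value: '緑茶', synonyms: ['抹茶'] },
]);
</code></pre>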
<p>Once the new slot values and synonyms are registered, requests to your skill that include a slot associated with a dynamic entity contain resolved values based on both the statically and the dynamically defined values of <strong>drinkType</strong>. From your skill code, you can check whether the user’s utterance matched a static or a dynamic slot value. Dynamic entities expire when the user ends the interaction with the skill, so you need to register them again the next time the skill launches.</p> <p>Let’s walk through the response and the request.</p> <p>&nbsp;</p> <h2>Response: registering dynamic entities</h2> <p>When you register dynamic entities for <strong>drinkType</strong>, your response returns this directive:</p> <pre> <code class="language-java">... 'directives': [ { 'type': 'Dialog.UpdateDynamicEntities', 'updateBehavior': 'REPLACE', 'types': [...] } ] ... </code></pre> <p>The type and <strong>updateBehavior</strong> fields are straightforward. The types field, on the other hand, is an array of compound objects representing slot types. A skill can have multiple slot types, and each slot type can have multiple values and synonyms, which is why types is an array. Let’s look at a slot type object inside the types array:</p> <pre> <code class="language-java">{ 'name': '&lt;slotType&gt;', 'values': [ { ... }, ... ] } </code></pre> <p>There are two fields, name and values. The name field is the name of the slot type you want to update. Because a slot type can have multiple values, the values array provides a list of compound objects representing slot values and synonyms. Let’s take a closer look at a slot value object inside the values array:</p> <pre> <code class="language-java">{ 'id': '&lt;slotValueId&gt;', 'name': {...} } </code></pre> <p>Each slot value object contains a single name, which is an object, and an id you can optionally define. The ID is optional, but if you plan to use the slot value to look up content in a database or RESTful web service (such as <a href="">Amazon DynamoDB</a>), it’s a good idea to define one; that way you won’t need to build a dictionary in your code to match slot values against database ID values.</p> <p>Finally, the name field is a compound object containing the value and an array of synonyms:</p> <pre> <code class="language-java">{ 'value': '&lt;slotValue&gt;', 'synonyms': [ '&lt;synonymA&gt;', '&lt;synonymB&gt;', ... ] } </code></pre> <p>Putting it all together, here is the directive for adding green tea (緑茶) and oolong tea (ウーロン茶) to drinkType:</p> <pre> <code class="language-java">... 'directives': [ { 'type': 'Dialog.UpdateDynamicEntities', 'updateBehavior': 'REPLACE', 'types': [ { 'name': 'drinkType', 'values': [ { 'id': 'grnTea', 'name': { 'value': '緑茶', 'synonyms': [ '抹茶', ] } }, { 'id': 'oolTea', 'name': { 'value': 'ウーロン茶', 'synonyms': [ '中国茶', '青茶', ] } } ] } ] } ] ... 
</code></pre> <h2>Request: receiving dynamically defined slot values</h2> <p>Once dynamic entities are registered, requests sent to your skill code that include an associated slot contain both the statically defined and the dynamically defined resolutions in an array named <strong>resolutionsPerAuthority</strong>. Note: if you have used <a href="">entity resolution</a> to map synonyms to slot values, the <strong>resolutionsPerAuthority</strong> array will look familiar. Previously, this array had only one item, so it was common to hard-code <strong>resolutionsPerAuthority[0]</strong>. With both static and dynamic entities in the array, that hard-coding no longer works.</p> <p>Also note that the array is unordered: the dynamic entity may come first, or the static one may. Because you can’t rely on the order of the <strong>resolutionsPerAuthority</strong> array, you need to check the entity type of each object in it. The two authority strings look like this, where &lt;skill_id&gt; stands for your skill’s ID:</p> <pre> <code class="language-java">Static: amzn1.er-authority.echo-sdk.&lt;skill_id&gt;.drinkType Dynamic: amzn1.er-authority.echo-sdk.dynamic.&lt;skill_id&gt;.drinkType </code></pre> <p>Notice that the two are exactly the same except that the dynamic one contains the word “dynamic.” In your code, to determine whether an object in <strong>resolutionsPerAuthority</strong> is dynamic, check whether it contains “<strong>.er-authority.echo-sdk.dynamic</strong>”. Why not simply check for “dynamic”? Consider a slot type named <strong>dynamicType</strong>: the strings from both the static and the dynamic authorities would end in <strong>dynamicType</strong>, so checking only for “dynamic” would return true for both, which isn’t the result you want.</p> <p>Suppose the user orders matcha (抹茶), a synonym for green tea, and you have just dynamically updated <strong>drinkType</strong>. Let’s dig into the request and look at the <strong>OrderIntent</strong>. <strong>resolutionsPerAuthority</strong> contains two authorities:</p> <pre> <code class="language-java">... 
{ &quot;intent&quot;: { &quot;name&quot;: &quot;OrderIntent&quot;, &quot;confirmationStatus&quot;: &quot;NONE&quot;, &quot;slots&quot;: { &quot;drink&quot;: { &quot;name&quot;: &quot;drink&quot;, &quot;value&quot;: &quot;抹茶&quot;, &quot;resolutions&quot;: { &quot;resolutionsPerAuthority&quot;: [ { &quot;authority&quot;: &quot;amzn1.er-authority.echo-sdk.&lt;skill_id&gt;.drinkType&quot;, &quot;status&quot;: { &quot;code&quot;: &quot;ER_SUCCESS_NO_MATCH&quot; } }, { &quot;authority&quot;: &quot;amzn1.er-authority.echo-sdk.dynamic.&lt;skill_id&gt;.drinkType&quot;, &quot;status&quot;: { &quot;code&quot;: &quot;ER_SUCCESS_MATCH&quot; }, &quot;values&quot;: [ { &quot;value&quot;: { &quot;name&quot;: &quot;緑茶&quot;, &quot;id&quot;: &quot;grnTea&quot; } } ] } ] }, &quot;confirmationStatus&quot;: &quot;NONE&quot;, &quot;source&quot;: &quot;USER&quot; } } } } ... </code></pre> <p>At the top level (<strong>intent.slots.drink</strong>), the value is 抹茶 (matcha), which means the skill captured the user’s order. You can also see that there are two entries in <strong>resolutionsPerAuthority</strong>: one static and one dynamic. The statically defined <strong>drinkType</strong> contains neither the value 緑茶 (green tea) nor its synonym 抹茶, so its status code is <strong>ER_SUCCESS_NO_MATCH</strong>. The dynamic entry is <strong>ER_SUCCESS_MATCH</strong> and contains the value 緑茶 along with its ID, <strong>grnTea</strong>. It looks like we’ve successfully updated the drink menu dynamically.</p> <p>Now let’s offer a limited-time promotion on a green tea frappe in the skill. You add an entry for the drink to the database your skill calls through a web service. When the skill opens, it updates the menu on sale by registering the green tea frappe as a dynamic <strong>drinkType</strong> entity. When the promotion ends, you can delete the entry from the database. If a repeat customer tries to order a green tea frappe, it exists in neither the static nor the dynamic entities, so you know the item is no longer offered.</p> <p>Now that you’ve seen how to use dynamic entities, let’s look at their limitations.</p> <p>&nbsp;</p> <h2>Limitations</h2> <p>As we’ve discussed, dynamic entities are very powerful, but there are a few limitations to be aware of.</p> <h3>&nbsp;</h3> <h3>1. A limit of 100 entities</h3> <p>This includes the combinations of values and synonyms for each slot type. If the total exceeds 100, a 403 error is returned and the dynamic entities are not registered, although static entities continue to work.</p> <h3>&nbsp;</h3> <h3>2. Updates are not additive</h3> <p>You can update dynamic entities as many times as you like from any response, and each update overwrites the previously registered dynamic entities. You can’t accumulate 200 dynamic entities across multiple responses, but you can load 100 entities based on the skill’s context to handle the user’s response, then swap in a new set of 100 when a different context comes into play.</p> 
<h3>&nbsp;</h3> <h3>3. No one-shot support</h3> <p>Your skill must receive a request before it can return a directive, which means dynamic entities aren’t registered until the skill is opened. Ideally, the user opens the skill by saying, “Alexa, open Coffee Shop.” That triggers a <strong>LaunchRequest</strong>, and you can register your dynamic entities. If the user instead says, “Alexa, I want a green tea from Coffee Shop,” the <strong>OrderIntent</strong> is triggered before the dynamic entities can be registered. In that case, the drink slot still resolves to green tea, but only the static entities are available to the skill. If the user orders an item from the promotional menu, you can’t use dynamic entities to check whether the item is still on sale; you have to check manually.</p> <p>Personalizing interactions with dynamic entities, and tailoring slot types to context to improve accuracy, makes your skill all the more engaging.</p> <p>&nbsp;</p> <h2>Personalization</h2> <p>When I walk into my usual coffee shop, the barista and I both know that I want a drink and that the drink is a green tea latte. The barista can make the same kinds of inferences about other customers, which makes the ordering process personalized and efficient, and customers all the more engaged.</p> <p>Dynamic entities let you do the same thing by dynamically mapping slot values based on each user’s order history. With dynamic entities, you can map a slot value to a user’s usual order (in my case, a green tea latte). When I interact with the skill, I can order “the usual,” and the drink slot takes “the usual” and resolves it to green tea latte.</p> <p>&nbsp;</p> <h2>Tailoring to context</h2> <p>Tailoring your skill to context makes it more accurate, and therefore more engaging. By setting dynamic entities according to the skill’s current context, you can give your slots fine-grained contextual awareness. For example, suppose you have a skill that lets users search for bus routes by voice. Street and bus stop names can be difficult to read, and users often don’t say them the way they’re officially pronounced. Using the <u><a href="">Device Address API</a></u>, your skill can retrieve the postal code of the user’s Echo device, use it to find all the bus stops and intersections within a given radius, and load those street names into a slot type with dynamic entities. The set of dynamic entities is much smaller, but it’s far more relevant to the context in which the user is interacting with the skill, which improves the skill’s accuracy in interpreting street names.</p> <p>&nbsp;</p> <h2>Summary</h2> <p>Early testers have found dynamic entities to be a powerful tool for engaging users, enabling them to deliver personalized, high-quality experiences in a short time. <u><a href="">James Dimmock</a></u> of the UK online grocer <u><a href="">Ocado</a></u> says, “We sell more than 100 kinds of milk, and many of our customers have strong preferences about brand and variety. Using dynamic entities, the <u><a href="">Ocado skill</a></u> makes it quicker and easier than ever for customers to add their favorite products to their carts.”</p> <p>Voice developers are also using dynamic entities to improve their business models. <u><a href="">Joel Wilson</a></u>, founder of the US voice development company <u><a href=""></a></u>, found that updating skills at run time saves time and speeds up his business. “This will eliminate the need to change the language model and resubmit the skill every time the content changes,” Wilson says.</p> 
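<p>The request-handling advice in this post can also be sketched in code. The helper below is illustrative, not part of the Alexa SDK; it assumes an unordered resolutionsPerAuthority array whose dynamic entry contains “.er-authority.echo-sdk.dynamic” in its authority string, as described above.</p> <pre> <code class="language-javascript">// Illustrative helper (not an Alexa API): walk resolutionsPerAuthority, classify
// each entry as static or dynamic by its authority string, and return the first
// matched value ({ name, id }) from each source, or null if there was no match.
function getResolvedValues(resolutionsPerAuthority) {
  const resolved = { static: null, dynamic: null };
  for (const resolution of resolutionsPerAuthority) {
    // Check the full marker, not just 'dynamic', so that a slot type named
    // dynamicType is not misclassified.
    const key = resolution.authority.includes('.er-authority.echo-sdk.dynamic')
      ? 'dynamic'
      : 'static';
    if (resolution.status.code === 'ER_SUCCESS_MATCH') {
      resolved[key] = resolution.values[0].value;
    }
  }
  return resolved;
}
</code></pre>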
<p>I hope this post gives you some ideas for incorporating dynamic entities into your skill. If it works out for you, please let me know. I’m <u><a href="">SleepyDeveloper</a></u> on Twitter, and I’d love to hear from you.</p> <p>&nbsp;</p> <h2>Related resources</h2> <ul> <li><a href="">Dynamic entities technical documentation</a></li> <li><a href="">Use Dynamic Entities for Customized Interactions</a></li> <li><a href="">Node.js SDK documentation on persistent attributes</a></li> <li><a href="">Announcing Alexa Entity Resolution</a></li> <li><a href="">Alexa Skill Teardown: Understanding Entity Resolution with the Pet Match Skill</a></li> <li><a href="">Entity resolution technical documentation</a></li> <li><a href="">Alexa Skill Recipe: Using the Device Address API to Enhance Your Voice Experience</a></li> <li><a href="">What Is Amazon DynamoDB?</a></li> </ul> /blogs/alexa/post/99fb071e-9aaf-481b-b9af-0186c0f712a5/how-to-monitor-custom-alexa-skills-using-amazon-cloudwatch-alarms How to Monitor Custom Alexa Skills Using Amazon CloudWatch Alarms Jennifer King 2019-04-17T14:00:00+00:00 2019-04-17T14:00:00+00:00 <p style="text-align:justify">If you have a custom Alexa skill that uses AWS Lambda as the back end, follow the steps below to create alerts using Amazon CloudWatch alarms to get notified when errors occur.</p> <p style="text-align:justify">As a skill developer, you want to make sure that your skill is always working as expected and providing a consistent experience to your customers. One way to do this is with continuous monitoring, so that you’re alerted about unexpected errors that may arise with your skill. Monitoring enables you to identify the root cause of any errors and address those issues quickly. 
If you do not have monitoring in place, skill issues and errors may go unnoticed for an extended period of time, which could lead to a poor skill experience.</p> <p style="text-align:justify">If you have a custom skill that uses <a href="" target="_blank">AWS Lambda</a> as the back end, follow the steps below to create alerts using Amazon CloudWatch alarms to get notified when there is a spike in errors for your skill.</p> <h2 style="text-align:justify">Logging Error Information</h2> <p style="text-align:justify">In order to monitor skills for errors, you first need to log the appropriate errors. In case of errors with a skill request, the skill receives a <a href="">SessionEndedRequest</a> that contains the error message and error type. You can log this error information to identify the cause of errors with your skill. For complete instructions on how to log and debug this error information, refer to <a href="">this blog post</a>. For this example, every time I get a SessionEndedRequest due to a skill error, I will log it with the prefix “Error Message.”</p> <h2 style="text-align:justify">Adding a Metric Filter in CloudWatch</h2> <p style="text-align:justify">Once the error information is being logged, the next step is to set up a metric filter that you can use to track your errors from CloudWatch. First, go to <a href="" target="_blank"></a>. Next, in the navigation panel on the left, select <strong>Logs</strong>. Then, identify the log group for your skill and click on <strong>Create Metric Filter.</strong></p> <p style="text-align:justify"><strong><img alt="" src="" style="display:block; margin-left:auto; margin-right:auto" /></strong></p> <p style="text-align:justify">This will open the <strong>Define Logs Metric Filter</strong> screen. In the filter pattern, enter “Error Message” (or the prefix from your logs that you want to be alerted on). 
You will also have the option of testing whether your pattern works.</p> <p style="text-align:justify"><img alt="" src="" style="display:block; height:512px; margin-left:auto; margin-right:auto; width:600px" /></p> <p style="text-align:justify">Next, click on <strong>Assign Metric</strong>. This will open the <strong>Create Metric Filter and Assign a Metric</strong> screen. Enter the Filter Name, Metric Namespace, and Metric Name and then click Create Filter.</p> <p style="text-align:justify"><img alt="" src="" style="display:block; height:375px; margin-left:auto; margin-right:auto; width:700px" /></p> <p style="text-align:justify"><strong>Note</strong>: You can also set up your metrics based on individual error types so that you can have separate alarms, for example, for the error types INVALID_RESPONSE and INTERNAL_SERVICE_ERROR. You can control this by logging the particular error type in your logs and building your metrics based on each pattern. You can find a list of error types for a custom skill <a href="">here</a>.</p> <h2 style="text-align:justify">Creating the Alarm</h2> <p style="text-align:justify">Once you have your metric filter created, you are ready to create alarms. You want to be notified in case you see a rise in errors (identified by your metric filter). Click on Create Alarm for your metric filter.</p> <p style="text-align:justify"><img alt="" src="" style="display:block; margin-left:auto; margin-right:auto" />On the <strong>create new alarm</strong> screen, provide a name and description for your alarm. Also, provide the threshold for the number of errors for which you want to be alerted. For this example, I will set it as greater than or equal to 3. Next, in the Action section, you can select the method of notification when this alarm is triggered. For my example, I have created an <a href="" target="_blank">Amazon SNS topic</a> and subscribed my email address to it. 
So, when this alarm is triggered, it will send me an email at the provided email address.</p> <p style="text-align:justify"><img alt="" src="" /></p> <p style="text-align:justify">Now, whenever customers invoke my skill and there is a spike in errors (three or more requests with errors, in this example) on the skill’s back end (and customers hear “Sorry, I’m having trouble accessing your skill right now”), I will receive an email notification informing me about the error with the skill. See the example email below:</p> <p style="text-align:justify"><img alt="" src="" /></p> <p>I can then debug, identify the root cause of the issue, and resolve it before a large number of customers are impacted by the error.</p> <h2>Related Content</h2> <p>For more information on debugging and troubleshooting custom skills, check out these resources:</p> <ul> <li><a href="">How to Debug Errors for Custom Alexa Skills</a></li> <li><a href="">3 Tips to Troubleshoot Your Custom Alexa Skill's Back End</a></li> <li><a href="">Test and Debug a Custom Skill</a></li> <li><a href="">How to Handle Error Messages about Your Remote Endpoint</a></li> <li><a href="">Why console.log() Is Your Friend</a></li> </ul>
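<p>To recap the logging step that drives the metric filter above, a handler for SessionEndedRequest might log errors like this. This is a sketch: the request shape follows the SessionEndedRequest reference (the reason and error fields), but the surrounding handler wiring is assumed, not shown.</p> <pre> <code class="language-javascript">// Sketch: when a SessionEndedRequest reports an error, log it with the
// 'Error Message' prefix that the CloudWatch metric filter matches on.
// In AWS Lambda, console.log output is delivered to CloudWatch Logs.
function logSessionEndedError(requestEnvelope) {
  const request = requestEnvelope.request;
  if (request.type === 'SessionEndedRequest') {
    if (request.reason === 'ERROR') {
      const line = 'Error Message: [' + request.error.type + '] ' + request.error.message;
      console.log(line);
      return line;
    }
  }
  return null; // nothing to log for other requests
}
</code></pre>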