Alexa Blogs Alexa Developer Blogs /blogs/alexa/feed/entries/atom 2019-08-16T17:25:01+00:00 Apache Roller /blogs/alexa/post/ca2cfbfb-37a2-49de-840c-f06f6ad8b74d/introducing-custom-interfaces-enabling-developers-to-build-dynamic-gadgets-games-and-smart-toys-with-alexa Introducing Custom Interfaces, Enabling Developers to Build Dynamic Gadgets, Games, and Smart Toys with Alexa Karen Yue 2019-08-15T17:39:52+00:00 2019-08-15T18:55:55+00:00 <p><a href="" target="_blank"><img alt="Alexa Smart Toys" src="" style="height:480px; width:1908px" /></a></p> <p>Since launching Alexa more than four years ago, customers have purchased more than 100 million Alexa-enabled devices, allowing them to interact with products in new and&nbsp;engaging ways. Today, we are excited to introduce new developer tools that enable you&nbsp;to connect <a href="" target="_blank">gadgets, games, and smart toy products</a> with immersive skill-based content—unlocking creative ways for customers to experience your product. This is made possible using Custom Interfaces, the newest feature available&nbsp;in the <a href="" target="_blank">Alexa Gadgets Toolkit</a>.</p> <h2>Explore the Fun Side of Alexa: Gadgets, Games, and Smart Toys</h2> <p>Gadgets, games, and smart toys come in all shapes and sizes, and for all ages. With Custom Interfaces, you can design dynamic interactions with Alexa that span multiple product categories from board games and action figures, to gizmos and novelties. 
For example, a basketball hoop for your office that lights up the scoreboard when you say “Alexa, tell Basketball Hoop to start a game,” and triggers Alexa’s response when you score.</p> <p><a href="" target="_blank"><img alt="" src="" style="float:right; height:202px; padding-left:10px; width:500px" /></a>These ideas are also possible with Custom Interfaces:</p> <ul> <li>A mini keyboard that turns Alexa into a piano teacher, lighting up keys that correspond to a given song and providing feedback on whether you have pressed the right sequence of keys.</li> <li>An indoor drone that flies when you say “Alexa, tell my drone to fly in a figure 8,”&nbsp;and triggers Alexa to play a tune upon landing.</li> <li>A game printer that creates a game sheet when you say, “Alexa, tell Game Printer to give me a Sudoku puzzle.&quot;</li> <li>A dog toy that counts how many times your dog plays fetch, and lights up green when a 20-minute session has concluded.</li> </ul> <h2>The Benefits of Custom Interfaces</h2> <p>With Custom Interfaces, you can build products that can be updated with new functionality and refreshed content to enhance the overall interactive experience. You can also offer premium product features that can be unlocked through in-skill purchasing. 
Custom Interfaces support the following:</p> <ul> <li><strong>Direct Communication:</strong> Facilitate connection and communication between your product and Alexa, removing the burden of creating a device cloud and customer account management infrastructure.</li> <li><strong>Dynamic Voice Interactions:</strong> Design robust voice interactions for your product, to create extended, story-driven experiences for your customers.</li> <li><strong>Adapts to Your Product:</strong> Get support for a wide range of capabilities, regardless of what you are trying to build.</li> </ul> <h2><a href="" target="_blank"><img alt="diagram" src="" style="float:right; height:268px; margin-bottom:-10px; padding-left:10px; width:500px" /></a>How It Works: The Role of an Alexa Skill</h2> <p>To unlock these features, and enable Alexa to interact with the unique capabilities of your product, you will need to create a compatible Alexa skill. The custom interaction is achieved through the Custom Interface Controller, a skill API that exchanges messages with your product over the course of a given skill session, allowing you to design voice experiences that are tailored to your product’s functionality.</p> <p>Messages sent from your skill to your product, or <em>directives</em>, can be configured to activate a range of reactions from your product through motors, sound chips, lights, and more. You can trigger directives in response to game behavior, alongside specific moments in storytelling, or in its simplest form, in response to an explicit command from your customers.</p> <p>Messages sent from your product to a skill, or <em>events</em>, can be triggered by customers engaging directly with your product whether by activating a button, triggering an accelerometer, or achieving a specific sequence of events. Events can also be triggered by the state of your product. 
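As a concrete illustration, here is a sketch in Python of what those two message types can look like when a skill builds them. The directive types and the 90-second input-handler limit come from the Alexa Gadgets Toolkit documentation; the namespace `Custom.MyGadget`, the directive and event names, the endpoint ID, and the payload contents are all invented for illustration.

```python
# Sketch: the two Custom Interface Controller messages a skill exchanges
# with a gadget. Namespace, names, endpoint ID, and payloads are examples.

def build_send_directive(endpoint_id, namespace, name, payload):
    """A directive from the skill to the gadget (e.g. light an LED, spin a motor)."""
    return {
        "type": "CustomInterfaceController.SendDirective",
        "header": {"namespace": namespace, "name": name},
        "endpoint": {"endpointId": endpoint_id},
        "payload": payload,
    }

def build_start_event_handler(token, namespace, name, duration_ms=90_000):
    """Open an input handler so the skill can receive events from the gadget.
    Alexa will listen for matching events for at most 90 seconds."""
    if not 0 < duration_ms <= 90_000:
        raise ValueError("input handlers can listen for at most 90 seconds")
    return {
        "type": "CustomInterfaceController.StartEventHandler",
        "token": token,
        "eventFilter": {
            # Only pass through events whose header matches the given
            # namespace and name; stop the handler after the first match.
            "filterExpression": {
                "and": [
                    {"==": [{"var": "header.namespace"}, namespace]},
                    {"==": [{"var": "header.name"}, name]},
                ]
            },
            "filterMatchAction": "SEND_AND_TERMINATE",
        },
        "expiration": {
            "durationMilliseconds": duration_ms,
            "expirationPayload": {"reason": "timed out"},
        },
    }

# Example: tell a hoop gadget to blink, then wait for a "Scored" event.
blink = build_send_directive("amzn1.ask.device.EXAMPLE", "Custom.MyGadget",
                             "Blink", {"color": "green", "times": 3})
handler = build_start_event_handler("session-token-1", "Custom.MyGadget", "Scored")
```

The two dictionaries would be added to a skill response's `directives` list; how your gadget reacts to the directive payload is entirely up to your firmware.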
When the skill is in session, you will need to ensure that there is an active input handler to listen for an event. You can determine how long to listen for an event — up to 90 seconds — and filter the specific events that you want your skill to receive.</p> <h2>Build for Younger Audiences (Now in Private Beta)</h2> <p>With the help of Custom Interfaces, we are unlocking additional opportunities for developers to create playful, educational, and interactive gadgets, games, and smart toys for younger audiences. From kids’ role play and action figures to building and learning smart toys, you can create unique story-rich interactions with characters that kids already know and love. For example, a teddy bear that reacts to an audio story provided through a companion Alexa kid skill.</p> <p>All products targeted to kids under the age of 13 must have an accompanying <a href="" target="_blank">kid skill</a>. Consistent with the Children's Online Privacy Protection Act, we require permission from a parent before kid skills can be used.</p> <p>Our Private Beta is limited to commercial developers by invite only.</p> <h2>Get Started with Custom Interfaces</h2> <p>To help you get started on your first prototype using Custom Interfaces, we are excited to share sample projects that enable you to build with Raspberry Pi and Python-based software. The software includes sample applications and step-by-step guides that simplify the process of getting your prototype connected and plugged in to the capabilities of Alexa Gadgets Toolkit. 
Once connected, you have the flexibility to combine your prototype with off-the-shelf components, such as servos, buttons, lights, and more.</p> <p>Visit our <a href="" target="_blank">resource library</a>, which includes the following:</p> <ol> <li><a href="" target="_blank">Tech documentation</a></li> <li><a href="" target="_blank">Sample application that uses Custom Interfaces and step-by-step guides</a></li> </ol> <p>With Custom Interfaces, there are even more possibilities for engaging experiences that you can build to delight your customers. Start prototyping today and be first-to-market in fun and interactive categories that are not yet connected to Alexa. We can’t wait to see what you build for Alexa customers!</p> /blogs/alexa/post/7eda239b-24a9-45f1-bdc7-d86879dc99d3/new-ai-system-helps-accelerate-alexa-skill-development New AI System Helps Accelerate Alexa Skill Development Larry Hardesty 2019-08-15T13:00:00+00:00 2019-08-15T14:22:51+00:00 <p>Based on embeddings, system suggests named entities — or &quot;slot values&quot; — that developers might want their skills to recognize.</p> <p>Alexa currently has more than 90,000 skills, or abilities contributed by third-party developers — the NPR skill, the Find My Phone skill, the Jeopardy! skill, and so on.</p> <p>For each skill, the developer has to specify both <em>slots</em> — the types of data the skill will act on — and <em>slot values</em> — the particular values that the slots can assume. A restaurant-finding skill, for instance, would probably have a slot called something like CUISINE_TYPE, which could take on values such as “Indian”, “Chinese”, “Mexican”, and so on.</p> <p>For some skills, exhaustively specifying slot values is a laborious process. 
We’re trying to make it easier with a tool we’re calling catalogue value suggestions, which is currently available to English-language skill developers and will soon expand to other languages.</p> <p>With catalogue value suggestions, the developer supplies a list of slot values, and based on that list, a neural network suggests a range of additional slot values. So if, for example, the developer provided the CUISINE_TYPEs “Indian”, “Chinese”, and “Mexican”, the network might suggest “Ethiopian” and “Peruvian”. The developer can then choose whether to accept or reject each suggestion.</p> <p>“This will definitely improve the dev process of creating a skill,” says Jos&eacute; Chavez Marino, an Xbox developer with Microsoft. “The suggestions were very good, but even if they were not accurate, you just don't use them. I only see positive things on implementing this in the Alexa dev console.”</p> <p>The system depends centrally on the idea of embeddings, or representing text strings as points in a multidimensional space, such that strings with similar semantic content are close together. We use proximity in the embedding space as the basis for three distinct tasks: making the slot value suggestions themselves; weeding offensive terms out of the value suggestion catalogue; and identifying slots whose values are so ambiguous that suggestions would be unproductive.<br /> <br /> <img alt="ambiguous_slots.gif" src="" style="display:block; height:282px; margin-left:auto; margin-right:auto; width:500px" /></p> <p style="text-align:center"><em><sub>Sometimes&nbsp;a skill will include slots such as </sub></em><sub>Things_I_like</sub><em><sub> or even </sub></em><sub>Miscellaneous_terms</sub><em><sub> whose values are so irregular that they provide no good basis for slot value suggestions. 
Here, the solid blue circle represents the average embedding of the slot values “Bird”, “Dog”, and “Cat” (hollow blue circles), while the solid red square represents the average embedding of the slot values “Left”, “Hamster”, and “Boston” (hollow red squares). If slot-value embeddings lie too far (dotted circles) from their averages, we conclude that suggesting new slot values would be unproductive.</sub></em></p> <p><br /> The first step in building our catalogue of slot value suggestions: assemble a list of <em>phrases</em>, as slot values frequently consist of more than one word — restaurant names and place names, for instance. When training&nbsp;our embedding network, we treated both phrases and non-phrasal words as&nbsp;<em>tokens</em>, or semantic units.&nbsp;</p> <p>We then fed the network training data in overlapping five-token chunks. For any given input token, the network would learn to predict the two tokens that preceded it and the two that followed it. The outputs of the network thus represented the frequencies with which tokens co-occurred, which we used to group tokens together in the embedding space.</p> <p>Next, we removed offensive content from the catalogue. We combined and pruned several publicly available blacklists of offensive terms, embedded their contents, and identified words near them in the embedding space. For each of those nearby neighbors, we looked at its 10 nearest neighbors. If at least five of these were already on the blacklist, we blacklisted the new term as well.</p> <p>When a developer provides us with a list of values for a particular slot, our system finds their average embedding and selects its nearest neighbors as slot value suggestions. 
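The two steps just described, suggesting nearest neighbors of the average embedding and rejecting slots whose values are too spread out, can be sketched with a toy example. The tiny 2-D "embeddings" below are invented for illustration; the real system uses learned high-dimensional vectors and a much larger vocabulary.

```python
# Toy sketch of the slot-value suggestion logic: average the seed
# embeddings, suggest nearest neighbors, and bail out when the seeds
# are too scattered. The 2-D vectors here are made up for illustration.
import math

EMBEDDINGS = {
    "indian":    (0.90, 0.80), "chinese": (0.80, 0.90), "mexican": (0.85, 0.75),
    "ethiopian": (0.80, 0.80), "peruvian": (0.75, 0.85),
    "left":      (-0.90, 0.10), "hamster": (0.10, -0.90), "boston": (0.50, -0.50),
}

def centroid(vecs):
    """Component-wise average of a list of vectors."""
    return tuple(sum(c) / len(vecs) for c in zip(*vecs))

def suggest(seed_values, max_spread=0.5, k=2):
    """Return up to k nearest-neighbor suggestions, or [] when the seed
    values lie too far from their average (the 'ambiguous slot' case)."""
    seeds = [EMBEDDINGS[v] for v in seed_values]
    center = centroid(seeds)
    if max(math.dist(v, center) for v in seeds) > max_spread:
        return []  # slot too ambiguous to yield useful suggestions
    others = [w for w in EMBEDDINGS if w not in seed_values]
    others.sort(key=lambda w: math.dist(EMBEDDINGS[w], center))
    return others[:k]

# A coherent slot (cuisines) yields suggestions; a scattered one does not.
print(suggest(["indian", "chinese", "mexican"]))
print(suggest(["left", "hamster", "boston"]))
```

With these toy vectors, the cuisine seeds cluster tightly, so their neighbors "ethiopian" and "peruvian" come back as suggestions, while the scattered `Things_I_like`-style seeds exceed the spread threshold and produce nothing.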
If the developer-provided values lie too far from their average <em>(see figure, above)</em>, the system concludes that the slot is too ambiguous to yield useful suggestions.</p> <p>To test our system, we extracted 500 random slots from the 1,000 most popular Alexa skills and used half the values for each slot to generate suggestions. On average, the system provided 6.51 suggestions per slot, and human reviewers judged that 88.5% of them were situationally appropriate.</p> <p><em>Boya Yu is an applied scientist in Alexa AI’s Natural Understanding group.</em></p> <p><a href="" target="_blank"><strong>Alexa science</strong></a></p> <p><strong>Acknowledgments</strong>: Markus Dreyer</p> <p><strong>Related</strong>:</p> <ul> <li><a href="" target="_blank">Representing Data at Three Levels of Generality Improves Multitask Machine Learning</a></li> <li><a href="" target="_blank">Who’s on First? How Alexa Is Learning to Resolve Referring Terms</a></li> <li><a href="" target="_blank">To Correct Imbalances in Training Data, Don’t Oversample: Cluster</a></li> <li><a href="" target="_blank">With New Data Representation Scheme, Alexa Can Better Match Skills to Customer Requests</a></li> </ul> <p><sub><em>Animation by&nbsp;<a href="" target="_blank">Nick&nbsp;Little</a></em></sub></p> /blogs/alexa/post/eb8ec4df-6ef0-4dba-a291-3a9f8ef4915d/isp-certification Tips for Alexa Skill Certification: In-Skill Purchasing Takuya Goshima 2019-08-15T06:12:27+00:00 2019-08-15T06:30:39+00:00 <p><img alt="" src="" style="height:480px; width:1908px" /></p> <p>This time, our skill-certification tips cover in-skill purchasing. Before the Alexa skills you build are published to the skill store, the Alexa certification team reviews them against the <a href="">published requirements</a> and, where needed, provides feedback to help each skill deliver a good user experience. In this post we look at the areas that most often receive improvement feedback when skills are submitted.</p> <p>&nbsp;</p> <h2><strong>In-Skill Purchasing</strong></h2> <p><a href="">In-skill purchasing</a> is the mechanism that lets customers pay for digital content within an Alexa skill.<br /> As of August 2019, it is available only in Japanese, English (US), English (UK), and German. Note that you cannot use it if your skill is built in a language that in-skill purchasing does not support.</p> <p>For details on the price range for in-skill products and the list of supported languages and distribution regions, see <a href="">this page</a>.</p> <p>For the API used to retrieve in-skill product information, see <a href="">here</a>.</p> <p>&nbsp;</p> <p>&nbsp;</p> <h3><strong>1.</strong> <strong>Canceling or Refunding a Purchase</strong></h3> <p>Every skill that uses in-skill purchasing must handle customers canceling or refunding a purchase. To process cancellations or refunds, you need to build a custom intent that supports the request and add code that handles the customer’s refund request. For how to implement this, see <a href="">Handle refund and cancel requests</a> or <a href="">this page</a>.</p> <div style="background-color:#e7e7e7; border:0px solid #e4e4e4; margin-bottom:10px; padding:10px"> <p>User: “Alexa, refund Cave Adventure.”</p> <p>Alexa: “I’ve sent a link about the refund to your Alexa app. Please check it there.”</p> </div> <p>&nbsp;</p> <p>&nbsp;</p> <h3><strong>2.</strong> <strong>Offering Upsells</strong></h3> <p>An upsell is a message your skill gives customers to promote a product for sale.<br /> Your skill should check whether the customer already owns the product in question, then present a message recommending the paid product (an upsell message). Price details are provided within Alexa’s purchase flow, so do not include them in the upsell message. For details, see <a href="">Design the purchase flow</a>.</p> <p>&nbsp;</p> <p>・Sample [upsell] dialog</p> <div style="background-color:#e7e7e7; border:0px solid #e4e4e4; margin-bottom:10px; padding:10px"> <p>Alexa: “Great! You answered 45 of 50 countries correctly. If you’d like to learn more about the countries of the world, I recommend the National Birds expansion pack. Want to hear more?”</p> <p>User: “Yes.”</p> <p>Alexa: “The National Birds expansion pack includes 195 kinds of birds. Some are real and some are legendary, depending on the country. You can build your knowledge while enjoying the quiz. It’s 299 yen, tax included. Would you like to buy it?”</p> <p>User: “Yes.”</p> <p>Alexa: “Thanks for purchasing the National Birds expansion pack. Ready to start playing?”</p> </div> <p>&nbsp;</p> <p>You can also present in-skill products that are useful to customers as recommendations within your skill’s responses.<br /> For implementing recommendations, see <a href="">Make suggestions</a> and <a href="">Make in-skill products discoverable</a>.</p> <p>You can also <a href="">remind</a> customers within your skill’s responses that they can ask about in-skill products at any time. Once the reminder finishes, be sure to resume the skill.</p> 
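The refund/cancel and upsell flows above are both started by handing the session over to Alexa's purchase flow with a `Connections.SendRequest` directive (the `Buy` variant is used when a customer asks for a product outright). A minimal sketch of how a skill might assemble these directives; the product ID, messages, and tokens are placeholders, not real identifiers.

```python
# Sketch: directives a skill response returns to delegate to Alexa's
# in-skill purchasing flow. Product IDs and tokens below are placeholders.

def upsell_directive(product_id, upsell_message, token):
    """Offer a product the customer doesn't own yet. The price itself is
    spoken later, inside Alexa's purchase flow, not in this message."""
    return {
        "type": "Connections.SendRequest",
        "name": "Upsell",
        "payload": {
            "InSkillProduct": {"productId": product_id},
            "upsellMessage": upsell_message,
        },
        "token": token,
    }

def buy_directive(product_id, token):
    """Start the purchase flow directly when the customer asks to buy."""
    return {
        "type": "Connections.SendRequest",
        "name": "Buy",
        "payload": {"InSkillProduct": {"productId": product_id}},
        "token": token,
    }

def cancel_directive(product_id, token):
    """Hand a refund or cancellation request over to Alexa."""
    return {
        "type": "Connections.SendRequest",
        "name": "Cancel",
        "payload": {"InSkillProduct": {"productId": product_id}},
        "token": token,
    }
```

When the purchase flow finishes, the skill is re-invoked with the result and the same correlation token, which is the point where the "resume the skill" guidance above applies.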
<p>&nbsp;</p> <p>・Sample [recommendation] dialog</p> <div style="background-color:#e7e7e7; border:0px solid #e4e4e4; margin-bottom:10px; padding:10px"> <p>Alexa: “Get the Cave Adventure expansion pack and enjoy the adventure even more. Want to hear the details?”</p> <p>User: “Yes.”</p> </div> <p>&nbsp;</p> <p>・Sample [reminder] dialog</p> <p>Suppose a customer is playing the free adventure series and is about to finish the free content.</p> <div style="background-color:#e7e7e7; border:0px solid #e4e4e4; margin-bottom:10px; padding:10px"> <p>Alexa: “You’ve found five of the six treasures. Well done! Once you finish this adventure, you can get a new one at any time. Want to hear about the expansion packs?” (pause)</p> <p>Alexa: “Now, let’s continue the hunt for the last treasure. As you walk through the dark forest…”</p> </div> <p>&nbsp;</p> <p>&nbsp;</p> <h3><strong>3. Purchasing the In-Skill Products Customers Want</strong></h3> <p>Customers must be able to buy an in-skill product they are interested in even if no upsell message has been presented.<br /> To support this, build a custom intent for purchasing, add code to handle it, and send a directive that starts the purchase flow.<br /> For the purchase steps, see <a href="">Add code to handle purchase requests</a> or <a href="">Handling direct purchases</a>.</p> <p>&nbsp;</p> <p>・When the customer names the item they want to buy</p> <div style="background-color:#e7e7e7; border:0px solid #e4e4e4; margin-bottom:10px; padding:10px"> <p>User: “I want to buy the Cave Adventure expansion pack.”</p> <p>Alexa: “The Cave Adventure expansion pack includes five new adventures in which you discover buried ancient treasure. It’s 199 yen, tax included. Would you like to buy it?”</p> </div> <p>&nbsp;</p> <p>・When the customer wants an item but doesn’t name it</p> <div style="background-color:#e7e7e7; border:0px solid #e4e4e4; margin-bottom:10px; padding:10px"> <p>User: “I’d like an expansion pack.”</p> <p>Alexa: “There are two expansion packs that continue the adventure you just finished: the Cave Adventure adventure game and the Deep Sea Exploration puzzle game. Which would you like?”</p> <p>User: “Deep Sea Exploration.”</p> <p>Alexa: “Got it. The Deep Sea Exploration expansion pack has seven new puzzles. It’s 399 yen, tax included. Would you like to buy it?”</p> </div> <p>&nbsp;</p> <p>&nbsp;</p> <h3><strong>4.</strong> <strong>Using a Product After Purchase</strong></h3> <p>Whether the in-skill product a customer buys is an entitlement, a consumable, or a subscription, the content must be available as soon as the purchase completes. If the customer does not buy the product, present the options available in that case. For what happens after a purchase, see <a href="">Handle the results of the purchase flow</a> or <a href="">Resume the skill after the purchase flow</a>.</p> <p>&nbsp;</p> <p>A customer says they want to play “The Great Rift,” part of a subscription they purchased. You must fulfill the request immediately after the purchase flow.</p> <div style="background-color:#e7e7e7; border:0px solid #e4e4e4; margin-bottom:10px; padding:10px"> <p>Alexa: “Let’s explore the Great Rift. As you finally emerge from the dark forest, you find a mysterious rift in the ground…”</p> </div> <p>&nbsp;</p> <p>A customer bought a pack of five hints in a trivia game. Immediately after the purchase flow, you must let them know the hints are available.</p> <div style="background-color:#e7e7e7; border:0px solid #e4e4e4; margin-bottom:10px; padding:10px"> <p>Alexa: “Thanks for purchasing five extra hints for Trivia. When you want to use one, just say ‘Give me a hint.’ Now, back to the final question. Would you like to use a hint now?”</p> </div> <p>&nbsp;</p> <p>When the customer does not make a purchase, handle it in a similarly context-appropriate way. If the customer has used up all the content and declined the other options, end the session.</p> <div style="background-color:#e7e7e7; border:0px solid #e4e4e4; margin-bottom:10px; padding:10px"> <p>Alexa: “You’ve found all six treasures. Come back any time to see whether new adventures or puzzles have been added.”</p> </div> <p>&nbsp;</p> <p>The certification feedback email includes a summary of the issues found, steps to reproduce each issue where necessary, and guidance for getting your skill through certification. If there is anything the review team should know about your skill, see <a href="">here</a> and tell us via the “Testing instructions” field under “Privacy &amp; Compliance” on the Distribution tab of the developer console when you submit your skill.</p> <p>In the <a href="">Alexa Skills Kit (ASK) (日本語)</a> space, you can post questions about skill development and answer other developers’ questions. You can also use our <a href=";sc_channel=website&amp;sc_publisher=devportal&amp;sc_campaign=Conversion_Contact-Us&amp;sc_assettype=conversion&amp;sc_team=us&amp;sc_traffictype=organic&amp;sc_country=united-states&amp;sc_segment=all&amp;sc_itrackingcode=100020_us_website&amp;sc_detail=blog-alexa">contact form</a>.</p> <p>&nbsp;</p> <p>Related posts on in-skill purchasing:<br /> ・<a href="">In-skill purchasing is now available for skills for Alexa users in Japan</a><br /> ・<a href="">Alexa skill-building training series: best practices for in-skill purchasing</a><br /> ・<a href="">Alexa skill-building training series: monetizing your skills</a><br /> ・<a href="">Alexa skill-building training series: how to build in-skill purchasing</a><br /> ・<a href="">Alexa skill-building training series: in-skill purchasing FAQ</a><br /> ・<a href="">In-skill purchasing certification guide</a><br /> ・<a href="">In-skill purchasing FAQ</a></p> /blogs/alexa/post/6a5b0ad4-b27e-4a6b-87dd-3792bab23c51/what-s-new-in-the-alexa-skills-kit-july2019-release-roundup What's New in the Alexa Skills Kit: July 2019 Release Roundup Leo Ohannesian 2019-08-15T00:18:18+00:00 2019-08-15T00:18:18+00:00 <p><img alt="Intent-history_blog.png" src="" /></p> <p>What's new in the Alexa Skills 
Kit for July 2019? Read our release roundup blog to find out.</p> <p><em><strong>Editor's Note: </strong>Our monthly release roundup series showcases the latest in Alexa Skills Kit developer tools and features that can make your skills easier to manage, simpler to deploy, and more engaging for your customers. Build with these new capabilities to enable your Alexa skills to drive brand or revenue objectives.</em></p> <p>&nbsp;</p> <p>In this roundup post we share details about the new things released for skill developers last month, including the General Availability of Skill Connections along with several other features&nbsp;that can help you be more productive or build more engaging skills. Check out the entire livestream for more information from Alexa evangelists and code samples.</p> <p><iframe allowfullscreen="" frameborder="0" height="360" src="//" width="640"></iframe></p> <h2>1. Improve productivity by outsourcing tasks to other skills with Skill Connections, now Generally Available</h2> <p>Skill Connections enable a skill to use another skill to perform a specific task, so you can do more for your customers by extending your skill's abilities with minimal changes. <a href=";sc_category=Owned&amp;sc_channel=BG&amp;sc_campaign=roundup&amp;sc_content_category=Productivity&amp;sc_country=WW" target="_blank">Check out the announcement here</a> or <a href=";sc_category=Owned&amp;sc_channel=BG&amp;sc_campaign=roundup&amp;sc_content_category=Productivity&amp;sc_country=WW" target="_blank">learn more about Skill Connections&nbsp;in our tech docs.</a></p> <h2>2. Easily integrate leaderboards into your game skills using the Skills GameOn SDK (Beta)</h2> <p>Leaderboards are a great way to keep players engaged with your game skills and drive retention. 
You can now use the Skills GameOn SDK (beta), powered by Amazon GameOn and optimized for Alexa skills, to easily integrate leaderboards into your game skills. We are also excited to announce that we have a special offer to help more skill developers leverage the GameOn capabilities. <a href=";sc_category=Owned&amp;sc_channel=BG&amp;sc_campaign=roundup&amp;sc_content_category=Games&amp;sc_country=WW" target="_blank">Learn more about the GameOn SDK by reading our blog.</a></p> <h2>3. Alexa Presentation Language 1.1 (Beta)</h2> <p>We are excited to announce the next version of Alexa Presentation Language (APL) with support for animations, vector graphics, better tooling, and a design system that makes APL skill development for multiple viewport profiles faster. Read about&nbsp;<a href=";sc_category=Owned&amp;sc_channel=BG&amp;sc_campaign=roundup&amp;sc_content_category=APL&amp;sc_country=WW" target="_blank">it in our announcement.</a></p> <h2>4. Quickly test your VUI with Quick Builds, now on the Developer Console</h2> <p>Save time and start testing early: We are excited to announce developer console support for Quick builds, which enable you to start testing your skill with sample utterances on average 67% quicker than before. This is done by introducing a new intermediate build state called Quick Build. Read about&nbsp;<a href=";sc_category=Owned&amp;sc_channel=BG&amp;sc_campaign=roundup&amp;sc_content_category=Productivity&amp;sc_country=WW#build-and-save" target="_blank">it in our tech docs.</a></p> <p>As always, we can't wait to see what you build. 
As a reminder, learn how to get the most out of the tech docs by visiting the <a href="" target="_blank">Latest Tips page.</a></p> /blogs/alexa/post/67edf9f0-1ec6-4261-ad6b-46cf36d87fbb/voice-agency-say-it-now-ceo-discusses-reaping-big-rewards-from-the-evolving-voice-industry Voice Agency 'Say It Now' CEO Discusses Reaping Big Rewards from the Voice Industry Emma Martensson 2019-08-14T14:00:00+00:00 2019-08-14T14:00:00+00:00 <p><img alt="Voice Agency 'Say It Now' CEO Discusses Reaping Big Rewards from the Voice Industry " src="" /></p> <p><a href="" target="_blank">Charlie Cadbury</a>, CEO of <a href="" target="_blank">Say It Now</a>, has adapted alongside technology since 1999 when he sold his first website. In 2015 he began playing with voice with a proof of concept for Allegiant Air. Charlie then spent most of 2016 and 2017 attending conferences with an Amazon Echo, pitching (and winning) several travel innovation competitions for a conversational hotel concierge service called <a href="" target="_blank">‘Dazzle’</a>. Charlie got early support from Marriott Hotels for Dazzle and was able to build out the proposition in Marriott Hotel London County Hall. “That recognition gave me confidence that voice had potential; however at that point in 2017 the business case for voice wasn’t very well established and often guests had never seen or heard of a smart speaker before,” Charlie says. 
Dazzle grew into a multi-award winning conversational service, and together with the VP of product, <a href="" target="_blank">Sander Siezen</a>, Charlie went on to set up Say It Now, a voice agency, at the end of summer 2018.</p> <p>Today, Say It Now is a group of enterprise natural language processing (NLP) experts building out conversational strategies and products alongside Fortune 500 companies. “The business benefits being reaped are better articulated now we’ve been in the NLP space for 4 years and are clearer than ever about the strategies brands should adopt,” Charlie says. Recently Say It Now won the UK and EU rounds of The Alexa Cup, both incredible milestones for them, and won the Bronze medal in the global final. Charlie adds “We’re happy with the top spot in Europe and third in the world!”</p> <h2>How to Collaborate with Clients on a Voice Strategy</h2> <p>Say It Now has, together with their clients, developed a workshop technique that allows them to assess whether voice is the right approach and, if so, where the most value can be found right now. They also create a roadmap for where the value and growth will come from over time. In this workshop they then form an actionable plan that is carried out collaboratively with their client. Whilst Say It Now brings their specific industry expertise to the table to deliver insight and guidance, the clients educate them about the internal machinations of their organisation and their strategy.</p> <p>This is exactly the way they work with their client <a href="" target="_blank">Diageo</a> on their Alexa skill, <a href="" target="_blank">Talisker Tasting</a>. Diageo has <a href="" target="_blank">stated</a> that this kind of relationship has the desired effect and as a result they have committed to continued investment in voice.</p> <p>For Say it Now’s own skills, like the Alexa Cup’s winning submission Book It Now, they took a slightly different approach. 
“We sought to find a solution for booking services through Alexa and adding that functionality at scale,” Charlie says.</p> <h2>Say It Now Expects Big Rewards from ‘Conversational Commerce’</h2> <p>“I started talking about ‘emerging commerce’, the idea that the way we transact has always and will always evolve, in 2012 when building out some early mobile payment apps.” He continues, “I’ve taken a very keen interest in the development of ‘conversational commerce’ over the past few years and believe it is inevitable and will be transformative to the voice industry.” Charlie believes the way it manifests will take a few years to mature but there are big rewards if some of this transaction value can be captured. Say It Now is working hard to ensure they’re part of this evolution.</p> <h2>Aspiring Voice Agencies Should Connect with the Voice Community</h2> <p>Charlie sees relationships and community as key when building skills and starting a voice business. Say It Now has had a lot of success building relationships with other developers, voice designers and the community at large. “It means you are abreast of the latest developments and have the right kind of people around you to make sense of this rapidly changing world,” he says. He recommends the <a href="" target="_blank"></a> podcast and <a href="" target="_blank">Voice First Community</a> for any aspiring voice developers.</p> <h2>More ‘Wow’ in the Future of Voice</h2> <p>Say It Now is interested to see how <a href="">Name Free Skill Interactions (NFSI) </a>and <a href="">Alexa Conversations </a>develop in the future. 
“Discovery and cross-skill flows are still a challenge,” he says, “but when unlocked we expect to see more ‘wow’ and utilisation of voice as a place where even more complex tasks can be seamlessly delegated to your Amazon Alexa.”</p> <p>Charlie thinks this change has to come from a place of trusted personalisation, and that trust will take a while to be built up and must stem from delightful voice experiences that will provide the data businesses need. “But get there we will,” he concludes. Charlie and Say It Now are excited to play their part in the rapidly evolving story of voice.</p> <h2>Related Content</h2> <ul> <li><a href="">Hugo’s Move from Digital Nomad to Full Time Alexa Skills Developer</a></li> <li><a href="">How Vocala is Creating a Growing Voice Business</a></li> <li><a href="">Make Money with In-Skill Purchasing</a></li> <li><a href="">Sell Premium Content to Enrich Your Skill Experience</a></li> </ul> <h2>Grow Your Voice Business with Monetized Alexa Skills</h2> <p>With in-skill purchasing (ISP), you can sell premium content to enrich your Alexa skill experience. ISP supports one-time purchases for entitlements that unlock access to features or content in your skill, subscriptions that offer access to premium features or content for a period of time, and consumables which can be purchased and depleted. You define your premium offering and price, and we handle the voice-first purchasing flow.&nbsp;If you add ISP to your skill, you may be eligible to earn a voucher for the <a href="" target="_blank">AWS Certified Alexa Skill Builder</a> exam through the&nbsp;<a href=";linkId=67863388#?&amp;sc_category=Owned&amp;sc_channel=SM&amp;sc_campaign=EUPromotion&amp;sc_publisher=LI&amp;sc_content=Promotion&amp;sc_funnel=Publish&amp;sc_country=EU&amp;sc_medium=Owned_SM_EUPromotion_LI_Promotion_Publish_EU_EUDevs&amp;sc_segment=EUDevs">EU Perks Program</a>. 
<a href=";sc_category=Owned&amp;sc_channel=WB&amp;sc_campaign=wb_acquisition&amp;sc_publisher=ASK&amp;sc_content=Content&amp;sc_detail=vod-webinar&amp;sc_funnel=Convert&amp;sc_country=WW&amp;sc_medium=Owned_WB_wb_acquisition_ASK_Content_vod-webinar_Convert_WW_visitors_makemoney-page_CTA-graphic&amp;sc_segment=visitors&amp;sc_place=makemoney-page&amp;sc_trackingcode=CTA-graphic" target="_blank">Download our introductory guide</a> to learn more.</p> /blogs/alexa/post/d92c7822-d289-44fd-a9fe-9652874fc3c9/five-benchmarks-for-writing-dialog-that-sounds-great-to-alexa-customers Five Benchmarks for Writing Dialog that Sounds Great to Alexa Customers Michelle Wallace 2019-08-13T16:25:21+00:00 2019-08-13T16:25:21+00:00 <p><img alt="" src="" style="height:480px; width:1908px" /></p> <p>Great Alexa skills depend on written prompts. This post covers five benchmarks your Alexa skill’s dialog should meet, and specific techniques for how you can get there.</p> <p><img alt="" src="" style="height:480px; width:1908px" /></p> <p>Great Alexa skills depend on written prompts. In voice-first interfaces, the dialog you write isn’t one component of the user interface—it <em>is </em>the interface, because Alexa’s voice is the primary guide leading a customer through your skill.<br /> <br /> But if you don’t have a background in writing, that’s okay! Any skill builder can improve their written dialog so it successfully serves the customer. This post covers five benchmarks your Alexa skill’s dialog should meet, and specific techniques for how you can get there.</p> <h2>Benchmark 1: Avoid Jargon and Ten-Dollar Words</h2> <p>Customers love low-friction interactions, and the individual words in your dialog can be a huge part of keeping the interaction simple and easy. 
Informal language is faster and less burdensome for a customer to process, so they can follow a voice interaction without stopping to puzzle over unfamiliar terms.<br /> <br /> Here are some examples of commonly used jargon or overly formal words, along with alternatives that could be used instead:<br /> <br /> Jargon: “You can default to a stored method associated with this account, or override it by selecting an alternate method of payment.”<br /> Simpler: “You can use the credit card on file, or add a new card.”</p> <p>Jargon: “I can submit a request for a customer service representative to return your call.”<br /> Simpler: “I can have an agent call you back.”</p> <p>Jargon: “Would you like me to submit your order for processing?”<br /> Simpler: “Ready to finish your order?”<br /> <br /> Jargon: “The transaction attempt was not successful.”<br /> Simpler: “Hmm. Something went wrong with your payment.”<br /> <br /> So, what are some techniques for replacing jargon with clearer language? First, fresh eyes are valuable here. Find someone who’s not an expert in your skill’s content, and ask them to read or listen to your dialog and point out words that feel unfamiliar to them. Second, once you’ve identified some clunky words, find synonyms that are less formal. (Don’t be afraid to dust off that thesaurus!)</p> <h2>Benchmark 2: Apply the One-Breath Test for Concision</h2> <p>Remember that your skill’s dialog will be spoken out loud, one word at a time, so excess words in your prompts quite literally add time to the interaction. A useful guideline is that a prompt should be about as long as what a human could say in one breath.
It’s a great idea to read your dialog out loud or have a colleague read it to you.<br /> <br /> If you identify some prompts that don’t pass the <a href="">one-breath test</a>, here are some ways you can shorten them:</p> <ul> <li>Cut filler words, like “very.” Keep an eye out for words that don’t change the meaning of a sentence or add information; you can eliminate these.</li> <li>Look out for wordiness around verbs. For example, “I’d like to be able to help you” can be shortened to “I can help.”</li> <li>Find information that customers don’t need. For example, if a prompt contains a date, like “Your order will be ready on August 2, 2019,” you can usually omit the year.</li> </ul> <p>There are concrete techniques you can use to make sentences concise. First, make sure each sentence passes the one-breath test by reading it aloud. Next, if you find sentences that don’t pass the test, cut your sentences down by challenging yourself to omit 2-5 words from every line of dialog in your code.</p> <h2>Benchmark 3: Introduce Variety into Your Dialog</h2> <p>Humans use a lot of variation in the way they speak. In contrast, voice experiences that repeat the same phrases don’t sound natural to the human ear. You can avoid repetition by adding randomized variations to your dialog.<br /> <br /> Look for the skill dialog that your users will hear the most often, starting with the greeting. Imagine a skill called Grocery Store that allows you to order groceries. If you heard “Welcome to the Grocery Store!” with every launch, you’d grow tired of this greeting.<br /> <br /> As a skill builder, you could provide randomized phrases so that customers might hear one of several responses upon launch.
For example:</p> <ul> <li>Thanks for stopping by the Grocery Store.</li> <li>Hi, you’ve reached the Grocery Store.</li> <li>Let’s fill your cart at the Grocery Store!</li> </ul> <p>Another opportunity for variation is confirming a purchase, or congratulating a customer for completing a task. For example, if you have a skill that sells cupcakes, you could randomize phrases that confirm the purchase:</p> <ul> <li>You’re all set! Treats are on the way.</li> <li>It’s cupcake time! Your order is complete.</li> <li>Sweet! You’ve successfully ordered your cupcakes.</li> </ul> <p>It’s important to keep aspects of the flow consistent; your skill shouldn’t feel radically different or unfamiliar each time. But creating variation is an important way to keep your skill interesting and fresh, especially for skills a user might open every day, like skills for weather, exercise, or news.<br /> <br /> To make sure your dialog isn’t overly repetitive, you can add a few simple techniques to your process. First, take a look at your list of dialog lines and identify 3-5 prompts that your customers will encounter each time they use your skill. Next, write 2-5 (or more!) variations for each of these lines. 
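Greeting variations like these can be selected with a few lines of code. Here is a minimal sketch (the helper name is ours, not from the post):

```javascript
// Randomized greeting selection: on each launch, pick one variation so
// repeat customers don't hear the same welcome every time.
const GREETINGS = [
  "Thanks for stopping by the Grocery Store.",
  "Hi, you've reached the Grocery Store.",
  "Let's fill your cart at the Grocery Store!"
];

function randomGreeting(variations = GREETINGS) {
  return variations[Math.floor(Math.random() * variations.length)];
}
```

In a LaunchRequest handler, you would pass the result of `randomGreeting()` to `responseBuilder.speak()`.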
It’s a good idea to ask a few friends or colleagues to help you brainstorm, as you may come up with more creative variations as a group.<br /> <br /> For more guidance, check out the Alexa Design Guide’s section on <a href="">adding variety</a>&nbsp;in <a href="">repetitive tasks</a>, and using <a href="">adaptive prompts</a>.</p> <h2>Benchmark 4: Try Contractions and Informal Phrasing</h2> <p>General advice for Alexa dialog is <a href="">“Write it the way you say it.”</a> People use lots of contractions when they speak, such as:</p> <ul> <li>“I’m” instead of “I am”</li> <li>“I’d” instead of “I would”</li> <li>“Don’t” instead of “do not”</li> <li>“Can’t” instead of “cannot”</li> </ul> <p>“I cannot help you with that” sounds much stiffer than “I can’t help you with that.” Because your skill’s dialog should be casual and conversational, for most situations, the contracted version is preferred.<br /> <br /> Humans also use short phrases; not every line of dialog has to be a complete sentence. This keeps your prose natural, and contributes to concise sentences. For example:</p> <ul> <li>“Done!” instead of “This purchase is complete.”</li> <li>“Ready?” instead of “Do you want to continue?”</li> <li>“Got it, four adult tickets to next week’s choir concert!” instead of “Okay, I will place an order for four adult tickets to go see the choir concert taking place next week.”</li> </ul> <p>With just a little extra effort, you can make sure your dialog sounds casual and easy on the ear. First, circle all of the verbs that could be turned into contractions. Reading out loud can help you identify these places, too. Next, you can identify dialog that can be turned into shorter phrases. Some good candidates for phrases are prompts that end with a question and confirmation phrases.</p> <h2>Benchmark 5: Use SSML for Better Pacing</h2> <p>When customers listen to a long string of dialog without meaningful pauses, the words can bleed together and create confusion. 
It’s a great idea to employ Speech Synthesis Markup Language (<a href="">SSML</a>) to adjust aspects of Alexa’s speech so it sounds even more natural to a human ear.<br /> <br /> You can use SSML to do lots of things, from tweaking a word’s pronunciation to adjusting emphasis on a specific syllable. But perhaps the simplest SSML tag with the biggest impact is the <a href="">break time</a> tag, which represents a pause in speech. Sometimes adding even a few milliseconds of extra time can help your customer comprehend the prompt more easily.<br /> <br /> For example, you can use SSML to add time between menu items:</p> <pre> <code>&lt;speak&gt; There are three house plants I’d recommend for your apartment: elephant ear, &lt;break time=&quot;600ms&quot;/&gt; peace lily &lt;break time=&quot;600ms&quot;/&gt; and spider plant. &lt;/speak&gt; </code></pre> <p>You can also add a lengthier pause between sentences, usually to indicate a transition between content and a next step:</p> <pre> <code>&lt;speak&gt; You answered a total of 14 questions right! That beats your all-time high score of 12 correct answers. &lt;break time=&quot;1s&quot;/&gt; Want to play again? &lt;/speak&gt; </code></pre> <p>To identify places where a pause is useful, listen to each prompt being read <em>by Alexa</em>. An easy way is to paste your dialog into the <strong>Voice &amp; Tone</strong> speech simulator, located in the <strong>Test</strong> tab in the Alexa developer console. If a sentence seems rushed, add some break time tags and listen again to fine-tune.
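If you find yourself sprinkling many break tags between list items, a small helper can generate them for you so the pause length is tuned in one place. This is a hypothetical utility, not part of the ASK SDK:

```javascript
// Join spoken menu items with SSML break tags so Alexa pauses between them.
// The 600 ms default matches the house-plant example above.
function joinWithBreaks(items, pauseMs = 600) {
  return items.join(` <break time="${pauseMs}ms"/> `);
}

const ssml =
  "<speak>There are three house plants I'd recommend: " +
  joinWithBreaks(["elephant ear", "peace lily", "spider plant"]) +
  ".</speak>";
```

Paste the generated string into the speech simulator, adjust `pauseMs`, and listen again.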
You can experiment with adding pauses of varying lengths, from 300 milliseconds to one second.</p> <h2>Benchmark Checklist</h2> <p>If you’ve done all of these things, your dialog will be crafted for a natural, concise, easy-on-the-ear customer experience.</p> <ol> <li>Eliminate jargon by asking for feedback from someone who’s not an expert in your skill’s content.</li> <li>Perform the one-breath test and, if you need to, cut 2-5 words from every sentence.</li> <li>Identify 3-5 prompts that will be commonly encountered and write at least two variations for each.</li> <li>Where you can, reduce your verb phrases to contractions and shorten some sentences to phrases.</li> <li>Listen to Alexa read every line and add spacing between phrases and sentences.</li> </ol> <p>In general, the best way to confirm you’ve got great dialog is to read it aloud. Better yet, read it aloud to a friend or colleague who represents your customer base. Check to make sure they had an easy time understanding and responding to your prompts, and use their feedback to tweak your dialog until it has a conversational tone that’s easy to comprehend.
Taking the extra time to scrutinize your dialog will help you craft a skill experience that’s conversational, intuitive, and frictionless for your customers.</p> <h2>Related Content</h2> <ul> <li><a href="">Alexa Design Guide</a></li> <li><a href="">Speech Synthesis Markup Language (SSML) Reference</a></li> <li><a href="">How to Write Great Dialogs for Alexa Skills</a></li> <li><a href="">Best Practices for the Welcome Experience and Prompting in Alexa Skills</a></li> </ul> /blogs/alexa/post/bb807639-efc1-45aa-ac59-11143c0e5a06/more-efficient-machine-learning-models-for-on-device-operation More-Efficient Machine Learning Models for On-Device Operation Larry Hardesty 2019-08-13T13:00:00+00:00 2019-08-13T13:00:00+00:00 <p>Two new papers explore techniques for increasing the computational efficiency and reducing the memory footprints of neural networks that process audio data.</p> <p><sup><em>Ming Sun and Bowen Shi cowrote this post with Chieh-Chi Kao.</em></sup></p> <p>Neural networks are responsible for most recent advances in artificial intelligence, including many of Alexa’s <a href="" target="_blank">latest</a> capabilities. But neural networks tend to be large and unwieldy, and in recent years, the Alexa team has been <a href="" target="_blank">investigating</a> techniques for making them <a href="" target="_blank">efficient enough</a> to run on-device.</p> <p>At this year’s Interspeech, we and our colleagues are presenting <a href="" target="_blank">two</a> <a href="" target="_blank">papers</a> that describe techniques for reducing the complexity of networks that process audio data. 
One of the networks recognizes individual spoken words; the other does acoustic-event detection.</p> <p>Acoustic-event detection is the technology behind <a href=";node=18021383011" target="_blank">Alexa Guard</a>, a feature that customers can enable on Echo devices to detect and notify them about the sound of smoke and carbon monoxide alarms or glass breaking while they’re away from home. With Guard, running a detector on-device helps protect customer privacy, ensuring that only highly suspicious sounds pass for confirmation to a more powerful detector running in the cloud.</p> <p>Both models rely on convolutional neural networks, although in different ways. Originally developed for image processing, convolutional neural nets, or CNNs, repeatedly apply the same “filter” to small chunks of input data. For object recognition, for instance, a CNN might step through an image file in eight-by-eight blocks of pixels, inspecting each block for patterns associated with particular objects. That way, the network can detect the objects no matter where in the image they’re located.</p> <p>Like images, audio signals can be represented as two-dimensional data. In speech recognition, for instance, it’s standard to represent signals using mel-frequency cepstral coefficients, or MFCCs. A signal’s cepstral coefficients are a sequence of numbers that describe its frequency characteristics; <em>cepstral</em> connotes a transformation of <em>spectral</em> properties. “Mel” means that the frequency bands are chosen to concentrate data in frequency ranges that humans are particularly sensitive to. Mapping cepstral coefficients against time produces a 2-D snapshot of an acoustic signal.</p> <p>In object recognition, a CNN will typically apply a number of filters to each image block, each filter representing a different possible orientation of an object’s edge. Our system, too, applies a number of different filters, each attuned to characteristics of particular words. 
In our case, however, each filter is relevant only to some cepstral coefficients, not to all.</p> <p>We exploit this difference to increase network efficiency. Our network architecture applies each filter only to the relevant cepstral coefficients, reducing the total number of operations required to identify a particular word. In experiments, we compared it to a traditional CNN and found that, when we held the output accuracy fixed, it reduced the computational load (measured in FLOPs, or floating-point operations) by 39.7% on command classification tasks and 49.3% on number recognition tasks.</p> <p><img alt="CNN_comparison.jpg" src="" style="display:block; height:270px; margin-left:auto; margin-right:auto; width:450px" /></p> <p style="text-align:center"><em><sub>A traditional CNN (left) and our more-efficient CNN, which applies filters (</sub></em><sub>Conv1_1</sub><em><sub> through </sub></em><sub>Conv1_3</sub><em><sub>) only to the relevant cepstral coefficients. Note that in the signal representations, time is the y-axis.</sub></em></p> <p>In our other paper, we combine two different techniques to improve the efficiency of a sound detection network: distillation and quantization. Distillation is a technique in which the outputs of a large, powerful neural network — in this case, a CNN — are used to train a leaner, more efficient network — in this case, a shallow long-short-term-memory network, or LSTM.</p> <p>Quantization is the process of considering the full range of values that a particular variable can take on and splitting it into a fixed number of intervals. All the values within a given interval are then approximated by a single number.</p> <p>The typical neural network consists of a large number of simple processing <em>nodes</em>, each of which receives data from several other nodes and passes data to several more.
Connections between nodes have associated <em>weights</em>, which indicate how big a role the outputs of one node play in the computation performed by the next. Training a neural network is largely a matter of adjusting its connection weights.</p> <p>As storing a neural network in memory essentially amounts to storing its weights, quantizing those weights can dramatically reduce the network’s memory footprint.</p> <p>In our case, we quantize not only the weights of our smaller network (the LSTM) but also its input values. An LSTM processes sequences of data in order, and the output corresponding to each input factors in the inputs and outputs that preceded it. We quantize not only the original inputs to the LSTM but also each output, which in turn becomes an input at the next processing step.</p> <p>Furthermore, we quantize the LSTM during training, not afterward. Rather than fully training the LSTM and only then quantizing its weights for storage, we force it to select quantized weights during training. 
This means that the training process tunes the network to the quantized weights, not to continuous values that the quantizations merely approximate.</p> <p>When we compare our distillation-trained and quantized LSTM to an LSTM with the same number of nodes trained directly on the same data, we find that it not only has a much smaller memory footprint — one-eighth the size — but also demonstrates a 15% improvement in accuracy, a result of the distillation training.</p> <p><em>Chieh-Chi Kao is an applied scientist, Ming Sun a senior speech scientist, and Bowen Shi a summer intern (from the Toyota Technological Institute at Chicago), all in the Alexa Speech group.</em></p> <p><strong>Papers</strong>:&nbsp;<br /> “<a href="" target="_blank">Sub-band Convolutional Neural Networks for Small-footprint Spoken Term Classification</a>”<br /> “<a href="" target="_blank">Compression of Acoustic Event Detection Models With Quantized Distillation</a>”</p> <p><a href="" target="_blank"><strong>Alexa science</strong></a></p> <p><strong>Acknowledgments</strong>: Yixin Gao, Shiv Vitaladevuni, Viktor Rozgic, Spyros Matsoukas, Chao Wang</p> <p><strong>Related</strong>:</p> <ul> <li><a href="" target="_blank">Amazon Mentors Help UMass Graduate Students Make Concrete Advances on Vital Machine Learning Problems</a></li> <li><a href="" target="_blank">Two New Papers Discuss How Alexa Recognizes Sounds</a></li> <li><a href="" target="_blank">New Method for Compressing Neural Networks Better Preserves Accuracy</a></li> <li><a href="" target="_blank">Identifying Sounds in Audio Streams</a></li> <li><a href="" target="_blank">Shrinking Machine Learning Models for Offline Use</a></li> </ul> <p><br /> &nbsp;</p> /blogs/alexa/post/2b4de691-9cad-4c82-86e0-98e674786742/integrate-the-reminders-api-with-your-skill-to-deepen-customer-engagement1 Integrate the Reminders API with Your Skill to Deepen Customer Engagement Pan Wangperawong 2019-08-09T17:11:55+00:00 2019-08-09T17:11:55+00:00 <p><img alt="" 
src="" /></p> <p>In this tutorial, we will go over how to integrate the Reminders API into your skill&nbsp;to actively extend its utility without requiring customers to launch your skill.</p> <p><img alt="" src="" /></p> <p>Integrating the Reminders API in your skill is a great way to actively extend its utility without requiring customers to launch your skill. With the Reminders API, you can engage more frequently with your customers and become a part of their routines.</p> <p>In this tutorial, we will go over how to integrate the Reminders API into your skill. We'll be using Node.js and the Alexa Skills Kit (ASK) SDK. This tutorial uses an <a href="" target="_blank">Alexa-hosted skill</a>, but it can easily be adapted to use a local development process with the ASK Command-Line Interface (CLI).</p> <h2>How Do Reminders Work?</h2> <p>Before we get started, let’s go over the basics of how users can interact with a skill that utilizes the Reminders API, as well as how developers can interact with the Reminders API. To understand how users can interact with a skill that uses the Reminders API, consider the following interaction:</p> <blockquote> <p><strong>Alexa:</strong> Would you like to schedule a daily reminder at one p.m. to get a banana from the stand?</p> <p><strong>User:</strong> Sounds good.</p> </blockquote> <p>In this example, a reminder will be scheduled for 1:00 p.m., and when the time comes, the Alexa-enabled device will chime and announce the reminder. If you have the Alexa mobile app and have push notifications enabled, it will send the reminder as a push notification as well.</p> <p>For developers, there are two ways to interact with the Reminders API: in-session interactions and out-of-session interactions. For the purposes of this tutorial, we are only going to focus on in-session interactions.</p> <p>In-session interactions allow a user interacting directly with the skill to create, read, update, and delete reminders.
Reminders can only be created through in-session interactions, so be clear and upfront about what customers can expect when they create the reminder; otherwise, they may be surprised by the outcome later. See our documentation for <a href="" target="_blank">best practices integrating reminders in your skill</a> and how to <a href="" target="_blank">develop a good customer experience with reminders</a>. Recalling the dialog shown earlier in this tutorial, a best practice is to specify frequency, time, and purpose.</p> <blockquote> <p><em>Would you like to schedule a daily (<strong>frequency</strong>) reminder at one p.m. (<strong>time</strong>) to get a banana from the stand (<strong>purpose</strong>)?</em></p> </blockquote> <p>Out-of-session interactions do not require a user to interact directly with a skill to read, update, and delete a reminder. These operations can be managed through the skill code on behalf of the user. However, out-of-session interactions cannot create a reminder. The functionality is limited to read, update, and delete. We are only focusing on in-session interactions in this tutorial, but if you’d like to learn more about out-of-session interactions, see <a href="" target="_blank">the documentation</a>.</p> <p>Now that we've covered the basics of the Reminders API and best practices for using it, let's see how we can implement the code to create a reminder. You can follow along here, watch the video, or both.</p> <p style="text-align:center"><iframe allowfullscreen="" frameborder="0" height="360" src="" width="640">&lt;br /&gt; </iframe></p> <h2>Initialize a New Skill Project Using the Hello World Example</h2> <p>Now we're going to create a basic Hello World skill, using the Alexa-hosted option for our back end. If you are already familiar with this process, you can skip to the next section.
Otherwise, you can watch a video on setting up an <a href="">Alexa-hosted Skill</a> or click on <strong>Create Skill</strong> and take the following steps:</p> <ol> <li>Name the skill <strong>Banana Stand</strong></li> <li>Choose <strong>Custom</strong> for skill type</li> <li>Choose <strong>Alexa-hosted</strong> for skill back end</li> <li>Click <strong>Create skill</strong></li> </ol> <p><img alt="reminders(1).png" src="" style="height:826px; width:800px" /></p> <h2>Enable Reminders Permission</h2> <ol> <li>Select <strong>Permissions</strong> on the bottom left menu<br /> <br /> <img alt="reminders_api_2(1).png" src="" style="height:1022px; width:800px" /><br /> &nbsp;</li> <li>Scroll down to <strong>Reminders</strong> and toggle the switch on to enable it for the skill<br /> <br /> <img alt="reminders3(1).png" src="" style="height:776px; width:800px" /></li> </ol> <h2>Update Packages</h2> <p>Click on the <strong>Code</strong> tab, open <code>package.json</code>, and update the packages to the latest versions:</p> <ol> <li>Click on the <strong>Code</strong> tab</li> <li>Open <code>package.json</code></li> <li>Update the following packages <ol> <li>Update <code>ask-sdk-core</code> to the <a href="" target="_blank"><em>latest version</em></a> (<code>&quot;ask-sdk-core&quot;: &quot;^2.6.0&quot;</code> when this tutorial was published)</li> <li>Add the <a href="" target="_blank"><em>Moment Timezone</em></a> package (<code>&quot;moment-timezone&quot;: &quot;^0.5.25&quot;</code> when this tutorial was published) for generating and manipulating ISO 8601 timestamps</li> </ol> </li> <li>Click <strong>Deploy</strong></li> </ol> <p style="margin-left:40px"><img alt="reminders_4.png" src="" style="height:776px; width:800px" /></p> <h2>Update LaunchRequestHandler</h2> <ol> <li>Click on the <strong>Code</strong> tab</li> <li>Scroll to <code>LaunchRequestHandler</code></li> <li>Update the <code>speechText</code> to the following: <pre> <code class="language-javascript">const speechText = 
&quot;Welcome to Banana Stand. Would you like a daily reminder at one p.m. to pick up a banana from the stand?&quot;</code></pre> </li> </ol> <h2>Add CreateReminderIntentHandler</h2> <ol> <li>Click on the <strong>Build</strong> tab and select the <strong>HelloWorldIntent</strong> <ul> <li>Delete <code>HelloWorldIntent</code></li> <li>Add a new intent <ul> <li>Select <strong>Use an existing intent from Alexa's built-in library</strong></li> <li>Search for <code>AMAZON.YesIntent</code></li> </ul> </li> <li>Add sample utterances by clicking on the <strong>Sample Utterances</strong> section</li> <li>Click on <strong>Build Model</strong> to train the model based on what you added</li> </ul> </li> <li>Click on the <strong>Code</strong> tab and rename <code>HelloWorldIntentHandler</code> to <code>CreateReminderIntentHandler</code>: <pre> <code class="language-javascript">const CreateReminderIntentHandler = { ... // handler code }</code></pre> </li> <li>Update the <code>canHandle</code> function for <code>CreateReminderIntentHandler</code> to check for <code>AMAZON.YesIntent</code> with the following: <pre> <code class="language-javascript">const CreateReminderIntentHandler = { canHandle(handlerInput) { return handlerInput.requestEnvelope.request.type === &quot;IntentRequest&quot; &amp;&amp; /* Update it to check for AMAZON.YesIntent */ === &quot;AMAZON.YesIntent&quot;; }, handle(handlerInput) { ... // handler code } }</code></pre> </li> <li>Go to the <code>return exports.handler</code> at the bottom of the code and update <code>HelloWorldIntentHandler</code> to <code>CreateReminderIntentHandler</code>: <pre> <code class="language-javascript">return exports.handler = Alexa.SkillBuilders.custom() .addRequestHandlers( ... 
// other intent handlers /* register CreateReminderIntentHandler */ CreateReminderIntentHandler, ) .addErrorHandlers(ErrorHandler) .lambda()</code></pre> </li> </ol> <h2>Create an Instance of the ReminderManagementServiceClient</h2> <ol> <li>Delete all the code in the <code>handle</code> function</li> <li>At <code>exports.handler</code> after <code>addErrorHandlers(ErrorHandler)</code> add <code>withApiClient(new Alexa.DefaultApiClient())</code>: <pre> <code class="language-javascript">return exports.handler = Alexa.SkillBuilders.custom() ... // RequestHandlers .addErrorHandlers(ErrorHandler) /* add API Client Builder */ .withApiClient(new Alexa.DefaultApiClient()) .lambda()</code></pre> </li> <li>Create an instance of <code>remindersApiClient</code> to interface with the Reminders API for creating a reminder: <pre> <code class="language-javascript">handle(handlerInput) { const remindersApiClient = handlerInput.serviceClientFactory.getReminderManagementServiceClient() }</code></pre> </li> </ol> <h2>Check If User Has Granted the Skill Permission to Send Reminders</h2> <ol> <li>To declare and set the <code>permissions</code> variable we will use the <a href="">ES6 destructuring assignment</a> syntax: <pre> <code class="language-javascript">handle(handlerInput) { const remindersApiClient = handlerInput.serviceClientFactory.getReminderManagementServiceClient(), /* Use ES6 destructuring assignment syntax to declare and set the permissions object in one step */ { permissions } = handlerInput.requestEnvelope.context.System.user }</code></pre> </li> <li>Check if the user has granted permission for the skill to send reminders. If not, provide the user with an AskForPermissionsConsent card <pre> <code class="language-javascript">handle(handlerInput) { const remindersApiClient = handlerInput.serviceClientFactory.getReminderManagementServiceClient(), { permissions } = handlerInput.requestEnvelope.context.System.user /* Check if user has granted the skill permissions. 
If not, send consent card to request reminders read and write permission for skill */ if(!permissions) { return handlerInput.responseBuilder .speak(&quot;Please enable reminders permissions in the Amazon Alexa app&quot;) .withAskForPermissionsConsentCard([&quot;alexa::alerts:reminders:skill:readwrite&quot;]) .getResponse() } }</code></pre> </li> <li>Click <strong>Deploy</strong> and test the skill on a device</li> </ol> <h2>Create a One-Time Reminder</h2> <ol> <li>Declare the reminder request object that uses <code>SCHEDULED_RELATIVE</code> to schedule a one-time reminder for 20 seconds later. Read more about constructing a request body that uses SCHEDULED_RELATIVE in <a href="">the documentation</a>. <pre> <code class="language-javascript">handle(handlerInput) { ... // Continuing from the code we wrote to check if the customer has granted the reminders permission to the skill /* Declare the reminderRequest object to make a request to schedule a reminder */ const reminderRequest = { trigger: { type: &quot;SCHEDULED_RELATIVE&quot;, offsetInSeconds: 20, }, alertInfo: { spokenInfo: { content: [{ locale: &quot;en-US&quot;, text: &quot;Time to get yo banana&quot;, }], }, }, pushNotification: { status: &quot;ENABLED&quot; } } }</code></pre> </li> <li>Let’s use the <code>reminderRequest</code> object and the <code>remindersApiClient</code> to schedule a reminder 20 seconds from now. <pre> <code class="language-javascript">handle(handlerInput) { ... 
// reminderRequest variable and other variables /* Use the previously instantiated remindersApiClient with the reminderRequest object to schedule the reminder */ remindersApiClient.createReminder(reminderRequest) return handlerInput.responseBuilder .speak(&quot;A reminder to get a banana in 20 seconds has been successfully created.&quot;) .getResponse(); }</code></pre> </li> <li>Click <strong>Deploy</strong>.</li> <li>Test the skill on an Alexa-enabled device by saying “<em>Alexa, open my banana stand</em>” (reminders cannot be tested through the simulator).</li> <li>Now let’s utilize the JavaScript <a href="" target="_blank">async-await</a> and <a href="" target="_blank">try-catch</a> patterns to better handle any potential errors when using the <code>remindersApiClient</code> to make an <a href="" target="_blank">asynchronous HTTP</a> request to schedule a reminder. <ul> <li>To use the async-await pattern to synchronize the asynchronous HTTP request and response, start by prepending the <code>handle</code> function with <code>async</code>: <pre> <code class="language-javascript">async handle(handlerInput) { ... // handle function declared variables and code }</code></pre> </li> <li>Now prepend <code>await</code> to <code>remindersApiClient.createReminder(reminderRequest)</code> like so: <pre> <code class="language-javascript">async handle(handlerInput) { ... // handle function declared variables and code await remindersApiClient.createReminder(reminderRequest) }</code></pre> </li> <li>For error handling, let’s wrap <code>await remindersApiClient.createReminder(reminderRequest)</code> in a <code>try-catch</code>: <pre> <code class="language-javascript">async handle(handlerInput) { ... 
// handle function declared variables and code try { await remindersApiClient.createReminder(reminderRequest) } catch(error) { console.log(`~~~~~ createReminder Error ${error} ~~~~~`) return handlerInput.responseBuilder .speak(&quot;There was an error creating your reminder. Please let the skill publisher know.&quot;) .getResponse(); } }</code></pre> </li> </ul> </li> </ol> <h2>Create a Recurring Reminder</h2> <ol> <li>Import the <code>moment-timezone</code> package at the top of the file after <code>const Alexa = require(&quot;ask-sdk-core&quot;)</code> <pre> <code class="language-javascript">const Alexa = require(&quot;ask-sdk-core&quot;), moment = require('moment-timezone') // Add moment-timezone package</code></pre> </li> <li>Update the reminder request object to use <code>SCHEDULED_ABSOLUTE</code> and the Moment Timezone package we installed earlier to generate a timestamp that conforms with ISO 8601. Please note that the timestamp does not include a trailing <code>Z</code>. Also, to obtain the user’s time zone, you can use the <a href="" target="_blank">Alexa Settings API</a>. <pre> <code class="language-javascript">const CreateReminderIntentHandler = { canHandle(handlerInput) { ... // can handle logic }, async handle(handlerInput) { ... // Variable declarations ... 
// Code to check for permissions const currentTime = moment().tz(&quot;America/Los_Angeles&quot;), // Use Moment Timezone to get the current time in Pacific Time reminderRequest = { requestTime: currentTime.format(&quot;YYYY-MM-DDTHH:mm:ss&quot;), // Add requestTime trigger: { type: &quot;SCHEDULED_ABSOLUTE&quot;, // Update from SCHEDULED_RELATIVE scheduledTime: currentTime.set({ hour: &quot;13&quot;, minute: &quot;00&quot;, second: &quot;00&quot; }).format(&quot;YYYY-MM-DDTHH:mm:ss&quot;), timeZoneId: &quot;America/Los_Angeles&quot;, // Set timeZoneId to Pacific Time recurrence: { freq : &quot;DAILY&quot; // Set recurrence and frequency } }, alertInfo: { spokenInfo: { content: [{ locale: &quot;en-US&quot;, text: &quot;Time to get yo daily banana. You better go before the banistas pack up.&quot;, }] } }, pushNotification: { status: &quot;ENABLED&quot; } } } ... // Code to create reminders }</code></pre> </li> <li>Since we cannot wait around until tomorrow to get this reminder, let's set the daily reminder to start today and have the first reminder arrive in 20 seconds for the sake of example. Update the <code>scheduledTime</code> to the following: <pre> <code class="language-javascript">reminderRequest = { requestTime: currentTime.format(&quot;YYYY-MM-DDTHH:mm:ss&quot;), trigger: { type: &quot;SCHEDULED_ABSOLUTE&quot;, scheduledTime: currentTime.add(20, &quot;seconds&quot;).format(&quot;YYYY-MM-DDTHH:mm:ss&quot;), ... // other request parameters } ... // other request parameters }</code></pre> </li> </ol> <h2>Conclusion</h2> <p>In this tutorial, we used the Reminders API to create a skill that reminds users to pick up a banana. I think we can rest assured that we will be getting our daily banana! I hope you will take what we went over in this tutorial and adapt it to enhance your skill’s functionality and deepen engagement with your users.</p> <p>We look forward to what you will build! 
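As a recap, the steps above assemble into one condensed handler. This is a simplified sketch: the error handling is abbreviated, and the timestamps are hardcoded here where the tutorial computes them with Moment Timezone:

```javascript
// Condensed recap of CreateReminderIntentHandler from this tutorial.
// Simplified sketch: requestTime and scheduledTime are hardcoded; in the
// tutorial they are computed with Moment Timezone from the current time.
const CreateReminderIntentHandler = {
  canHandle(handlerInput) {
    return handlerInput.requestEnvelope.request.type === "IntentRequest" &&
      handlerInput.requestEnvelope.request.intent.name === "AMAZON.YesIntent";
  },
  async handle(handlerInput) {
    const { permissions } = handlerInput.requestEnvelope.context.System.user;
    if (!permissions) {
      // Ask the customer to grant the reminders permission first.
      return handlerInput.responseBuilder
        .speak("Please enable reminders permissions in the Amazon Alexa app")
        .withAskForPermissionsConsentCard(["alexa::alerts:reminders:skill:readwrite"])
        .getResponse();
    }
    const reminderRequest = {
      requestTime: "2019-08-15T12:00:00",
      trigger: {
        type: "SCHEDULED_ABSOLUTE",
        scheduledTime: "2019-08-15T13:00:00", // 1 p.m. local time
        timeZoneId: "America/Los_Angeles",
        recurrence: { freq: "DAILY" }
      },
      alertInfo: {
        spokenInfo: { content: [{ locale: "en-US", text: "Time to get yo banana" }] }
      },
      pushNotification: { status: "ENABLED" }
    };
    try {
      await handlerInput.serviceClientFactory
        .getReminderManagementServiceClient()
        .createReminder(reminderRequest);
    } catch (error) {
      return handlerInput.responseBuilder
        .speak("There was an error creating your reminder. Please let the skill publisher know.")
        .getResponse();
    }
    return handlerInput.responseBuilder
      .speak("Your daily banana reminder is set for one p.m.")
      .getResponse();
  }
};
```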
Follow me on Twitter at <a href="">@ItsPanW</a> for more content like this and keep checking the <a href="">Amazon Developer Blogs</a> for updates.</p> <h2>Related Content</h2> <ul> <li><a href="">Alexa Reminders API Overview</a></li> <li><a href="">Alexa Reminders Guidelines for Usage</a></li> <li><a href="">Alexa Reminders API Reference</a></li> <li><a href="">Alexa Reminders API Best Practices</a></li> <li><a href="">Calling Alexa Service APIs</a></li> <li><a href="">Reminder Management Service Client</a></li> <li><a href="">Example Reminders API Skill on GitHub</a></li> <li><a href="">Moment.js Timezone</a></li> <li><a href="">Obtain User’s Timezone with Alexa Settings API</a></li> <li><a href="">Setting Up an Alexa-hosted Skill Video</a></li> <li><a href="">Live Code Along Video - Integrate the Reminders API with Your Skill to Deepen Customer Engagement</a></li> </ul> /blogs/alexa/post/08755b4e-09ba-450b-b241-919aa7802878/representing-data-at-three-levels-of-generality-improves-multitask-machine-learning Representing Data at Three Levels of Generality Improves Multitask Machine Learning Larry Hardesty 2019-08-08T13:00:00+00:00 2019-08-08T13:00:00+00:00 <p>Pooling the training data for related skills, and using it to train the skills simultaneously, improves performance for all of them.</p> <p>Alexa currently has more than 90,000 skills, or abilities contributed by third-party developers — the Uber ride-sharing skill, the Jeopardy! trivia game skill, the Starbucks drink-ordering skill, and so on.</p> <p>To build a skill, a third-party developer needs to supply written examples of customer requests, such as “Order my usual” or “get me a latte”, together with the actions those requests should map to. 
These examples are used to train the machine learning system that will process real requests when the skill goes live.</p> <p>Constructing lists of sample requests, however, can be labor intensive, and smaller developers could benefit if, during training, their examples were pooled with those for similar skills. In machine learning, more training data usually leads to better performance, and the examples provided by one developer could plug holes in the list of examples provided by another.</p> <p>In a <a href="" target="_blank">paper</a> we presented last week at the annual meeting of the Association for Computational Linguistics, my colleagues and I explore several different techniques for pooling sample requests from different skills when training a natural-language-understanding (NLU) system. We evaluated our techniques using two different public data sets and an internal data set and found that, across the board, training an NLU system simultaneously on multiple skills yielded better results than training it separately for each skill.</p> <p>The advantage of multitask training is that learning the structure of, say, the request “Order me a cab” could also help an NLU system process the request “Order me a sandwich”. The risk is that too much training data about condiments could interfere with the system’s ability to, say, identify cab destinations.&nbsp;</p> <p>To ensure that our system benefits from generalizations about common linguistic structures without losing focus on task-specific structures, we force the machine learning systems in our experiments to learn three different representations of all incoming data.</p> <p>The first is a general representation, which encodes shared information across all tasks. The second is a group-level representation: Each skill’s category is known — for example, the Uber and Lyft skills are in the Travel category, while the CNN and ESPN skills are in the News category. 
The group-level representations capture commonalities among utterances in a given skill category. Finally, the third representation is task-specific.</p> <p>The machine learning systems we used were encoder-decoder neural networks, which first learn fixed-size representations (encodings) of input data and then use those as the basis for predictions (decoding). We experimented with four different neural-network architectures. The first was a parallel architecture, meaning that each input utterance passed through the general encoder, a group-level encoder, and a task-specific encoder simultaneously, and the resulting representations were combined before passing to a task-specific decoder.</p> <p><img alt="Parallel_architecture.png" src="" style="display:block; height:256px; margin-left:auto; margin-right:auto; width:500px" /></p> <p style="text-align:center"><em><sup>The architecture of our parallel model, which simultaneously learns to perform three tasks (</sup></em><sup>a</sup><em><sup>, </sup></em><sup>b</sup><em><sup>, and </sup></em><sup>c</sup><em><sup>). Tasks </sup></em><sup>a</sup><em><sup> and </sup></em><sup>b</sup><em><sup> belong to the same group (Group 1), task </sup></em><sup>c</sup><em><sup> to a separate group (Group 2).</sup></em></p> <p>The other three networks were serial, meaning that the outputs of one bank of encoders passed to a second bank before moving on to the decoders. The serial architectures differ in the order in which the shared and task-level encodings take place and in whether the outputs of the first encoder bank are directly available to the decoders.</p> <p><img alt="Serial_architectures.png" src="" style="display:block; height:554px; margin-left:auto; margin-right:auto; width:450px" /></p> <p style="text-align:center"><sup><em>Our three serial architectures. In the first two ((a) and (b)), the outputs of the more general encoders pass to the task-specific encoders before moving on to the decoders. 
In the last two ((b) and (c)), the outputs (long arrows) of the first bank of encoders are available to the decoders as separate inputs.</em></sup></p> <p>All of these network architectures contain separate encoder modules for individual tasks, groups of tasks, and the “universe” of all tasks. On any given input utterance, a “switch” in the network controls which of the encoders gets to process the utterance. If the user hasn’t mentioned a skill by name, the system determines the intended skill using a predictive model. If, for instance, the utterance is “Get me an Uber to the hotel”, the task-specific Uber encoder, the group-specific Travel skills encoder, and the general universe encoder process it.&nbsp;</p> <p>During the training phase, the group-specific encoders learn how to best encode utterances characteristic of their groups, and the skill-specific encoders learn how to best encode utterances characteristic of their skills. As a result, the decoders, which always make task-specific predictions, can take advantage of three different representations of the input, ranging from general to specific. If a particular skill does not have sufficient training examples, its task-specific representations may be poor, but the group- and universe-level representations can compensate.</p> <p>All of the tasks on which we tested our architectures were joint intent classification and slot-filling tasks. “Intents” are the actions that a voice agent is supposed to take. If an Alexa customer says, “Play ‘Overjoyed’ by Stevie Wonder”, the NLU system should label the whole utterance with the intent PlayMusic. Slots are the data items on which the intent acts. 
Here, “Overjoyed” should receive the slot tag SongName and “Stevie Wonder” the slot tag ArtistName.</p> <p>To ensure that the group-level and universe-level representations remain general — that the universe-level representations don’t get hung up on the mechanics of condiment requests, for instance — we impose two constraints during training. The first is adversarial: during training, the network is rewarded when it accurately classifies slots and intents but penalized when its group- and universe-level encodings make it easy to predict which skill an utterance belongs to. This prevents task-specific features from creeping into the shared representation space.</p> <p>The second constraint is an orthogonality constraint. Because the outputs of the encoders are of fixed length, they can be interpreted as points in a multidimensional space. During training, the system is rewarded if the points produced by the different types of encoders tend to cluster in different regions of the space — that is, if the task-specific encoders and the shared encoders are capturing different information.</p> <p>We tested our systems on three different data sets and compared their performance to four different single-task baseline systems. On 90 Alexa skills, two of the serial systems (Serial+Highway and Serial+Highway+Swap) yielded significantly better performance on mean intent accuracy and slot F1 (which factors in both false-negative and false-positive rate) over the baseline systems. 
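The orthogonality constraint described above can be sketched in miniature. This is a toy illustration of the idea only, not the paper's implementation: encoder outputs are fixed-length vectors, and training adds a penalty whenever the shared and task-specific representations point in similar directions.

```javascript
// Dot product of two equal-length vectors.
function dot(u, v) {
  return u.reduce((sum, ui, i) => sum + ui * v[i], 0);
}

// Toy orthogonality penalty: zero when the two representations are
// orthogonal (capturing different information), large when they overlap.
function orthogonalityPenalty(sharedVec, taskVec) {
  const d = dot(sharedVec, taskVec);
  return d * d;
}

// Orthogonal representations incur no penalty...
console.log(orthogonalityPenalty([1, 0], [0, 1])); // 0
// ...while overlapping ones are penalized.
console.log(orthogonalityPenalty([1, 1], [1, 0])); // 1
```

In the actual system this penalty is one term in the training loss, alongside the slot/intent accuracy reward and the adversarial skill-prediction penalty.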
On any given test, one or another of the multitask systems was consistently the best-performing, with improvements of up to 9% over baseline.&nbsp;</p> <p><em>Mengwen Liu is an applied scientist in the Alexa Search group.</em></p> <p><a href="" target="_blank"><strong>Paper</strong></a>: “Multi-Task Networks With Universe, Group, and Task Feature Learning”</p> <p><a href="" target="_blank"><strong>Alexa science</strong></a></p> <p><strong>Acknowledgments</strong>: Shiva Pentyala, Markus Dreyer</p> <p><strong>Related</strong>:</p> <ul> <li><a href="" target="_blank">Joint Training on Speech Signal Isolation and Speech Recognition Improves Performance</a></li> <li><a href="" target="_blank">With New Data Representation Scheme, Alexa Can Better Match Skills to Customer Requests</a></li> <li><a href="">Reducing Customer Friction through Skill Selection</a></li> </ul> /blogs/alexa/post/b4b33a98-c931-4129-b96a-b2034db2137c/who-s-on-first-how-alexa-is-learning-to-resolve-referring-terms Who’s on First? How Alexa Is Learning to Resolve Referring Terms Larry Hardesty 2019-08-07T13:00:00+00:00 2019-08-08T13:32:44+00:00 <p>Different Alexa services use different names for the same types of data, which makes it hard to track references across dialogues. By learning correlations between data types, a machine learning model can make better decisions about which references to track from one round of dialogue to the next.</p> <p><em><sup>Pushpendre Rastogi cowrote this post with Chetan Naik</sup></em></p> <p>This year, at the Association for Computational Linguistics’ <a href="" target="_blank">Workshop</a> on Natural-Language Processing for Conversational AI, my colleagues and I won one of two best-paper awards for <a href="" target="_blank">our work</a> on <em>slot carryover</em>.</p> <p>Slot carryover is a method for solving the reference resolution problem that arises in the context of conversations with AI systems. 
For instance, if an Alexa customer asks, “When is <em>Lion King</em> playing at the Bijou?” and then follows up with the question “Is there a good Mexican restaurant near there?”, Alexa needs to know that “there” refers to the Bijou.</p> <p>One of the things that makes reference resolution especially complicated for a large AI system like Alexa is that different Alexa services use different names — or <em>slots</em> — for the same data. A movie-finding service, for instance, might tag location data with the slot name Theater_Location, while a restaurant-finding service might use the slot name Landmark_Address. Over the course of a conversation, Alexa has to determine which slots used by one service should inherit data from which slots used by another.</p> <p><img alt="Long-range_slot_carryover.jpg" src="" style="display:block; height:499px; margin-left:auto; margin-right:auto; width:400px" /></p> <p style="text-align:center"><em><sub>In large AI systems like Alexa, tracking conversational references is difficult because different services (Weather, Directions) use different “slot names” (WeatherCity, Town) for the same data (San Francisco).</sub></em></p> <p>Last year at Interspeech, we presented a <a href="" target="_blank">machine learning system</a> that learned to carry over slots from previous turns of dialogue to the current turn. That system made independent judgments about whether to carry over each slot value from one turn to the next. Even though it significantly outperformed a rule-based baseline system, its independent decision-making was a limitation.</p> <p>In many Alexa services, slot values are highly correlated, and a strong likelihood of carrying over one slot value implies a strong likelihood of carrying over another. To take a simple example, some U.S. 
services have slots for both city and state; if one of those slots is carried over from one dialogue turn to another, it’s very likely that the other should be as well.</p> <p>Exploiting these correlations should improve the accuracy of a slot carryover system. The decision about whether to carry over a given slot value should reinforce decisions about carrying over its correlates, and vice versa. In our new work, which was spearheaded by <a href="">Tongfei Chen</a>, a Johns Hopkins graduate student who was interning with our group, we evaluate two different machine learning architectures designed to explicitly model such slot correlations. We find that both outperform the system we reported last year.</p> <p><a href="" target="_blank"><img alt="Ruhi_sidebar_bold.png" src="" style="float:right; height:727px; margin:10px; width:450px" /></a>The first architecture we used to model slot interdependencies was a pointer network based on the long short-term memory (LSTM) architecture. Both the inputs and the outputs of LSTMs are sequences of data. A network configuration that uses pairs of LSTMs is common in natural-language understanding (NLU), automatic speech recognition, and machine translation. The first LSTM — the encoder — produces a vector representation of the input sequence, and the second LSTM — the decoder — converts it back into a data sequence. In machine translation, for instance, the vector representation would capture the semantic content of a sentence, regardless of what language it’s expressed in.</p> <p>In our architecture, we used a bidirectional LSTM (bi-LSTM) encoder, which processes the input data sequence both forward and backward. 
The decoder was a pointer network, which outputs a subset of slots that should be carried over from previous turns of dialogue.</p> <p>The other architecture we considered uses the same encoder as the first one, but it replaces the pointer-generator decoder with a <em>self-attention</em> decoder based on the transformer architecture. The transformer has recently become very popular for large-scale natural-language processing because of its efficient training and high accuracy. Its self-attention mechanism enables it to learn what additional data to emphasize when deciding how to handle a given input.</p> <p>Our transformer-based network explicitly compares each input slot to all the other slots that have been identified in several preceding turns of dialogue, which are referred to as the dialogue context. During training, it learns which slot types from the context are most relevant when deciding what to do with any given type of input slot.</p> <p>We tested our system using two different data sets, one a standard public benchmark and the other an internal data set. Each dialogue in the data sets consisted of several turns, alternating between customer utterances and system responses. A single turn might involve values for several slots. On both datasets, the new architectures, which are capable of modeling slot interdependencies, outperformed the systems we published last year, which make decisions independently.</p> <p>Overall, the transformer system performed better than the pointer generator, but the pointer generator did exhibit some advantages in recognizing slot interdependencies across longer spans of dialogue. 
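The comparison the transformer decoder performs can be pictured with a toy sketch. The embeddings and names below are hypothetical, purely for illustration: each candidate slot from the dialogue context is scored against the current turn, and slots scoring above a threshold are carried over.

```javascript
// Dot product of two equal-length vectors.
function dot(u, v) {
  return u.reduce((sum, ui, i) => sum + ui * v[i], 0);
}

// Toy carryover decision (not the paper's model): keep context slots
// whose similarity to the current turn exceeds a threshold.
function carryOver(currentTurnVec, candidateSlots, threshold) {
  return candidateSlots
    .filter((slot) => dot(slot.embedding, currentTurnVec) > threshold)
    .map((slot) => slot.name);
}

// Hypothetical 3-d embeddings: the follow-up turn ("Is there a good
// Mexican restaurant near there?") should attract location-like slots.
const currentTurn = [0.9, 0.1, 0.0];
const context = [
  { name: "Theater_Location", embedding: [0.8, 0.2, 0.1] }, // location-like
  { name: "Movie_Title", embedding: [0.1, 0.9, 0.3] },      // not relevant
];
console.log(carryOver(currentTurn, context, 0.5)); // ["Theater_Location"]
```

The real decoder learns these similarities jointly with the encoders and attends over all slots in the dialogue context at once, rather than thresholding a single score.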
With the pointer generator architecture, we also found that ordering its slot inputs by turn improved performance relative to a random ordering, but further ordering slots <em>within</em> each turn lowered performance.&nbsp;</p> <p>Suppose, for instance, that a given turn of dialogue consisted of the customer instruction “Play ‘Misty’ by Erroll Garner”, which the NLU system interpreted as having two slots, Song_Name (“Misty”) and Artist_Name (“Erroll Garner”). The bi-LSTM fared better if we didn’t consistently follow the order imposed by the utterance (Song_Name first, then Artist_Name) but instead varied the order randomly. This may be because random variation helped the system generalize better to alternate phrasings of the same content (“Play the Erroll Garner song ‘Misty’”).</p> <p>Going forward, we will investigate further means of improving our slot carryover methodology, such as <a href="" target="_blank">transfer learning</a> and the addition of more data from the dialogue context, in order to improve Alexa’s ability to resolve references and deliver a better experience to our customers.</p> <p><em>Chetan Naik and Pushpendre Rastogi are applied scientists in the Alexa AI organization.</em></p> <p><a href="" target="_blank"><strong>Paper</strong></a>: “Improving Long Distance Slot Carryover in Spoken Dialogue Systems”</p> <p><a href="" target="_blank"><strong>Alexa science</strong></a></p> <p><strong>Acknowledgments</strong>: Tongfei Chen, Hua He, Lambert Mathias</p> <p><strong>Related</strong>:</p> <ul> <li><a href="" target="_blank">Teaching Alexa to Follow Conversations</a></li> <li><a href="" target="_blank">Amazon Unveils Novel Alexa Dialog Modeling for Natural, Cross-Skill Conversations</a></li> <li><a href="" target="_blank">Innovations from the 2018 Alexa Prize</a></li> <li><a href="" target="_blank">How Alexa Is Learning to Converse More Naturally</a></li> <li><a href="" target="_blank">3 Questions with Dilek Hakkani-T&uuml;r</a></li> </ul>