Alexa Blogs Alexa Developer Blogs /blogs/alexa/feed/entries/atom 2019-08-22T13:32:42+00:00 Apache Roller /blogs/alexa/post/dbf34a42-70a4-475c-a2af-e2d876c18371/neural-text-to-speech-makes-speech-synthesizers-much-more-versatile Neural Text-to-Speech Makes Speech Synthesizers Much More Versatile Larry Hardesty 2019-08-22T13:00:00+00:00 2019-08-22T13:32:42+00:00 <p>Two Interspeech papers report a system that transfers prosody — inflection and rhythm — from a recorded speaker to a synthesized voice and a neural vocoder that works with any speaker.</p> <p><sup><em>Viacheslav Klimkov cowrote this post with&nbsp;Jaime Lorenzo Trueba</em></sup></p> <p>A text-to-speech system, which converts written text into synthesized speech, is what allows Alexa to respond verbally to requests or commands. Through a service called <a href="" target="_blank">Amazon Polly</a>, text-to-speech is also a technology that Amazon Web Services offers to its customers.</p> <p>Last year, both Alexa and Polly evolved toward neural-network-based text-to-speech systems, which synthesize speech from scratch, rather than the earlier unit-selection method, which strung together tiny snippets of pre-recorded sounds.</p> <p>In user studies, people tend to find speech produced by neural text-to-speech (NTTS) systems more natural-sounding than speech produced by unit selection. But the real advantage of NTTS is its adaptability, something we <a href="" target="_blank">demonstrated</a> last year in our work on changing the speaking style (“newscaster” versus “neutral”) of an NTTS system.</p> <p>At this year’s Interspeech, two <a href="" target="_blank">new</a> <a href="" target="_blank">papers</a> from the Amazon Text-to-Speech group further demonstrate the adaptability of NTTS. One is on prosody transfer, or synthesizing speech that mimics the prosody — shifts in tempo, pitch, and volume — of a recording. 
In essence, prosody transfer lets you choose whose voice you will hear reading back recorded content, with all the original vocal inflections preserved.</p> <p>The other paper is on <em>universal vocoding</em>. An NTTS system outputs a series of spectrograms, snapshots of the energies in different audio frequency bands over short periods of time. But spectrograms don’t contain enough information to directly produce a natural-sounding speech signal. A vocoder is required to fill in the missing details.</p> <p>A typical neural vocoder is trained on data from a single speaker. But in our paper, we report a vocoder trained on data from 74 speakers in 17 languages. In our experiments, for any given speaker, the universal vocoder outperformed speaker-specific vocoders — even when it had never seen data from that particular speaker before.</p> <p>Our <a href="" target="_blank">first paper</a>, on prosody transfer, is titled “Fine-Grained Robust Prosody Transfer for Single-Speaker Neural Text-to-Speech”. Past attempts at prosody transfer have involved neural networks that take speaker-specific spectrograms and the corresponding text as input and output spectrograms that represent a different voice. But these tend not to adapt well to input voices that they haven’t heard before.</p> <p>We adopted several techniques to make our network more general, including not using raw spectrograms as input. Instead, our system uses prosodic features that are easier to normalize.</p> <p>First, our system aligns the speech signal with the text at the level of <em>phonemes</em>, the smallest units of speech. Then, for each phoneme, the system extracts prosodic features — such as changes in pitch or volume — from the spectrograms. 
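To make the idea concrete, per-phoneme feature extraction might be sketched as follows. This is an illustrative numpy-only sketch, not the paper's implementation; the frame-level log-F0 and energy tracks and the phoneme alignment are assumed to be given, and all names are hypothetical.

```python
import numpy as np

def phoneme_prosody_features(log_f0, energy, segments):
    """Summarize frame-level pitch and loudness tracks as one
    (mean log-F0, mean energy) pair per aligned phoneme segment."""
    feats = []
    for start, end in segments:
        feats.append([log_f0[start:end].mean(), energy[start:end].mean()])
    return np.array(feats)

# Toy frame-level tracks: rising pitch, falling energy over six frames.
log_f0 = np.array([4.0, 4.1, 4.2, 4.3, 4.4, 4.5])
energy = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4])
segments = [(0, 3), (3, 6)]  # two phonemes, three frames each

feats = phoneme_prosody_features(log_f0, energy, segments)
```

The result is one compact feature vector per phoneme rather than a full spectrogram, which is what makes the subsequent speaker normalization tractable.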
These features can be normalized, which makes them easy to apply to new voices.</p> <table align="center" border="0" cellpadding="1" cellspacing="1" style="width:650px"> <tbody> <tr> <td><em>“But Germany thinks she can manage it … ”</em></td> <td>&nbsp;<a href="">Original</a>&nbsp;</td> <td><a href="">Transferred</a></td> <td><a href="">Synthesized</a></td> </tr> <tr> <td><em>&quot;I knew of old its little ways ... &quot;</em></td> <td>&nbsp;<a href="">Original</a>&nbsp;</td> <td><a href="">Transferred</a></td> <td><a href="">Synthesized</a></td> </tr> <tr> <td><em>“Good old Harry … ”</em></td> <td>&nbsp;<a href="">Original</a>&nbsp;</td> <td><a href="">Transferred</a></td> <td><a href="">Synthesized</a></td> </tr> </tbody> </table> <p style="text-align:center"><sup><em>Three different versions of the same three text excerpts. &quot;Original&quot; denotes the original recording of the text by a live speaker. &quot;Transferred&quot; denotes a synthesized voice with prosody transferred from the original recording by our system. And &quot;Synthesized&quot; denotes the synthesis of the same excerpt from scratch, using existing Amazon TTS technology.</em></sup></p> <p>This approach works well when the system has a clean transcript to work with — as when, for instance, the input recording is a reading of a known text. But we also examine the case in which a clean transcript isn’t available.</p> <p>In that instance, we run the input speech through an automatic speech recognizer, like the one that Alexa uses to process customer requests. Speech recognizers begin by constructing multiple hypotheses about the sequences of phonemes that correspond to a given input signal, and they represent those hypotheses as probability distributions. Later, they use higher-level information about word sequence frequencies to decide between hypotheses.</p> <p>When we don’t have reliable source text, our system takes the speech recognizer’s low-level phoneme-sequence probabilities as inputs. 
This allows it to learn general correlations between phonemes and prosodic features, rather than trying to force acoustic information to align with transcriptions that may be inaccurate.</p> <p>In experiments, we find that the difference between the outputs of this textless prosody transfer system and a system trained using highly reliable transcripts is statistically insignificant.</p> <p style="text-align:center"><img alt="Prosody_transfer_architecture.jpg" src="" style="display:block; height:309px; margin-left:auto; margin-right:auto; width:650px" /><sup><em>The architecture of our prosody transfer system, both when speech transcripts are available (top left) and when they're not (top right). &quot;Posteriograms&quot; are sets of phonemic features predicted by an automatic speech recognition system.</em></sup></p> <p>Our <a href="" target="_blank">second paper</a> is titled “Towards Achieving Robust Universal Neural Vocoding”. In the past, researchers have used data from multiple speakers to train neural vocoders, but they didn’t expect their models to generalize to unfamiliar voices. Usually, the input to the model includes some indication of which speaker the voice belongs to.&nbsp;</p> <p>We investigated whether it is possible to train a universal vocoder to attain state-of-the-art quality on voices it hasn’t previously encountered. The first step: create a diverse enough set of training data that the vocoder can generalize. Our data set comprised about 2,000 utterances each from 52 female and 22 male speakers, in 17 languages.</p> <p>The next step: extensive testing of the resulting vocoder. 
We tested it on voices that it had heard before, voices that it hadn’t, topics that it had encountered before, topics that it hadn’t, languages that were familiar (such as English and Spanish), languages that weren’t (Amharic, Swahili, and Wolof), and a wide range of unusual speaking conditions, such as whispered or sung speech or speech with heavy background noise.</p> <p>We compared the output of our vocoder to that of four baselines: natural speech, speaker-specific vocoders, and two generalized vocoders trained on less diverse data sets (three and seven speakers, respectively). Five listeners scored every output utterance of each vocoder according to the multiple stimuli with hidden reference and anchor (MUSHRA) methodology. Across the board, our vocoder outperformed the three synthetic baselines and usually came very close to the scores for natural speech.</p> <p><em>Jaime Lorenzo Trueba and Viacheslav Klimkov are applied scientists&nbsp;in the Amazon Text-to-Speech Group.</em></p> <p><strong>Papers</strong>:&nbsp;</p> <p>“<a href="" target="_blank">Fine-Grained Robust Prosody Transfer for Single-Speaker Neural Text-to-Speech</a>”<br /> “<a href="" target="_blank">Towards Achieving Robust Universal Neural Vocoding</a>”</p> <p><a href="" target="_blank"><strong>Alexa science</strong></a></p> <p><strong>Acknowledgments</strong>: Thomas Drugman, Srikanth Ronanki, Jonas Rohnke, Javier Latorre, Thomas Merritt, Bartosz Putrycz, Roberto Barra-Chicote, Alexis Moinet, Vatsal Aggarwal</p> <p><strong>Related</strong>:</p> <ul> <li><a href="" target="_blank">Should Alexa Read “2/3” as “Two-Thirds” or “February Third”?: The Science of Text Normalization</a></li> <li><a href="" target="_blank">Training Speech Synthesizers on Data from Multiple Speakers Improves Performance, Stability</a></li> <li><a href="" target="_blank">Varying Speaking Styles with Neural Text-to-Speech</a></li> </ul> 
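For readers unfamiliar with the evaluation method mentioned above: in a MUSHRA test, each listener rates every system's rendering of the same utterance on a 0-100 scale alongside a hidden natural-speech reference, and aggregation reduces to averaging scores per system. A toy illustration with fabricated scores (not the paper's data; system names are placeholders):

```python
# Fabricated MUSHRA-style ratings: listener -> system -> score (0-100).
scores = {
    "listener_1": {"natural": 98, "universal": 85, "speaker_specific": 74},
    "listener_2": {"natural": 95, "universal": 88, "speaker_specific": 70},
    "listener_3": {"natural": 99, "universal": 82, "speaker_specific": 77},
}

def mean_scores(scores):
    """Average each system's ratings across all listeners."""
    systems = {}
    for ratings in scores.values():
        for system, value in ratings.items():
            systems.setdefault(system, []).append(value)
    return {s: sum(v) / len(v) for s, v in systems.items()}

means = mean_scores(scores)
# The hidden reference (natural speech) is expected to score highest.
```

In the paper's results, the universal vocoder's average sat above the other synthetic systems and close to the natural-speech reference, which is the pattern the toy numbers above mimic.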
/blogs/alexa/post/11d398df-69f1-46ee-ad6e-ec716c9b7eca/tipps-f%C3%BCr-die-zertifizierung-von-kommerziellen-alexa-skills Tips for Certifying Monetized Alexa Skills Kristin Fritsche 2019-08-22T08:00:00+00:00 2019-08-22T09:28:38+00:00 <p><img alt="Tips for Certifying Monetized Alexa Skills" src="" style="height:480px; width:1908px" /></p> <p>Every skill published in the Alexa Skills Store must first pass a <a href=";sc_category=Owned&amp;sc_channel=WB&amp;sc_campaign=DEISP&amp;sc_publisher=BL&amp;sc_content=Content&amp;sc_funnel=Publish&amp;sc_country=DE&amp;sc_medium=Owned_WB_DEISP_BL_Content_Publish_DE_DEDevs&amp;sc_segment=DEDevs" target="_blank">certification process</a>. Certified skills meet the requirements for security, privacy, certification policy, functionality, and correct design of the voice interface.</p> <p>We care about more than just the user experience, though: we also want to make skill development as easy and pleasant as possible for you as a developer.<br /> <br /> With <a href=";sc_category=Owned&amp;sc_channel=WB&amp;sc_campaign=DEISP&amp;sc_publisher=BL&amp;sc_content=Content&amp;sc_funnel=Publish&amp;sc_country=DE&amp;sc_medium=Owned_WB_DEISP_BL_Content_Publish_DE_DEDevs&amp;sc_segment=DEDevs">in-skill purchasing</a> (ISP), you can sell premium content, such as game features or interactive stories, in skills with a custom interaction model. In-skill purchases use the payment options already linked to the user's Amazon account. Buying in-skill products should be simple and convenient for customers. Users may ask for in-skill products on their own or respond to the purchase options you present while they interact with your skill.</p> <p>In this blog post, we explain some common and easily avoided mistakes related to skill certification. Watching out for them up front will make the certification process easier for you.</p> <h2>1. 
Mention premium content only in language models that support in-skill purchasing</h2> <p>Currently, <a href=";sc_category=Owned&amp;sc_channel=WB&amp;sc_campaign=DEISP&amp;sc_publisher=BL&amp;sc_content=Content&amp;sc_funnel=Publish&amp;sc_country=DE&amp;sc_medium=Owned_WB_DEISP_BL_Content_Publish_DE_DEDevs&amp;sc_segment=DEDevs" target="_blank">in-skill purchasing</a> is available only for Alexa skills in <a href=";sc_category=Owned&amp;sc_channel=WB&amp;sc_campaign=DEISP&amp;sc_publisher=BL&amp;sc_content=Content&amp;sc_funnel=Publish&amp;sc_country=DE&amp;sc_medium=Owned_WB_DEISP_BL_Content_Publish_DE_DEDevs&amp;sc_segment=DEDevs" target="_blank">selected languages</a>. If your skill also supports languages in which in-skill purchasing is not available, make sure you do not reference premium content in those languages. For example, your greeting, skill interactions, and help messages in the unsupported languages must not contain any references to premium content. For tips on adding in-skill products in supported languages, see <a href=";sc_category=Owned&amp;sc_channel=WB&amp;sc_campaign=DEISP&amp;sc_publisher=BL&amp;sc_content=Content&amp;sc_funnel=Publish&amp;sc_country=DE&amp;sc_medium=Owned_WB_DEISP_BL_Content_Publish_DE_DEDevs&amp;sc_segment=DEDevs" target="_blank">Choose Pricing, Languages, and Distribution for In-Skill Products</a> and <a href=";sc_category=Owned&amp;sc_channel=WB&amp;sc_campaign=DEISP&amp;sc_publisher=BL&amp;sc_content=Content&amp;sc_funnel=Publish&amp;sc_country=DE&amp;sc_medium=Owned_WB_DEISP_BL_Content_Publish_DE_DEDevs&amp;sc_segment=DEDevs" target="_blank">Use In-Skill Product Service APIs</a>. 
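One straightforward way to honor this rule is to gate every premium-content prompt on the request's locale. This is an illustrative sketch only: the skill name is hypothetical, and the locale allow-list is an assumption you would keep in sync with the officially supported ISP locales.

```python
# Hypothetical allow-list of locales in which this skill sells in-skill
# products; keep it in sync with the officially supported ISP locales.
ISP_LOCALES = {"en-US", "en-GB", "de-DE"}

def welcome_message(locale):
    """Mention premium content only where in-skill purchasing is offered."""
    base = "Welcome to Happy Trivia!"  # hypothetical skill name
    if locale in ISP_LOCALES:
        return base + " Ask about our premium question packs anytime."
    return base
```

Applying the same check to help messages and in-skill prompts keeps every unsupported locale free of premium-content references.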
It is also important that skills with in-skill purchasing never include <a href=";sc_category=Owned&amp;sc_channel=WB&amp;sc_campaign=DEISP&amp;sc_publisher=BL&amp;sc_content=Content&amp;sc_funnel=Publish&amp;sc_country=DE&amp;sc_medium=Owned_WB_DEISP_BL_Content_Publish_DE_DEDevs&amp;sc_segment=DEDevs" target="_blank">purchase prompts or purchase information in the skill description, the example phrases, the &quot;What's New&quot; section, or the account linking page</a>.</p> <h2>2. Let users cancel purchases or request refunds</h2> <p>All monetized Alexa skills must let users cancel their purchases or get a refund. To do this, create a custom intent that handles refund/cancellation requests, and add code that processes this intent and starts the cancellation flow by sending a directive. The <a href=";sc_category=Owned&amp;sc_channel=WB&amp;sc_campaign=DEISP&amp;sc_publisher=BL&amp;sc_content=Content&amp;sc_funnel=Publish&amp;sc_country=DE&amp;sc_medium=Owned_WB_DEISP_BL_Content_Publish_DE_DEDevs&amp;sc_segment=DEDevs" target="_blank">Handle a refund or cancel request</a> section of the documentation explains step by step how to implement this feature.</p> <h2>3. Offer effective upsells for every in-skill product</h2> <p>Offer your products proactively and in a way that matches the user's interaction with your skill. To do this, check whether the user already owns a product from the stored list, then hand the upsell product, along with a suitable message, over to Amazon's purchase flow. For upsells, add code that starts the purchase flow with a directive. 
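Starting the purchase flow with a directive means returning a "Connections.SendRequest" directive named "Upsell" from your handler. The directive shape below follows the in-skill purchasing documentation; the product ID, message, and token are placeholders.

```python
def build_upsell_directive(product_id, upsell_message, token):
    """Connections.SendRequest directive that hands control to Alexa's
    purchase flow for an upsell. Alexa speaks the price and confirmation
    itself, so the upsell message must not repeat them."""
    return {
        "type": "Connections.SendRequest",
        "name": "Upsell",
        "payload": {
            "InSkillProduct": {"productId": product_id},
            "upsellMessage": upsell_message,
        },
        "token": token,
    }

# Placeholder product ID and correlation token for illustration.
upsell = build_upsell_directive(
    "amzn1.adg.product.example",
    "The premium pack has fifty more puzzles. Want to learn more?",
    "correlationToken",
)
```

Your skill receives the outcome later as a "Connections.Response" request carrying the same token, which is why the token should identify where the user left off.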
Resources explaining how to implement upsells can be found in the <a href=";sc_category=Owned&amp;sc_channel=WB&amp;sc_campaign=DEISP&amp;sc_publisher=BL&amp;sc_content=Content&amp;sc_funnel=Publish&amp;sc_country=DE&amp;sc_medium=Owned_WB_DEISP_BL_Content_Publish_DE_DEDevs&amp;sc_segment=DEDevs" target="_blank">Offer Purchase Suggestions</a> and <a href=";sc_category=Owned&amp;sc_channel=WB&amp;sc_campaign=DEISP&amp;sc_publisher=BL&amp;sc_content=Content&amp;sc_funnel=Publish&amp;sc_country=DE&amp;sc_medium=Owned_WB_DEISP_BL_Content_Publish_DE_DEDevs&amp;sc_segment=DEDevs" target="_blank">Help Customers find your in-skill products</a> sections of the documentation. The upsell message must not contain prices or purchase details, because Alexa already provides these during the purchase flow. Ideally, also build <a href=";sc_category=Owned&amp;sc_channel=WB&amp;sc_campaign=DEISP&amp;sc_publisher=BL&amp;sc_content=Content&amp;sc_funnel=Publish&amp;sc_country=DE&amp;sc_medium=Owned_WB_DEISP_BL_Content_Publish_DE_DEDevs&amp;sc_segment=DEDevs" target="_blank">a reminder</a> into your skill interaction to motivate users to check out the premium content.</p> <h2>4. Add direct support for purchase requests</h2> <p>Users should be able to buy in-skill products directly, without the detour of an upsell during the skill interaction. To do this, create a custom intent that handles purchase requests, and add the corresponding code that processes this intent and starts the purchase flow by sending a directive. 
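As with upsells, a direct purchase hands off to Alexa via a "Connections.SendRequest" directive, this time named "Buy" and without an upsell message. The shape follows the in-skill purchasing documentation; the IDs are placeholders.

```python
def build_buy_directive(product_id, token):
    """Connections.SendRequest directive that starts Alexa's purchase
    flow for a direct purchase request."""
    return {
        "type": "Connections.SendRequest",
        "name": "Buy",
        "payload": {"InSkillProduct": {"productId": product_id}},
        "token": token,
    }

buy = build_buy_directive("amzn1.adg.product.example", "correlationToken")
```

Because Alexa handles price, confirmation, and payment itself, the skill's job is only to resolve which product the user asked for and return this directive.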
You can find more information on implementing direct purchase requests in the <a href=";sc_category=Owned&amp;sc_channel=WB&amp;sc_campaign=DEISP&amp;sc_publisher=BL&amp;sc_content=Content&amp;sc_funnel=Publish&amp;sc_country=DE&amp;sc_medium=Owned_WB_DEISP_BL_Content_Publish_DE_DEDevs&amp;sc_segment=DEDevs" target="_blank">Add support for purchase requests</a> section of the documentation.</p> <h2>5. Lead users back to your skill content after the purchase flow</h2> <p>Whether the user buys a one-time product or a consumable, declines to buy, or takes out a subscription, you must guide them &quot;gracefully&quot; back to your skill afterwards. If the user buys the product or subscribes, it is essential that they get access to the premium content immediately after the purchase. You should also prepare an alternative path in case the user decides against the purchase. You can find more information under <a href=";sc_category=Owned&amp;sc_channel=WB&amp;sc_campaign=DEISP&amp;sc_publisher=BL&amp;sc_content=Content&amp;sc_funnel=Publish&amp;sc_country=DE&amp;sc_medium=Owned_WB_DEISP_BL_Content_Publish_DE_DEDevs&amp;sc_segment=DEDevs" target="_blank">Handling the post-purchase flow</a>.</p> <h2>Help and support</h2> <p>If you ever run into problems or something is unclear, we are happy to help! Our skill certification feedback email summarizes all the issues we found, with step-by-step instructions for reproducing each issue (where necessary) and troubleshooting steps, so you can get your skill ready for certification. 
If you feel that we misunderstood your skill's implementation, please note this in the <a href=";sc_category=Owned&amp;sc_channel=WB&amp;sc_campaign=DEISP&amp;sc_publisher=BL&amp;sc_content=Content&amp;sc_funnel=Publish&amp;sc_country=DE&amp;sc_medium=Owned_WB_DEISP_BL_Content_Publish_DE_DEDevs&amp;sc_segment=DEDevs" target="_blank">&quot;Testing Instructions&quot; field</a> in the developer console.<br /> <br /> If you get stuck while building your skill or have questions, you can ask them in our <a href=";sc_category=Owned&amp;sc_channel=WB&amp;sc_campaign=DEISP&amp;sc_publisher=BL&amp;sc_content=Content&amp;sc_funnel=Publish&amp;sc_country=DE&amp;sc_medium=Owned_WB_DEISP_BL_Content_Publish_DE_DEDevs&amp;sc_segment=DEDevs" target="_blank">developer forum</a> and exchange ideas with other Alexa skill developers, or contact us via our <a href="" target="_blank">contact form</a>.</p> <h2>Resources</h2> <ul> <li>Blog: <a href=";sc_category=Owned&amp;sc_channel=WB&amp;sc_campaign=DEISP&amp;sc_publisher=BL&amp;sc_content=Content&amp;sc_funnel=Publish&amp;sc_country=DE&amp;sc_medium=Owned_WB_DEISP_BL_Content_Publish_DE_DEDevs&amp;sc_segment=DEDevs">Build a Monetized Fact Skill with the Premium Facts Sample Skill</a></li> <li>Blog: <a href=";sc_category=Owned&amp;sc_channel=WB&amp;sc_campaign=DEISP&amp;sc_publisher=BL&amp;sc_content=Content&amp;sc_funnel=Publish&amp;sc_country=DE&amp;sc_medium=Owned_WB_DEISP_BL_Content_Publish_DE_DEDevs&amp;sc_segment=DEDevs">Code Deep Dive: Implementing In-Skill Purchasing for Entitlements with Node.js</a></li> <li>Blog: <a href=";sc_category=Owned&amp;sc_channel=WB&amp;sc_campaign=DEISP&amp;sc_publisher=BL&amp;sc_content=Content&amp;sc_funnel=Publish&amp;sc_country=DE&amp;sc_medium=Owned_WB_DEISP_BL_Content_Publish_DE_DEDevs&amp;sc_segment=DEDevs">Best Practices for Building an Effective Monetized Skill That Is Eligible for Amazon Promotion</a></li> <li><a 
href=";sc_category=Owned&amp;sc_channel=WB&amp;sc_campaign=DEISP&amp;sc_publisher=BL&amp;sc_content=Content&amp;sc_funnel=Publish&amp;sc_country=DE&amp;sc_medium=Owned_WB_DEISP_BL_Content_Publish_DE_DEDevs&amp;sc_segment=DEDevs" target="_blank">Documentation: ISP Overview</a></li> <li><a href=";sc_category=Owned&amp;sc_channel=WB&amp;sc_campaign=DEISP&amp;sc_publisher=BL&amp;sc_content=Content&amp;sc_funnel=Publish&amp;sc_country=DE&amp;sc_medium=Owned_WB_DEISP_BL_Content_Publish_DE_DEDevs&amp;sc_segment=DEDevs" target="_blank">Guide: Making Money with Alexa Skills</a></li> </ul> /blogs/alexa/post/edc40009-1eda-4ffa-b029-ae6d38c848eb/ben-ursu-spins-the-fork-on-the-road-and-wins-bonus-prize-in-alexa-skills-challenge-multimodal Ben Ursu Spins the Fork On The Road and Wins Bonus Prize in Alexa Skills Challenge: Multimodal June Lee 2019-08-21T14:00:00+00:00 2019-08-21T14:00:00+00:00 <p><img alt="" src="" /></p> <p>For years, professional developer Ben Ursu has built technology that brings immersive visual and augmented reality experiences to life. But Ursu was curious how he might integrate such powerful visual interfaces with another engaging technology: voice. When Amazon announced the <a href="">Alexa Skills Challenge: Multimodal</a>, he knew it was the perfect opportunity to build his first Alexa skill. By combining an engaging voice-first experience with three-dimensional graphics, Ursu created <a href="">Fork On The Road</a>, a skill that helps the user choose among multiple options. 
His efforts won him the Bonus Prize for Best Multimodal Kitchen Experience and a total of $8,000 cash.</p> <p>“A big driver for me—besides the contest itself—was the opportunity to build a skill connecting different types of applications, features, and content,” said Ursu. “The Alexa Skills Challenge: Multimodal was the perfect opportunity to work on it.”</p> <p>Working for an agency that brings innovations to the table for big brands, Ursu is used to visual technologies, but found voice intriguing. When he dug into the <a href="">Alexa Presentation Language</a> (APL), he knew he’d found the way to marry his love of visual effects with the opportunity of voice. APL allows developers to add visual and touch elements to make their skills more delightful and engaging for customers with Alexa-enabled devices with screens of different sizes and shapes, such as Echo Spot, Echo Show, and Fire TV. The multimodal challenge gave him a reason to explore and create his first Alexa skill, and it has opened up opportunities he hadn’t previously imagined with either voice or visual interfaces alone.</p> <p>“Winning in the Alexa Skills Challenge showed me the tremendous opportunity of combining voice with complex visual experiences,” said Ursu. “Looking at features alone, multimodal puts you in a different category. Alexa already allows developers to do more than other voice technologies, but with APL, you have the opportunity to create voice-first skills that are visually stunning.”</p> <h2>Alexa Introduces a Visual Technologist to the World of Voice<strong> </strong></h2> <p>Ursu has been building user experiences with interactivity and animation for more than three decades. Starting in the 1990s with website development, he graduated to building visual application interfaces, 3D experiences for the web, and—most recently—to virtual and augmented reality. 
As one of the creators of <a href="">Spark AR Studio</a>, Ursu has built software that lets anyone create augmented reality effects in minutes, without writing any code. Today, he’s expanding his personal experience and fostering his team’s abilities and opportunities, especially in voice skills with visual interfaces.</p> <p>Ursu’s curiosity about voice user interfaces began when Amazon introduced Echo Show, its first Alexa-enabled device with a screen. When APL came along, that curiosity grew. By the time the Alexa Skills Challenge: Multimodal was announced, Ursu knew it was time to get serious. He dove into AWS, networking, and the APIs available in the <a href="">Alexa Skills Kit</a>, and was convinced that he could add Alexa development techniques to his visual effects experience.</p> <p>“It was the combination of many technologies that really allowed me to make Fork On The Road the way I did,” said Ursu. “Being able to piece together several different technologies like that, and understand how the underlying AWS structure works, allowed me to flex a bit and learn new things while creating this skill.”</p> <h2>Family Movie Night Inspires a Winning Voice-First Multimodal Skill</h2> <p>Inspiration for Fork On The Road struck Ursu when trying to solve an age-old family dilemma: What should we watch on TV tonight? He asked Alexa to help by flipping a coin, but found he often needed to decide between more than two options. That’s when Ursu had the idea to make a multimodal decision-making skill.</p> <p>“I'm always looking towards real-life scenarios and problems to solve,” said Ursu. “I already let artificial intelligence help me make many important decisions in my life, so this was the perfect inspiration for my first Alexa skill. 
That’s how Fork On The Road was born.”</p> <p>Because the objective of the multimodal challenge was to create a voice-first—but not voice-only—skill, Ursu could call on his visual development skills and bring his skill to life with dynamic 3D images and animation. Employing simple design with elegant execution, Fork On The Road prompts the user to name up to four different items from which they want to choose, which the skill displays at a “crossroads” on the screen. Alexa then prompts the user to “spin the fork,” displaying a 3D image of a fork that spins until it comes to rest on one of the options, making the decision for the user.</p> <p>Ursu used APL’s capabilities to incorporate multiple technologies to render the 3D scene for the skill, making many functions work together in a cohesive visual experience that appeals to a wide audience.</p> <p>“A skill like Fork On The Road appeals to a wide, growing audience,” said Ursu. “From two-year-olds to grandmothers, people of all ages and backgrounds use Alexa as part of their daily lives.”</p> <h2>A Visual Developer Looks to Voice to Create Even More Engaging Experiences</h2> <p>Fork On The Road may have been Ursu’s first experiment in developing for voice, but it certainly will not be the last. After his win in the Alexa Skills Challenge: Multimodal, he’s more excited than ever by the opportunities for voice developers. With the ability to combine voice with complex visual experiences, Ursu intends to bring these elements together again in future projects for both his clients and himself. In developing Fork On The Road, Ursu found that the key to a rich multimodal experience is to develop the voice-first experience. He added the visual elements only after he had an engaging voice-first skill. He developed the visuals for a small screen first, like the Echo Show, and worked his way up in size to the Fire TV. 
That way, his skill can reach the broadest audience without relying on one particular Alexa device.</p> <p>“I’ve always focused on visual front ends, but building Fork On The Road was so interesting that now I want to build more Alexa skills,” said Ursu. “With Alexa you have the ability to reach many different people and personalize the experience with both voice and visuals. The way I see it, by coupling voice interaction with great visuals, we can build richer, more engaging experiences for our customers.”</p> <h2>Related Resources</h2> <p>Check out the APL resources below and get started with building your own multimodal skills today.</p> <ul> <li><a href="">Alexa Presentation Language Technical Documentation</a></li> <li><a href="">10 Tips for Designing Alexa Skills with Visual Responses</a></li> <li><a href="">4 Tips for Designing Voice-First Alexa Skills for Different Alexa-Enabled Devices</a></li> <li><a href="">How to Design Visual Components for Voice-First Alexa Skills</a></li> <li><a href="">How to Get Started with the Alexa Presentation Language to Build Multimodal Alexa Skills</a></li> </ul> /blogs/alexa/post/94816656-5510-4128-a775-91dcd6af2f4d/how-to-reduce-cognitive-load-for-voice-design How to Reduce Cognitive Load for Voice Design Emma Martensson 2019-08-19T14:00:00+00:00 2019-08-19T14:00:00+00:00 <p><img alt="How to Reduce Cognitive Load for Voice Design" src="" /></p> <p><em>Today’s guest blog post is from <a href="" target="_blank">Maria Spyropoulou</a>, Speech Systems Analyst at <a href="" target="_blank">Eckoh</a>. 
Maria is helping to design the dialogue flow and prompts of their services, and is also heavily involved in intent creation and classification, among other tasks. </em></p> <p><a href="">Voice User Interfaces (VUIs)</a> take up more mental resources than Graphical User Interfaces (GUIs), because information is auditory and presented serially, whereas in a GUI information is visual and presented at once. Voice browsing is a lot more complex than web browsing. For this reason, when you are building your Alexa skill, you have to design in a way that reduces cognitive load as much as possible.</p> <p>Following <a href=";qid=1563979551&amp;s=gateway&amp;sr=8-1" target="_blank">John Sweller’s cognitive load theory</a>, we can distinguish three types of cognitive load: Intrinsic Load, Extraneous Load, and Germane Load. Intrinsic Load is the inherent load of every concept (2 + 2 is objectively easier to process than 2^78 - 456). Extraneous Load relates to efficient presentation of information (if you want to explain to someone the idea of a circle, drawing a circle is a much more efficient presentation of the concept than describing in words what a circle is). Germane Load relates to the mental models of our mind. All people have been having conversations since they were children, and they have a mental model of what a conversation is supposed to be like (there is a start, a middle, and a finish, for example). Moreover, many people have been using Alexa skills and have a mental model of what interacting with a voice service should be like. VUI designers can’t reduce the Intrinsic Load, but they can reduce Extraneous Load and Germane Load.</p> <h2>Tips to Reduce Extraneous Load and Germane Load</h2> <h3>1. Using Just-In-Time Commands</h3> <p>You should aim to present the available options to the user only in time of need and not a second sooner, and only when it makes sense for the structure of your application. 
For example, you wouldn’t say:</p> <p><em>“Welcome to Happy Groceries. You can add items to your basket, listen to your previous orders, submit a claim and checkout.”</em></p> <p>It doesn’t make sense for someone to ‘checkout’ unless they have added products to their basket. Only when the user has added groceries to the basket should you present the option to checkout:</p> <p><em>“If you’d like me to go ahead and purchase the items in your basket, say ‘checkout’.”</em></p> <h3>2. Using Sound Effects as Metaphors</h3> <p>Sound effects are a great way to set context and create atmosphere. Sound effects can be used as metaphors for concepts in order to help the user visualize and conceptualize the application structure and feel. Sound effects are great for games, but can be used for all sorts of things. For example, if you have an application that can book plane tickets, you can use a sound effect in place of speech as a progressive response before your full response:</p> <p><em>“Book me a flight for Barcelona this Saturday”</em></p> <p><em>(Jet taking off sound while you’re waiting on your API connection data)</em></p> <p><em>“I have 2 flights this Saturday for Barcelona.”</em></p> <p>This fills the waiting time in a more creative way than speech (for example <em>“getting your results…”</em>) and also signifies that the skill is looking for flights or preparing flight information. Keep in mind that audio in progressive responses is limited to <a href="" target="_blank">30 seconds</a>. You can find sound effects <a href="" target="_blank">here</a> and read about them <a href="" target="_blank">here</a>.</p> <h3>3. Using Universals</h3> <p>Your skill should implement intents to catch universals as defined by the TSSC 2000 (Telephone Speech Standards Committee) and ETSI 2002 (European Telecommunications Standards Institute) standards. Some common ones would be <em>help</em>, <em>repeat</em>, <em>stop</em>, <em>go back</em>, <em>main menu</em>, <em>goodbye. 
</em>For example, if you have built a navigation game, you can create a contextual help message to help the user get unstuck.</p> <p><em>“You can return to the forest or unlock the barn. What would you like to do?”</em></p> <p><em>“Help”</em></p> <p><em>“Your goal in this game is to find the hidden treasure. You have collected one key so far. Would you like to return to the forest or unlock the barn?”</em></p> <p>You can leverage the built-in <em>AMAZON.HelpIntent</em> handler and configure the back-end in an appropriate way, depending on what you want the outcome to be. More information on universals can be found <a href="" target="_blank">here</a>.</p> <h3>4. Using Discourse Markers</h3> <p>The VUI should resemble an everyday conversation as much as possible, and this includes using conversation-management markers, also called discourse markers. For example, if your skill fails to access the user’s settings, instead of playing this prompt:</p> <p><em>“Unable to access account”</em></p> <p>You should use more natural language with plenty of discourse markers to signify that something went wrong, and that you are also presenting the reason for the failure:</p> <p><em>“I’m sorry, but due to technical reasons I can’t access your account right now.”</em></p> <p>Discourse markers reduce cognitive load: they introduce a new topic (<em>by the way..</em>), signal that something has gone wrong (<em>sorry, due to..</em>), give feedback that the user and the system are on the same page (<em>thanks, okay, great, so this is what you requested..</em>), or announce that serial information will follow (<em>first, second, third, here are your options..</em>). You can find more information <a href="" target="_blank">here</a>.</p> <h2>Tips for Alexa Presentation Language (APL)</h2> <p>Remember that information processing for voice is different from visual processing. 
In voice you should put the focal information at the very end of the prompt, as this will reduce the cognitive and memory load for the user. When you are designing with APL, keep in mind that with visual interfaces the focal information goes first, at the very top of the screen.</p> <h2>Tips for In-Skill Purchasing (ISP)</h2> <p>Something you should avoid is offering a premium version of the skill as soon as the user has opened the skill for the first time; this can confuse and annoy customers. Ideally you could suggest premium features at the end of the first session or at the beginning of subsequent sessions. You can use persistent attributes to keep track of first-time and returning users and configure your code accordingly.</p> <h2>Related Content:</h2> <ul> <li><a href="">Guiding Users with Successful Alexa Prompts for Custom Slots</a></li> <li><a href="">Writing Great Prompts for Built-in Slots in Alexa Skills</a></li> <li><a href="">Build a Strong Language Model to Get the Most Out of Dynamic Entities</a></li> <li><a href="">How to Write Engaging Dialogs for Alexa Skills</a></li> <li><a href="">About In-Skill Purchasing</a></li> <li><a href="" target="_blank">About Alexa Presentation Language </a></li> </ul> /blogs/alexa/post/ca2cfbfb-37a2-49de-840c-f06f6ad8b74d/introducing-custom-interfaces-enabling-developers-to-build-dynamic-gadgets-games-and-smart-toys-with-alexa Introducing Custom Interfaces, Enabling Developers to Build Dynamic Gadgets, Games, and Smart Toys with Alexa Karen Yue 2019-08-15T17:39:52+00:00 2019-08-15T18:55:55+00:00 <p><a href="" target="_blank"><img alt="Alexa Smart Toys" src="" style="height:480px; width:1908px" /></a></p> <p>Today, we are excited to introduce new developer tools that enable you&nbsp;to connect gadgets, games, and smart toy products with immersive skill-based content—unlocking creative ways for customers to experience your product. 
This is made possible using Custom Interfaces.</p> <p><a href="" target="_blank"><img alt="Alexa Smart Toys" src="" /></a></p> <p>Since launching Alexa more than four years ago, customers have purchased more than 100 million Alexa-enabled devices, allowing them to interact with products in new and&nbsp;engaging ways. Today, we are excited to introduce new developer tools that enable you&nbsp;to connect <a href="" target="_blank">gadgets, games, and smart toy products</a> with immersive skill-based content—unlocking creative ways for customers to experience your product. This is made possible using Custom Interfaces, the newest feature available&nbsp;in the <a href="" target="_blank">Alexa Gadgets Toolkit</a>.</p> <h2>Explore the Fun Side of Alexa: Gadgets, Games, and Smart Toys</h2> <p>Gadgets, games, and smart toys come in all shapes and sizes, and for all ages. With Custom Interfaces, you can design dynamic interactions with Alexa that span multiple product categories, from board games and action figures to gizmos and novelties. 
For example, imagine a basketball hoop for your office that lights up the scoreboard when you say “Alexa, tell Basketball Hoop to start a game,” and triggers Alexa’s response when you score.</p> <p><a href="" target="_blank"><img alt="" src="" style="float:right; height:202px; padding-left:10px; width:500px" /></a>These ideas are also possible with Custom Interfaces:</p> <ul> <li>A mini keyboard that turns Alexa into a piano teacher, lighting up keys that correspond to a given song and providing feedback on whether you have pressed the right sequence of keys.</li> <li>An indoor drone that flies when you say “Alexa, tell my drone to fly in a figure 8,”&nbsp;and triggers Alexa to play a tune upon landing.</li> <li>A game printer that creates a game sheet when you say, “Alexa, tell Game Printer to give me a Sudoku puzzle.”</li> <li>A dog toy that counts how many times your dog plays fetch and lights up green when a 20-minute session has concluded.</li> </ul> <h2>The Benefits of Custom Interfaces</h2> <p>With Custom Interfaces, you can build products that can be updated with new functionality and refreshed content to enhance the overall interactive experience. You can also offer premium product features that can be unlocked through in-skill purchasing. 
Custom Interfaces support the following:</p> <ul> <li><strong>Direct Communication</strong>: Facilitate connection and communication between your product and Alexa, removing the burden of creating a device cloud and customer account management infrastructure.</li> <li><strong>Dynamic Voice Interactions</strong>: Design robust voice interactions for your product to create extended, story-driven experiences for your customers.</li> <li><strong>Adapts to Your Product</strong>: Get support for a wide range of capabilities, regardless of what you are trying to build.</li> </ul> <h2><a href="" target="_blank"><img alt="diagram" src="" style="float:right; height:268px; margin-bottom:-10px; padding-left:10px; width:500px" /></a>How It Works: The Role of an Alexa Skill</h2> <p>To unlock these features and enable Alexa to interact with the unique capabilities of your product, you will need to create a compatible Alexa skill. The custom interaction is achieved through the Custom Interface Controller, a skill API that exchanges messages with your product over the course of a given skill session, allowing you to design voice experiences that are tailored to your product’s functionality.</p> <p>Messages sent from your skill to your product, or <em>directives</em>, can be configured to activate a range of reactions from your product through motors, sound chips, lights, and more. You can trigger directives in response to game behavior, alongside specific moments in storytelling, or, in its simplest form, in response to an explicit command from your customers.</p> <p>Messages sent from your product to a skill, or <em>events</em>, can be triggered by customers engaging directly with your product, whether by activating a button, triggering an accelerometer, or achieving a specific sequence of events. Events can also be triggered by the state of your product. 
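These directives and events travel as JSON payloads. The following sketch shows their general shape; the field layout follows the Alexa Gadgets Toolkit documentation as we understand it, but the `Custom.MyGadget` namespace, the `Blink` directive, the endpoint id, and the payload contents are all illustrative assumptions:

```python
# Illustrative sketch of Custom Interface messages; ids and names are placeholders.
send_directive = {
    "type": "CustomInterfaceController.SendDirective",
    "header": {
        "namespace": "Custom.MyGadget",  # custom interfaces live under a "Custom.*" namespace
        "name": "Blink",                 # hypothetical directive your gadget understands
    },
    "endpoint": {"endpointId": "amzn1.ask.endpoint.EXAMPLE"},
    "payload": {"color": "green", "intervalMs": 500},
}

# To receive events, the skill opens an event handler for up to 90 seconds.
start_event_handler = {
    "type": "CustomInterfaceController.StartEventHandler",
    "token": "fetch-session-1",
    "expiration": {"durationInMilliseconds": 90000},  # 90-second maximum
}
```

Your gadget firmware decides what to do with the `payload`; Alexa only relays it.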
When the skill is in session, you will need to ensure that there is an active input handler to listen for an event. You can determine how long to listen for an event — up to 90 seconds — and filter the specific events that you want your skill to receive.</p> <h2>Build for Younger Audiences (Now in Private Beta)</h2> <p>With the help of Custom Interfaces, we are unlocking additional opportunities for developers to create playful, educational, and interactive gadgets, games, and smart toys for younger audiences. From kids’ role play and action figures to building and learning smart toys, you can create unique, story-rich interactions with characters that kids already know and love. For example, imagine a teddy bear that reacts to an audio story provided through a companion Alexa kid skill.</p> <p>All products targeted to kids under the age of 13 must have an accompanying <a href="" target="_blank">kid skill</a>. Consistent with the Children's Online Privacy Protection Act, we require permission from a parent before kid skills can be used.</p> <p>Our Private Beta is limited to commercial developers by invite only.</p> <h2>Get Started with Custom Interfaces</h2> <p>To help you get started on your first prototype using Custom Interfaces, we are excited to share sample projects that enable you to build with Raspberry Pi and Python-based software. The software includes sample applications and step-by-step guides that simplify the process of getting your prototype connected and plugged in to the capabilities of the Alexa Gadgets Toolkit. 
Once connected, you have the flexibility to combine your prototype with off-the-shelf components, such as servos, buttons, lights, and more.</p> <p>Visit our <a href="" target="_blank">resource library</a>, which includes the following:</p> <ol> <li><a href="" target="_blank">Tech documentation</a></li> <li><a href="" target="_blank">Sample application that uses Custom Interfaces and step-by-step guides</a></li> </ol> <p>With Custom Interfaces, there are even more possibilities for engaging experiences that you can build to delight your customers. Start prototyping today and be first-to-market in fun and interactive categories that are not yet connected to Alexa. We can’t wait to see what you build for Alexa customers!</p> /blogs/alexa/post/7eda239b-24a9-45f1-bdc7-d86879dc99d3/new-ai-system-helps-accelerate-alexa-skill-development New AI System Helps Accelerate Alexa Skill Development Larry Hardesty 2019-08-15T13:00:00+00:00 2019-08-20T14:19:34+00:00 <p>Based on embeddings, system suggests named entities — or &quot;slot values&quot; — that developers might want their skills to recognize.</p> <p>Alexa currently has more than 90,000 skills, or abilities contributed by third-party developers — the NPR skill, the Find My Phone skill, the Jeopardy! skill, and so on.</p> <p>For each skill, the developer has to specify both <em>slots</em> — the types of data the skill will act on — and <em>slot values</em> — the particular values that the slots can assume. A restaurant-finding skill, for instance, would probably have a slot called something like CUISINE_TYPE, which could take on values such as “Indian”, “Chinese”, “Mexican”, and so on.</p> <p>For some skills, exhaustively specifying slot values is a laborious process. 
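To make that concrete, here is a trimmed sketch of the kind of slot-type definition a developer enumerates by hand; the structure follows the usual Alexa interaction-model JSON, but treat the exact field names as illustrative:

```python
# Sketch of a custom slot type as a developer would enumerate it (illustrative).
cuisine_type = {
    "name": "CUISINE_TYPE",
    "values": [
        {"name": {"value": "Indian"}},
        {"name": {"value": "Chinese"}},
        {"name": {"value": "Mexican"}},
    ],
}
```

For a restaurant finder this list is short, but for slots such as song or place names it can run to thousands of entries.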
We’re trying to make it easier with a tool we’re calling catalogue value suggestions, which is currently available to English-language skill developers and will soon expand to other languages.</p> <p>With catalogue value suggestions, the developer supplies a list of slot values, and based on that list, a neural network suggests a range of additional slot values. So if, for example, the developer provided the CUISINE_TYPEs “Indian”, “Chinese”, and “Mexican”, the network might suggest “Ethiopian” and “Peruvian”. The developer can then choose whether to accept or reject each suggestion.</p> <p>“This will definitely improve the dev process of creating a skill,” says Jos&eacute; Chavez Marino, an Xbox developer with Microsoft. “The suggestions were very good, but even if they were not accurate, you just don't use them. I only see positive things on implementing this in the Alexa dev console.”</p> <p>The system depends centrally on the idea of embeddings, or representing text strings as points in a multidimensional space, such that strings with similar semantic content are close together. We use proximity in the embedding space as the basis for three distinct tasks: making the slot value suggestions themselves; weeding offensive terms out of the value suggestion catalogue; and identifying slots whose values are so ambiguous that suggestions would be unproductive.<br /> <br /> <img alt="ambiguous_slots.gif" src="" style="display:block; height:282px; margin-left:auto; margin-right:auto; width:500px" /></p> <p style="text-align:center"><em><sub>Sometimes&nbsp;a skill will include slots such as </sub></em><sub>Things_I_like</sub><em><sub> or even </sub></em><sub>Miscellaneous_terms</sub><em><sub> whose values are so irregular that they provide no good basis for slot value suggestions. 
Here, the solid blue circle represents the average embedding of the slot values “Bird”, “Dog”, and “Cat” (hollow blue circles), while the solid red square represents the average embedding of the slot values “Left”, “Hamster”, and “Boston” (hollow red squares). If slot-value embeddings lie too far (dotted circles) from their averages, we conclude that suggesting new slot values would be unproductive.</sub></em></p> <p><br /> The first step in building our catalogue of slot value suggestions: assemble a list of <em>phrases</em>, as slot values frequently consist of more than one word — restaurant names and place names, for instance. When training&nbsp;our embedding network, we treated both phrases and non-phrasal words as&nbsp;<em>tokens</em>, or semantic units.&nbsp;</p> <p>We then fed the network training data in overlapping five-token chunks. For any given input token, the network would learn to predict the two tokens that preceded it and the two that followed it. The outputs of the network thus represented the frequencies with which tokens co-occurred, which we used to group tokens together in the embedding space.</p> <p>Next, we removed offensive content from the catalogue. We combined and pruned several publicly available blacklists of offensive terms, embedded their contents, and identified words near them in the embedding space. For each of those nearby neighbors, we looked at its 10 nearest neighbors. If at least five of these were already on the blacklist, we blacklisted the new term as well.</p> <p>When a developer provides us with a list of values for a particular slot, our system finds their average embedding and selects its nearest neighbors as slot value suggestions. 
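The centroid-and-nearest-neighbor step, together with the ambiguity check described next, can be sketched with toy vectors; the real system uses learned phrase embeddings and a far larger catalogue, and the threshold here is arbitrary:

```python
import numpy as np

# Toy embedding table standing in for the learned phrase embeddings.
EMB = {
    "indian": np.array([1.0, 0.1]), "chinese": np.array([0.9, 0.2]),
    "mexican": np.array([1.0, 0.3]), "ethiopian": np.array([0.95, 0.15]),
    "left": np.array([-1.0, 0.0]), "boston": np.array([0.0, 1.0]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def suggest(seed_values, k=2, min_cos=0.5):
    """Suggest catalogue terms near the average embedding of the seeds.

    Returns an empty list when the seeds are too spread out around their
    centroid to define a useful neighborhood (the 'ambiguous slot' case).
    """
    centroid = np.mean([EMB[v] for v in seed_values], axis=0)
    if min(cosine(EMB[v], centroid) for v in seed_values) < min_cos:
        return []  # seeds too far from their average: slot is too ambiguous
    candidates = [t for t in EMB if t not in seed_values]
    return sorted(candidates, key=lambda t: -cosine(EMB[t], centroid))[:k]
```

Coherent seeds like the cuisine names yield nearby terms such as "ethiopian", while a grab-bag slot like {"left", "boston", "indian"} is rejected outright.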
If the developer-provided values lie too far from their average <em>(see figure, above)</em>, the system concludes that the slot is too ambiguous to yield useful suggestions.</p> <p>To test our system, we extracted 500 random slots from the 1,000 most popular Alexa skills and used half the values for each slot to generate suggestions. On average, the system provided 6.51 suggestions per slot, and human reviewers judged that 88.5% of them were situationally appropriate.</p> <p><em>Boya Yu is an applied scientist in Alexa AI’s Natural Understanding group.</em></p> <p><a href="" target="_blank"><strong>Alexa science</strong></a></p> <p><strong>Acknowledgments</strong>: Markus Dreyer, Likhitha Patha,&nbsp;Ben Overholts, Sam Sussman, Yash Naik, Adam Hasham</p> <p><strong>Related</strong>:</p> <ul> <li><a href="" target="_blank">Representing Data at Three Levels of Generality Improves Multitask Machine Learning</a></li> <li><a href="" target="_blank">Who’s on First? How Alexa Is Learning to Resolve Referring Terms</a></li> <li><a href="" target="_blank">To Correct Imbalances in Training Data, Don’t Oversample: Cluster</a></li> <li><a href="" target="_blank">With New Data Representation Scheme, Alexa Can Better Match Skills to Customer Requests</a></li> </ul> <p><sub><em>Animation by&nbsp;<a href="" target="_blank">Nick&nbsp;Little</a></em></sub></p> /blogs/alexa/post/eb8ec4df-6ef0-4dba-a291-3a9f8ef4915d/isp-certification Tips for Alexa Skill Certification: In-Skill Purchasing Takuya Goshima 2019-08-15T06:12:27+00:00 2019-08-15T06:30:39+00:00 <p><img alt="" src="" style="height:480px; width:1908px" />Before an Alexa skill you have developed is published to the skill store, the Alexa certification team reviews it against the <a href="">defined requirements</a> and, where necessary, provides feedback to help the skill deliver a good user experience. This post introduces the items that most often receive improvement feedback during submission. This time, the topic is in-skill purchasing.</p> <p><img alt="" src="" style="height:480px; width:1908px" /></p> <p>In this post, we share tips for passing skill certification with in-skill purchasing. Before an Alexa skill you have developed is published to the skill store, the Alexa certification team reviews it against the <a href="">defined requirements</a> and,
where necessary, provides feedback to help the skill deliver a good user experience. This post introduces the items that most often receive improvement feedback when skills are submitted.</p> <p>&nbsp;</p> <h2><strong>In-Skill Purchasing</strong></h2> <p><a href="">In-skill purchasing</a> is a mechanism that lets you charge for digital content within an Alexa skill.<br /> As of August 2019, it is available only in Japanese, English (US), English (UK), and German. Note that you cannot use it if your skill is built in a language that in-skill purchasing does not support.</p> <p>For details on in-skill product price ranges, supported languages, and distribution regions, see <a href="">this page</a>.</p> <p>For the API used to retrieve in-skill product information, see <a href="">here</a>.</p> <p>&nbsp;</p> <p>&nbsp;</p> <h3><strong>1.</strong> <strong>Purchase Cancellations and Refunds</strong></h3> <p>Every skill that uses in-skill purchasing must support cancellations and refunds of purchases by the user. To handle them, build a custom intent that supports the request and add code that processes the user’s refund request. For implementation details, see <a href="">Handle refund and cancellation requests</a> or <a href="">this page</a>.</p> <div style="background-color:#e7e7e7; border:0px solid #e4e4e4; margin-bottom:10px; padding:10px"> <p>User: “Alexa, refund Cave Exploration.”</p> <p>Alexa: “I’ve sent a link about your refund to the Alexa app; please check it there.”</p> </div> <p>&nbsp;</p> <p>&nbsp;</p> <h3><strong>2.</strong> <strong>Offering Upsells</strong></h3> <p>An upsell is a message your skill gives the user to promote a product for sale.<br /> The skill must check whether the user already owns the product in question and then present a message recommending the paid product (the upsell message). Because pricing details are provided within Alexa’s purchase flow, do not include them in the upsell message. For details, see <a href="">Design the purchase flow</a>.</p> <p>&nbsp;</p> <p>・Example [upsell] exchange</p> <div style="background-color:#e7e7e7; border:0px solid #e4e4e4; margin-bottom:10px; padding:10px"> <p>Alexa: “Wonderful! You answered 45 out of 50 countries correctly. If you’d like to learn more about the countries of the world, I recommend the National Birds expansion pack. Would you like to hear more?”</p> <p>User: “Yes”</p> <p>Alexa: “The National Birds expansion pack includes 195 birds. Some are real and some are legendary, varying from country to country. You can build your knowledge while enjoying the quiz. It is available for 299 yen including tax. Would you like to buy it?”</p> <p>User: “Yes”</p> <p>Alexa: “Thank you for purchasing the National Birds expansion pack. Shall we start playing?”</p> </div> <p>&nbsp;</p> <p>You can also present in-skill products that are useful to the user as recommendations within your skill’s responses.<br /> For implementing recommendations, see <a href="">Make recommendations</a> and <a href="">Make in-skill products easier to discover</a>.</p> <p>You can also <a href="">remind</a> users within a response that they can check in-skill product information at any time. Once the reminder is finished, be sure to resume the skill.</p> 
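In a skill's code, an upsell is typically kicked off by returning a Connections.SendRequest directive and letting Alexa run the purchase flow itself. The sketch below follows the general shape of the in-skill purchasing documentation; the product id, token, and message text are placeholders:

```python
# Sketch of an in-skill purchasing upsell directive (placeholder ids and text).
upsell_directive = {
    "type": "Connections.SendRequest",
    "name": "Upsell",
    "payload": {
        "InSkillProduct": {"productId": "amzn1.adg.product.EXAMPLE"},
        # Keep the price out of the message: Alexa's purchase flow states it.
        "upsellMessage": "The National Birds expansion pack adds 195 new birds to the quiz. Want to learn more?",
    },
    "token": "upsell-correlation-token",
}
```

When the purchase flow finishes, the skill is re-invoked with the result and should resume where the user left off.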
<p>&nbsp;</p> <p>・Example [recommendation] exchange</p> <div style="background-color:#e7e7e7; border:0px solid #e4e4e4; margin-bottom:10px; padding:10px"> <p>Alexa: “With the Cave Exploration expansion pack, you can enjoy the adventure even more. Would you like to hear more?”</p> <p>User: “Yes”</p> </div> <p>&nbsp;</p> <p>・Example [reminder] exchange</p> <p>Suppose the user is playing a free adventure series and is about to finish the free content.</p> <div style="background-color:#e7e7e7; border:0px solid #e4e4e4; margin-bottom:10px; padding:10px"> <p>Alexa: “You have found five of the six treasures. Well done! Once you finish this adventure, you can get a new one at any time. Would you like to hear about the expansion packs?” (pause)</p> <p>Alexa: “Now, let’s continue the hunt for the last treasure. As you walk through the dark forest…”</p> </div> <p>&nbsp;</p> <p>&nbsp;</p> <h3><strong>3. Purchasing the In-Skill Product the User Wants</strong></h3> <p>Users must be able to purchase an in-skill product they are interested in even when no upsell message has been presented.<br /> To support this, build a custom intent that supports purchasing, add code to handle it, and send a directive to start the purchase flow.<br /> For the steps to carry out a purchase, see <a href="">Add code to handle purchase requests</a> or <a href="">How to handle direct purchases</a>.</p> <p>&nbsp;</p> <p>・When the user names the item to purchase</p> <div style="background-color:#e7e7e7; border:0px solid #e4e4e4; margin-bottom:10px; padding:10px"> <p>User: “I want to buy the Cave Exploration expansion pack.”</p> <p>Alexa: “The Cave Exploration expansion pack includes five new adventures in which you discover buried ancient treasures. It is available for 199 yen including tax. Would you like to buy it?”</p> </div> <p>&nbsp;</p> <p>・When the user wants to buy an item but does not name it</p> <div style="background-color:#e7e7e7; border:0px solid #e4e4e4; margin-bottom:10px; padding:10px"> <p>User: “I want an expansion pack.”</p> <p>Alexa: “There are two expansion packs that continue the adventure you just completed: the Cave Exploration adventure game and the Deep Sea Exploration puzzle game. Which would you like?”</p> <p>User: “Deep Sea Exploration”</p> <p>Alexa: “Understood. The Deep Sea Exploration expansion pack has seven new puzzles. It is available for 399 yen including tax. Would you like to buy it?”</p> </div> <p>&nbsp;</p> <p>&nbsp;</p> <h3><strong>4.</strong> <strong>Using Products After Purchase</strong></h3> <p>Whether the in-skill product a user buys is a one-time purchase, a consumable, or a subscription, the content must be available immediately once the purchase completes. If the user chooses not to purchase, present the options available in that case. For what happens after purchase, see <a href="">Handling after the purchase flow</a> or <a href="">Resume the skill after the purchase flow</a>.</p> <p>&nbsp;</p> <p>Suppose a user says they want to play the “Great Rift” content, part of a subscription they purchased. After the purchase flow, you must respond to the request right away.</p> <div style="background-color:#e7e7e7; border:0px solid #e4e4e4; margin-bottom:10px; padding:10px"> <p>Alexa: 
“Let’s explore the Great Rift. As you finally emerge from the dark forest, you find a mysterious crack in the ground…”</p> </div> <p>&nbsp;</p> <p>A user bought a five-hint pack in a trivia game. Immediately after the purchase flow, you must let them know that the hints are available.</p> <div style="background-color:#e7e7e7; border:0px solid #e4e4e4; margin-bottom:10px; padding:10px"> <p>Alexa: “Thank you for purchasing five extra hints for Trivia. When you want to use one, say ‘give me a hint’. Now, let’s get back to the final question. Would you like to use a hint right away?”</p> </div> <p>&nbsp;</p> <p>The case where the user does not purchase must likewise be handled according to context. If the user has consumed all the content and declined the other options, end the session.</p> <div style="background-color:#e7e7e7; border:0px solid #e4e4e4; margin-bottom:10px; padding:10px"> <p>Alexa: “You have found all six treasures. Come back any time to check whether new adventures or puzzles have been added.”</p> </div> <p>&nbsp;</p> <p>The feedback email about skill certification includes a summary of the issues identified, steps to reproduce each issue where necessary, and guidance to move your skill toward certification. If you have notes for the review of your skill, see <a href="">this page</a> and let us know through the “Testing instructions” field under “Privacy &amp; Compliance” on the “Distribution” tab of the developer console when you submit the skill.</p> <p>In the <a href="">Alexa Skills Kit (ASK) (Japanese)</a> space, you can post questions about skill development and answers to other developers’ questions. You can also use the <a href=";sc_channel=website&amp;sc_publisher=devportal&amp;sc_campaign=Conversion_Contact-Us&amp;sc_assettype=conversion&amp;sc_team=us&amp;sc_traffictype=organic&amp;sc_country=united-states&amp;sc_segment=all&amp;sc_itrackingcode=100020_us_website&amp;sc_detail=blog-alexa">contact form</a>.</p> <p>&nbsp;</p> <p>Related articles on in-skill purchasing:<br /> ・<a href="">You can now build skills with in-skill purchasing for Alexa users in Japan</a><br /> ・<a href="">Alexa skill-building training series: Best practices for in-skill purchasing</a><br /> ・<a href="">Alexa skill-building training series: Monetizing your skills</a><br /> ・<a href="">Alexa skill-building training series: Development steps for in-skill purchasing</a><br /> ・<a href="">Alexa skill-building training series: FAQ on in-skill purchasing</a><br /> ・<a href="">In-skill purchasing certification guide</a><br /> ・<a href="">In-skill purchasing FAQ</a></p> /blogs/alexa/post/6a5b0ad4-b27e-4a6b-87dd-3792bab23c51/what-s-new-in-the-alexa-skills-kit-july2019-release-roundup What's New in the Alexa Skills Kit: July 2019 Release Roundup Leo Ohannesian 2019-08-15T00:18:18+00:00 2019-08-15T00:18:18+00:00 <p><img alt="Intent-history_blog.png" src="" /></p> <p>What's new in the Alexa Skills 
Kit for July 2019? Read our release roundup blog to find out.</p> <p><img alt="Intent-history_blog.png" src="" /></p> <p><em><strong>Editor's Note: </strong>Our monthly release roundup series showcases the latest in Alexa Skills Kit developer tools and features that can make your skills easier to manage, simpler to deploy, and more engaging for your customers. Build with these new capabilities to enable your Alexa skills to drive brand or revenue objectives.</em></p> <p>&nbsp;</p> <p>In this roundup post we share details about the new things released for skill developers last month, including the General Availability of Skill Connections along with several other features&nbsp;that can help you be more productive or build more engaging skills. Check out the entire livestream for more information and code samples from Alexa evangelists.</p> <p><iframe allowfullscreen="" frameborder="0" height="360" src="//" width="640"></iframe></p> <h2>1. Improve productivity by outsourcing tasks to other skills with Skill Connections, now Generally Available</h2> <p>Skill Connections enable a skill to use another skill to perform a specific task, so you can do more for your customers by extending your skill's abilities with minimal changes. <a href=";sc_category=Owned&amp;sc_channel=BG&amp;sc_campaign=roundup&amp;sc_content_category=Productivity&amp;sc_country=WW" target="_blank">Check out the announcement here</a> or <a href=";sc_category=Owned&amp;sc_channel=BG&amp;sc_campaign=roundup&amp;sc_content_category=Productivity&amp;sc_country=WW" target="_blank">learn more about Skill Connections&nbsp;in our tech docs.</a></p> <h2>2. Easily integrate leaderboards into your game skills using the Skills GameOn SDK (Beta)</h2> <p>Leaderboards are a great way to keep players engaged with your game skills and drive retention. 
You can now use the Skills GameOn SDK (beta), powered by Amazon GameOn and optimized for Alexa skills, to easily integrate leaderboards into your game skills. We are also excited to announce that we have a special offer to help more skill developers leverage the GameOn capabilities. <a href=";sc_category=Owned&amp;sc_channel=BG&amp;sc_campaign=roundup&amp;sc_content_category=Games&amp;sc_country=WW" target="_blank">Learn more about the GameOn SDK by reading our blog.</a></p> <h2>3. Alexa Presentation Language 1.1 (Beta)</h2> <p>We are excited to announce the next version of Alexa Presentation Language (APL) with support for animations, vector graphics, better tooling, and a design system that makes APL skill development for multiple viewport profiles faster. Read about&nbsp;<a href=";sc_category=Owned&amp;sc_channel=BG&amp;sc_campaign=roundup&amp;sc_content_category=APL&amp;sc_country=WW" target="_blank">it in our announcement.</a></p> <h2>4. Quickly test your VUI with Quick Builds, now on the Developer Console</h2> <p>Save time and start testing early: We are excited to announce developer console support for Quick builds, which enable you to start testing your skill with sample utterances on average 67% quicker than before. This is done by introducing a new intermediate build state called Quick Build. Read about&nbsp;<a href=";sc_category=Owned&amp;sc_channel=BG&amp;sc_campaign=roundup&amp;sc_content_category=Productivity&amp;sc_country=WW#build-and-save" target="_blank">it in our tech docs.</a></p> <p>As always, we can't wait to see what you build. 
As a reminder, learn how to get the most out of the tech docs by visiting the <a href="" target="_blank">Latest Tips page.</a></p> /blogs/alexa/post/67edf9f0-1ec6-4261-ad6b-46cf36d87fbb/voice-agency-say-it-now-ceo-discusses-reaping-big-rewards-from-the-evolving-voice-industry Voice Agency 'Say It Now' CEO Discusses Reaping Big Rewards from the Voice Industry Emma Martensson 2019-08-14T14:00:00+00:00 2019-08-14T14:00:00+00:00 <p><img alt="Voice Agency 'Say It Now' CEO Discusses Reaping Big Rewards from the Voice Industry " src="" /></p> <p><a href="" target="_blank">Charlie Cadbury</a>, CEO of <a href="" target="_blank">Say It Now</a>, has adapted alongside technology since 1999 when he sold his first website. Today, Say It Now is a group of enterprise natural language processing (NLP) experts building out conversational strategies and products alongside Fortune 500 companies.</p> <p><img alt="Voice Agency 'Say It Now' CEO Discusses Reaping Big Rewards from the Voice Industry " src="" /></p> <p><a href="" target="_blank">Charlie Cadbury</a>, CEO of <a href="" target="_blank">Say It Now</a>, has adapted alongside technology since 1999 when he sold his first website. In 2015 he began exploring voice with a proof of concept for Allegiant Air. Charlie then spent most of 2016 and 2017 attending conferences with an Amazon Echo, pitching (and winning) several travel innovation competitions for a conversational hotel concierge service called <a href="" target="_blank">‘Dazzle’</a>. Charlie got early support from Marriott Hotels for Dazzle and was able to build out the proposition in Marriott Hotel London County Hall. “That recognition gave me confidence that voice had potential; however at that point in 2017 the business case for voice wasn’t very well established and often guests had never seen or heard of a smart speaker before,” Charlie says. 
Dazzle grew into a multi-award winning conversational service, and together with the VP of product, <a href="" target="_blank">Sander Siezen</a>, Charlie went on to set up Say It Now, a voice agency, at the end of summer 2018.</p> <p>Today, Say It Now is a group of enterprise natural language processing (NLP) experts building out conversational strategies and products alongside Fortune 500 companies. “The business benefits being reaped are better articulated now we’ve been in the NLP space for 4 years and are clearer than ever about the strategies brands should adopt,” Charlie says. Recently Say It Now won the UK and EU rounds of The Alexa Cup, both incredible milestones for them, and won the Bronze medal in the global final. Charlie adds “We’re happy with the top spot in Europe and third in the world!”</p> <h2>How to Collaborate with Clients on a Voice Strategy</h2> <p>Say It Now has, together with their clients, developed a workshop technique that allows them to assess whether voice is the right approach and, if so, where the most value can be found right now. They also create a roadmap for where the value and growth will come from over time. In this workshop they then form an actionable plan that is carried out collaboratively with their client. Whilst Say It Now brings their specific industry expertise to the table to deliver insight and guidance, the clients educate them about the internal machinations of their organisation and their strategy.</p> <p>This is exactly the way they work with their client <a href="" target="_blank">Diageo</a> on their Alexa skill, <a href="" target="_blank">Talisker Tasting</a>. Diageo has <a href="" target="_blank">stated</a> that this kind of relationship has the desired effect and as a result they have committed to continued investment in voice.</p> <p>For Say it Now’s own skills, like the Alexa Cup’s winning submission Book It Now, they took a slightly different approach. 
“We sought to find a solution for booking services through Alexa and adding that functionality at scale,” Charlie says.</p> <h2>Say It Now Expects Big Rewards from ‘Conversational Commerce’</h2> <p>“I started talking about ‘emerging commerce’, the idea that the way we transact has always and will always evolve, in 2012 when building out some early mobile payment apps.” He continues, “I’ve taken a very keen interest in the development of ‘conversational commerce’ over the past few years and believe it is inevitable and will be transformative to the voice industry.” Charlie believes the way it manifests will take a few years to mature but there are big rewards if some of this transaction value can be captured. Say It Now is working hard to ensure they’re part of this evolution.</p> <h2>Aspiring Voice Agencies Should Connect with the Voice Community</h2> <p>Charlie sees relationships and community as key when building skills and starting a voice business. Say It Now has had a lot of success building relationships with other developers, voice designers and the community at large. “It means you are abreast of the latest developments and have the right kind of people around you to make sense of this rapidly changing world,” he says. He recommends the <a href="" target="_blank"></a> podcast and <a href="" target="_blank">Voice First Community</a> for any aspiring voice developers.</p> <h2>More ‘Wow’ in the Future of Voice</h2> <p>Say It Now is interested to see how <a href="">Name Free Skill Interactions (NFSI) </a>and <a href="">Alexa Conversations </a>develop in the future. 
“Discovery and cross skill flows are still a challenge,” he says, “but when unlocked we expect to see more ‘wow’ and utilisation of voice as a place where even more complex tasks can be seamlessly delegated to your Amazon Alexa.”</p> <p>Charlie thinks this change has to come from a place of trusted personalisation, and that trust will take a while to build and must stem from delightful voice experiences that will provide the data businesses need. “But get there we will,” he concludes. Charlie and Say It Now are excited to play their part in the rapidly evolving story of voice.</p> <h2>Related Content</h2> <ul> <li><a href="">Hugo’s Move from Digital Nomad to Full Time Alexa Skills Developer</a></li> <li><a href="">How Vocala is Creating a Growing Voice Business</a></li> <li><a href="">Make Money with In-Skill Purchasing</a></li> <li><a href="">Sell Premium Content to Enrich Your Skill Experience</a></li> </ul> <h2>Grow Your Voice Business with Monetized Alexa Skills</h2> <p>With in-skill purchasing (ISP), you can sell premium content to enrich your Alexa skill experience. ISP supports one-time purchases for entitlements that unlock access to features or content in your skill, subscriptions that offer access to premium features or content for a period of time, and consumables, which can be purchased and depleted. You define your premium offering and price, and we handle the voice-first purchasing flow.&nbsp;If you add ISP to your skill, you may be eligible to earn a voucher for the <a href="" target="_blank">AWS Certified Alexa Skill Builder</a> exam through the&nbsp;<a href=";linkId=67863388#?&amp;sc_category=Owned&amp;sc_channel=SM&amp;sc_campaign=EUPromotion&amp;sc_publisher=LI&amp;sc_content=Promotion&amp;sc_funnel=Publish&amp;sc_country=EU&amp;sc_medium=Owned_SM_EUPromotion_LI_Promotion_Publish_EU_EUDevs&amp;sc_segment=EUDevs">EU Perks Program</a>. 
<a href=";sc_category=Owned&amp;sc_channel=WB&amp;sc_campaign=wb_acquisition&amp;sc_publisher=ASK&amp;sc_content=Content&amp;sc_detail=vod-webinar&amp;sc_funnel=Convert&amp;sc_country=WW&amp;sc_medium=Owned_WB_wb_acquisition_ASK_Content_vod-webinar_Convert_WW_visitors_makemoney-page_CTA-graphic&amp;sc_segment=visitors&amp;sc_place=makemoney-page&amp;sc_trackingcode=CTA-graphic" target="_blank">Download our introductory guide</a> to learn more.</p> /blogs/alexa/post/d92c7822-d289-44fd-a9fe-9652874fc3c9/five-benchmarks-for-writing-dialog-that-sounds-great-to-alexa-customers Five Benchmarks for Writing Dialog that Sounds Great to Alexa Customers Michelle Wallace 2019-08-13T16:25:21+00:00 2019-08-13T16:25:21+00:00 <p><img alt="" src="" style="height:480px; width:1908px" /></p> <p>Great Alexa skills depend on written prompts. In voice-first interfaces, the dialog you write isn’t one component of the user interface—it <em>is </em>the interface, because Alexa’s voice is the primary guide leading a customer through your skill.<br /> <br /> But if you don’t have a background in writing, that’s okay! Any skill builder can improve their written dialog so it successfully serves the customer. This post covers five benchmarks your Alexa skill’s dialog should meet, and specific techniques for how you can get there.</p> <h2>Benchmark 1: Avoid Jargon and Ten-Dollar Words</h2> <p>Customers love low-friction interactions, and the individual words in your dialog can be a huge part of keeping the interaction simple and easy. 
Informal language is faster and less burdensome for a customer to process, so they can follow a voice interaction without having to stop and think before responding.<br /> <br /> Here are some examples of commonly used jargon or overly formal words, along with alternatives that could be used instead:<br /> <br /> Jargon: “You can default to a stored method associated with this account, or override it by selecting an alternate method of payment.”<br /> Simpler: “You can use the credit card on file, or add a new card.”</p> <p>Jargon: “I can submit a request for a customer service representative to return your call.”<br /> Simpler: “I can have an agent call you back.”</p> <p>Jargon: “Would you like me to submit your order for processing?”<br /> Simpler: “Ready to finish your order?”<br /> <br /> Jargon: “The transaction attempt was not successful.”<br /> Simpler: “Hmm. Something went wrong with your payment.”<br /> <br /> So, what are some techniques for replacing jargon with clearer language? First, fresh eyes are valuable here. Find someone who’s not an expert in your skill’s content, and ask them to read or listen to your dialog and point out words that feel unfamiliar to them. Second, once you’ve identified some clunky words, find synonyms that are less formal. (Don’t be afraid to dust off that thesaurus!)</p> <h2>Benchmark 2: Apply the One-Breath Test for Concision</h2> <p>Remember that your skill’s dialog will be spoken out loud, one word at a time, so excess words in your prompts quite literally add time to the interaction. A useful guideline is that a prompt should be about as long as a human could say in one breath. 
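If you want a quick first pass before reading prompts aloud, the one-breath guideline can be approximated with a simple word count. This is a minimal sketch; the 20-word ceiling is an illustrative assumption, not an official threshold, and reading the prompt aloud remains the real test.

```python
import re

# Illustrative assumption: about 20 words fit comfortably in one breath.
ONE_BREATH_WORD_LIMIT = 20

def passes_one_breath_test(prompt: str, limit: int = ONE_BREATH_WORD_LIMIT) -> bool:
    """Return True if the prompt is short enough to say in one breath."""
    words = re.findall(r"[\w']+", prompt)
    return len(words) <= limit

print(passes_one_breath_test("Ready to finish your order?"))  # True
print(passes_one_breath_test(
    "I would like to be able to assist you in submitting your order "
    "for processing at this time if that works for you today."))  # False
```

A check like this is only a screen for obviously long prompts; short prompts can still be awkward to say.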
It’s a great idea to read your dialog out loud or have a colleague read it to you.<br /> <br /> If you identify some prompts that don’t pass the <a href="">one-breath test</a>, here are some ways you can shorten them:</p> <ul> <li>Cut filler words, like “very.” Keep an eye out for words that don’t change the meaning of a sentence or add information; you can eliminate these.</li> <li>Look out for wordiness around verbs. For example, “I’d like to be able to help you” can be shortened to “I can help.”</li> <li>Find information that customers don’t need. For example, if a prompt contains a date, like “Your order will be ready on August 2, 2019,” you can usually omit the year.</li> </ul> <p>There are concrete techniques you can use to make sentences concise. First, make sure each sentence passes the one-breath test by reading it aloud. Next, if you find sentences that don’t pass the test, cut your sentences down by challenging yourself to omit 2-5 words from every line of dialog in your code.</p> <h2>Benchmark 3: Introduce Variety into Your Dialog</h2> <p>Humans use a lot of variation in the way they speak. In contrast, voice experiences that repeat the same phrases don’t sound natural to the human ear. You can avoid repetition by adding randomized variations to your dialog.<br /> <br /> Look for the skill dialog that your users will hear the most often, starting with the greeting. Imagine a grocery-ordering skill called Grocery Store. If you heard “Welcome to the Grocery Store!” with every launch, you’d grow tired of this greeting.<br /> <br /> As a skill builder, you could provide randomized phrases so that customers might hear one of several responses upon launch. 
For example:</p> <ul> <li>Thanks for stopping by the Grocery Store.</li> <li>Hi, you’ve reached the Grocery Store.</li> <li>Let’s fill your cart at the Grocery Store!</li> </ul> <p>Another opportunity for variation is confirming a purchase, or congratulating a customer for completing a task. For example, if you have a skill that sells cupcakes, you could randomize phrases that confirm the purchase:</p> <ul> <li>You’re all set! Treats are on the way.</li> <li>It’s cupcake time! Your order is complete.</li> <li>Sweet! You’ve successfully ordered your cupcakes.</li> </ul> <p>It’s important to keep aspects of the flow consistent; your skill shouldn’t feel radically different or unfamiliar each time. But creating variation is an important way to keep your skill interesting and fresh, especially for skills a user might open every day, like skills for weather, exercise, or news.<br /> <br /> To make sure your dialog isn’t overly repetitive, you can add a few simple techniques to your process. First, take a look at your list of dialog lines and identify 3-5 prompts that your customers will encounter each time they use your skill. Next, write 2-5 (or more!) variations for each of these lines. 
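If your skill backend is written in Python, picking one of those variations at runtime is a one-liner. This is a minimal sketch, not tied to any SDK; the greeting pool mirrors the Grocery Store examples above.

```python
import random

# Illustrative greeting pool for the fictional Grocery Store skill.
GREETINGS = [
    "Thanks for stopping by the Grocery Store.",
    "Hi, you've reached the Grocery Store.",
    "Let's fill your cart at the Grocery Store!",
]

def get_greeting() -> str:
    """Pick one launch greeting at random so repeat users hear variety."""
    return random.choice(GREETINGS)

print(get_greeting())  # one of the three greetings above
```

The same pattern works for any prompt pool: keep the variations in a list and select one each time the handler runs.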
It’s a good idea to ask a few friends or colleagues to help you brainstorm, as you may come up with more creative variations as a group.<br /> <br /> For more guidance, check out the Alexa Design Guide’s section on <a href="">adding variety</a> in <a href="">repetitive tasks</a>, and using <a href="">adaptive prompts</a>.</p> <h2>Benchmark 4: Try Contractions and Informal Phrasing</h2> <p>General advice for Alexa dialog is <a href="">“Write it the way you say it.”</a> People use lots of contractions when they speak, such as:</p> <ul> <li>“I’m” instead of “I am”</li> <li>“I’d” instead of “I would”</li> <li>“Don’t” instead of “do not”</li> <li>“Can’t” instead of “cannot”</li> </ul> <p>“I cannot help you with that” sounds much stiffer than “I can’t help you with that.” Because your skill’s dialog should be casual and conversational, the contracted version is preferred in most situations.<br /> <br /> Humans also use short phrases; not every line of dialog has to be a complete sentence. This keeps your prose natural and contributes to concise sentences. For example:</p> <ul> <li>“Done!” instead of “This purchase is complete.”</li> <li>“Ready?” instead of “Do you want to continue?”</li> <li>“Got it, four adult tickets to next week’s choir concert!” instead of “Okay, I will place an order for four adult tickets to go see the choir concert taking place next week.”</li> </ul> <p>With just a little extra effort, you can make sure your dialog sounds casual and easy on the ear. First, circle all of the verbs that could be turned into contractions. Reading out loud can help you identify these places, too. Next, identify dialog that can be turned into shorter phrases. Good candidates for shortening are prompts that end with a question, as well as confirmation phrases.</p> <h2>Benchmark 5: Use SSML for Better Pacing</h2> <p>When customers listen to a long string of dialog without meaningful pauses, the words can bleed together and create confusion. 
It’s a great idea to employ Speech Synthesis Markup Language (<a href="">SSML</a>) to adjust aspects of Alexa’s speech so it sounds even more natural to a human ear.<br /> <br /> You can use SSML to do lots of things, from tweaking a word’s pronunciation to adjusting emphasis on a specific syllable. But perhaps the simplest SSML tag with the biggest impact is the <a href="">break time</a> tag, which represents a pause in speech. Sometimes adding even a few hundred milliseconds of extra time can help your customer comprehend the prompt more easily.<br /> <br /> For example, you can use SSML to add time between menu items:</p> <pre> <code>&lt;speak&gt; There are three house plants I’d recommend for your apartment: elephant ear, &lt;break time=&quot;600ms&quot;/&gt; peace lily &lt;break time=&quot;600ms&quot;/&gt; and spider plant. &lt;/speak&gt; </code></pre> <p>You can also add a lengthier pause between sentences, usually to indicate a transition between content and a next step:</p> <pre> <code>&lt;speak&gt; You answered a total of 14 questions right! That beats your all-time high score of 12 correct answers. &lt;break time=&quot;1s&quot;/&gt; Want to play again? &lt;/speak&gt; </code></pre> <p>To identify places where a pause is useful, listen to each prompt being read <em>by Alexa</em>. An easy way is to paste your dialog into the <strong>Voice &amp; Tone</strong> speech simulator, located in the <strong>Test</strong> tab in the Alexa developer console. If a sentence seems rushed, add some break time tags and listen again to fine-tune. 
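When a skill builds spoken lists in code, break tags like the ones above can be generated rather than hand-written. A minimal Python sketch, assuming a list of spoken items (the helper name `list_with_breaks` and its 600 ms default are illustrative, not part of any SDK):

```python
# Join spoken list items with SSML break tags so Alexa pauses between them.
def list_with_breaks(intro: str, items: list, pause_ms: int = 600) -> str:
    """Wrap a spoken list in <speak> tags, pausing between items."""
    separator = f' <break time="{pause_ms}ms"/> '
    return f"<speak> {intro} {separator.join(items)} </speak>"

ssml = list_with_breaks(
    "There are three house plants I'd recommend for your apartment:",
    ["elephant ear,", "peace lily", "and spider plant."],
)
print(ssml)  # reproduces the house-plant SSML example above
```

Generating the markup this way keeps pause lengths consistent and makes them easy to tune in one place.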
You can experiment with adding pauses of varying lengths, from 300 milliseconds to one second.</p> <h2>Benchmark Checklist</h2> <p>If you’ve done all of these things, your dialog will be crafted for a natural, concise, easy-on-the-ear customer experience.</p> <ol> <li>Eliminate jargon by asking for feedback from someone who’s not an expert in your skill’s content.</li> <li>Perform the one-breath test and, if you need to, cut 2-5 words from every line of dialog.</li> <li>Identify 3-5 prompts that will be commonly encountered and write at least two variations for each.</li> <li>Where you can, reduce your verb phrases to contractions and shorten some sentences to phrases.</li> <li>Listen to Alexa read every line and add pauses between phrases and sentences where needed.</li> </ol> <p>In general, the best way to confirm you’ve got great dialog is to read it aloud. Better yet, read it aloud to a friend or colleague who represents your customer base. Check to make sure they had an easy time understanding and responding to your prompts, and use their feedback to tweak your dialog until it has a conversational tone that’s easy to comprehend. Taking the extra time to scrutinize your dialog will help you craft a skill experience that’s conversational, intuitive, and frictionless for your customers.</p> <h2>Related Content</h2> <ul> <li><a href="">Alexa Design Guide</a></li> <li><a href="">Speech Synthesis Markup Language (SSML) Reference</a></li> <li><a href="">How to Write Great Dialogs for Alexa Skills</a></li> <li><a href="">Best Practices for the Welcome Experience and Prompting in Alexa Skills</a></li> </ul>