ANNOUNCING THE WINNER OF THE ALEXA PRIZE SOCIALBOT GRAND CHALLENGE 3
Congratulations to Emora from Emory University!
Alexa Prize Socialbot Grand Challenge 3 Proceedings
Dilek Hakkani-Tür, Senior Principal Scientist, Alexa AI
Prem Natarajan, Vice President, Alexa AI
Amazon Alexa Prize
Further Advances in Open Domain Dialog Systems in the Third Alexa Prize Socialbot Grand Challenge
Building open domain conversational systems that allow users to have engaging conversations on topics of their choice is a challenging task. The Alexa Prize Socialbot Grand Challenge was launched in 2016 to tackle the problem of achieving natural, sustained, coherent and engaging open-domain dialogs. In the third iteration of the competition, university teams have moved the needle on the state of the art, bringing together common sense knowledge representations, neural response generation models, NLU systems enhanced by large-scale transformer models and improved dialog policies to switch between graph-based representations or retrieval-based or templated dialog fragments, along with generated responses. The Third Socialbot Grand Challenge included an improved version of the CoBot (conversational bot) toolkit from the prior competition, along with topic and dialog act detection models, conversation evaluators, and a sensitive content detection model so that the competing teams could focus on building knowledge-rich, coherent and engaging multi-turn dialog systems. This paper outlines the advances developed by the university teams as well as the Alexa Prize team to move closer to the Grand Challenge objective. We address several key open-ended problems such as conversational speech recognition, open domain natural language understanding, commonsense reasoning, statistical dialog management and dialog evaluation. These collaborative efforts have driven improved ratings in the Semifinals of the competition from 3.19 in the prior competition cycle to 3.47 across all teams, an increase of 8.8%. As of the end of the final feedback phase, the top 7-day average rating achieved by a socialbot was 3.71 (out of 5), with the top 90th percentile conversation duration at 14 minutes 19 seconds.
Raefer Gabriel, Yang Liu, Anna Gottardi, Mihail Eric, Anju Khatri, Anjali Chadha, Qinlang Chen, Behnam Hedayatnia, Pankaj Rajan, Ali Binici, Shui Hu, Karthik Gopalakrishnan, Seokhwan Kim, Lauren Stubel, Kate Bland, Arindam Mandal, Dilek Hakkani-Tür
Emory University - Emora
Emora: An Inquisitive Social Chatbot Who Cares For You
Inspired by studies on the overwhelming presence of experience-sharing in human-human conversations, Emora, the social chatbot developed by Emory University, aims to bring such experience-focused interaction to the current field of conversational AI. The traditional approach of information-sharing topic handlers is balanced with a focus on opinion-oriented exchanges that Emora delivers, and new conversational abilities are developed that support dialogues that consist of a collaborative understanding and learning process of the partner’s life experiences. We present a curated dialogue system that leverages highly expressive natural language templates, powerful intent classification, and ontology resources to provide an engaging and interesting conversational experience to every user.
Sarah E. Finch, James D. Finch, Ali Ahmadvand, Ingyu (Jason) Choi, Xiangjue Dong, Ruixiang Qi, Harshita Sahijwani, Sergey Volokhin, Zihan Wang, Zihao Wang, Jinho D. Choi
Stanford University - Chirpy Cardinal
Neural Generation Meets Real People: Towards Emotionally Engaging Mixed-Initiative Conversations
We present Chirpy Cardinal, an open-domain dialogue agent, as a research platform for the 2019 Alexa Prize competition. Building an open-domain socialbot that talks to real people is challenging – such a system must meet multiple user expectations such as broad world knowledge, conversational style, and emotional connection. Our socialbot engages users on their terms – prioritizing their interests, feelings and autonomy. As a result, our socialbot provides a responsive, personalized user experience, capable of talking knowledgeably about a wide variety of topics, as well as chatting empathetically about ordinary life. Neural generation plays a key role in achieving these goals, providing the backbone for our conversational and emotional tone. At the end of the competition, Chirpy Cardinal progressed to the finals with an average rating of 3.6/5.0, a median conversation duration of 2 minutes 16 seconds, and a 90th percentile duration of over 12 minutes.
Ashwin Paranjape, Abigail See, Kathleen Kenealy, Haojun Li, Amelia Hardy, Peng Qi, Kaushik Ram Sadagopan, Nguyet Minh Phu, Dilara Soylu, Christopher D. Manning
Czech Technical University in Prague - Alquist
Alquist 3.0: Alexa Prize Bot Using Conversational Knowledge Graph
The third version of the open-domain dialogue system Alquist, developed within the Alexa Prize 2020 competition, is designed to conduct coherent and engaging conversations on popular topics. The main novel contribution is the introduction of a system leveraging an innovative approach based on a conversational knowledge graph and adjacency pairs. The conversational knowledge graph allows the system to utilize knowledge expressed during the dialogue in subsequent turns and across conversations. Dialogue adjacency pairs divide the conversation into small conversational structures, which can be combined and allow the system to react flexibly to a wide range of user inputs. We discuss and describe Alquist’s pipeline, data acquisition and processing, dialogue manager, NLG, knowledge aggregation, and a hierarchy of adjacency pairs. We present the experimental results of the individual parts of the system.
Jan Pichl, Petr Marek, Jakub Konrád, Petr Lorenc, Van Duy Ta, Jan Šedivý
University of California, Davis - Gunrock
Gunrock 2.0: A User Adaptive Social Conversational System
Gunrock 2.0 is built on top of Gunrock with an emphasis on user adaptation. Gunrock 2.0 combines various neural natural language understanding modules, including named entity detection, linking, and dialog act prediction, to improve user understanding. Its dialog management is a hierarchical model that handles various topics, such as movies, music, and sports. The system-level dialog manager can handle question detection, acknowledgment, error handling, and additional functions, making downstream modules much easier to design and implement. The dialog manager also adapts its topic selection to accommodate different users’ profile information, such as inferred gender and personality. The generation model is a mix of templates and neural generation models. Gunrock 2.0 is able to achieve an average rating of 3.73 at its latest build from May 29th to June 4th.
Kaihui Liang, Austin Chau, Yu Li, Xueyuan Lu, Dian Yu, Mingyang Zhou, Ishan Jain, Sam Davidson, Josh Arnold, Minh Nguyen, Zhou Yu
University of California, Santa Cruz - Athena
Athena: Constructing Dialogues Dynamically with Discourse Constraints
This report describes Athena, a dialogue system for spoken conversation on popular topics and current events. We develop a flexible topic-agnostic approach to dialogue management that dynamically configures dialogue based on general principles of entity and topic coherence. Athena’s dialogue manager uses a contract-based method where discourse constraints are dispatched to clusters of response generators. This allows Athena to procure responses from dynamic sources, such as knowledge graph traversals and feature-based on-the-fly response retrieval methods. After describing the dialogue system architecture, we perform an analysis of conversations that Athena participated in during the 2019 Alexa Prize Competition. We conclude with a report on several user studies we carried out to better understand how individual user characteristics affect system ratings.
Vrindavan Harrison, Juraj Juraska, Wen Cui, Lena Reed, Kevin K. Bowden, Jiaqi Wu, Brian Schwarzmann, Abteen Ebrahimi, Rishi Rajasekaran, Nikhil Varghese, Max Wechsler-Azen, Steve Whittaker, Jeffrey Flanigan, Marilyn Walker
Carnegie Mellon University - Tartan
Tartan: A Two-Tiered Dialog Framework For Multi-Domain Social Chitchat
Tartan is a social bot that engages users in sharing daily personal experiences in multiple domains. Our work contributes to Conversational AI in two aspects: 1) We extract common-sense knowledge expressed in large-scale user utterances in conversations, and find that more than 20% of the shared information is related to personal life, such as social relationships and individual activities. 2) Based on the underlying structure of daily life common sense knowledge, we decompose the task of open-domain social chat into a dialog management problem over a set of independent topical bots. In addition to analysis of the effectiveness of the critical components in our design, we also present analysis on the breadth of common sense knowledge expressed in conversational language and the depth of conversations that can be grounded on common sense knowledge.
Fanglin Chen, Ta-Chung Chi, Shiyang Lyu, Jianchen Gong, Tanmay Parekh, Rishabh Joshi, Anant Kaushik, Alexander Rudnicky
Moscow Institute of Physics and Technology - DREAM
DREAM technical report for the Alexa Prize 2019
Building a dialogue system able to talk fluently and meaningfully in an open domain conversation is one of the foundational challenges in the field of AI. Recent progress in NLP, driven by the application of deep neural networks and large language models, has opened new possibilities for solving many hard problems of conversational AI. The Alexa Prize Socialbot Grand Challenge gives a unique opportunity to test cutting-edge research ideas in a real-world setting. In this report, we outline the DREAM socialbot solution and present evaluation results. The DREAM socialbot is implemented as a multi-skill conversational agent with a modular micro-service architecture. The DREAM agent orchestrates a dozen text preprocessing annotators and more than 25 conversational skills to generate responses in the context of the open domain conversation. Feedback from Alexa users during the evaluation period allowed us to gradually develop our solution by increasing the number of conversational skills and improving the transition between them. As a result, dialogues became 50% longer, and the average rating grew from ∼3 during the initial stage in December’19 to ∼3.4 during the last two weeks of April’20. The final version of the DREAM socialbot is a hybrid system that combines rule-based, deep learning, and knowledge base driven components.
Yuri Kuratov, Idris Yusupov, Dilyara Baymurzina, Denis Kuznetsov, Daniil Cherniavskii, Alexander Dmitrievskiy, Elena Ermakova, Fedor Ignatov, Dmitry Karpov, Daniel Kornev, The Anh Le, Pavel Pugin, Mikhail Burtsev
University of California, Irvine - ZotBot
ZOTBOT: Using Reading Comprehension and Commonsense Reasoning in Conversational Agents
We describe the ZOTBOT system for open-ended conversations, designed for the Alexa Prize competition. We focus on two main shortcomings in existing conversational agents: lack of awareness in commonsense reasoning when responding to user utterances (resulting in nonsensical or uninteresting responses) and inability to understand semantics and converse naturally about fact-based articles in a compelling manner. First, we combine existing work in commonsense KBs, pretrained language models, and graph completion models to generate natural and intuitive responses, consistent with our commonsense knowledge, for open-domain utterances. Second, we utilize question generation models for both reading comprehension and conversational followups for discussing fact-based articles. These contributions are implemented within a system engineered to be modular, allowing easy injection of manually-scripted responses, as well as supporting a detailed logging and analysis system. We present examples and analyses highlighting the benefits and shortcomings of ZOTBOT, and conclude with the lessons learned for future research.
William Schallock, Daniel Agress, Yao Du, Dheeru Dua, Lyuyang Hu, Yoshitomo Matsubara, Sameer Singh
University of California, San Diego - Bernard
Bernard: A Stateful Neural Open-domain Socialbot
We propose Bernard: a framework for an engaging open-domain socialbot. While the task of open-domain dialog generation remains a difficult one, we explore various strategies to generate coherent dialog given an arbitrary dialog history. We incorporate a stateful autonomous dialog manager using non-deterministic finite automata to control multi-turn conversations. We show that powerful pretrained language models are capable of generating coherent and topical responses in the presence of grounding facts. Finally, we implement an Acknowledge-Retrieve-Reply strategy to combine template-based and neural dialog generation for greater diversity and increased naturalness. Extensive human evaluation shows that the combination of generative models and retrieval models in a stateful dialog machine can achieve desired user experiences in terms of topic diversity and engagingness.
Bodhisattwa Prasad Majumder, Shuyang Li, Jianmo Ni, Huanru Henry Mao, Sophia Sun, Julian McAuley
University of Michigan - Audrey
Audrey: A Personalized Open-Domain Conversational Bot
Conversational Intelligence requires that a person engage on informational, personal and relational levels. Advances in Natural Language Understanding have helped recent chatbots succeed at dialog on the informational level. However, current techniques still lag for conversing with humans on a personal level and fully relating to them. The University of Michigan’s submission to the Alexa Prize Grand Challenge 3, Audrey, is an open-domain conversational chat-bot that aims to engage customers on these levels through interest driven conversations guided by customers’ personalities and emotions. Audrey is built from socially-aware models such as Emotion Detection and a Personal Understanding Module to grasp a deeper understanding of users’ interests and desires. Our architecture interacts with customers using a hybrid approach balanced between knowledge-driven response generators and context-driven neural response generators to cater to all three levels of conversations. During the semi-finals period, we achieved an average cumulative rating of 3.25 on a 1-5 Likert scale.
Chung Hoon Hong, Yuan Liang, Sagnik Sinha Roy, Arushi Jain, Vihang Agarwal, Ryan Draves, Zhizhuo Zhou, William Chen, Yujian Liu, Martha Miracky, Lily Ge, Nikola Banovic, David Jurgens