CMU-HCII-21-102
Human-Computer Interaction Institute
School of Computer Science, Carnegie Mellon University
A Multi-Modal Intelligent Agent that Learns from Demonstrations and Natural Language Instructions
Toby Jia-Jun Li
May 2021
Ph.D. Thesis
To address this problem, my collaborators and I designed SUGILITE, a new intelligent agent that allows end users to teach new tasks and concepts in a natural way. SUGILITE uses a multi-modal approach that combines programming by demonstration (PBD) with learning from natural language instructions to support end-user development for intelligent agents. Lab usability evaluations showed that the SUGILITE prototype allowed users with little or no programming expertise to successfully teach the agent common smartphone tasks such as ordering coffee, booking restaurants, and checking sports scores, as well as the conditionals for triggering these actions and the task-relevant concepts. This dissertation presents a new human-AI interaction paradigm for interactive task learning, in which existing third-party app GUIs serve as a medium for users to communicate their intents to an AI agent, in addition to being the interface for interacting with the underlying computing services.
Through the development of the integrated SUGILITE system over the past five years, this dissertation makes seven main technical contributions: (i) a new approach that allows the agent to generalize from learned task procedures by inferring task parameters and their possible values from verbal instructions and mobile app GUIs, (ii) a new method that addresses the data description problem in PBD by allowing users to verbally explain ambiguous or vague demonstrated actions, (iii) a new multi-modal interface that enables users to teach the agent the conceptual knowledge used in conditionals, (iv) a new mechanism that extends mobile-app-based PBD to smart home and Internet of Things (IoT) automation, (v) a new multi-modal interface that helps users discover, identify the causes of, and recover from conversational breakdowns, using existing mobile app GUIs for grounding, (vi) a new privacy-preserving approach that identifies and obfuscates potential personal information in GUI-based PBD scripts based on the uniqueness of information entries with respect to the corresponding app GUI context, and (vii) a new self-supervised technique for generating semantic embedding-vector representations of GUI screens and components without requiring manual annotation.
221 pages
Jodi Forlizzi, Head, Human-Computer Interaction Institute