CMU-ML-19-110
Machine Learning Department
School of Computer Science, Carnegie Mellon University



Towards Literate Artificial Intelligence

Mrinmaya Sachan

June 2019

Ph.D. Thesis


Keywords: Machine Learning, Natural Language Understanding, Computational Linguistics, Question Answering, Knowledge Representation and Reasoning, Machine Comprehension


Standardized tests are used to assess students as they progress through the formal education system. These tests are readily available and have clear evaluation procedures; hence, it has been proposed that they can serve as good benchmarks for AI. In this thesis, we propose approaches for solving some common standardized tests taken by students, such as reading comprehension tests, elementary science exams, geometry questions in the SAT exam, and mechanics questions in the AP Physics exam. Answering these test problems requires deep linguistic (and sometimes visual) understanding and reasoning capabilities, which are challenging for modern AI systems.

In the first part of this thesis, we explore novel approaches to answering natural language comprehension tests such as reading comprehension and elementary science tests (chapters 4, 5, and 6). These tests evaluate a system's ability to understand text through a question-answering task. We present new latent structure models for these tasks. We posit that there is a hidden (latent) structure that explains the relation between the question, the correct answer, and the piece of text. We call this the answer-entailing structure: given the structure, the correctness of the answer is evident. Since the structure is latent, it must be inferred. We present a unified max-margin framework that learns to find these hidden structures from a corpus of question-answer pairs and uses what it learns to answer questions about novel texts. We also describe simple but effective extensions of this framework: incorporating multi-task learning over the subtasks required to perform the overall task (chapter 4), using a deeper representation of language based on AMRs (chapter 5), and incorporating external knowledge into the answer-entailing structure (chapter 6). These advances help us obtain state-of-the-art performance on two well-known natural language comprehension benchmarks.
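
To make the latent max-margin idea concrete, here is a minimal sketch in Python. It is not the thesis implementation: the feature vectors, candidate structures, and dimensions are hypothetical stand-ins. The score of an answer is the best score over its latent answer-entailing structures, and training penalizes any incorrect answer whose best structure scores within a margin of the correct answer's best structure.

    import numpy as np

    def score(w, phi):
        # Linear score w . phi(question, answer, text, structure).
        return float(np.dot(w, phi))

    def best_structure_score(w, structures):
        # Max over latent structures; each candidate is a feature vector.
        return max(score(w, phi) for phi in structures)

    def latent_margin_loss(w, correct_structures, wrong_answer_structures, margin=1.0):
        # Hinge loss of a latent structural SVM: margin between the best
        # structure for the correct answer and the best structure for
        # every incorrect answer.
        s_pos = best_structure_score(w, correct_structures)
        return sum(max(0.0, margin + best_structure_score(w, structs) - s_pos)
                   for structs in wrong_answer_structures)

    # Toy usage with hypothetical 4-dimensional feature vectors.
    rng = np.random.default_rng(0)
    w = rng.normal(size=4)
    correct = [rng.normal(size=4) for _ in range(3)]                    # structures for the right answer
    wrong = [[rng.normal(size=4) for _ in range(2)] for _ in range(3)]  # structures per wrong answer
    print(latent_margin_loss(w, correct, wrong))

In the thesis this loss is minimized over question-answer pairs, with the inner maximization inferring the hidden structure for each candidate answer.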

In the second part of this thesis (chapter 7), we tackle some hard reasoning problems in the domains of math and science: geometry questions in the SAT exam and mechanics questions in the AP Physics exam. Solving these problems requires the ability to incorporate rich domain knowledge as well as the ability to reason with this knowledge. We propose a parsing-to-programs (P2P) approach for these problems. P2P assumes a formal representation language for the domain and domain knowledge written down as programs. This domain knowledge can be provided manually by a domain expert or, as we show in our work, extracted automatically by reading textbooks. When presented with a question, P2P learns a representation of the question in the formal language via a multi-modal semantic parser. It then uses the formal question interpretation and the domain knowledge to obtain an answer with a probabilistic reasoner.
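
The following Python sketch illustrates the shape of the P2P pipeline on a toy geometry question. The parser, the formal language, and the reasoning step here are hypothetical placeholders (a real P2P system uses a learned multi-modal parser and a probabilistic reasoner over a much richer axiom set), shown only to make the parse-then-reason decomposition concrete.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Literal:
        # One fact in the formal representation language, e.g. length("AB", 3.0).
        predicate: str
        args: tuple

    def parse_question(text):
        # Stand-in for the learned multi-modal semantic parser:
        # maps question text (and diagram) to formal-language literals.
        return [Literal("length", ("AB", 3.0)),
                Literal("length", ("BC", 4.0)),
                Literal("length", ("CA", 5.0))]

    def apply_domain_knowledge(literals):
        # Domain knowledge written as a program: here, a single toy axiom
        # deriving a triangle's perimeter from its three side lengths.
        derived = list(literals)
        lengths = {l.args[0]: l.args[1] for l in literals if l.predicate == "length"}
        if {"AB", "BC", "CA"} <= lengths.keys():
            derived.append(Literal("perimeter", ("ABC", sum(lengths.values()))))
        return derived

    def solve(question_text):
        interpretation = parse_question(question_text)   # parsing...
        facts = apply_domain_knowledge(interpretation)   # ...to programs
        return [f for f in facts if f.predicate == "perimeter"]

    print(solve("In triangle ABC, AB = 3, BC = 4 and CA = 5. Find the perimeter."))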

A key bottleneck in building these models is the amount of domain-specific supervision they require. Thus, in the final part of this thesis (chapter 8), we propose a self-training method based on curriculum learning that jointly learns to generate and answer questions. This method yields near state-of-the-art models on a number of natural language comprehension tests with less supervision.
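
A minimal sketch of this self-training loop, assuming hypothetical generator and answerer components (the toy classes below are illustrations, not the thesis models): a generator proposes question-answer pairs from unlabeled text, and the answering model is trained on the pairs it is most confident about first, with the curriculum expanding over rounds.

    def self_train(texts, generate, answerer, rounds=5):
        # Pool of generated (question, answer) pairs from unlabeled text.
        pool = [qa for t in texts for qa in generate(t)]
        for r in range(rounds):
            # Curriculum: rank pairs by the answerer's confidence and keep
            # a growing fraction, easiest first.
            ranked = sorted(pool, key=answerer.confidence, reverse=True)
            keep = ranked[: max(1, int(len(ranked) * (r + 1) / rounds))]
            answerer.fit(keep)
        return answerer

    class ToyAnswerer:
        def __init__(self):
            self.training_set = []
        def confidence(self, qa):
            # Toy proxy for model confidence: shorter questions count as easier.
            return -len(qa[0])
        def fit(self, pairs):
            # A real model would be retrained here; we just record the curriculum.
            self.training_set = list(pairs)

    def toy_generator(text):
        # Stand-in question generator: one cloze-style pair per sentence.
        for sentence in text.split("."):
            words = sentence.split()
            if len(words) > 1:
                yield (" ".join(words[:-1]) + " ___?", words[-1])

    model = self_train(["Newton proposed three laws. Force equals mass times acceleration."],
                       toy_generator, ToyAnswerer(), rounds=3)
    print(model.training_set)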

155 pages

Thesis Committee:
Eric P. Xing (Chair)
Jaime Carbonell
Tom Mitchell
Dan Roth (University of Pennsylvania)

Roni Rosenfeld, Head, Machine Learning Department
Tom M. Mitchell, Interim Dean, School of Computer Science
