Computer Science Department
School of Computer Science, Carnegie Mellon University


Annotating and Automatically Tagging
Constructions of Causal Language

Jesse Dunietz

Ph.D. Thesis

February 2018


Keywords: Natural Language Processing, Semantics, Construction Grammar, Causality, Computational Linguistics, Linguistic Annotation

Automatically extracting relationships such as causality from text presents a challenge at the frontiers of natural language processing. This thesis focuses on annotating and automatically tagging causal expressions and their cause and effect arguments.

One popular paradigm for such tasks is SHALLOW SEMANTIC PARSING: marking relations and their arguments in text. Efforts to date have focused on individual propositions expressed by individual words. While this approach has been fruitful, it falters on semantic relationships that can be expressed by more complex linguistic patterns than words. It also struggles when multiple meanings are entangled in the same expression. Causality exhibits both challenges: it can be expressed using a variety of words, multi-word expressions, or even complex patterns spanning multiple clauses. Additionally, causality competes for linguistic space with phenomena such as temporal relations and obligation (e.g., "allow" can indicate causality, permission, or both).

To expand shallow semantic parsing to such challenging relations, this thesis presents approaches based on the linguistic paradigm known as CONSTRUCTION GRAMMAR (CxG). CxG places arbitrarily complex form/function pairings called CONSTRUCTIONS at the heart of both syntax and semantics. Because constructions pair meanings with arbitrary forms, CxG allows predicates to be expressed by any linguistic pattern, no matter how complex. This thesis advocates for a new "surface construction labeling" (SCL) approach to applying CxG: given a relation of interest, such as causality, we annotate just the words that consistently signal a construction expressing that relation. Then, to automatically tag such constructions and their arguments, we need not wait for automated CxG tools that can analyze all the underlying grammatical constructions. Instead, we can build on top of existing tools, approximating the underlying constructions with patterns of words and conventional linguistic categories.

The contributions of this thesis include a CxG-based annotation scheme and methodology for annotating explicit causal relations in English; an annotated corpus based on this scheme; and three methods for automatically tagging causal constructions. The first two tagging methods use a pipeline architecture based on tentative pattern-matching to combine automatically induced rules with statistical classifiers. The third method is a transition-based deep neural network. The thesis demonstrates the promise of these methods, discusses the tradeoffs of each, and suggests future applications and extensions.

Thesis Committee:
Jaime Carbonell (Co-chair)
Lori Levin (Co-chair)
Eduard Hovy
Nianwen Xue (Brandeis University)

Frank Pfenning, Head, Computer Science Department
Andrew W. Moore, Dean, School of Computer Science

207 pages
