CMU-CS-20-111
Computer Science Department
School of Computer Science, Carnegie Mellon University




Injecting output constraints into neural NLP models

Jay Yoon Lee

Ph.D. Thesis

July 2020

CMU-CS-20-111.pdf


Keywords: Structured Prediction, Hard constraint injection, Arbitrary knowledge injection, Natural Language Processing, Multi-task Learning, Transfer Learning, Domain adaptation, Semantic Role Labeling, Span-based models

The goal of this thesis is to inject prior knowledge and constraints into neural models, primarily for natural language processing (NLP) tasks. While neural models have set new state-of-the-art performance on many tasks, from computer vision to NLP, they often fail to learn to consistently produce well-formed structures unless an immense amount of training data is available. This thesis argues that not every aspect of a model has to be learned from the data itself, and shows that injecting simple knowledge and constraints into neural models helps in low-resource and out-of-domain settings, and can also improve state-of-the-art models.

This thesis focuses on structural knowledge of the output space and injects knowledge of correct or preferred structures as an objective, without any modification to the model architecture, in a model-agnostic way. The first benefit of focusing on knowledge of the output space is that it is intuitive: we can directly require outputs to satisfy logical or linguistic constraints. Another advantage is that structural knowledge often does not require a labeled dataset.

Focusing on deterministic constraints on output values, this thesis first applies output constraints at inference time via gradient-based inference (GBI). In the spirit of gradient-based training, GBI enforces constraints for each input at test time by optimizing continuous model weights until the network's inference procedure produces an output that satisfies the constraints.
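
As a concrete illustration (a minimal sketch, not the thesis's implementation), the following PyTorch-style code shows one way GBI could be realized: for a single test input, an instance-specific copy of the weights is optimized against a differentiable constraint-violation penalty, with a KL term keeping the output close to the original prediction. The names model, constraint_violation, and the hyperparameters are illustrative assumptions.

    # Illustrative sketch of gradient-based inference (GBI); not the thesis code.
    import copy
    import torch
    import torch.nn.functional as F

    def gbi_decode(model, x, constraint_violation, steps=30, lr=0.1, alpha=1.0):
        """Optimize an instance-specific copy of the weights until the constraint holds."""
        model_i = copy.deepcopy(model)               # per-input weights; original model untouched
        opt = torch.optim.SGD(model_i.parameters(), lr=lr)
        with torch.no_grad():
            ref = F.softmax(model(x), dim=-1)        # original prediction, used as an anchor
        for _ in range(steps):
            probs = F.softmax(model_i(x), dim=-1)
            violation = constraint_violation(probs)  # differentiable penalty, zero iff output is valid
            if violation.item() <= 0.0:
                break                                # constraint satisfied: stop early
            loss = violation + alpha * F.kl_div(probs.log(), ref, reduction="batchmean")
            opt.zero_grad()
            loss.backward()
            opt.step()
        with torch.no_grad():
            return model_i(x).argmax(dim=-1)         # decode from the adjusted weights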

This thesis then shows that constraint injection at inference time can be extended to training time: from instance-specific optimization at test time to generalization over multiple instances at training time. For training with structural constraints, this thesis presents (1) a structural constraint loss, (2) a joint objective combining the structural loss with a supervised loss on a training set, and (3) a joint objective in a semi-supervised setting. All three loss functions yield improvements; among them, the semi-supervised approach shows the largest gains and is particularly effective in low-resource settings. The analysis shows that train-time and inference-time efforts are complementary rather than exclusive: performance is best when the two are combined.
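
A minimal sketch of such a joint semi-supervised objective, assuming the same differentiable constraint-violation penalty as above, a sequence tagger that emits per-token logits, and standard per-token cross-entropy; the function names and the weighting scheme are illustrative, not the thesis's exact formulation.

    # Illustrative sketch of a joint supervised + structural objective; not the thesis code.
    import torch
    import torch.nn.functional as F

    def joint_step(model, opt, labeled_batch, unlabeled_x, constraint_violation, lam=0.5):
        x, y = labeled_batch                          # y: gold tag ids, shape (batch, seq_len)
        logits = model(x)                             # shape (batch, seq_len, num_tags)
        sup_loss = F.cross_entropy(logits.transpose(1, 2), y)   # supervised term on labeled data
        u_probs = F.softmax(model(unlabeled_x), dim=-1)
        struct_loss = constraint_violation(u_probs)   # structural term on unlabeled data
        loss = sup_loss + lam * struct_loss           # semi-supervised joint objective
        opt.zero_grad()
        loss.backward()
        opt.step()
        return loss.item()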

Finally, this thesis presents an agreement constraint for multi-view learning that can exploit the same semi-supervised approach. The agreement constraint is general in that it applies to any sequence-labeling problem with multiple views, whereas the other constraints in this thesis encode prior knowledge about specific tasks. This semi-supervised approach again shows large gains in low-resource settings and remains effective in high-resource settings as well.
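
A hedged sketch of one possible agreement penalty, assuming two view-specific models that emit per-token distributions over the same label set for the same sentence; the symmetric-KL choice and the names model_a / model_b are illustrative rather than the thesis's exact formulation.

    # Illustrative sketch of an agreement penalty between two views; not the thesis formulation.
    import torch
    import torch.nn.functional as F

    def agreement_loss(model_a, model_b, x_view_a, x_view_b):
        p = F.softmax(model_a(x_view_a), dim=-1)      # view A: per-token label distribution
        q = F.softmax(model_b(x_view_b), dim=-1)      # view B: same label set, same tokens
        # symmetric KL: penalize the two views for disagreeing on unlabeled inputs
        return 0.5 * (F.kl_div(q.log(), p, reduction="batchmean")
                      + F.kl_div(p.log(), q, reduction="batchmean"))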

126 pages

Thesis Committee:
William W. Cohen (Co-Chair)
Jaime G. Carbonell (Co-Chair)
Graham Neubig
Yulia Tsvetkov
Dan Roth (University of Pennsylvania)

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science

