CMU-CS-03-216
Computer Science Department
School of Computer Science, Carnegie Mellon University
Modeling Syntax for Parsing and Translation
Peter Venable
December 2003
Ph.D. Thesis
Keywords: Statistical, syntax, parsing, translation
Syntactic structure is an important component of natural language
utterances, shaping both their form and their content. Therefore, a variety of
applications can benefit from the integration of syntax into their
statistical models of language. In this thesis, two new syntax-based
models are presented, along with their training algorithms: a
monolingual generative model of sentence structure, and a model of the
relationship between the structure of a sentence in one language and
the structure of its translation into another language. After these
models are trained and tested on the respective tasks of monolingual
parsing and word-level bilingual corpus alignment, they are
demonstrated in two additional applications. First, a new statistical
parser is automatically induced for a language in which none was
available, using a bilingual corpus. Second, a statistical
translation system is augmented with syntax-based models. Thus, the
contributions of this thesis include: a statistical parsing system; a
bilingual parsing system, which infers a structural relationship
between two languages using a bilingual corpus; a method for
automatically building a parser for a language where no parser is
available; and a translation model that incorporates phrase structure.
130 pages