Efficient inference for trees and alignments: Modeling monolingual and bilingual syntax with hard and soft constraints and latent variables
by Smith, David Arthur, Ph.D., THE JOHNS HOPKINS UNIVERSITY, 2010, 244 pages; 3440663

Abstract:

Much recent work in natural language processing treats linguistic analysis as an inference problem over graphs. This development opens up useful connections between machine learning, graph theory, and linguistics.

The first part of this dissertation formulates syntactic dependency parsing as a dynamic Markov random field with the novel ingredient of global constraints. Global constraints are enforced by calling combinatorial optimization algorithms as subroutines during message-passing inference in the graphical model, and these global constraints greatly improve on the accuracy of collections of local constraints. In particular, combinatorial subroutines enforce the constraint that the parser's output must form a tree. This is the first application that uses efficient computation of marginals for combinatorial problems to improve the speed and accuracy of belief propagation. If the dependency tree is projective, the tree constraint exploits the inside-outside algorithm; if non-projective, with discontiguous constituents, it exploits the directed matrix-tree theorem, here newly applied to NLP problems. Even with second-order features or latent variables, which would make exact parsing asymptotically slower or NP-hard, approximate inference with belief propagation is as efficient as a simple edge-factored parser times a constant factor. Furthermore, such features significantly improve parse accuracy over exact first-order methods. Incorporating additional features increases the runtime additively rather than multiplicatively.

The second part extends these models to capture correspondences among non-isomorphic structures. When bootstrapping a parser in a low-resource target language by exploiting a parser in a high-resource source language, models that score the alignment and the correspondence of divergent syntactic configurations in translational sentence pairs achieve higher accuracy in parsing the target language. These noisy (quasi-synchronous) mappings have further applications in adapting parsers across domains, in learning features of the syntax-semantics interface, and in question answering, paraphrasing, and information retrieval.

 
AdviserJason Eisner
SchoolTHE JOHNS HOPKINS UNIVERSITY
SourceDAI/B 72-03, p. , Feb 2011
Source TypeDissertation
SubjectsLinguistics; Computer science
Publication Number3440663
Adobe PDF Access the complete dissertation:
 

» Find an electronic copy at your library.
  Use the link below to access a full citation record of this graduate work:
  http://gateway.proquest.com/openurl%3furl_ver=Z39.88-2004%26res_dat=xri:pqdiss%26rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation%26rft_dat=xri:pqdiss:3440663
  If your library subscribes to the ProQuest Dissertations & Theses (PQDT) database, you may be entitled to a free electronic version of this graduate work. If not, you will have the option to purchase one, and access a 24 page preview for free (if available).

About ProQuest Dissertations & Theses
With over 2.3 million records, the ProQuest Dissertations & Theses (PQDT) database is the most comprehensive collection of dissertations and theses in the world. It is the database of record for graduate research.

The database includes citations of graduate works ranging from the first U.S. dissertation, accepted in 1861, to those accepted as recently as last semester. Of the 2.3 million graduate works included in the database, ProQuest offers more than 1.9 million in full text formats. Of those, over 860,000 are available in PDF format. More than 60,000 dissertations and theses are added to the database each year.

If you have questions, please feel free to visit the ProQuest Web site - http://www.proquest.com - or call ProQuest Hotline Customer Support at 1-800-521-3042.