Entity Resolution, the problem of resolving token sequences in text to discourse entities, is a key problem in Natural Language Processing, and statistical machine learning techniques are increasingly used to solve it.
One of the most commonly used frameworks for treating Entity Resolution as a supervised learning problem is the Mention-Pair model. In the first step, a trained classifier decides, for every pair of token sequences (mentions), whether the two refer to the same discourse entity. In the second step, coreference chains are built from these pairwise decisions.
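The two steps of the Mention-Pair model can be illustrated with a minimal sketch. It assumes the classifier's output is already available as a table of pairwise probabilities; the names `build_chains`, `pair_prob`, and `threshold`, as well as the toy mentions and scores, are hypothetical. The chain-building step shown here is the simplest possible one (link every pair above a threshold and take the transitive closure), not the method proposed in this dissertation.

```python
from itertools import combinations


def build_chains(mentions, pair_prob, threshold=0.5):
    """Step 2 of a mention-pair pipeline: merge mentions whose pairwise
    coreference probability exceeds `threshold`, then return the
    resulting chains (transitive closure via union-find)."""
    parent = {m: m for m in mentions}

    def find(m):
        # Follow parent pointers to the chain representative,
        # compressing the path along the way.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    # Step 1 is assumed done: pair_prob holds the classifier's scores.
    for a, b in combinations(mentions, 2):
        if pair_prob.get((a, b), 0.0) > threshold:
            parent[find(b)] = find(a)

    chains = {}
    for m in mentions:
        chains.setdefault(find(m), []).append(m)
    return sorted(chains.values())


# Toy input (illustrative values, not from any dataset).
mentions = ["Obama", "the president", "he", "Paris"]
pair_prob = {
    ("Obama", "the president"): 0.9,
    ("the president", "he"): 0.8,
    ("Obama", "he"): 0.3,
}
chains = build_chains(mentions, pair_prob)
print(chains)
```

Note that "Obama" and "he" end up in the same chain even though their own pairwise score is low: independent pairwise decisions can disagree with one another, which is one reason the chain-building step matters.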
In this dissertation, we present two supervised learning approaches to Entity Resolution. The first approach uses Affinity Propagation, a message-passing algorithm, to create coreference chains, taking as input the pairwise probabilities of coreference given by a trained classifier. This method also seeks to incorporate linguistic information such as Part-of-Speech tags.
The second approach examines methods that enforce transitivity within coreference chains while building them, and attempts to learn only over valid configurations. It uses the concept of pseudo-likelihood for this purpose.
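To make the notion of a valid configuration concrete, the sketch below checks a set of pairwise coreference links for transitivity violations: if mention A is linked to B and B to C, then A must also be linked to C. The function name `transitivity_violations` and the mention labels are hypothetical, and this only illustrates the constraint itself, not the learning procedure used in the dissertation.

```python
from itertools import combinations


def transitivity_violations(mentions, linked):
    """Return every triple of mentions in which exactly two of the three
    pairs are linked, i.e. where the pairwise labels are not transitive.
    `linked` is a set of frozensets, one per coreferent pair."""

    def same(a, b):
        return frozenset((a, b)) in linked

    bad = []
    for a, b, c in combinations(mentions, 3):
        # Zero, one, or three links are consistent; exactly two are not.
        if same(a, b) + same(b, c) + same(a, c) == 2:
            bad.append((a, b, c))
    return bad


mentions = ["m1", "m2", "m3"]
linked = {frozenset(("m1", "m2")), frozenset(("m2", "m3"))}
invalid = transitivity_violations(mentions, linked)

# Adding the missing link makes the configuration valid.
closed = linked | {frozenset(("m1", "m3"))}
valid = transitivity_violations(mentions, closed)
```

A configuration with an empty violation list corresponds to a proper partition of the mentions into chains.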
Both of our methods outperform the baselines we compare them against, and perform on par with, or better than, the other supervised learning approaches to Entity Resolution published on the SemEval Coreference Resolution Task dataset.
School: University of California, Irvine