ProQuest® Dissertations & Theses

Methods for haplotype construction and their applications
by Ayers, Kristin Lynn, Ph.D., UNIVERSITY OF CALIFORNIA, LOS ANGELES, 2008, 138 pages; 3316973
 

Abstract:

Haplotypes are frequently used in association testing and can improve the power to detect a disease locus. The EM algorithm is a widely used method for estimating haplotype frequencies in short regions that show linkage disequilibrium. However, the optimal size of these regions, referred to as blocks or windows, remains an open question when imputing maternal and paternal haplotypes. We propose two methods to improve haplotype imputation. Chapters 2 and 3 describe a dictionary model for haplotyping and its applications. Under this model, a haplotype is constructed by randomly concatenating segments drawn from a given dictionary of haplotype segments. The dictionary model produces a parsimonious list of overlapping haplotype segments, which may parallel what remains of full-length ancestral haplotypes after recombination and mutation have broken them into smaller fragments. Likelihood evaluations rely on forward and backward recurrences similar to those encountered in hidden Markov models, and parameters are estimated with the EM algorithm.
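
The forward and backward recurrences can be illustrated with a toy version of the dictionary model. This is a minimal sketch under simplifying assumptions that are ours, not the thesis's: haplotypes are already phased and written as 0/1 strings, each segment w is drawn independently with a hypothetical probability q[w] regardless of its position, and the function names `forward`, `backward`, `expected_counts`, and `em_step` are illustrative only. The forward value f[i] sums, over every way of spelling the first i alleles as a concatenation of dictionary segments, the product of the segment probabilities; the backward value b[i] does the same for the remaining suffix. Combining them gives the posterior expected usage of each segment, and one EM update renormalizes those expected counts.

```python
# Toy dictionary model (assumed simplification): segments are position-free
# 0/1 strings drawn i.i.d. with probabilities q[w]; input haplotypes are phased.
# Assumes every haplotype can be spelled by the dictionary (e.g. "0" and "1" are in it).


def forward(hap, q):
    """f[i] = total probability of all segmentations of hap[:i] into dictionary words."""
    n = len(hap)
    f = [0.0] * (n + 1)
    f[0] = 1.0
    for i in range(1, n + 1):
        for w, qw in q.items():
            k = len(w)
            if k <= i and hap[i - k:i] == w:
                f[i] += f[i - k] * qw
    return f


def backward(hap, q):
    """b[i] = total probability of all segmentations of hap[i:] into dictionary words."""
    n = len(hap)
    b = [0.0] * (n + 1)
    b[n] = 1.0
    for i in range(n - 1, -1, -1):
        for w, qw in q.items():
            k = len(w)
            if i + k <= n and hap[i:i + k] == w:
                b[i] += qw * b[i + k]
    return b


def expected_counts(hap, q):
    """Posterior expected number of times each segment is used to spell hap."""
    f, b = forward(hap, q), backward(hap, q)
    total = f[len(hap)]  # likelihood of hap under the toy model; equals b[0]
    counts = {w: 0.0 for w in q}
    for i in range(1, len(hap) + 1):
        for w, qw in q.items():
            k = len(w)
            if k <= i and hap[i - k:i] == w:
                # segment w occupies hap[i-k:i]: prefix * segment * suffix
                counts[w] += f[i - k] * qw * b[i] / total
    return counts


def em_step(haps, q):
    """One EM update: q[w] proportional to expected usage summed over all haplotypes."""
    totals = {w: 0.0 for w in q}
    for hap in haps:
        for w, c in expected_counts(hap, q).items():
            totals[w] += c
    norm = sum(totals.values())
    return {w: c / norm for w, c in totals.items()}
```

For example, with q = {"0": 0.3, "1": 0.3, "01": 0.4}, the haplotype "01" has two segmentations, 0|1 and 01, so its likelihood is 0.3 · 0.3 + 0.4 = 0.49, and the segment "01" is expected to be used 0.4 / 0.49 times. Iterating `em_step` shifts probability toward segments that explain many haplotypes, which is the intuition behind the parsimonious dictionary.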

The estimated dictionary segments can then be used to haplotype (or phase) individuals and to impute missing genotypes by Markov chain Monte Carlo (MCMC). The true pair of haplotypes underlying a person's multimarker genotype is reconstructed with a Markov chain that visits haplotype pairs according to their posterior probabilities. The dictionary model also yields expected counts of conserved haplotype segments, which can serve as genetic predictors in association testing.
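
The phasing step can be sketched as a Metropolis sampler over haplotype pairs. For brevity this sketch replaces the dictionary-model posterior with a plain, hypothetical table of haplotype frequencies `freqs`; in the thesis the target distribution comes from the fitted dictionary model. It further assumes Hardy-Weinberg equilibrium, so an ordered pair (h1, h2) has weight freqs[h1] · freqs[h2], codes genotypes as allele-1 counts (0/1/2) per SNP, and proposes a move by flipping the phase at one randomly chosen heterozygous site. Because that proposal is symmetric, the acceptance probability is just the ratio of the proposed weight to the current one. The function names and the `floor` parameter are illustrative.

```python
import random
from collections import Counter


def build_pair(genotype, phase):
    """Turn a genotype (allele-1 counts 0/1/2 per SNP) plus one phase bit per
    heterozygous site into an ordered haplotype pair."""
    h1, h2 = [], []
    bits = iter(phase)
    for g in genotype:
        if g == 1:
            b = next(bits)
            h1.append(b)
            h2.append(1 - b)
        else:
            h1.append(g // 2)  # homozygous: 0 -> allele 0, 2 -> allele 1
            h2.append(g // 2)
    return tuple(h1), tuple(h2)


def sample_phase(genotype, freqs, n_steps=5000, seed=0, floor=1e-12):
    """Metropolis sampler over phasings; returns visit counts per unordered pair.

    freqs: hypothetical haplotype-frequency table (stand-in for the dictionary model).
    floor: small weight for haplotypes absent from freqs, so ratios stay defined.
    """
    rng = random.Random(seed)
    n_het = sum(1 for g in genotype if g == 1)
    phase = [rng.randint(0, 1) for _ in range(n_het)]

    def weight(ph):
        h1, h2 = build_pair(genotype, ph)
        return max(freqs.get(h1, 0.0), floor) * max(freqs.get(h2, 0.0), floor)

    current = weight(phase)
    visits = Counter()
    for _ in range(n_steps):
        if n_het > 0:
            proposal = list(phase)
            j = rng.randrange(n_het)
            proposal[j] = 1 - proposal[j]  # symmetric proposal: flip one site
            w_new = weight(proposal)
            if rng.random() < w_new / current:  # Metropolis acceptance
                phase, current = proposal, w_new
        h1, h2 = build_pair(genotype, phase)
        visits[tuple(sorted((h1, h2)))] += 1
    return visits
```

Visit frequencies approximate the posterior probabilities of the unordered haplotype pairs, and the most visited pair can serve as the reconstructed phase. Imputing missing genotypes would require an extra move type that also updates the unobserved alleles, which this sketch omits.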

Chapter 4 proposes a diversity penalty for the EM algorithm commonly used for haplotype frequency estimation. The standard EM algorithm can accommodate the penalty once it is generalized to an MM (minorize-maximize) scheme. By enforcing parsimony in the estimated haplotype frequencies, our MM algorithm can improve haplotype frequency estimation, haplotyping, and missing-data imputation. The penalty automatically and quickly discards candidate haplotypes with low explanatory power. The new MM algorithm converges in fewer iterations, dramatically reduces the computational complexity of each iteration, and eliminates marginal haplotypes from further consideration. Compared with naive application of the EM algorithm, imposing the diversity penalty greatly reduces computation time and modestly improves haplotyping and genotype imputation.
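
Why EM is a special case of MM, and where a penalty can enter, follows from the standard minorization behind the haplotype-frequency EM. The derivation below assumes Hardy-Weinberg equilibrium and writes S_i for the set of ordered haplotype pairs compatible with person i's genotype, n for the number of genotyped people, and P(p) for a generic penalty with weight λ. The specific form of the thesis's diversity penalty is not stated in this abstract, so P is left abstract here.

```latex
\begin{align*}
L(p) &= \sum_{i=1}^{n} \log \sum_{(h,k)\in S_i} p_h\, p_k, \\
w_{ihk}^{(m)} &= \frac{p_h^{(m)} p_k^{(m)}}{\sum_{(u,v)\in S_i} p_u^{(m)} p_v^{(m)}}, \\
L(p) &\ge g(p \mid p^{(m)}) = \sum_{i=1}^{n} \sum_{(h,k)\in S_i} w_{ihk}^{(m)}
        \log \frac{p_h\, p_k}{w_{ihk}^{(m)}}
      = \sum_h c_h^{(m)} \log p_h + \text{const}, \\
c_h^{(m)} &= \sum_{i=1}^{n} \sum_{(u,v)\in S_i} w_{iuv}^{(m)}
        \bigl(\mathbf{1}\{u=h\} + \mathbf{1}\{v=h\}\bigr), \\
p_h^{(m+1)} &= \frac{c_h^{(m)}}{2n}
  \quad\text{(maximizer of } g \text{ subject to } \textstyle\sum_h p_h = 1\text{)}.
\end{align*}
% Penalized objective f(p) = L(p) - \lambda P(p). For any Q(p | p^{(m)}) >= P(p)
% with equality at p = p^{(m)}, let G = g - \lambda Q and let p^{(m+1)} maximize G:
\[
f(p^{(m+1)}) \ge G(p^{(m+1)} \mid p^{(m)}) \ge G(p^{(m)} \mid p^{(m)}) = f(p^{(m)}).
\]
```

The inequality follows from concavity of the logarithm (Jensen), with equality at p = p^(m), so the first block is exactly the familiar EM update. The last line is the general MM ascent argument: any penalty that admits such a majorizer Q fits into the same iteration without losing monotonicity. Which penalty produces the haplotype-pruning behavior described in Chapter 4 is not specified here.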

 
Advisor: Lange, Kenneth
School: UNIVERSITY OF CALIFORNIA, LOS ANGELES
Source: DAI-B 69/07, p. , Jan 2009
Source Type: Ph.D.
Subjects: Biostatistics; Genetics; Bioinformatics
Publication Number: 3316973
     