Algorithms and inference for mixture models with application to protein sequence analysis
by Fong, Youyi, Ph.D., UNIVERSITY OF WASHINGTON, 2010, 103 pages; 3406120

Abstract:

Mixture model-based clustering is a commonly used statistical tool. The first part of my dissertation describes new search algorithms for finding the partition that maximizes a criterion function, and new Markov chain Monte Carlo algorithms for drawing partitions from a target distribution. These algorithms are based on a neighborhood pruning technique that incorporates bottom-up hierarchical clustering methods. The second part of my dissertation gives a new estimator of mixture order for multivariate categorical data. The estimator is related to finding mixture order via Bayes factors. The finite-sample performance of the estimator is good. Its large-sample behavior can be analyzed using rate distortion theory, and the estimator is conjectured not to overestimate mixture order asymptotically. The third part of my dissertation uses a Bayesian mixture profile hidden Markov model to find the subfamilies in a protein family. Applications to simulated and real datasets show that meaningful partitions with the correct number of components can be identified. As subfamilies usually differ in their functions, valuable insights can be gained through this cluster analysis.

 
Advisers: Jonathan C. Wakefield; Kenneth M. Rice
School: UNIVERSITY OF WASHINGTON
Source: DAI/B 71-05, p. , May 2010
Source Type: Dissertation
Subjects: Biostatistics; Statistics
Publication Number: 3406120
  Use the link below to access a full citation record of this graduate work:
  http://gateway.proquest.com/openurl%3furl_ver=Z39.88-2004%26res_dat=xri:pqdiss%26rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation%26rft_dat=xri:pqdiss:3406120
