Geometry of maximum likelihood estimation in Gaussian graphical models
by Uhler, Caroline, Ph.D., UNIVERSITY OF CALIFORNIA, BERKELEY, 2011, 114 pages; 3499094

Abstract:

Algebraic statistics uses algebraic techniques to develop new paradigms and algorithms for data analysis. The development of computational algebra software provides a powerful tool for analyzing statistical models. In Part I of this thesis, we use methods from computational algebra and algebraic geometry to study Gaussian graphical models. Algebraic methods have proven to be useful for statistical theory and applications alike. We describe a particular application to computational biology in Part II.

Part I of this thesis investigates geometric aspects of maximum likelihood estimation in Gaussian graphical models. More generally, we study multivariate normal models that are described by linear constraints on the inverse of the covariance matrix. Maximum likelihood estimation for such models leads to the problem of maximizing the determinant function over a spectrahedron, and to the problem of characterizing the image of the positive definite cone under an arbitrary linear projection. In Chapter 2, we examine these problems at the interface of statistics and optimization from the perspective of convex algebraic geometry and characterize the cone of all sufficient statistics for which the maximum likelihood estimator (MLE) exists. In Chapter 3, we develop an algebraic elimination criterion, which allows us to find exact lower bounds on the number of observations needed to ensure that the MLE exists with probability one. This is applied to bipartite graphs, grids and colored graphs. We also present the first instance of a graph for which the MLE exists with probability one even when the number of observations equals the treewidth. Computational algebra software can be used to study graphs with a limited number of vertices and edges. In Chapter 4, we study the problem of existence of the MLE from an asymptotic point of view by fixing a class of graphs and letting the number of vertices grow to infinity. We prove that for very large cycles, two observations are already sufficient for the MLE to exist with probability one.
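
For orientation, the optimization problem mentioned above can be written out in its standard textbook form. The notation below (S for the sample covariance matrix, K for the inverse covariance matrix, a linear space L of symmetric matrices, a graph G = (V, E)) is assumed here for illustration and does not come from the abstract. The log-likelihood is stated up to an additive constant and a positive factor.

```latex
% Notation assumed for illustration (not taken from the abstract):
%   S = sample covariance matrix, K = \Sigma^{-1} = inverse covariance matrix,
%   \mathcal{L} = linear space of symmetric matrices, G = (V, E) = graph.
\begin{align*}
\text{(primal)}\quad & \max_{K}\ \log\det K - \operatorname{tr}(SK)
  && \text{s.t. } K \in \mathcal{L},\ K \succ 0, \\
\text{(dual)}\quad & \max_{\Sigma}\ \log\det \Sigma
  && \text{s.t. } \Sigma - S \in \mathcal{L}^{\perp},\ \Sigma \succ 0.
\end{align*}
% Graphical model case:
%   \mathcal{L} = \{ K : K_{ij} = 0 \text{ for } i \neq j,\ \{i,j\} \notin E \},
% so the dual constraint reads \Sigma_{ij} = S_{ij} for i = j and for \{i,j\} \in E.
```

The feasible set of the dual problem is the intersection of the positive definite cone with an affine space, that is, a spectrahedron. In this formulation the MLE exists exactly when that set is nonempty, which in the graphical case means that the entries of S on the diagonal and on the edges of G admit a positive definite completion.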

Part II of this thesis describes an application of algebraic statistics to association studies. Rapid progress in genotyping techniques has enabled large genome-wide association studies. Existing methods often focus on determining associations between single loci and a specific phenotype. However, a particular phenotype is usually the result of complex relationships between multiple loci and the environment. We develop a method for finding interacting genes (i.e., epistasis) using Markov bases. We test our method on simulated data and compare it to a two-stage logistic regression method and to a fully Bayesian method, showing that we are able to detect the interacting loci when other methods fail to do so. Finally, we apply our method to a genome-wide dog data set and identify epistasis associated with canine hair length.
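
To give a feel for how Markov bases are used in exact tests, here is a minimal sketch of the simplest textbook case: a Metropolis walk over two-way contingency tables with fixed row and column sums, using the basic 2x2 moves of the independence model. This is a simplified illustration and not the thesis's method, which concerns interactions among several loci. The function names, the example counts, and the choice of the Pearson chi-square statistic are all assumptions made for this sketch.

```python
import random


def chi_square(table):
    """Pearson chi-square statistic against the independence fit."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    total = sum(rows)
    stat = 0.0
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            expected = r * c / total
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected
    return stat


def markov_step(table, rng):
    """One Metropolis step with a basic 2x2 move; margins never change."""
    i1, i2 = rng.sample(range(len(table)), 2)
    j1, j2 = rng.sample(range(len(table[0])), 2)
    up_a, up_b = table[i1][j1], table[i2][j2]      # cells that would gain 1
    down_c, down_d = table[i1][j2], table[i2][j1]  # cells that would lose 1
    # Ratio of hypergeometric weights (proportional to 1 / product of u_ij!).
    # It equals 0 whenever the move would create a negative entry.
    ratio = (down_c * down_d) / ((up_a + 1) * (up_b + 1))
    if rng.random() < ratio:
        table[i1][j1] += 1
        table[i2][j2] += 1
        table[i1][j2] -= 1
        table[i2][j1] -= 1


def exact_test_p_value(observed, steps=20000, burn_in=2000, seed=0):
    """Monte Carlo estimate of the conditional p-value; also returns the last table."""
    rng = random.Random(seed)
    table = [row[:] for row in observed]  # work on a copy
    threshold = chi_square(observed)
    hits = 0
    for step in range(burn_in + steps):
        markov_step(table, rng)
        if step >= burn_in and chi_square(table) >= threshold - 1e-9:
            hits += 1
    return hits / steps, table


# Made-up counts (rows: phenotype classes, columns: genotype classes).
observed = [[12, 5, 3], [4, 9, 7]]
p_value, last_table = exact_test_p_value(observed)
```

Every table visited by the walk has the same row and column sums as the observed one, which is what conditioning on the sufficient statistics requires. For models with more factors, the 2x2 moves are replaced by a Markov basis of that model, typically computed with computational algebra software.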

 
Adviser: Bernd Sturmfels
School: UNIVERSITY OF CALIFORNIA, BERKELEY
Source: DAI/B 73-07(E), p. , Mar 2012
Source Type: Dissertation
Subjects: Applied mathematics; Genetics; Statistics
Publication Number: 3499094

  Use the link below to access a full citation record of this graduate work:
  http://gateway.proquest.com/openurl%3furl_ver=Z39.88-2004%26res_dat=xri:pqdiss%26rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation%26rft_dat=xri:pqdiss:3499094
