Statistical methods for analysis of graph-constrained genomic data
by Li, Caiyan, Ph.D., UNIVERSITY OF PENNSYLVANIA, 2009, 136 pages; 3381748

Abstract:

Graphs and networks are common ways of depicting information. In biology, many processes are represented by graphs, such as regulatory networks, metabolic pathways and protein-protein interaction networks. This prior information, accumulated over many years of biomedical research, is a useful supplement to standard numerical genomic data such as microarray gene expression data. How to incorporate the information encoded by known biological pathways into the analysis of numerical data raises interesting statistical challenges. This dissertation develops several statistical methods for the analysis of genomic data that incorporate prior biological network information. We consider the high-dimensional regression problem in which the covariates are measured on an undirected graph, and we develop methods for identifying genes and subnetworks that are related to the phenotypes. Specifically, we present the problem formulation and an efficient computational algorithm for our procedure, the GRAph-Constrained Estimator (GRACE), and we derive theoretical properties of GRACE, including non-asymptotic error bounds and sign consistency for both a fixed and a diverging number of parameters. We also introduce an empirical Bayes method that accounts for the biological network structure through a discrete Markov random field prior, for identifying genes and subnetworks whose transcriptional activities are perturbed by, or activated in response to, experimental conditions. We apply both GRACE and the empirical Bayes method to a microarray gene expression study of human brain aging to identify genes and subnetworks that are related to, or perturbed by, aging of the human brain. Extensions of the proposed methods to censored survival data are also presented.
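
To make the idea of a graph-constrained estimator concrete, here is a minimal sketch. It assumes (this is not stated in the abstract) that the estimator minimizes a penalized least-squares objective of the form (1/2)||y - X b||^2 + lam1 * sum_j |b_j| + (lam2/2) * b' L b, where L is the normalized Laplacian of the gene network, so that the quadratic term encourages neighboring genes to have similar (degree-scaled) coefficients while the L1 term selects genes. The solver used here is plain cyclic coordinate descent, swapped in for illustration; the function names `normalized_laplacian` and `grace_cd` and the parameters `lam1` and `lam2` are hypothetical and are not the dissertation's own algorithm or code.

```python
import numpy as np


def normalized_laplacian(A):
    """Normalized Laplacian L = I - D^{-1/2} A D^{-1/2}.

    Isolated nodes (degree 0) get an all-zero row and column, so they are
    not smoothed toward anything. A is a symmetric, non-negative adjacency matrix.
    """
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)
    nz = d > 0
    inv_sqrt = np.zeros_like(d)
    inv_sqrt[nz] = 1.0 / np.sqrt(d[nz])
    return np.diag(nz.astype(float)) - inv_sqrt[:, None] * A * inv_sqrt[None, :]


def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def grace_cd(X, y, A, lam1, lam2, n_iter=500, tol=1e-10):
    """Hypothetical coordinate-descent solver for the assumed objective
    0.5*||y - X b||^2 + lam1*||b||_1 + 0.5*lam2 * b' L b.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    L = normalized_laplacian(A)
    p = X.shape[1]
    beta = np.zeros(p)
    r = y.copy()                      # current residual y - X beta
    col_sq = (X ** 2).sum(axis=0)     # x_j' x_j for each column
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            a = col_sq[j] + lam2 * L[j, j]   # curvature in coordinate j
            if a == 0.0:
                continue                      # zero column on an isolated node
            old = beta[j]
            # Graph coupling with the other coordinates: sum_{k != j} L_jk b_k
            coupling = L[j] @ beta - L[j, j] * old
            b = X[:, j] @ r + col_sq[j] * old - lam2 * coupling
            new = soft_threshold(b, lam1) / a
            if new != old:
                r -= X[:, j] * (new - old)    # keep the residual in sync
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return beta
```

With `lam1 = 0` the assumed objective has the closed-form solution (X'X + lam2 L)^{-1} X'y, which gives a simple sanity check; with `lam1` at least max_j |x_j' y| every coefficient is set to zero, mirroring the usual lasso behavior. Coordinate descent is chosen here only because each one-dimensional update has an explicit soft-thresholding solution.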

 
Adviser: Hongzhe Li
School: UNIVERSITY OF PENNSYLVANIA
Source: DAI/B 70-10, p. , Dec 2009
Source Type: Dissertation
Subjects: Biostatistics
Publication Number: 3381748