Bayesian Modeling for High Throughput Genomic Data
by Hu, Ming, Ph.D., UNIVERSITY OF MICHIGAN, 2010, 196 pages; 3441155

Abstract:

The explosion of high throughput genomic data in recent years has already altered our view of the extent and complexity of biology. Technology-specific features, heterogeneous data structures, and massive sample sizes present both great challenges and opportunities for developing novel statistical methodologies in computational biology. This dissertation presents three Bayesian modeling methods for high throughput genomic data analysis.

In chapter 2, we develop a model-based gene expression query algorithm built on the Bayesian model selection framework. This algorithm is capable of detecting co-expression profiles that hold only within a subset of samples/experimental conditions. In addition, it recognizes linearly transformed expression patterns and is robust to sporadic outliers in the data. Our simulation studies suggest that this method outperforms existing query tools. When we apply this new method to the Escherichia coli microarray compendium data, it identifies a majority of known regulons, as well as novel potential target genes of numerous key transcription factors.
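
As a rough illustration of why subset-restricted, transformation-tolerant matching matters, the sketch below compares a query gene with a target gene over all conditions and over a chosen subset. It uses plain Pearson correlation as a stand-in, not the Bayesian model selection procedure of the dissertation; the function names and the toy expression values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def subset_correlation(query, target, conditions):
    """Correlation restricted to the given condition indices (hypothetical helper)."""
    q = [query[i] for i in conditions]
    t = [target[i] for i in conditions]
    return pearson(q, t)

# Toy data (made up): the target equals -2 * query + 10 on the first four
# conditions and is unrelated to the query on the remaining four.
query = [1, 2, 3, 4, 5, 6, 7, 8]
target = [8, 6, 4, 2, 3, 9, 1, 7]

full = pearson(query, target)                             # weak over all conditions
partial = subset_correlation(query, target, [0, 1, 2, 3])  # essentially -1 on the subset
```

In this toy case the relationship is invisible over all eight conditions but exact, up to a linear transformation, on the first four. A query tool that only scores full profiles would miss it.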

In chapter 3, we introduce a novel computational algorithm named Hybrid Motif Sampler (HMS), specifically designed for transcription factor binding site (TFBS) motif discovery in ChIP-Seq data. HMS incorporates sequencing depth information to aid motif identification, models intra-motif dependency to describe the underlying motif pattern more accurately, and combines stochastic sampling with deterministic search to accelerate computation. Simulation studies demonstrate favorable performance of HMS compared to existing methods. When applying HMS to real ChIP-Seq datasets, we find that the accuracy of existing TFBS motif patterns can be significantly improved.
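
To make the idea of depth-aided motif identification concrete, here is a minimal sketch that scores each window of a sequence by a position weight matrix log-odds term plus a log prior proportional to the local read depth. It is a deterministic scan written for illustration only: it is not the HMS algorithm, it ignores intra-motif dependency and sampling, and the function name, pseudocount, uniform background, and toy inputs are all assumptions.

```python
import math

def best_site(seq, depth, pwm, background=0.25, pseudo=1.0):
    """Return (start, score) of the best window under an illustrative score.

    score = sum_j log(pwm[j][base] / background) + log(depth-based prior)
    The prior is the window's share of total depth, with a pseudocount so
    that zero-coverage windows stay finite.
    """
    w = len(pwm)
    n_windows = len(seq) - w + 1
    window_depth = [sum(depth[s:s + w]) + pseudo for s in range(n_windows)]
    total = sum(window_depth)
    best = None
    for start in range(n_windows):
        log_odds = sum(math.log(pwm[j][seq[start + j]] / background)
                       for j in range(w))
        score = log_odds + math.log(window_depth[start] / total)
        if best is None or score > best[1]:
            best = (start, score)
    return best

# Toy inputs (made up): a 3-bp motif with consensus CGT and a coverage
# peak centred on its only occurrence.
pwm = [
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
]
seq = "AAAACGTAAAA"
depth = [1, 2, 4, 8, 12, 14, 12, 8, 4, 2, 1]
start, score = best_site(seq, depth, pwm)
```

The depth term acts as a positional prior: among windows with similar sequence scores, those under the coverage peak are preferred.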

In chapter 4, we propose a spatial Poisson regression model to provide a portrait of base-level sequencing depth in RNA-Seq data. The model uses two random effects, one to explain the spatial correlation and one to explain the non-spatial variation, and incorporates GC content effects into the mean structure for better fit. Both a simulation study and real data analysis demonstrate that this method can capture local genomic features that affect coverage depth and therefore offers improved quantification of the true underlying expression levels.
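
The structure described above can be sketched as a small simulation. The code assumes a log link with log(mu_i) = beta0 + beta_gc * gc_i + s_i + e_i, where s_i is a spatially correlated effect and e_i is independent noise. The abstract does not specify the form of the spatial effect, so the AR(1) process used here, along with every parameter name and default value, is an assumption made for illustration.

```python
import math
import random

def poisson_draw(mu, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_depth(gc, beta0=1.0, beta_gc=0.5, rho=0.9,
                   sd_spatial=0.3, sd_noise=0.2, seed=0):
    """Simulate per-base read counts along one transcript.

    gc: per-base (or per-window) GC content values in [0, 1].
    Returns (counts, log_means).
    """
    rng = random.Random(seed)
    counts, log_means = [], []
    s = 0.0
    for g in gc:
        # Spatial effect (assumed AR(1)): neighbouring positions share most of it.
        s = rho * s + rng.gauss(0.0, sd_spatial)
        # Non-spatial effect: independent from position to position.
        e = rng.gauss(0.0, sd_noise)
        log_mu = beta0 + beta_gc * g + s + e
        log_means.append(log_mu)
        counts.append(poisson_draw(math.exp(log_mu), rng))
    return counts, log_means

gc = [0.4, 0.5, 0.6] * 10
counts, log_means = simulate_depth(gc)
```

Setting both standard deviations to zero reduces the sketch to an ordinary Poisson regression on GC content, which is a convenient check that the mean structure is wired up as intended.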

The research in this dissertation demonstrates that Bayesian modeling methods have achieved great success and have the potential to accelerate biomedical research.

 
Adviser: Zhaohui Qin
School: UNIVERSITY OF MICHIGAN
Source: DAI/B 72-03, p. , Feb 2011
Source Type: Dissertation
Subjects: Biostatistics; Genetics; Bioinformatics
Publication Number: 3441155

  Use the link below to access a full citation record of this graduate work:
  http://gateway.proquest.com/openurl%3furl_ver=Z39.88-2004%26res_dat=xri:pqdiss%26rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation%26rft_dat=xri:pqdiss:3441155