Computational inference of genetic regulatory networks in human cancer cells
by Margolin, Adam Arne, Ph.D., COLUMBIA UNIVERSITY, 2008, 282 pages; 3299359

Abstract:

The dysregulated activity of oncogenic transcription factors contributes to neoplastic transformation by promoting aberrant expression of target genes involved in regulating cell homeostasis. Therefore, characterization of the regulatory networks controlled by these transcription factors is a critical objective in understanding the molecular mechanisms of cell transformation. Modern high-throughput technologies are providing the first window into regulatory processes on a genome-wide scale, foretelling the ability of computational inference algorithms to produce models of regulatory networks that will revolutionize our understanding and treatment of cancer by (1) describing how genomic alterations cause functional disruptions in the network regulating cell homeostasis, leading to aberrant cell growth and cancer, and (2) predicting therapeutic interventions, in which critical components of the network can be targeted to revert the cancer phenotype.

This thesis develops methods that advance the current state of the art in inferring transcriptional regulatory networks from high-throughput data, with specific application to both gene expression and ChIP-on-chip data. Prior to this thesis, several methods had been proposed to infer regulatory networks from microarray data; however, these methods were applicable only to model organisms, such as yeast, due to high computational complexity. Moreover, all methods relied to some extent on various assumptions that are not biologically realistic. Here, I develop a novel method, based on information theory, that overcomes these limitations: it has low computational complexity, allowing application to mammalian systems, and it makes minimal assumptions about the structure of the network or about the type of statistical interaction between genes (e.g., that interactions are linear). I apply this method to reconstruct the first genome-wide regulatory network inferred from microarray data for mammalian cells, and further demonstrate how this method can be used to deduce regulatory interactions between subnetworks controlled by different oncogenes, using only microarray data. I extend this analysis, again using the tools of information theory, to consider inference of interactions involving more than two variables. To do so, I provide a rigorous definition of statistical dependency in the multivariate setting, which previously had not been done. I demonstrate that this framework effectively identifies groups of genes that interact in a pathway to jointly regulate a common set of targets.

While the microarray analysis methods are motivated by issues specific to inferring gene regulatory networks, the resulting algorithmic advances are novel from a purely mathematical and computational perspective. They should be generally applicable to reverse engineering networks from measurements of the interacting variables, a general problem both in other branches of systems biology (e.g., metabolic networks, neural networks) and in scientific applications outside of systems biology (e.g., social networks, electrical networks).
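
To illustrate why an information-theoretic score can avoid assuming a linear interaction, the sketch below compares Pearson correlation with a simple binned estimate of mutual information. It is not the algorithm developed in the thesis; the function names (`rank_bins`, `mutual_information`, `pearson`), the equal-frequency binning, the bin count, and the simulated data are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def rank_bins(values, n_bins):
    """Assign each value to one of n_bins equal-frequency bins by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values)
    return bins


def mutual_information(x, y, n_bins=8):
    """Plug-in estimate of I(X;Y) in nats from binned samples (hypothetical helper)."""
    bx, by = rank_bins(x, n_bins), rank_bins(y, n_bins)
    n = len(x)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    # Sum p(i,j) * log( p(i,j) / (p(i) p(j)) ) over occupied cells.
    return sum((c / n) * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())


def pearson(x, y):
    """Pearson correlation coefficient, computed directly."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=2000, seed=0):
    """Simulated 'expression' values: y depends on x nonlinearly, z is independent of x."""
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    y = [a * a + rng.gauss(0.0, 0.05) for a in x]
    z = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return x, y, z


if __name__ == "__main__":
    x, y, z = simulate()
    print("Pearson r(x, y):", round(pearson(x, y), 3))
    print("MI(x, y) in nats:", round(mutual_information(x, y), 3))
    print("MI(x, z) in nats:", round(mutual_information(x, z), 3))
```

In this toy setting the symmetric, quadratic dependence leaves the correlation close to zero, while the mutual information estimate for the dependent pair is clearly larger than for the independent pair. A plug-in estimate of this kind is biased upward for small samples, so any real application would need a significance threshold.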

In the second part of the thesis, I consider the analysis of ChIP-on-chip experiments, a new technology that more directly measures transcription factor-chromatin interactions. I show that existing methods to analyze these data are not able to assign meaningful statistical significance scores (p-values) to bound promoters, due to a number of flawed assumptions. I then develop a data-driven method that accurately predicts the extent of TF/DNA binding and reveals an order of magnitude more interactions than previous methods. I demonstrate how this method, when combined with DNA sequence and gene expression data, can deduce regulatory networks of substantially greater complexity than previously appreciated. Moreover, I use this method to analyze the interaction between the regulatory networks controlled by two important proto-oncogenes (MYC and NOTCH1), which the gene expression-based analysis of the first part predicted to overlap to a statistically significant degree. This analysis reveals that these networks are in fact virtually completely overlapping, with MYC and NOTCH1 jointly regulating several thousand targets.
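
One common way to ask whether two target sets overlap more than expected by chance is a hypergeometric tail probability. The sketch below shows that generic test only; the abstract does not say which test the thesis used, and the function name `hypergeom_overlap_pvalue` and all numbers in the example are hypothetical.

```python
import math


def hypergeom_overlap_pvalue(n_genes, n_a, n_b, n_overlap):
    """P(overlap >= n_overlap) when a set of size n_b is drawn at random
    from n_genes genes, n_a of which belong to the other target set.

    Hypothetical helper: a generic hypergeometric test, not the thesis's method.
    """
    total = math.comb(n_genes, n_b)
    upper = min(n_a, n_b)
    # Sum the upper tail of the hypergeometric distribution with exact integers.
    tail = sum(math.comb(n_a, k) * math.comb(n_genes - n_a, n_b - k)
               for k in range(n_overlap, upper + 1))
    return tail / total


if __name__ == "__main__":
    # Toy numbers for illustration only (not taken from the thesis):
    # 1000 genes, two target sets of 100 genes each, 50 genes shared.
    print(hypergeom_overlap_pvalue(1000, 100, 100, 50))
```

With set sizes like these, about 10 shared genes would be expected by chance, so a much larger overlap yields a very small tail probability. The test assumes genes are drawn independently and uniformly, which real target sets may violate.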

Much additional work must be done in this new field, both computationally and technologically, to reach the goal of building predictive models able to describe the connection between genomic alterations and malignancies such as cancer. However, this thesis takes steps in this direction by developing computational methods to leverage cutting-edge genome-wide measurement technologies to understand the regulatory networks controlling cellular function and homeostasis. The resulting systems-level view of transcriptional regulation already reveals fundamentally more complexity than previously anticipated, altering the traditional view of genetic regulatory networks.

 
Adviser: Andrea Califano
School: COLUMBIA UNIVERSITY
Source: DAI/B 69-01, p. , Apr 2008
Source Type: Dissertation
Subjects: Genetics; Bioinformatics; Artificial intelligence
Publication Number: 3299359

Access the complete dissertation:

This is an open access dissertation.
Use the link below to access the full text PDF of this graduate work:
http://gradworks.umi.com/3299359.pdf