Synergistic Associations in Systems Biology
by Watkinson, John, Ph.D., COLUMBIA UNIVERSITY, 2011, 119 pages; 3451529

Abstract:

The primary goal of this thesis is to discover multivariate interactions in high-throughput biological data. The criterion for a valid multivariate interaction is quite strict: the “whole” of the interaction must demonstrate a greater association than the “sum of the parts”. This phenomenon is called synergy. The concept is rigorously defined in information-theoretic terms, and the subtleties of its implications are explored. Various methods for estimating synergy in biological settings are presented.
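
The abstract does not give the formula itself. As an illustrative sketch only, one common information-theoretic reading of "whole greater than the sum of the parts" for two variables X1, X2 and a phenotype Y is Syn(X1, X2; Y) = I(X1, X2; Y) - [I(X1; Y) + I(X2; Y)], where I denotes mutual information. That formula is assumed here and may differ in detail from the definition in the thesis. The function names below are hypothetical, and the simple plug-in (counting) estimator stands in for the estimation methods the thesis presents.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


def synergy(x1, x2, y):
    """Assumed definition: I(X1,X2;Y) - [I(X1;Y) + I(X2;Y)], in bits."""
    whole = mutual_information(list(zip(x1, x2)), y)  # the pair taken jointly
    parts = mutual_information(x1, y) + mutual_information(x2, y)
    return whole - parts


# Toy data: Y = X1 XOR X2. Neither variable alone is informative about Y,
# but the pair determines Y completely, so the synergy is positive.
x1 = [0, 0, 1, 1]
x2 = [0, 1, 0, 1]
y_xor = [a ^ b for a, b in zip(x1, x2)]
xor_synergy = synergy(x1, x2, y_xor)

# Toy data: X2 is a copy of X1 and Y = X1. The pair adds nothing beyond
# either variable alone, so the synergy is negative (redundancy).
redundant_synergy = synergy(x1, x1, x1)

print(xor_synergy, redundant_synergy)
```

Under this assumed definition, a positive value means the pair is jointly more informative about the phenotype than the two variables considered separately, and a negative value indicates redundancy. With real expression data the variables are continuous and samples are limited, so a counting estimator like this one would need discretization or a different estimator.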

Next, a gene expression data set consisting of prostate cancer samples and normal prostate tissue is used to create a “synergy network” of gene pairs with respect to the phenotype. The strongest connections in the network are validated in an independent data set. The network as well as its hub gene members could help shed light on the etiology and progression of the disease.

In a different setting, gene regulatory interactions are inferred from a compendium of E. coli gene expression samples. The existence of a synergistic partner interacting with a transcription factor and its putative target adds evidence that the transcription factor does indeed regulate the target. The method was the best-performing in the DREAM2 Genome Scale Network Challenge.

Expanding on the E. coli results, two large H. sapiens gene expression data sets consisting of ovarian cancer samples are studied. A validated interaction network is generated, and Gene Ontology enrichment is performed to find putative biological processes associated with ovarian cancer progression.

Finally, a synergy metric is defined for genotype data, called synergy disequilibrium. It is closely related to the concept of linkage disequilibrium, but directly involves a phenotype as a third variable. The method is used to characterize some well-known disease associations. The method shows real promise in discovering epistatically disease-associated pairs of loci once data sets become sufficiently large and statistically powerful.

 
Advisor:
School: COLUMBIA UNIVERSITY
Source: DAI/B 72-06, p. , May 2011
Source Type: Dissertation
Subjects: Bioinformatics
Publication Number: 3451529
  Use the link below to access a full citation record of this graduate work:
  http://gateway.proquest.com/openurl%3furl_ver=Z39.88-2004%26res_dat=xri:pqdiss%26rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation%26rft_dat=xri:pqdiss:3451529