Data integration and applications of functional gene networks in Drosophila melanogaster
by Costello, James Christopher, Ph.D., INDIANA UNIVERSITY, 2009, 204 pages; 3380070

Abstract:

Understanding the function of every gene in the genome is a central goal in the biological sciences. This includes full characterization of a gene's phenotypic effects, molecular interactions, the evolutionary forces that shape its function(s), and how these functions interrelate. Despite a long history of considerable effort to understand all genes in a genome, mainly focused on "model" organisms, which include bacteria, yeast, worm, fly, and mouse, we are still far from accomplishing this task. For example, the experimentally amenable eukaryotic organism Saccharomyces cerevisiae (yeast) has a limited set of roughly 6,000 genes, yet we still do not know a single function for over 1,000 of these genes. This problem only gets worse for metazoan organisms. For the fruit fly, Drosophila melanogaster, a major biomedical model organism, experimental evidence exists for only ≈ 40% of its roughly 15,000 genes. Both new and improving genomics technologies can help us fully characterize gene function. These assay methods have the advantage of being high-throughput, but their results can be challenging to interpret. In this thesis, I take a computational approach to address the growing problem of managing, integrating, and analyzing large-scale genomics data for Drosophila melanogaster. This dissertation demonstrates several complementary and novel findings. First, I have shown that disparate sources of data can be integrated to derive a network for fly with richer information than any individual data source. This result supports similar work done in yeast, worm, mouse, and human. Second, I have demonstrated several ways in which the integrated gene networks can be utilized, which include predicting functions for unannotated genes, providing evidence that strong functional relationships tend to show signatures of purifying selection, and using the networks to reanalyze and derive new insight from previously published genome-scale datasets. Third, I have shown, using machine learning techniques, that the computational predictions made with the gene networks are reliable. Lastly, I have shown that gene networks and experimental data can be used to inform how biological processes are related.

 
Advisers: Mehmet M. Dalkilic; Justen R. Andrews
School: INDIANA UNIVERSITY
Source: DAI/B 70-12, p. , Jan 2010
Source Type: Dissertation
Subjects: Bioinformatics; Information science
Publication Number: 3380070

  Use the link below to access a full citation record of this graduate work:
  http://gateway.proquest.com/openurl%3furl_ver=Z39.88-2004%26res_dat=xri:pqdiss%26rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation%26rft_dat=xri:pqdiss:3380070
