Exploring gene interactions in DNA microarray data
by Ye, Yong, Ph.D., THE UNIVERSITY OF NORTH CAROLINA AT CHARLOTTE, 2007, 128 pages; 3296785

Abstract:

DNA microarray technology provides a powerful basis for the analysis of gene expression levels. Analyzing microarray data can reveal gene activities associated with biological processes and group genes into interaction networks. The discovery and exploration of these networks will help us identify gene functions, design new methods of disease diagnosis, and provide targets for drug development.

Data mining methods such as clustering and association rule mining have been widely applied to microarray data to find groups of genes that show similar expression patterns. However, these approaches can usually find only associations or co-expressions and fail to unveil pairwise or multi-way gene interactions within a cluster. Bayesian networks, which are based on directed acyclic graphs and can provide a model of causal influence, have been used for causal structure learning. However, learning Bayesian network structure is an NP-hard problem. Since the number of genes involved in the learning is usually in the hundreds or thousands, Bayesian network learning cannot be used for real-time interactive exploration of gene interactions.

In this dissertation, we combine graphical models and causal models with other data mining techniques to identify and screen gene interaction networks. We apply the Graphical Gaussian Model (GGM) to discover undirected pairwise gene interactions. For causal structure learning, we develop an enhanced constraint-based approach that allows fast exploration of directed gene interactions. To overcome the dimensionality limitations of these modeling methods, we develop an algorithm based on graphical decomposition. It divides large interaction networks into smaller components to which our modeling methods can be applied successfully.
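As a rough illustration of how a Graphical Gaussian Model yields undirected pairwise interactions, the sketch below estimates partial correlations from the inverse covariance matrix and keeps pairs whose absolute partial correlation exceeds a cutoff. It is not the dissertation's implementation: the function names, the use of a pseudo-inverse, and the `threshold` value are assumptions made for this example.

```python
import numpy as np


def partial_correlations(expr):
    """Return the genes x genes partial correlation matrix.

    expr is a (samples x genes) array of expression values.
    """
    cov = np.cov(expr, rowvar=False)
    # Assumption: a pseudo-inverse is used so the call does not fail when the
    # covariance matrix is singular (e.g. more genes than samples).
    precision = np.linalg.pinv(cov)
    scale = np.sqrt(np.diag(precision))
    # Partial correlation of i and j given all other genes:
    # -p_ij / sqrt(p_ii * p_jj)
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor


def ggm_edges(expr, threshold=0.3):
    """List (i, j, partial_correlation) for pairs at or above the cutoff.

    The threshold is an illustrative choice, not a value from the source.
    """
    pcor = partial_correlations(expr)
    n_genes = pcor.shape[0]
    return [
        (i, j, float(pcor[i, j]))
        for i in range(n_genes)
        for j in range(i + 1, n_genes)
        if abs(pcor[i, j]) >= threshold
    ]
```

A pair of genes that is correlated only through a third gene should receive a partial correlation near zero and therefore no edge, which is what separates this model from plain co-expression analysis.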

We also develop an approach to improve the identification of gene interaction networks by incorporating protein function modules. Protein interaction networks are usually very sparse, so to find meaningful protein modules we first develop two strategies (data integration and the use of commute time) to increase the density of the networks. We then apply a clique-finding algorithm to find cliques (and also conserved cliques) in the protein interaction networks. The cliques found are further analyzed using our modeling methods.
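The clique-finding step can be sketched with the Bron-Kerbosch algorithm with pivoting, a standard way to enumerate maximal cliques. The abstract does not say which clique algorithm the dissertation uses, so this choice, the helper names, and the `min_size` parameter are assumptions for illustration only.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbours)} from pairs."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-interactions
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def maximal_cliques(adj, min_size=1):
    """Enumerate maximal cliques with Bron-Kerbosch and pivoting.

    Returns a sorted list of sorted node lists with at least min_size nodes.
    """
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            if len(current) >= min_size:
                cliques.append(sorted(current))
            return
        # Pick the pivot with the most neighbours among the candidates
        # to reduce the number of recursive branches.
        pivot = max(candidates | excluded,
                    key=lambda node: len(adj[node] & candidates))
        for node in list(candidates - adj[pivot]):
            expand(current | {node},
                   candidates & adj[node],
                   excluded & adj[node])
            candidates.remove(node)
            excluded.add(node)

    expand(set(), set(adj), set())
    return sorted(cliques)
```

With hypothetical protein labels, `maximal_cliques(build_adjacency([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]), min_size=3)` keeps only the triangle formed by A, B, and C, which could then be passed to the modeling methods as a candidate module.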

We design and implement an integrated system with a GUI that allows rapid interactive exploration of gene relationships. We also test our methodology on both yeast and E. coli microarray data and protein interaction data.

 
Adviser: Xintao Wu
School: THE UNIVERSITY OF NORTH CAROLINA AT CHARLOTTE
Source: DAI/B 69-01, p. , Apr 2008
Source Type: Dissertation
Subjects: Computer science
Publication Number: 3296785
  Use the link below to access a full citation record of this graduate work:
  http://gateway.proquest.com/openurl%3furl_ver=Z39.88-2004%26res_dat=xri:pqdiss%26rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation%26rft_dat=xri:pqdiss:3296785