To carry out their specific roles in the cell, genes and gene products often work together in groups, forming many relationships among themselves and with other molecules. Such relationships include physical protein-protein interaction relationships, regulatory relationships, metabolic relationships, genetic relationships, and much more. With advances in science and technology, some high throughput technologies have been developed to simultaneously detect tens of thousands of pairwise protein-protein interactions and protein-DNA interactions. However, the data generated by high throughput methods are prone to noise. Furthermore, the technology itself has its limitations, and cannot detect all kinds of relationships between genes and their products. Thus there is a pressing need to investigate all kinds of relationships and their roles in a living system using bioinformatic approaches, and is a central challenge in Computational Biology and Systems Biology.
This dissertation focuses on exploring relationships between genes and gene products using bioinformatic approaches. Specifically, we consider problems related to regulatory relationships, protein-protein interactions, and semantic relationships between genes.
A regulatory element is an important pattern or "signal", often located in the promoter of a gene, which is used in the process of turning a gene "on" or "off". Predicting regulatory elements is a key step in exploring the regulatory relationships between genes and gene products. In this dissertation, we consider the problem of improving the prediction of regulatory elements by using comparative genomics data.
With regard to protein-protein interactions, we have developed bioinformatics techniques to estimate support for the data on these interactions. While protein-protein interactions and regulatory relationships can be detected by high throughput biological techniques, there is another type of relationship called semantic relationship that cannot be detected by a single technique, but can be inferred using multiple sources of biological data.
The contributions of this thesis involved the development and application of a set of bioinformatic approaches that address the challenges mentioned above. These included (i) an EM-based algorithm that improves the prediction of regulatory elements using comparative genomics data, (ii) an approach for estimating the support of protein-protein interaction data, with application to functional annotation of genes, (iii) a novel method for inferring functional network of genes, and (iv) techniques for clustering genes using multi-source data.