All events taking place within a cell, from gene expression to the activation and deactivation of enzymes to signal transduction and programmed cell death, are tightly regulated. The need to coordinate these events and maintain the intricate balance of biomolecules necessary for the survival and proper functioning of the cell has produced, through selective pressure, a complex ensemble of diverse regulatory mechanisms. The body of work presented here proposes a set of computational methods for elucidating some of the elements of the cellular regulation machinery.
The first part of the thesis addresses post-transcriptional regulation via small interfering RNA (siRNA)-targeted degradation. We are especially interested in understanding the relationship between small interfering RNAs and naturally overlapping transcripts in plants in response to stress conditions. Our experiments are based on data generated by a novel high-throughput pyrosequencing protocol, and in order to analyze these data we developed a probabilistic framework for matching small RNA sequences against the reference genome. By addressing some of the protocol-specific sequencing errors, we are able to recover between 26.4% and 28.8% more small RNAs than standard sequence matching techniques.
The second part of the thesis deals with the regulation of protein function via post-translational modification (PTM). More specifically, we are interested in understanding the sequence and structural determinants that mediate recognition of post-translational modification sites. To this end, we developed a visualization method that facilitates understanding of the statistically significant properties of sequence motifs recognized by enzymes that catalyze post-translational modifications. In addition, we developed a novel class of kernels based on combinatorial enumeration of the local topological elements in labeled graphs, which we termed graphlet kernels. Graphlet kernels are suitable for tackling a number of classification problems arising in bioinformatics research, such as prediction of functional sites in proteins or analysis of protein-protein interaction networks. Applied to a graph representation of protein structure, the graphlet kernel is able to accurately predict PTM sites.
School: University of California, Riverside