Predicting protein molecular function
by Engelhardt, Barbara Elizabeth, Ph.D., UNIVERSITY OF CALIFORNIA, BERKELEY, 2007, 162 pages; 3311662

Abstract:

The number of known nucleotide sequences encoding proteins is growing at an extraordinarily fast rate due to technologies developed in the last decade that enable rapid sequence acquisition. Such rapid acquisition is a prelude to understanding the molecular function and tertiary structure of these protein sequences, and from there to an understanding of the role these proteins play in a particular organism. The experimental technologies that enable us to understand molecular function have not progressed as fast as those for sequencing. One role of computational biology is to accurately predict protein molecular function based on the protein’s sequence alone.

Phylogenomics is a field of study that approaches the problem of protein molecular function prediction from an evolutionary perspective. In particular, a phylogenomic analysis transfers existing (but sparse) molecular function annotations to a query protein based on a reconciled phylogeny, which explicitly represents the evolutionary relationships of a set of related proteins. In my dissertation, I formalize the phylogenomics methodology as a statistical graphical model of molecular function evolution. Within this framework, we can predict protein molecular function from protein sequence alone. Molecular function evolution is represented as a simple continuous time Markov chain, and the random variables at each node in the tree are a subset of functional terms from the Gene Ontology. The model is encapsulated in a framework called SIFTER (Statistical Inference of Function Through Evolutionary Relationships).

SIFTER has performed well on a number of diverse protein families, as compared to standard annotation transfer methods and other phylogenomics-based approaches. SIFTER has been applied to the complete genomes of 46 fungal species, and is able to make molecular function predictions for a large percentage of the predicted proteins in these genomes. Moreover, through these predictions we can explore some genomic comparisons for fungi. Motivated by the high cost of characterization experiments, active learning techniques have also been applied to SIFTER’s protein function predictions, with good results.
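
As a rough illustration of the idea of propagating annotations along a phylogeny with a continuous-time Markov chain, the sketch below computes the posterior probability that an unannotated query protein has a given function. It is a toy under stated assumptions, not SIFTER's actual model: it assumes a single binary function (present or absent), a symmetric two-state chain with one hypothetical rate parameter, a uniform prior at the root, and a made-up four-node tree with invented branch lengths. All names (`transition_prob`, `upward_message`, `posterior_query`) are illustrative.

```python
import math

def transition_prob(same, rate, t):
    """Symmetric two-state CTMC: probability of ending in the same
    (or the other) state after evolving for time t along a branch."""
    decay = math.exp(-2.0 * rate * t)
    return 0.5 * (1.0 + decay) if same else 0.5 * (1.0 - decay)

def upward_message(node, rate):
    """Return [P(observed leaves below node | node state = s) for s in (0, 1)]."""
    if "children" not in node:
        obs = node.get("observed")
        if obs is None:
            return [1.0, 1.0]  # unannotated leaf carries no evidence
        return [1.0 if s == obs else 0.0 for s in (0, 1)]
    like = [1.0, 1.0]
    for child, t in node["children"]:
        child_like = upward_message(child, rate)
        for s in (0, 1):
            like[s] *= sum(
                transition_prob(s == c, rate, t) * child_like[c] for c in (0, 1)
            )
    return like

def posterior_query(tree, query, rate, prior=(0.5, 0.5)):
    """Posterior over the query leaf's state given all annotated leaves."""
    joint = []
    for s in (0, 1):
        query["observed"] = s  # clamp the query to state s
        like = upward_message(tree, rate)
        joint.append(sum(prior[r] * like[r] for r in (0, 1)))
    query["observed"] = None  # restore the unannotated query
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical family: the query sits next to a protein annotated with the
# function (state 1); a distant relative lacks it (state 0).
query = {"observed": None}
sibling = {"observed": 1}
distant = {"observed": 0}
inner = {"children": [(query, 0.1), (sibling, 0.1)]}
root = {"children": [(inner, 0.5), (distant, 1.0)]}

posterior = posterior_query(root, query, rate=1.0)
print("P(query has function) =", round(posterior[1], 3))
```

In this toy setting the short branch to the annotated sibling dominates, so the query is predicted to share the sibling's function. A realistic model would handle several candidate functional terms per node and would estimate its rate parameters from data rather than fixing them by hand.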

 
Adviser: Michael I. Jordan
School: UNIVERSITY OF CALIFORNIA, BERKELEY
Source: DAI/B 69-05, p. , Sep 2008
Source Type: Dissertation
Subjects: Bioinformatics; Artificial intelligence; Computer science
Publication Number: 3311662
  Use the link below to access a full citation record of this graduate work:
  http://gateway.proquest.com/openurl%3furl_ver=Z39.88-2004%26res_dat=xri:pqdiss%26rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation%26rft_dat=xri:pqdiss:3311662
  If your library subscribes to the ProQuest Dissertations & Theses (PQDT) database, you may be entitled to a free electronic version of this graduate work. If not, you will have the option to purchase one, and access a 24 page preview for free (if available).
