Accurate prediction of protein function using GOstruct
by Sokolov, Artem, Ph.D., COLORADO STATE UNIVERSITY, 2011, 90 pages; 3489918

Abstract:

With the growing number of sequenced genomes, automatic prediction of protein function is one of the central problems in computational biology. Traditional methods employ transfer of functional annotation on the basis of sequence or structural similarity and are unable to effectively deal with today's noisy high-throughput biological data. Most of the approaches based on machine learning, on the other hand, break the problem up into a collection of binary classification problems, effectively asking the question “does this protein perform this particular function?”; such methods often produce a set of predictions that are inconsistent with each other.

In this work, we present GOstruct, a structured-output framework that answers the question “what function does this protein perform?” in the context of hierarchical multilabel classification. We show that GOstruct is able to effectively deal with a large number of disparate data sources from multiple species. Our empirical results demonstrate that the framework achieves state-of-the-art accuracy in two of the recent challenges in automatic function prediction: Mousefunc and CAFA.
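To make the notion of mutually inconsistent predictions concrete, the following minimal sketch assumes a toy function hierarchy in which annotating a protein with a term implies annotating it with every ancestor of that term. The hierarchy, the term names, and the helper functions are hypothetical illustrations and are not taken from the dissertation. The upward-closure repair shown here is a simple post-processing step, not the GOstruct method, which according to the abstract predicts the full set of labels jointly.

```python
# Toy hierarchy: term -> list of parent terms.
# Hypothetical and heavily simplified; not the real Gene Ontology graph.
PARENTS = {
    "molecular_function": [],
    "binding": ["molecular_function"],
    "protein_binding": ["binding"],
    "catalytic_activity": ["molecular_function"],
    "kinase_activity": ["catalytic_activity"],
}


def ancestors(term):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    stack = list(PARENTS[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def inconsistencies(predicted):
    """Map each predicted term to the ancestors missing from the prediction."""
    missing = {}
    for term in predicted:
        gap = ancestors(term) - set(predicted)
        if gap:
            missing[term] = gap
    return missing


def close_upward(predicted):
    """Add all ancestors so the label set respects the hierarchy."""
    closed = set(predicted)
    for term in predicted:
        closed |= ancestors(term)
    return closed


# Independent binary classifiers may say "yes" to a specific term
# while saying "no" to one of its ancestors.
binary_output = {"molecular_function", "kinase_activity"}
problems = inconsistencies(binary_output)
repaired = close_upward(binary_output)
```

Here `problems` flags `kinase_activity` because its parent `catalytic_activity` was not predicted, and `repaired` is the smallest hierarchy-consistent superset of the original prediction.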

 
Adviser: Asa Ben-Hur
School: COLORADO STATE UNIVERSITY
Source: DAI/B 73-04, p. , Jan 2012
Source Type: Dissertation
Subjects: Bioinformatics; Computer science
Publication Number: 3489918
Access the complete dissertation:
 

» This is an open access dissertation.
  Use the link below to access the full text PDF of this graduate work:
  http://gradworks.umi.com/3489918.pdf
