ProQuest® Dissertations & Theses
 
 
Computational inference of protein structure and function from microbial genomes and metagenomes
by Miller, Christopher Scott, Ph.D., UNIVERSITY OF CALIFORNIA, LOS ANGELES, 2008, 182 pages; 3347005
 

Abstract:

DNA sequences derived from genomes and metagenomes encode a wealth of information about protein structure and function. However, because so many sequences are available, computational and statistical methods are needed to infer biological meaning from them. Here, three approaches are explored that infer protein structure or function from microbial genomes and metagenomes. First, the host-pathogen interaction between human macrophages and Mycobacterium leprae is investigated. By comparing human functional lipase domains upregulated in lepromatous lesions with the genomic repertoires of several Mycobacteria, we find that host proteins may complement lipid-associated metabolic deficiencies of M. leprae. Second, function is inferred for protein families in an ocean metagenome by identifying conserved genomic neighbors with known functions. This approach correctly infers function for many well-annotated proteins and suggests high-confidence functions for several large novel protein families. Further scrutiny of the genomic neighbors reveals that many of the novel families are phage proteins, and that many other phage protein families are of bacterial origin. Finally, the information contained in large protein families derived from genome and metagenome sequences is exploited to infer residue pairs that are in contact in the 3-dimensional structures of proteins. We integrate multiple lines of evidence via a Bayesian inference procedure to produce a posterior probability of contact for every residue pair in a protein. We use these probabilistic predicted contacts to evaluate predicted 3D protein models, and find that the models that best satisfy the predicted contacts are those most similar to the correct protein structures.
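The third approach combines several lines of evidence into a posterior probability of contact and then scores 3D models against those probabilities. The sketch below illustrates that general idea only; it is not the dissertation's procedure. It assumes that the evidence sources are conditionally independent (so their likelihood ratios multiply in odds form), that a contact means two residues' representative atoms lie within 8 Å (a common convention), and that a model's score is the posterior-weighted fraction of predicted contacts it satisfies. The names `posterior_contact`, `is_contact` and `contact_score`, as well as the toy coordinates, are hypothetical.

```python
import math
from typing import Dict, Iterable, Tuple

Coord = Tuple[float, float, float]
Pair = Tuple[int, int]


def posterior_contact(prior: float, likelihood_ratios: Iterable[float]) -> float:
    """Posterior P(contact) from a prior and per-source likelihood ratios.

    Assumption: evidence sources are conditionally independent given the
    contact state (a naive-Bayes simplification), so their likelihood
    ratios multiply in odds space.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    odds = prior / (1.0 - prior)
    for lr in likelihood_ratios:
        odds *= lr  # each source scales the odds by P(e|contact) / P(e|no contact)
    return odds / (1.0 + odds)


def is_contact(model: Dict[int, Coord], i: int, j: int, cutoff: float) -> bool:
    """True if residues i and j lie within `cutoff` angstroms in the model."""
    return math.dist(model[i], model[j]) <= cutoff


def contact_score(posteriors: Dict[Pair, float], model: Dict[int, Coord],
                  cutoff: float = 8.0) -> float:
    """Posterior-weighted fraction of predicted contacts satisfied by the model."""
    total = sum(posteriors.values())
    if total == 0.0:
        return 0.0
    satisfied = sum(p for (i, j), p in posteriors.items()
                    if is_contact(model, i, j, cutoff))
    return satisfied / total


# Toy usage: rank two hypothetical 4-residue models against predicted contacts.
predicted = {(1, 3): 0.9, (2, 4): 0.6, (1, 4): 0.1}
compact = {1: (0.0, 0.0, 0.0), 2: (3.8, 0.0, 0.0),
           3: (3.8, 3.8, 0.0), 4: (0.0, 3.8, 0.0)}
extended = {i: (5.0 * (i - 1), 0.0, 0.0) for i in range(1, 5)}
ranked = sorted({"compact": compact, "extended": extended}.items(),
                key=lambda kv: contact_score(predicted, kv[1]), reverse=True)
print([name for name, _ in ranked])  # ['compact', 'extended']
```

With a prior of 0.05 and two sources with likelihood ratios 4 and 3, the odds become (1/19) × 12, giving a posterior of 12/31 ≈ 0.39. Weighting by posterior means that a model is penalized most for violating the contacts the evidence is most confident about. A real pipeline would also typically exclude residue pairs that are close in sequence.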

 
Advisor: Eisenberg, David S.
School: UNIVERSITY OF CALIFORNIA, LOS ANGELES
Source: DAI-B 70/02, p. , Aug 2009
Source Type: Ph.D.
Subjects: Bioinformatics
Publication Number: 3347005
     
  Use the link below to access a full citation record of this graduate work:
  http://gateway.proquest.com/openurl%3furl_ver=Z39.88-2004%26res_dat=xri:pqdiss%26rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation%26rft_dat=xri:pqdiss:3347005

 
 
 
