Fuzzy methods for meta-genome sequence classification and assembly
by Nasser, Sara, Ph.D., UNIVERSITY OF NEVADA, RENO, 2008, 101 pages; 3307706

Abstract:

Traditional methods obtain a microorganism's DNA by culturing it individually. Recent advances in genomics have led to the procurement of DNA from more than one organism directly from its natural habitat. Indeed, natural microbial communities are often very complex, with tens or hundreds of species. Assembling these genomes is a crucial step irrespective of the method of obtaining the DNA. This dissertation presents fuzzy methods for multiple genome sequence assembly.

An optimal alignment of DNA genome fragments is based on several factors, such as the quality of bases and the length of overlap. Factors such as quality indicate whether the data are reliable or the result of experimental error. Sequence assembly does not have crisp results; it is based on degree of similarity. To address this challenge, we propose a sequence assembly solution based on fuzzy logic, which tolerates inexactness or errors in fragment matching and can be used for improved assembly.
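As a rough illustration of how overlap length and base quality could be combined into a graded, non-crisp score, the sketch below uses simple ramp-shaped membership functions joined by a fuzzy AND (minimum). The function names, the extra identity factor, and every threshold are hypothetical choices made for this example; the abstract does not state the dissertation's actual membership functions or parameters.

```python
def ramp(x, lo, hi):
    """Piecewise-linear membership: 0 at or below lo, 1 at or above hi."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))


def overlap_score(overlap_len, mean_quality, matches):
    """Degree (0..1) to which an overlap between two fragments is acceptable.

    All thresholds below are illustrative assumptions, not values
    taken from the dissertation.
    """
    if overlap_len <= 0:
        return 0.0
    mu_length = ramp(overlap_len, 20, 60)      # longer overlaps are more credible
    mu_quality = ramp(mean_quality, 10, 30)    # Phred-like base quality scores
    mu_identity = ramp(matches / overlap_len, 0.80, 0.98)  # fraction of matching bases
    # Fuzzy AND using the minimum t-norm: the weakest factor limits the score.
    return min(mu_length, mu_quality, mu_identity)


if __name__ == "__main__":
    # A 40-base overlap with perfect identity and high quality is limited
    # by its length membership.
    print(overlap_score(overlap_len=40, mean_quality=30, matches=40))
```

An assembler built this way could rank candidate overlaps by score rather than accepting or rejecting them against one hard cutoff, which is the tolerance of inexactness the paragraph above describes.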

Assembly of a single organism's genome is presented using a modified dynamic programming approach with fuzzy characteristic functions. The characteristic functions are used to select alignments of sequence fragments. Assembly of environmental genomes starts with the classification of mixed fragments from different organisms into homogeneous groups. Separating closely related species is a difficult task because the fragments contain many similarities. We propose fuzzy classification using modified fuzzy weighted averages to classify fragments belonging to different organisms within an environmental genome population. Our proposed approach uses DNA-based signatures such as GC content and nucleotide frequencies as features for the classification. This divide-and-conquer strategy also improves performance on larger datasets. We evaluate our method on artificially created environmental genomes to test various combinations of organisms and on environmental genomes obtained from acid mine drainage, available from the National Center for Biotechnology Information. The assembler is enhanced by a parallel version for high performance.
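The DNA-based signatures named above can be computed directly from a fragment's sequence. The following is a minimal sketch of the two feature types, GC content and nucleotide (k-mer) frequencies. The function names and the choice of `k` are assumptions for illustration, and the sketch covers feature extraction only, not the modified fuzzy weighted average classifier itself.

```python
from collections import Counter
from itertools import product


def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def kmer_frequencies(seq, k=2):
    """Relative frequency of every k-mer over the alphabet A, C, G, T.

    k-mers containing ambiguous symbols (such as N) are ignored.
    The default k=2 is an arbitrary choice for this example.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


if __name__ == "__main__":
    fragment = "GGCCAATT"
    print(gc_content(fragment))
    print(kmer_frequencies(fragment, k=1))
```

Each fragment then becomes a fixed-length numeric vector (one GC value plus 4^k frequencies), which is the kind of input a fuzzy classifier can group by organism before assembly.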

 
Adviser: Frederick C. Harris
School: UNIVERSITY OF NEVADA, RENO
Source: DAI/B 69-04, p. , Aug 2008
Source Type: Dissertation
Subjects: Bioinformatics; Computer science
Publication Number: 3307706
