A survey of deoxyribonucleic acid motif finding algorithms
by Das, Modan Kumar, M.S., OKLAHOMA STATE UNIVERSITY, 2006, 78 pages; 1440408

Abstract:

Scope and method of study. Unraveling the mechanisms that regulate gene expression is a major challenge in biology. An important task within this challenge is to identify regulatory elements, especially the binding sites in deoxyribonucleic acid (DNA) for transcription factors. These binding sites are short DNA segments called motifs. Given a set of DNA sequences (promoter regions), the motif finding problem is the task of detecting overrepresented motifs that are good candidates for transcription factor binding sites. Because co-regulated genes are known to share similarities in their regulatory mechanism, possibly at the transcriptional level, their promoter regions might contain common motifs that are binding sites for transcriptional regulators. In recent years, the combined efforts of computer scientists and molecular biologists have produced several algorithms for finding DNA motifs. The current study is a survey of these motif finding algorithms. We first present relevant background from biology, then list motif finding algorithms and describe some of them in detail. Next, we present results that have been obtained in the literature using these algorithms, followed by performance comparisons of some of them. Finally, we discuss the motif finding algorithms.
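
As a rough illustration of the problem statement above, the following Python sketch counts, for every exact k-mer, how many input sequences contain it and reports the most widely shared ones. It is a deliberately naive stand-in and not any algorithm from the thesis: it ignores mismatches and does not compare counts against a background model, both of which a real test for overrepresentation would need. The function names and the toy promoter sequences are hypothetical.

```python
from collections import Counter


def kmer_sequence_support(sequences, k):
    """For each k-mer, count how many sequences contain it at least once."""
    support = Counter()
    for seq in sequences:
        seq = seq.upper()
        # A set ensures each k-mer is counted once per sequence.
        seen = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        support.update(seen)
    return support


def top_candidates(sequences, k, n=3):
    """Return the n k-mers shared by the most sequences (ties broken alphabetically)."""
    support = kmer_sequence_support(sequences, k)
    return sorted(support.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Hypothetical toy data: three sequences share an 8-mer; the fourth is noise.
promoters = [
    "ACGTTGACGTCAAT",
    "GGTGACGTCACCTA",
    "TTATGACGTCAGGC",
    "CCCCAAAATTTTGG",
]

best = top_candidates(promoters, k=8, n=1)
print(best)
```

In this toy input the shared 8-mer appears in three of the four sequences, so the sequence without it lowers the support count without hiding the candidate.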

Findings and conclusions. The survey shows that a sensible approach to detecting regulatory elements is to search for statistically overrepresented motifs in the promoter regions of a set of co-regulated genes. The weak point of the currently available motif finding algorithms is that they tend to be sensitive to noise, that is, to upstream sequences in the data set that do not contain the motif. All the algorithms studied were able to correctly detect motifs that had previously been detected by laboratory experiments. In addition, some algorithms were able to find novel motifs. However, most of these motif finding algorithms have been shown to work successfully in yeast and other lower organisms but perform significantly worse in higher organisms. We conclude that, although the biology of the regulatory mechanism is still poorly understood, motif discovery algorithms should incorporate most of the available biological information. Instead of relying on a single motif finding tool, biologists should use a few complementary tools in combination and pursue the top few predicted motifs of each rather than the single most significant motif.
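
The closing recommendation can be sketched in Python as follows, assuming each tool returns its predicted motifs as a list ranked from most to least significant. The tool names, motif strings, and the `pool_top_predictions` helper are hypothetical placeholders, and ordering motifs by the number of supporting tools is one possible way to prioritise follow-up, not a procedure taken from the thesis.

```python
from collections import defaultdict


def pool_top_predictions(ranked_by_tool, top_n=3):
    """Pool the top_n motifs of each tool and order them by how many tools report them."""
    supporters = defaultdict(set)
    for tool, ranked in ranked_by_tool.items():
        # Keep the top few predictions of each tool, not only the best one.
        for motif in ranked[:top_n]:
            supporters[motif].add(tool)
    pooled = [(motif, sorted(tools)) for motif, tools in supporters.items()]
    # Most widely supported motifs first; ties broken alphabetically.
    return sorted(pooled, key=lambda item: (-len(item[1]), item[0]))


# Hypothetical ranked outputs from three unnamed tools.
predictions = {
    "tool_a": ["TGACGTCA", "CACGTG", "TATAAA", "GGGCGG"],
    "tool_b": ["CACGTG", "TGACGTCA", "CCAAT"],
    "tool_c": ["TTGACA", "TGACGTCA"],
}

pooled = pool_top_predictions(predictions, top_n=2)
for motif, tools in pooled:
    print(motif, tools)
```

A motif that is not the top hit of any single tool can still rise to the front of the pooled list when several tools place it among their top few.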

 
Adviser: H. K. Dai
School: OKLAHOMA STATE UNIVERSITY
Source: MAI/ 45-03, p. , Mar 2007
Source Type: Thesis
Subjects: Molecular biology; Bioinformatics; Computer science
Publication Number: 1440408
  Use the link below to access a full citation record of this graduate work:
  http://gateway.proquest.com/openurl%3furl_ver=Z39.88-2004%26res_dat=xri:pqdiss%26rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation%26rft_dat=xri:pqdiss:1440408