Extracting cancer genomics knowledge from biomedical literature
by Jin, Yang, Ph.D., UNIVERSITY OF PENNSYLVANIA, 2007, 119 pages; 3271773

Abstract:

The exponential proliferation of biomedical literature presents an unprecedented challenge in biomedical research, making it increasingly difficult for researchers to track and use information relevant to their interests. Biomedical text mining has recently shown promise in addressing this problem by using computational procedures to transform textual information into structured, queryable information. This dissertation describes efforts to acquire and use cancer genomic knowledge from biomedical literature in an automated fashion.

This work began by classifying biomedical entities across the genomic and phenotypic domains into entity classes for genes, genomic variation, malignancy types, and a number of malignancy phenotypic attributes. Through an iterative process, entity definitions were refined based on the consensus of annotators and domain experts during manual annotation of biomedical text. Automated entity extractors that performed with high accuracy were then developed and applied to all MEDLINE abstracts. Extracted entity mentions were assigned to standard referents through rule-based normalization methods created for this purpose. Finally, the extracted and normalized entity mentions were integrated with microarray expression analysis to prioritize genes differentially expressed between two closely related signal transduction pathways, which are known to be critical in determining whether the pediatric tumor neuroblastoma progresses or differentiates. Pathway analysis showed that the genes identified by the integrated method were more functionally relevant to neuroblastoma, and their differential behavior was further validated by RT-PCR experiments.
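The normalization step can be illustrated with a minimal sketch. It is not the dissertation's method: it assumes a small, hand-built synonym lexicon (the MYCN, ALK, and NTRK1 entries are illustrative only) and a few assumed string rules (case folding, punctuation removal, and dropping the words "gene", "protein", and "mRNA") that map each extracted gene mention to one canonical symbol. The names LEXICON, normalize_key, build_index, and normalize_mention are hypothetical.

```python
import re

# Hypothetical lexicon: canonical gene symbol -> known surface synonyms.
# A real system would draw on a curated resource; these entries are illustrative.
LEXICON = {
    "MYCN": ["N-myc", "NMYC", "MYCN proto-oncogene"],
    "ALK": ["anaplastic lymphoma kinase", "CD246"],
    "NTRK1": ["TrkA", "TRK1"],
}

# Assumed rule: generic words that do not change the referent are removed.
_STRIP_WORDS = re.compile(r"\b(gene|protein|mrna)\b", re.IGNORECASE)


def normalize_key(mention: str) -> str:
    """Apply simple string rules so surface variants collapse to one lookup key."""
    text = _STRIP_WORDS.sub(" ", mention)
    # Lowercase, then drop hyphens, spaces, and any other non-alphanumeric characters.
    return re.sub(r"[^0-9a-z]", "", text.lower())


def build_index(lexicon: dict) -> dict:
    """Map every normalized synonym (and the symbol itself) to its canonical symbol."""
    index = {}
    for symbol, synonyms in lexicon.items():
        for name in [symbol, *synonyms]:
            # setdefault keeps the first symbol if two entries share a key.
            index.setdefault(normalize_key(name), symbol)
    return index


def normalize_mention(mention: str, index: dict):
    """Return the canonical symbol for a mention, or None if no rule matches."""
    return index.get(normalize_key(mention))
```

For example, with index = build_index(LEXICON), the mentions "N-Myc protein" and "trkA" resolve to MYCN and NTRK1, while an unlisted mention such as "BRCA1" returns None and would need another rule or manual review.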

This research demonstrates that a thoughtful annotation process can successfully support the extraction of information from text relevant to a particular research problem in cancer genomics. It also shows that this extracted information, when combined with experimental data, can assist with hypothesis generation and the interpretation of lab results. Developed and applied in this way, biomedical text mining achieves more than the transformation of data into discrete information: it provides a source of inferential power that can significantly enhance traditional approaches to analyzing experimental data. As biomedical text mining techniques continue to mature, the alignment of literature-based knowledge with molecular and clinical observational data will likely become more complete, resulting in more frequent and profound literature-based discoveries in the foreseeable future.

 
Adviser: Peter S. White
School: UNIVERSITY OF PENNSYLVANIA
Source: DAI/B 68-07, p. , Nov 2007
Source Type: Dissertation
Subjects: Bioinformatics
Publication Number: 3271773