Automated methods of auditing and using terminology/ontology knowledge bases for natural language processing
by Fan, Jung-Wei, Ph.D., COLUMBIA UNIVERSITY, 2009, 120 pages; 3400568

Abstract:

Because humans communicate primarily in natural language, narrative information plays a critical role in storing and disseminating knowledge. In a knowledge-intensive domain such as biomedicine, the effort required to digest the huge amount of text in clinical reports, research literature, and consumer websites is extremely high. Biomedical natural language processing (BioNLP) is an informatics specialty that aims to automatically analyze and restructure biomedical text into a more digestible size and format so that it can be easily post-processed by humans or other automated programs. To handle the comprehensive lexical and semantic knowledge in biomedicine, BioNLP systems need to incorporate domain-specific terminology/ontology knowledge bases. In addition, using standardized lexical/semantic entities will benefit the interoperability between BioNLP systems and associated applications. However, two major issues hinder the optimal use of terminology/ontology for BioNLP: first, the existing terminology/ontology knowledge bases are not customized for NLP purposes and contain problematic content; second, automated solutions for improving and using the knowledge bases are still inadequate, which limits their use in BioNLP.

To address these issues, the dissertation proposes corresponding solutions, both to improve terminology/ontology for BioNLP purposes and to demonstrate the feasibility of using terminology/ontology in BioNLP applications. For the first task, two automatic classifiers were developed to reclassify and audit the semantic classification of terminology concepts. The classifiers use empirical language features and complement other auditing methods that apply ontological principles. For the second task, we developed unsupervised methods that use terminology/ontology for word sense disambiguation (WSD). The methods can help reduce the labor of manual annotation and sample representative evaluation instances for WSD research. Promising results were achieved in both tasks, and we have made the reclassified concepts available to the community as a public database. The results also enhanced our understanding of the biomedical terminology/ontology knowledge bases and pointed out interesting directions for future research. The methods developed in the dissertation can be generalized to other fields and should promote the use of standardized terminology/ontology in biomedicine and healthcare.

 
Adviser: Carol Friedman
School: COLUMBIA UNIVERSITY
Source: DAI/B 71-03, p. , Apr 2010
Source Type: Dissertation
Subjects: Bioinformatics; Artificial intelligence
Publication Number: 3400568
Access the complete dissertation:

Find an electronic copy at your library.
  Use the link below to access a full citation record of this graduate work:
  http://gateway.proquest.com/openurl%3furl_ver=Z39.88-2004%26res_dat=xri:pqdiss%26rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation%26rft_dat=xri:pqdiss:3400568
  If your library subscribes to the ProQuest Dissertations & Theses (PQDT) database, you may be entitled to a free electronic version of this graduate work. If not, you will have the option to purchase one and to access a free 24-page preview (if available).
