Supervised and knowledge-based methods for disambiguating terms in biomedical text using the UMLS and MetaMap
by Thomson McInnes, Bridget, Ph.D., UNIVERSITY OF MINNESOTA, 2009, 249 pages; 3373429

Abstract:

Word Sense Disambiguation is the task of automatically identifying the appropriate sense (or concept) of an ambiguous word. For example, the term cold could refer to the temperature or a virus, depending on the context in which it is used. Failing to identify the intended concept of an ambiguous word reduces the accuracy of biomedical applications such as medical coding and indexing, which are becoming essential in the biomedical and clinical world with the push towards electronic medical records and the growing amount of information available to biomedical researchers and clinicians. This dissertation focuses on disambiguating ambiguous words in biomedical text.

This dissertation presents two methods, K-CUI and A-CUI, that can disambiguate ambiguous terms in any biomedical text using information from the Unified Medical Language System (UMLS). K-CUI explores the use of Concept Unique Identifiers (CUIs), as assigned by MetaMap, as features for a supervised learning method for word sense disambiguation. It also investigates four techniques to reduce the noise in the feature set by restricting which CUIs to include. The first technique is windowing, whose results show that in biomedical text indicative CUIs are highly localized. The second is a frequency cutoff, whose results show that when a dataset contains a high majority concept, the features that occur only a few times are essential in disambiguating the minority concepts. The third is a MetaMap Indexing cutoff, whose results show that word concepts are correlated with the topical information describing an instance. The fourth is a semantic similarity cutoff, whose results show that in biomedical text, indicative features have a high semantic similarity with at least one of the possible concepts of the ambiguous word.
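
As a rough illustration of the first two noise-reduction techniques (windowing and the frequency cutoff), the following Python sketch builds binary CUI feature vectors for a supervised learner. It is not the dissertation's implementation. It assumes that each instance has already been mapped to a sequence of CUIs (for example by MetaMap) and that the position of the ambiguous term is known. The function names and the identifiers such as C_FEVER are hypothetical placeholders, not real UMLS CUIs. The MetaMap Indexing and semantic similarity cutoffs are not sketched because they depend on external tools.

```python
from collections import Counter


def window_features(cuis, target_index, window):
    """Keep the CUIs within `window` positions of the ambiguous term.

    The CUI at `target_index` (the ambiguous term itself) is excluded.
    """
    lo = max(0, target_index - window)
    hi = target_index + window + 1
    return [c for i, c in enumerate(cuis[lo:hi], start=lo) if i != target_index]


def frequency_cutoff(instances, min_count):
    """Return the set of CUIs that occur at least `min_count` times overall."""
    counts = Counter(c for feats in instances for c in feats)
    return {c for c, n in counts.items() if n >= min_count}


def to_binary_vector(feats, vocabulary):
    """Encode an instance as 0/1 indicators over an ordered vocabulary."""
    present = set(feats)
    return [1 if c in present else 0 for c in vocabulary]


# Toy data: (sequence of placeholder CUIs, index of the ambiguous term).
# These identifiers are invented for illustration only.
instances = [
    (["C_PATIENT", "C_FEVER", "C_TARGET", "C_VIRUS", "C_COUGH"], 2),
    (["C_WATER", "C_TARGET", "C_ICE", "C_PATIENT"], 1),
    (["C_VIRUS", "C_TARGET", "C_FEVER"], 1),
]

# Windowing: keep only CUIs within one position of the ambiguous term.
windowed = [window_features(cuis, idx, window=1) for cuis, idx in instances]

# Frequency cutoff: keep only CUIs seen at least twice across all instances.
vocab = sorted(frequency_cutoff(windowed, min_count=2))

# Binary feature vectors that could be passed to any supervised classifier.
vectors = [to_binary_vector(feats, vocab) for feats in windowed]

print(vocab)
print(vectors)
```

In this toy data the frequency cutoff leaves the second instance with an all-zero vector, because its only features each occur once. That is consistent with the observation above that rarely occurring features can be essential for disambiguating minority concepts.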

 
Advisers: Ted Pedersen; John Carlis
School: UNIVERSITY OF MINNESOTA
Source: DAI/B 70-09, p. , Nov 2009
Source Type: Dissertation
Subjects: Computer science
Publication Number: 3373429
  Use the link below to access a full citation record of this graduate work:
  http://gateway.proquest.com/openurl%3furl_ver=Z39.88-2004%26res_dat=xri:pqdiss%26rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation%26rft_dat=xri:pqdiss:3373429
