A Framework of Quantitative Pharmacovigilance using Natural Language Processing, Statistics and Electronic Health Records
by Wang, Xiaoyan, Ph.D., COLUMBIA UNIVERSITY, 2010, 100 pages; 3448012

Abstract:

Adverse drug events (ADEs) cause public health problems worldwide. About ten percent of ADEs are estimated to cause permanent disability. In the United States alone, ADEs cause more than 770,000 injuries or deaths each year. Establishing accurate and timely safety profiles over the market life of a drug is therefore critical for patient safety. Signal detection algorithms in pharmacovigilance have so far focused on coded and structured data. However, important clinical information that is relevant for pharmacovigilance, such as "feeling suicidal", is generally available only in the narrative reports of the electronic health record (EHR).

Pharmacovigilance researchers have long sought a real-time, continuous, and prospective approach. Towards this goal, this dissertation proposes a framework for a high-throughput system that demonstrates the relevance and significance of using unstructured data from an EHR for pharmacovigilance. The framework consists of three components that draw on natural language processing (NLP), statistics, information theory, and narrative reports from an EHR. The first component is a prototypical framework for pharmacovigilance based on narrative clinical reports. The results demonstrate that the framework is feasible, although a number of challenging issues remain, such as the need to reduce confounding interdependencies. The second component is a simple but effective method for selecting information to reduce confounders. This study demonstrates that selecting information in narrative electronic reports based on the report section improves the detection of drug-ADE relations. The third component is a method that uses information theory to further reduce interdependencies among clinical entities and to help characterize drug-ADE detection. The results demonstrate the method's effectiveness in reducing confounders and improving the precision of drug-ADE detection.
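The abstract does not say which sections are kept or which information-theoretic measure is used, so the following is only an illustrative sketch of how the second and third components might fit together. It assumes NLP output has already been reduced to sets of entities per report section, keeps a hypothetical subset of sections, and scores a drug-ADE pair by the mutual information of their presence across reports. All names, section labels, and data here are invented for illustration and are not taken from the dissertation.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical NLP output: one dict per report, section name -> entities found.
reports = [
    {"medications": {"drugA"}, "hospital course": {"rash"}},
    {"medications": {"drugA"}, "hospital course": {"rash"},
     "family history": {"asthma"}},
    {"medications": {"drugB"}, "hospital course": set(),
     "family history": {"rash"}},
    {"medications": {"drugB"}, "hospital course": {"nausea"}},
]

# Assumed choice of sections to retain; the source does not list them.
KEEP_SECTIONS = {"medications", "hospital course"}


def entities(report, keep=None):
    """Union of entities from the retained sections (all sections if keep is None)."""
    found = set()
    for section, ents in report.items():
        if keep is None or section in keep:
            found |= ents
    return found


def mutual_information(reports, x, y, keep=KEEP_SECTIONS):
    """Mutual information in bits between presence of x and presence of y."""
    n = len(reports)
    joint = Counter()
    for report in reports:
        ents = entities(report, keep)
        joint[(x in ents, y in ents)] += 1

    mi = 0.0
    for a, b in product((True, False), repeat=2):
        p_ab = joint[(a, b)] / n
        if p_ab == 0:
            continue  # a zero-probability cell contributes nothing
        p_a = (joint[(a, True)] + joint[(a, False)]) / n
        p_b = (joint[(True, b)] + joint[(False, b)]) / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


filtered = mutual_information(reports, "drugA", "rash")
unfiltered = mutual_information(reports, "drugA", "rash", keep=None)
print(f"filtered: {filtered:.3f} bits, unfiltered: {unfiltered:.3f} bits")
```

In this toy data the "rash" mention in a family-history section belongs to a relative rather than the patient, so dropping that section raises the score for the drugA-rash pair. A real system would also need negation handling, temporal ordering, and a significance test, none of which are shown here.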

The research presented in this dissertation has produced several novel findings and provided new solutions to the challenging problem of pharmacovigilance. In this dissertation, I provide a high-throughput model and method to identify drug safety signals by mining narrative reports in an EHR, and demonstrate the potential of the method. To the best of my knowledge, this is the first study demonstrating the use of unstructured patient data, NLP, and information theory for pharmacovigilance. In conclusion, this dissertation provides a framework for the development of automated, active, and prospective pharmacovigilance, which could potentially unveil drug safety profiles and novel adverse events in a timely fashion.

Adviser: Carol Friedman
School: COLUMBIA UNIVERSITY
Source: DAI/B 72-05, p. , Apr 2011
Source Type: Dissertation
Subjects: Public health
Publication Number: 3448012
Access the complete dissertation:

» Find an electronic copy at your library.
  Use the link below to access a full citation record of this graduate work:
  http://gateway.proquest.com/openurl%3furl_ver=Z39.88-2004%26res_dat=xri:pqdiss%26rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation%26rft_dat=xri:pqdiss:3448012
  If your library subscribes to the ProQuest Dissertations & Theses (PQDT) database, you may be entitled to a free electronic version of this graduate work. If not, you will have the option to purchase one, and access a 24 page preview for free (if available).
