A study of methods for missing data problems in epidemiologic studies with historical exposures
by Zhang, Xinbo, Ph.D., UNIVERSITY OF SOUTHERN CALIFORNIA, 2009, 141 pages; 3355323

Abstract:

In this thesis we consider methods for a specific missing-data pattern in which missing values occur across an individual's exposure history, creating gaps in that history. We propose a missing indicator induced intensity (MIND) method under the rare disease assumption. The idea originates from Prentice (1982), and its theoretical development is rooted in the Cox regression framework under a cohort design, in which the essential step is the parametrization of the induced intensity. Because this parametrization itself reflects the missingness mechanism, assumptions about that mechanism found elsewhere in the literature, such as missing completely at random (MCAR) or missing at random (MAR), are no longer required.

The MIND method is compared against simple imputation methods in a Monte Carlo simulation study under a logistic model with case-control sampling. It is shown to perform better in terms of bias and efficiency than the single imputation methods considered, and to be far superior to complete case analysis in terms of relative efficiency. Under a cohort design, the method is shown to reach an asymptotic efficiency equal to the expected non-missing proportion, assuming that the exposure and missingness pairs are independently and identically distributed across years. Under a nested case-control sampling design, the asymptotic efficiency varies slightly but stays close to that of the cohort design. Under the rare disease assumption, the method can be bridged back to a case-control design based on the logistic model. The method is then applied to the University of Southern California prostate cancer-pesticide pilot study to assess its performance. Overall, the MIND method is efficient and flexible in handling missing data that occur within the exposure history. It can be further improved with better parametrization of the induced intensity in complex situations, especially when exposure is correlated across years.
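As a rough illustration of why complete case analysis fares so poorly with this missing-data pattern, the sketch below simulates yearly observed/missing indicators and compares the share of exposure-years observed with the share of subjects whose entire history is observed. It is not taken from the thesis and does not implement the MIND method. The function names, the 10-year history, the per-year observation probability of 0.8, and the assumption that missingness is independent across years are all illustrative choices, and the share of retained subjects is only a crude proxy for relative efficiency.

```python
import random


def simulate_gaps(n_subjects, n_years, p_observed, seed=0):
    """Simulate missingness indicators (True = exposure observed that year).

    Assumption for illustration: each year is observed independently
    with the same probability p_observed.
    """
    rng = random.Random(seed)
    return [
        [rng.random() < p_observed for _ in range(n_years)]
        for _ in range(n_subjects)
    ]


def observed_year_fraction(indicators):
    """Share of all exposure-years that are observed."""
    total_years = sum(len(row) for row in indicators)
    observed_years = sum(sum(row) for row in indicators)
    return observed_years / total_years


def complete_case_fraction(indicators):
    """Share of subjects with no gap anywhere in their history."""
    complete = sum(all(row) for row in indicators)
    return complete / len(indicators)


# Hypothetical settings: 20,000 subjects, 10 years, 80% observed per year.
indicators = simulate_gaps(n_subjects=20000, n_years=10, p_observed=0.8)
print("observed exposure-years:", round(observed_year_fraction(indicators), 3))
print("complete-case subjects: ", round(complete_case_fraction(indicators), 3))
print("theoretical p ** T:     ", round(0.8 ** 10, 3))
```

Under these assumptions the share of complete cases shrinks as p ** T with the length T of the history, while the share of observed exposure-years stays near p, which is the quantity the abstract reports as the asymptotic efficiency under the cohort design.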

Keywords: Missing data, case-control study, exposure history, Cox regression, logistic regression, asymptotic efficiency.

 
Adviser: Bryan Langholz
School: UNIVERSITY OF SOUTHERN CALIFORNIA
Source: DAI/B 70-05, p. , Jul 2009
Source Type: Dissertation
Subjects: Biostatistics; Statistics; Epidemiology
Publication Number: 3355323
  Use the link below to access a full citation record of this graduate work:
  http://gateway.proquest.com/openurl%3furl_ver=Z39.88-2004%26res_dat=xri:pqdiss%26rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation%26rft_dat=xri:pqdiss:3355323