Targeted maximum likelihood estimation techniques for time to event data and the implications of coarsening an explanatory variable of interest via dichotomization in the context of causal inference in semi-parametric models
by Stitelman, Ori Michael, Ph.D., UNIVERSITY OF CALIFORNIA, BERKELEY, 2010, 107 pages; 3449077

Abstract:

This dissertation addresses three issues in causal inference, unified by the common theme of causal inference in semi-parametric models. The first two chapters further develop targeted maximum likelihood estimation (TMLE) methods for particular settings in survival analysis. Chapter 1 presents the collaborative targeted maximum likelihood estimator (C-TMLE) for the treatment-specific survival curve. This estimator improves upon estimators commonly used in survival analysis and is particularly needed when analyzing observational studies, data that exhibit dependent censoring, or both. Chapter 2 proposes two parameters for quantifying effect modification in time to event studies and presents the TMLE for estimating them. Chapter 3 identifies the implicit, typically unacknowledged assumptions that practitioners make when they dichotomize a treatment/exposure variable in order to assess its causal effect.

Chapter 1. Current methods for analyzing time to event data either rely on highly parametric assumptions, which yield biased estimates of parameters chosen purely out of convenience, or are highly unstable because they ignore the global constraints of the true model. By using targeted maximum likelihood estimation (TMLE), one may consistently estimate parameters that directly answer the statistical question of interest. Targeted maximum likelihood estimators are substitution estimators: they rely on an estimate of the underlying distribution of the data. Unlike other substitution estimators, however, they estimate that distribution specifically to reduce bias in the estimate of the parameter of interest. We present an extension of TMLE for observational time to event data, the collaborative targeted maximum likelihood estimator (C-TMLE) for the treatment-specific survival curve. Through a simulation study we show that this method improves on commonly used methods in both robustness and efficiency; in certain situations the C-TMLE even produces estimates whose mean squared error is lower than the semi-parametric efficiency bound. We also demonstrate that a semi-parametric efficient substitution estimator (TMLE) outperforms a semi-parametric efficient non-substitution estimator (the augmented inverse probability weighted estimator) in sparse data situations. Lastly, we show that the bootstrap produces valid 95 percent confidence intervals in sparse data situations, whereas influence curve based inference breaks down.
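To make the targeting idea concrete, here is a minimal Python/NumPy sketch of TMLE in a much simpler setting than the one studied in this chapter: a single time point, a binary treatment A, one continuous covariate W, a binary outcome Y, and the treatment-specific mean E[Y_1] as the parameter. It is not the C-TMLE for the survival curve. The function names (`fit_logistic`, `tmle_treated_mean`), the main-terms logistic working models, and the truncation of g at 0.01 are illustrative assumptions. The sketch shows the two features emphasized above: the estimate is a substitution (plug-in) estimate, and the initial outcome fit is updated along a "clever covariate" A/g(W) so as to reduce bias for the target parameter.

```python
import numpy as np


def expit(x):
    return 1.0 / (1.0 + np.exp(-x))


def logit(p):
    return np.log(p / (1.0 - p))


def fit_logistic(X, y, offset=None, iters=50, tol=1e-10):
    """Logistic regression by Newton-Raphson; X must already contain any intercept column."""
    offset = np.zeros(len(y)) if offset is None else offset
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = expit(offset + X @ beta)
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def tmle_treated_mean(W, A, Y, q_bound=1e-6, g_min=0.01):
    """Hypothetical point-treatment TMLE of E[Y_1] for binary A and Y; returns (estimate, IC-based SE)."""
    n = len(Y)
    ones = np.ones(n)
    # Initial outcome regression Q(A, W) = P(Y = 1 | A, W) from a main-terms logistic model.
    XQ = np.column_stack([ones, A, W])
    bQ = fit_logistic(XQ, Y)
    Q_AW = np.clip(expit(XQ @ bQ), q_bound, 1 - q_bound)
    Q_1W = np.clip(expit(np.column_stack([ones, ones, W]) @ bQ), q_bound, 1 - q_bound)
    # Treatment mechanism g(W) = P(A = 1 | W), truncated below at g_min.
    Xg = np.column_stack([ones, W])
    g1W = np.clip(expit(Xg @ fit_logistic(Xg, A)), g_min, 1.0)
    # Clever covariate evaluated at the observed A and at A = 1.
    H_AW = A / g1W
    H_1W = 1.0 / g1W
    # Targeting step: logit Q*(A, W) = logit Q(A, W) + eps * H(A, W), eps fit by MLE with an offset.
    eps = fit_logistic(H_AW[:, None], Y, offset=logit(Q_AW))[0]
    Qstar_AW = expit(logit(Q_AW) + eps * H_AW)
    Qstar_1W = expit(logit(Q_1W) + eps * H_1W)
    # Substitution estimate: average the updated Q*(1, W) over the empirical distribution of W.
    psi = Qstar_1W.mean()
    # Efficient influence curve for E[Y_1], used for a Wald-type standard error.
    ic = H_AW * (Y - Qstar_AW) + Qstar_1W - psi
    return psi, ic.std(ddof=1) / np.sqrt(n)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 5000
    W = rng.normal(size=n)
    A = rng.binomial(1, expit(0.5 * W))       # treatment depends on W (confounding)
    Y = rng.binomial(1, expit(-0.5 + A + W))  # outcome depends on A and W
    psi, se = tmle_treated_mean(W, A, Y)
    print(f"TMLE of E[Y_1]: {psi:.3f} (SE {se:.3f})")
```

The standard error above comes from the estimated influence curve. That is the kind of inference reported above to break down in sparse data (small g(W)); the alternative reported to remain valid there is a nonparametric bootstrap, which would resample rows and rerun `tmle_treated_mean`.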

Chapter 2. The Cox proportional hazards model and its discrete time analogue, the logistic failure time model, posit highly restrictive parametric models and estimate parameters that are specific to the model proposed. Despite these flaws, such methods are typically used to assess effect modification in survival analyses. The targeted maximum likelihood estimation (TMLE) methodology is more robust than these methods and allows practitioners to estimate parameters that directly answer the question of interest. In this chapter, TMLE is used to estimate two newly proposed parameters that quantify effect modification in the time to event setting. These methods are then applied to the Tshepo study to assess whether gender or baseline CD4 level modifies the effect of two cART therapies, efavirenz (EFV) and nevirapine (NVP), on the progression of HIV. The results show that women tend to have more favorable outcomes on EFV, while men tend to have more favorable outcomes on NVP. Furthermore, EFV tends to be favorable compared to NVP for individuals with high CD4 levels.

Chapter 3. In analyses designed to estimate the causal effect of a continuous exposure/treatment, it is common to dichotomize the variable of interest. By dichotomizing the variable and assessing the causal effect of the newly constructed binary variable, practitioners implicitly make assumptions, though these assumptions are typically ignored when the resulting estimates are interpreted. In this chapter we formally address which assumptions are made when variables are dichotomized in order to assess the semi-parametrically adjusted associations between these constructed binary variables and an outcome. We present two assumptions, at least one of which must hold for the estimates of the causal effects to be unbiased estimates of the parameters of interest: the Mechanism Equivalence assumption and the Effect Equivalence assumption. Furthermore, we quantify the bias induced when these assumptions are violated. Lastly, we present an analysis of a malaria study that exemplifies the danger of naively dichotomizing a continuous variable to assess a causal effect.
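The chapter's formal Mechanism Equivalence and Effect Equivalence assumptions are not reproduced here, but the underlying problem can be illustrated with a small simulation. In the hypothetical Python/NumPy sketch below, the data-generating model, the threshold of 0, the use of simple stratification on W as the adjustment, and all function names are assumptions made for illustration. Two populations share the same dose-response relationship, Y = X + W + noise, so a one-unit increase in the continuous exposure X has the same effect in both; they differ only in how X is distributed given the covariate W. After dichotomizing at B = 1{X > 0} and adjusting for W, the estimated "effect" of B differs between the populations, because within each stratum the contrast equals E[X | X > 0, W] - E[X | X <= 0, W], which depends on the exposure mechanism.

```python
import math

import numpy as np


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def analytic_contrast(mu):
    """E[X | X > 0] - E[X | X <= 0] for X ~ N(mu, 1): the within-stratum contrast when Y = X + W + noise."""
    p = norm_cdf(mu)
    return norm_pdf(mu) / (p * (1.0 - p))


def simulate(mu_by_w, n, rng):
    """Hypothetical model: binary W; X | W ~ N(mu_by_w[W], 1); the same Y = X + W + noise in every population."""
    W = rng.integers(0, 2, size=n)
    X = rng.normal(loc=np.asarray(mu_by_w)[W], scale=1.0)
    Y = X + W + rng.normal(size=n)
    return W, X, Y


def adjusted_dichotomized_effect(W, X, Y, threshold=0.0):
    """Stratum-weighted average over W of mean(Y | B=1, W) - mean(Y | B=0, W), with B = 1{X > threshold}."""
    B = X > threshold
    effect = 0.0
    for w in np.unique(W):
        s = W == w
        effect += s.mean() * (Y[s & B].mean() - Y[s & ~B].mean())
    return effect


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two exposure mechanisms, identical dose-response.
    for mu_by_w in [(0.0, 0.5), (1.0, 1.5)]:
        W, X, Y = simulate(mu_by_w, 200_000, rng)
        est = adjusted_dichotomized_effect(W, X, Y)
        expected = 0.5 * sum(analytic_contrast(m) for m in mu_by_w)
        print(f"mechanism {mu_by_w}: estimate {est:.3f}, analytic {expected:.3f}")
```

Analytically, the two adjusted contrasts are about 1.62 and 1.95, even though the underlying dose-response is identical in both populations. The dichotomized estimate therefore depends on the exposure distribution, not only on the effect of X.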

Adviser: Mark J. van der Laan
School: UNIVERSITY OF CALIFORNIA, BERKELEY
Source: DAI/B 72-06, p. , Apr 2011
Source Type: Dissertation
Subjects: Biostatistics; Statistics
Publication Number: 3449077
Full citation record: http://gateway.proquest.com/openurl%3furl_ver=Z39.88-2004%26res_dat=xri:pqdiss%26rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation%26rft_dat=xri:pqdiss:3449077