Categorizing a continuous variable simplifies communication and statistical analysis in public health and medical research. However, categorization loses information, reduces statistical power, and both biases the estimate of a dose-response association and reduces its efficiency. It also jeopardizes the validity and efficiency of a meta-analysis, because the included studies often use a single cutoff point, inconsistent cutoff points, or both.
To summarize the study-specific estimates appropriately in a meta-analysis, whether across comparable categories or as a dose-response association, the first step is a new approach that re-estimates the underlying distribution of a categorized covariate from the published information.
This dissertation proposes two types of approaches to estimating the underlying distribution. The first is a linear model approach. When the underlying distribution is normal, a linear model can be constructed from each study's mean, standard deviation, and cutoff points with their cumulative probabilities, and the parameters can be estimated with a weighted mixed-effects linear regression model. When the underlying distribution is gamma, a linear model is derived from a property of the incomplete gamma function, and the parameters are estimated with an iterative numerical algorithm.
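For the normal case, each cutoff point c satisfies c = μ + σ·Φ⁻¹(p), where p is its published cumulative probability, so the cutoffs are linear in the probit of p with intercept μ and slope σ. The sketch below is a simplified, single-study version of that idea, assuming plain weighted least squares with no random effects; the function name `fit_normal_from_cutoffs` and the caller-supplied weights are hypothetical, not the dissertation's weighted mixed-effects specification.

```python
from statistics import NormalDist


def fit_normal_from_cutoffs(cutoffs, cum_probs, weights=None):
    """Weighted least-squares fit of cutoff = mu + sigma * z, with z = probit(p).

    Hypothetical single-study sketch: no random effects, and the weights
    (e.g. inverse-variance weights) are an assumption supplied by the caller.
    """
    if len(cutoffs) != len(cum_probs) or len(cutoffs) < 2:
        raise ValueError("need at least two cutoffs with cumulative probabilities")
    if weights is None:
        weights = [1.0] * len(cutoffs)
    std_normal = NormalDist()
    # Probit-transform each cumulative probability.
    z = [std_normal.inv_cdf(p) for p in cum_probs]
    w_sum = sum(weights)
    z_bar = sum(w * zi for w, zi in zip(weights, z)) / w_sum
    c_bar = sum(w * ci for w, ci in zip(weights, cutoffs)) / w_sum
    s_zz = sum(w * (zi - z_bar) ** 2 for w, zi in zip(weights, z))
    s_zc = sum(w * (zi - z_bar) * (ci - c_bar) for w, zi, ci in zip(weights, z, cutoffs))
    sigma = s_zc / s_zz          # slope estimates the standard deviation
    mu = c_bar - sigma * z_bar   # intercept estimates the mean
    return mu, sigma


# Usage: cutoffs placed at the 20th, 50th, and 80th percentiles of N(25, 4^2).
true_dist = NormalDist(25, 4)
cuts = [true_dist.inv_cdf(p) for p in (0.2, 0.5, 0.8)]
print(fit_normal_from_cutoffs(cuts, [0.2, 0.5, 0.8]))  # approximately (25.0, 4.0)
```

Because the normal mean equals the median, a study's reported mean could, under this model, be added as one more point with p = 0.5.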
The second is a goodness-of-fit approach. When the underlying distribution cannot be linearized in its parameters, the parameter estimates are chosen to minimize the distance between the expected and observed values implied by each study's cutoff points and their cumulative probabilities. We also applied this approach to estimate the parameters of a categorized zero-inflated distribution: the proportion of excess zeros and the parameters of the continuous component.
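A minimal sketch of the goodness-of-fit idea follows, assuming the distance is the sum of squared differences between fitted and published cumulative probabilities and assuming a simple compass (pattern) search as the optimizer; both choices are illustrative and may differ from the dissertation's. The gamma distribution is used only as a convenient test case, since this approach does not require linearization. The function names are hypothetical, and the gamma CDF is computed from the power series of the regularized lower incomplete gamma function, which is adequate only for moderate values of x/scale.

```python
import math


def gamma_cdf(x, shape, scale):
    """Gamma CDF, i.e. P(shape, x / scale), via the incomplete gamma power series."""
    if x <= 0:
        return 0.0
    t = x / scale
    term = 1.0 / shape  # n = 0 term of sum t^n / (shape (shape+1) ... (shape+n))
    total = term
    for n in range(1, 10000):
        term *= t / (shape + n)
        total += term
        if term < total * 1e-16:
            break
    log_prefactor = shape * math.log(t) - t - math.lgamma(shape)
    return min(1.0, math.exp(log_prefactor) * total)


def distance(log_params, cutoffs, cum_probs):
    """Squared distance between fitted and published cumulative probabilities."""
    shape, scale = math.exp(log_params[0]), math.exp(log_params[1])
    return sum((gamma_cdf(c, shape, scale) - p) ** 2 for c, p in zip(cutoffs, cum_probs))


def fit_gamma_from_cutoffs(cutoffs, cum_probs, start=(0.0, 0.0), step=1.0,
                           tol=1e-9, max_iter=200000):
    """Compass search over (log shape, log scale); returns (shape, scale)."""
    x = list(start)
    best = distance(x, cutoffs, cum_probs)
    for _ in range(max_iter):
        if step <= tol:
            break
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = list(x)
                trial[i] += delta
                value = distance(trial, cutoffs, cum_probs)
                if value < best:
                    x, best, improved = trial, value, True
        if not improved:
            step /= 2.0  # no coordinate move helped: refine the step size
    return math.exp(x[0]), math.exp(x[1])


# Usage: cumulative probabilities generated from a gamma(shape=2, scale=3).
cuts = [2.0, 5.0, 8.0, 12.0]
probs = [gamma_cdf(c, 2.0, 3.0) for c in cuts]
print(fit_gamma_from_cutoffs(cuts, probs))  # close to (2.0, 3.0)
```

Optimizing on the log scale keeps both parameters positive without constraints. For a zero-inflated extension one could, as an assumption, model the cumulative probability at c ≥ 0 as π + (1 − π)G(c), where G is the continuous component's CDF, and add π as a third search coordinate.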
In addition, using maximum likelihood based on the multinomial distribution together with simulation studies, we examine how categorization affects the relative efficiency of estimating the distribution parameters and the dose-response association, as well as the validity of the estimated dose-response association.
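As a small illustration of relative efficiency under a multinomial likelihood, the sketch below compares the Fisher information about a normal mean μ (with σ assumed known) from categorized data against that from the continuous data. Each category j has probability p_j = Φ(z_j) − Φ(z_{j−1}), so the per-observation information is Σ_j (∂p_j/∂μ)² / p_j with ∂p_j/∂μ = (φ(z_{j−1}) − φ(z_j))/σ, while continuous data give 1/σ². This single-parameter case is an assumption chosen for clarity and does not cover the dissertation's dose-response analysis; the function name is hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def _pdf(v):
    return 0.0 if math.isinf(v) else STD_NORMAL.pdf(v)


def _cdf(v):
    if v == -math.inf:
        return 0.0
    if v == math.inf:
        return 1.0
    return STD_NORMAL.cdf(v)


def relative_efficiency_mean(cutoffs, mu=0.0, sigma=1.0):
    """Information about mu from categorized data divided by that from continuous data.

    Hypothetical sketch: normal variable, known sigma, multinomial likelihood
    over the categories defined by the given cutoff points.
    """
    z = [-math.inf] + [(c - mu) / sigma for c in sorted(cutoffs)] + [math.inf]
    info_cat = 0.0
    for lo, hi in zip(z[:-1], z[1:]):
        p = _cdf(hi) - _cdf(lo)                 # category probability
        dp_dmu = (_pdf(lo) - _pdf(hi)) / sigma  # derivative of p with respect to mu
        info_cat += dp_dmu ** 2 / p
    info_cont = 1.0 / sigma ** 2                # per-observation information, continuous data
    return info_cat / info_cont


print(relative_efficiency_mean([0.0]))  # about 0.637 (= 2/pi): dichotomizing at the mean
print(relative_efficiency_mean([STD_NORMAL.inv_cdf(q) for q in (0.25, 0.5, 0.75)]))
```

Adding cutoffs refines the partition, so the efficiency can only increase toward 1; a single cutoff at the mean retains roughly 64% of the information.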
In summary, the main contribution of this dissertation is a set of approaches that use published data to turn the inconsistent cutoff points across many studies from a disadvantage into useful information, thereby improving meta-analysis. We also generalize approaches for evaluating the impact of categorizing a continuous variable.