To ensure test security and fairness, alternate forms of the same test are administered in practice. Although alternate forms are designed to be as parallel as possible, they generally do not have the same difficulty level. Equating adjusts for these differences in difficulty among forms of a test.
Six traditional equating methods are considered in this study: equipercentile equating without smoothing, equipercentile equating with presmoothing, equipercentile equating with postsmoothing, IRT true-score equating (IRT-T), IRT observed-score equating (IRT-O), and kernel equating. A common feature of all of the traditional procedures is that equating yields a single transformation (or conversion table) that is applied to all examinees who take the same test.
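To make the "single conversion table" property concrete, here is a minimal sketch of equipercentile equating without smoothing. It assumes a random-groups design, integer number-correct scores 0..K on each form, and the usual uniform continuization in which each integer score x is spread over [x - 0.5, x + 0.5], so that the percentile rank of x (as a proportion) is F(x - 1) + f(x)/2. The function names and the toy frequencies are hypothetical illustrations, not the procedure or data used in this study.

```python
import numpy as np


def _proportions(freqs):
    """Convert score frequencies (index = score) to proportions."""
    f = np.asarray(freqs, dtype=float)
    return f / f.sum()


def percentile_rank_proportion(freqs):
    """Percentile rank / 100 at each integer score: F(x - 1) + f(x) / 2."""
    f = _proportions(freqs)
    return np.cumsum(f) - f / 2.0


def inverse_percentile_rank(freqs, p):
    """Continuized score on this form whose percentile rank (as a proportion) is p."""
    f = _proportions(freqs)
    upper = np.cumsum(f)                          # F(k)
    lower = np.concatenate(([0.0], upper[:-1]))   # F(k - 1)
    k = np.searchsorted(upper, p, side="right")   # first score with F(k) > p, so f(k) > 0
    top = k >= len(f)                             # p at the very top of the distribution
    k = np.minimum(k, len(f) - 1)
    with np.errstate(divide="ignore", invalid="ignore"):
        y = k - 0.5 + (p - lower[k]) / f[k]       # linear interpolation within [k - 0.5, k + 0.5]
    return np.where(top, len(f) - 0.5, y)


def equipercentile_table(x_freqs, y_freqs):
    """Conversion table e_Y(x) for every form-X score x: same percentile rank on form Y."""
    return inverse_percentile_rank(y_freqs, percentile_rank_proportion(x_freqs))


if __name__ == "__main__":
    # Hypothetical score frequencies for scores 0..5; not data from this study.
    x_freqs = [2, 8, 20, 30, 25, 15]
    y_freqs = [5, 12, 25, 28, 20, 10]
    for x, e_y in enumerate(equipercentile_table(x_freqs, y_freqs)):
        print(f"{x} -> {e_y:.2f}")
```

Every examinee with raw score x on form X receives the same equated score, table[x]; the smoothing, kernel, and IRT procedures differ in how they build the conversion, not in this property.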
Van der Linden proposed conditional equipercentile (or local) equating (CEE), which introduces equating at the level of the individual examinee in order to reduce the equating error inherent in the traditional procedures. Van der Linden's CEE is conceptually closest to IRT-T in that it is defined with respect to a type of true score (θ, or proficiency), but it shares similarities with IRT-O in that it uses an estimated observed-score distribution for each individual θ to equate scores by equipercentile equating.
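The idea of equating conditional on θ can be sketched as follows. This is an illustrative sketch under stated assumptions, not van der Linden's or this study's implementation: it assumes a 2PL model with scaling constant D = 1, hypothetical item parameters, and a θ value supplied directly (estimating θ, for example by EAP or MLE, is not shown). For a given θ, the Lord-Wingersky recursion gives the conditional number-correct distribution on each form, and equipercentile equating of those two conditional distributions (with uniform continuization) yields a θ-specific conversion.

```python
import numpy as np


def prob_correct_2pl(theta, a, b):
    """2PL item response function; the scaling constant D = 1 is assumed."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))


def conditional_score_dist(theta, a, b):
    """Lord-Wingersky recursion: P(number-correct = x | theta) for x = 0..n."""
    dist = np.array([1.0])
    for p in prob_correct_2pl(theta, a, b):
        new = np.zeros(len(dist) + 1)
        new[:-1] += dist * (1.0 - p)  # item answered incorrectly
        new[1:] += dist * p           # item answered correctly
        dist = new
    return dist


def cdf_at_scores(f):
    """Uniform-kernel CDF at each integer score: F(x - 1) + f(x) / 2."""
    return np.cumsum(f) - f / 2.0


def inverse_cdf(f, p):
    """Continuized score whose uniform-kernel CDF equals p."""
    upper = np.cumsum(f)
    lower = np.concatenate(([0.0], upper[:-1]))
    k = np.searchsorted(upper, p, side="right")  # first score with F(k) > p
    top = k >= len(f)
    k = np.minimum(k, len(f) - 1)
    with np.errstate(divide="ignore", invalid="ignore"):
        y = k - 0.5 + (p - lower[k]) / f[k]
    return np.where(top, len(f) - 0.5, y)


def local_equating_table(theta, a_x, b_x, a_y, b_y):
    """Theta-specific conversion of form-X scores 0..n_x onto the form-Y scale."""
    f_x = conditional_score_dist(theta, a_x, b_x)
    f_y = conditional_score_dist(theta, a_y, b_y)
    return inverse_cdf(f_y, cdf_at_scores(f_x))


if __name__ == "__main__":
    # Hypothetical item parameters; form Y is made slightly harder than form X.
    a = [1.0, 1.2, 0.8, 1.5, 1.0]
    b_x = [-1.0, -0.5, 0.0, 0.5, 1.0]
    b_y = [v + 0.3 for v in b_x]
    for theta in (-1.0, 0.0, 1.0):
        print(theta, np.round(local_equating_table(theta, a, b_x, a, b_y), 2))
```

Unlike the single table produced by the traditional methods, this conversion changes with θ, so two examinees with the same raw score but different θ estimates can receive different equated scores.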
No real-data study has yet compared van der Linden’s CEE with each of the traditional equating procedures; indeed, no study has compared all six traditional procedures simultaneously. In addition to van der Linden's CEE (CEE-EAP), two variations of CEE are considered: CEE using maximum likelihood estimation (CEE-MLE) and CEE using the true characteristic curve (CEE-TCC). The focus of this study is on comparing results from CEE vis-à-vis the traditional procedures, as opposed to answering a “best-procedure” question, which would require a common conception of “true” equating.
Although the results of the traditional equating methods are quite similar, kernel equating and equipercentile equating with log-linear presmoothing generally reproduce the statistical moments of the respective original form score distributions more closely under various data conditions. Although IRT-T and IRT-O are usually found to be the least favorable in terms of statistical moments across all conditions, their distributions of equated raw-score differences are more stable than those of the other traditional equating methods.
The number of examinees at a particular score point was found to influence CEE results less than it influences the traditional equatings. CEE-EAP and CEE-MLE are very similar to one another, and their equated score difference distributions resemble those of IRT-O. Because CEE-TCC incorporates part of the IRT-T procedure, it behaves somewhat similarly to IRT-T. Although CEE results are less desirable in terms of maintaining statistical moments, the CEE equated score differences are more consistent and stable than those of the traditional equating methods.