Many large-scale assessments have implemented tests that include both constructed response (CR) items and multiple choice (MC) items. Compared to scoring MC items, scoring CR items requires raters and thus introduces an additional, subjective layer to the scoring process. The use of raters in scoring CR items raises the question of how scores from CR items should be used along with scores from MC items.
The present study explores an approach to combining scores from CR and MC items via an extension of a hierarchical rater model (HRM). The first level of the extended HRM incorporates a latent class signal detection theory (SDT) model, which provides a useful model of rater behavior; the second level relates the latent classes of the SDT model to examinee ability using an item response theory (IRT) model. In addition, scores from MC items can be used as direct indicators of ability in the second level of the HRM-SDT model.
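The two-level structure described above can be illustrated with a small data-generating sketch. This is only a sketch under stated assumptions, not the study's implementation: it assumes a generalized partial credit model for the latent category of each CR item, a logistic latent class SDT model of the form P(Y <= k | eta) = F(c_k - d * eta) for each rater, and a two-parameter logistic model for MC items. All function names and parameter values (`gpcm_probs`, `sdt_rater_probs`, `simulate_examinee`, `d=3.0`, and so on) are hypothetical and chosen only for illustration.

```python
import math
import random


def logistic(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def gpcm_probs(theta, a, steps):
    """Level 2 (assumed form): probabilities of the latent CR category
    given ability theta, using a generalized partial credit model."""
    logits = [0.0]
    for b in steps:
        # Each step adds a * (theta - b) to the running category logit.
        logits.append(logits[-1] + a * (theta - b))
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(z - m) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


def sdt_rater_probs(eta, d, criteria):
    """Level 1 (assumed form): probabilities of each observed rating given
    the latent category eta, with rater detection d and ordered criteria."""
    # Cumulative probabilities P(Y <= k | eta) = logistic(c_k - d * eta).
    cum = [logistic(c - d * eta) for c in criteria] + [1.0]
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]


def sample(probs, rng):
    """Draw a category index from a discrete distribution."""
    u = rng.random()
    acc = 0.0
    for k, p in enumerate(probs):
        acc += p
        if u < acc:
            return k
    return len(probs) - 1


def simulate_examinee(rng, n_raters=2):
    """Simulate one examinee: ability, latent CR category, ratings, MC scores."""
    theta = rng.gauss(0.0, 1.0)
    # Latent (true) category of one four-category CR item.
    eta = sample(gpcm_probs(theta, a=1.0, steps=[-1.0, 0.0, 1.0]), rng)
    # Each rater observes eta imperfectly through the SDT model.
    ratings = [
        sample(sdt_rater_probs(eta, d=3.0, criteria=[1.5, 4.5, 7.5]), rng)
        for _ in range(n_raters)
    ]
    # MC items depend on theta directly (2PL), with no rater layer.
    mc = [int(rng.random() < logistic(1.2 * (theta - b))) for b in (-0.5, 0.0, 0.5)]
    return theta, eta, ratings, mc


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(simulate_examinee(rng))
```

In this sketch the CR ratings reach ability only through the latent category `eta`, whereas the MC responses are generated from `theta` directly, which mirrors the role of MC items as direct indicators of ability in the second level.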
Simulations and analyses of real-world data were conducted to examine the performance of the HRM-SDT. The simulations showed that the rater parameters were accurately recovered for versions of the HRM-SDT with or without MC items. The results also showed that adding MC items improved estimation of the rater parameters, and greatly improved estimation of the CR item parameters. In addition, increasing the number of CR items considerably improved estimation of the CR item parameters, but only for the HRM-SDT without MC items. Thus, one can accurately evaluate CR item characteristics by either including MC items in the model or by adding more CR items.
The study also found that ability estimation using both CR and MC items was noticeably better than when only CR items were used. Compared to other approaches to combining CR and MC items, the approach via HRM-SDT yielded the best estimation of ability. For example, it was found that the HRM-SDT model provided the best weighted composite for MC and CR items, as compared to commonly used weighting schemes. Thus, the HRM-SDT model appears to offer advantages over simply using arbitrary composite weights.