This study examined interrater reliability and accuracy in coding and interpreting the Rorschach Inkblot Test according to the Comprehensive System. Previous research on the interrater reliability of the Comprehensive System has focused on the reliability of coding decisions, with most studies reporting adequate to high interrater reliability, although one study reported inadequate levels. No study has examined the extent to which differences in coding decisions, as indexed by interrater reliability, affect the reliability and accuracy of the interpretations derived from the test.
This study examined the interrater reliability and accuracy of the interpretive hypotheses generated from the coding decisions that constitute the outcome of the test. The researcher: (1) examined the interrater reliability of each variable (i.e., variable level) and each coded response (i.e., response level), (2) generated a Structural Summary according to standard procedures, (3) examined the interrater reliability of the Structural Summary variables (i.e., protocol level), (4) generated specific interpretive hypotheses based on the data from the Structural Summary, (5) examined the interrater reliability of the interpretive hypotheses (i.e., interpretation level), and (6) compared the data generated by the participants (codings and interpretive statements) to those published in a standard text on the Comprehensive System.
The participant sample comprised 12 experts in the Comprehensive System and 19 graduate students who had recently learned the test. Coders were randomly assigned one of four Rorschach protocols from a published source. Coded protocols were assessed for interrater reliability against other coders, both within and between samples. To assess accuracy, participants' codings were compared to the correct codings from published materials. Interpretive hypotheses derived from the participants' codings were compared to interpretive hypotheses derived from the published codings. The kappa statistic was used to examine interrater reliability at the response and interpretation levels. An intraclass correlation coefficient (ICC) was used to examine interrater reliability at the protocol level.
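The two reliability statistics named above can be sketched in a few lines. This is a minimal illustration only: the location codes and score matrices below are hypothetical examples, not data from the study, and the ICC form shown (one-way random effects) is an assumption, since the abstract does not specify which ICC variant was used.

```python
def cohens_kappa(rater1, rater2):
    """Chance-corrected agreement between two raters on categorical codes."""
    assert len(rater1) == len(rater2)
    n = len(rater1)
    categories = set(rater1) | set(rater2)
    # Observed proportion of exact agreements.
    p_obs = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    p_exp = sum((rater1.count(c) / n) * (rater2.count(c) / n)
                for c in categories)
    return (p_obs - p_exp) / (1 - p_exp)


def icc_oneway(scores):
    """One-way random-effects ICC(1,1).

    Rows = targets (e.g. Structural Summary variables), columns = raters.
    Computed from the between-target and within-target mean squares.
    """
    n, k = len(scores), len(scores[0])
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum((x - m) ** 2
                    for row, m in zip(scores, row_means)
                    for x in row) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical location codes (W, D, Dd) for six responses by two coders.
coder_a = ["W", "W", "D", "D", "Dd", "W"]
coder_b = ["W", "D", "D", "D", "Dd", "W"]
print(round(cohens_kappa(coder_a, coder_b), 3))

# Hypothetical Structural Summary counts for three variables, two raters.
print(round(icc_oneway([[4, 5], [10, 10], [2, 3]]), 3))
```

Kappa corrects raw percent agreement for agreement expected by chance, which is why it suits categorical response-level codings, while the ICC treats the protocol-level Structural Summary counts as continuous scores.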
The author hypothesized that: (1) coders would display adequate levels of interrater reliability and coding accuracy at all levels of analysis; (2) experts would display higher interrater reliability and scoring accuracy than students; (3) coders would have a lower level of scoring accuracy on rare and complex variables, but experts would have a higher level of scoring accuracy than students on these variables; (4) interrater differences in coding at the variable level would not significantly alter the ratios, percentages, or derived scores of the Structural Summary or most of the interpretive hypotheses derived from it; and (5) lower scoring accuracy on the rare and complex variables would affect the interpretive hypotheses derived from these variables.
The results supported the researcher's hypotheses. Overall, experts and students displayed acceptable levels of interrater reliability and scoring accuracy at the response, protocol, and interpretation levels of analysis. While experts displayed higher levels of interrater reliability and scoring accuracy, both experts and students displayed adequate levels of interrater reliability and accuracy in the interpretive hypotheses derived from the test according to the Comprehensive System. As predicted, both experts and students had a lower level of scoring accuracy on rare, complex, low-base-rate variables. Interrater differences at the variable and response levels of analysis did not significantly alter most of the ratios, percentages, and derived scores of the Structural Summary. However, lower scoring accuracy for the low-base-rate variables affected the interpretive hypotheses derived from these variables.