Adaptive Testing With Equated Number-Correct Scoring (CT-99-12)
by Wim J. van der Linden, University of Twente, Enschede, The Netherlands

Executive Summary

Distributions of number-correct scores on an adaptive test are generally peaked in the neighborhood of 50% correct. This peak arises because, in adaptive testing, items are selected so that their difficulty matches the ability of the examinee. Nevertheless, several cases exist in which one would want to equate the number-correct scores on an adaptive test to a target distribution with a different shape on a reference test. For example, if a testing program offers its examinees the choice between a computerized adaptive test and a linear paper-and-pencil test, the choice is fair only if the score distribution on the former is identical to the one on the latter. Also, a testing program may wish to offer its examinees a previously released linear version of the test to help them interpret their number-correct scores. To enable the comparison, the population of examinees should produce the same distribution of observed scores on the adaptive and released tests.
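
The peaking near 50% correct can be illustrated with a minimal sketch. It assumes a two-parameter logistic (2PL) response model without guessing, which this summary does not specify, and an idealized adaptive test in which every item's difficulty exactly equals the examinee's ability. The function names and parameter values are hypothetical and chosen only for illustration.

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a correct response (assumed model, no guessing)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def matched_score_distribution(n_items):
    """Number-correct distribution when every item has b == theta, so p = 0.5."""
    return [math.comb(n_items, k) * 0.5 ** n_items for k in range(n_items + 1)]

# With difficulty matched to ability, the success probability is 0.5
# for every examinee, whatever the ability level.
for theta in (-2.0, 0.0, 2.0):
    assert abs(p_correct(theta, a=1.2, b=theta) - 0.5) < 1e-12

# The resulting number-correct distribution is binomial with p = 0.5,
# so its most likely score is half the test length.
dist = matched_score_distribution(20)
mode = max(range(len(dist)), key=dist.__getitem__)
```

Under these assumptions the distribution has the same shape for examinees of every ability, which is why number-correct scores on an adaptive test cannot be compared directly with those on a linear form.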

In this study, a method of constrained item selection for computerized adaptive testing (CAT) was proposed that automatically equates the number-correct scores to those on a reference test. The constraints were derived from an earlier statistical result on the conditions that the response functions of two test forms must satisfy to produce identical observed number-correct scores. They were implemented using the framework of constrained adaptive testing with shadow tests, which was applied earlier to impose content specifications on an adaptive version of the LSAT (van der Linden & Reese, 1998).

The performance of the method was assessed by simulating an adaptive test using a previous LSAT item pool. An old linear form of the LSAT was used as the reference test. The method performed well and produced observed number-correct score distributions nearly identical to those on the old LSAT form. It also outperformed the method, currently in use in several CAT programs, of predicting (true) number-correct scores on reference tests from the test characteristic function.
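
For readers unfamiliar with the comparison method, the following is a minimal sketch of how a test characteristic function maps an ability estimate to a predicted true number-correct score on a reference form. It assumes a three-parameter logistic (3PL) response model. The item parameters, the ability estimate, and the function names are hypothetical and are not taken from the study.

```python
import math

def p_correct(theta, a, b, c):
    """3PL probability of a correct response (assumed model)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def tcf_score(theta, items):
    """Test characteristic function: expected (true) number-correct score
    on a fixed form at ability theta, summed over the form's items."""
    return sum(p_correct(theta, a, b, c) for a, b, c in items)

# Hypothetical (a, b, c) parameters for a three-item reference form.
reference_form = [(1.0, -1.0, 0.2), (1.2, 0.0, 0.2), (0.8, 1.0, 0.2)]

# Illustrative ability estimate obtained from the adaptive test.
theta_hat = 0.0
predicted_true_score = tcf_score(theta_hat, reference_form)
```

The prediction is an expected score at the estimated ability, not an observed score, so this approach targets true scores on the reference test rather than the observed-score distribution.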

