CUSUM Statistics for Large Item Banks: Computation of Standard Errors (CT-98-11)
by C.A.W. Glas, University of Twente
In a computer-based testing (CBT) or computerized adaptive testing (CAT) environment, the process of statistically calibrating the pool of test items may consist of the following two stages:
1. Pretesting stage. In this stage, subsets of items are administered to subsets of test takers in a series of pretest sessions, and an item response theory (IRT) model is fitted to the response data to obtain empirical estimates of item properties such as difficulty, discriminating power, and susceptibility to guessing.
2. Online stage. In this stage, response data are gathered in an operational computerized testing environment and used to estimate the item parameters. An essential feature of online calibration is that the data are gathered sequentially and the item parameter estimates are improved gradually.
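For reference, the item properties named in the pretesting stage correspond to the parameters of the 3-PL item response function. The sketch below assumes the common parameterization P(theta) = c + (1 - c) / (1 + exp(-a(theta - b))), with discrimination a, difficulty b, and guessing parameter c; setting c = 0 gives the 2-PL model. The function name and the omission of a scaling constant such as D = 1.7 are assumptions of this illustration, not details taken from the report.

```python
import math

def prob_correct(theta: float, a: float, b: float, c: float = 0.0) -> float:
    """Probability of a correct response under a 3-PL model (hypothetical helper).

    a: discrimination, b: difficulty, c: guessing (lower asymptote).
    With c = 0 this reduces to the 2-PL model. No scaling constant D is used.
    """
    logistic = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return c + (1.0 - c) * logistic

# At theta == b the logistic part equals 0.5, so the probability is c + (1 - c) / 2.
print(prob_correct(0.0, a=1.2, b=0.0))          # 2-PL item
print(prob_correct(0.0, a=1.2, b=0.0, c=0.2))   # 3-PL item with fixed guessing parameter
```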
In a previous report, Glas studied statistical tests for detecting unwanted differences between the results of pretesting and online calibration. These tests were based on three classes of statistics: Lagrange multiplier, Wald, and CUSUM statistics. Each of these statistics requires an approximation of the standard errors of the parameter estimates. In the previous report, these standard errors were computed for the 2-parameter logistic (2-PL) model and for the 3-parameter logistic (3-PL) model with a fixed guessing parameter, using an approximation of the Fisher information matrix. However, when the item bank contains many items, inverting the information matrix becomes very time-consuming. Therefore, the present report investigates a block-diagonal approximation to the Fisher information matrix.
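As an illustration of one of the three classes, a generic Wald test for a single 2-PL item compares the pretest estimates (a, b) with the online estimates. This sketch assumes that the two sets of estimates come from independent samples and that their two-by-two covariance matrices (squared standard errors and covariance) are available; it is not necessarily the exact form used in the report, and the function names are hypothetical.

```python
import math

def inverse_2x2(m):
    """Invert a 2x2 matrix given as ((m11, m12), (m21, m22))."""
    (p, q), (r, s) = m
    det = p * s - q * r
    if det == 0:
        raise ValueError("matrix is singular")
    return ((s / det, -q / det), (-r / det, p / det))

def wald_item_shift(est_pre, cov_pre, est_online, cov_online):
    """Wald statistic for H0: the (a, b) parameters of one item did not change.

    Assumes independent pretest and online samples, so the covariance of the
    difference is cov_pre + cov_online. Returns (W, p_value); under H0, W is
    asymptotically chi-square with 2 degrees of freedom.
    """
    d = [o - p for o, p in zip(est_online, est_pre)]
    cov = tuple(
        tuple(cov_pre[i][j] + cov_online[i][j] for j in range(2)) for i in range(2)
    )
    inv = inverse_2x2(cov)
    w = sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))
    # The chi-square survival function with 2 df is exactly exp(-w / 2).
    return w, math.exp(-w / 2.0)

w, p = wald_item_shift((1.0, 0.0), ((0.01, 0.0), (0.0, 0.01)),
                       (1.0, 0.2), ((0.01, 0.0), (0.0, 0.01)))
print(round(w, 3), round(p, 3))  # 2.0 0.368
```

Each test needs the covariance matrix of the estimates, which is why the cost of obtaining standard errors matters for large item banks.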
In the 2-PL model and the 3-PL model with a fixed guessing parameter, every item is characterized by two unknown parameters: the discrimination parameter and the difficulty parameter. In the block-diagonal approach, the information on each item is approximated by a two-by-two matrix whose diagonal entries give the information on each of the two item parameters and whose off-diagonal entries account for the information they have in common. Information across items is not taken into account. Simulation studies showed that the block-diagonal approach underestimates the asymptotic standard errors, but that the bias is relatively small. Further, they showed that the power of the test based on a CUSUM statistic computed with these approximate standard errors is well under control.
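The per-item computation can be sketched for the 2-PL model. As a simplifying assumption not made in the report, abilities are treated as known, so each item's two-by-two information block follows directly from the Bernoulli likelihood and the cross-item terms vanish; in the report's setting the full information matrix does have cross-item entries. For a positive definite matrix, inverting only the diagonal blocks can never give larger variances than the corresponding blocks of the full inverse, which is consistent with the underestimation described above. Function names are hypothetical.

```python
import math

def item_info_2pl(thetas, a, b):
    """Two-by-two Fisher information block for one 2-PL item.

    Model: P = 1 / (1 + exp(-a * (theta - b))). Simplifying assumption:
    abilities (thetas) are known, so no marginal integration is done.
    """
    i_aa = i_ab = i_bb = 0.0
    for t in thetas:
        p = 1.0 / (1.0 + math.exp(-a * (t - b)))
        w = p * (1.0 - p)
        # Derivatives of the logit a * (t - b): d/da = (t - b), d/db = -a.
        i_aa += w * (t - b) ** 2
        i_ab += w * (t - b) * (-a)
        i_bb += w * a * a
    return ((i_aa, i_ab), (i_ab, i_bb))

def block_diagonal_ses(items, thetas):
    """Standard errors of (a, b) for each item, inverting each block separately.

    items: list of (a, b) tuples. The cost grows linearly with the number of
    items, instead of cubically as for inverting the full information matrix.
    """
    ses = []
    for a, b in items:
        (i_aa, i_ab), (_, i_bb) = item_info_2pl(thetas, a, b)
        det = i_aa * i_bb - i_ab * i_ab
        # Diagonal entries of the inverse of [[i_aa, i_ab], [i_ab, i_bb]].
        ses.append((math.sqrt(i_bb / det), math.sqrt(i_aa / det)))
    return ses

print(block_diagonal_ses([(1.0, 0.0), (1.5, 0.5)], [-1.0, 0.0, 1.0, 2.0]))
```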
When the CUSUM test is used in practice, it is suggested that the CUSUM statistic be tuned to the application by running a simulation study with a test administration design and item parameter values similar to those of the real application. The results of this simulation study can then be used to choose threshold values for the CUSUM statistic that compensate for the underestimated standard errors and guarantee a test with acceptable power.
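One way such tuning might look is sketched below. It assumes a standard one-sided (Page) CUSUM, S_k = max(0, S_{k-1} + z_k - d), applied to standardized differences z_k between online and pretest estimates of an item parameter; the exact definition of the statistic is not given in this summary, so the form, the reference value d, and all function names are assumptions. As a hypothetical stand-in for a full simulation study, underestimated standard errors are mimicked by drawing null z-values with standard deviation z_sd > 1, and the threshold h is set to the empirical (1 - alpha) quantile of the largest CUSUM value when no parameter change occurs.

```python
import random

def cusum_path(z_values, d=0.5):
    """One-sided upper CUSUM: S_k = max(0, S_{k-1} + z_k - d), with S_0 = 0.

    z_values: standardized differences (online minus pretest estimate, divided
    by a standard error) in the order the online batches arrive.
    d: reference value (hypothetical default) that absorbs small drifts.
    """
    s, path = 0.0, []
    for z in z_values:
        s = max(0.0, s + z - d)
        path.append(s)
    return path

def calibrate_threshold(n_batches, z_sd, alpha=0.05, d=0.5, runs=2000, seed=1):
    """Choose a threshold h from a simulated no-change distribution.

    z_sd > 1 mimics z-values inflated by underestimated standard errors
    (a hypothetical stand-in for a full simulation study). h is the empirical
    (1 - alpha) quantile of max_k S_k, so the false-alarm rate over
    n_batches batches is approximately alpha.
    """
    rng = random.Random(seed)
    maxima = []
    for _ in range(runs):
        z = [z_sd * rng.gauss(0.0, 1.0) for _ in range(n_batches)]
        maxima.append(max(cusum_path(z, d)))
    maxima.sort()
    index = min(runs - 1, int((1.0 - alpha) * runs))
    return maxima[index]

print(calibrate_threshold(20, z_sd=1.0))  # threshold with correct standard errors
print(calibrate_threshold(20, z_sd=1.2))  # higher threshold offsets inflated z-values
```

A signal is raised at the first batch k with S_k > h. This sketch controls only the false-alarm rate; the power at a given h could be checked the same way by adding a nonzero shift to the simulated z-values and counting how often the largest S_k exceeds h.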