Modeling Variability in Item Parameters in Educational Measurement (CT-01-07)
by Cees A. W. Glas and Wim J. van der Linden, University of Twente, Enschede, The Netherlands
In the analysis of data for the Law School Admission Test (LSAT), a mathematical model called item response theory (IRT) is applied to derive estimates of statistics, commonly called item parameters, that describe the statistical characteristics of the test items. Typically, three item parameters are used to describe LSAT items: the discrimination power, the difficulty level, and the susceptibility to guessing of an item. In paper-and-pencil testing, a large number of test takers respond to each test item, so the estimates of these item parameters are very stable and, once obtained, are usually treated as fixed values.
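The role of the three item parameters can be illustrated with a minimal sketch. It assumes the common logistic form of a three-parameter IRT model; the function name `p_correct_3pl` and the symbols `theta` (test-taker ability), `a` (discrimination), `b` (difficulty), and `c` (guessing) are illustrative choices, not notation taken from this report.

```python
import math


def p_correct_3pl(theta, a, b, c):
    """Probability of a correct response under an assumed three-parameter
    logistic model.

    theta: ability of the test taker
    a: discrimination power of the item (slope)
    b: difficulty level of the item (location)
    c: susceptibility to guessing (lower asymptote)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# When ability equals difficulty, the probability is halfway between c and 1.
p_mid = p_correct_3pl(theta=0.0, a=1.0, b=0.0, c=0.2)

# A test taker of very low ability answers correctly with probability near c.
p_low = p_correct_3pl(theta=-10.0, a=1.0, b=0.0, c=0.2)
```

Under this form, raising `a` steepens the curve around `b`, raising `b` shifts it toward higher ability, and `c` sets the floor that guessing alone achieves.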
In some areas of educational measurement, it is thought that item parameters should be modeled as random rather than fixed values. Examples of such areas are: testing in which the items for each test taker are sampled from a pool, such as in computerized adaptive testing (CAT); testing in which the items are generated by the computer; and test formats with items grouped under a common stimulus or in a common context.
This research investigated a method for modeling item parameters as random rather than fixed. A mathematical model called the three-parameter normal-ogive model (similar to the three-parameter IRT model described above) was used to model the variability of random item parameter values. The results uncovered one situation that caused the estimation procedure to break down, and a solution to this problem was derived and presented. A simulation study was conducted using the Bayesian procedures for estimating the hyperparameters that are explicated in this paper. The results were favorable. The differences between the true values and the estimated values were smaller for the item discrimination and guessing parameters than for the difficulty parameter, even though the item difficulty parameter is generally easier to estimate. The reason for this result was that, in this study, the population variance for item difficulty was much larger than the population variance for either of the other parameters.
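The idea of random item parameters can be sketched as follows. This is not the estimation procedure of the paper; it only shows how items might be drawn from a population distribution governed by hyperparameters and then used in a three-parameter normal-ogive model. The parameterization `c + (1 - c) * Phi(a * (theta - b))`, the independent normal distributions on `log a`, `b`, and `logit c`, and all numerical values in `HYPER_MEAN` and `HYPER_SD` are assumptions made for illustration.

```python
import math
import random


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def p_correct_3pno(theta, a, b, c):
    """Response probability under an assumed three-parameter normal-ogive form."""
    return c + (1.0 - c) * normal_cdf(a * (theta - b))


def draw_item(rng, mean, sd):
    """Draw one item (a, b, c) from independent normals on (log a, b, logit c).

    The transformations keep a positive and c strictly between 0 and 1.
    """
    a = math.exp(rng.gauss(mean["log_a"], sd["log_a"]))
    b = rng.gauss(mean["b"], sd["b"])
    logit_c = rng.gauss(mean["logit_c"], sd["logit_c"])
    c = 1.0 / (1.0 + math.exp(-logit_c))
    return a, b, c


# Hypothetical hyperparameters: the population spread of difficulty is set
# much larger than that of discrimination and guessing.
HYPER_MEAN = {"log_a": 0.0, "b": 0.0, "logit_c": -1.4}
HYPER_SD = {"log_a": 0.1, "b": 1.0, "logit_c": 0.1}

rng = random.Random(0)  # fixed seed so the sketch is reproducible
items = [draw_item(rng, HYPER_MEAN, HYPER_SD) for _ in range(5)]

# Simulate one test taker (ability 0.5) answering the sampled items.
responses = [int(rng.random() < p_correct_3pno(0.5, a, b, c)) for a, b, c in items]
```

With hyperparameters like these, sampled difficulties vary far more from item to item than sampled discriminations or guessing parameters, which is the kind of setting in which larger absolute estimation errors for difficulty would be expected.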