On the Use of Collateral Item Response Information to Improve Pretest Item Calibration (CT-98-13)
William Stout, Terry Ackerman, Dan Bolt, Amy Goodwin Froelich, Dan Heck, University of Illinois at Urbana-Champaign

Executive Summary

An increasingly important issue in considering the feasibility of a computerized Law School Admission Test (LSAT) is the development of accurate statistical estimation tools that work well with the modest amounts of test taker data per test question (“item”) that are typical of computerized tests. This is especially important for pretest items. Currently, a pretest item is an item administered on an earlier exam as a nonoperational item in order to assess its characteristics, so that the item can be used effectively as an operational item on future LSAT administrations.
In computer adaptive testing, the rates at which items are exposed to test takers must be carefully controlled to ensure test security. Hence a very large operational item pool, whose items have already had their statistical properties analyzed at the pretest stage described above, must continuously be maintained and gradually renewed. Thus the goal of obtaining accurate estimates of the characteristics of pretest items (called “item parameter calibration”) while using as few test takers as possible for each pretest item becomes even more important in computer adaptive testing than in conventional paper-and-pencil testing. In this study, several approaches for extracting useful collateral information for the purpose of pretest item parameter calibration are developed and studied. In the general statistical estimation context, collateral information refers to additional estimation information derived from variables that are distinct from, but correlated with, the variable of interest.
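The general statistical idea can be illustrated with a small simulation, not taken from the report: a regression estimator that borrows strength from a correlated collateral variable x (with known mean) to estimate the mean of a target variable y. All specifics here (sample size, correlation, replication count) are arbitrary choices for the sketch; the variance reduction it exhibits is the same mechanism that motivates using one item type's responses to help calibrate another's.

```python
import numpy as np

# Hypothetical sketch (not the report's method): collateral information
# from a correlated variable x sharpens the estimate of the mean of y.
# We compare the plain sample mean of y with a regression estimator that
# adjusts y toward the known mean (0) of the collateral variable x.
rng = np.random.default_rng(0)
n, rho, reps = 50, 0.8, 2000          # assumed sample size and correlation
plain, adjusted = [], []
for _ in range(reps):
    x = rng.normal(0.0, 1.0, n)        # collateral variable, known mean 0
    y = rho * x + np.sqrt(1 - rho**2) * rng.normal(0.0, 1.0, n)  # target, true mean 0
    plain.append(y.mean())
    b = np.cov(x, y)[0, 1] / x.var(ddof=1)   # estimated slope of y on x
    adjusted.append(y.mean() - b * (x.mean() - 0.0))  # regression estimator
print(np.var(plain), np.var(adjusted))
```

In theory the adjusted estimator's variance is roughly (1 - rho^2) times the plain estimator's, so the stronger the correlation, the fewer observations of y (here, standing in for pretest item responses) are needed for a given accuracy.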

The LSAT currently consists of three item types: logical reasoning (LR), analytical reasoning (AR), and reading comprehension (RC). Even though separate scores are not currently reported for the different item types, the reporting of separate scores for at least some item types could be considered for a computerized LSAT. Thus, for calibrating pretest items of one item type, the use of collateral information from some other item type could be helpful.

The project’s specific goal is the investigation of the use of such collateral information to reduce the number of test takers per pretest item required for calibrating test items. For the LSAT, collateral information may be of particular value in calibrating pretest items because, as previous studies have suggested, the LSAT measures two dominant abilities (“dimensions”). One such dimension is defined by the AR items, and the second by the combined LR and RC items. While in practice pretest RC items may be calibrated by considering only RC operational item responses, it is also possible to use test taker performances on AR items (i.e., collateral information from the AR items) to aid in the calibration of pretest RC items.

This study evaluates the practical benefit (if any) of using collateral information from one item type when statistically analyzing pretest items of some other item type. The criterion for evaluating pretest item calibration accuracy was the reduction, achieved by the use of collateral information, in the number of test takers to whom each pretest item must be administered before its item parameter estimates reach a specified level of accuracy. Our proposed methods involve both statistical tools that presume only one dominant ability being measured on a test and tools that assume two dominant abilities being measured, as the statistical analyses mentioned above have confirmed to be true for the LSAT. They also involve tools that either include or exclude an intermediate ability estimation stage. These methods are described in detail and evaluated through simulation studies.
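The evaluation criterion can be sketched in miniature, under assumptions that are ours rather than the report's: calibrate a single Rasch item against known test taker abilities, and find the smallest sample size among a few candidates at which the root-mean-square error (RMSE) of the difficulty estimate falls below a target. The true difficulty, target RMSE, and candidate sample sizes below are all arbitrary illustrative values.

```python
import numpy as np

# Hypothetical sketch (details not from the report): count how many test
# takers a pretest item needs before its parameter estimate reaches a
# target accuracy.  A Rasch item with true difficulty b_true is calibrated
# against abilities assumed known (as when operational items estimate
# ability well), via the score equation sum(p_i) = sum(y_i).
rng = np.random.default_rng(1)
b_true, target_rmse = 0.5, 0.15

def estimate_b(theta, y):
    """MLE of Rasch difficulty given known abilities, by bisection."""
    lo, hi = -5.0, 5.0
    for _ in range(60):
        mid = (lo + hi) / 2
        p = 1.0 / (1.0 + np.exp(-(theta - mid)))
        if p.sum() > y.sum():   # item looks too easy -> raise difficulty
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def rmse_at(n, reps=400):
    """Empirical RMSE of the difficulty estimate at sample size n."""
    errs = []
    for _ in range(reps):
        theta = rng.normal(0.0, 1.0, n)
        y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(theta - b_true)))).astype(float)
        errs.append(estimate_b(theta, y) - b_true)
    return np.sqrt(np.mean(np.square(errs)))

for n in (100, 200, 400, 800):
    if rmse_at(n) < target_rmse:
        print("smallest candidate n meeting target:", n)
        break
```

Collateral information enters such a design by sharpening the ability estimates fed into the calibration, which in turn lowers the sample size needed to hit the target RMSE.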
Results from the simulation studies demonstrate a practically important improvement in calibrating pretest items of one item type, attributable to the use of collateral information from some other item type, provided only a moderate number of operational items is present for each item type. Unfortunately, for an LSAT computerized administration as currently contemplated by the Law School Admission Council (LSAC), where the number of operational items per item type is expected to be more substantial, the resulting high level of ability estimation accuracy for any one item type seems to offset any advantage that could be gained by introducing collateral information from some other item type. However, if future LSAT pretest item calibration evolves so that short, substantively similar sets of operational items must be calibrated, then our results suggest collateral information could provide practically important gains in pretest item calibration accuracy.
