Linking Response-Time Parameters Onto a Common Scale (RR-08-02)
by Wim J. van der Linden, University of Twente, Enschede, The Netherlands

Executive Summary

In the analysis of data from the Law School Admission Test (LSAT) and similar standardized tests, a mathematical model called item response theory (IRT) is commonly used to estimate both the characteristics of the test questions (items) and the ability level of the test takers. Typical item-level statistics (called item parameters) estimated by an IRT model are difficulty, discrimination (i.e., the power of an item to distinguish between more able and less able test takers), and susceptibility to guessing. This research addresses the case of a testing program that characterizes test items not only with respect to the IRT parameters, but also with respect to the parameters of a response-time (RT) model. The advantages of evaluating these factors simultaneously include the opportunity to check test items for dysfunction, to evaluate test forms with regard to speededness (i.e., the extent to which test takers run out of time before finishing the test), and to diagnose the test takers' RTs for possible aberrances (e.g., answer copying) during the test. In addition, as shown in an earlier report for the Law School Admission Council, calculating these statistics simultaneously results in more stable estimates.

Although RTs on test items are recorded on a natural scale (e.g., in seconds), the scale for some of the parameters in the lognormal RT model being applied is not fixed. As a result, when the model is used to estimate item parameters, the estimates from different samples have to be mapped onto a common scale. Such mappings are possible if the samples have a design with common items (an anchor test design) and/or involve common test takers (either a single group or partially overlapping groups) or randomly equivalent samples of test takers.
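To make the anchor-item idea concrete, the sketch below illustrates one simple way such a mapping could be carried out for the item time-intensity parameters of a lognormal RT model under a common-item design. Because the model's scale indeterminacy is additive (only differences between the item and person parameters are identified), the two calibrations differ by a constant shift, which can be estimated as the mean difference on the anchor items. The function name and the mean-difference rule are illustrative assumptions, not the specific procedure evaluated in the report.

```python
import statistics

def link_time_intensities(beta_ref, beta_new, anchor_ids):
    """Map time-intensity estimates (beta) from a new calibration onto the
    scale of a reference calibration, using common (anchor) items.

    beta_ref, beta_new: dicts mapping item id -> estimated beta.
    anchor_ids: ids of items calibrated in both samples.
    Returns (linked betas for the new sample, estimated shift constant).
    """
    # Illustrative assumption: the two scales differ only by an additive
    # constant, estimated here as the mean difference on the anchor items.
    c = statistics.mean(beta_ref[i] - beta_new[i] for i in anchor_ids)
    return {i: b + c for i, b in beta_new.items()}, c
```

For example, if two anchor items are both estimated 0.5 lower in the new sample, every new-sample estimate would be shifted up by 0.5; the person (speed) parameters from the new calibration would be shifted by the same constant to preserve the identified differences.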

In this research, several combinations of such linking designs and linking procedures that map the parameter estimates onto a common scale are examined, and the precision of the results is evaluated. Linking designs with a single group tend to outperform the anchor test design and the randomly equivalent groups design. In addition, for larger samples, the anchor test design produces better linking than the randomly equivalent groups design.
