Using Response-Time Constraints in Item Selection to Control for Differential Speededness in Computerized Adaptive Testing (CT-98-03)
by Wim J. van der Linden, University of Twente, Enschede, The Netherlands; David J. Scrams, Virtual Psychometrics; and Deborah L. Schnipke, Virtual Psychometrics
Test takers tend to differ from one another in the amount of time required to respond to items. This is true even among test takers of the same ability level. Although this finding is not surprising, it may lead to a serious scoring problem. If some test takers do not complete all test items, the test-scoring procedure must include a provision for unreached items. Such items could be treated as incorrect (e.g., a test taker's final score could be influenced by the number of unreached items) or unreached items could be ignored (i.e., treated as missing). This decision should be made according to beliefs about the independence and relative importance of response speed and response accuracy in the context of the test.
If speed and accuracy are independent and the test is designed to measure accuracy, test taker ability should be based on accuracy alone, and test takers should not be penalized for unreached items. If speed and accuracy are related, or if both are important in the test context, response speed may be included in the scoring rubric, and unreached items would count against a test taker. In the latter case, estimates of test taker ability would reflect both response speed and response accuracy. Realistic scoring models that combine measures of speed and accuracy are not yet available, but the scant empirical research concerning the relationship between response speed and response accuracy in large-scale testing suggests that speed and accuracy are independent factors in power tests (i.e., tests that measure accuracy alone).
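The contrast between the two scoring policies can be made concrete with a small sketch (the response vector and proportion-correct scoring rule here are hypothetical illustrations, not taken from the report):

```python
# Illustrative sketch: two policies for scoring a response vector in which
# the final items were unreached (None). 1 = correct, 0 = incorrect.
responses = [1, 0, 1, 1, 0, 1, None, None]

# Policy A: treat unreached items as incorrect, so they lower the score.
score_as_incorrect = sum(r == 1 for r in responses) / len(responses)

# Policy B: treat unreached items as missing; score attempted items only.
attempted = [r for r in responses if r is not None]
score_as_missing = sum(attempted) / len(attempted)

print(round(score_as_incorrect, 3))  # proportion correct over all items
print(round(score_as_missing, 3))    # proportion correct over attempted items
```

Under Policy A the two unreached items depress the score; under Policy B the same test taker scores higher because only attempted items count, which is why the choice between policies matters.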
The best solution to the problem of unreached items may be to design the test so that such items do not occur, or occur rarely. This could be accomplished with very generous time limits (a costly solution). Computerized adaptive testing (CAT), however, offers an attractive alternative. Test taker speed can be assessed along with test taker ability (measured in terms of response accuracy), and the estimated test taker speed can be included in the item-selection algorithm. Thus, items are selected for a test taker that are appropriate for the test taker's ability, but are unlikely to be so time-consuming that the test taker fails to complete all test items. This solution requires a model of response speed and an item-selection algorithm that accommodates response-speed constraints. Both aspects are addressed by the current work.
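A common way to model response speed in this literature is a lognormal model in which log response time depends on an item "time intensity" and a test taker "speed" parameter. The parameterization below is an illustrative assumption, not necessarily the exact form used in the report:

```python
import math

def predicted_log_time(beta_i, tau_j):
    """Expected log response time under a simple lognormal model:
    ln T ~ Normal(beta_i - tau_j, sigma^2), where beta_i is the item's
    time intensity and tau_j is the test taker's speed (hypothetical
    parameter names chosen for illustration)."""
    return beta_i - tau_j

def predicted_time_seconds(beta_i, tau_j, sigma=0.4):
    """Predicted response time in seconds: the mean of a lognormal
    variable is exp(mu + sigma**2 / 2)."""
    return math.exp(predicted_log_time(beta_i, tau_j) + sigma**2 / 2)

# A faster test taker (larger tau) is predicted to spend less time
# on the same item than a slower one.
fast = predicted_time_seconds(4.0, tau_j=0.5)
slow = predicted_time_seconds(4.0, tau_j=0.0)
```

Under such a model, observed response times on administered items can be used to update the estimate of tau_j, which in turn refines the time predictions for the items remaining in the pool.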
A model of response speed is used as the basis for predicting a test taker's response time for each item in the item pool. Items are selected according to an algorithm for constrained CAT. The item-selection algorithm constrains item selection so that the test taker is likely to have sufficient time to answer all items while simultaneously ensuring that test specifications are met and all test takers receive items that are tailored to their ability level. Response-time predictions are modified according to the time taken by the test taker to respond to items already administered. Analyses of operational data from a large-scale standardized test support the use of the response-speed model, and simulations of the item-selection algorithm demonstrate that response-time constraints could be included in item selection while maintaining test quality.
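The core idea of response-time-constrained selection can be sketched as follows: among the items whose predicted response time fits the remaining time budget, pick the one that is most informative at the current ability estimate. This simple per-item budget heuristic and the 2PL information function are illustrative assumptions; the report's actual algorithm embeds the constraint in a full constrained-CAT framework together with the other test specifications:

```python
import math

def fisher_info(a, b, theta):
    """Fisher information of a two-parameter logistic (2PL) item
    with discrimination a and difficulty b at ability theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def select_item(pool, theta_hat, predicted_times, time_left, items_left):
    """Select the most informative item whose predicted response time
    stays within an even split of the remaining time budget (a sketch
    of one way to impose a response-time constraint)."""
    budget_per_item = time_left / items_left
    eligible = [i for i in pool if predicted_times[i] <= budget_per_item]
    # If no item fits the budget, relax the constraint rather than stall.
    candidates = eligible or list(pool)
    return max(candidates, key=lambda i: fisher_info(*pool[i], theta_hat))

# Hypothetical pool: item id -> (discrimination, difficulty),
# with predicted response times in seconds from the speed model.
pool = {"i1": (1.2, 0.0), "i2": (1.5, 0.2), "i3": (0.8, -0.5)}
times = {"i1": 45.0, "i2": 90.0, "i3": 30.0}
print(select_item(pool, 0.1, times, time_left=300.0, items_left=5))
```

Here item "i2" is the most informative overall but is excluded because its predicted 90 seconds exceeds the 60-second per-item budget, so the algorithm falls back to the best-fitting item that the test taker can likely finish.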
The present approach to adaptive item selection is a solution to the scoring problems introduced by differences in response speed across test takers. This solution may be preferable to the obvious alternatives of reduced test length (with a reduction in measurement precision) or increased time limits (with added administration costs). The preliminary results reported here demonstrate the reasonableness of the response-speed model and the feasibility of including response-time constraints in item selection.