Optimal Assembly of Tests with Item Sets (CT-99-04)
by Wim J. van der Linden, University of Twente, Enschede, The Netherlands
The testing format called item sets, in which a set of items is related to a common passage or set of conditions, is well known in educational measurement. The format is effective in that it allows for in-depth testing of such skills as reading comprehension and does not require a large amount of reading material to produce reliable test scores. In automated test assembly, however, the presence of item sets presents serious problems. First, longer lists of test specifications have to be imposed on the assembly process to deal with the attributes of the stimulus material (e.g., the reading passage of a reading comprehension item set). Second, the assembly process needs logical constraints to keep item and stimulus selection consistent.
In the current project, the process of assembling tests with item sets was analyzed, and various categories of item and stimulus attributes were identified. In addition, the constraints to be imposed on the process were classified. Six new methods for the automated assembly of tests with item sets were proposed. Two of these methods were exact; the others had heuristic aspects or required (light) manual preprocessing by the test assembler. It was shown how these methods can be formalized using 0–1 linear programming (LP), which allows for computerized assembly of the tests. Each method and its 0–1 LP model represented a different choice with respect to the well-known speed-accuracy dilemma in automated test assembly.
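As an illustration of what a 0–1 formulation with logical item-stimulus constraints can look like, the sketch below selects stimuli and items simultaneously from a toy pool. It is not one of the six methods of the report: the pool, the information values, the single-ability-point objective, and the names `POOL` and `assemble` are all assumptions made for the example, and exhaustive enumeration stands in for an LP solver so that the code needs no external library.

```python
from itertools import product

# Hypothetical toy pool: stimulus -> [(item, information at one ability level)].
POOL = {
    "S1": [("i1", 0.40), ("i2", 0.25), ("i3", 0.10)],
    "S2": [("i4", 0.35), ("i5", 0.30), ("i6", 0.05)],
    "S3": [("i7", 0.20), ("i8", 0.15), ("i9", 0.12)],
}


def assemble(pool, n_stimuli, items_per_set):
    """Exhaustively solve a tiny 0-1 model for a test with item sets.

    Returns (stimuli, items, total information), or None if infeasible.
    """
    stimuli = sorted(pool)
    items = [(s, name, info) for s in stimuli for name, info in pool[s]]
    best = None
    # z[s] = 1 if stimulus s is selected.
    for z in product((0, 1), repeat=len(stimuli)):
        if sum(z) != n_stimuli:  # constraint on the number of item sets
            continue
        z_of = dict(zip(stimuli, z))
        # x[i] = 1 if item i is selected.
        for x in product((0, 1), repeat=len(items)):
            counts = dict.fromkeys(stimuli, 0)
            for (s, _, _), x_i in zip(items, x):
                counts[s] += x_i
            # Logical constraint: the number of items chosen from set s equals
            # items_per_set * z[s], so items are selected only together with
            # their stimulus, and a selected stimulus gets a full set.
            if any(counts[s] != items_per_set * z_of[s] for s in stimuli):
                continue
            # Assumed objective: maximize summed item information.
            value = sum(info * x_i for (_, _, info), x_i in zip(items, x))
            if best is None or value > best[2]:
                chosen_stimuli = [s for s in stimuli if z_of[s]]
                chosen_items = [n for (_, n, _), x_i in zip(items, x) if x_i]
                best = (chosen_stimuli, chosen_items, value)
    return best


print(assemble(POOL, n_stimuli=2, items_per_set=2))
```

Enumeration is feasible here only because the pool has 12 binary variables; for an operational item pool the same constraints would be passed to a 0–1 LP solver instead.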
The two sections of the Law School Admission Test (LSAT) that have an item-set structure were used to study the performance of the methods. For each section and method, the 0–1 LP model needed to guide the assembly process was established. For two methods, test specialists from the Law School Admission Council (LSAC) performed the item pool preprocessing required to apply the method. One method could not be used because it led to a 0–1 LP model with too many variables. For both sections of the LSAT, all solutions were obtained within 4–5 minutes; the majority of the cases required less than 1–2 seconds of CPU time. In nearly all cases, all content specifications were realized. The only exception was a case in which one specification appeared to be slightly too tight; relaxing this specification immediately produced a solution. However, the quality of the results in terms of the target information functions for the LSAT varied considerably. Nearly perfect results were obtained for the methods based on simultaneous assembly of items and stimuli and for the methods that used so-called pivot items to represent the stimuli in the test assembly process. Methods based on two-stage selection of stimuli and items generally performed less well.