An Evaluation of a Two-Stage Testlet Design For Computerized Testing (CT-96-04)
by Lynda M. Reese and Deborah L. Schnipke
In a standard computer-adaptive testing (CAT) design, test takers are first administered a test question of approximately medium difficulty. Based on their responses, subsequent items are chosen whose difficulty matches the current estimate of each test taker's ability level. Testing proceeds until some termination criterion, such as a fixed test length or a sufficiently precise ability estimate, is reached. In this pure form, CAT holds many theoretical advantages. Because the test taker's time is not wasted on items that are too difficult or too easy, test length may be reduced, usually by about one half, without loss of precision.
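The adaptive loop described above can be sketched in a few lines of Python. The Rasch (1PL) response model, the item bank, and the grid-search ability estimator here are illustrative assumptions for the sketch, not details taken from the report:

```python
import math
import random

def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch (1PL) model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def mle_theta(responses, difficulties, lo=-4.0, hi=4.0, steps=80):
    """Crude grid-search maximum-likelihood ability estimate."""
    grid = [lo + i * (hi - lo) / steps for i in range(steps + 1)]
    def loglik(t):
        return sum(math.log(rasch_prob(t, b) if u else 1.0 - rasch_prob(t, b))
                   for u, b in zip(responses, difficulties))
    return max(grid, key=loglik)

def run_cat(true_theta, bank, rng, test_length=25):
    """Adaptive loop: start at medium difficulty, then repeatedly administer
    the unused item closest to the current ability estimate (the item that
    maximizes Fisher information under the 1PL model) until the fixed test
    length -- the termination criterion here -- is reached."""
    theta_hat = 0.0  # first item is of approximately medium difficulty
    used, responses, difficulties = set(), [], []
    for _ in range(test_length):
        item = min((i for i in range(len(bank)) if i not in used),
                   key=lambda i: abs(bank[i] - theta_hat))
        used.add(item)
        responses.append(rng.random() < rasch_prob(true_theta, bank[item]))
        difficulties.append(bank[item])
        theta_hat = mle_theta(responses, difficulties)  # re-estimate ability
    return theta_hat
```

A fixed test length is used as the stopping rule here; a precision-based rule would instead stop once the standard error of `theta_hat` fell below a threshold.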
As large-scale, high-stakes testing programs, such as the Law School Admission Test (LSAT), consider converting to a computer-adaptive mode of test administration, a standard computer-adaptive test, as described above, is rarely practical. For example, most large-scale testing programs contemplating CAT must meet content-balancing requirements, which usually compromise the efficiency and precision that make CAT attractive. Some researchers have advocated the use of testlets (collections of items) as an alternative to individually selected and delivered items. These testlets may be preassembled to achieve certain content-coverage requirements, and preassembly may also help to control context effects.
Prior to the advances in computer technology that made CAT feasible, the concept of two-stage testing emerged as a rudimentary means of tailoring the difficulty level of the test to the ability level of the test taker. In the first stage of this procedure, all test takers take a "routing test" of medium difficulty. Based on their scores on the routing test, test takers are branched to a second-stage "measurement test" that is roughly adapted to their ability level. The test taker's ability is then estimated based on the items administered at both testing stages.
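The two-stage procedure can be sketched in the same style. The routing-test length, the number-correct cutoffs, the three testlet difficulty levels, and the Rasch model below are illustrative assumptions, not values taken from this study:

```python
import math
import random

def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch (1PL) model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def choose_testlet(num_correct, n_routing):
    """Branch on the routing-test number-correct score.
    The cutoffs (thirds of the routing test) are purely illustrative."""
    if num_correct <= n_routing // 3:
        return "easy"
    if num_correct <= 2 * n_routing // 3:
        return "medium"
    return "hard"

def mle_theta(responses, difficulties, lo=-4.0, hi=4.0, steps=80):
    """Crude grid-search maximum-likelihood ability estimate."""
    grid = [lo + i * (hi - lo) / steps for i in range(steps + 1)]
    def loglik(t):
        return sum(math.log(rasch_prob(t, b) if u else 1.0 - rasch_prob(t, b))
                   for u, b in zip(responses, difficulties))
    return max(grid, key=loglik)

def two_stage_test(true_theta, rng):
    """Stage 1: a 10-item routing test of medium difficulty.
    Stage 2: a 15-item measurement testlet matched to the routing score.
    The final ability estimate uses the items from both stages."""
    routing_bs = [0.0] * 10
    testlets = {"easy": [-1.5] * 15, "medium": [0.0] * 15, "hard": [1.5] * 15}
    responses = [rng.random() < rasch_prob(true_theta, b) for b in routing_bs]
    stage2_bs = testlets[choose_testlet(sum(responses), len(routing_bs))]
    responses += [rng.random() < rasch_prob(true_theta, b) for b in stage2_bs]
    return mle_theta(responses, routing_bs + stage2_bs)
```

Because each measurement testlet is fixed in advance, it can be assembled offline to satisfy content constraints, which is the practical appeal of the design relative to item-by-item adaptation.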
This simulation study evaluated the efficiency of a two-stage testlet design as compared with that achieved by a standard computer-adaptive test administration and by a paper-and-pencil test. The precision of ability estimates derived from a 25-item two-stage testlet design was compared with that of estimates derived from a standard 25-item CAT and from both a 25-item and a 50-item paper-and-pencil test. The results indicate that if the testlets are carefully assembled, the 25-item two-stage testlet design yields greater precision than a 50-item paper-and-pencil test. This study provides a baseline against which future research that incorporates content constraints can be compared.