A Comparison of Testlet-Based Test Designs for Computerized Adaptive Testing (CT-97-01)
by Deborah L. Schnipke and Lynda M. Reese
Because of the many benefits of computer-administered testing (e.g., the potential for new item types, more frequent testing, immediate scoring), the Law School Admission Council (LSAC) is considering computerizing the Law School Admission Test (LSAT). Several concerns have been raised about the standard computerized adaptive test (CAT) design for the LSAT, however. For example, it would be difficult to explain the standard item selection and scoring algorithms to test takers and test-score users because of their complexity. While the LSAC is interested in a computerized LSAT that adapts item difficulty to test taker ability, we are also interested in investigating less complicated (and easier to explain) ways of doing so.
Prior to the advances in computer technology that made CAT feasible, the concept of two-stage testing emerged as a rudimentary means of tailoring the difficulty level of the test to the ability level of the test taker. In the first stage of this procedure, all test takers take a "routing test" of average difficulty. Based on their scores on the routing test, test takers are branched to a second-stage "measurement test" that is roughly adapted to their ability level. The test taker's ability is then estimated from the items administered at both stages. This design can be expanded to more stages, with difficulty more closely targeted to test taker ability at each successive stage.
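The two-stage procedure described above can be sketched in code. This is a minimal, hypothetical illustration under a Rasch (one-parameter logistic) IRT model; the routing cutoffs, item difficulties, and function names are illustrative assumptions, not details from the report.

```python
import math

def prob_correct(theta, b):
    """Rasch-model probability that a test taker with ability theta
    answers an item of difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def route(routing_score, n_routing_items):
    """Branch to an easy, medium, or hard second-stage measurement test
    based on the fraction correct on the routing test (cutoffs are
    illustrative assumptions)."""
    frac = routing_score / n_routing_items
    if frac < 1.0 / 3.0:
        return "easy"
    if frac < 2.0 / 3.0:
        return "medium"
    return "hard"

def estimate_ability(responses, difficulties, iters=50):
    """Maximum-likelihood ability estimate pooled over all items
    administered at both stages (Newton-Raphson on the Rasch
    log-likelihood; assumes a mixed response pattern, since the MLE
    is undefined for all-correct or all-incorrect patterns)."""
    theta = 0.0
    for _ in range(iters):
        grad = sum(u - prob_correct(theta, b)
                   for u, b in zip(responses, difficulties))
        hess = -sum(p * (1.0 - p)
                    for p in (prob_correct(theta, b) for b in difficulties))
        theta -= grad / hess
    return theta
```

A test taker who answers, say, 3 of 5 routing items correctly would be branched to the "medium" measurement test, and the final ability estimate would use the responses from both stages together.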
Additional concerns include making provisions for items that refer to a common stimulus (set-bound items, such as reading comprehension items) and deciding whether to allow item review. The use of testlets (bundles of items that are administered as a unit) may address both concerns. A common stimulus (e.g., a reading passage) and its associated items can be designated as a testlet, so the items are automatically administered together. Because the test does not adapt within a testlet, item review can be allowed within a testlet without opening the door to undesirable test-taking strategies that could affect the precision of the test. Results from our field tests indicate that test takers are comfortable with item review within a testlet.
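The testlet mechanism described above amounts to a simple delivery rule: items bound to a shared stimulus are presented as a unit, and answers may be revised freely until the testlet is submitted, after which they are locked. The sketch below is a hypothetical illustration of that rule; the class and method names are assumptions for the example, not part of the report.

```python
from dataclasses import dataclass, field

@dataclass
class Testlet:
    """A bundle of items tied to a common stimulus, administered as a unit."""
    stimulus: str       # e.g., a reading passage
    items: list         # item identifiers delivered together

@dataclass
class TestletSession:
    """Administers one testlet: review (revising answers) is allowed
    within the testlet, but not after it has been submitted."""
    testlet: Testlet
    responses: dict = field(default_factory=dict)
    submitted: bool = False

    def answer(self, item, response):
        if self.submitted:
            raise RuntimeError("cannot revise answers after leaving the testlet")
        if item not in self.testlet.items:
            raise KeyError(item)
        self.responses[item] = response  # revising within the testlet is allowed

    def submit(self):
        """Lock the testlet; scoring and any between-testlet adaptation
        happen only after this point."""
        self.submitted = True
        return dict(self.responses)
```

Because adaptation occurs only between testlets, nothing a test taker does while reviewing inside a testlet can influence which items are selected within it.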
Two-stage or multistage tests can be built from testlets, and such designs may provide a solution to the concerns raised about the standard CAT design. The present study compares the precision of ability estimates across several test designs: (a) a two-stage testlet design; (b) a two-stage testlet design that reroutes test takers within the second stage as needed; (c) a multistage testlet design (which had four stages in the present study); (d) the standard item-level CAT design (the psychometric ideal in terms of precision and efficiency); (e) a CAT design that adapts at the testlet level rather than at the item level; and (f) a paper-and-pencil (i.e., nonadaptive) design of two lengths (the same length as the other designs and twice as long). The paper-and-pencil design of the same length as the other designs serves as the minimally acceptable criterion for the new designs.
Results indicate that all testlet-based designs yield greater precision than the same-length paper-and-pencil test and nearly as much precision as the paper-and-pencil test of double length. The two-stage and multistage designs performed very similarly to each other across the entire ability scale, and both performed at an acceptable level in terms of psychometric characteristics. Given the many other (nonpsychometric) advantages of these designs, they may be viable options for a computerized LSAT, and future research will continue to investigate them.