Modifying Existing Dimensionality Assessment Tools for Use in a CAT Environment (CT-99-10)
by Amy Goodwin Froelich, William Stout, and Terry Ackerman, University of Illinois at Urbana-Champaign
A vital requirement in the design of a modern standardized test such as the Law School Admission Test (LSAT) is that the test be valid in the sense of measuring well those constructs (also called "dimensions") central to its purpose while not measuring constructs irrelevant to that purpose. Thus, an important part of the design of the LSAT and, in particular, of its contemplated conversion to a computerized adaptive test (CAT) format is the availability of effective statistical tools to assess latent dimensional structure. Results from such dimensionality monitoring can then feed back into the articulation of future LSAT-CAT test specifications and into the manufacture of future items and testlets for the LSAT-CAT.
The primary purpose of this research was to convert two existing nonparametric dimensionality assessment tools, DIMTEST (Stout, 1987; Nandakumar & Stout, 1993) and HCA/CCPROX (Roussos, Stout, & Marden, 1998), for use in an LSAT-CAT environment and to evaluate their performance. The successful performance of these tools has already been demonstrated in previous applications to paper-and-pencil (P&P) test data, including applications to LSAT P&P test data (Stout, Habing, Douglas, Kim, Roussos, & Zhang, 1996; Douglas, Kim, Roussos, Stout, & Zhang, 1999).
In this study, CAT-DIMTEST was evaluated by testing, for each pretest testlet in a simulated sectional administration of a proposed LSAT-CAT structure, the hypothesis that the testlet is dimensionally similar to the operational items against the alternative that it is dimensionally distinct. CAT-HCA/CCPROX was also evaluated by simulating one- and two-dimensional testlets in which the second dimension was distinct from the operational dimension. After CAT-HCA/CCPROX broke such testlets into two clusters, these clusters were evaluated for their dimensional correctness.
CAT-HCA/CCPROX has two natural interactive roles relative to CAT-DIMTEST in future LSAT-CAT dimensionality analyses. First, prior to a CAT-DIMTEST analysis, it can provide a preliminary culling from a candidate testlet of those items that measure the same dimension as the operational items, thereby increasing the statistical power of CAT-DIMTEST. Second, as a follow-up to CAT-DIMTEST flagging a testlet, CAT-HCA/CCPROX can help identify which items of the testlet are in fact measuring dimensions distinct from the dimension of the operational items.
Thorough simulation studies of both CAT-converted procedures were conducted using data simulated from a logical reasoning-based LSAT-CAT multiple forms structure as developed in Armstrong, Jones, Koppel, & Pashley (in press). The results were exceptionally encouraging. The observed Type I error rate for CAT-DIMTEST (i.e., the rate of falsely flagging a testlet as dimensionally distinct from the operational items) was very close to the nominal 5% level. The power of CAT-DIMTEST to correctly reject testlets that were generated to be dimensionally distinct from the operational items was usually close to or at 100% when either four of the five or all five items of the testlet measured a distinct dimension. Even when only three of the five testlet items measured a distinct dimension, the power averaged 68%. This rate increased to 95% when CAT-DIMTEST analyzed just those three items (such testlet subset selection is possible using CAT-HCA/CCPROX). When only two of the five items in a testlet were dimensionally distinct, CAT-DIMTEST exhibited inadequate power; when it analyzed just those two items, however, the power rose to approximately 68%.
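The Type I error and power figures reported above are empirical rejection rates over simulated replications. As a minimal sketch of how such rates are typically tabulated, the following Python fragment computes the proportion of replications rejected at a given level, together with its Monte Carlo standard error. The function names and the p-value input format are illustrative assumptions; they are not part of CAT-DIMTEST, and the study's actual tabulation may have differed.

```python
import math


def rejection_rate(p_values, alpha=0.05):
    """Proportion of simulated replications in which the test rejects at level alpha.

    Hypothetical helper: when the testlets are generated under the null
    hypothesis this estimates the Type I error rate; when they are generated
    to be dimensionally distinct it estimates power.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    return sum(1 for p in p_values if p < alpha) / len(p_values)


def monte_carlo_se(rate, n_reps):
    """Binomial standard error of an estimated rejection rate over n_reps replications."""
    if n_reps <= 0:
        raise ValueError("n_reps must be positive")
    return math.sqrt(rate * (1.0 - rate) / n_reps)
```

The standard error gives a rough sense of how close an observed rate can be expected to lie to the nominal level for a given number of replications; for example, a rate near 0.05 estimated from 100 replications carries a standard error of about 0.02.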
CAT-HCA/CCPROX performance was evaluated by examining its two-cluster solution for testlets in which three of the five items were dimensionally distinct, the most important case for effective CAT-HCA/CCPROX performance. In this case, CAT-HCA/CCPROX found either the correct dimensionally distinct cluster or the correct cluster plus one extra item over 95% of the time.
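The success criterion described above (exact recovery of the dimensionally distinct items, or recovery with one extra item) can be sketched as a simple scoring rule. The following Python fragment is illustrative only: the function name, the representation of clusters as sets of item indices, and the outcome labels are assumptions, not features of the CAT-HCA/CCPROX software.

```python
def score_two_cluster_solution(clusters, true_distinct):
    """Classify a two-cluster solution against the known distinct item set.

    clusters      -- iterable of two collections of item indices (hypothetical format)
    true_distinct -- collection of indices of the items generated to be distinct

    Returns "exact" if some cluster equals the true set, "one_extra" if some
    cluster contains the true set plus exactly one other item, else "miss".
    """
    target = set(true_distinct)
    cluster_sets = [set(c) for c in clusters]
    if len(cluster_sets) != 2:
        raise ValueError("expected exactly two clusters")

    if any(c == target for c in cluster_sets):
        return "exact"
    if any(target < c and len(c - target) == 1 for c in cluster_sets):
        return "one_extra"
    return "miss"


def success_rate(outcomes):
    """Share of replications scored as "exact" or "one_extra"."""
    if not outcomes:
        raise ValueError("need at least one outcome")
    hits = sum(1 for o in outcomes if o in ("exact", "one_extra"))
    return hits / len(outcomes)
```

For a five-item testlet with items 2, 3, and 4 generated as distinct, the partition {0, 1} / {2, 3, 4} would count as exact recovery, and {0} / {1, 2, 3, 4} as recovery with one extra item.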
These successful results suggest important areas for further dimensionality procedure development. First, it is important to design a specific method for combining CAT-DIMTEST and CAT-HCA/CCPROX to work together in accurately identifying within-testlet subsets of items that are dimensionally distinct from the operational items. Second, it is important to be able to use CAT-DIMTEST and CAT-HCA/CCPROX in both confirmatory and exploratory modes to analyze arbitrarily bundled subsets of pretest items, not just subsets lying within a single pretest testlet. Third, it is important to include operational items in such analyses. These last two challenges will require fundamental changes in the two procedures so they can deal with different pretest and operational testlets being taken by different test takers in the LSAT-CAT design.