The Effects of Dimensionality on True Score Conversion Tables for the Law School Admission Test (SR9201)
by Gregory Camilli, Mingmei Wang, and Jacqueline Fesq
Executive Summary
In this study, we examined the Law School Admission Test (LSAT) to see if the items on a form could be divided into different subgroups where items looked statistically similar within the subgroups, but statistically different between subgroups. When subgrouping can be detected, it is likely that the subgroups of items measure different abilities and therefore the test can be described as measuring multiple abilities or as “multidimensional” for short. In contrast, the term “unidimensional” is used to describe a test for which no subgroups exist—all the items look relatively similar from a statistical point of view. Such a test measures a single ability.
The LSAT is equated so that a test score obtained in the current year is comparable to scores obtained in previous years. Technically, a test model based on item response theory (IRT) is used to equate each new form of the LSAT to the base scale. This IRT model makes the basic assumption that the LSAT is unidimensional, as defined above. It is possible that a violation of this assumption could lead to unsatisfactory equating results; however, it must be recognized that 1) most tests are multidimensional to some degree, and 2) all practical test models for equating are unidimensional. Therefore, the two important issues with real tests concern the degree of multidimensionality of the test, and whether this has a practically significant effect on test equating.
To explore these issues, we conducted an analysis of multidimensionality for six forms of the LSAT using factor analysis. This statistical technique is commonly used to determine whether statistical subgroups of items exist, and which items correspond to which subgroups. We found two subgroups of items or “factors” for each of the six forms. The pattern of results was remarkably consistent across forms: the analytical reasoning (AR) items corresponded to one factor, while the reading comprehension (RC) and logical reasoning (LR) items corresponded to the other. The main conclusion of the factor analysis component of this study was that the LSAT appears to measure two different reasoning abilities: inductive and deductive. Both RC and LR items appear to measure inductive reasoning, and AR items appear to measure deductive reasoning. The item groupings identified are thus consistent with the content specifications of the LSAT. It is important to add that the analysis showed that these two reasoning abilities are highly correlated.
The technique of Dorans and Kingston (1985) was used in this study to examine the effect of dimensionality on equating. In brief, we began by calibrating (with IRT methods) all items on a form to obtain a set (call it Set I) of estimated item parameters (the as, bs, and cs, conventionally the discrimination, difficulty, and lower-asymptote parameters). Next, the test was divided into two homogeneous subgroups of items, each having been determined to represent a different ability (i.e., inductive and deductive reasoning). The items within each subgroup were then recalibrated separately to obtain a second set of item parameter estimates. These latter estimates were then combined into Set II. (All estimates were placed on the same scale.)
If the LSAT were strictly unidimensional, then the estimated item parameters in Set I would be very close to the corresponding estimates in Set II (only small differences would be obtained due largely to sampling errors). In other words, the same item statistics (as, bs, and cs) would be obtained whether AR items were included with RC+LR items or not. Consequently, if the item statistics were the same, the equating tables based on parameter Sets I and II would be practically identical. On the other hand, if nonignorable multidimensionality exists, then the result of a single calibration of all LSAT items would differ noticeably from that of separate calibrations for the two subgroups of items. This could lead to different true score equating tables, depending on whether Set I or Set II item statistics were used.
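The comparison described above can be illustrated with a minimal sketch. It assumes a three-parameter logistic (3PL) item response function with scaling constant D = 1.7, where a, b, and c are taken to be the discrimination, difficulty, and lower-asymptote parameters; the summary itself does not specify the model form. All parameter values are hypothetical, and the function names and the "largest gap" summary are illustrative choices, not the authors' procedure. The sketch compares only the true score curves implied by two calibrations of the same items; it does not carry out a full equating to a base scale.

```python
import math


def p_correct(theta, a, b, c, D=1.7):
    """3PL probability of a correct response at ability theta (assumed model)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def true_score(theta, items):
    """Expected number-correct score: the sum of item probabilities at theta."""
    return sum(p_correct(theta, a, b, c) for a, b, c in items)


def max_true_score_gap(set_one, set_two, thetas):
    """Largest absolute difference between the two true score curves."""
    return max(
        abs(true_score(t, set_one) - true_score(t, set_two)) for t in thetas
    )


# Hypothetical (a, b, c) estimates for the same three items:
# Set I from one calibration of all items, Set II from separate calibrations.
set_one = [(1.00, -0.50, 0.20), (0.80, 0.00, 0.20), (1.20, 0.70, 0.20)]
set_two = [(1.05, -0.45, 0.20), (0.78, 0.02, 0.20), (1.15, 0.72, 0.20)]

# Ability grid from -3.0 to 3.0 in steps of 0.1.
thetas = [x / 10.0 for x in range(-30, 31)]

gap = max_true_score_gap(set_one, set_two, thetas)
print(f"Largest true score difference across the grid: {gap:.3f}")
```

Under these assumptions, a gap that is small relative to the score scale would correspond to the "practically identical" case, while a large gap would suggest that the two calibrations imply noticeably different conversions.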
In this study, we found that the equating tables based on Set I item statistics were highly similar to those based on Set II item statistics. We concluded, as did Dorans and Kingston (1985), that violations of unidimensionality may not have a substantial impact on equating. Although the IRT model theoretically requires unidimensional tests, it appears to give satisfactory results with the LSAT.
