Assessing the Effect of Multidimensionality on LSAT Equating for Subgroups of Test Takers (SR-95-01)
by Andre F. De Champlain
Testing organizations typically disclose test forms after they have been administered to large test-taker populations. Several new forms must therefore be developed each year, each as similar as possible to the others in statistical and content attributes. Although a great deal of effort goes into assembling comparable tests, forms still tend to vary somewhat in their statistical characteristics. Hence, scores must be transformed to enable direct comparisons across forms. The process by which scores are adjusted to make them comparable to each other is referred to as equating. The Law School Admission Council (LSAC) employs item response theory (IRT) true-score equating to equate the LSAT.
One of the main assumptions underlying IRT true-score equating methods is that the items of the forms to be equated measure a single, unidimensional construct. This assumption must be met in order to benefit from the advantages of IRT-based procedures, most notably population invariance. Simply stated, IRT equating functions should theoretically be independent of the groups from which they were derived, provided the postulates of the model hold.
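The logic of IRT true-score equating can be illustrated with a minimal sketch. It assumes a three-parameter logistic (3PL) model with scaling constant D = 1.7 and item parameters already placed on a common ability scale; the report summarized here does not state these details, and the item parameters and function names below are hypothetical. A true score on form X is mapped to the ability level that produces it, and the equated score is the true score on form Y at that same ability level.

```python
import math

def p3pl(theta, a, b, c, D=1.7):
    """Probability of a correct response under the (assumed) 3PL model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def true_score(theta, items):
    """Test characteristic curve: expected number-correct score at theta."""
    return sum(p3pl(theta, a, b, c) for a, b, c in items)

def theta_for_true_score(score, items, lo=-8.0, hi=8.0, tol=1e-8):
    """Invert the test characteristic curve by bisection (it increases with theta)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if true_score(mid, items) < score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def equate_true_score(score_x, items_x, items_y):
    """Map a form X true score to the form Y true score at the same theta."""
    floor_x = sum(c for _, _, c in items_x)  # true scores cannot fall below the sum of c
    if not floor_x < score_x < len(items_x):
        raise ValueError("score lies outside the invertible range of the curve")
    theta = theta_for_true_score(score_x, items_x)
    return true_score(theta, items_y)

# Hypothetical (a, b, c) item parameters; form Y is slightly harder than form X.
form_x = [(1.0, -1.0, 0.2), (1.2, 0.0, 0.2), (0.8, 1.0, 0.2)]
form_y = [(1.0, -0.8, 0.2), (1.2, 0.2, 0.2), (0.8, 1.2, 0.2)]
equated = equate_true_score(2.0, form_x, form_y)
```

Because the conversion depends only on the item parameters, it should not change with the group used to estimate them if the model holds. That is the population invariance property examined in this study.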
Several studies have attempted to assess how multidimensionality might affect the quality of IRT true-score equating results. Most concluded that (unidimensional) IRT true-score equating procedures were quite robust to departures from the assumption of unidimensionality, and that the effects of multidimensionality on the quality of the equating results were negligible (Bogan & Yen, 1983; Camilli, Wang, & Fesq, 1995; Cook & Douglass, 1982; Cook, Dorans, Eignor, & Petersen, 1985; Dorans & Kingston, 1985; Kolen & Whitney, 1982; Modu, 1982; Snieckus & Camilli, 1993; Stocking & Eignor, 1986; Wang, 1985; Yen, 1984). However, these studies generally focused on dimensionality at the content level only; they did not specifically examine the interaction of content and test-taker population, as suggested by Lord and Novick (1968) and Bejar (1983). Other studies, which centered on how the interaction of multidimensional test content and heterogeneous populations might affect IRT true-score equating, did show that conversions could differ substantially across diverse groups of test takers, most notably when the content was also heterogeneous (Angoff & Cowell, 1985; Cook, Eignor, & Taft, 1988; Eignor & Cook, 1991). Although informative, none of these studies systematically investigated how differences in the latent trait composite of subgroups might affect IRT true-score equating results.
The purpose of this study was therefore to assess the dimensionality of one form of the Law School Admission Test (LSAT) with respect to three ethnic groups of test takers and to investigate whether differences in their latent trait composite have any noticeable impact on IRT true-score equating results for these subgroups. More precisely, the equating functions estimated for African American and Hispanic test takers were compared with those derived for the majority Caucasian group, as well as for the total test-taker population, to determine whether any noteworthy differences existed.
The dimensionality analyses for the three ethnic groups showed that a two-dimensional model, specifying Analytical Reasoning and Logical Reasoning + Reading Comprehension as the two factors, adequately accounted for the item responses of both Caucasian and African American test takers, whereas a more complex model was required for the Hispanic subgroup.
Equating results indicated that the differences between the conversion lines obtained for the three ethnic groups and the total test-taker population were negligible. The largest residuals obtained when comparing the minority-group conversion lines to either the Caucasian or the total-population equating function were well within one conditional standard error of measurement for score differences, which again indicates that the variations are of no practical significance.
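A comparison of this kind can be sketched as follows, assuming each conversion line is available as a list of equated scores at the same raw-score points, together with a conditional standard error of measurement for score differences at each point. The function name and all numeric values are hypothetical and are not results from this study.

```python
def compare_conversions(subgroup, reference, csem_diff):
    """Summarize the residuals between two conversion lines, point by point."""
    if not (len(subgroup) == len(reference) == len(csem_diff)):
        raise ValueError("all inputs must cover the same raw-score points")
    # Residual = subgroup equated score minus reference equated score.
    abs_res = [abs(s - r) for s, r in zip(subgroup, reference)]
    return {
        "mean_abs_residual": sum(abs_res) / len(abs_res),
        "max_abs_residual": max(abs_res),
        # True only if every residual is within one conditional SEM.
        "all_within_one_csem": all(d <= e for d, e in zip(abs_res, csem_diff)),
    }

# Hypothetical values for illustration only.
reference = [10.0, 11.1, 12.3, 13.2]   # e.g., total-population conversion
subgroup = [10.2, 11.0, 12.3, 13.5]    # e.g., a subgroup conversion
csem_diff = [1.5, 1.4, 1.4, 1.5]       # conditional SEM for score differences
summary = compare_conversions(subgroup, reference, csem_diff)
```

Checking residuals against the conditional standard error at each score point, rather than against a single overall value, allows the criterion to vary along the score scale.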
Matching Caucasian test takers to the African American raw score frequency distribution tended to increase the disparities between the equating functions at the extremes of the scale, and hence produced a slightly larger mean absolute residual. The discrepancies between the two conversion lines in the middle of the scale, however, were smaller. These findings support those of Cook, Eignor, and Schmitt (1990) as well as Kolen (1990), who stated that matching generally did not contribute to a more accurate equating.
Regardless, the results obtained in this study suggest that African American and Hispanic conversion lines are statistically equivalent to the equating function of the majority Caucasian group as well as to the one derived from the total test-taker population. In other words, the current practice of applying a conversion function obtained from the total population to all test takers, irrespective of ethnicity, does not penalize minority test takers.