A Formulation of the Mantel-Haenszel Differential Item Functioning Parameter With Practical Implications (SR-96-03)
by Louis A. Roussos, Deborah L. Schnipke, and Peter J. Pashley, Law School Admission Council

Executive Summary

When examinees from two different subgroups have the same ability distribution (or are “matched” on ability) but are not equally likely to answer a particular item correctly, the item is said to exhibit DIF (differential item functioning; that is, the item functions differently in the two groups). When test data are analyzed, a statistical measure of DIF is calculated for each item so that items with large DIF values (i.e., items with a large difference between the two groups in the probability that equal-ability examinees answer correctly) can be investigated to determine whether they should be removed from the test and/or the item pool (the group of items from which new tests are assembled). The Mantel-Haenszel (MH) procedure, which is used at the Law School Admission Council (LSAC), has become the most widely used procedure for measuring DIF and is recognized as the testing industry standard. The behavior of the MH DIF parameter is well understood for items on which no guessing occurs, but not for items on which guessing does occur, as is often the case with multiple-choice items.
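For readers unfamiliar with the procedure, the sample MH statistic can be sketched as follows. This is a minimal illustration, not LSAC's operational code: examinees are assumed to be grouped into strata by a matching score, each stratum is summarized as a 2x2 table of counts, and the function names and the tuple layout are hypothetical. The rescaling by -2.35 is the delta-scale convention commonly used in the DIF literature; it is not stated in this summary.

```python
import math

def mh_common_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio across matched-score strata.

    Each stratum is a tuple of counts (hypothetical layout):
    (ref_correct, ref_incorrect, focal_correct, focal_incorrect).
    """
    numerator = 0.0
    denominator = 0.0
    for ref_right, ref_wrong, focal_right, focal_wrong in strata:
        total = ref_right + ref_wrong + focal_right + focal_wrong
        if total == 0:
            continue  # an empty stratum contributes nothing
        numerator += ref_right * focal_wrong / total
        denominator += ref_wrong * focal_right / total
    if denominator == 0:
        raise ValueError("odds ratio is undefined for these counts")
    return numerator / denominator

def mh_d_dif(alpha):
    """Rescale the common odds ratio to the delta metric (assumed convention)."""
    return -2.35 * math.log(alpha)

# Two illustrative strata with made-up counts.
example_strata = [(30, 10, 20, 20), (20, 20, 10, 30)]
alpha_hat = mh_common_odds_ratio(example_strata)
d_dif = mh_d_dif(alpha_hat)
```

An odds ratio of 1 (a delta value of 0) indicates no DIF; under this sign convention, a negative delta value indicates an item that is relatively harder for the focal group.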

This research report presents a general formulation of the MH DIF parameter that is equally appropriate for items on which guessing occurs and for items on which no guessing occurs. The value of this parameter is calculated for numerous realistic conditions to explore its behavior in situations where DIF might occur with real data. Practitioners have assumed that the MH DIF parameter behaves similarly regardless of guessing behavior, but our results indicate that guessing can affect the parameter’s value for relatively difficult items. As a result, the MH DIF statistic should be used with caution until the apparent deficiencies of this procedure are better understood or corrected.
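The effect of guessing can be illustrated with a simple sketch. It is not the formulation developed in the report: it assumes a three-parameter logistic item response model with the usual 1.7 scaling constant, it assumes that DIF appears only as a difference in item difficulty between the reference and focal groups, and it computes the log-odds ratio at a single fixed ability rather than the MH parameter itself. All function and parameter names are hypothetical.

```python
import math

def p_correct(theta, a, b, c):
    """Three-parameter logistic probability of a correct response.

    theta: ability, a: discrimination, b: difficulty,
    c: lower asymptote (guessing); c = 0 gives the no-guessing case.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))

def log_odds_ratio(theta, a, b_ref, b_focal, c):
    """Log-odds ratio (reference vs. focal) at a fixed ability theta."""
    p_ref = p_correct(theta, a, b_ref, c)
    p_focal = p_correct(theta, a, b_focal, c)
    return math.log(p_ref / (1.0 - p_ref)) - math.log(p_focal / (1.0 - p_focal))

# Same difficulty shift (0.5) evaluated with and without guessing.
for theta in (-2.0, 0.0, 2.0):
    no_guess = log_odds_ratio(theta, a=1.0, b_ref=0.0, b_focal=0.5, c=0.0)
    guess = log_odds_ratio(theta, a=1.0, b_ref=0.0, b_focal=0.5, c=0.2)
    print(f"theta={theta:+.1f}  c=0: {no_guess:.3f}  c=0.2: {guess:.3f}")
```

Under these assumptions, the log-odds ratio without guessing is the same at every ability level, whereas with guessing it shrinks toward zero for examinees whose ability is well below the item's difficulty, that is, for examinees who find the item relatively hard.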

Before items are tested empirically for DIF at LSAC, and even before they are pretested (administered to examinees for the first time), they are subjected to rigorous sensitivity reviews. Additionally, real data do not mimic simulated data exactly. Thus, the implications of this study for the routine operational task of identifying DIF at LSAC are still unknown and may in fact be minimal. However, because some items on the Law School Admission Test (LSAT) are known to exhibit guessing behavior, the results suggest that additional research is warranted.

