A Comparison of Mantel-Haenszel Differential Item Functioning Parameters (RR-98-03)
by Deborah L. Schnipke, Louis A. Roussos, and Peter J. Pashley
Items on large-scale standardized tests, such as the Law School Admission Test (LSAT), undergo an extensive sensitivity review before they are ever presented to test takers. Despite these precautions, some items may still function differently among subgroups, so statistical analyses of differential item functioning (DIF) are performed after test takers respond to the items. Many DIF procedures have been developed, but the Mantel-Haenszel (MH) procedure is the primary DIF procedure used at the Law School Admission Council (LSAC) and other major testing companies.
The MH procedure was first proposed for situations in which items cannot be answered correctly by guessing. Under this constraint, the MH statistic has a direct relationship to item difficulty, as specified by item response theory (IRT), so the statistic's behavior and interpretation are well understood. When items can be answered correctly by guessing (e.g., many multiple-choice items), the relationship between the MH DIF statistic and IRT difficulty is more complicated, so the behavior and interpretation of the statistic are not well understood. Several theorists have proposed MH DIF parameters in an attempt to explain the statistic's behavior under these more complicated circumstances. The purpose of the present study is to compare the proposed MH DIF parameters in order to determine which parameter most accurately captures the MH DIF statistic's behavior.
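For readers unfamiliar with the statistic, the following is a minimal sketch of how the MH DIF statistic is commonly computed. It is not taken from this report. It assumes the standard textbook form, in which test takers are matched on total score, a 2x2 table (group by correct/incorrect) is formed at each score level, and the common odds ratio is rescaled by -2.35 times its natural log (the widely used delta-scale convention). The function names and the example counts are hypothetical.

```python
import math

def mh_odds_ratio(tables):
    """Mantel-Haenszel common odds ratio across matched score levels.

    Each table is (a, b, c, d):
      a = reference group correct,  b = reference group incorrect,
      c = focal group correct,      d = focal group incorrect.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n == 0:
            continue  # an empty score level contributes nothing
        num += a * d / n
        den += b * c / n
    if den == 0:
        raise ValueError("odds ratio undefined: denominator is zero")
    return num / den

def mh_d_dif(tables):
    """MH DIF statistic on the delta scale (assumed -2.35 * ln convention)."""
    return -2.35 * math.log(mh_odds_ratio(tables))

# Hypothetical counts: both groups have the same odds of success
# at every score level, so the odds ratio is 1 and the statistic is 0.
no_dif = [(30, 10, 15, 5), (20, 20, 10, 10)]

# Hypothetical counts: the reference group has 4 times the odds of success,
# which yields a negative value (the item favors the reference group).
favors_reference = [(40, 10, 20, 20)]

print(mh_odds_ratio(no_dif), mh_d_dif(no_dif))
print(mh_odds_ratio(favors_reference), mh_d_dif(favors_reference))
```

Under this convention, values near zero indicate little DIF, negative values indicate an item that is relatively harder for the focal group, and positive values indicate the reverse.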
Three MH DIF parameters were compared with values of the MH DIF statistic in simulated and real data. Not surprisingly, of the three parameters investigated, the one that is most theoretically similar to the MH DIF statistic itself was found to best explain the statistic's behavior under a variety of conditions.