Theoretical Formula for Statistical Bias in CATSIB DIF Estimator Due to Discretization of the Ability Scale (CT-99-07)
by Louis A. Roussos, Ratna Nandakumar, and Julie Cwikla Banks, University of Delaware
As part of the Law School Admission Council (LSAC) research program to determine the advisability and feasibility of developing a computerized version of the Law School Admission Test (LSAT), a statistical procedure for detecting differential item functioning (DIF) in a computerized testing situation has been developed. The procedure, called CATSIB, is further investigated in the current paper.
In any testing situation, it is essential that items are fair to all subgroups of test takers. If one subgroup performs better than another on an item after the subgroups have been matched on ability, then that item is said to display DIF. The item being investigated is termed the studied item. DIF analyses compare the studied item performance of test takers from different subgroups, with the test takers carefully matched on some measure of ability level. CATSIB (a modification of the SIBTEST DIF procedure) matches examinees on the estimated ability produced by a computerized test. Specifically, CATSIB divides the total estimated ability range into a discrete number of nonoverlapping intervals and places each test taker into the interval that contains his or her estimated ability. Then, within each interval, CATSIB compares the studied item performance of the subgroups of interest in the test-taking population, and averages these comparisons over all the intervals. The resulting estimated difference in performance on the studied item between the two specified subgroups constitutes the DIF estimate.
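The interval-matching scheme described above can be sketched in code. The following is an illustrative simplification, not the operational CATSIB implementation: the function name, the equal-width interval placement, and the combined-group weighting of intervals are assumptions made for this sketch, and CATSIB's regression correction of the ability estimates is omitted.

```python
import numpy as np

def dif_estimate(theta_ref, y_ref, theta_foc, y_foc, n_intervals=20):
    """Interval-based DIF estimate in the spirit of CATSIB/SIBTEST.

    Bins examinees by estimated ability, compares mean studied-item
    scores of the reference and focal groups within each interval,
    and averages the differences weighted by the number of examinees
    in each interval.  (Sketch only; weighting details are assumed.)
    """
    theta_all = np.concatenate([theta_ref, theta_foc])
    # Nonoverlapping intervals spanning the total estimated-ability range
    edges = np.linspace(theta_all.min(), theta_all.max(), n_intervals + 1)
    # Interval index for each examinee, clipped into 1..n_intervals
    bins_ref = np.clip(np.digitize(theta_ref, edges), 1, n_intervals)
    bins_foc = np.clip(np.digitize(theta_foc, edges), 1, n_intervals)
    beta_hat, weight_sum = 0.0, 0
    for k in range(1, n_intervals + 1):
        r, f = y_ref[bins_ref == k], y_foc[bins_foc == k]
        if len(r) == 0 or len(f) == 0:
            continue  # skip intervals lacking examinees from both groups
        w = len(r) + len(f)
        beta_hat += w * (r.mean() - f.mean())
        weight_sum += w
    return beta_hat / weight_sum if weight_sum else 0.0
```

A positive estimate indicates the reference group outperforms the focal group on the studied item after interval matching; a value near zero indicates no detected DIF.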
To ensure accurate DIF estimation, the number of intervals used by CATSIB must be carefully chosen. Too many intervals could leave too few test takers per interval, causing unstable statistical estimation. On the other hand, too few intervals could inflate the DIF estimate when the subgroups differ in their average scores on the test as a whole. The original CATSIB procedure required a minimum of 20 intervals, on the assumption that fewer than 20 would result in unacceptable bias in the DIF estimate.
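The inflation that results from too few intervals when subgroup means differ can be seen in a small simulation. Everything below is an illustrative sketch under assumed conditions (a DIF-free logistic item, normal ability distributions shifted by one standard deviation, true abilities standing in for CAT estimates, and combined-group interval weights); it is not the design of any study cited here.

```python
import numpy as np

def binned_dif(theta_r, y_r, theta_f, y_f, n_intervals):
    """Weighted mean of within-interval group differences (sketch)."""
    edges = np.linspace(min(theta_r.min(), theta_f.min()),
                        max(theta_r.max(), theta_f.max()), n_intervals + 1)
    b_r = np.clip(np.digitize(theta_r, edges), 1, n_intervals)
    b_f = np.clip(np.digitize(theta_f, edges), 1, n_intervals)
    num = den = 0.0
    for k in range(1, n_intervals + 1):
        r, f = y_r[b_r == k], y_f[b_f == k]
        if r.size and f.size:
            w = r.size + f.size
            num += w * (r.mean() - f.mean())
            den += w
    return num / den if den else 0.0

rng = np.random.default_rng(0)
n = 20000
# Reference-group ability is one standard deviation above the focal group's.
theta_r = rng.normal(0.5, 1.0, n)
theta_f = rng.normal(-0.5, 1.0, n)
# A DIF-free item: the same logistic response function for both groups,
# so any nonzero estimate is pure statistical bias from the matching.
y_r = (rng.random(n) < 1.0 / (1.0 + np.exp(-theta_r))).astype(float)
y_f = (rng.random(n) < 1.0 / (1.0 + np.exp(-theta_f))).astype(float)

coarse = binned_dif(theta_r, y_r, theta_f, y_f, n_intervals=3)
fine = binned_dif(theta_r, y_r, theta_f, y_f, n_intervals=40)
# With only 3 intervals, higher-ability reference examinees dominate the
# upper end of each wide interval, inflating the estimate away from zero;
# with 40 intervals the estimate is much closer to zero.
```

The coarse matching leaves a residual within-interval ability difference between the groups, which masquerades as DIF; finer intervals shrink that residual, at the cost of fewer examinees per interval.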
The performance of CATSIB was evaluated by Nandakumar and Roussos in a previous simulation study. The results showed that CATSIB was generally very effective in accurately estimating the simulated amounts of DIF in studied items. In one instance, however, the degree of DIF was severely underestimated for an item that was very difficult. Nandakumar and Roussos conjectured that this underestimation was related to the required minimum of 20 intervals, and that allowing a smaller minimum number of intervals might alleviate the problem.
The purpose of the present study is to begin to investigate this conjecture of Nandakumar and Roussos. Rather than merely repeating all or a part of their study with a smaller minimum required number of cells, the present study develops, for the first time, rigorous theoretical equations for predicting the amount of bias in a DIF estimator that is caused by the choice of the number of intervals. Furthermore, we verify the accuracy of these equations by conducting a modified version of the Nandakumar and Roussos simulation study.
The research presented here will be extended in several directions. First, to justify the use of as few as 7 intervals in performing CATSIB DIF analyses, it is not enough to merely show that the bias in the DIF estimator is small. The use of a small number of intervals could conceivably cause a small amount of bias in the DIF estimator but still cause a large amount of bias in the standard error estimator for the DIF estimator. Thus, the current research will be extended by deriving theoretical formulas for the expected amount of bias due to the number of intervals in the standard error estimator. A simulation study similar to that presented in the current paper will be conducted to verify the accuracy of the theoretical formulas.