CATSIB: A Modified SIBTEST Procedure to Detect Differential Item Functioning in Computerized Adaptive Tests (CT-97-11)
by Ratna Nandakumar, University of Delaware and Louis Roussos, Law School Admission Council
The Law School Admission Council (LSAC) is in the midst of a research program to determine the advisability and feasibility of developing a computerized version of the Law School Admission Test (LSAT). In any testing situation, it is essential that items be fair to all subgroups of test takers. If one subgroup performs better than another on an item even though the two subgroups have been matched on ability, that item is cause for concern. This phenomenon is known as differential item functioning, or simply DIF. Even though reliable statistical procedures have been developed for detecting DIF items on paper-and-pencil tests, computerized adaptive tests (CATs) pose obstacles that require the development of new procedures.
DIF analyses require comparing the performance of test takers from different subgroups by carefully selecting test takers who have been matched on some measure of ability level. With a paper-and-pencil test, the matching criterion is typically the number-right score on all the items, including the item being studied for DIF. The inclusion of the score on the studied item helps control for statistical error due to impact (group-average ability differences). However, with a CAT, different test takers take different items according to their measured ability levels. Thus, they all obtain similar number-right scores, and number-right score cannot be used as a criterion for matching on ability level. Hence, a new matching criterion must be developed. Also, a new method to control for statistical error due to impact must be developed.
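The point that adaptive administration flattens number-right scores can be illustrated with a small simulation (this is our own toy sketch, not part of the report: a Rasch-model paper-and-pencil test versus an idealized CAT in which every administered item is matched so well to the test taker that each response is a coin flip):

```python
import math
import random

random.seed(0)

N_ITEMS = 25
N_TAKERS = 1000

def pp_number_right(theta):
    # Paper-and-pencil: a fixed set of item difficulties, so
    # higher-ability test takers answer more items correctly.
    difficulties = [-2 + 4 * i / (N_ITEMS - 1) for i in range(N_ITEMS)]
    score = 0
    for b in difficulties:
        p = 1 / (1 + math.exp(-(theta - b)))  # Rasch (1PL) response model
        if random.random() < p:
            score += 1
    return score

def cat_number_right(theta):
    # Idealized CAT: each item is tailored to the test taker's ability,
    # so P(correct) is near 0.5 for everyone, regardless of theta.
    return sum(1 for _ in range(N_ITEMS) if random.random() < 0.5)

def corr(x, y):
    # Pearson correlation, computed from scratch for self-containment.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = (sum((a - mx) ** 2 for a in x) / n) ** 0.5
    sy = (sum((a - my) ** 2 for a in y) / n) ** 0.5
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n * sx * sy)

thetas = [random.gauss(0, 1) for _ in range(N_TAKERS)]
pp = [pp_number_right(t) for t in thetas]
cat = [cat_number_right(t) for t in thetas]

print(round(corr(thetas, pp), 2))   # strongly positive: score tracks ability
print(round(corr(thetas, cat), 2))  # near zero: score carries no ability information
```

Under these assumptions, the paper-and-pencil number-right score correlates strongly with ability and so can serve as a matching variable, while the CAT number-right score is essentially uninformative about ability, which is why a different matching criterion is needed.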
The current paper proposes a new DIF procedure for CATs that overcomes these obstacles. The procedure is called CATSIB, as it is a modification of the SIBTEST DIF procedure used with paper-and-pencil tests. CATSIB matches test takers on estimated ability level, an estimate that a CAT produces for each test taker. To control for statistical error due to impact, a correction is applied to these ability estimates, a correction based on a similar one used with SIBTEST.
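The SIBTEST-style comparison that CATSIB builds on can be sketched roughly as follows. This is our own simplified illustration, not the report's procedure: it groups test takers into matching-score levels and computes the weighted average difference in proportion-correct on the studied item, omitting the regression correction for impact that SIBTEST and CATSIB apply to the matching variable.

```python
from collections import defaultdict

def sibtest_beta(ref, foc):
    """Simplified SIBTEST-style DIF estimate (no regression correction).

    ref, foc: lists of (matching_level, studied_item_response) pairs
    for the reference and focal groups, with responses coded 0/1.
    Returns the average, over matching levels, of the difference in
    studied-item proportion-correct, weighted by level sample size.
    A value near 0 suggests no DIF; positive values favor the
    reference group.
    """
    by_level = defaultdict(lambda: {"r": [], "f": []})
    for k, y in ref:
        by_level[k]["r"].append(y)
    for k, y in foc:
        by_level[k]["f"].append(y)

    beta, n_total = 0.0, 0
    for grp in by_level.values():
        if grp["r"] and grp["f"]:  # level must contain both groups
            n_k = len(grp["r"]) + len(grp["f"])
            p_r = sum(grp["r"]) / len(grp["r"])
            p_f = sum(grp["f"]) / len(grp["f"])
            beta += n_k * (p_r - p_f)
            n_total += n_k
    return beta / n_total if n_total else 0.0

# Toy check: at each matching level the reference group answers the
# studied item correctly 80% of the time, the focal group 60%.
ref = [(k, y) for k in (0, 1) for y in [1, 1, 1, 1, 0]]
foc = [(k, y) for k in (0, 1) for y in [1, 1, 1, 0, 0]]
print(round(sibtest_beta(ref, foc), 3))  # → 0.2
```

In the actual procedures, matching on an error-prone criterion under impact biases this estimate, which is why SIBTEST, and CATSIB in turn, regression-correct the matching variable before forming the comparison.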
To evaluate the performance of the new procedure, a simulation study was conducted. The simulated testing situation consisted of test takers receiving 25 adaptively administered operational items (from a pool of 1,000) and 16 linearly administered pretest items that were evaluated for DIF. The simulated pretest items were statistically designed to display varying known amounts of DIF, and CATSIB was applied to these data to see how well it could detect and estimate these known amounts of DIF. Also, various levels of impact were simulated to see how well CATSIB could control for impact-induced statistical error. The simulation results showed that CATSIB was very effective in controlling statistical error due to impact, even for large group differences in average ability. CATSIB also performed well in detecting and estimating the simulated amounts of DIF in the items, exhibiting detection rates of over 90% for sample sizes of 500 in each subgroup and over 60% for sample sizes of 250 in each subgroup. Future research is planned to further improve CATSIB performance.