Impact of Local Item Dependence on Item Response Theory Scoring in CAT (CT-98-08)
by Lynda M. Reese, Law School Admission Council

Executive Summary

In computerized adaptive testing (CAT) an attempt is made to select items for individual test takers that are appropriate for their ability level. This adaptation of the difficulty level of the test to the ability level of the test taker is made possible through the application of item response theory (IRT). IRT is a mathematical model that relates the probability that a test taker will answer a single test item (i.e., test question) correctly to the ability level of the test taker and specific characteristics of the test item. In applying IRT, a formal assumption of local item independence is made. This assumption states that once the ability level of the test taker is accounted for, the responses of test takers to individual items on the test should be statistically independent.
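The local independence assumption is what lets IRT score a whole response pattern. Given ability, the probability of the full pattern is simply the product of the individual item probabilities. The sketch below illustrates this under stated assumptions. The report does not name its IRT model, so the three-parameter logistic (3PL) model, with discrimination a, difficulty b, guessing c and scaling constant D = 1.7, is assumed here. The function names and item parameters are hypothetical.

```python
import math

D = 1.7  # logistic scaling constant (assumed)

def p_correct(theta, a, b, c):
    """3PL probability of a correct response at ability theta (assumed model)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def log_likelihood(theta, responses, items):
    """Log-likelihood of a 0/1 response pattern. Local independence means the
    joint probability is the product of item probabilities, i.e. a sum of logs."""
    total = 0.0
    for u, (a, b, c) in zip(responses, items):
        p = p_correct(theta, a, b, c)
        total += math.log(p) if u == 1 else math.log(1.0 - p)
    return total

# Hypothetical (a, b, c) parameters and one response pattern
items = [(1.2, -0.5, 0.2), (0.8, 0.3, 0.2), (1.5, 1.0, 0.2)]
pattern = [1, 1, 0]
# Crude maximum-likelihood ability estimate by grid search over [-4, 4]
theta_ml = max((t / 10 for t in range(-40, 41)),
               key=lambda t: log_likelihood(t, pattern, items))
```

If LID is present, the product in `log_likelihood` no longer equals the true joint probability. It then tends to overstate how much information the responses carry.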

In a test-taking situation, many circumstances can cause the local item independence assumption to be violated to some degree. For instance, if a test section is especially difficult, fatigue may adversely affect test takers' performance on the items at the end of the section. In this case, the difficulty of the items at the beginning of the section affects performance on later items, so these items are said to exhibit some degree of local item dependence (LID).
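One way to see LID in data is to examine residual correlations. The sketch below computes Yen's Q3 statistic, a widely used LID index from the IRT literature that is not necessarily the index used in this report. For a pair of items, Q3 correlates the residuals u - P(theta) across examinees after ability has been accounted for. Under local independence these correlations should be close to zero. A fatigue effect like the one described above would tend to produce positive Q3 values among the late items. The sketch assumes the 3PL model and takes ability estimates as given. All names are hypothetical.

```python
import math

D = 1.7  # logistic scaling constant (assumed)

def p_correct(theta, a, b, c):
    """3PL probability of a correct response (assumed model)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def q3(responses, thetas, items, j, k):
    """Yen's Q3 for items j and k: correlation across examinees of the
    residuals u - P(theta) once ability has been accounted for."""
    res_j = [r[j] - p_correct(t, *items[j]) for r, t in zip(responses, thetas)]
    res_k = [r[k] - p_correct(t, *items[k]) for r, t in zip(responses, thetas)]
    return pearson(res_j, res_k)
```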

Previous research has evaluated the impact of LID on various applications of IRT in paper-and-pencil testing. Depending on the particular test design, a computerized test may rely more heavily on IRT for procedures such as item selection and ability estimation, so the assumptions of the model become even more important. This study represents a first evaluation of the impact of LID on IRT scoring in CAT. As such, it applied the most basic CAT design and a simplified method for simulating CAT item pools with varying degrees of LID. The results indicate that, for certain types of scoring, an extreme amount of LID may adversely affect the final score attained by the examinee (i.e., test taker). The estimated precision of the test was also affected by the extreme LID level studied here. For the medium level of LID, which was structured to reflect the amount of LID typically found in the LSAT, the effects of LID were not troublesome.
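The report does not spell out what its "most basic CAT design" involves. The sketch below shows one common minimal form under explicit assumptions: a fixed-length test, the 3PL model, selection of the unused item with maximum Fisher information at the current ability estimate, and expected a posteriori (EAP) scoring with a standard normal prior on a quadrature grid. Responses are simulated from the model itself, so they are locally independent. Simulating LID would require a different response generator. The pool parameters, test length and function names are all hypothetical.

```python
import math
import random

D = 1.7  # logistic scaling constant (assumed)

def p_correct(theta, a, b, c):
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def information(theta, a, b, c):
    """Fisher information of a 3PL item at theta."""
    p = p_correct(theta, a, b, c)
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

def eap(responses, items, grid_points=81):
    """EAP ability estimate and posterior SD, N(0, 1) prior on [-4, 4]."""
    grid = [-4.0 + 8.0 * i / (grid_points - 1) for i in range(grid_points)]
    weights = []
    for t in grid:
        w = math.exp(-0.5 * t * t)  # unnormalized N(0, 1) prior density
        for u, item in zip(responses, items):
            p = p_correct(t, *item)
            w *= p if u == 1 else 1.0 - p  # local independence: multiply
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)

def simulate_cat(true_theta, pool, test_length, rng):
    """Fixed-length CAT: maximum-information selection, EAP scoring."""
    used, order, responses, administered = set(), [], [], []
    theta_hat, se = 0.0, 1.0  # start at the prior mean and SD
    for _ in range(test_length):
        j = max((i for i in range(len(pool)) if i not in used),
                key=lambda i: information(theta_hat, *pool[i]))
        used.add(j)
        order.append(j)
        u = 1 if rng.random() < p_correct(true_theta, *pool[j]) else 0
        responses.append(u)
        administered.append(pool[j])
        theta_hat, se = eap(responses, administered)
    return theta_hat, se, order

# Hypothetical 300-item pool and one simulated 25-item test
rng = random.Random(1)
pool = [(rng.uniform(0.5, 2.0), rng.uniform(-3.0, 3.0), 0.2) for _ in range(300)]
theta_hat, se, order = simulate_cat(0.5, pool, 25, rng)
```

The posterior SD `se` plays the role of the test's estimated precision. Both it and `theta_hat` rely on the product in `eap`, which is where LID would enter.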

Future research in this area should focus on some of the computerized testing designs currently being evaluated for the LSAT. Future research should also evaluate LID levels that represent situations likely to arise when building an item pool for computerized testing. For example, the effect of 100 items displaying an extreme level of LID within a CAT pool that otherwise displays a medium level of LID should be evaluated.
