Using Patterns of Summed Scores in Paper and Pencil Tests and CAT to Detect Misfitting Item Score Patterns (CT-02-04)
by Rob R. Meijer, University of Twente, Enschede, The Netherlands

Executive Summary

In computerized adaptive testing (CAT), a mathematical model called item response theory (IRT) makes it possible to select, for each individual test taker, items that are matched to his or her ability level. IRT also allows us to determine the probability that a test taker of a particular ability level will answer individual test items correctly. In general, a test taker has a 50% chance of correctly answering an item matched to his or her ability level; easier items are answered correctly with a higher probability, and more difficult items with a lower probability.
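As a minimal sketch of this relationship, the probability above can be computed under a logistic IRT model. The function name and the use of the two-parameter logistic (2PL) form are illustrative assumptions, not taken from the report itself:

```python
import math

def p_correct(theta, b, a=1.0):
    """Two-parameter logistic (2PL) IRT model: probability that a test
    taker with ability theta answers correctly an item with difficulty b
    and discrimination a. (Illustrative sketch, not the report's model.)"""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# An item matched to the test taker's ability (b == theta) is answered
# correctly with probability 0.5; an easier item (b < theta) with more.
print(round(p_correct(theta=0.0, b=0.0), 2))   # 0.5
print(round(p_correct(theta=0.0, b=-1.0), 2))  # 0.73 (easier item)
```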

Because of the ability to determine the probability that an individual test taker will correctly answer a particular test item, IRT may be applied to evaluate the item-score patterns of test takers to determine if they are responding in the manner that would be expected, given their ability level. Such investigations are commonly called person-fit analyses. Aberrant item score patterns may indicate that the test taker has attempted to copy answers from another test taker or may indicate a problem with the test administration, such as a faulty answer key. Most person-fit analyses that may currently be found in the literature are based on item-score patterns.

In this paper, person-fit statistics based on the likelihood of the number-correct scores on subsets of items in the test are studied, for application in both paper-and-pencil (P&P) tests and CATs. Application of these statistics in CAT is possible if the item-selection algorithm selects testlets (small bundles of items) rather than individual items from the pool. The most significant result was that it is important to take the ability level of the test taker into account when number-correct scores on subtests are compared.
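To make the idea of a likelihood for a number-correct score concrete, the distribution of summed scores on a subtest can be built up item by item, given the per-item probabilities of a correct answer at a fixed ability level (this is the standard Lord-Wingersky recursion; the function name is an assumption, and the report's exact statistics are not reproduced here):

```python
def score_distribution(probs):
    """Given per-item probabilities of a correct answer (evaluated at one
    ability level), return the probability of each possible number-correct
    score 0..n on that item set, via the Lord-Wingersky recursion."""
    dist = [1.0]  # probability of score 0 on an empty set of items
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for score, q in enumerate(dist):
            new[score] += q * (1.0 - p)   # this item answered incorrectly
            new[score + 1] += q * p       # this item answered correctly
        dist = new
    return dist

# Hypothetical three-item testlet for one test taker:
dist = score_distribution([0.8, 0.5, 0.3])
# A subtest summed score that has low probability under the model for the
# test taker's estimated ability would be flagged as possible misfit.
```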
