Exploring New Methods to Detect Person Misfit in CAT (CT-99-13)
by Rob R. Meijer and Edith M. L. A. van Krimpen-Stoop, University of Twente, Enschede, The Netherlands
A test score may not reveal the impact of undesirable factors that affect the examinee's behavior, such as misunderstanding of the test instructions, knowledge of correct answers due to item preview, or time pressure during the test. Nevertheless, such factors do sometimes occur and may result, for example, in classification errors in testing for employment or educational admission.
Most effects of undesirable factors operating in a test reveal themselves as a lack of fit between the examinee's responses and the item response model used to calibrate the items in the test. Several statistics for detecting nonfitting score patterns (person misfit) in paper-and-pencil tests have been proposed. Their use in computerized adaptive tests (CAT) has hardly been explored, except for a few studies that showed they did not apply well to CAT data.
In this study, new person-fit statistics were proposed, and critical values for their application in statistical tests to detect person misfit in CAT were derived from statistical theory. All of the proposed statistics were designed to be sensitive to unexpectedly long runs of correct or incorrect responses in an examinee's response vector. Some of them were based on all responses in the response vector (post hoc person-fit analysis); others were based on responses to items in subsets collected during the test (online person-fit analysis). All statistics were based on a comparison between observed and expected values accumulated in a cumulative sum (CUSUM) procedure. The theoretical and empirical distributions of the statistics were compared, and their detection rates of true person misfit were investigated by computer simulation. The results showed that the empirical rates of false positives in the statistical tests based on these new CUSUM statistics agreed well with the predictions derived from statistical theory, provided the number of items in the subsets used in online person-fit analysis was not too small. The tests were also quite powerful; their rates of detecting true person misfit were superior to those of all person-fit statistics that the authors had studied earlier for application in CAT.
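
The general idea of accumulating observed-minus-expected differences in a CUSUM can be illustrated with a minimal Python sketch. It is not the exact set of statistics proposed in the report, which this summary does not spell out. The sketch assumes a two-parameter logistic item response model, a residual scaled by test length, and hypothetical decision bounds `ub` and `lb`; all function and parameter names are illustrative, and the critical values derived in the study are not reproduced here.

```python
import math


def p_correct(theta, b, a=1.0):
    """Probability of a correct response under an assumed 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def cusum_person_fit(responses, difficulties, theta):
    """Accumulate observed-minus-expected residuals in two one-sided CUSUMs.

    responses    : list of 0/1 item scores in administration order
    difficulties : list of item difficulty parameters (same order)
    theta        : ability estimate used to compute expected scores

    Returns (upper, lower): the running upper and lower CUSUM paths.
    The upper path grows during runs of unexpected correct answers,
    the lower path falls during runs of unexpected incorrect answers.
    """
    n = len(responses)
    c_plus, c_minus = 0.0, 0.0
    upper, lower = [], []
    for x, b in zip(responses, difficulties):
        # Residual scaled by test length (an assumption of this sketch).
        t = (x - p_correct(theta, b)) / n
        c_plus = max(0.0, c_plus + t)    # resets to 0 when it would go negative
        c_minus = min(0.0, c_minus + t)  # resets to 0 when it would go positive
        upper.append(c_plus)
        lower.append(c_minus)
    return upper, lower


def flag_misfit(upper, lower, ub, lb):
    """Flag the pattern if either path crosses its (hypothetical) bound."""
    return max(upper) > ub or min(lower) < lb


if __name__ == "__main__":
    # Four items of difficulty 0 answered by an examinee with theta = 0,
    # so every expected score is 0.5.
    all_correct = cusum_person_fit([1, 1, 1, 1], [0.0] * 4, theta=0.0)
    alternating = cusum_person_fit([1, 0, 1, 0], [0.0] * 4, theta=0.0)
    print(flag_misfit(*all_correct, ub=0.3, lb=-0.3))
    print(flag_misfit(*alternating, ub=0.3, lb=-0.3))
```

In a post hoc analysis the loop would run over the full response vector; in an online analysis the same accumulation could be applied to successive subsets of items as the test proceeds. The bounds in the usage example are arbitrary placeholders chosen only to make the two patterns behave differently.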