The Use of Statistical Process Control-Charts for Person-Fit Analysis in Computerized Adaptive Testing (CT-98-12)
Rob R. Meijer and Edith M. L. A. van Krimpen-Stoop, University of Twente
A person’s item response pattern may reveal undesirable behavior such as faking responses on personality tests, guessing, or knowledge of the correct answers due to test preview. Such behavior may result in inappropriate test scores and may have serious consequences for practical test use; for example, it could lead to decision errors in testing for job and educational selection.
Item response theory (IRT) is applied in the operational analysis of Law School Admission Test (LSAT) data. IRT is a mathematical model in which the probability that a test taker will answer an item correctly is related to the ability level of the test taker and the characteristics of the item. To detect item response patterns that do not fit regular test-taking behavior under an IRT model, several person-fit statistics have been proposed. Nearly all of these statistics are based on the difference between the observed item scores and those expected under the IRT model. When the distribution of a statistic under the hypothesis of regular response behavior is known, item response patterns can be classified as fitting or not fitting the model. To date, almost all fit statistics have been proposed for conventionally administered paper-and-pencil (P&P) tests. With the advent of computerized adaptive testing (CAT), research is needed on the application of person-fit statistics in CAT. Earlier research studied the empirical distributions under CAT of a frequently used fit statistic, l_z, and an adaptation, l_z*. That research found that for simulated P&P data the empirical distribution of l_z* agreed more closely with the standard normal distribution than the distribution of l_z did. For CAT data, however, there was a large discrepancy between the empirical and standard normal distributions for both statistics. Consequently, inadvertent application of these person-fit tests to CAT data may result in inaccurate decisions.
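To make the idea of comparing observed with expected item scores concrete, the following is a minimal sketch of the standardized log-likelihood person-fit statistic in its commonly published form. It assumes a two-parameter logistic model; the function names, item parameters, and response patterns are hypothetical, and the report itself does not specify any of them.

```python
import math

def p_2pl(theta, a, b):
    """Probability of a correct response under a 2PL model (assumed here)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def lz(responses, probs):
    """Standardized log-likelihood of a 0/1 response pattern.

    Requires every probability to lie strictly between 0 and 1, and at
    least one probability to differ from 0.5 (otherwise the variance is 0).
    """
    # Observed log-likelihood of the pattern.
    l0 = sum(u * math.log(p) + (1 - u) * math.log(1 - p)
             for u, p in zip(responses, probs))
    # Expectation and variance of the log-likelihood under the model.
    expected = sum(p * math.log(p) + (1 - p) * math.log(1 - p) for p in probs)
    variance = sum(p * (1 - p) * math.log(p / (1 - p)) ** 2 for p in probs)
    return (l0 - expected) / math.sqrt(variance)

# Hypothetical five-item test, ability 0, all discriminations equal to 1.
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
probs = [p_2pl(0.0, 1.0, b) for b in difficulties]

lz_fit = lz([1, 1, 1, 0, 0], probs)     # easy items right, hard items wrong
lz_misfit = lz([0, 0, 1, 1, 1], probs)  # easy items wrong, hard items right
```

In this toy example the reversed pattern yields a large negative value, while the consistent pattern yields a positive one. Large negative values are conventionally read as evidence of misfit.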
In this paper, several new fit statistics especially designed for CAT were studied. These statistics were based on the theory of statistical process control (SPC). In industrial applications, a (production) process is in a state of statistical control if the variable being measured has a stable distribution. One technique from SPC is based on Shewhart control charts. These charts are used to determine whether a process is in statistical control by examining past data. An example of a Shewhart control chart is the X-bar chart, in which the observed averages of the variable being measured in samples of size n are plotted over time. X-bar charts are very effective in detecting large shifts in the quality of a production process. Another technique from SPC is the cumulative sum procedure. This paper shows that this procedure can be successfully applied to detect test takers in a CAT whose response behavior is not in accordance with the IRT model. For example, suppose an examinee becomes more and more careless during the test because of fatigue. As a result, in the first part of the CAT, responses will tend to alternate between correct and incorrect, whereas in the second part of the test more and more items are answered incorrectly due to carelessness. The statistics studied in this paper were designed to be sensitive to such changes in response behavior. The power of these statistics with respect to a large variety of nonfitting response behaviors was determined. Recommendations on the appropriate choice from these statistics for CAT programs were made.
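The fatigue example above can be illustrated with a minimal sketch of a two-sided cumulative sum procedure applied to item score residuals. The residual definition (observed minus expected score, divided by test length), the function name, the response data, and the lower bound are all assumptions made for illustration. They are not necessarily the statistics or bounds studied in the paper.

```python
def cusum(responses, probs):
    """Two-sided cumulative sums of scaled residuals (u - p) / n.

    Returns the running upper sums (reset at 0 from below) and the
    running lower sums (reset at 0 from above), one value per item.
    """
    n = len(responses)
    c_plus, c_minus = 0.0, 0.0
    upper, lower = [], []
    for u, p in zip(responses, probs):
        t = (u - p) / n                  # scaled residual for this item
        c_plus = max(0.0, c_plus + t)    # accumulates unexpected correct answers
        c_minus = min(0.0, c_minus + t)  # accumulates unexpected incorrect answers
        upper.append(c_plus)
        lower.append(c_minus)
    return upper, lower

# Hypothetical 20-item CAT in which every item is matched to ability,
# so each expected score is 0.5. The first half alternates correct and
# incorrect; the second half is answered incorrectly throughout.
responses = [1, 0] * 5 + [0] * 10
probs = [0.5] * 20

upper, lower = cusum(responses, probs)

LOWER_BOUND = -0.2  # hypothetical bound, chosen only for this illustration
flagged = min(lower) < LOWER_BOUND
```

In this sketch the lower sum stays near zero while responses alternate and then drifts steadily downward once the run of incorrect answers begins. This drift is the kind of change in response behavior the procedure is meant to detect. In practice the bounds would have to be derived from the distribution of the statistic under model-fitting behavior.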