Detecting Items That Have Been Memorized (CT-99-05)
Lori D. McLeod and Deborah L. Schnipke
Because scores on high-stakes tests influence many decisions, tests need to be secure. Decisions based on scores affected by preknowledge of items are unacceptable. Item pools for computerized adaptive tests are typically used over time, which gives test takers the opportunity to share items with future test takers, so new methods are needed to detect the cheating strategies this makes possible. Because accusing a test taker of using item preknowledge has serious ramifications, it may be more useful for developers of operational computerized adaptive tests to focus on item security rather than on the behavior of individual test takers. As the Law School Admission Council (LSAC) investigates implementing a computerized version of the Law School Admission Test (LSAT), the risk to test security and tools for protecting item pools should be explored.
This research explores the development and use of a fit index to detect items that have been memorized, so that these items may be removed from the item pool while secure items remain in it. The design was based on a basic computerized adaptive test (CAT) and applied a real-world approach to simulating item preknowledge through a two-stage process. In the first stage, either 2 or 6 sources of average or high proficiency took a 25-item test and memorized its items. (The item pool consisted of 250 items.) These sources combined their item lists and provided them to some of the test takers (beneficiaries) in the second stage. (Some overlap was observed among the item lists.) The beneficiaries memorized the items provided by the sources, then took a 25-item test from the same item pool. If the beneficiaries were administered any of the memorized items, they answered them correctly.
Simulated beneficiaries were generated at 13 proficiency values, ranging from low to high proficiency. By varying the number of sources and the sources' proficiency levels, we indirectly manipulated the number and difficulty of the items memorized. The impact of item preknowledge on the overall testing program was also manipulated by varying the percentage of beneficiaries in the test-taking population (10 or 25 percent).
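The two-stage design described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the simulation used in the study: the Rasch response model, the normally distributed item difficulties, the nearest-difficulty item selection rule, the shrinking-step proficiency update, and the proficiency values 1.5 (a "high" source) and -1.0 (a beneficiary) are all assumptions, and every function name is hypothetical. Only the pool size (250), the test length (25), and the rule that memorized items are always answered correctly come from the text.

```python
import math
import random

POOL_SIZE = 250    # item pool size, from the text
TEST_LENGTH = 25   # items per test, from the text


def p_correct(theta, b):
    """Rasch (1PL) probability of a correct response (assumed model)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def administer_cat(theta, difficulties, rng, memorized=frozenset()):
    """Administer one simplified adaptive test; return (item ids, responses).

    Assumed stand-ins for a real CAT: the next item is the unused item whose
    difficulty is closest to the provisional estimate, and the estimate moves
    up or down by a shrinking step after each response.
    """
    estimate, step = 0.0, 1.0
    unused = set(range(len(difficulties)))
    items, responses = [], []
    for _ in range(TEST_LENGTH):
        item = min(unused, key=lambda i: (abs(difficulties[i] - estimate), i))
        unused.remove(item)
        if item in memorized:
            correct = True  # memorized items are always answered correctly
        else:
            correct = rng.random() < p_correct(theta, difficulties[item])
        items.append(item)
        responses.append(correct)
        estimate += step if correct else -step
        step = max(step * 0.7, 0.1)
    return items, responses


def gather_compromised_items(n_sources, source_theta, difficulties, rng):
    """Stage 1: each source takes a test; the item lists are combined."""
    compromised = set()
    for _ in range(n_sources):
        items, _ = administer_cat(source_theta, difficulties, rng)
        compromised.update(items)  # overlapping items are stored once
    return compromised


rng = random.Random(0)
difficulties = [rng.gauss(0.0, 1.0) for _ in range(POOL_SIZE)]

# Stage 1: six high-proficiency sources (theta = 1.5 is an assumed value).
compromised = gather_compromised_items(6, 1.5, difficulties, rng)

# Stage 2: one beneficiary (theta = -1.0, assumed) who memorized the list.
items, responses = administer_cat(-1.0, difficulties, rng, memorized=compromised)
```

Because the sources' tests adapt toward the sources' own proficiency, changing `n_sources` and `source_theta` changes how many items are memorized and how difficult they are, which is the indirect manipulation the design relies on.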
The odds ratio index was developed to detect the items that were compromised, and it was evaluated by applying it to the simulated data. In general, the index showed more power when the sources had higher proficiency, when the percentage of beneficiaries was higher, and when there were more sources. The results from this initial demonstration are promising for detecting items gathered by high-proficiency sources. Future refinements are planned. It is hoped that this work will enable testing programs to determine more effectively how long to leave an item pool (or specific items) in the field.
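This summary does not give the formula for the odds ratio index, so the following is only a generic illustration of the idea of an odds ratio for a single item, not the index developed in the study. It compares the observed odds of a correct response on an item with the odds expected under the response model. The function names, the 0.5 continuity correction, and the use of an average model probability as the expected value are all assumptions made for the sketch.

```python
def odds(p):
    """Convert a probability strictly between 0 and 1 into odds."""
    return p / (1.0 - p)


def item_odds_ratio(responses, expected_probs):
    """Ratio of observed to model-expected odds of success on one item.

    responses: True/False outcomes from the test takers who saw the item.
    expected_probs: model-predicted probability of success for each of them.
    Values well above 1 mean the item was answered correctly more often than
    the model predicts, which is what memorization would produce.
    """
    n = len(responses)
    if n == 0 or n != len(expected_probs):
        raise ValueError("need one expected probability per response")
    # Assumed continuity correction keeps the observed odds finite even
    # when every response is correct (or every response is incorrect).
    observed_p = (sum(responses) + 0.5) / (n + 1.0)
    expected_p = sum(expected_probs) / n
    return odds(observed_p) / odds(expected_p)


# Hypothetical item seen by 10 test takers, each expected to succeed half the time.
suspicious = item_odds_ratio([True] * 9 + [False], [0.5] * 10)
typical = item_odds_ratio([True] * 5 + [False] * 5, [0.5] * 10)
```

In this made-up example, `suspicious` is well above 1 and `typical` equals 1. Any threshold for flagging an item would have to be set from data rather than from a sketch like this one.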