Fixed-Weight Methods of Scoring Computer-Based Adaptive Tests (CT-97-12)
by Bert F. Green, Johns Hopkins University
Computer-based adaptive tests (CATs) are beginning to replace traditional paper-and-pencil tests. Several major standardized tests are now given as CATs, and the Law School Admission Council (LSAC) is investigating the feasibility and potential benefits of computerizing the Law School Admission Test (LSAT). Not only are CATs more efficient than traditional tests; they are also more popular among test takers. A CAT gains its efficiency by tailoring the difficulty of the items presented to a test taker, based on that test taker's responses to earlier items on the test. Scoring a CAT is complex; the number of items answered correctly is not an appropriate test score, because the difficulty of the items must be taken into account. Indeed, most test takers answer about the same number of items correctly. Currently available CAT scoring algorithms use either maximum-likelihood or Bayesian procedures, which are statistically elegant but extremely difficult to explain to a nonstatistician. The present research project examines a simpler way of scoring a CAT.
Current methods of scoring CATs, including maximum-likelihood estimation and equated number-right scoring, estimate the score (or proficiency) of the test taker as a weighted combination of item scores. Under these methods, the weight given to each item depends not only on the item's characteristics but also on the proficiency of the test taker. A correctly answered item can weigh heavily in one test taker's score, while the same item, also answered correctly, can have very little weight in another test taker's score, depending on the other item responses given by those test takers. Such differential treatment is statistically optimal, but it is not easily explained to test takers. The methods proposed here give a fixed weight to each item, based only on item difficulty, and will be called fixed-weight scores.
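The contrast can be sketched in a few lines of code. The weighting rule below (crediting the difficulty of each correctly answered item) is an illustrative assumption chosen only to show the structure of a fixed-weight score; the report's exact weighting formulas are not reproduced in this summary.

```python
def fixed_weight_score(responses, difficulties):
    """Score a CAT as a fixed-weight combination of item scores.

    responses    -- list of 0/1 item scores (1 = correct)
    difficulties -- list of item difficulty parameters b_i

    Each item's weight depends only on its own difficulty, never on
    the test taker's other responses -- the defining property of the
    fixed-weight methods.  The rule used here (weight = difficulty,
    averaged over the test length) is a hypothetical example.
    """
    total = sum(b for u, b in zip(responses, difficulties) if u == 1)
    return total / len(responses)
```

Because the weights are fixed in advance, the same correct answer contributes the same amount to every test taker's score, which is what makes the method easy to explain.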
Computer simulations of CATs were conducted to compare the fixed-weight scores with the more sophisticated scores (e.g., maximum-likelihood and equated number-right). Two different item pools were used: a "flat" pool, with item difficulties uniformly distributed across the scale, and a "special" pool, with item parameters more closely matching those found in LSAT item pools. For each of 26 different levels of proficiency, 2,500 simulated test takers responded to a 30-item CAT. Each CAT was scored by each method, and the results were compared.
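A minimal sketch of how such a simulation generates item responses, assuming the standard three-parameter logistic (3PL) IRT model commonly used in CAT research; the report's actual item parameters and its adaptive item-selection rule are not specified in this summary, so the "flat" pool below is illustrative.

```python
import math
import random

def prob_correct(theta, a, b, c):
    """3PL probability that a test taker with proficiency theta answers
    an item with discrimination a, difficulty b, and guessing c correctly.
    The 1.7 is the conventional scaling constant D."""
    return c + (1 - c) / (1 + math.exp(-1.7 * a * (theta - b)))

def simulate_responses(theta, items, rng=random.Random(0)):
    """Simulate 0/1 responses for one test taker across a list of
    (a, b, c) item-parameter tuples."""
    return [1 if rng.random() < prob_correct(theta, a, b, c) else 0
            for a, b, c in items]

# Illustrative 30-item "flat" pool: difficulties spread evenly across
# the scale, as in the report's uniform-difficulty condition.
pool = [(1.0, -3 + 6 * i / 29, 0.2) for i in range(30)]
responses = simulate_responses(1.0, pool)
```

Repeating this for 2,500 simulees at each of 26 proficiency levels, and scoring each simulated test by each method, yields the comparisons described below.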
All scoring methods, including the new fixed-weight scoring methods, provided statistically unbiased estimates of test taker proficiency. The precision, as assessed by the root mean squared error, was somewhat poorer for the fixed-weight scores than for the statistically efficient scores provided by maximum-likelihood and equated number-right scoring: errors were about 20% larger for the fixed-weight scores. By comparison, a fixed (nonadaptive) test of the same length, scored by maximum-likelihood, has a measurement error about 60% higher than a CAT scored by maximum-likelihood. That is, most of the advantage of adaptive testing is retained if the simpler fixed-weight scoring system is used. Since the fixed-weight scores are very highly correlated with the maximum-likelihood and equated number-right scores, one can consider maximum-likelihood and equated number-right scores to be statistically refined versions of the proposed fixed-weight methods.
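The bias and root-mean-squared-error comparison above can be computed as in this sketch, which takes advantage of the simulation design's known true proficiencies; the sample values in the comments are illustrative, not the report's figures.

```python
import math

def bias_and_rmse(estimates, true_theta):
    """Bias and root mean squared error of proficiency estimates around
    a known true proficiency, as in a simulation where theta is fixed."""
    n = len(estimates)
    bias = sum(e - true_theta for e in estimates) / n
    rmse = math.sqrt(sum((e - true_theta) ** 2 for e in estimates) / n)
    return bias, rmse

# The report's comparison amounts to ratios of these RMSEs, e.g.
# rmse_fixed / rmse_ml ~ 1.20 (fixed-weight errors ~20% larger) and
# rmse_nonadaptive / rmse_cat ~ 1.60 (nonadaptive errors ~60% larger).
```

Unbiasedness corresponds to the bias term being near zero at every proficiency level; precision differences show up only in the RMSE.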