Small Sample Estimation in Dichotomous Item Response Models: Effect of Priors Based on Judgmental Information on the Accuracy of Item Parameter Estimates (CT-98-06)
by Hariharan Swaminathan, Ronald K. Hambleton, Stephen G. Sireci, Dehui Xing, Saba M. Rizavi, University of Massachusetts Amherst
It is well established that the efficiency of testing can be considerably increased if test takers are administered items that match their ability or proficiency levels. In this adaptive testing scheme, items are administered to test takers sequentially, one at a time or in sets. The item or set of items administered is usually chosen so that it provides maximum information at the proficiency level of the test taker. The feasibility and advisability of computerized adaptive testing are currently being studied by the Law School Admission Council (LSAC).
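The maximum-information selection rule described above can be sketched in a few lines of Python. This is a minimal illustration assuming a two-parameter logistic (2PL) item response model; the item pool, parameter values, and function names are hypothetical and are not drawn from the report:

```python
import math

def item_information(theta, a, b):
    """Fisher information of a 2PL item at proficiency level theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))  # probability of a correct response
    return a * a * p * (1.0 - p)

def next_item(theta, pool, administered):
    """Select the unadministered item providing maximum information at theta."""
    candidates = [i for i in range(len(pool)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta, pool[i][0], pool[i][1]))

# Illustrative pool of (discrimination a, difficulty b) pairs
pool = [(1.2, -1.0), (0.8, 0.0), (1.5, 0.5), (1.0, 1.5)]
chosen = next_item(0.6, pool, administered={2})  # item 2 already given, so it is skipped
```

Information peaks for items whose difficulty is near the test taker's current proficiency estimate, which is why matching items to ability improves efficiency.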
For adaptive testing to be successful, it is important that a large pool of items with known item characteristics be available. The recent experiences of testing programs have clearly demonstrated that, without a large item pool, test security can be seriously compromised. One way to maintain a large pool is to replenish it by administering pretest items to a group of test takers taking an existing test and calculating the statistics for those items. However, administering new items to a large group of test takers increases the exposure rate of these items, compromising test security. One obvious solution is to administer a set of pretest items to a randomly selected small group of test takers. Unfortunately, this solution raises a serious problem: estimating the necessary item-level statistics from small samples of test takers.
Typically in computerized adaptive testing, a mathematical model called item response theory (IRT) is used to describe the characteristics of the test items and the ability level of the test takers. The item-level statistics of this model are commonly referred to as item parameters. In general, large samples are needed to estimate these parameters accurately. An issue that needs to be addressed, therefore, is that of estimating item parameters using a small sample of test takers. Several research studies have shown that, by incorporating prior information about item parameters, not only can item parameters be estimated more accurately, but estimation can be carried out with smaller sample sizes. The purposes of the current investigation are (i) to examine how prior information about item characteristics can be specified, and (ii) to investigate the effects of sample size and the specification of prior information on the accuracy with which item parameters are estimated.
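To make the idea of combining a prior with response data concrete, the following sketch computes a Bayesian (MAP) estimate of a single item's difficulty under a one-parameter (Rasch) model with a normal prior, using Newton-Raphson iteration. This is a hypothetical illustration of the general technique, not the estimation procedure used in the study; all names and values are assumed:

```python
import math

def map_difficulty(responses, thetas, prior_mean, prior_sd, iters=50):
    """MAP estimate of a Rasch item difficulty b, combining response data
    (0/1 scores from test takers with known abilities) with a normal
    prior N(prior_mean, prior_sd**2) on b."""
    b = prior_mean  # start the search at the prior mean
    for _ in range(iters):
        grad = -(b - prior_mean) / prior_sd**2  # gradient of the log prior
        hess = -1.0 / prior_sd**2               # second derivative of the log prior
        for u, th in zip(responses, thetas):
            p = 1.0 / (1.0 + math.exp(-(th - b)))  # P(correct | theta, b)
            grad += -(u - p)       # d log-likelihood / d b for the Rasch model
            hess += -p * (1.0 - p)
        b -= grad / hess           # Newton-Raphson step
    return b
```

A tighter prior (smaller `prior_sd`) pulls the estimate toward the prior mean, which is exactly how judgmental difficulty ratings can stabilize estimation when the response sample is small.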
The best a priori source of information regarding the difficulty of items in a test is the judgment of content specialists and test developers. A judgmental procedure for eliciting this information was developed for this study. Once this prior information was obtained, it was combined with data obtained from test takers, and the item parameters were estimated.
Since the primary objective of this study was to investigate how incorporating prior information improves the estimation of item parameters in small samples, the factors investigated were sample size and type of prior information. These two factors were examined with respect to the accuracy with which item parameters were estimated. To investigate the accuracy with which item parameters in the Law School Admission Test (LSAT) are estimated, the item parameter estimates were compared with the known item parameter values. By randomly drawing small samples of varying sizes from the population of test takers, the relationship between sample size and the accuracy of item parameter estimation was studied. Data from the Reading Comprehension section of the LSAT were used.
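The sampling design — draw a random sample, estimate the item parameters, and score the estimates against the known values — can be sketched in miniature. The simulation below is purely illustrative: it assumes a Rasch model, uses a crude logit-based difficulty estimate rather than the study's estimation method, and the item difficulties and sample sizes are invented:

```python
import math
import random

def simulate_rmse(true_bs, sample_size, seed=1):
    """Draw a random sample of simulated test takers, estimate each item's
    difficulty from its proportion correct, and return the root mean squared
    error (RMSE) against the known difficulty values."""
    rng = random.Random(seed)
    thetas = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]  # proficiency sample
    sq_errs = []
    for b in true_bs:
        # Simulate 0/1 responses under the Rasch model and count correct answers
        correct = sum(rng.random() < 1.0 / (1.0 + math.exp(-(th - b))) for th in thetas)
        p = min(max(correct / sample_size, 0.01), 0.99)  # guard against 0% or 100%
        b_hat = -math.log(p / (1.0 - p))  # crude logit-based difficulty estimate
        sq_errs.append((b_hat - b) ** 2)
    return math.sqrt(sum(sq_errs) / len(sq_errs))

# Accuracy at several illustrative sample sizes
for n in (50, 200, 1000):
    print(n, round(simulate_rmse([-1.0, 0.0, 1.0], n), 3))
```

Repeating such a simulation across sample sizes and prior specifications is, in outline, how the relationship between sample size and estimation accuracy can be mapped out.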
The results indicate that incorporating the ratings of item difficulty provided by subject matter specialists and test developers produced estimates of item difficulty that were more accurate than those obtained without such information. The improvement was observed for all of the item response models evaluated, including the model currently used for the LSAT.
This study has demonstrated that using judgmental information about the difficulty of test items can produce dramatic improvements in the estimation of item parameters. This improvement may be sufficient to warrant the routine use of judgmental information in item parameter estimation. However, obtaining judgmental information is time-consuming and costly. The question that arises naturally is whether some other form of prior information can result in savings and lead to estimates as accurate as those obtained using judgmental information. Several other forms of prior information were used in this study to examine this issue.
While using judgmental information produced the most accurate estimates, the differences between estimates obtained using judgmental information and those obtained using other forms of prior information were not substantial. To determine whether these differences matter in practice, the effects of the various forms of prior information used for item calibration on the routing procedure in an adaptive testing scheme and on the estimation of test taker ability need to be investigated. Only through such a study can the improvements offered by judgmental information, as demonstrated in this study, and by other forms of prior information be fully understood.