The Relationship of Item-Level Response Times With Test Taker and Item Variables in an Operational CAT Environment (CT-98-10)
Kimberly A. Swygert
The feasibility of implementing the Law School Admission Test (LSAT) as a computerized test (CT) or computerized-adaptive test (CAT) has been under investigation. One of the advantages of creating a computerized LSAT is that item-level response times become available for study; the paper-and-pencil (P&P) version does not allow for the practical collection of these response times. Issues such as speededness (when not all test takers finish the test), differential speededness (when certain test-taker groups are more likely to be speeded than others, with the groups differing in ways unrelated to the ability being measured), and the variability among test takers in response times are as important in a CAT format as in a P&P format; with item-level response times available, these traditional issues may be studied in new ways.
It is now possible to examine test-taker behavior by observing response times across the test; when test takers begin to run out of time, they often begin responding rapidly to the final items. One question that may be asked is whether the tendency to respond rapidly near the end of a test is independent of ability. Many existing studies based on operational and simulated computerized tests (CTs) assume that this independence holds. Even if that assumption were true, it could not be assumed that all variability in test-taker response times is independent of ability. It is possible that, for some items, speed is part of the underlying construct being measured (even if that was not the intention of the test developers). Studies that have examined the relationship between ability and response time suggest that no relationship may be apparent under unspeeded or untimed conditions, but that a relationship may emerge when speededness is present. Given that most computerized tests are administered under a time limit, and that a computerized LSAT likely would be as well, the relationship between ability and response time should be examined rather than assumed to be nonexistent.
Another important question involves the relationship of item characteristics to mean item response time. Both item difficulty and item serial position within the test may be related to response time. If difficulty is predictive of response time, this may need to be taken into account when items are administered, so that test takers who receive difficult items are not handicapped by the time limit for the test. If serial position is related to response time, it may serve as another indicator of overall test speededness, particularly if response times decrease sharply near the end of the test.
In this study, data from an operational CAT were examined in order to gather information about item response times in a CAT environment. The CAT included multiple-choice items measuring verbal, quantitative, and analytical reasoning; only discrete, stand-alone items were used, to avoid confounding item response time with the reading time for passages. The analyses included fitting regression models that describe how well the variability in item-level response times is predicted by item-response-theory-based (IRT-based) item parameters and serial position. All of these analyses were performed for CAT data collected under two different conditions: the first, in which test takers were required to answer only 80% of the items to receive a score, and the second, in which the test-taker score was proportional to the number of items answered. The availability of two versions of the same test, differing only in the number of items a test taker was required to answer, was an added benefit for the study, as it was possible to compare the results for the two datasets to see whether the change in scoring rule produced a change in the relationships among the variables.
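A regression of this general kind can be sketched as follows. This is a minimal illustration with simulated data, not the report's actual model: the variable names, parameter values, and effect sizes are all hypothetical, and the sketch uses only item difficulty (an IRT b-parameter) and serial position as predictors of mean item response time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: mean response time (seconds) per item, to be
# predicted from IRT difficulty (b-parameter) and serial position.
n_items = 40
difficulty = rng.normal(0.0, 1.0, n_items)   # simulated IRT b-parameters
position = np.arange(1, n_items + 1)         # serial position within the test
mean_rt = 45 + 8 * difficulty - 0.3 * position + rng.normal(0, 3, n_items)

# Ordinary least squares: mean_rt ~ intercept + difficulty + position
X = np.column_stack([np.ones(n_items), difficulty, position])
beta, _, _, _ = np.linalg.lstsq(X, mean_rt, rcond=None)

# Proportion of response-time variability explained (R^2)
fitted = X @ beta
r2 = 1 - np.sum((mean_rt - fitted) ** 2) / np.sum((mean_rt - mean_rt.mean()) ** 2)
print(f"intercept={beta[0]:.1f}, difficulty slope={beta[1]:.1f}, "
      f"position slope={beta[2]:.2f}, R^2={r2:.2f}")
```

A positive difficulty slope would indicate that harder items take longer, and a negative position slope that later items are answered more quickly, the pattern the report associates with speededness.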
One issue that arose with the use of response times as a variable in model fitting was the estimation of mean response times. A CAT data array has missing values that occur in nonrandom ways, and this had to be taken into account when estimating the mean item response times. Godfrey's solution to mean estimation with missing data, the square combining table method, was employed to give better estimates of mean response times for use in fitting the regression models. Also, to account for the relationship between ability and item difficulty, the effects due to items, the effects due to test takers, and the residual effects were all isolated, and the regressions were performed separately for each set of effects.
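The general idea of adjusting item means for nonrandom missingness can be illustrated with an alternating additive fit over the observed cells only. This sketch does not reproduce Godfrey's square combining table method itself; the data, the selection mechanism, and all names are hypothetical, and the point is only that naive column means are biased when who sees an item depends on who the test taker is.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical CAT response-time matrix: rows are test takers, columns
# are items; values are simulated, not real LSAT data.
n_persons, n_items = 200, 30
person_eff = rng.normal(0, 5, n_persons)   # slow vs. fast test takers
item_eff = rng.normal(0, 8, n_items)       # quick vs. time-consuming items
rt = (40 + person_eff[:, None] + item_eff[None, :]
      + rng.normal(0, 2, (n_persons, n_items)))

# Nonrandom missingness: slower test takers are more likely to see
# time-consuming items, so naive column means are biased.
p_seen = 1.0 / (1.0 + np.exp(-0.1 * np.outer(person_eff, item_eff)))
rt[rng.random((n_persons, n_items)) >= p_seen] = np.nan

naive_item_means = np.nanmean(rt, axis=0)  # ignores who saw each item

# Alternating fit of grand mean + person effect + item effect over the
# observed cells only (block coordinate descent on least squares).
grand, p, it = np.nanmean(rt), np.zeros(n_persons), np.zeros(n_items)
for _ in range(100):
    grand = np.nanmean(rt - p[:, None] - it[None, :])
    p = np.nanmean(rt - grand - it[None, :], axis=1)
    p -= p.mean()
    it = np.nanmean(rt - grand - p[:, None], axis=0)
    it -= it.mean()

# Item means adjusted for which test takers saw each item
adjusted_item_means = grand + it
```

Separating the fitted person effects, item effects, and residuals in this way also mirrors the report's strategy of regressing each set of effects separately.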
The results show that ability is predictive of item-level response time for items on the verbal section for both datasets, while item difficulty is predictive of item-level response times for certain sets of quantitative and analytical items. In each case, the regression equations explained more of the variability in the item-level response times when the data were collected under the proportional-adjustment scoring rule, which produced a more speeded testing situation than did the 80% scoring rule. Thus, it appears that test takers with low verbal ability may be more affected by testing time limits, a conclusion that agrees with earlier research on differential speededness. Also consistent with earlier research are the conclusions that (1) more difficult quantitative items take longer to answer, even when the test taker knows how to answer them, (2) the analytical section is speeded for test takers of all abilities, and (3) despite the speededness of the analytical section, analytical items appear to require more time as they become more difficult, and low-ability test takers may work less quickly on this section than high-ability test takers. Because only discrete items were used in the analyses, the generalizability of these results to the LSAT is limited, as the LSAT contains no quantitative section and contains only passage-based items on the reading comprehension and analytical reasoning sections. However, the results from the analytical reasoning section of this study do generalize to the logical reasoning section of the LSAT, because the item types are similar.