RTD Central Tenet/Mantra

Valid items elicit evidence of the targeted cognition for the range of typical test takers.

Obviously, there is more to Rigorous Test Development (RTD) than can fit into a single statement. And we have many aphorisms that we frequently repeat (e.g., incremental progress is progress, and all sustainable progress is incremental progress). But the most important ideas in RTD are found in our central tenet (or mantra): valid items elicit evidence of the targeted cognition for the range of typical test takers.

Valid items: Items are the building blocks of assessment, and it is important that they report on what they purport to report on.

Elicit evidence: The importance of items is in how test takers respond to them, not merely in how they exist on the page or screen. We judge items based on the observable evidence they elicit, be it in elements of an essay, aspects of a performance, or the selection of an answer option.

The targeted cognition: Items must focus on the standard targeted (or selected KSAs that make up that standard). Items that elicit evidence of other cognition — either affirmative or negative evidence — are not valid.

The range of typical test takers: It is not enough for items to elicit valid evidence for the average test taker. It is not enough for them to work for the kinds of test takers that the CDPs (content development professionals) once were. Items are routinely encountered by a range of test takers, and a predictable range at that. Items that do not elicit valid evidence across that range simply are not valid.

If you think that our mantra just means that items should be aligned with their standards and that fairness matters…well, we can’t disagree. Yeah. Fairness matters. And items must be aligned.

If you think that all we are doing is restating the fact that validity is the most important consideration in developing and evaluating tests…well, yeah. The second sentence of the first chapter of the 2014 Standards for Educational and Psychological Testing does say, “Validity is, therefore, the most fundamental consideration in developing tests and evaluating tests.”

We think a lot about what it means for an item to be aligned. We think a lot about the implications of taking fairness seriously in content development. But we do not think that we are bringing new values to the table here. It is just that when we try to put it all together, we get this: valid items elicit evidence of the targeted cognition for the range of typical test takers.