RTD’s Small Theory of Evidence
The quality of evidence produced by an item or test is inversely proportional to its ability to produce or support Type I and/or Type II errors.
While we truly love ECD (evidence-centered design), which inspired the entire RTD project, we see that one of its major shortcomings is its failure to offer a theory of evidence. ECD does not explain what evidence is or how to think about it. ECD is clear that inferences should be based on observable evidence and that tests (and items) should produce that evidence, but the evidence itself remains something of a black box.
As Rigorous Test Development is item-centric and evidence is produced by individual items, we recognized that we had to offer a theory of evidence as part of RTD. This theory is grounded in the ECD idea that test takers produce a work product in response to each item and that that work product contains the evidence in question. To that, we simply add the idea of Type I and Type II errors.
Ideally, the evidence that test takers produce can be taken at face value.
Successful responses to items should constitute high-quality affirmative evidence, thus indicating that the test taker possesses appropriate mastery of the targeted cognition.
Unsuccessful responses to items should constitute high-quality negative evidence, thus indicating that the test taker does not possess appropriate mastery of the targeted cognition.
Unfortunately, evidence can also be misleading.
When a test taker responds successfully to an item without engaging in a task that appropriately relies on the targeted cognition, the evidence falsely indicates that the test taker possesses appropriate mastery of the targeted cognition. This is a Type I error, more commonly known as a false positive result.
When a test taker responds unsuccessfully to an item because of a problem or stumble with a KSA (knowledge, skill or ability) other than the targeted cognition, the evidence falsely indicates that the test taker does not possess appropriate mastery of the targeted cognition. This is a Type II error, more commonly known as a false negative result.
In this context, these Type I and Type II errors do not necessarily lead to false inferences about the test taker’s proficiencies. Rather, they provide erroneous contributions to the discussion, making it far more difficult to ultimately determine what the test taker’s actual proficiencies are.
| | True Evidence | False Evidence |
|---|---|---|
| Successful Response | Test taker possesses appropriate mastery of the targeted cognition | Type I Error (false positive): usually caused by an alternative task |
| Unsuccessful Response | Test taker lacks appropriate mastery of the targeted cognition | Type II Error (false negative): usually caused by inappropriate additional KSAs or low-quality distractors |
This small theory of evidence gives CDPs (content development professionals) a framework within which to think about the evidence that test takers produce in response to items. All evidence fits in one of these four categories.
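The four categories can be sketched as a simple two-way classification. This is only an illustrative sketch, not part of RTD itself: the names `EvidenceCategory`, `classify_evidence`, and the flag `driven_by_targeted_cognition` are hypothetical. The sketch assumes someone reviewing an item can judge whether the response outcome actually depended on the targeted cognition; the theory does not say how that judgment is made.

```python
from enum import Enum


class EvidenceCategory(Enum):
    """The four cells of the evidence table (names are illustrative)."""
    TRUE_AFFIRMATIVE = "successful response, true evidence"
    TYPE_I_ERROR = "successful response, false evidence (false positive)"
    TRUE_NEGATIVE = "unsuccessful response, true evidence"
    TYPE_II_ERROR = "unsuccessful response, false evidence (false negative)"


def classify_evidence(successful: bool,
                      driven_by_targeted_cognition: bool) -> EvidenceCategory:
    """Place one response into one of the four categories.

    `driven_by_targeted_cognition` is a hypothetical judgment: True when the
    success or failure actually resulted from the targeted cognition, False
    when it came from an alternative task (on success) or from some other
    KSA or a poor distractor (on failure).
    """
    if successful:
        if driven_by_targeted_cognition:
            return EvidenceCategory.TRUE_AFFIRMATIVE
        return EvidenceCategory.TYPE_I_ERROR
    if driven_by_targeted_cognition:
        return EvidenceCategory.TRUE_NEGATIVE
    return EvidenceCategory.TYPE_II_ERROR
```

For example, a test taker who reaches the correct answer by way of an alternative task would be classified with `classify_evidence(True, False)`, which lands in the Type I cell.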