Item Validity

 

What is item validity?

The meaning of item validity can be found in our mantra: valid items elicit evidence of the targeted cognition for the range of typical test takers.

Why does item validity matter?

Items are the building blocks of tests. They form the foundation of everything that is done with assessments. Test takers interact with items. The data that psychometricians analyze and process comes from test takers’ responses to items. The inferences drawn from test results rest upon a foundation of items. The uses to which test results are put rest upon a foundation of items.

If items are not valid, then every inference, use and purpose to which tests are put rests on too weak a foundation. Rigorous Test Development is item-centric because items are the foundation and basic building blocks of assessment.

Does item validity only matter for standardized tests?

No. Item validity matters for every type of test, measurement and assessment.

However, teacher-made classroom assessments are more forgiving because test takers have had days, weeks and/or months to learn how their teachers communicate and teachers have had just as long to get to know each student. There’s more room for “youknowudtImean?” kinds of understanding between them. Furthermore, classroom assessments are weighed in the context of a multitude of other information before decisions are made.

Is item validity a silver bullet to fix standardized tests?

No, not even remotely. Individual assessment targets can be hit and yet have been poorly selected in the first place. Tests can be designed without appropriate distributions of targets and content. Results from individual items can be poorly combined. Scores can be misleadingly reported. And tests can be used for invalid purposes.

High-quality tests put to valid uses require the whole chain to be built of strong links. But we believe that once the content domain is well selected and modeled (which often happens before tests are developed), item validity is the most important link.

The Standards for Educational and Psychological Testing do not say anything about item validity.

That is correct. We are not redefining “item validity” to mean anything new because it is not a term used by others. If someone claims that “item validity is not a thing,” they just mean that they have never heard of it.

The Standards for Educational and Psychological Testing (AERA, APA, NCME, 2014) opens its first chapter by saying, “Validity refers to the degree to which evidence and theory support the interpretations of test scores for proposed uses of tests. Validity is, therefore, the most fundamental consideration in developing tests and evaluating tests.” Thus, the chapter begins by referring to test validity and makes clear that what really matters is the validity of the inferences made based upon test taker performance. Throughout the chapter, it addresses what is necessary for anyone to have confidence in the interpretation and use of test scores.

The Standards focus on test use and interpretation. They do not address items or what validity might mean at that level of analysis. They briefly mention alignment but, unfortunately, do not consider what taking fairness seriously (e.g., the range of typical test takers) implies for thinking about alignment.