Alignment & Validity

What is validity?

The first chapter of The Standards for Educational and Psychological Testing begins:

“Validity refers to the degree to which evidence and theory support the interpretations of test scores for proposed uses of tests. Validity is, therefore, the most fundamental consideration in developing tests and evaluating tests.”

Test validity is the alpha and the omega. It is the most fundamental consideration. Everything else is secondary. At the end of the day, we should evaluate tests based on their validity.

This means that a test is valid when it is put to an appropriate use and its scores actually support the inferences drawn from them. When a test does not provide sound evidence for the claims made from it, or when it is used to support claims it was never designed for, it is not valid.

What is item validity?

We believe that too little attention is paid to item quality. We do not see how a test can validly support claims if its items do not themselves measure what they claim to measure. No amount of architecture can save a structure built of rotten materials. This is why we coined the term item validity: to highlight the importance of item quality.

We see three elements to item validity, and they are listed in the RTD Mantra: Valid items elicit evidence of the targeted cognition for the range of typical test takers.

See the Item Validity page for more information.

What about reliability?

Reliability is not the same thing as validity. Anyone who tells you otherwise is simply wrong — and far too many practitioners make this mistake.

Reliability is about the consistency of measurement: does a test measure the same thing across groups, across time, and so on? Validity, by contrast, is about whether the test is measuring the right thing.

People more focused on reliability point out that without reliability, you cannot have validity. After all, if you are not measuring consistently, you really aren’t measuring anything in particular at all. And they are correct about this. Some amount of reliability is certainly necessary. But you can have perfect reliability without having any validity at all.
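The last point can be made concrete with a small sketch. It rests on several assumptions that are ours for illustration only: the scores are invented, reliability is reduced to a test-retest correlation (only one of several ways it is quantified), and correlation with the targeted skill stands in as a crude proxy for validity, which in practice depends on interpretation and use rather than on any single statistic.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented data for six hypothetical test takers (not from any real test).
targeted_skill = [1, 2, 3, 4, 5, 6]   # what the test claims to measure
first_sitting = [2, 5, 1, 1, 5, 2]    # observed scores, first administration
second_sitting = [2, 5, 1, 1, 5, 2]   # identical scores on a retest

# Test-retest consistency: how closely the two sittings agree.
consistency = pearson(first_sitting, second_sitting)

# Proxy for validity: how closely scores track the targeted skill.
agreement_with_target = pearson(first_sitting, targeted_skill)

print(f"test-retest correlation: {consistency:.2f}")
print(f"correlation with targeted skill: {agreement_with_target:.2f}")
```

In this toy data the two sittings agree exactly, so the test-retest correlation is essentially 1, while the correlation with the targeted skill is essentially 0. The scores are perfectly consistent, yet they say nothing about the skill the test claims to measure.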

Efforts to maximize reliability, which can be quantified and examined statistically, can come at the expense of validity. As a practical matter, in a world of limited budgets and limited testing time, this is a common risk. It shouldn’t happen, but it can.

What, then, is alignment?

Alignment is an old term, with many definitions. In 1997, Norman Webb wrote about six different criteria for alignment, including ideas that have some resemblance to item validity. He also addressed construct representation (i.e., whether the whole test samples appropriately from the subject matter of the test) and other ideas. Alignment has focused on whether a test assesses what it is supposed to assess. Item validity takes some of those concerns (and some others) and focuses them on individual items.

Alignment differs from test validity because test validity looks beyond the test itself and considers whether the uses and purposes of the test are appropriate. For example, a test that is aligned to a third-grade math curriculum might serve many valid uses when given to third graders. However, that same test likely has few valid uses, or even none, when given to sixth graders. Alignment does not envision misuse of tests, but test validity is very focused on this problem.