The Importance of Test Purpose

We love to quote the beginning of the first chapter of The Standards for Educational and Psychological Testing.

Validity refers to the degree to which evidence and theory support the interpretations of test scores for proposed uses of a test. Validity is, therefore, the most fundamental consideration in developing tests and evaluating tests. (p. 9)

We usually do this to focus on the second sentence, as we make the case for the importance of item validity. We really believe that item validity is the true foundation of all validity claims. (Well, except for a nice MLT, a mutton, lettuce and tomato sandwich, where the mutton is nice and lean and the tomato is ripe. They're so perky, we love that.) But, as The Standards lays out, the quality of a test must be judged against the “proposed uses of a test.”

That is, a test’s purpose is the lens through which its quality must be evaluated.

Debates that Miss the Point

As is so often the case when participants in a discussion or debate appear to be talking past each other or simply ignoring each other, what is really happening is that each side is taking some set of values for granted and cannot understand what the other is saying, because it makes no sense when examined through the lens of their own values. In debates about standardized testing, what is usually missed is the purposes that different sides think tests ought to be put to. Of course, this relates to the audience that a test is meant for.

Tests intended to be useful for teachers in their work are necessarily different than tests that are intended to be useful for policy-makers in their work. Parents are a different group, as are school board voters. We have been told by school principals that local real estate agents pay a lot of attention to standardized test scores, and they have their own purposes in mind.

As is so often the case, the most valuable thing that one can do in a discussion or debate is to listen for the values that others are speaking from and speaking towards. To understand a discussion, disagreement and/or debate, you have to discern the values that animate each participant. To understand what people who argue in favor of or against standardized tests mean, you have to discern the audience and purpose that they have in mind for those tests.

Purpose Drives Test Design and Development

If the purpose of a test is to deliver comparable individual scores for each test taker, the test must be designed and developed with particular features. If those comparisons are to some standard of performance, the test should be designed and developed with particular features, and if the comparisons are to be between individual test takers, the design and development efforts will necessarily be a bit different. If the test is meant to show learning/growth, it must be designed in particular ways, and if it is merely meant to show performance level, it can be designed differently. If the test is meant to be given across many years to different groups of test takers, still other features and concerns must be addressed.

Virtually any purpose or audience or expected context for a test will alter something about the test design and/or development efforts.

Swiss Army Knives Suck

One of us used to carry a small Swiss army knife on their keychain in their pocket. (Guess which one!) It had a knife, a pair of screwdrivers and a tiny pair of scissors. It had a tiny flashlight, tweezers and a plastic toothpick. It was horrible. The knife was tiny and fragile, the scissors were weak, the flashlight was dim, the tweezers had a weak grip and the toothpick was…just a gross idea. Every single tool was bad. The only saving grace was that it was convenient to have at hand when nothing else was available.

This person also had a larger Swiss army knife – a gift from a friend – with many tools, kept in their computer bag just in case it might ever be useful. It wasn’t.

Swiss army knives may be cool, but they are bad. Every tool they include is awkward to use and perhaps barely effective at all. They may even be dangerous to use, with grips so poor and the constant risk of folding back in half.

A set of appropriately designed tools that are well suited to their purposes and functions is infinitely more valuable than a single tool that does nothing well and most things unsafely. No tool can serve too many purposes well, and small cheap multi-purpose tools are often actually dangerous.

Respecting Other People’s Purposes

Many who decry standardized tests, including us, see that they are not useful to teachers in their work. They simply do a lousy job of informing instructional decision making. This is not to say that they could not be useful for this, but such a purpose would call for different tradeoffs and priorities.

Many who support the existence of standardized tests, including us, are looking to different purposes than that. Some want to sort and categorize students. Others, including us, want to hold the educational system accountable for its performance. Of course, holding teachers accountable is different than holding school board members accountable. Holding schools accountable is different than holding state or federal departments of education accountable. These are different purposes, and each requires different test design and development efforts.

If we are going to fix the mess of our standardized testing landscape, one in which absolutely no one is satisfied, then we have to respect the different purposes that different people want for tests. This is not so we can create horrible Swiss army knives of tests, but rather so that we can create appropriate tests for each worthy purpose.

Obviously, there is more than one worthy purpose of a test, and therefore we likely need multiple tests — each of which some significant group of people will have problems with because it does not meet their preferred purposes.