The Difference Between Classroom Assessment and Standardized Assessment

The overwhelming majority of student assessment is not standardized assessment, at least not in the K-12 school context. Rather, students are subject to innumerable so-called classroom assessments. Most people have a much better idea of classroom assessment than they do of standardized assessment. And yet, classroom assessment is so ubiquitous and taken for granted that even teachers are not consciously aware of important distinctions between classroom assessment and standardized assessment.

Defining Terms

For the purposes of this discussion, “standardized tests” refers to large-scale, standards-based, on-demand standardized tests. That is, the big event tests that are written and developed god-knows-where and are not scored by students’ own teachers. They are developed so that all students take equivalent versions, all students take the tests in equivalent conditions and students’ responses are all scored equivalently. These are expensively developed assessments designed to provide consistent types of evidence to support some use(s) or purposes of the test. They may be dominated by multiple choice items, but this is not required.

Classroom assessment refers to all the other big and small things that educators do to collect and interpret information on student learning, student progress and student needs. They may be big (e.g., final exams, projects) or small (e.g., problem sets, looking for quizzical expressions on students’ faces). They may be quick or long. They may be quite formal or quite informal. They may originate from a textbook publisher, the teacher themself or some other resource. Different teachers may come up with different versions and may score the same versions somewhat differently. They are the things that teachers do.

Major Differences Between Classroom Assessment and Standardized Assessment

There are numerous major differences between the two.

• Classroom assessment aims to contribute information to a rich, deep and nuanced understanding of each student by those who know and are invested in them – on a particular skill or subskill level and/or on an integrated level – within a known context and history.
  • To inform instruction, give feedback and report to parents/guardians
• Standardized assessment aims to collect comparable data on students (and sometimes groups of students) across a wide variety of instructional, cultural and geographic contexts.
  • (for a variety of reasons)

Standardized assessment

• Not customized
• Not contextualized
• Not situated
• Not discretionary

Amount of Information

Classroom Assessment

• Frequent
• Formal and informal
• Created by teachers who do know students
• Taken by students who know the author (style, lingo, intentions)
• Analyzed for each student in the context of many other sources of information

Standardized Assessment

• Infrequent
• Only formal
• Created far away by professionals who do not know all the students
• Taken by wide variety of students who do not know the authors
• Analyzed for many (many many) students without other sources of information

Types of Inquiry

Classroom Assessment

• Small milestones
• Integrating skills
• Created/selected in the context of a known curriculum
• Repeated opportunities to demonstrate developing skills & learning over time

Standardized Assessment

• Major waystations
• Individual discrete skills
• Created without knowing the particular curricular context
• Usually a one-time opportunity to demonstrate developed skills

Classroom assessment

• Many reasons why classroom assessment can be:
  • More creative
  • More divergent
  • Less precise
  • Evaluated more holistically
  • Evaluated more contextually
• Provides additional layers of information that contribute to rich understanding of each student

Standardized assessment

• Challenged by:
  • Wide variety of student backgrounds
  • Wide variety of instruction, curriculum and context
  • Lack of experienced familiarity with communication from these test authors
  • Very limited access to any other information about students
• Necessary Response:
  • Focused precision in each question
  • Tight care on language and presentation
  • Concern with many kinds of efficiency

We love to quote the beginning of the first chapter of The Standards for Educational and Psychological Testing.

Validity refers to the degree to which evidence and theory support the interpretations of test scores for proposed uses of a test. Validity is, therefore, the most fundamental consideration in developing tests and evaluating tests. (p. 9)

We usually do this to focus on the second sentence, as we make the case for the importance of item validity. We really believe that item validity is the true foundation of all validity claims. (Well, except for a nice MLT, a mutton, lettuce and tomato sandwich, where the mutton is nice and lean and the tomato is ripe. They’re so perky. We love that.) But, as The Standards lays out, the quality of a test must be judged against the “proposed uses of a test.”

That is, a test’s purpose is the lens through which its quality must be evaluated.

Debates that Miss the Point

As is so often the case when participants in a discussion or debate appear to be talking past each other, or simply ignoring each other, what is really happening is that each side is taking some set of values for granted and cannot understand what the other is saying, because it makes no sense when examined through the lens of their own values. In debates about standardized testing, what is usually missed are the purposes that different sides think tests ought to be put to. Of course, this relates to the audience that a test is meant for.

Tests intended to be useful for teachers in their work are necessarily different from tests intended to be useful for policy-makers in their work. Parents are a different group, as are school board voters. We have been told by school principals that local real estate agents pay a lot of attention to standardized test scores, and they have their own purposes in mind.

As is so often the case, the most valuable thing that one can do in a discussion or debate is to listen for the values that others are speaking from and speaking towards. To understand a discussion, disagreement and/or debate, you have to discern the values that animate each participant. To understand what people who argue in favor of or against standardized tests mean, you have to discern the audience and purpose that they have in mind for those tests.

Purpose Drives Test Design and Development

If the purpose of a test is to deliver comparable individual scores for each test taker, the test must be designed and developed with particular features. If those comparisons are to some standard of performance, the test should be designed and developed with particular features, and if the comparisons are to be between individual test takers, the design and development efforts will necessarily be a bit different. If the test is meant to show learning/growth, it must be designed in particular ways, and if it is merely meant to show performance level, it can be designed differently. If the test is meant to be given across many years to different groups of test takers, still other features and concerns must be addressed.

Virtually any purpose or audience or expected context for a test will alter something about the test design and/or development efforts.

Swiss Army Knives Suck

One of us used to carry a small Swiss army knife on their keychain in their pocket. (Guess which one!) It had a knife, a pair of screwdrivers and a tiny pair of scissors. It had a tiny flashlight, tweezers and a plastic toothpick. It was horrible. The knife was tiny and fragile, the scissors were weak, the flashlight was dim, the tweezers had a weak grip and the toothpick was…just a gross idea. Every single tool was bad. The only saving grace was that it was convenient to have at hand when nothing else was available.

This person also had a larger Swiss army knife – a gift from a friend – with many tools, kept in their computer bag just in case it might ever be useful. It wasn’t.

Swiss army knives may be cool, but they are bad. Every tool they include is awkward to use and perhaps barely effective at all. They may even be dangerous to use, with grips so poor and the constant risk of folding back in half.

A set of appropriately designed tools that are well suited to their purposes and functions is infinitely more valuable than a single tool that does nothing well and most things unsafely. No tool can serve too many purposes well, and small cheap multi-purpose tools are often actually dangerous.

Respecting Other People’s Purposes

Many who decry standardized tests (including us) see that they are not useful to teachers in their work. They simply do a lousy job of informing instructional decision making. This is not to say that they could not be useful for this, but such a purpose would call for different tradeoffs and priorities.

Many who support the existence of standardized tests (including us) are looking to different purposes than that. Some want to sort and categorize students. Others, including us, want to hold the educational system accountable for its performance. Of course, holding teachers accountable is different from holding school board members accountable. Holding schools accountable is different from holding state or federal departments of education accountable. These are different purposes, and each requires different test design and development efforts.

If we are going to fix the mess of our standardized testing landscape, one in which absolutely no one is satisfied, then we have to respect the different purposes that different people want for tests. This is not so we can create horrible Swiss army knives of tests, but rather so that we can create appropriate tests for the worthy purposes.

Obviously, there is more than one worthy purpose of a test, and therefore we likely need multiple tests — each of which some significant group of people will have problems with because it does not meet their preferred purposes.