What Standardized Tests Ain’t Good For

Assessment’s incompatible triad

It is devilishly difficult — when not impossible — for an assessment to i) examine the deep and substantive aspects of meaningful learning goals, ii) do so in a standardized manner and iii) deliver timely results. It is not difficult for an assessment to do any two of these three, but expecting all three is…a path to disappointment.

The demands of instructionally informative assessment

Instructionally informative assessment should address the most valuable and important learning goals and should deliver results in a timely fashion. This is incontrovertible. If the results are not timely, they are not actionable and therefore simply do not inform instruction. If an assessment fails to address the most valuable and important learning goals, it can encourage a diversion of precious resources (e.g., time, attention) from those actually important and worthy learning goals. If an assessment does not address valuable and important learning goals, it should be ignored.

But, wait….

Yeah, standardized assessment simply ain’t good at informing instruction. It generally will focus on less deep and substantive learning goals or deliver results too late to make a difference to instruction.

What about formative assessment?

Unfortunately, people often confuse formative assessment with interim assessment. Formative assessment is assessment that informs instruction. Prototypically, teachers can return to reteach material if formative assessments show that students have not achieved the desired levels of proficiency — focusing their efforts on the particular skills and lessons that students need.

If coverage demands prevent this kind of revisiting of course material, the assessment was not formative. If the results of the assessment come back too late to act on, the assessment was not formative.

Assessments that are given at various points throughout a course are interim assessments.

(We love formative assessment. This can include the most informal kinds of assessment (e.g., quizzical looks on students’ faces) and much more formal kinds of classroom assessment. Teachers use formative assessment all the time, and great teachers make careful use of many types of formative assessment. It is just difficult to use standardized tests for formative assessment.)

Why can’t standardized assessment do better?

The problem is timeliness and scoring. The kinds of items/questions that are most amenable to quick and/or automated scoring only assess some lessons well. The kinds of learning that really require students to show their work or explain their thinking in order to be assessed are not among them. When scoring calls on discretion and judgment by scorers, one of the most critical aspects of standardized testing is lost, but the kind of nuanced use of formative assessment that we so admire simply depends on just that kind of professional judgment by teachers.

This is the difference between the simplest kinds of surface comprehension of reading, and the more substantive kinds of analysis and deeper understanding of the meaning of text. This is the difference between simple applications of simple ideas in math and science, and the more complex ideas that students should be working towards.

When assessments call for judgments and nuance in scoring student responses, timeliness is lost. Standardized assessment is often based on a distrust of local teachers scoring responses themselves. Taking scoring out of teachers’ hands does better assure standardized scoring procedures and consistent decisions, but it simply takes time. Faster alternatives that do not depend on the classroom teacher to score either focus on more simplistic content or, as with so much automated scoring, are incapable of providing the kind of specific and content-focused information that teachers need to make formative decisions.

What should teachers do, then?

Teachers should primarily rely on the multitude of sources of information they have on student learning. They have their own assessments of students’ work, including tests. They have the kinds of questions that students ask and the expressions on their faces. All of this information is contextualized and grounded in the relationships that tell teachers so much about their students.

Standardized assessment cannot compete with all of that. So long as teachers are gathering these sorts of information, they should already know more than standardized tests can tell them.

What about artificial intelligence (AI)?

Perhaps because of the attention that OpenAI’s ChatGPT was getting in the media, the 2023 meeting of the National Council on Measurement in Education had a recurring sub-theme of “What about AI?” This is a timely question.

We are AI skeptics. We have not seen evidence that AI engines can provide accurate enough information about the strengths and weaknesses of student work that teachers should trust it over their own judgment. In early 2023, Education Week asked ChatGPT to generate some essays in response to a question and then to give feedback on those essays. The feedback looked like real feedback, but it was not accurate. For example, it praised an essay for making good use of details when teachers agreed that the essay did not make good use of details.

Will AI ever be able to take the scoring role in standardized assessment so that it can deliver timely feedback about deep and substantive lessons in a standardized manner? Maybe. Sure, eventually. But when? Not today, and we do not think it is going to happen next year, either.