What is a Standardized Test?


“Standardized” is the key word

Many fields use consistent and standardized forms of measurement. The meaning of an inch, a meter, a kilogram, etc. is tightly defined. By contrast, there is no universal set of measures or units to describe educational or psychological attributes.

“Standardized” refers to an effort to maximize the comparability of test results, to better support certain types of inferences and purposes for a test. Thus, “standardized” tests may:

  • Contain the same items/questions in the same order for all test takers.

  • Be given/administered in the same conditions for all test takers.

  • Be scored in the same way for all test takers.

Test developers have techniques to equate different forms of a single test, to make results comparable even though different test takers may be given different questions. This is especially important to support consistency across the years that a test may be used.
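To give a rough sense of what equating can look like, here is a minimal sketch of one simple approach, often called linear equating. It is only an illustration: the function name and the scores are invented, it assumes the groups taking each form are comparable in ability, and operational testing programs may well use other and more elaborate methods.

```python
from statistics import mean, pstdev


def linear_equate(score_x, form_x_scores, form_y_scores):
    """Map a raw score on form X onto the scale of form Y.

    A score is placed at the same distance from the mean, measured in
    standard deviations, on form Y as it was on form X.
    """
    mu_x, sd_x = mean(form_x_scores), pstdev(form_x_scores)
    mu_y, sd_y = mean(form_y_scores), pstdev(form_y_scores)
    if sd_x == 0:
        raise ValueError("form X scores have no spread; cannot equate")
    return mu_y + (sd_y / sd_x) * (score_x - mu_x)


# Invented raw scores from two comparable groups (illustration only).
form_x = [10, 12, 14, 16, 18]
form_y = [12, 15, 18, 21, 24]

# An average score on form X maps to an average score on form Y.
equated = linear_equate(14, form_x, form_y)
print(round(equated, 2))
```

In this made-up example, form Y produces higher and more spread-out raw scores than form X, so a raw score on form X is adjusted upward before the two are compared.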

What about “bubble” tests?

While multiple choice questions are widely used in standardized tests, standardized tests do not have to use multiple choice items, nor do they have to be Scantron or machine-scorable. Many standardized tests include so-called constructed response items, and some are composed entirely of these items, which require test takers to construct their response rather than select it from a list of available options.

For example, New York State’s battery of Regents Exams, which has existed for many decades, includes both constructed response items and multiple choice items. Advanced Placement exams also contain both types of items. This does not make them any less standardized.

Then why do standardized tests include so many bubble items?

Multiple choice items are the easiest items to score in exactly the same way every time for every test taker. It is far more difficult to create a regime that scores constructed response items so consistently. Constructed response items require far more preparation for scorers/graders and take more time to score, and therefore cost much more money to score. Multiple choice items are faster and cheaper to score and are more consistently scored. Those are real advantages.

Of course, they come with disadvantages, too.

Are all multiple-choice tests “standardized” tests?

There is no definitive, official and enforceable definition of a “standardized test.” But almost no one would consider a test to be a standardized test simply because it uses multiple choice items.

Most people actually mean large-scale tests when they use the term “standardized test.” Large-scale assessments strive to meet the requirements for comparability across reported test results, as comparability tends to be built into their purposes. There are many processes, practices, techniques and values that go into the development of large-scale assessments, making them quite different from other tests that might use multiple choice items.

The RTD project is oriented to what we call large-scale, standards-based, standardized tests.

Doesn’t “standards-based” just mean “standardized”?

Educational learning standards are the learning goals that some official body has set. For our schools, they are usually ratified and endorsed through state legislation, meaning they are passed by legislatures and signed by governors. They are the goals (usually the academic learning goals) that teachers, textbooks and curricula should be aiming towards. Hence, they are the targets that large-scale assessments are trying to measure.

“Standards-based” refers to what these assessments are trying to measure. “Standardized” is a coincidentally similar word that refers to efforts to maximize the comparability of test results across test takers and across time.

What is “classroom assessment”?

“Classroom assessment” or “teacher-made tests” refer to assessments that are created with less effort toward standardization or comparable results. For example, the scoring/grading procedures are usually not as fully and clearly documented. In practice, teachers often use their knowledge of individual students to influence how they interpret student work and the kinds of feedback they may give. This kind of customization can be quite valuable instructionally. That is a different purpose from the comparability that large-scale standardized tests aim for.

Furthermore, teachers combine classroom assessments with each other and with other forms of information when they make inferences about students or come to conclusions about them. Large-scale standardized assessments usually stand alone, without access to all that other information.

Hence, classroom assessment and large-scale standardized assessments are created very differently.