Over ten years ago, near the beginning of the Rigorous Test Development Project, we offered a list of six Core Principles. (That original (OG) list is downloadable from the sidebar to the left.) We now offer a new set of Core Principles, though they remain a work in progress and will be finalized soon.

New Core Principle #1: Valid items are the foundation and building blocks of all assessments.

RTD is about building tests from which accurate inferences can be made – sometimes about individual test takers and sometimes about groups of test takers. However, no matter how sophisticated the statistical techniques or advanced the technologies used to deliver or score assessments, all tests and inferences rest on the assumption that individual items actually indicate what they purport to indicate. That is, the item is the building block, and low-quality items simply cannot support the inferences that anyone wishes to make based upon them.

New Core Principle #2: Domain models (e.g., learning standards, job or role analyses) are the basis for tests and their items, not just the inspiration for them.

The RTD approach to item development acknowledges the importance of the work and the products of domain analyses and domain modeling. Moreover, it acknowledges the authority of the organizations that adopt and/or ratify the resulting domain models. Hence, items must be aligned to elements of the domain model (e.g., a single learning standard). Alignment is not merely an aspiration or a continuum. Rather, tests are developed to provide inferences based on domain models, and therefore each item must be strongly aligned to the standard to which it purportedly aligns. It is not for content development professionals to redefine the larger construct or the individual standard by substituting their own preferences for what is described in the domain model.

New Core Principle #3: Successful responses to items should provide significant affirmative evidence that the test taker has proficiency with the targeted cognition, and unsuccessful responses should provide equally strong negative evidence.

Items must provide observable evidence of student proficiencies, and an individual item can only provide evidence for a rather narrowly focused targeted cognition. High-quality items provide evidence that minimizes the layers of inference that must be made to reach conclusions about student proficiency. This means that correct responses should indicate proficiency with the targeted cognition – and not provide the false positives that arise from alternative successful paths through items. Just as importantly, incorrect responses should demonstrate specific mistakes or misunderstandings with the targeted cognition – and not provide the false negatives that result from stumbles with Additional KSAs (i.e., knowledge, skills and/or abilities that are not part of the targeted cognition).

New Core Principle #4: Standardized assessment has a fundamentally different approach to evaluating proficiency than does classroom assessment.

Everyone who has been through schooling intuitively understands that teachers assess students in a variety of ways, both formal and informal. This includes a wide range of integrated and ongoing tasks, and teachers combine this plethora of information to develop holistic, detailed and nuanced understandings of their students. On the other hand, large-scale, on-demand, standards-based, standardized assessment has very few opportunities to assess students and is expected to deliver a degree of precision and reliability that is quite different from what is demanded of classroom assessment. Hence, the constraints and demands on standardized assessment necessarily result in very different-looking assessments and very different processes to develop them. (To the extent that classroom educators might benefit from learning more about standardized assessment, it is not so that they can adopt the particular practices or forms of standardized assessment. Rather, there are principles that can be adapted for use in their own work, though they need to be presented and learned in the context of classroom assessment practice rather than through standardized assessment development.)

New Core Principle #5: Test items are precise, highly connected and delicate devices – like mechanical watches and poems.

Standardized test items are both written with an unusually direct and focused purpose and read with a rare level of serious attention. They face an added challenge of brevity and clarity that other very precise kinds of writing (e.g., legal writing) simply do not face. Therefore, the different parts of an item – particularly a multiple choice item – interlock very tightly and very precisely. (For these purposes, the targeted cognition and any stimuli often act as parts of the item, even as they also exist independent of the item.) Thus, items require highly skilled crafting, down to the interplay of individual words across each element of the item – and in ways that differ from content domain to content domain and from testing population to testing population.

New Core Principle #6: Test development requires being mindful of and managing competing tensions – at times even issues that do not at first appear to be in tension.

Like so much professional work and so many other kinds of projects, the work of test and item development cannot be reduced to simple checklists that provide absolute answers. Instead, content development professionals must constantly exercise professional judgment to meet competing demands or to find the best compromise between ideals or principles that do not themselves make allowances for each other – for example, balancing authenticity against construct isolation, or reconciling the various uses to which a test may be put. This professional judgment is most needed simply to recognize these tensions in the first place, as they are not always readily apparent.

New Core Principle #7: Test development requires approaching the work with humility.

Humility is critical to learning and critical to all collaborative work. It requires being mindful of the limits of one’s own expertise and aware of the scope of the expertise of others. Humility requires every single person to recognize when they should be asking questions and/or listening, as opposed to answering questions and/or talking (i.e., acting from confidence). Humility is about respect for others and appreciation for the value of doubt and uncertainty. While a shortage of appropriate humility can create problems and obstacles, so can an excess.

New Core Principle #8: Test and item development require deep knowledge of content standards/assessment targets and broad knowledge of their content domain context.

High-quality tests and test items carefully target specific knowledge, skills and/or abilities (KSAs). This cannot be done merely by using or echoing the words of a learning objective or description of a (set of) KSAs. Rather, test developers must understand how the elements of the content domain fit together and the significance of a particular target. Otherwise, they cannot have any confidence that they are targeting the most important elements of the content domain, that they are targeting the most worthy facets of those elements, or that their products will be successful in those efforts. This often includes understanding how the content is taught and learned, and even how the material is used beyond the classroom.

New Core Principle #9: Understanding the perspective of the range of typical test takers is the true heart of assessment development.

Nothing is more important than understanding – anticipating – how test takers will understand test items and how their cognitive paths will take them from the item as presented to their responses. Without this, items will not only suffer innumerable forms of bias, but will also be fatally compromised in their ability to elicit evidence of the targeted cognition. Therefore, content development professionals must spend their whole careers learning more about the range of typical test takers and about how they think about and respond to items.