Knowing What Students Know
Recent advances in the cognitive and measurement sciences should be the foundation for developing a new system of student assessment.
Many people are simply puzzled by the heavy emphasis on standardized testing of students and are eager to find out exactly what is gained by such activity. The title of the National Research Council study that I co-chaired states the goal directly: Knowing What Students Know. The concerns about student assessment are well known: misalignment between high-stakes accountability tests and local curricular and instructional practices, narrowing of instruction through teaching to tests with restricted performance outcomes, the frequent failure of assessments to provide timely, instructionally useful, or policy-relevant information, and the failure to make full use of classroom assessments to enhance instruction and learning.
The goal of our committee was not to review all the alleged shortcomings of past or current tests but to take the opportunity to rethink assessment. During the past 30 years, there have been major advances in the cognitive sciences that help us to understand the ways that people learn, which in turn help identify more precisely what aspects of student achievement we should be trying to assess. During the same period, there has been equally rapid progress in the development of measurement techniques and information technology that enhance our ability to collect and interpret complex data and evidence. The committee brought together experts in measurement, cognitive and developmental psychology, math and science education, educational technology, neuroscience, and education policy to determine how these recent developments can be applied to assessment. The committee’s stated mission was to establish a theoretical foundation for the design and development of new kinds of assessments that will help students learn and succeed in school by making as clear as possible to them, their teachers, and other education stakeholders the nature of their accomplishments and the progress of their learning. Most assessments now in use fail to meet this objective.
We want assessment to facilitate higher levels of student achievement. This will require a departure from current practice, guided by further research as well as by policy changes. What we’ve learned from research both enables and compels us to do a better job with assessment.
What is assessment?
Assessment is a process of gathering information for the purpose of making judgments about a current state of affairs. In educational assessment, the information collected is designed to help teachers, administrators, policymakers, and the public infer what students know and how well they know it, presumably for the purpose of enhancing future outcomes. Part of the confusion that I mentioned above stems from the fact that some of these outcomes are more immediate, such as the use of assessment in the classroom to improve learning, and others are more delayed, such as the use of assessment for program evaluation.
This means that in looking at any assessment, we have to keep in mind issues of context and purpose. Sometimes we’re looking for insight into the state of affairs in the classroom, at other times emphasis is on the school system. The particular focus could be to assist learning, measure individual achievement, or evaluate programs. And here’s the rub: One size does not fit all. By and large, this has been overlooked in the United States. As a result, we have failed to see that changes in the context and purpose of an assessment require a shift in priorities, introduce different constraints, and lead to varying tradeoffs. When we try to design an all-purpose assessment, what we get is something that doesn’t adequately meet any specific purpose.
Any assessment must meld three key components: cognition, which is a model of how students represent knowledge and develop competence in the domain; observations, which are tasks or situations that allow one to observe students’ performance; and interpretation, which is a method for making sense of the data relative to our cognitive model. Much of what we’ve been doing in assessment has been based on impoverished models of cognition, which has led us to highly limited modes of observation that can yield only extremely limited interpretations of what students know.
It does little good to improve only part of this assessment triangle. Sophisticated statistical techniques used with restricted models of learning or restricted cognitive tasks will produce limited information about student competence. Assessments based on a complex and detailed understanding of how students learn will not yield all the information they otherwise might if the statistical tools available to interpret the data, or the data themselves, are not sufficient for the task.
We should move away from the simplistic notion that a test is a test is a test. We have to make certain that any given assessment is designed for its specific purpose and that within that context all three legs of the assessment triangle are strong.
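To make the triangle concrete, here is a minimal sketch, assuming nothing beyond the three components just described; the field names and the fractions example are my own illustrations, not the committee’s, and the point is simply that a design is incomplete until all three legs are specified.

```python
from dataclasses import dataclass, field

@dataclass
class AssessmentDesign:
    """Illustrative record of the three legs of the assessment triangle."""
    # Cognition: a model of how students represent knowledge and develop
    # competence in the domain.
    cognition_model: str
    # Observations: tasks or situations that let students show where they
    # stand relative to that model.
    observation_tasks: list[str] = field(default_factory=list)
    # Interpretation: the method used to reason from observed performance
    # back to the cognitive model.
    interpretation_method: str = "unspecified"

    def is_complete(self) -> bool:
        """A design is only as strong as its weakest leg."""
        return bool(
            self.cognition_model
            and self.observation_tasks
            and self.interpretation_method != "unspecified"
        )

# Hypothetical fractions assessment sketched against the triangle.
design = AssessmentDesign(
    cognition_model="progression of part-whole reasoning with fractions",
    observation_tasks=[
        "partition a shape into equal parts",
        "place 3/4 on a number line",
        "explain why 2/3 is greater than 3/5",
    ],
    interpretation_method="rubric mapped to levels of the progression",
)
print(design.is_complete())  # True only when all three legs are specified
```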
Scientific foundations
Advances in the sciences of thinking and learning need to be used for framing the model of cognition, which must then inform our choices about what observations are sensible to make. Developments in measurement and statistical modeling are essential for strengthening the interpretation of our observations. For a thorough description of recent developments in the cognitive sciences, I must refer you to another NRC study, How People Learn. I can’t begin to do justice to what’s been accomplished in this large and dynamic field, so I’ll limit myself to the bare bones of what we need to know in order to develop sound assessments.
The most critical implications for assessment are derived from the study of the nature of competence and the development of expertise in specific curriculum domains such as reading, mathematics, science, and social studies. Much is now known about how knowledge is organized in the minds of individuals, both experts and novices in a field. We know more about metacognition: how people understand their own knowledge. We recognize that there are multiple paths to competence: students can follow many routes in moving from knowing a little to knowing a lot. One of my favorite insights is that we now know much about the preconceptions and mental models that children bring to the classroom. They are not blank slates, and we can’t simply write over what is on those slates. In fact, if we fail to take those preconceptions into account, we are very likely to fail in instruction. In addition, much of students’ knowledge must be understood as highly contextualized and embedded in the situation in which it was acquired. The fact that we have taught fractions does not mean that students have a broad and flexible knowledge of fractions.
Contemporary knowledge from the cognitive sciences strongly implies that assessment practices need to move beyond discrete bits and pieces of knowledge to encompass the more complex aspects of student achievement, including how students’ knowledge is organized and whether they can explain what they know. Instructional programs and assessment practices based on cognitive theory exist for some areas of the curriculum, but much more work is needed.
Interpreting data on student performance is neither simple nor straightforward. Researchers have been refining techniques for deciding how data can be filtered and combined to produce assessment results, as well as determining how much information on what types of tasks needs to be collected. Statistical models are especially useful in four situations: high-stakes contexts in which we want to make decisions that have major effects on students’ lives; when we know relatively little about the history and experiences of the students being assessed; for complex models of learning; and for large volumes of data.
How do advances in measurement enable us to go beyond simplistic models of general proficiency? Three general sets of measurement issues account for most of the discontent with current measurement models, and each concern can be accommodated by various newer models. The first concern is whether we need to capture performance in terms of qualitatively discrete classes or a single general continuum of performance. The second is whether we need to be evaluating single or multiple attributes of performance. The third concern is whether we are evaluating status at a single point in time or change and growth over a period of time. Dealing with these issues requires a progression of models and methods of increasing complexity.
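To fix ideas, consider one hypothetical set of models (my illustration; the committee’s report does not prescribe these particular formulas). The Rasch item response model places every student on a single continuum of proficiency; the second line generalizes it to multiple attributes; the third treats proficiency as something that grows over time. A latent class model would instead replace the continuous proficiency with membership in one of a few qualitatively distinct classes, addressing the first concern above.

```latex
% Hypothetical illustrations of how the three measurement concerns map onto models.
\begin{align*}
\text{Single continuum (Rasch):}\quad
  & P(X_{ij}=1 \mid \theta_j) = \frac{\exp(\theta_j - b_i)}{1 + \exp(\theta_j - b_i)} \\
\text{Multiple attributes:}\quad
  & P(X_{ij}=1 \mid \boldsymbol{\theta}_j) =
    \frac{\exp(\mathbf{a}_i^{\top}\boldsymbol{\theta}_j - b_i)}
         {1 + \exp(\mathbf{a}_i^{\top}\boldsymbol{\theta}_j - b_i)} \\
\text{Growth over time:}\quad
  & \theta_{jt} = \theta_{j0} + \gamma_j\, t + \varepsilon_{jt}
\end{align*}
% theta_j is student j's proficiency, b_i is item i's difficulty,
% a_i weights the attributes item i taps, and gamma_j is student j's growth rate.
```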
Fortunately, we have a collection of new measurement methods that can deal with these more complex measurement concerns, but they have yet to be fully applied to the practical work of assessment. Information technology can help in this effort, but more thought and research will be needed to explore the fit between particular statistical models and varying descriptions of competence and learning. Doing so requires extensive collaboration among educators, psychometricians, and cognitive scientists.
Assessment design and use
In moving from the scientific foundations to the design and use of effective assessment, we encounter four major challenges: developing principles to guide the process of assessment design and development; identifying assessment practices that are connected to contexts and purposes; exploring feasibility questions, particularly the potential for applying technology to many new design and implementation challenges; and being prepared to consider the possibility of a radical vision for the transformation of assessment.
Assessment design should always be based on a student model, which suggests the most important aspects of student achievement that one would want to make inferences about and provides clues about the types of tasks that will elicit evidence to support those inferences. What this means for the classroom teacher is that assessment should be an integral part of instruction. Students should get information about particular qualities of their work and what they can do to improve. Along the way, they should also be helped to understand the goals of instruction and what level of performance is expected of them. Of course, this will become possible only when cognitive science research findings are expressed in user-friendly language.

The implication for large-scale testing is that an integrated approach holds promise for drawing more valid and fair inferences about student achievement. But in order to implement this approach, policymakers will have to relax the constraints that drive current practices. For example, states should be allowed to administer more complex tests to a sample of students to acquire more fine-grained information on what students can do, or to embed assessments in the curriculum in subjects such as art. The goal is to replace single assessment tests with systems of assessment that cut across contexts and are more comprehensive, coherent, and continuous. A number of such assessment systems have already been developed, and they give us a good starting point for further development.
Computer and telecommunications technologies provide powerful new tools that are needed to meet many of the design and implementation challenges implied by merging cognitive models and measurement methods. Compared with a conventional paper-and-pencil test, a test designed for administration on a computer makes it possible to present questions in novel ways, draw on a richer mix of task designs and question formats, assess a broader repertoire of cognitive skills and knowledge, record and score complex aspects of behavior, and embed assessments in learning environments. If we let our imaginations roam a little, we can see an assessment system that is radically different from the annual multiple-choice test on a narrow slice of material.
Rich sources of information about student learning can be continuously available across wide segments of the curriculum and for individual learners over extended time periods. Electronic tests can be far more easily customized to the individual student and to the particular purpose or context. As we consider how to apply the insights of cognitive and measurement science, we should not limit ourselves to tinkering with testing as we know it. We should imagine assessment that is truly useful for a number of purposes, most importantly to aid in student learning.
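One concrete form such customization can take is adaptive item selection, in which the computer chooses each successive question to be most informative about the individual student. The sketch below is illustrative only; the item names, difficulties, and the crude proficiency update are invented for the example and are not drawn from the committee’s report.

```python
import math

def prob_correct(theta: float, difficulty: float) -> float:
    """Rasch-type probability that a student at proficiency theta answers correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

def item_information(theta: float, difficulty: float) -> float:
    """Fisher information for the item: largest when its difficulty matches theta."""
    p = prob_correct(theta, difficulty)
    return p * (1.0 - p)

def next_item(theta: float, remaining: dict[str, float]) -> str:
    """Choose the unused item that is most informative at the current estimate."""
    return max(remaining, key=lambda item: item_information(theta, remaining[item]))

def update_theta(theta: float, correct: bool, step: float = 0.5) -> float:
    """Crude update: move the estimate up after a right answer, down after a wrong one."""
    return theta + step if correct else theta - step

# Toy item bank: item id -> difficulty on the same scale as theta.
bank = {"easy_fraction_item": -1.0, "number_line_item": 0.0, "compare_fractions_item": 1.0}
theta = 0.0                        # start from an average proficiency estimate
simulated_answers = [True, False]  # stand-in for the student's actual responses
for correct in simulated_answers:
    chosen = next_item(theta, bank)
    bank.pop(chosen)               # an item is administered only once
    theta = update_theta(theta, correct)
print(f"final proficiency estimate: {theta:.2f}")
```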
Road map for action
This vision of an assessment system designed to help students learn will become reality only if we continue making research progress, reform policies to allow innovation in testing, and take the steps necessary to make changes in practice. The major research challenges include developing models of cognition and learning that are particular to each area of the curriculum while also developing statistical models designed to match the cognitive theories. To improve the fairness of the tests, we need to find reliable ways to incorporate what we know of the individual student’s instructional history. And to ensure that the new assessments will work in practice, we should form multidisciplinary design teams that include practitioners as well as researchers.
Practitioners will have to assume a key role in research designed to explore how new forms of assessment can be made practical for use in both classroom and large-scale contexts and how those new forms affect student learning, teacher practice, and educational decisionmaking. This will entail developing ways to assist teachers in integrating new forms of assessment into their instructional practices and exploring how school structures (such as class period length, class size, and opportunities for teacher interaction) affect the feasibility of implementing new types of assessment effectively.
Developers of educational curricula and classroom assessments should create tools that will enable teachers to implement high-quality instructional and assessment practices that are consistent with modern understanding of how students learn and how such learning can be measured. But simply delivering these tools to teachers will not be enough. The teachers themselves will need to be familiar with the cognitive science from which the new tools and approaches have emerged. We need to develop training materials that will enable teachers to understand how new research should be incorporated into their classroom practice. Instruction in how students learn and how learning can be assessed should be a major component of teacher preservice and professional development programs. Standards for teacher licensure and program accreditation should include training in assessment. This clearly is an enormous task and another area in which new information technology can play a critical role. Computer- and Internet-based training materials could make it possible to reach a large number of teachers quickly and at relatively low cost.
The country is apparently committed to the need for large-scale assessments of all students, but that need not be a commitment to the type of tests we use today. We can develop assessment systems that examine the broad range of competencies and forms of student understanding that research shows are important aspects of student learning. Although this can involve some of the familiar forms of testing, it must also incorporate a variety of other approaches, such as matrix sampling of items for test administration and curriculum-embedded measures.
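Matrix sampling, mentioned above, spreads a large pool of items across a sample of students so that no individual takes more than a short test while the system as a whole covers a broad domain. The sketch below is a minimal illustration; the item counts and block-assignment scheme are assumptions made for the example, not a recommended design.

```python
import random

def matrix_sample(items: list[str], students: list[str], block_size: int,
                  seed: int = 0) -> dict[str, list[str]]:
    """Assign each sampled student one block of items; together the blocks cover the pool."""
    rng = random.Random(seed)
    shuffled = items[:]
    rng.shuffle(shuffled)
    # Cut the shuffled pool into fixed-size blocks (short test forms).
    blocks = [shuffled[i:i + block_size] for i in range(0, len(shuffled), block_size)]
    # Rotate through the blocks so every block is administered to some students.
    return {student: blocks[k % len(blocks)] for k, student in enumerate(students)}

items = [f"item_{n:02d}" for n in range(1, 13)]    # 12 items covering the domain
students = [f"student_{n}" for n in range(1, 7)]   # 6 students in the sample
forms = matrix_sample(items, students, block_size=4)
for student, block in forms.items():
    print(student, block)  # each student answers only 4 of the 12 items
```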
We also need to devote more attention to explaining our new approaches to students, teachers, and administrators. All participants should be helped to understand what the learning goals are and how the assessment process advances those goals. This extra effort should also extend to the reporting of results. Too often the test score is communicated simply as a verdict with which one must live. We should develop reporting mechanisms that enable teachers and students to use the results to focus their efforts and improve educational outcomes.
We need to work with policymakers to help them recognize the limitations of current assessments and to convince them to support the development of new systems of multiple assessments that will improve their ability to make decisions about education programs and the allocation of resources. We must disabuse them of the dangerous belief that important decisions can be based on a single test score. An accurate picture of student achievement can be attained only through a mix of assessment measures, from the classroom to the national level, that are coordinated to reflect a shared understanding of how learning occurs. Policymakers also need to understand that assessment is a movie, not a snapshot. What we really want to know is how students are progressing over time, not where they stand on a particular day.
Finally, educators and researchers need to work in cooperation with the media to improve public understanding of assessment, particularly as more complex assessment systems are implemented. Parents and voters need to be able to use this information to guide their decisions. Without sound information in accessible form, they will become even more puzzled by the mysteries of assessment.
If we all work together, we can replace assessments that are often narrow one-shot tests of skills divorced from the school curriculum with an assessment system that is comprehensive in merging a mix of measurement approaches, coherent in its link to sophisticated models of learning and its alignment with the curriculum, and continuous in linking each student’s performance over time.