Will simulations be the next form of standardized testing?
There has been much talk in recent years about the use of simulations
and gaming in education, both for children and adults. The best
educational simulations and games —we are told— embody ‘active
learning’ (learning by doing, or the formation of knowledge through the
subjective cognitive experiences of the learner as opposed to the
passive consumption of information or facts). They also provide a safe
environment for testing problem-solving techniques without the risks
that we encounter in the ‘real’ world.
Talk about the use of simulations as a method of assessment is more
prevalent in the corporate training world than in K-12 education (see this example from Microsoft), but the application of simulations for testing seems to be an obvious one:
Simulations are expanding the computer-based testing horizon. They’re delivering benefits across the board that are ushering in the next generation of testing. Test-takers benefit from simulations because simulations assess skills, not just knowledge. Further, simulations provide a higher level of test security because the exam is not simply constructed with multiple-choice questions that may be memorized and exposed. (Wenck, 2005: Simulations: The Next Generation of Testing)
I would like to explore some of the implications of using simulations
as a means of assessment. While simulations are often presented as the
antithesis of old methods of evaluation, I would like to warn against
uses of simulations that merely replicate, with some modifications, the
norms of traditional testing. Specifically, I want to examine the way
in which both traditional testing and simulations shape the learning
process by normalizing values and creating expectations of how things
ought to work outside of the learning environment.
Any type of learning or assessment activity conditions our understanding of the world in some way. In other words, tests and simulations predispose us in a particular way towards reality. Testing that asks students to select a ‘best option’ (whether simple multiple choice quizzes or sophisticated computer simulations) can be seen as the last step in confirming that the learner has assimilated the worldviews suggested by the options in the test itself, and that she implicitly accepts the test as the only viable method for evaluating knowledge in that instance (from this perspective, it is inconsequential whether the student fails or passes the test, as long as she is exposed to the kinds of expectations that the test creates). I want to question what role technology plays in this process, especially with regard to simulations. But before doing that, I want to explore a bit further the notion that tests come with an attached worldview.
Testing as indoctrination
Testing normalizes attitudes towards the world. This is perhaps most visible in problem-based learning, where a situation from the ‘real’ world is used to measure skills learned in class. I am suggesting that the function of testing is not only skill evaluation, but also the standardization of a worldview embedded in the test, making the situation represented in the problem seem natural. Consider the following example: Mahmood Mamdani (2004) relates how in the 1980s and 1990s the University of Nebraska, with a $50 million grant from USAID (a federal agency), developed textbooks for children in Afghanistan. Some of the test questions in these textbooks are worth looking at:
A third-grade mathematics textbook asks: “One group of mujahidin [guerrillas backed by the U.S., later to become the Taliban] attack 50 Russian soldiers. In that attack 20 Russians are killed. How many Russians fled?” A fourth-grade textbook ups the ante: “The speed of a Kalashnikov [a machine gun] bullet is 800 meters per second. If a Russian is at a distance of 3200 meters from a mujahid, and that mujahid aims at the Russian’s head, calculate how many seconds it will take for the bullet to strike the Russian in the forehead” (p. 137, my notes in brackets).
This example, I assume, is shocking to most of us because it is such a blatant attempt at indoctrination (although it might be less or more shocking depending on one’s knowledge of U.S. foreign policy). But what about the tests our students take every day? What kind of indoctrination is going on there? The process might be so invisible that even the test designers fail to notice it, but this does not mean our tests don’t have a worldview to push.
What do tests really measure?
Testing not only normalizes attitudes but, as I said earlier, also requires the implicit acceptance by the learner (and society) that the test is the most reliable method to measure how well knowledge can be applied in that instance. Only one thing is worse than flunking the test, and that is refusing to take the test at all, as there are often no alternatives to certification. But are tests really an accurate indicator of competency?
There is currently a lot of debate over this point. The No Child Left Behind (NCLB) Act (which, as with many policies of the current administration, does exactly the opposite of what its name suggests) has placed great importance on standardized testing as a way of determining the success rate of students, teachers, and schools. However, what administrators think the tests measure and what they actually measure might be two very different things (the degree to which the two match is referred to as the validity of a test). Unfortunately for students, this discrepancy is resulting in many of them being precisely ‘left behind.’
The measurement validity of a test is an extremely important concept. Measurement validity simply means whether a test provides useful information for a particular purpose. Said another way: Will the test accurately measure the test taker’s knowledge in the content area being tested? … If tests are going to be used to determine which students will advance and what subjects schools will teach, it is imperative that we understand how best to measure student learning and how the use of high-stakes testing will affect student drop-out rates, graduation rates, course content, levels of student anxiety, and teaching practices. (Appropriate Use of High-Stakes Testing in Our Nation’s Schools)
Of course, measurement validity is something that needs to be assessed for every test. But at the macro level, the issue is not only whether individual tests are valid or invalid, but also how the increasing emphasis on testing (an emerging culture of testing, so to speak) is creating an environment in which testing itself determines what students should learn. As W. James Popham (2001) describes:
Because today’s educators are under such intense pressure to raise their students’ scores on high-stakes tests [tests which determine whether a student advances to the next year, for example], we are witnessing a nationwide diminishment of curricular attention toward any subject that isn’t included on a high-stake test. As many beleaguered educators will comment, “If our job is to raise test scores, why waste time teaching content that’s not even tested?” (p. 19; my notes in brackets)
Enter the Simulation
This situation, which is bad enough as it is, may not necessarily be corrected by the use of simulations for assessment purposes, even though educators may think that by using simulations they are breaking free from the shackles of traditional testing. This is because simulations are, after all, a form of testing. Simple simulations provide a limited number of options from which the user must choose. More advanced simulations provide more options, but all simulations, even those in which options are generated through some sort of AI algorithm, have a limited universe of options. Some of those options lead to outcomes that are more favorable than others. The goal of the person going through the simulation is to find which combination of choices, in response to the variables presented by the simulation, leads to the desired outcome.
Furthermore, simulations replicate the less obvious characteristics of traditional tests I have outlined so far:
- Simulations normalize attitudes. Even the most sophisticated simulations limit the number of possible responses, and in so doing shape a view of the world in which the application of knowledge is limited to those responses.
- Simulations demand implicit acceptance as valid instruments. Whether a simulation meets the requirements of measurement validity or not is a moot point once it is being used as the main or only method of certification.
- Simulations determine curriculum and teaching practices. Instead of teaching to the test, teachers may begin teaching to the simulation.
There is another important aspect to this issue: teachers (most often) don’t design simulations, software companies do. Take, for instance, the following list of evaluation criteria for tests that Popham (2001) prescribes:
- Curricular Congruence. Would a student’s response to this item, along with others, contribute to a valid determination of whether the student has mastered the specific content standard the item is supposed to be measuring?
- Instructional Sensitivity. If a teacher is, with reasonable effectiveness, attempting to promote students’ mastery of the content standard that this item is supposed to measure, is it likely that most of the teacher’s students will be able to answer the item correctly?
- Out-of-School Factors. Is the item essentially free of content that would make a student’s socioeconomic status or inherited academic aptitudes the dominant influence on how the student will respond?
- Bias. Is the item free of content that might offend or unfairly penalize students because of personal characteristics such as race, gender, ethnicity, or socioeconomic status? (p. 94)
What opportunities might teachers have to correct these issues in simulations that they themselves have not created, and that they have neither the opportunity, the legal right, nor the skills to modify?
Rules of Reality: The tester’s mindset
If testing normalizes attitudes, why might this be a bad thing? My argument is that simulations perpetuate the mechanistic, reductionist and linear (cause-effect) thinking that traditional testing instills. Problem solving (assessed through tests or simulations) requires the kind of mindset that Peter Bentley (2003, The meaning of code [PDF], in Ars Electronica 2003) associates with the skill of writing computer code: “Code is so literal, so unambiguous, that it takes a while to train a mind to think in the same way,” states Bentley, and I would argue that this type of testing is in fact part of the preparation for developing these particular skills:
You become used to breaking down problems into smaller, easier parts. It becomes natural to think in this way, whether working out how to build a robot, or how to climb down from a tree. Good programmers are natural problem-solvers, for this is how we write code. But code can also dehumanise a person. There is no subtlety, no humour, no scope for emotion in code. (Bentley, 2003)
Put simply, testing is a way to ‘leave behind’ those who cannot think like problem-solvers, or at least like a particular kind of dehumanized problem-solver. Can it really be that those meant to succeed in our educational systems are those who manage to unlearn subtlety, humor and emotion?
Reality Rules: Alternative use of simulations
I am not trying to suggest that there is no room in learning for computer simulations. Rather, I have so far warned only against the use of simulations for testing purposes. Now, I would like to take my argument a step further and suggest how I think simulations should be used in learning.
In essence, I believe that learners should be builders, not consumers, of simulations. Students using simulation authoring software (like STELLA) may not be able to produce simulations as sophisticated as those sold by software companies, but the learning that happens in the process might be more meaningful. In fact, the point of having students build their own imperfect simulations is precisely that the simulations should fail. Why? Because simulations are approximations of reality, and in realizing how they fail to capture the complexity of reality, we arrive at a more meaningful understanding of it. Breaking down a problem into parts that can be simulated can indeed be a useful learning activity, but the learning process should not stop there. An assessment of how any collection of variables fails to approximate reality, and a discussion of why and how that is, should be the final and most important part of a simulation or game design activity.
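To make this concrete, here is a minimal sketch of the kind of exercise I have in mind, written in Python rather than in STELLA. It is an illustration under stated assumptions, not a recipe: the function `simulate`, the growth rate, the starting stock and the carrying capacity are all hypothetical. A student first builds a naive model in which a single stock (say, a population) grows without limit, then adds one more variable (a limit on growth) and looks at how far the first model drifts from the second.

```python
def simulate(stock, inflow_rate, steps, capacity=None):
    """Step a single stock with one inflow, in the stock-and-flow style.

    With capacity=None the inflow is unlimited (the naive model).
    Otherwise the inflow shrinks as the stock approaches capacity.
    """
    history = [stock]
    for _ in range(steps):
        inflow = inflow_rate * stock
        if capacity is not None:
            inflow *= 1 - stock / capacity
        stock += inflow
        history.append(stock)
    return history


# All numbers below are hypothetical and chosen only for illustration.
naive = simulate(stock=10.0, inflow_rate=0.5, steps=20)
richer = simulate(stock=10.0, inflow_rate=0.5, steps=20, capacity=1000.0)

# The gap between the two runs is the starting point for discussion.
for step in (0, 5, 10, 20):
    gap = naive[step] - richer[step]
    print(f"step {step:2d}: naive={naive[step]:.1f} "
          f"richer={richer[step]:.1f} gap={gap:.1f}")
```

The richer model is, of course, still wrong in its own ways, and that is the point: the conversation about what each version leaves out is where the learning happens.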
[I would like to insert another comment about the perceived benefits of off-the-shelf simulations. These are often said to provide a ‘safe environment’ in which the learner can experiment with making decisions without costly consequences. This seems to me to be an expensive waste of time. What we should be teaching students is how to communicate better to create that ‘safe environment’ in real life. We are all asked to make difficult and important decisions that no amount of simulation can prepare us for. Instead of thinking of ourselves as individual actors making those decisions in isolation (as we do in simulations), we should prepare individuals for participating in collaborative processes that diffuse the danger of individualist decision making.]
Where to go from here? (technology and school change)
Most public schools are currently dealing with the problems of standardized high-stakes testing, and the use of simulations for testing is not yet an imminent threat. Corporate training and higher ed are probably where evaluative simulations are being used the most, but even there the cost of producing them has prevented widespread use. So why am I making such a big fuss?
I see evaluative simulations as a logical next step in the history of educational technology and testing. Part of the reason standardized testing has taken off the way it has is that technology greatly facilitates the administration and grading of tests, and the tabulation and aggregation of scores. An important consequence of this (as I hinted above) is that, as schools are made to do more with fewer resources, technology has been put at the service of testing, and assessment decisions have been taken out of teachers’ hands. Consequently, as we have seen, curricular decisions are made based on what the test covers. With simulations, decisions about teaching practices could be equally constrained (not just what should be taught, but how). Is it that hard to imagine a future where teachers of failing schools, as determined by the NCLB Act, are stripped more and more of teaching responsibilities and become mere monitors of students sitting in front of government-approved simulations (developed by the same companies that now develop standardized tests)? Given the current emphasis on standardized testing, cost-savings and efficiency, I am afraid this is not such an outlandish scenario.
Unfortunately, our fascination with technology may sometimes divert us from our efforts to improve learning and change things at schools. To talk about computer simulations and video games in education is trendy. But all the talk of ‘everything bad is good for you’ seems to focus attention on the role of students as consumers, not producers. People in education who want to seem cutting edge feel obliged to make a nod to computer games and the increasing technological savvy of students. Indeed, there is much that is good about the new technologies, but this should not lead us to adopt an uncritical stance when it comes to incorporating technology into the learning process.
In a recent interview, Deneen Frazier Bowen talked about the results of a research project she undertook at BellSouth:
Our report showed that although teachers increased their technology skills and technology integration in the curriculum, students saw no changes. For students, using more technology made no difference; the difference they sought was at the design and access levels. Teachers still designed the learning task and only provided access to those technologies with which they were comfortable. Students seek a change in process, not just the automation of a traditional one. (Morrison & Frazier Bowen, 2005, Taking a Journey with Today’s Digital Kids: An Interview with Deneen Frazier Bowen)
Simulations are not going to motivate students if all they do is replace traditional testing. Learning activities that involve building simulations or computer games can be a way to involve students in curricular design, but this type of activity needs to be contextualized by an analysis of how the simulations or games we create fail to approximate the complexity of reality. Only if this is achieved will we be preparing students for a more meaningful engagement with the world.
Offline References (all others hyperlinked within the text):
Mamdani, M. (2004). Good Muslim, bad Muslim: America, the Cold War, and the roots of terror (1st ed.). New York: Pantheon Books.
Popham, W. J. (2001). The truth about testing: An educator’s call to action. Alexandria, Va.: Association for Supervision and Curriculum Development.
MIT OpenCourseWare: Computer Games and Simulations for Investigation and Education
Centre for Advanced Learning Technologies: Simulation & Games for Education