Standardized tests offer schools an uncomplicated way to gather information on large numbers of students.
The tests may provide a wake-up call to many schools and teachers who have not adapted significantly in recent years to meet the needs of today's students. Low scores ("hard" data) can jump-start new programs and heighten staff development in the area of reading. We have known for years that many students are poor readers. But without the low test scores (regardless of whether the test is valid), reading is not a priority for many school boards.
Of course, the tests are convenient and cost much less than other forms of assessment. Machine-scored tests avoid several things: the need for time-consuming grading by hand, the cost of hiring people to grade the tests, and the subjectivity of different graders in their assessment of written (authentic) answers. Unlike authentic assessments, standardized tests are given under the same conditions for all students and use mainly "cut and dried" right and wrong answers. The scores of students can be objectively compared against each other, whereas authentic assessment depends much more on the subjective evaluation by, and often the student's relationship with, the person doing the assessment. Portfolios, for example, have no solid “ruler” by which to measure quality or increments of growth. Most students are judged to be average or above. Portfolios often compare a student's current performance with earlier performance.
Standardized tests prepare students for future standardized tests, such as the SAT, which are used for college admissions.
Limitations of standardized tests
The price our students pay in the name of high-stakes testing convenience and accountability is extremely high. Limitations include:
Useful Information
They do not provide useful reading information for individual students. Percentile scores (which are often simply added up to equal one score) on a student’s abilities in Synonyms, Interpretation, and Process Strategies do not tell us anything. The questions can never be reviewed, and useful information, such as why the student chose one answer over another, is not provided.
Authenticity
When real people read, they do not do so for the purposes of answering tricky multiple choice questions. They read for enjoyment, to learn something new, or to accomplish a task. Many higher-level thinking tasks that are triggered by reading, such as synthesizing information and problem-solving, cannot be tested effectively in the easy-to-score multiple choice format.
Validity
Any reading assessment can only scratch the surface of the complexity of reading. Yet high-stakes tests somehow have been enthroned as the assessment device for reading. We assign numbers to the different dimensions of reading for the purposes of convenience, uniformity, and comparison. Our society’s need for the “standard” student blinds us to the reality that students learn to read in different ways and have multiple ways of using, processing, and responding to what they read.
Lower-order thinking
Particular reading skills and sub-processes that are easier to measure are given inflated importance, while the harder-to-define processes are less valued. For instance, grammar errors and vocabulary knowledge are relatively easy to assess with multiple choice tests. Yet the more comprehension-related processes, such as making inferences and interpretations, self-questioning, cognitive flexibility, and analysis, are more difficult to assess, and are therefore considered too unreliable to include in many tests.
"Teaching to the test"
As standardized tests gain importance in student and school evaluations, many educators arrive at the conclusion that they must “teach to the test.” Teachers are torn between teaching authentic reading and bowing to the pressures of administration and parents to raise test scores. Efforts to teach to the tests have often led to curricula based on transmittal of isolated and decontextualized facts, promoting student passivity and rote memorization.
Control over curriculum
Teaching to the test also takes the control of instruction and curriculum design out of the hands of local schools, students, teachers, and parents, and puts it into the hands of policy makers, government officials, and the large, commercial test companies.
What standardized tests miss
Tests cannot reveal students' ability to construct a proposal, analyze issues, synthesize ideas, hypothesize, or compose many things that our society values. Other missed factors important in education are behavior, perseverance, attitude, attention, social skills, attendance, and communication skills. Now, with the High School Exit Exam (CA), will attendance or behavior or social skills matter at all?
Higher-order thinking
Working well with other people, dedication, complex problem-solving, and patience are not on most tests. Few multiple choice tests are used in the real world of work to determine whether a worker is fired or promoted. Ongoing performance reviews, however, are commonplace. Why can’t education learn from the real world in this sense?
Motivation
Then again, quite a few students just don’t care. When you come to a passage that is much longer than it is interesting, why not bubble in a few b’s and c’s just to make it look as if you tried? Some of the brighter students, the ones who would do well on the tests, are bright enough to know that the scores will not affect their grades or job security.
Standards
Another limitation is the extent to which the standardized test actually matches the standards of a student’s classroom, school, and state. Standards and their interpretation can and do differ greatly across state lines, between schools, and within schools.
Standardized biases
Many of the students who have difficulty comprehending passages in the tests also have cultural and linguistic backgrounds that differ significantly from those assumed by the instructional and assessment paradigms that exist in U.S. schools (Walqui, 1999). Most standardized tests are full of cultural and linguistic biases which, while they may be subtle, are strong enough to steer a student away from the right answer. Several biases are listed below.
Language
Language use varies widely depending on context. Individuals often make different sense of language depending on their personal and cultural histories, so test makers are on tenuous ground when they assume that all students will construct the same meaning from the language found on the test. Those who speak different dialects of English are also at a disadvantage because the way in which they communicate a thought may not be the correct way according to the test makers. English learners face severe challenges on tests because they must outperform many native English speakers in order to score at or above the mean. This is difficult to do on tests that are so dependent on language skills. Test content and procedures often suffer from content bias because they reflect the dominant culture’s standards of language function and shared knowledge and behavior.
Culture
The norming process leans toward mainstream culture since national probability samples underrepresent minority groups. Moreover, the items on which low-scoring students score comparatively well tend to disappear from the final versions of most tests. The correct answer on a test often contradicts the values, practices, and beliefs prized by the English learner’s culture. It is unfair to evaluate students who were socialized into different cultural and linguistic practices, norms, and beliefs on their ability to use another culture’s practices, unless they have explicitly been taught these practices.
The attempt to validate the existence of linguistic and cultural bias in tests is difficult, due to the invisible quality of many central aspects of culture. Standardized assessments often combine all minority students in one category, labeled “nonwhite children,” without regard to experiential, cultural, language, or dialect differences. If we are encouraged by the state and teacher education programs to provide culturally sensitive teaching, what about culturally sensitive assessment?
Consequences
Another type of bias is that of consequence, referring to the effects of the test on the lives of the students. In addition to the stress and humiliation of taking such challenging tests in another language, students tend to be labeled by school and government officials who base their assessment of students’ educational progress, reading abilities, and even intelligence, mainly on standardized test scores. As a result, English learners are overrepresented in lower-track courses, leading to cycles of low expectations and low achievement.
Grammar focus
A great number of grammar-based test items are answered correctly by native speakers simply from their sense of “It sounds right.” This sense comes from many years of listening and thinking in a language. Spelling, in the same way, comes from many years of reading and seeing the words spelled correctly. Diverse students, however, lack the many years of reading and listening that engrave the correct nuances of English in long-term, subconscious language memory. The rules, which we often try to teach in preparation for tests, do the students little good, because there are too many rules and too many exceptions for a brain to handle consciously. And even if the rules are learned, most of the exceptions end up on the test.
The SAT-9 test is surprisingly heavily weighted toward grammar. Apart from the 50-question section on comprehension, the five remaining sections focus on vocabulary knowledge, spelling, grammar use, paragraph analysis, and study skills. Spelling and grammar errors are easy to test for, but such items fail to assess the most important dimensions of reading, some of which are found in the Dimension List.
A student's reading ability has little to do with his or her recall of the countless grammatical nuances and quirks of a language such as English. One may understand a difficult passage in English, but then not be able to recognize subtle mistakes or word meaning differences. Idiom use is another often-tested area that takes years to develop, requiring much semantic flexibility (e.g. "on the back burner") and significant cultural knowledge for appropriate use. Similarly, multiple-meaning words and their use in proper contexts are learned after many years of listening and reading.
Background Knowledge
It has been shown repeatedly that readers with more prior knowledge that is consistent with the content of the text tend to recall and comprehend it better than those whose knowledge is less consistent with it. Other studies indicate that good and poor readers’ abilities to recall and summarize did not differ significantly if the groups were similar in their levels of prior knowledge (Recht & Leslie, 1988, quoted in Leslie & Caldwell, 1995). Background knowledge is also strongly linked with socioeconomic status. Poorer families usually have fewer books, games, and other resources, and less academic support at home.
Background knowledge is also linked with interest and motivation. If a student knows little about a certain topic, reading about it will be more difficult and perhaps less worth the mental energy than reading about a more familiar topic. In addition, some of the test content may not correspond to the content of a student’s classes in school. It may instead depend on knowledge gained from activities outside of school, such as participating in community events, watching TV, playing games, reading, or listening to the radio. If a diverse student lacks these opportunities to learn outside of school, his or her scores may suffer.
Schema
Another concept within the realm of prior knowledge is that of schema. Schemata are mental frameworks that store information about situations and events in life. In our minds, we have schemata for birthdays, restaurants, court cases, games, stories, etc., upon which we depend in order to supply information to make sense of a text or events in life. Students from other backgrounds often have very different schemata of daily events, text structure, and even story structure. Research supports the idea that the absence of the content schemata appropriate to a particular text can cause processing difficulties.