We Are Not Ready to Assess History Performance

By Richard Rothstein

Americans have never considered learning history to be an end in itself. Instruction about history and the development of political institutions (civics) has always been justified as an exercise that would produce better citizens, however blandly defined. Yet educators have never successfully explained how the content of history or civics curricula promotes the stated goal of good citizenship. Even when educators duck this problem of means and ends and treat history and civics instruction as an end in itself, they have no realistic expectations of what students should learn. Most definitions of student proficiency are corrupted by nostalgia for alleged past achievement levels that never existed.

What is more, educators cannot make up their minds about whether history instruction (and hence its assessment) should be “broad” or “deep.” This is also true in other fields (it has become commonplace for experts to say, for example, that mathematics curricula are “a mile wide and an inch deep” and are thus flawed), but the problem is compounded in history because ideological and political disputes distort pedagogy. All partisans insist that they want both depth and breadth, but in practice the Left prefers depth and the Right prefers breadth. Since nobody is satisfied by assessments that inevitably take sides in this dispute, these conflicts cannot be resolved by educators alone.

In this essay, I will discuss these problems—lack of clarity about the purpose of history instruction, fanciful definitions of student proficiency, and debates about historical facts versus historical thinking—in that order. I conclude that we are not ready to assess history performance.

Standards and Outcomes

In 1994 Congress adopted eight education goals for the year 2000. They included having all young children ready to learn, raising the high school graduation rate to 90 percent, and becoming “first in the world” in mathematics and science. In social studies, students were to “demonstrate competency over challenging subject matter in civics and government, economics, history, and geography, so they may be prepared to exercise the rights and responsibilities of citizenship.” The naïveté (or cynicism) of the endeavor was breathtaking. Unsurprisingly, the goals had been barely approached by the deadline. With regard to preparing youth to exercise citizenship, voting participation of recent graduates remains embarrassingly low. Observers also bemoan low levels of participation in the voluntary institutions of civil society. 1

Predictably, assessing the fulfillment of such unrealistic goals has also been flawed. States have set high academic standards with little consideration of how they relate to desired outcomes. For example, what evidence is there that students who meet a state’s history standards will be more likely to exercise the rights and responsibilities of citizenship? Partly because states never face up to this question, their tests that purport to uphold high standards actually measure lower levels of achievement. States whose standards call for students to assess multiple perspectives regarding historical controversies administer tests that require little more than recall of names and dates or the placement of history passages in context. And state accountability systems cannot fairly distinguish whether students are making progress even toward the lower objectives that tests implicitly adopt.

Imprecision in thinking about the goals of history and civics instruction has its own historical roots. In the mid-nineteenth century, the education reformer and public school advocate Horace Mann argued that if children could appreciate political history, they would later exercise their franchise only for the common good and without “caprice, wantonness, malice or revenge.” But in Mann’s era as today, adults did not agree about constitutional propriety or about the motives of historical figures. Mann worried that if schools taught political history, partisan indoctrination could result. His solution was glib—schools should teach only interpretations on which everyone agreed—and he never wondered whether such a sanitized and boring curriculum would motivate active and responsible citizenship. This conundrum still haunts history and civics instruction today. We expect teachers to discuss highly charged issues without straying from consensus viewpoints. Our inability to improve on Mann’s solution helps explain why contemporary curricula and assessment of them fail so miserably. 2

Mann’s instrumental view has been a recurring theme; state constitutions and statutes now require that good citizenship be a central goal of public education. In 1893, the National Education Association (NEA), an organization of public school teachers, appointed the Committee of Ten that made sweeping recommendations for higher academic standards, including increased time for history and government courses to prepare students for “the exercise at maturity of a salutary influence upon national affairs.” In 1918, a new federal government commission reacted to the Committee of Ten’s perceived elitism, announcing “cardinal principles” for adapting schools to a more socially inclusive student body. But the cardinal principles concurred with the earlier view of history instruction, adding a political objective, namely, that knowledge of history ought to breed loyalty: “While all subjects should contribute to good citizenship, the social studies—geography, history, civics, and economics—should have this as their dominant aim…. History should so treat the growth of institutions that their present value may be appreciated.” 3

In The American High School Today (1959), the former Harvard University president James Bryant Conant urged the tracking of talented math and science students into a college preparatory program. But for history and civics (he called it social studies), Conant insisted that all students should discuss and debate in heterogeneous classes, to become “intelligent voters, [who] stand firm under trying national conditions, and [who will] not be beguiled by the oratory of those who appeal to special interests.” Certainly, one observer’s special interest is another’s commonweal; as Mann anticipated, history instruction can easily become a form of political indoctrination. 4

In more recent memory, the Reagan administration’s report, A Nation at Risk, insisted that three years of high school history and other social studies “is requisite to the informed and committed exercise of citizenship.” 5

Nonetheless, instruction in history and civics, at least as they have been taught for the last one hundred fifty years, has had little apparent impact on the quality of citizenship. No research ties any curricular program to subsequent voting participation. A scholarly consensus is that “the formal civics curriculum or its equivalent is all but irrelevant to citizens’ knowledge of and engagement with politics.” Yet domestic and international tests show that students who are actually taking a civics course, or took one recently, do better on tests of civic knowledge, have more tolerant attitudes toward dissent and minorities, and express greater intentions to participate in politics as adults. But the effects are weak and fade rapidly. Students who previously took civics but are not currently enrolled in a class seem to do no better on civics exams than students who never studied the subject. 6

Some educators believe that the solution is to include in civics instruction not only the history and structure of political institutions but also a direct involvement of students in public life. Community service programs produce better civic attitudes and knowledge, but here too positive effects may fade quickly. Some studies find longer-term impacts but cannot detect independent effects: students who take community service classes may also be those more likely to score well or to participate as adults, absent a school program. Even if service learning does have sustained effects, these may not be politically neutral. For example, implicit in most service learning is a preference for individual charity, not political action. Students are more likely to collect food for the poor than to campaign to increase the minimum wage. 7

Similarly, students whose teachers encourage them to express opinions in class have more positive attitudes toward participation in politics than students whose teachers mostly lecture. But whether students whose classrooms were participatory actually participate more as adults is unknown. 8

Extracurricular activities—student government, for example—are also associated with greater political knowledge and confidence in the ability to influence public life. Adults are more likely to participate in civic, service, church, and professional groups if they were members of 4-H or similar service clubs in high school. And if adults participate in voluntary organizations, they are more likely to vote. So there is at least an indirect correlation between participation in extracurricular activities and voting behavior. But adolescent club membership may not have influenced later participation. Attributes that dispose students to participate in 4-H and similar clubs may also be those that lead to adult voting and other public engagement. 9

Because it is impossible to assign causality in these assessments, the only sure way to answer questions about the efficacy of school programs is to conduct scientific experiments, and we have not done so. In a field trial of community service, for example, some schools might be randomly assigned to receive funds for community service while others receive none. Researchers could then track graduates to see if they participated differentially, as adults, in public life. Similar randomized experiments could also be performed that test alternative history curricula and pedagogies. Careful experiments would require studies of a decade or more to track the long-term effects of school treatments. Yet we have wondered about the efficacy of history instruction, civic education, and community service for decades—if we had undertaken such experiments twenty or thirty years ago, we would have much better information by now.

Although there is no evidence that history knowledge leads to better or more loyal citizenship, the relationship seems plausible. But there are reasons to be cautious. Consider that white students get higher scores on tests of history than black students. But black students are more than four times as likely to discuss the national news with their parents as white students are. Which is a better measure of the effectiveness of instruction: test scores or an inclination to discuss public affairs outside the classroom? 10

Further, good citizenship and loyalty are, to some extent, not objective characteristics but attributes defined by partisan objectives. Blacks, who score lower on tests of history knowledge than whites, tend to vote for more liberal candidates than do whites. Children from affluent families score higher on tests of history than do children from working-class families. But upper-income voters tend to be more socially liberal and economically conservative than working-class voters. Are the wealthy better citizens? Participation and political values are related. It is inconceivable that educators can promote the former while maintaining neutrality about the latter. 11

Two reports from 2003 illustrate how politicized debates about history and civics curricula must inevitably be. First, the Carnegie Corporation issued The Civic Mission of Schools, which called for more “instruction in government, history, law, and democracy” to enhance “young people’s tendency to engage in civic and political activities over the long term.” The report also called for more student participation in community service programs, extracurricular activities, school government, and “simulations of democratic processes and procedures.” But Chester E. Finn Jr., of the conservative Thomas B. Fordham Institute, attacked the report for placing too much emphasis on preparing students for “influencing public policy and engaging in political activity” as opposed to nonpolitical volunteer work. 12

Several months later the Albert Shanker Institute issued “Education for Democracy,” a manifesto signed by such notables as Sandra Feldman, president of the American Federation of Teachers; Lee Hamilton, director of the Woodrow Wilson International Center for Scholars and former U.S. congressman; president of the Core Knowledge Foundation E. D. Hirsch; the former U.S. secretary of education Richard Riley; and Ben Wattenberg, senior fellow of the American Enterprise Institute for Public Policy Research. The statement lamented American students’ lack of history knowledge and found that this shortcoming led to what it regarded as insufficient national unity after September 11, 2001. Yet the manifesto acknowledged that “terrorist movements such as Peru’s Shining Path and Italy’s Red Brigade were drawn heavily from the ranks of university students and the professoriate. Students were part of Hitler’s vanguard…. most of the leadership of al Qaeda are university graduates, many of them educated in the West.” 13

In any nation (the United States is no exception), political extremists may know more history and achieve higher test scores than a random sample of their age peers. “We do not have all the answers to this complex phenomenon,” the Shanker Institute concluded, “and we may never have.” Lynne V. Cheney, wife of Vice President Dick Cheney and former head of the National Endowment for the Humanities, has been prominent in urging more study of history so that young people will “appreciate how greatly fortunate we are to live in freedom” and, by implication, be more supportive of the response of George W. Bush’s administration to the terrorist attacks. Many of us who are highly educated share Mrs. Cheney’s conceit that if only young people had more historical insight, they would share our political viewpoints. But she could be more cautious: sophistication sometimes evolves into cynicism. At any rate, the study of history needs better than patriotic justification. 14

In sum, there is no agreement among educators about what good citizenship means, and no satisfactory record of measuring how any definition of citizenship is affected by history or civics instruction. Instead we have chosen to evaluate only the achievement of short-term goals that may have little relation to the outcomes we claim truly to want.

Measurement of Knowledge

In place of measuring the impact of academic courses on adult citizenship habits, we try to measure the immediate knowledge students gain from history and civics courses. Such assessment is made difficult by preconceptions about performance derived from unsubstantiated folk wisdom about the past.

In 1987, Chester E. Finn Jr. and the education historian Diane Ravitch published What Do Our 17-Year-Olds Know?, a book that continues to influence debates about teaching history. Examining results from the National Assessment of Educational Progress (NAEP), Finn and Ravitch concluded that students knew little history—most seventeen-year-olds, for example, said the Civil War took place before 1850. Lynne Cheney has cited similar embarrassments—such as the students surveyed in 1999 who believed that Ulysses S. Grant had commanded troops at Yorktown. The Shanker Institute’s “Education for Democracy” asserted that there has been a deterioration of history knowledge, evidenced by the most recent NAEP test, where, for example, few eighth graders understood the principle of checks and balances. But the institute offered no evidence that eighth graders had a better understanding of this principle in the past. The statement simply repeated an anecdotally substantiated claim from the historian David McCullough that students “have less grasp, less understanding, less knowledge of American history than ever before.” 15

Ignorance of history, the document states, shows the need to “strengthen schools’ resolve to consciously impart to students the ideals and values on which our free society rests.” But disappointment that contemporary students know so little history is not new. A 1917 Journal of Educational Psychology article mourned that high school students could answer only a third of the historical questions that teachers said every pupil should be able to answer. In 1943, the American Historical Association, the Mississippi Valley Historical Association, and the National Council for the Social Studies jointly surveyed the American history knowledge of high school students, soldiers, social studies teachers, and adults, including members of business clubs and religious organizations, and a sample of people listed in Who’s Who in America. Like many today who believe that the September 11 attacks make the study of American history more urgent, these social studies organizations were motivated by a belief that greater knowledge of American history would enhance the national unity needed for victory in World War II. The organizations gave a multiple-choice test of 65 questions. High school students who had not studied American history got a median score of 21.3 questions correct. Those who had studied American history did only a tiny bit better; their median score was 22.8 questions correct, hardly a ringing endorsement of the value of history classes, at least as such classes were taught in the 1940s, in stimulating factual knowledge. The military group, made up of soldiers who were also enrolled in college training programs, got a median of 29 questions correct. The Who’s Who group did better with 44 questions correct. But even this group had embarrassing lapses: 70 percent of them thought Thomas Jefferson helped frame the Constitution. The survey concluded: “The fact is that many well-informed, useful, successful, and even distinguished persons cannot answer 75 percent of the items.” 16
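To put the 1943 medians on a common scale, the short Python sketch below converts each reported score into a share of the 65 items. The scores and the item count come from the survey as described above; the group labels and the helper name share_correct are illustrative choices, not terms from the survey itself.

```python
# Median number of items answered correctly, as reported for the 1943 survey.
TOTAL_ITEMS = 65
median_correct = {
    "high school, no American history course": 21.3,
    "high school, took American history": 22.8,
    "soldiers in college training programs": 29,
    "Who's Who sample": 44,
}

def share_correct(score, total=TOTAL_ITEMS):
    """Return a score as a percentage of all items, rounded to one decimal."""
    return round(100 * score / total, 1)

# Percentage of the test answered correctly by the median member of each group.
shares = {group: share_correct(score) for group, score in median_correct.items()}

# Gain, in questions, associated with having taken an American history course.
course_gain = (median_correct["high school, took American history"]
               - median_correct["high school, no American history course"])

for group, pct in shares.items():
    print(f"{group}: {pct}% correct")
print(f"gain from the course: {course_gain:.1f} questions")
```

On these figures, the median student who had studied American history answered about 35 percent of the items correctly, and even the Who’s Who group fell short of 75 percent, which is consistent with the survey’s own conclusion.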

As for college freshmen during World War II, all of whom had presumably studied American history in high school, an exam revealed that “a large majority” could not adequately identify Abraham Lincoln, Thomas Jefferson, Andrew Jackson, or Theodore Roosevelt. This, reported the New York Times, revealed a “striking ignorance of even the most elementary aspects of United States history.” In 1944, the Times won a Pulitzer Prize for this exposé. But memories are short. Today, we call those students “the greatest generation.” 17

The NAEP now finds that only 11 percent of high school seniors are proficient in history. But if “proficiency” is a standard that has never been met, domestically or internationally (U.S. students’ knowledge of constitutional principles and government practices is on par with comparable knowledge of students from other industrialized nations—and better than most), is it meaningful to claim that only 11 percent of American students are proficient? 18

In the United States, the National Assessment Governing Board (NAGB) defines proficiency, using citizen panels to decide which NAEP questions students should be expected to answer correctly. But if informed opinion is detached from reality when it comes to judging performance in earlier eras, we should have little confidence that panelists can be trusted to decide what students can reasonably be expected to know today.

In 1993 the General Accounting Office (GAO) charged that the federal government utilizes these standards of proficiency, despite their lack of technical validity, for political reasons: “the benefits of sending an important message about U.S. students’ school achievement appeared considerable.” Confirming the GAO’s conclusions, a study panel of the National Academy of Education, a prestigious organization of notable scholars from a variety of education-related fields, concluded that the procedure by which achievement levels had been defined was “fundamentally flawed” and “subject to large biases,” and that standards by which American students had been judged deficient were set “unreasonably high.” 19

The most recent NAEP history report includes a disclaimer, urging the public not to take proficiency levels too seriously and acknowledging that scientific panels have concluded that NAEP results should “be interpreted with caution.” But ignoring its own warning, the report proceeds to announce, without further qualification, that only 18 percent of fourth graders, 17 percent of eighth graders, and 11 percent of twelfth graders are proficient in history. These “facts,” not their unscientific basis, become part of our folk wisdom about student performance. 20

Such conclusions about miserable performance, in history and in other subjects, are not consistent with other data on student achievement. For example, the NAGB concludes that only 1 percent of U.S. twelfth graders achieve “advanced” status in U.S. history. Yet last year, about 3 percent of all American youths who were old enough to be eleventh or twelfth graders passed Advanced Placement exams in U.S. history, designed to measure college-level mastery. Only a minority of high schools even administer Advanced Placement exams. If more did so, the share of the total youth population with advanced scores would likely be much higher. 21

Every state now sets its own proficiency standards in most subjects, using methodologies similar to those of the NAGB. Consequences can be ludicrous. In Massachusetts, only 28 percent of eighth graders were deemed proficient on the state science exam in 1998, but the state’s science achievement was as good as or better than that of every nation polled except Singapore. In North Carolina, only 20 percent of the state’s eighth graders were deemed proficient on the NAEP math test, but over two-thirds of the students were called proficient on a state-administered exam. The difference was in the subjective judgments about proficiency made by different groups of panelists, using almost identical definitions. 22

Depth Versus Breadth

With no information about whether instruction produces good citizenship, and with no valid benchmarks for acceptable performance, clichés substitute for choices about history and civics curricula and assessment. Everyone seems to agree that students need factual background to provide historical context plus the ability to think critically and creatively about the problems that Americans and people in other nations faced in prior eras. It is not possible to craft standardized tests that assess the latter skill, so we assess only the former. Such assessment can be done cheaply.

Critics who stress the importance of basic facts in history instruction want students to emerge with more patriotic attitudes. Federal law now endorses teaching “traditional American history,” but even critics acknowledge that older, conventional history texts glossed over events of which Americans are no longer proud—slavery, annihilation of Native Americans, racial segregation, discrimination against immigrants, women, and minorities, colonialism in Asia and Latin America, and the conduct of the Vietnam War. Proponents of “traditional” history cannot help but recognize that students who are taught a sanitized version of history may become cynical rather than patriotic. But the solution proposed by these critics solves little. They want history teachers to emphasize how successfully Americans have overcome these shameful events, by teaching, for example, about the victories of Martin Luther King Jr. This is simply a more nuanced triumphalism; it does little to help educators decide how much emphasis to give to blemishes that were erased and how much to give to those that withstood reform.

There is also no consensus on the issue of depth versus breadth. Should teachers delve deeply into a few historical problems? Or should they acquaint students with key events across the span of history? The Shanker Institute, for example, “reject[s] the educational theory that emphasizes ‘learning skills’ over content” but then goes on to insist that history teachers should avoid “just one long parade of facts.” 23

We cannot fairly assess history knowledge without resolving those curricular problems, because assessment should be aligned with what students are actually taught. There is no right way to teach history. Students will not remember facts without a context but cannot comprehend context without knowing facts. Most of us learn facts only in the process of answering questions we find important. So good instruction draws students into historical debates and controversies that they cannot solve unless they master the details of historical actors and events.

This renders standardized assessment impossible. Students can be drawn into only so many controversies. Even with good instruction, students will learn only those testable facts that underlie the specific controversies that were studied. A national authority could decree, for example, that all students should investigate the motivations of Boston Tea Party participants, making time for this lesson by ignoring, say, the question of whether the abolitionist John Brown’s intervention in Kansas was justified. Then, a national test could be constructed that was aligned with that curricular decision. Without a detailed national curriculum, however, there is no mechanism for making such a decision, and even with such a mechanism, it would be hard to imagine reaching agreement on what topics must be covered or ignored.

So despite superficial consensus that history instruction should have depth as well as breadth, a time-limited test cannot be faithful to this consensus. Teachers who delve into selected controversies will fail to prepare students for standardized tests that expect superficial familiarity with all controversies. Testing inevitably creates incentives to teach history as a succession of relatively meaningless facts. Students who score well on such tests have a certain kind of skill, but it may not be the skill that historical consciousness requires; such students may well forget the facts soon after being tested. Even professional historians rarely remember the details that state standards demand from high school students.

Tests typically pay homage to “higher-order thinking skills” by including questions that ask students to interpret excerpts from historical documents. But because no student can be expected to have studied any particular controversy in depth, requests for interpretation usually require little or no background knowledge. Masquerading as history questions, such items truly assess only reading comprehension. Nonetheless, state education authorities claim they have adopted high history standards (that is, that they expect students to explore complex historical controversies) and that state tests are aligned with those standards.

But properly aligned tests and standards must meet two distinct conditions; states typically fulfill only one. First, every test question should assess a skill actually found in the standards. This often happens, because factual knowledge is included in the standards. Second, every standard should be assessed, and skills should be tested in the same proportion as they are found in the standards. It is this condition that states have mostly failed to fulfill because it is so expensive to test the complex skills that standards demand.
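The two alignment conditions can be stated precisely enough to check mechanically. The Python sketch below is only an illustration: the skill names, the weights assigned to each standard, the sample test, and the 10-point tolerance are all hypothetical, and real standards documents rarely state proportions this explicitly.

```python
from collections import Counter

def check_alignment(standards_weights, test_items, tolerance=0.10):
    """Check a test against the two alignment conditions.

    standards_weights: skill -> intended share of emphasis (hypothetical).
    test_items: one skill label per test question.
    tolerance: allowed gap between tested and intended share (an assumption).
    """
    counts = Counter(test_items)
    n = len(test_items)
    # Condition 1: every question assesses a skill found in the standards.
    off_standard = sorted(s for s in counts if s not in standards_weights)
    # Condition 2, first part: every standard is assessed at least once.
    untested = sorted(s for s in standards_weights if counts[s] == 0)
    # Condition 2, second part: tested shares match the standards' shares.
    disproportionate = sorted(
        s for s, weight in standards_weights.items()
        if counts[s] > 0 and abs(counts[s] / n - weight) > tolerance
    )
    return {
        "off_standard": off_standard,
        "untested": untested,
        "disproportionate": disproportionate,
    }

# Hypothetical standards that give real weight to complex skills.
standards = {"factual recall": 0.4, "source analysis": 0.3, "research paper": 0.3}
# Hypothetical 20-question test dominated by recall items.
test = ["factual recall"] * 18 + ["source analysis"] * 2

report = check_alignment(standards, test)
print(report)
```

In this made-up case the test passes the first condition, since no question falls outside the standards, but fails the second: the research paper standard is never tested, and recall items take up far more of the test than the standards would warrant.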

The best state standards, for example, expect secondary students to know how to prepare a research paper that detects bias in sources. This activity requires depth of knowledge; students must apply background knowledge to textual clues. But checking whether students meet this standard requires specialists to grade multiple drafts of research papers. It also requires trusting teacher evaluations of class projects. Instead, states test literacy in history by having students infer what decontextualized document passages mean.

These assessment dilemmas will endure as long as the nation is wed to evaluating instruction by means of standardized tests taken by individual students that are inexpensive to grade. Successful assessment of student performance combines such test data with the judgment (sometimes idiosyncratic) of teachers. Successful assessment of school performance adds to test data the judgment of expert visitors (for example, members of accreditation teams) who can interpret test data in light of a school’s program.

Without more agreement about what history and civics students should know, and, most important, why they should know it, assessment can disclose little about student proficiencies. What is needed now is tolerance for the multiple alternative approaches to teaching those subjects, and an unwillingness to standardize assessment until we better understand what is worthwhile to assess.

Richard Rothstein is a research associate of the Economic Policy Institute and a visiting lecturer at Teachers College, Columbia University. 


