Grading Education: Getting Accountability Right

Teachers College Press and EPI Book

ISBN-10: 0-8077-4939-5
ISBN-13: 978-0-8077-4939-5
Paperback, $19.95, 277 pages, 6″ x 9″

Published by the Economic Policy Institute and Teachers College Press (October 20, 2008)

Table of contents

Introduction

Chapter 1 The outcome goals of American public education

Chapter 2 Weighting the goals of public education

Chapter 3 Goal distortion

Chapter 4 Perverse accountability

Chapter 5 Accountability by the numbers

Chapter 6 Early NAEP

Chapter 7 School boards, accreditation, and Her Majesty’s Inspectors

Chapter 8 An accountability system for schools and other institutions of youth development

Appendix 1 Schools as scapegoats

Appendix 2 A Broader, Bolder Approach to Education

Appendix 3 Goals survey methodology

Appendix 4 Teacher accounts of goal distortion

Endnotes

Bibliography

Acknowledgments

Index

Introduction

Except for the military, Americans devote more resources to elementary and secondary education than to any other activity we undertake in common. Nearly 15% of all tax dollars go to support public schools.¹ We depend on public schools to narrow social and economic inequalities and to ensure that all youth contribute to the health of our democracy and to the productivity of the economy. We expect educators to pursue these ambitions competently.

So it is entirely reasonable, indeed necessary, that citizens should hold educators accountable for effectively spending the funds with which they’ve been entrusted. In this book, we use the term “accountability” to describe the techniques by which citizens and their elected representatives control the activities of those who administer, teach, and serve in public schools and other institutions of youth development. It is not sufficient to insist that spending of public funds is honest and properly reported. It is not sufficient to insist that all the inputs – the resources – be adequate to the task. Nor is it sufficient to insist that educators profess good intentions, or that parents be satisfied with their children’s schools. In education, “accountability,” as described here, requires schools and other public institutions that prepare our youth to pursue the goals established by the people and their representatives through democratic processes, and to achieve these goals to the extent possible by using the most effective strategies available. During the last two decades, the design of such accountability has become the focus of public debates about education.²

We’ve wound up, however, adopting accountability policies based almost exclusively on standardized test scores for reading and mathematics. This book demonstrates why such narrow test-based accountability plans cannot possibly accomplish their stated intent, which is to tell the states and nation whether schools and related public institutions are performing satisfactorily and to support interventions that ensure improvement. To hold schools and other institutions of youth development accountable, information from tests of basic skills must be combined with a wide array of information from other sources, including tests of reasoning and critical thinking and evaluations by experienced and qualified experts who observe schools, child care centers, health clinics, and after-school and summer programs, to determine if they are performing satisfactorily.

In 2002, the federal government adopted an accountability system for public schools, the No Child Left Behind Act (NCLB). It defined the result for which schools should be held accountable as student proficiency in math and reading, and gave states a dozen years to improve their school systems so that all students, including those from disadvantaged minorities, could achieve such proficiency. States revised or adopted their own accountability systems to conform to the federal requirements.

NCLB demanded that schools bring all students to proficiency regardless of how well families and other socioeconomic institutions prepared children to learn. The law required states to test all third to eighth graders annually in math and reading to demonstrate that they were making consistent progress toward full proficiency by 2014. It also required testing once in students’ high school years. In cases where any subgroup – including minority students, low-income students, or those with disabilities – failed to make such progress, the law required districts to permit students to transfer to schools where adequate progress was being made, and it required districts to spend some of their federal aid on tutoring, often provided by private contractors. If subgroups still failed to make sufficient progress, the law required states to intervene in more drastic ways, including closing the schools involved and re-opening them with new teachers and administrators.

No Child Left Behind was an utter failure, and in 2007 and 2008 Congress refused to reauthorize it in anything like its original form. Parents, teachers, school administrators, school board members, and state legislators were vocal about their contempt for NCLB’s consequences. Although many policy activists admired the law’s requirement that schools be held accountable for the performance of minority as well as middle-class white students, few believed that the law succeeded in improving American education – and many concluded that the law did great harm.

Yet despite widespread dissatisfaction with NCLB, Congress has been unable to devise a reasonable alternative and so, as of September 2008, NCLB remains on the books. There have been many proposals for tinkering with the law’s provisions – extending the deadline for reaching proficiency, measuring progress by the change in scores of the same group of students from one year to the next (instead of comparing scores of this year’s students with scores of those in the same grade in the previous year), adding a few other requirements (like graduation rates, or parent satisfaction) to the accountability regime, or standardizing the definitions of proficiency among the states. Yet none of these proposals commanded sufficient support because none addressed NCLB’s most fundamental problem – although tests, properly interpreted, can contribute some important information about school quality, testing alone is a poor way to measure whether schools, or their students, perform adequately.

Many critics have denounced NCLB and similar state accountability policies that are based exclusively on quantitative measures (test scores) of a narrow set of school outcomes (basic math and reading skills). Critics have described how accountability for math and reading scores has inaccurately identified good and bad schools, narrowed the curriculum by creating perverse incentives for schools to ignore many important purposes of schools beyond math and reading test scores, caused teachers to focus on some students at the expense of others, and created opportunities for educators to substitute gamesmanship for quality instruction. And the critics have described why conventional proposals to “fix” NCLB perpetuate these problems. But some who hear these critiques, while acknowledging their merit, respond: “Do you mean that schools should not be held accountable? What do you think we should do about failing schools? If you don’t like narrow test-based accountability, what is your alternative?”

These are appropriate questions, and this book endeavors to answer them by describing how the public should hold schools and other institutions of youth development accountable for adequate performance.

In one sense, we’ve always had democratic accountability in education. Communities elect school boards, or mayors, who appoint superintendents and who, in turn, appoint other staff and teachers to carry out policies set by the elected officials. Voters can and do re-elect or replace board members and mayors. But we’ve become dissatisfied with this form of accountability, skeptical that elected school boards are capable of raising performance for all students in all states and localities. Sometimes, even within districts under the supervision of a single school board, some schools are left to struggle while others continue to shine.

Partly, the dissatisfaction is unavoidable. As school districts grew, it became more difficult for elected board members to make judgments about whether schools they supervised were performing well. Most superintendents now supervise such large organizations that they cannot themselves evaluate principals’ or teachers’ effectiveness.

Elected board members, especially those on the first step of a political career ladder, may be more interested in burying bad news about schools than in correcting problems, and so they defer to educators’ interests in preserving established ways of doing things. Low voter participation in school board or municipal elections sometimes gives disproportionate influence to teachers’ unions, other employee organizations, or advocacy groups with narrow agendas, because such groups can contribute campaign funds to board members and municipal officials and then turn out their members to vote.

The mere fact that school board members must seek a mandate from voters in elections has provided an inadequate assurance that boards will hold educators accountable for high student achievement.

This book proposes to supplement school board governance with a public accountability system that avoids both the haphazard and undisciplined oversight of elected board members and the simplistic and corrupting accountability of predominantly test-based systems. A well-designed accountability system can forestall unproductive practice by schools and other institutions of youth development. It can also spur higher-quality practice when expert evaluators advise well-intentioned educators and other youth service professionals about how to improve student outcomes.

The accountability system proposed in this volume has these principles:

• Accountability properly conceived has roles for both federal and state government. The federal government should ensure that states have the means to generate adequate outcomes and should provide sophisticated testing and survey data of representative state-level samples of youth and young adults to show the extent to which schools and supporting institutions have been successful. State leaders can then use this information to identify areas in which the achievement of youth in their state is lacking and, using tests as well as qualitative inspections and evaluations, hold school districts, schools, and other local institutions of youth development accountable for appropriate improvements in performance.

• Because student outcomes are the joint product of families, schools, and other institutions – such as public health agencies, early childhood services, community development, after-school and summer programming – an accountability system should be designed to ensure that all public institutions make appropriate contributions to youth development. When schools are integrated with supporting services, they can substantially narrow the achievement gap between disadvantaged and middle-class children.

• No experts can yet say with assurance how much of a narrowing in the achievement gap is feasible, even with a coordinated effort by schools and their supporting institutions. Thus, an accountability system should avoid establishing absolute outcome goals that the children of all subgroups should meet. Rather, with appropriate supporting services, schools should be expected to improve their outcomes so that students achieve as well as better-performing students elsewhere who have similar background characteristics. Such aspiration ensures continuing improvement, because few schools are so good that they cannot improve to the level of comparable but better-performing schools.

• Adolescents should enter young adulthood with many cognitive skills and non-cognitive qualities – not only strong academic knowledge and skills, but also the ability to reason and think critically, an appreciation of the arts and literature, preparation for skilled work, social skills and a strong work ethic, good citizenship, and habits leading to good physical and emotional health. State accountability systems should ensure that schools and supporting institutions promote all these traits in a balanced fashion, because accountability for only some outcomes will create incentives to ignore others.

• The federal government is too distant from the provision of educational services to be primarily responsible for holding schools and other institutions of youth development accountable. State governments can and should be the vehicles for doing so.

• The federal government nevertheless has an essential role in making accountability for high achievement feasible. It is unreasonable to expect adequate outcomes for youth if their states do not have the funds to provide good schools and other high-quality youth development institutions. Appropriate funds are no guarantee of adequate outcomes; funds can be spent foolishly or inefficiently – that’s one reason why accountability is needed. But appropriate funding is a necessary if not sufficient requirement. For those states that have too few taxable resources to properly support schools and other institutions of youth development, the federal government should provide subsidies so that accountability can be effective.

• The federal government should also develop the capacity to measure the degree to which students and young adults in each state achieve competence in all of the important cognitive and non-cognitive domains. This measurement does not require a federal assessment of every student; rather, it requires a sophisticated sampling system that can generate accurate state-level results, including disaggregated results for minorities and disadvantaged youth. Appropriate state-by-state comparisons of subgroup performance require data on important demographic characteristics so that, for example, states can know whether low-income children whose parents did not finish high school perform better or worse than similar children in other states. With information on how their states perform relative to others, governors and state legislatures can design systems that enable schools and other institutions of youth development to improve, and then hold them accountable for doing so.

• The federal government’s state-by-state measurement of youth outcomes in all important cognitive and non-cognitive domains should be widely publicized. Voters and advocacy organizations in each state require information on the relative performance of their state institutions. With such information, they can fulfill their responsibility to hold state leaders – governors and legislatures – accountable for improvement where performance is inadequate. And these state leaders can then likewise hold school districts, schools, and other youth institutions similarly accountable. Although the federal government has the unique ability to collect comparative information on the performance of schools and other services across the states, only state leaders can effectively implement the policies needed to improve that performance. There is no practical way for the federal government to ensure standardization of outcomes across states, and there is no reason to believe that Congress or the president is more highly motivated to improve education and youth development than are state leaders. If there was one lesson to be learned from No Child Left Behind, this was it.

• High-quality standardized tests of academic skills could provide necessary information to states about student performance. But not all traits for which schools should be held accountable can be measured by paper-and-pencil tests. A proper accountability system must also have ways to review the total body of student work by, for example, observing young people demonstrating their skills and surveying the extent to which young people engage in the behaviors – good citizenship, for example – for which they are being trained.

• Although an accountability system should ideally be concerned only with outcomes – whether, upon entering young adulthood, adolescents have developed the knowledge, skills, and traits upon which their success as well as the nation’s depends – such accountability for outcomes alone is impractical. If, for example, state leaders discover that young adults have emerged from schools with poor academic knowledge and skills, or poor health and citizenship habits, it will be too late to correct the early childhood or elementary schools that failed, many years earlier, to lay a proper foundation for these young adults’ outcomes. Therefore, state accountability systems must have ways to assess whether teachers and other youth development professionals are engaged in practices that are likely to lead to adequate outcomes many years later. Like the observation of student performance, such assessment requires a corps of expert evaluators who can judge, on behalf of the public, whether schools and other institutions of youth development are embarked upon strategies that are likely to succeed in their missions.

One reason, perhaps the most important, why No Child Left Behind and similar testing systems in the states got accountability so wrong is that we’ve wanted to do accountability on the cheap. Standardized tests that assess only low-level skills and that can be scored electronically cost very little to administer – although their hidden costs are enormous in the lost opportunities to develop young people’s broader knowledge, traits, and skills. A successful accountability system, such as this book proposes, will initially be more expensive, requiring a sophisticated national assessment of a broad range of outcomes, and a corps of professional evaluators in each state that can devote the time necessary to determine if schools and other institutions of youth development – early childhood programs, health and social service clinics, for example – are following practices likely to lead to adult success. But while such accountability will be expensive, it is not prohibitively so. Sophisticated school accountability could cost up to 1% of what we now spend on elementary and secondary education. If we want to do accountability right, and we should, this level of spending is worthwhile.

In the long run, accountability is cost-effective. We now waste billions of dollars by continuing to operate low-quality schools, because narrow test-based accountability can neither accurately identify them nor guide those it identifies to improve. And we waste billions by forcing good schools to abandon high-quality programs to comply with the government’s test obsession. We cannot know how much money could be saved by more intelligent accountability, but it is probably considerable.

In developing these proposals and writing this book, I enjoyed the collaboration of two talented students who, in the course of the project, became valued colleagues. Rebecca Jacobsen is now an assistant professor of education at Michigan State University, and Tamara Wilder is a postdoctoral fellow at the University of Michigan. They each assisted with the entire project, and provided me with important advice and counsel regarding its major themes. Without question, their collaboration rose to the level of co-authorship.

With their assistance, this volume proceeds as follows:

Chapter 1 recognizes that a successful accountability system must first define the outcomes that schools and other institutions of youth development should achieve. It reviews how American leaders have historically thought about educational goals, and it summarizes these traditions by defining eight broad goal areas for which accountability is necessary: basic academic knowledge and skills, critical thinking, appreciation of the arts and literature, preparation for skilled work, social skills and work ethic, citizenship and community responsibility, physical health, and emotional health.

Chapter 2 reports on surveys we have commissioned to determine how the American public and elected representatives today value each of the goals that have been part of our national consensus. These surveys allow us to estimate how relatively important each of the eight broad goal areas should be in a well-designed accountability system. Our conclusion is that a little more than half of the weight of an accountability system should be devoted to matters that might broadly be termed academic – basic knowledge and skills, critical thinking, appreciation of the arts and literature, and the acquisition of occupation-specific technical skills – while the balance should be devoted to citizenship, social skills, and other physical and emotional health behaviors that we expect young adults to exhibit.

Some policy makers acknowledge that schools should turn out youth with this broad range of knowledge, traits, and skills, but they also say that we are now in a temporary crisis that requires putting all else aside to rapidly develop the math and literacy competence of our youth. This crisis, they say, is one of international competitiveness and, unless Americans’ math and reading test scores improve, we will lose out in a race for economic survival with nations whose test scores are higher. Therefore, they conclude, we should hold schools exclusively accountable for math and reading scores, and wait until later to develop more sophisticated accountability.

This argument has little foundation in economic reality, despite its widespread acceptance. Yet until Americans understand how little foundation it has, there is little hope of mobilizing their support for the kind of accountability system that this book proposes. After all, if the nation is about to be defeated economically because of a basic skills crisis, holding schools accountable for developing social skills, good citizenship, or an appreciation for the arts can only be considered a luxury. So Appendix 1 describes why school performance, however inadequate it may be, is not responsible for the nation’s economic woes.

Chapters 3 and 4 describe why narrow test-based accountability systems (such as state testing programs required by NCLB) have not only failed to improve American education but have caused great harm. Readers who have been immersed in debates about education will find some material in these two chapters familiar, and may choose to skim or skip them. Chapter 3 describes how holding schools accountable for math and reading test scores has created incentives for educators to pay less attention to curricular areas for which they are not held accountable – other academic subjects such as science and social studies, as well as physical and health education, the arts and music, character development, cooperative behavior, and civic responsibility. To confirm and illustrate the pervasiveness of this goal distortion, I asked a New York City schoolteacher (and former student, Jessica Salute) to interview teachers from around the country about how accountability policies have affected their instructional practices.³ Appendix 4 includes excerpts from many of these interviews.

Chapter 4 analyzes other flaws in test-based accountability. It describes the false hope that achieving “proficiency” can simultaneously be a challenge and a minimum standard for all students. The chapter demonstrates that the normal range of student abilities, even with the best of instruction, will inevitably result in a wide range of cognitive and non-cognitive outcomes. By ignoring this inevitability of human variation, the designers of contemporary accountability policies have developed such fanciful proficiency definitions that even the highest-scoring countries in the world don’t come close to realizing them. The chapter describes how an accountability system organized around achieving a fixed proficiency point leads to excessive concentration on students whose performance is slightly below that point and ignores students who are either above or far below it. Test-based accountability also creates incentives for educators to game the system, for example by manipulating suspension policies or special education assignments for the sole purpose of demonstrating artificial gains on accountability tests.

Chapter 5 shows why we should not have been surprised that test-based accountability plans have corrupted education. Social scientists and business theorists have long argued that public and private enterprises should avoid accountability systems that are based solely or largely on easily measured short-term outcomes like test scores. Certainly, exclusively quantitative incentive and evaluation plans in other sectors have sometimes improved outcomes, but these improvements have usually been offset by more serious adverse impacts. The chapter reviews accountability plans in public and private fields – health care, job training, welfare, criminal justice, and others – and concludes that an accountability plan in any institution will corrupt that institution if the plan relies primarily on quantitative short-term measures without substantial qualitative evaluation. Many errors in the design of test-based accountability plans at the state level, and of NCLB at the federal level, could easily have been avoided if education policy makers had considered the experiences of these other sectors.

Chapter 6 shows that other errors might have been avoided if policy makers had recalled how, 45 years ago, the federal government first designed its measurement of student performance, the National Assessment of Educational Progress (NAEP). In its early years, NAEP focused on long-term outcomes by assessing young adults as well as schoolchildren, and it measured behavioral as well as cognitive results of schooling. The model of early NAEP suggests some elements of a sophisticated accountability system. But because NAEP’s complex sampling methodology can only generate results at the state level (or for very large urban districts), its value is in the information it provides to each state’s governor, legislators, and other policy makers about how young people in that state perform relative to young people in other, comparable states. NAEP cannot tell these policy makers how individual schools, child care centers, public health programs, or after-school programs are contributing to these results. Standardized test scores at the school level provide some information, but only on basic academic skills; supplementing this information requires actual inspection of schools and other institutions of youth development. Easily accessible inspectorate models are available from which to draw.

Chapter 7 begins by recounting the evolution of American school boards and argues that they have lost sight of their obligation to hold schools accountable for outcomes defined by the public through democratic procedures. The chapter then describes some existing arrangements from which elements of a democratic school accountability system can be adapted. Most important is the system of school accreditation that exists today in much of the country. The system is based on school inspections, but it does not presently provide an adequate means of accountability. Those conducting accreditation visits are volunteers with inadequate training and experience, accreditation agencies have no clearly defined expectation of youth outcomes, and accreditation’s mostly voluntary nature is inconsistent with the authority needed for accountability. Accreditation is a reasonably successful peer-review system leading to school improvement, but it falls short of an accountability mission.

Some other nations more successfully use school visitations for accountability purposes. Chapter 7 describes how inspection works in England, where Her Majesty’s Inspectors oversee comprehensive visitation that is focused on a broad range of outcomes, that supplements standardized testing in academic subjects, and that evaluates whether schools, early childhood centers, and other community institutions are making appropriate contributions to young people’s success.

Chapter 8 summarizes by proposing the outlines of an accountability system for American schools and other institutions of youth development. The federal government’s role in this system should include a funding distribution mechanism to ensure that states with limited fiscal capacity and relatively large numbers of disadvantaged children have the resources to generate the youth outcomes for which institutions in those states should be held accountable. It should also include a vastly expanded NAEP that could give state leaders the information they need to determine if their youth demonstrate balanced achievement in each of the eight broad goal areas. Chapter 8 then describes how state leaders, armed with this information, could supplement scores from well-designed tests with a professional visitation system to hold school districts, schools, and other institutions of youth development accountable for producing these balanced outcomes.

Schools have an important but not exclusive influence on student achievement; the gap in performance of schools with advantaged and disadvantaged children is due in large part to differences in the social and economic conditions from which schoolchildren come.⁴ For this reason, schools can best improve youth outcomes if they are part of an integrated system of youth development and family support services that also includes, at a minimum, high-quality early childhood care, health services, and after-school and summer programs. The understanding of such an integrated system’s importance is widespread – recently, a group of experts with diverse political, academic, and organizational backgrounds began a campaign to convince policy makers of its necessity. Their analysis is reproduced as Appendix 2.

If the expanded NAEP proposed in Chapter 8 were to report that a state’s youth outcomes in any key cognitive or non-cognitive domain were inadequate, state policy makers would then have to decide whether the fault lies in the shortcomings of schools or of other institutions subject to state influence or control. The visitation system proposed in Chapter 8 should be designed to hold schools as well as supporting institutions accountable for high performance.

But first things first. Before detailing this accountability program, we have to ask, “accountable for what?” What are the goals of American public education? Certainly, good test scores are part of the answer, but should schools be accountable for more – say, good citizenship, or good judgment? If so, is it possible to measure these broader school outcomes to know whether educators are performing satisfactorily? It is to these questions that we now turn.

– Richard Rothstein, September 2, 2008

Endnotes

1. In 2003-04, public funds for elementary and secondary schools totaled $462 billion (Snyder, Dillow, and Hoffman 2007, Table 159). In that year, the U.S. defense budget was $380 billion, not including nuclear weapons development outside the Department of Defense, and not including over $100 billion for the Iraq war (DOD 2003, Table 1.1). In fiscal year 2004, total federal, state, and local tax revenues totaled $3.029 trillion (OECD 2007).

2. Helen F. Ladd (2007) distinguishes market-based accountability (parent satisfaction) from political accountability (to elected leaders) and administrative accountability (to government agencies). In a representative democracy, however, government agencies themselves are accountable to elected leaders, so the accountability discussed in this book requires participation of both elected leaders and administrative agencies. Accountability of professionals to their peers for carrying out professional norms is another important aspect, but not a topic of this book. For discussions, see Abelmann et al. 1999, O’Day 2004, and Meier 2002.

3. I use the term goal “distortion,” rather than the term more common in accountability literature, goal “displacement,” because test-based accountability systems not only result in displacement – the substitution of some goals for others (e.g., substituting drill in math for arts or physical education) – but also in corrupting the nature of instruction in the tested skills themselves.

4. The conclusions of many researchers and policy experts on this point are summarized in my book, Class and Schools (Rothstein 2004).
