Preface by Daniel Koretz
Series editors Sean P. Corcoran and Joydeep Roy
Table of Contents
Preface by Daniel Koretz
Introduction by Sean P. Corcoran and Joydeep Roy
Part I: Performance Pay in the U.S. Private Sector: Concepts, Measurement, and Trends
1.2 Types of Performance Pay
1.3 Potential and Pitfalls for Performance Pay
1.4 Measuring Performance Pay: U.S. Incidence and Trends
1.5 Performance Pay as a Share of Compensation
Part II: The Perils of Quantitative Performance Accountability
2.2 Accountability by the Numbers
About the Authors
Preface by Daniel Koretz
Accountability for students’ test scores has become the cornerstone of education policy in the United States. State policies that rewarded or punished schools and their staffs for test scores became commonplace in the 1990s. The No Child Left Behind (NCLB) act federalized this approach and made it in some respects more draconian. There is now growing interest in pay for performance plans that would reward or punish individual teachers rather than entire schools. This volume is important reading for anyone interested in that debate.
The rationale for this approach is deceptively simple. Teachers are supposed to increase students’ knowledge and skills. Proponents argue that if we manage schools as if they were private firms and reward and punish teachers on the basis of how much students learn, teachers will do better and students will learn more. This straightforward rationale has led to similarly simple policies in which scores on standardized tests of a few subjects dominate accountability systems, to the near exclusion of all other evidence of performance.
It has become increasingly clear that this model is overly simplistic, and that we will need to develop more sophisticated accountability systems. However, much of the debate—for example, arguments about the reauthorization of NCLB—continues as if the current approach were at its core reasonable and that the system needs only relatively minor tinkering. To put this debate on a sensible footing requires that we confront three issues directly.
The first of these critically important issues, addressed in the first section of this volume by Scott Adams and John Heywood, is that the rationale for the current approach misrepresents common practice in the private sector. Pay for performance based on numerical measures actually plays a relatively minor role in the private sector. There are good reasons for this. Economists working on incentives have pointed out for some time that for many occupations (particularly, professionals with complex roles), the available objective measures are seriously incomplete indicators of value to firms, and therefore, other measures, including subjective evaluations, have to be added to the mix.
And that points to the second issue, known as Campbell’s Law in the social sciences and Goodhart’s Law in economics. In large part because available numerical measures are necessarily incomplete, holding workers accountable for them, without countervailing measures of other kinds, often leads to serious distortions. Workers will often strive to produce what is measured at the expense of what is not, even if what is not measured is highly valuable to the firm. One also often finds that employees “game” the system in various ways that corrupt the performance measures, so that they overstate production even with respect to the goals that are measured. Richard Rothstein’s section in this volume shows the ubiquity of this problem and illustrates many of the diverse and even inventive forms it can take. Some distortions are inevitable, even when an accountability system has net positive effects that make it worth retaining. However, the net effects can be negative, and the distortions are often serious enough that they need to be addressed regardless. To disregard this is to do a great disservice to the nation’s children.
The third essential issue is score inflation—increases in scores larger than the improvements in learning warrant—which is the primary form Campbell’s Law takes in test-based accountability systems. Many educators and policy makers insist that this is not a serious problem. They are wrong: score inflation is real, common, and sometimes very large.
Three basic mechanisms generate score inflation. The first is gaming that increases aggregate scores by changing the group of students tested—for example, removing students from testing by being lax about truancy or assigning students to special education. The second, which is a consequence of our ill-advised and unnecessary focus on a single cut score (the “proficient” standard), is what many teachers call “the bubble kids problem.” Some teachers focus undue effort on students near the cut while reducing their focus on other students well below or above it, because only the ones near the cut score offer the hope of improvement in the numbers that count.
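A small numerical sketch may make the cut-score mechanism concrete. The ten scores, the cut score of 60, and the size of the gains below are all invented for illustration and are not drawn from any study; the sketch only shows how the percent-proficient figure can respond strongly to effort aimed at students near the cut while ignoring a larger, more evenly spread gain.

```python
# Hypothetical illustration of the "bubble kids" problem.
# All numbers are invented; none come from the studies cited in this volume.

CUT_SCORE = 60  # assumed "proficient" threshold


def percent_proficient(scores):
    """Share of students at or above the cut score, as a percentage."""
    return 100 * sum(s >= CUT_SCORE for s in scores) / len(scores)


def mean(scores):
    """Average score across all students."""
    return sum(scores) / len(scores)


# Ten hypothetical students: four just below the cut, the rest far from it.
before = [30, 35, 40, 56, 57, 57, 57, 75, 85, 90]

# Strategy A: concentrate effort on students within 4 points of the cut (+4 each).
bubble_focus = [s + 4 if CUT_SCORE - 4 <= s < CUT_SCORE else s for s in before]

# Strategy B: a smaller gain for every student (+2 each), larger in total.
broad_gain = [s + 2 for s in before]

for label, scores in [("before", before),
                      ("bubble focus", bubble_focus),
                      ("broad gain", broad_gain)]:
    print(f"{label:13s} proficient: {percent_proficient(scores):5.1f}%  "
          f"mean: {mean(scores):5.1f}")
```

In this invented example, the bubble-focused strategy raises the percent proficient from 30% to 70%, while the broad strategy leaves it at 30% even though it adds more points in total (20 versus 16).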
The third mechanism is preparing students for tests in ways that inflate individual students’ scores. This mechanism is the least well understood and most controversial, but it can be the most important of the three, creating very large biases in scores. One often hears the argument: “our test is aligned with standards, and it measures important knowledge and skills, so what can be wrong with teaching to it?” This argument is baseless and shows a misunderstanding of both testing and score inflation. Score inflation does not require that the test contain unimportant material. It arises because tests are necessarily small samples of very large domains of achievement. In building a test, one has to sample not only content, but task formats, criteria for scoring, and so on. When this sampling is somewhat predictable (as it almost always is), teachers can emphasize the material most likely to recur, at the expense of other material that is less likely to be tested but that is nonetheless important. The result is scores that overstate mastery of the domain. The evidence is clear that this problem can be very large. There is no space here to discuss this further, but if you are not persuaded, I strongly urge you to read Measuring Up: What Educational Testing Really Tells Us, where I explain the basic mechanisms by which this happens and show some of the evidence of the severity of the problem.
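The sampling argument can be sketched with a toy model. Everything in it is an assumption made for illustration: a domain of 100 equally important skills, a test that predictably samples the same 20 of them, and two hypothetical teaching strategies that each bring students to mastery of 60 skills. It is not a model taken from Measuring Up or from any study cited in this volume.

```python
# Toy model of score inflation from predictable test sampling.
# Every number below is hypothetical and chosen only to show the mechanism.

DOMAIN = range(100)          # 100 equally important skills in the full domain
TESTED = set(range(20))      # the 20 skills the test predictably samples
UNTESTED = set(DOMAIN) - TESTED


def percent_mastered(taught, skills):
    """Percent of the given skills that appear in the set of taught skills."""
    return 100 * len(taught & set(skills)) / len(skills)


# Both strategies teach exactly 60 skills to mastery.
# Broad teaching spreads effort evenly across the domain (3 of every 5 skills).
broad = {s for s in DOMAIN if s % 5 < 3}
# Test-focused teaching covers every tested skill, then 40 untested ones.
test_focused = TESTED | set(range(20, 60))

for label, taught in [("broad", broad), ("test-focused", test_focused)]:
    print(f"{label:13s} test score: {percent_mastered(taught, TESTED):5.1f}%  "
          f"whole domain: {percent_mastered(taught, DOMAIN):5.1f}%  "
          f"untested material: {percent_mastered(taught, UNTESTED):5.1f}%")
```

With these invented numbers, both strategies produce mastery of 60% of the domain, but the test-focused strategy yields a test score of 100% while mastery of the untested material falls from 60% to 50%. The test contains nothing unimportant, yet the score overstates mastery of the domain.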
My experience as a public school teacher, my years as an educational researcher, and my time as a parent of students in public schools have all persuaded me that we need better accountability in schools. We won’t achieve that goal, however, by hiding our heads in the sand. This volume will make an important contribution to sensible debate about more effective approaches.
Daniel Koretz is the Henry Lee Shattuck Professor of Education at the Harvard Graduate School of Education, Harvard University, and is a member of the National Academy of Education.
Introduction by Sean P. Corcoran and Joydeep Roy
With recent research in K-12 education highlighting teacher quality as one of the most important school inputs in educational production, performance-based pay for teachers has been embraced by policy makers across the political spectrum. In the 2008 presidential campaign, for example, both Barack Obama and John McCain touted teacher pay reform as a necessary lever for raising student achievement and closing the achievement gap (Klein 2008; Hoff 2008b).
The use of performance pay in education is not new (Murnane and Cohen 1986). But this latest surge of interest differs from earlier waves in several key respects. First, we have much greater scientific support for investments in teacher quality. Recent research has found that teachers represent the most significant resource schools contribute to academic achievement, a finding that has sharpened policy makers’ focus on teacher effectiveness (Hanushek and Rivkin 2006). Second, today’s school administrators possess a wealth of achievement measures that can be easily linked to individual teachers. While initially intended for public reporting, these measures have quickly found their way into teacher evaluation and compensation systems. Finally, new and sophisticated statistical models of teacher “value added” have emerged that many believe can be used to accurately estimate teacher effectiveness (Gordon, Kane, and Staiger 2006; Harris 2008).
Proponents of performance pay in education frequently point to the private sector as a model. Where the traditional salary schedule fails to reward excellence in the classroom, it is argued, performance pay is a ubiquitous and powerful tool in the private sector. (Eli Broad recently asserted that he “could not think of any other profession that does not have any rewards for excellence” (Hoff 2008a)). Were schools to explicitly link pay to student achievement (measured through standardized testing), teachers would be incentivized to focus on results, and quality would rise in the long run as high-productivity teachers gravitate into the profession (Hoxby and Leigh 2004).
To be sure, private industry has a longer and richer history of pay-for-performance than public schooling. Not-for-profit and governmental organizations have also experimented with performance accountability systems for decades. But discussions of these experiences are notably absent in the current debate over performance-based pay in education. Is performance pay really ubiquitous among professional workers in the private sector? To what extent are private sector workers compensated based on individual or group measures of productivity? How should performance pay systems be designed? In what types of industries are performance pay systems most effective? How have past performance accountability systems fared in the public sector?
In the first of a series of reports intended to inform the debate over the use of performance-based pay in America’s public schools, we compile here two timely and informative papers on performance compensation and evaluation outside of education. In the first, Scott Adams and John Heywood conduct one of the first systematic analyses of pay-for-performance practices in the private sector. Guided by a simple taxonomy of performance-based pay systems, Adams and Heywood estimate the overall incidence of performance-based pay in private industry. While they find that periodic “bonus” payments are relatively common (and growing) in the private sector, these payments represent a very small share of overall compensation and are generally not explicitly tied to simple measures of output. Formulaic payments based on individual productivity measures are rare, particularly among professionals.
In their analysis, Adams and Heywood draw upon several large surveys of workers and firms, including the National Compensation Survey (NCS), National Longitudinal Survey of Youth (NLSY), Panel Study of Income Dynamics (PSID), and National Study of the Changing Workforce (NSCW). While none of these data sources are ideally suited for this task, the conclusions that emerge from their combined analysis are remarkably consistent:
1. Pay tied directly to explicit measures of employee or group output is surprisingly rare in the private sector. For example, in the 2005 NCS, only 6% of private sector workers were awarded regular output-based payments. The incidence is even lower among professionals.
2. “Non production” bonuses, which are less explicitly tied to worker productivity, are common, and their use has grown over time. However, these bonuses represent only a very small share of overall compensation (the median share in the NCS and NLSY ranges from 2% to 3% of overall pay).
3. The incidence and growth of bonus pay are disproportionately concentrated in the finance, insurance, and real estate industries (true in the NCS, NLSY, and NSCW).
Additionally, male and non-unionized workers are much more likely to receive performance-based pay.
The low incidence of base or bonus pay tied to individual output does not, of course, imply that private sector compensation is unrelated to job performance. It may be that career trajectories—movements into, within, and between firms, for example—are what track worker productivity in the private sector. To the extent this is true, these private sector “career ladders” should be an important consideration for those designing competitive teacher pay systems.
Unfortunately, Adams and Heywood are unable to measure the relationship between private sector career trajectories and individual productivity in their data. But what they do convincingly show is that few professionals are compensated based on formulaic functions of measured output. While many private sector workers earn bonuses, these bonuses represent only a small share of total compensation, and are not necessarily tied to explicit measures of worker output. This result is not surprising. After all, most modern professional work is complex, multi-faceted, and not easily summarized by simple quantitative measures.
In the second part, Richard Rothstein reviews a long history of performance accountability systems in the public and private arenas. He begins by recounting the work of social scientists Herbert Simon and Donald Campbell, who long ago warned of the problems inherent in measuring public service quality and evaluating complex work with simple quantitative indicators. Through a series of historical examples, he highlights countless instances of goal distortion, gaming, and measure corruption in the use of performance evaluation systems. Rothstein concludes that the pitfalls associated with rewarding narrow indicators have led many organizations, including prominent corporations like Wal-Mart and McDonald’s, to combine quantitative indicators with broader, more subjective measures of quality and service.
Rothstein argues that the challenges inherent in devising an adequate system of performance pay in education (appropriately defining and measuring outputs and inputs, for example) surprise many education policy makers, who often blame its failure on the inadequacy of public educators. In fact, corruption and gaming of performance pay systems are not peculiar to public education. The existence of such unintended practices and consequences has been extensively documented in other fields by economists, management theorists, sociologists, and historians. Rothstein’s study undertakes the important task of introducing this literature from other fields to scholars of performance incentive systems in education. It reviews evidence from medical care, job training, policing, and other human services and shows that overly narrow definitions of inputs and outputs have been pervasive in these sectors’ performance measurement systems, often resulting in goal distortion, gaming, or other unintended behaviors. Rothstein also discusses how these problems limit the use of performance incentives in the private sector, and concludes by showing that performance incentives run the risk of subverting the intrinsic motivation of agents in service professions like teaching.
Together, these authors’ work provides important context for the implementation of pay-for-performance in education: the incidence of performance pay in the private sector and the experience of performance measurement in both the private and public sectors. These studies offer lessons that will be crucial in the debate over whether performance pay is suited to education, and in how we think about designing and implementing such a system. Later papers in this series will review the history of, and experiments with, performance pay systems in U.S. education, critically analyze some of the most important merit pay systems currently in use by school districts across the country, suggest alternative frameworks for teacher compensation, and discuss how teachers themselves feel about pay-for-performance.
Gordon, Robert, Thomas J. Kane, and Douglas O. Staiger. 2006. Identifying Effective Teachers Using Performance on the Job. Policy Report, The Hamilton Project. Washington, D.C.: Brookings Institution.
Hanushek, Eric A., and Steven G. Rivkin. 2006. “Teacher Quality,” in E. A. Hanushek and F. Welch, eds., Handbook of the Economics of Education. Elsevier, pp. 1051-1078.
Harris, Douglas N. 2008. “The Policy Uses and ‘Policy Validity’ of Value-Added and Other Teacher Quality Measures,” in D. H. Gitomer, ed., Measurement Issues and the Assessment of Teacher Quality. Thousand Oaks, Calif.: SAGE Publications.
Hoff, David J. 2008a. Teacher-pay issue is hot in DNC discussions. Education Week. August 25.
Hoff, David J. 2008b. McCain and Obama tussle on education. Education Week. October 22.
Hoxby, Caroline M., and Andrew Leigh. 2004. Pulled away or pushed out? Explaining the decline of teacher aptitude in the United States. American Economic Review. Vol. 94, pp. 236-40.
Sean P. Corcoran is an assistant professor at New York University Steinhardt School of Culture, Education, and Human Development.
Joydeep Roy is an economist at the Economic Policy Institute. His areas of focus include economics of education, education policy, and public and labor economics.
EPI appreciates the Ford Foundation—and Fred Frelow in particular—for supporting the research in the Economic Policy Institute’s Series on Alternative Teacher Compensation Systems.
All of the authors of this volume want to express their gratitude to the Economic Policy Institute’s publications staff—department director Joseph Procopio, editor Ellen Levy, and designer Sylvia Saab—for their dedication and hard work in the launching of this new EPI book series.
Part I: Performance Pay in the U.S. Private Sector
The authors thank Daniel Parent for his helpful conversations and for sharing his estimates from the PSID. They also thank Patrick O’Halloran for assistance with the NLSY97. Anthony Barkume and Al Schenk deserve thanks for their efforts in explaining the BLS Employment Cost Index and for preparing several special tabulations used in this report. The authors acknowledge that those special tabulations have not passed the usual BLS procedures for guaranteeing quality and reliability. Heywood thanks both Michelle Brown and Uwe Jirjahn for histories of fruitful joint work on performance pay. Both authors also thank the readers of various drafts of their study, particularly Matt Wiswall, Marigee Bacolod, and Jason Faberman, for their helpful reviews. None of those mentioned are responsible for the results or opinions expressed here.
Part II: The Perils of Quantitative Performance Accountability
Part of this study was prepared for presentation at the conference, “Performance Incentives: Their Growing Impact on American K-12 Education,” sponsored by the National Center on Performance Incentives at Peabody College, Vanderbilt University, February 27-29, 2008. Support for this research was also provided by the Campaign for Educational Equity, Teachers College, Columbia University. The views expressed in this chapter are those of the author alone, and do not necessarily represent the views of the Economic Policy Institute; the Campaign for Educational Equity, Teachers College, Columbia University; or the National Center on Performance Incentives, Peabody College, Vanderbilt University.
I am heavily indebted to Daniel Koretz, who has been concerned for many years with how “high stakes” can render test results unrepresentative of the achievement they purport to measure, and who noticed long ago that similar problems arose in other fields. Discussions with Professor Koretz, as I embarked on this project, were invaluable. I am also indebted to Professor Koretz for sharing his file of newspaper clippings on this topic and for inviting me to attend a seminar he organized, the “Eric M. Mindich Conference on Experimental Social Science: Biases from Behavioral Responses to Measurement: Perspectives from Theoretical Economics, Health Care, Education, and Social Services,” in Cambridge, Massachusetts, May 4, 2007. Several participants in that seminar, particularly George Baker of the Harvard Business School, Carolyn Heinrich of the La Follette School of Public Affairs at the University of Wisconsin, and Meredith Rosenthal of the Harvard School of Public Health, were generous in introducing me to the literatures in their respective fields, answering my follow-up questions, and referring me to other experts. Much of this chapter results from following sources initially identified by these experts.
Access to literature from many academic and policy fields, within and outside education, was enhanced with the extraordinary help of Janet Pierce and her fellow librarians at the Gottesman Libraries of Teachers College, Columbia University.
Others have previously surveyed this field. Stecher and Kirby (2004), like the present effort, did so to gain insights relating to public education. But their survey has attracted insufficient attention in discussions of education accountability, so another effort is called for. Haney and Raczek (1994), in a paper for the U.S. Office of Technology Assessment, warned of problems similar to those analyzed here that would arise if quantitative accountability systems were developed for education. Two surveys, Kelman and Friedman (2007) and Adams and Heywood (this volume), were published or became available to me while I was researching this chapter and summarized some of the same issues in a fashion which this chapter, in many respects, duplicates. Susan Moore Johnson reminded me about debates in the early 1980s about whether teachers’ intrinsic motivation might be undermined by an extrinsic reward-for-performance system.
A forthcoming Columbia University Ph.D. dissertation in sociology, contrasting “risk adjustment” in medical accountability systems with the absence of such adjustment in school accountability, should make an important contribution (Booher-Jennings, forthcoming).
This chapter cites studies from the business, management, health, and human capital literatures, as well as previous surveys of those literatures, in particular Baker (1992), Holmstrom and Milgrom (1991), Mullen (1985), and Blalock and Barnow (2001). I am hopeful, however, that this chapter organizes the evidence in a way that may be uniquely useful to education policy makers grappling with problems of performance incentives in education.
This chapter has benefited from criticisms and suggestions of readers of a preliminary draft. I am solely responsible for remaining errors and misinterpretations, including those that result from my failure to follow these readers’ advice. For very helpful suggestions, I am grateful to Marcia Angell, Julie Berry Cullen, Carolyn Heinrich, Jeffrey Henig, Rebecca Jacobsen, Trent Kaufman, Ellen Condliffe Lagemann, Lawrence Mishel, Howard Nelson, Bella Rosenberg, Joydeep Roy, Brian Stecher, and Tamara Wilder.
About the Authors
Scott J. Adams is Associate Professor of Economics and faculty member in the Graduate Program in Human Resources and Labor Relations at the University of Wisconsin-Milwaukee. He has previously worked as a Robert Wood Johnson Fellow in Health Policy Research at the University of Michigan. He worked at the President’s Council of Economic Advisers as senior economist responsible for education and labor. His research on labor issues has been published in the Journal of Human Resources, Journal of Public Economics, Economics of Education Review, and the Journal of Urban Economics.
John S. Heywood is Distinguished Professor of Economics and Director of the Graduate Program in Human Resources and Labor Relations at the University of Wisconsin-Milwaukee. He holds a concurrent position with the School of Business at the University of Birmingham in the United Kingdom. His research on performance pay has been widely published, including in the Journal of Political Economy, Journal of Human Resources, and Economica. He also co-edited the recent text Paying for Performance: An International Comparison.
Richard Rothstein is a research associate of the Economic Policy Institute. From 1999 to 2002 he was the national education columnist of The New York Times. He was a member of the national task force that drafted the statement, “A Broader, Bolder Approach to Education” (www.boldapproach.org). He is also the author of Class and Schools: Using Social, Economic, and Educational Reform to Close the Black-White Achievement Gap (EPI & Teachers College 2004) and The Way We Were? Myths and Realities of America’s Student Achievement (1998). Rothstein was a co-author of the books The Charter School Dust-Up: Examining the Evidence on Enrollment and Achievement (2005) and All Else Equal: Are Public and Private Schools Different? (2003). A full listing of Rothstein’s publications on education and other economic policy issues, including links, can be found at epi.org/people/richard-rothstein/.