These pieces originally appeared as a weekly column entitled “Lessons” in The New York Times between 1999 and 2003.
[ THIS ARTICLE FIRST APPEARED IN THE NEW YORK TIMES ON DECEMBER 8, 1999 ]
In Judging Schools, One Standard Doesn’t Fit All
“Standards-based reform” has two contradictory meanings. Some policy makers want minimum standards representing what all students must know for promotion or graduation. Others want high standards as goals toward which all students should strive but not all may achieve. Schools need both, but one standard cannot do both jobs. This is why ill-conceived efforts by many states to force one set of standards to serve simultaneously as minimums and goals are putting the entire accountability movement at risk.
New York, for example, has adopted a high standard, the Regents exams, which previously served as a goal indicating academic readiness for college. If those become a minimum that all graduates must pass, the state must deny diplomas to a third (or more) of its adolescents, with disastrous social consequences.
To postpone this, passing scores were lowered this year. If they remain low so most students can pass, Regents exams will no longer challenge students who scored just below the old higher passing score and, with hard work and better instruction, might qualify for college. Preparing more students for college is a worthy goal. But by confusing this with the more dubious one of requiring that all qualify, “standards” have backed New York into a corner.
Even with major changes, normal differences in student abilities and teachers’ skill will cause a wide variation in achievement. Typically, scores are distributed around an average. Most are close to average, some far distant.
In graphs, scores look “bell” shaped. A middle bulge represents most students performing near average. Left and right “tails” represent the few who are far below or above it. But this simple statistical truth is nearly taboo because a widely publicized 1994 book, “The Bell Curve,” asserted that race determined academic potential. Many people hesitate to acknowledge normal bell-shaped distributions of ability, fearful of reinforcing the book’s discredited argument. But we can reject the racial claims and still recognize other normal variation.
Bell-shaped distributions are common, not unique to testing. You could, for example, graph batting averages of last year’s regular National League players. A bulge at the bell’s top represents average .277 hitters. The graph tails off at each end. Larry Walker (hitting .379) is at the far right, with Eli Marrero (hitting .192) at the left.
Most players are approximately average hitters, as most students are approximately average learners. The middle two-thirds are “typical,” bunched in the bell’s bulge.
Statisticians measure this “bunched togetherness” with the “standard deviation,” the point range covering the roughly one-third of all scorers whose achievement is just below (or just above) average. In batting last year, the standard deviation was 34 points: around the .277 average, about two-thirds of batters hit between .243 and .311. About one-sixth were below this range and another sixth were above it.
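The batting arithmetic can be checked with a short sketch. It uses the .277 average and 34-point standard deviation quoted above, and it assumes, purely for illustration, that batting averages follow an idealized normal (bell-shaped) curve; real averages only approximate one.

```python
from statistics import NormalDist

MEAN = 0.277  # league batting average quoted in the column
SD = 0.034    # standard deviation quoted in the column (34 points)

# Assumption: batting averages follow an idealized normal distribution.
dist = NormalDist(mu=MEAN, sigma=SD)

# Range within one standard deviation of the average.
low, high = MEAN - SD, MEAN + SD

# Shares of batters inside, below, and above that range.
share_within = dist.cdf(high) - dist.cdf(low)
share_below = dist.cdf(low)
share_above = 1 - dist.cdf(high)

print(f"one-SD range: {low:.3f} to {high:.3f}")
print(f"within: {share_within:.1%}, below: {share_below:.1%}, above: {share_above:.1%}")
```

Under that assumption, roughly two-thirds of batters fall inside the range and about one-sixth fall on each side of it.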
Student achievement is similar. Consider the National Assessment of Educational Progress, given to a nationwide student sample. On most N.A.E.P. tests, fourth grade scores that are only one standard deviation above average (fourth grade scores at about the 84th percentile) are higher than average eighth grade scores.
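The link between "one standard deviation above average" and "about the 84th percentile" follows from the shape of the bell curve. A minimal sketch, assuming scores are normally distributed (an idealization, not a property of N.A.E.P. data confirmed here):

```python
from statistics import NormalDist

# Assumption: scores follow a standard normal curve (mean 0, SD 1).
standard_curve = NormalDist()

# Share of students scoring below a point one SD above the average.
percentile_one_sd_above = standard_curve.cdf(1.0) * 100

print(f"one SD above average is roughly the {percentile_one_sd_above:.0f}th percentile")
```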
We should push students to do better, raising the floor by tightening the left tail and shifting the entire bell curve to the right. But even if we achieve these distinct objectives, students will not all perform identically. Many fourth graders above a new higher fourth grade average still are likely to overlap with many eighth graders below a new higher eighth grade average.
We simply cannot set one standard applicable to all. The passing point on any test will reflect abilities of students at that point along the bell curve, and can challenge only those whose abilities are just below that point. For the rest, it must either be impossibly hard (leading to unacceptable failure rates), or too easy (leading to little overall improvement).
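The trade-off can be illustrated with a rough sketch. It assumes normally distributed scores and treats students within a quarter of a standard deviation below the passing point as the ones the test "challenges." Both the normal curve and the quarter-deviation band are hypothetical choices made for illustration, as is the function name `summarize_cutoff`.

```python
from statistics import NormalDist

# Assumption: scores follow a standard normal curve (mean 0, SD 1).
curve = NormalDist()

def summarize_cutoff(cut, band=0.25):
    """Return (failure rate, share just below the cutoff).

    `cut` is the passing point in standard deviations from the average.
    `band` is a hypothetical width for "just below" the passing point.
    """
    failure_rate = curve.cdf(cut)
    challenged = curve.cdf(cut) - curve.cdf(cut - band)
    return failure_rate, challenged

# Passing points set low, at the average, and high.
for cut in (-1.0, 0.0, 1.0):
    fail, challenged = summarize_cutoff(cut)
    print(f"cutoff {cut:+.1f} SD: {fail:.0%} fail, {challenged:.0%} just below the bar")
```

Wherever the single bar is placed in this sketch, only a small slice of students sits just beneath it. A high bar fails most students, and a low bar leaves most students unchallenged.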
New York once had two standards. Regents Competency Tests were a minimum, Regents exams a goal. We should gradually have raised the content of both. Instead, we have tried to make one test do both jobs, baldly asserting that if some students can reach high goals, all can. The result is that policy makers, educators and students have been embarrassed by high failure rates.
Expecting all students, with their wide variability, to aim for one goal is statistically foolish. It is like demanding that every ballplayer hit .277: After we sent all below-average batters back to the minor leagues, few major league teams could take the field.