Lessons—National Test Is Out of Tune With Times

These pieces originally appeared as a weekly column entitled “Lessons” in The New York Times between 1999 and 2003.

[THIS ARTICLE FIRST APPEARED IN THE NEW YORK TIMES ON MARCH 27, 2002]


Test scores are rising nationwide. But with so much focus on testing, scores are bound to go up even if there is no real improvement. Pupils spend more time practicing for tests, and as tests become familiar, teachers increasingly emphasize the types of problems the tests include.

The government has a set of exams meant to solve this problem and show whether performance is truly going up. The federal tests are the National Assessment of Educational Progress, called NAEP (pronounced nape), and they assess reading, math and other subjects. Because these exams are given only to a sample of students, each of whom takes only part of a test, no individual, school or district scores can be reported. With students and teachers facing no consequences for doing well or poorly, there is no incentive to prepare specifically for the exam.

With different schools taking part each time, teachers do not alter how or what they teach as they become familiar with the test. So the combined scores of all students who take partial tests should be reliable guides to underlying achievement trends.

But the national assessment has run into problems, stemming from the failure of education officials to admit that we can never know precisely how achievement has changed; only rough estimates are possible. The government's foolish effort to track precise trends has led it to use the same test for 30 years.

It may seem that if the same test is given repeatedly, score changes should reflect achievement trends. They do not, because curriculums evolve. Unchanging tests eventually cease to reflect what students were actually taught.

It is obvious how this might occur in history or science. When teachers cover new events or discoveries, they must give less time to other topics. But this is also true for a subject as enduring as math. For example, today's young people will use computers as adults. They will have to be good at estimation, so they can judge whether computer calculations are plausible. Estimation was emphasized less in 1973, when the math assessment began, so the original exam would not be a good test of math proficiency today.

Reading passages on a test should also be replaced if they become outdated and unfamiliar, or if our standards evolve. In 1971, when the reading assessment began, we tested literacy mainly by asking multiple-choice questions to gauge comprehension. Now, we emphasize written responses.

To keep tests current, new items must be added and old ones dropped. So trends in test scores can never be a perfect indicator of changes in proficiency.

Educators should look to how other fields handle similar problems. For instance, the Consumer Price Index is never perfectly accurate. Obsolete products (like typewriters) must be dropped and new ones (cellphones) added. So changes in the index only approximate real trends in prices.

Likewise, when companies decline in importance, they are dropped from stock indexes and new ones are added. When we say the Dow has risen by precisely 175 percent in the last 10 years, this is only an estimate, because the Dow today includes different companies than it did then.

Federal education officials have tried to fudge this problem. Instead of adapting the NAEP to curricular changes, they added a second group of tests in the 1980's, while keeping the old ones. The new tests, called the Main NAEP, reflect the most current curriculums, but scores on them cannot be compared with those on the older tests. The original tests, called Trend NAEP, are the same as those given 30 years ago but are less relevant to today's classrooms.

This hodgepodge naturally causes confusion. Math scores on the Main NAEP have been rising, but they are flat on the Trend NAEP. When the government releases test results, journalists report contradictory achievement data, and even experts often fail to specify which test they are citing.

In 1996 the National Assessment Governing Board, a federal body that oversees the tests, voted to drop the Trend NAEP. But it backed down under pressure from those who feared we would lose the ability to know if schools were improving.

In May, the governing board will take up the issue again. It would be sensible to merge the tests, with a compromise like the one economists use. Price and stock indexes do not change whenever new products or companies emerge. Rather, the indexes lag until new items have been around long enough for officials to be certain of their permanence.

A single NAEP should reflect similar balance and change slowly. Unlike the Main NAEP, it would not test the most up-to-date curriculum, to preserve a link to the past. But unlike the Trend NAEP, it would also not test an obsolete curriculum, so students could be assessed on what was actually taught.

Only then will we have a rough idea of how students compare with those in the past, and a rough idea is the most we can expect.
