International tests show achievement gaps in all countries, with big gains for U.S. disadvantaged students

Corrections were made to this post on Jan. 30. For explanations of the corrections, see the full report summarized in this posting.

In a new EPI report, What do international tests really show about U.S. student performance?, we disaggregate international student test scores by social class and show that the commonplace condemnation of U.S. student performance on such tests is misleading, exaggerated, and in many cases, based on misinterpretation of the facts. Ours is the first study of which we are aware to compare the performance of socioeconomically similar students across nations.

Some critics, disturbed by the unsophisticated way in which policymakers and pundits use international tests to condemn American student performance, have commented that American students in relatively affluent states, like Massachusetts or Minnesota, or students in schools where few students are from low-income families, perform as well as or better than average students in the highest-scoring countries. But while such comparisons are well intended, they can’t tell us much, because a proper comparison would be between affluent states in the U.S. and affluent provinces or prefectures in other countries, or between schools with little poverty in the U.S. and schools with little poverty in other countries. Critics have not previously had data by which such comparisons can properly be made.

Yet both of the major international tests, the Trends in International Mathematics and Science Study (TIMSS) and the Programme for International Student Assessment (PISA), eventually publish not only average national scores but also a rich database from which analysts can disaggregate scores by students’ socioeconomic characteristics, school composition, and other informative criteria. Examining these can lead to more nuanced conclusions than those suggested by average national scores alone.

Although TIMSS published average national results in December, it plans to release its underlying database only this week. This puzzling procedure ensures that commentators draw quick but ill-informed interpretations and that policymakers can offer inappropriate interpretations of the results without fear of contradiction. Analysis of the database takes time, and headlines from the initial release are sealed in conventional wisdom before scholars can complete more careful study.

For example, two years ago when PISA released its latest scores, U.S. Education Secretary Arne Duncan said they showed American students “are poorly prepared to compete in today’s knowledge economy. … Americans need to wake up to this educational reality—instead of napping at the wheel while emerging competitors prepare their students for economic leadership.” In particular, Duncan stressed results for disadvantaged U.S. students: “As disturbing as these national trends are for America, enormous achievement gaps among black and Hispanic students portend even more trouble for the U.S. in the years ahead.”

Yet a careful analysis of the PISA database shows that the achievement gap between disadvantaged and advantaged children is actually smaller in the United States than it is in similar countries. The achievement gap in the United States is larger than it is in the very highest scoring countries, but even then, many of the differences are small.

What’s more, an examination of trends over the last decade, on multiple administrations of both TIMSS and PISA, shows that the achievement of the most disadvantaged U.S. adolescents has been increasing rapidly, while the achievement of similarly disadvantaged adolescents in some countries that are typically held up as examples for the U.S.—Finland for example—has been falling just as rapidly. Thus, while the reading achievement on PISA of the lowest social class students in the U.S. grew by more than 0.2 standard deviations from 2000 to 2009, it fell by an even larger amount in Finland. In math, the lowest social class U.S. students also posted substantial gains, while scores of comparable Finnish students declined. This is surprising because the proportion of disadvantaged students in Finland also fell, and we might expect this to make the task of devoting resources to them easier. Certainly, even for the lowest social class students, Finland’s scores remain higher than ours, but examination of trends as well as levels challenges the easy assumption that simply imitating Finnish education is a recipe for U.S. success.

Once aware of these data trends, it would be perverse for policymakers to conclude from international test comparisons that we should upend how we educate disadvantaged youngsters. While American schools can certainly do better with disadvantaged children, it seems that, relative to other nations, our educational system may have more serious problems with its more advantaged students.

Since the last PISA release in 2010, we have been digging deeper into its database, as well as into the older databases for TIMSS and for both versions of our domestic National Assessment of Educational Progress (NAEP). We concentrated on scores of adolescents (8th graders on TIMSS and NAEP, 15-year-olds on PISA) in the U.S., in three top-scoring countries (Canada, Finland, and Korea), in three similar post-industrial countries (France, Germany, and the U.K.), and in seven American states and three Canadian provinces for which trends are available because they voluntarily participated in TIMSS more than once.

The share of disadvantaged students in the U.S. sample was larger than their share in any of the other countries we studied. Because test scores in every country are characterized by a social class gradient—students higher in the social class scale have better average achievement than students in the next lower class—U.S. student scores are lower on average simply because of our relatively disadvantaged social class composition.

This social-class-driven distortion has been compounded because in 2009, PISA over-sampled low-income U.S. students who attended schools with very high proportions of similarly disadvantaged students, artificially lowering the apparent U.S. score. While 40 percent of the PISA sample was drawn from schools where half or more of students were eligible for free and reduced-price lunch, only 32 percent of students nationwide attend such schools.

Our report shows that if we make two reasonable adjustments to the reported U.S. average, our international ranking improves. The first adjustment re-weights the social class composition of U.S. test takers to the average composition of top-scoring countries. The other re-weights the distribution of lunch-eligible students by the actual intensity of such students in schools. These adjustments raise the U.S. international ranking on the 2009 PISA test from 14th to sixth in reading, and from 25th to 13th in mathematics. While there is still room for improvement, these are quite respectable showings.
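For readers who want to see the arithmetic behind adjustments of this kind, here is a minimal sketch of re-weighting group averages to a different composition. It is not the procedure or the data used in the report. The function name, the group labels, and every score are hypothetical, chosen only to make the arithmetic visible. The only figures taken from this post are the 40 percent and 32 percent shares of students in schools where half or more are lunch-eligible.

```python
def reweighted_mean(group_means, shares):
    """Average of group means, weighted by each group's population share."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return sum(group_means[group] * share for group, share in shares.items())


# Adjustment 1: re-weight social class composition.
# HYPOTHETICAL average scores by social class group (not from the report).
class_means = {"lower": 450.0, "middle": 500.0, "higher": 550.0}
# HYPOTHETICAL composition of the tested sample and of a reference group.
sample_shares = {"lower": 0.40, "middle": 0.40, "higher": 0.20}
reference_shares = {"lower": 0.20, "middle": 0.50, "higher": 0.30}

reported = reweighted_mean(class_means, sample_shares)
class_adjusted = reweighted_mean(class_means, reference_shares)

# Adjustment 2: correct for over-sampling of high-poverty schools.
# The 0.40 (sampled) and 0.32 (nationwide) shares come from the post;
# the two average scores are HYPOTHETICAL.
school_means = {"high_poverty": 460.0, "other": 510.0}
sampled_school_shares = {"high_poverty": 0.40, "other": 0.60}
national_school_shares = {"high_poverty": 0.32, "other": 0.68}

as_sampled = reweighted_mean(school_means, sampled_school_shares)
lunch_adjusted = reweighted_mean(school_means, national_school_shares)

print(f"class composition: {reported:.1f} -> {class_adjusted:.1f}")
print(f"school sampling:   {as_sampled:.1f} -> {lunch_adjusted:.1f}")
```

In both cases no group's own average changes. Only the weight given to each group changes, which is why a country with a larger share of disadvantaged test takers can post a lower national average even when each social class group performs comparably.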

A unique aspect of our report is our consolidation of trend data from all four assessments—PISA, TIMSS and the two forms of NAEP. From 2000 to 2006, U.S. math scores on PISA fell substantially, causing great alarm among U.S. policymakers and pundits. But few noticed that during roughly the same period, U.S. math scores on TIMSS were rising, as they did on the Main NAEP. We cannot attribute this to an alleged superior alignment of TIMSS to the U.S. curriculum because in the next period, the pattern was reversed: from 2006 to 2009, PISA and NAEP scores both rose, while TIMSS scores were flatter. We are aware of no reasonable explanation for these erratic patterns, but they suggest that caution is called for before drawing conclusions from any single test about international comparisons—or about anything else, for that matter.

Extensive educational research in the United States has demonstrated that students’ family and community characteristics powerfully influence their school performance. Children whose parents read to them at home, whose health is good and can attend school regularly, who do not live in fear of crime and violence, who enjoy stable housing and continuous school attendance, whose parents’ regular employment creates security, who are exposed to museums, libraries, music and art lessons, who travel outside their immediate neighborhoods, and who are surrounded by adults who model high educational achievement and attainment will, on average, achieve at higher levels than children without these educationally relevant advantages.

Aware of these relationships, education analysts in the United States pay close attention to the level and trends of test scores disaggregated by socioeconomic groupings. Indeed, a central element of U.S. domestic education policy is the requirement that average scores be reported separately for racial and ethnic groups and for children who are from families whose incomes are low enough to qualify for the subsidized lunch program. We understand that a school with high proportions of disadvantaged children may be able to produce great “value-added” for its pupils, although its average test score levels may be low. We know much less about the extent to which similar factors affect achievement in other countries, but we should assume, in the absence of evidence to the contrary, that they do. It would be foolish to fail to apply this understanding to comparisons of international test scores.

The database for TIMSS 2011 is scheduled for release soon, and next December, PISA will announce results and make data available from its 2012 test administration. Scholars will then be able to dig into TIMSS 2011 and PISA 2012 databases and place the publicly promoted average national results in proper context. We urge policymakers and pundits to await understanding of this context before drawing conclusions about lessons from future TIMSS or PISA assessments. We plan to conduct our own analyses of these data when they become available, and publish supplements to our report as soon as it is practical to do so, given the care that should be taken with these complex databases.


Note: In Dec. 2012, we sent a draft copy of our report on international test score comparisons to Andreas Schleicher of OECD/PISA, and to Hans Wagemaker of IEA/TIMSS. Wagemaker commented on the report at a press conference announcing the report’s release. Audio of the press conference is available here. Schleicher sent written comments. We have posted his comments, and our response to them, here.


  • dmagill3

    This is truly fascinating, and entirely hilarious. I have been blogging (www.edu-truth.com) for some time now, and regularly criticize the dogmatic usage of these dubious international tests, as if their results actually MEAN something.

    And now, in a simple internet search, I find a tremendously informative study backing up everything I’ve been saying the last two years.

    And I wasn’t guessing. It’s a pretty simple set of inferences anyone with an understanding of statistics and enough experience teaching would be able to make. Thank you for publishing this, and I hope you are able to interrupt the barrage of school-bashing that these worthless test scores are used to do.

    My favorite part is the “oversampling” of poor students in U.S. schools by PISA. Truly a priceless piece of information.

    -Seattle Teacher

  • janney8

    The results from disaggregating these tests make sense. While we could be better, we are working our butts off to reach our disadvantaged kids. I am tired of politicians using the test results to bash our system and its teachers.

  • FermisParadox

    There is quite a bit out there written about the PISA tests and disaggregating the data, and it appears that disaggregation does not significantly explain away our poor results. Arguably the three most significant root causes are the “raw material” of the student, culture, and the learning environment created by teachers. Cognitive ability consistently correlates strongly with academic achievement, it is largely heritable and stable, and it is a significant root cause of poverty in our modern, developed economy, not vice versa. Culture, or the collection of values one acquires and retains, contributes significantly to both educational achievement and economic success, not vice versa either (it is not poverty that makes one irresponsible, immoral, engage in anti-social behavior, or shirk education – look at China and India as examples). Sharp (i.e., raw material), motivated (i.e., culture/values) students can excel sitting under an oak tree using dog-eared textbooks and be exceptional given a capable teacher. And the entire question of comparative “poverty” in this country is wide open to debate. Please review how these often-cited poverty metrics are determined, then investigate what the “actual poor” (as classified by these metrics) “actually have” versus what the poor in other countries have. It is shocking; we are a land of plenty.