Data Avengers

What school testing data really tells us

Shanghai has high test scores! What can we learn from Shanghai schools?

Why aren't American kids doing as well as their international peers? We need to revamp US curricula!

While it's never enjoyable to watch data blatantly misused, it's particularly painful to see the foolish conclusions people leap to each time student test scores are released. Comments like these are, after all, based in part on data from math exams. If only one could require a statistical literacy test for anyone pontificating about PISA, the Programme for International Student Assessment that tests 15-year-old students in math, reading, science, problem-solving and financial literacy.

Alas, we can't. But is it too much to ask that those who offer analysis of educational testing data have even a rudimentary understanding of statistics?

There are some detailed analytical objections that serious statisticians raise about PISA, such as the methodology used to impute "plausible values" for students who didn't answer all the questions. I don't expect most people forming opinions about educational theories based on PISA to delve into those.

But there's a more basic problem with using these results to compare educational systems -- one that anyone who's literate should be able to understand. If you're comparing two things that differ in more than one way, you can't arbitrarily point to one of those differences as the cause of any difference in outcomes.

In statistics-speak, this is sometimes an issue of "confounding variables" -- other, not-always-obvious factors that are influencing your results. Perhaps a study finds that people who take frequent breaks at work die sooner. Does that mean taking breaks at work causes premature death? Or, more likely, does it mean that many of the people who take those breaks are heading outside to smoke?

Another classic example: Data show a strong correlation between ice cream consumption in a city and drowning deaths. Does that mean eating ice cream causes drowning? Of course not; what it means is that both things tend to happen when it's warm outside.
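If you'd rather see this happen than take my word for it, here's a minimal simulation -- the numbers are entirely made up -- of how a confounder manufactures a correlation. Temperature drives both series; the ice cream and drowning figures never touch each other, yet they correlate strongly until you control for temperature.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up daily data: temperature drives BOTH quantities, but
# ice cream sales and drownings have no direct link to each other.
temp = rng.uniform(5, 35, size=365)                     # daily high, deg C
ice_cream = 20 * temp + rng.normal(0, 50, size=365)     # cones sold
drownings = 0.1 * temp + rng.normal(0, 0.5, size=365)   # incidents

# The two "unrelated" series correlate strongly anyway,
# because temperature confounds both.
print(np.corrcoef(ice_cream, drownings)[0, 1])          # strongly positive

# Controlling for the confounder removes the spurious relationship:
# correlate the residuals after regressing each series on temperature.
def residuals(y, x):
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

print(np.corrcoef(residuals(ice_cream, temp),
                  residuals(drownings, temp))[0, 1])    # near zero
```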

These sound silly when there's clearly no relationship between, say, buying ice cream and drowning -- and when the "confounding" factor is obvious, like summer weather. Unfortunately, otherwise sensible people who would never assume that ice cream causes drowning look at temptingly precise-sounding data on test scores and jump to the conclusion that the results are all due to the way students are taught.

There are many factors that go into student achievement (assuming that these tests even accurately measure achievement, which is in fact debatable -- but let's leave that aside for now). I'd argue that a student's family situation -- especially income and parental education level -- has considerable influence on academic performance. So, I suspect, do a student's peers.

Here's a chart showing child poverty rates in Massachusetts for the 10 school districts performing best on that state's MCAS standardized tests and the 10 school districts doing the worst. The best districts are on the left.

Top vs bottom Mass. districts on standardized test scores

Chart prepared by Massachusetts State Sen. Pat Jehlen and posted at the Blue Mass Group blog; republished with permission.

Notice a trend? After seeing this data, can you honestly conclude that high scores at the top-performing districts are entirely due to the teachers and administrators in those districts, or the methodologies they're using compared to the others, and have nothing to do with socioeconomic factors? I can't.

So here's the key point. Confounding variables can be excellent predictors -- even if they're not causing whatever results you see. That is, if for some reason you didn't have temperature data but could only get your hands on ice cream consumption stats, this would still help you forecast whether drowning rates are likely to be higher or lower -- as well as boating accidents and bee stings, for that matter.
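Extending the made-up example above, here's a quick sketch of that point: a regression of drownings on ice cream sales alone -- no temperature data anywhere -- still forecasts reasonably well on days it hasn't seen. Prediction doesn't require causation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Same made-up setup: temperature drives both series.
temp = rng.uniform(5, 35, size=365)
ice_cream = 20 * temp + rng.normal(0, 50, size=365)
drownings = 0.1 * temp + rng.normal(0, 0.5, size=365)

# Fit drownings as a function of ice cream sales alone,
# using the first 300 days; temperature is never consulted.
slope, intercept = np.polyfit(ice_cream[:300], drownings[:300], 1)
pred = slope * ice_cream[300:] + intercept

# Out-of-sample R^2 on the remaining days is well above zero --
# a usable forecast, even though ice cream causes no drownings.
ss_res = np.sum((drownings[300:] - pred) ** 2)
ss_tot = np.sum((drownings[300:] - drownings[300:].mean()) ** 2)
print("out-of-sample R^2:", 1 - ss_res / ss_tot)
```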

Likewise, knowing that a student is going to school in Shanghai is probably a decent predictor of relatively high scores on the PISA tests -- but that doesn't tell us anything about whether it's the educational methods of those schools that are responsible for those scores. Not unless we can factor out all the other possible confounding variables.

As it turns out, there are some other likely confounders. Tom Loveless, a senior fellow at the Brookings Institution, outlines a few:

How dissimilar is Shanghai to the rest of China? Shanghai’s population of 23-24 million people makes it about 1.7 percent of China’s estimated 1.35 billion people. Shanghai is a Province-level municipality and has historically attracted the nation’s elites. About 84 percent of Shanghai high school graduates go to college, compared to 24 percent nationally. Shanghai’s per capita GDP is more than twice that of China as a whole. And Shanghai’s parents invest heavily in their children’s education outside of school. . . .
[A]t the high school level, the total expenses for tutoring and weekend activities in Shanghai exceed what the average Chinese worker makes in a year.

Is it solely what happens within Shanghai's schools that's responsible for its high test scores? What if we limited US testing to, say, the Upper West Side of New York City, where parents are also wealthier than average and tend to invest in outside tutoring and other activities for their kids? I suspect we'd see results as impressive as Shanghai's. Could we then jump to the conclusion that the rest of the nation should emulate what's being done within those particular public schools?

Now, it may well be a rational move locally for parents to move to communities with good standardized test results. The teachers and teaching methods may or may not be significantly better than average; but even if they're not, it's certainly possible that just being around other high-achieving kids in an environment that expects top academic performance makes it more likely that a student reaches his or her full potential.

But the bottom line here is this: You can't assume that high test scores are caused by a certain type of curriculum, teaching methodology or education policy simply by looking at raw standardized test results.

Do things like curriculum, teaching methods and policies matter to educational outcomes? Of course they do. But unless your analysis can factor out things like students' family incomes and parents' education levels, standardized test results won't tell you.
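For the statistically inclined, here's a hypothetical sketch of what "factoring out" means in practice. Everything here is invented: poverty drives scores, and a fictional "method A" happens to be adopted mostly in wealthier districts, so a naive comparison of raw scores flatters the method even though its true effect is zero. Adding poverty as a regression covariate exposes that.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200

# Invented districts: child poverty drives test scores, and the
# fictional "method A" is adopted mostly where poverty is low.
# The method's true effect on scores is exactly zero.
poverty = rng.uniform(0, 60, size=n)                          # poverty rate, %
method_a = (poverty + rng.normal(0, 10, size=n) < 25).astype(float)
scores = 90 - 0.5 * poverty + rng.normal(0, 5, size=n)

# Naive comparison of raw scores: method A looks like a triumph.
print(scores[method_a == 1].mean() - scores[method_a == 0].mean())

# Regression with poverty as a covariate: the estimated "method"
# effect shrinks toward its true value of zero.
X = np.column_stack([np.ones(n), method_a, poverty])
coef, *_ = np.linalg.lstsq(X, scores, rcond=None)
print("method effect controlling for poverty:", coef[1])
```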

If only everyone forming conclusions about standardized test results were required to pass a basic test on statistics of their own.

See more from the Data Avengers series.
