Friday, December 4, 2009


Replication in ClimateGate

Irritatingly often I see comments on Climategate blog posts saying that economics and climatology aren't real sciences. I don't mind Econ not being classified as a science; rather, it is the scoffing tone that I don't like. Econ is not a science; it's better than science. But I won't argue that here.

Rather, the main issues in ClimateGate are not special to science. Peer review and the intimidation of editors and other scholars are not. Close linkages with supposedly unbiased blogs and newspapers are not. Violating freedom-of-information laws is not. Sloppy scholarship is not. And, finally, the refusal to allow replication is not.

By that I don't mean to say that all these sins are common in every field. Far from it! But they are possible in every field.

Consider replication. The issue in ClimateGate is the temperature data series. The scientists started with raw data from hundreds of weather stations covering 150 years, and their end product is a monthly average temperature for every sector of the globe (and a global average too). They did not measure the temperatures themselves; they used data that thousands of other people collected over 150 years, 95% of which is publicly available, much of it on the web. Their task was to process the data. They had to choose which weather stations are reliable and average different weather stations within a sector, for example. If one station existed only from 1850 to 1917 and the next one in the vicinity lasted from 1935 to 2009, they had to figure out what to do. They had to worry about the Urban Heat Island effect: what happens when a city full of hot air and concrete grows around a weather station that started out in an empty field. So there was a lot of processing.
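
To make "processing" concrete, here is a minimal sketch in Python of one such step: dropping stations judged unreliable and averaging the rest within each sector and month. It is not East Anglia's method, which is exactly what we do not know. The record layout, the station and sector names, the temperatures, and the function `sector_monthly_means` are all assumptions made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (station_id, sector, year, month, temp_c).
# These values are invented for illustration; they are not real observations.
RECORDS = [
    ("A", "sector_1", 1900, 1, 2.0),
    ("B", "sector_1", 1900, 1, 4.0),
    ("A", "sector_1", 1900, 2, 3.0),
    ("C", "sector_2", 1900, 1, 10.0),
]


def sector_monthly_means(records, excluded_stations=frozenset()):
    """Average all retained stations in each (sector, year, month) cell."""
    cells = defaultdict(list)
    for station, sector, year, month, temp in records:
        if station in excluded_stations:
            continue  # an explicit, documented choice to drop this station
        cells[(sector, year, month)].append(temp)
    # A cell with no retained station is simply absent from the result.
    return {key: mean(temps) for key, temps in cells.items()}


all_stations = sector_monthly_means(RECORDS)
without_b = sector_monthly_means(RECORDS, excluded_stations={"B"})
```

Dropping station B changes the January 1900 figure for `sector_1`, so the list of excluded stations matters as much as the arithmetic. The point of writing it this way is that every judgment call is an explicit input that somebody else can read, question, and rerun.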

What East Anglia would not reveal is which weather stations it used for which years, and how exactly it made the adjustments to get its sector averages. Thus, nobody can replicate its work. Indeed, the scientists there can't do it themselves: they have admitted that they destroyed much of their input data, and the ClimateGate leak tells us that even if they had it, their computer code is too poorly written for anybody to understand, even themselves. Now, back to the general case. This is not especially a failure of the scientific method. It could happen in any field with sufficiently low standards for publication, if any other such field existed. Analogies:

  1. A mathematician claims to have squared the circle. He gives us the axioms and the proposition, but keeps the proof secret. "I need to use some of the techniques for future research," he says.

  2. An economist claims to show that sales of Twinkies are a good predictor of recessions. He shows us a graph, and the results of many regressions that have high R² and significant coefficients, but he keeps the Twinkies sales data secret. "The company that gave it to me did so on condition that I not reveal their sales to competitors," he explains.

  3. An English professor claims that contrary to what Mencken claims in his famous essay, the American South has produced more good literature than any similarly sized region in the world. He says there are 127 great novels from the South, but he doesn't say what they are or why they are great, or what other regions have produced. "This is the consensus of the people in my field, though I won't say exactly who because that is too personal, and the people in my field are very smart and have studied books a lot more than amateurs," he says.

