
CRU's "very artificial correction for decline" is dead code - yummyfajitas
http://www.jgc.org/blog/2009/11/about-that-cru-hack.html
======
DanielBMarkham
Yay! A story about politics, programming, and climate science, all wrapped
into one! It's like Christmas has come early here on HN.

And the story just keeps getting better and better, too. Lots of twists and
turns.

So now we have incendiary code comments sitting in front of, as it turns
out, dead code. It's as if we found IRS code that had pieces in it like "This
part makes sure all environmentalists get audited" but then the next part
never runs. So it looks awful, but it does nothing.

So I guess that's the key for any kind of code analysis, right? What does it
do? You run the code. Has anybody taken the raw input and run the code to make
sure this piece of code is representative of the code actually used at the
time?
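To make the "dead code" point concrete, here's a toy sketch in Python (made-up values, not the actual IDL from the archive): a suspicious-looking correction is computed but never referenced, so the output is identical whether it exists or not.

```python
# Toy illustration of dead code (not the real CRU code): the
# incendiary-looking correction below is computed but never used.

densities = [0.1, -0.2, 0.3, 0.05]  # made-up proxy values

# This part looks awful...
artificial_correction = [0.0, 0.5, 1.0, 1.5]

def process(series):
    # ...but it is never referenced here, so it does nothing.
    return [x * 2.0 for x in series]

with_dead_code = process(densities)
without_it = [x * 2.0 for x in densities]
assert with_dead_code == without_it  # the correction changes nothing
```

Static reading of the comment alone suggests tampering; running the code shows the result is untouched. That's the difference between the two.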

No wait -- we can't do that, can we? Because _the data was lost/destroyed_ ,
right? So we have some _version_ of some abstract pieces of code with
incriminating comments in them that aren't being run. On one hand, if you're
looking for some kind of solid proof of scientists gone wild, you're never
going to find it -- no matter how many times you add up nothing and nothing
you'll get nothing -- _because there are data and reproducibility problems
that we can never resolve_. So it's always going to be he-said, she-said. On
the other hand, that's been the problem with this issue all along -- data,
models, and reproducibility. This case just brings it all out into the
glorious sunshine for programmers like us to view.

Outstanding.

------
cwan
From Eric Raymond (ESR, in the comments of the original post that JGC responds
to) - <http://esr.ibiblio.org/?p=1447#comment-243042>:

"As others have repeatedly pointed out, that code was written to be used for
some kind of presentation that was false. The fact that the deceptive parts
are commented out now does not change that at all.

It might get them off the hook if we knew — for certain — that it had never
been shown to anyone who didn’t know beforehand how the data was cooked and
why. But since these people have conveniently lost or destroyed primary
datasets and evaded FOIA requests, they don’t deserve the benefit of that
doubt. We already know there’s a pattern of evasion and probable cause for
criminal conspiracy charges from their own words."

~~~
Tichy
"that code was written to be used for some kind of presentation that was
false. "

In other words, he doesn't know a damn thing. I can't help but feel that the
anti-warming crowd (which includes ESR if I remember correctly, he wrote some
long rant about peak oil?) is trying to blow a tiny mite up to be an elephant.
It's typical revisionist behavior, just like the creationists who find a small
bone that is not where biologists have predicted it would be, and they think
they have refuted Darwin.

In any case, this all seems to be very ad hominem (revisionists trying to
discredit scientists instead of providing data). Even if somewhere out there
in the vast internet a false chart is floating around, it does not refute
global warming. (Note: I don't claim that global warming is real - I don't
know. I just don't like the style of this "mite-gate").

The way to resolve this seems to be to create proper charts off proper data.

~~~
yummyfajitas
_(revisionists trying to discredit scientists instead of providing data)_

If you claim P=NP, and I show a flaw in your proof, am I also required to
provide a proof of my own? Of course not.

This is how science should work: scientists, their competitors, and anyone
else with a passing interest all carefully try to discredit existing work. In
this case, it looks like a specific attempt to discredit it failed.

You are correct that creating proper charts off proper data would be the best
solution. Unfortunately, much of the data has been "lost", making this
impossible.

~~~
Tichy
Sorry, I don't quite see where the flaw in the proof would be here. The thing
about evolutionary theory is that evolution is a stochastic process, so it
tends to yield likelihoods rather than precise predictions. It is also applied
in thousands (or millions) of cases to explain complex systems. If somebody
who applies it sometimes gets it wrong (say, a biologist trying to make sense
of bone findings and creating a theory of the evolution of some species or
other), that does not refute the whole of evolutionary theory. It only refutes
that particular application.

It is like physics: millions of buildings have been built employing Newton's
laws. Sometimes an architect gets it wrong and the building falls down. But
that doesn't refute the laws of physics - it is just some guy or a team of
builders making an error.

~~~
yummyfajitas
I'm not disagreeing with that. AGW may be true while the corrections to this
data series may be false, and this data series may be only a small piece of
the puzzle.

However, fraud and errors are different. Science is (to a great extent) based
on trust. When I referee a paper, I trust that Fig 1 is really generated by
Algorithm 2 if the caption says so. If it turns out that Fig 1 is
photoshopped, my report is meaningless. Peer review is meant to find mistakes
and evaluate importance, not eliminate fraud.

If a scientist cheats, we must consider all their work to be flawed until they
can prove otherwise (i.e., release all data, code, etc). We need to stop and
scrutinize all the fruit of their poisoned tree.

AGW may be true, and the science may be solid. But until we have more
openness, we will never really know.

~~~
Retric
_Science is (to a great extent) based on trust._

No, it's easier to assume trust, but science works best when there is almost
zero trust in any single authority. This is why it's expected for you to
present your methods, data, math, and then your conclusions. Even ignoring
malice, a tiny mistake can drastically change an experiment's outcome. So an experiment
yet to be independently verified is next to worthless, but it's not sexy to
work like that and it does not help your reputation.

Edit: IMO, what's missing from this discussion is the idea that independently
verifying their results is a reasonable thing to do. This does not mean
plugging their numbers into their code and looking at the results but starting
from scratch using their methods and testing it yourself.

PS: One of the reasons that social science has such a poor reputation is the
willingness to accept minuscule sample sizes as representing real research. A
little less trust would go a long way to revitalizing those fields.
Unfortunately, neuroscience seems to be falling into the same trap.

~~~
yummyfajitas
It's true that in the Platonic ideal, referees would not trust authors.

In reality, referees do trust that authors are making a good faith effort to
present their results accurately (except for a little bit of cherrypicking and
glossing over messy parts). As a referee, I'm simply not paid enough to search
for fraud.

~~~
Tichy
Well the anti-warming crowd does have enough of an incentive to search for
fraud, I would expect?

~~~
yummyfajitas
They have incentive, but not the means.

To search for mistakes, I skim a paper, think "yeah, that graph looks about
right", [1] and give an opinion.

To search for fraud, I would need the entire source tree for the paper: data,
programs, etc. I'll need to verify that the source code does what the paper
says, and that the output of the program is really the source of the graphs.
(This ignores the issue of fraudulent data, but that's really tricky to find.)

Since the data was destroyed (oops, I mean lost), this simply can't be done.

[1] I also check the proofs for mistakes, but that's irrelevant for this
discussion.

~~~
Tichy
Sorry, I just don't buy this. If there is no data, then the paper (which one?
is there one?) is worthless. Academia might be fucked up, but it is not THAT
fucked up. Otherwise I'd hand in my PhD thesis tomorrow, proving that I cured
AIDS - I just lost the data. So I don't see why we even discuss this case. It
reminds me of a recent story where a cleaning lady threw away the rare mouse
shit some PhD student had been collecting for years for his thesis. Funny
story for the magazines, but nothing with any impact. No data, no paper, end
of story.

Also I don't think this was the only data set in the entire world that relates
to global warming. So it hardly invalidates all climate research if some
random researcher formats his hard drive.

As for means, isn't the whole fossil fuel industry supposed to be behind the
warming skeptics? They should have shitloads of money. In fact I wonder if by
now they have their own research stations in the arctic, drilling for historic
ice. I mean, they should? They really can not afford to buy their own
thermometers?

I think they would probably have enough money to buy dozens of universities
dedicated to climate research.

~~~
yummyfajitas
The data is claimed to have previously existed, but has now been deleted
(links at bottom of this comment). In general, academia is not that fucked up.
Medical journals certainly won't let you get away with this.

Climate science appears to have a lower set of standards.

Some of the CRU emails criticize climate journals which were considering
adopting a "release all data" policy, and argued for not submitting papers to
those journals. Other CRU emails discuss deleting data to prevent Steve
McIntyre from getting it, and conspire not to release the data to him or other
skeptics. This is why climategate is such a big deal.

As for other data sets, there are a few others. However, the CRU is one of the
major sources of paleoclimate data. They aren't the only game in town, but
they are a big one. Both their data and the various analyses based on their
data are now suspect.

To make an analogy, imagine that a major open source regex library (maybe
PCRE) is completely tainted (e.g., a rootkit). How large an impact would that
have on open source software?

[http://www.timesonline.co.uk/tol/news/environment/article693...](http://www.timesonline.co.uk/tol/news/environment/article6936328.ece)

<http://news.ycombinator.com/item?id=966336>

------
jgrahamc
Yes, but as someone points out in the comments there's another file where it
is used: <http://di2.nu/foia/harris-tree/briffa_sep98_e.pro>

~~~
waterlesscloud
I await the explanation of this little detail.

~~~
spamizbad
See spoondan's reply.

------
joecode
Meanwhile, the Arctic is melting...

Maybe some of the climate change scientists got caught up in the politics of
the situation and inappropriately exaggerated matters. But consider this:
_every single_ national academy of science in the world agrees the basic
theory of global warming is correct. That's pretty overwhelming.

So there are two possibilities: it's all a big hoax by scientists, or it's
nearly certain, but an unwarranted level of doubt has been fueled by, well,
the industries that want to keep burning fuel... So what, really, sounds more
likely?

Let's say there's only an 80% chance the theory is correct. If it's not, then
we're screwed anyhow. If it is, we might just be able to prevent a whole lot of
trouble if we act soon. How certain do we have to be?

------
Joeboy
Does anybody know what, if any, published work this data has made it into?

------
thras
Anyone searching for outright fraud in these emails is going to be
disappointed. The problem is not fraud, but rather scientists who believe that
they are correct, and that their job is now not science but public relations.

That said, finding this was a big deal:
[http://camirror.wordpress.com/2009/11/26/new-the-deleted-dat...](http://camirror.wordpress.com/2009/11/26/new-the-deleted-data/)

~~~
spoondan
The only thing that's a "big deal" about that is the laziness and mendacity of
its author. The data he found was published in 1998 _and_ 2000, a fact he
mentions without irony while arguing that the data was "hidden".

The linked article also claims that the NOAA "deleted" the post-1960 data from
a data set (briffa2001jgr3.txt) put together for an article _that doesn't use
these data_. In other words, what really happened is that Briffa and his
collaborators created a data set only containing the data they used, and this
derived data set was archived by NOAA. That's hardly nefarious or even the
least bit worrisome.

Now to the heart of the issue: Why was the post-1960 data not used after 1998/2000?
Why do all of the e-mails talk about "hiding the decline" and the code talk
about stopping in 1960 to "avoid the decline"? The answer, it turns out, is
right in the very Briffa 1998 paper your link mentions (but fails to cite).
Its title is _Reduced sensitivity of recent tree growth to temperature at
high northern latitudes_ and it's about how temperatures _reconstructed_ using
Briffa's MXD (maximum latewood density) data _diverge from actual temperature
readings_ after 1960. The implication of the word "reconstructed" is very
important to understand: Briffa's MXD data don't show an actual decline in
temperature. They show a decline in a particular measurement of tree-ring
density that had, until 1960, been a good indicator of actual temperature. All
that's going on here is that data that is known to be inaccurate is not being
used.

It's no big deal.
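To see why known-bad data gets truncated, here's a toy sketch with entirely made-up numbers (nothing below is real MXD or instrumental data): a proxy that tracks temperature until 1960 and then drifts away shows a collapse in correlation after that date, which is the statistical reason for dropping the post-1960 values.

```python
import statistics

# Synthetic illustration of the divergence problem (made-up numbers).
years = list(range(1900, 2000))
# "True" temperature: a mild warming trend.
temp = [10.0 + 0.01 * (y - 1900) for y in years]
# Proxy: follows temperature until 1960, then drifts low ("divergence").
proxy = [t if y < 1960 else t - 0.05 * (y - 1960)
         for y, t in zip(years, temp)]

def correlation(xs, ys):
    """Pearson correlation coefficient."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

pre = [i for i, y in enumerate(years) if y < 1960]
post = [i for i, y in enumerate(years) if y >= 1960]
r_pre = correlation([proxy[i] for i in pre], [temp[i] for i in pre])
r_post = correlation([proxy[i] for i in post], [temp[i] for i in post])
# r_pre is near 1.0; r_post is strongly negative, so the post-1960
# proxy values are useless as a temperature indicator.
```

In the toy numbers the break is artificially clean; in the real series it shows up as a noisy but unmistakable drop in skill after 1960.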

~~~
motters
How do we know that post-1960 tree ring data is bad and needs to be adjusted,
yet pre-1960 data is good? What happened to tree growth after 1960?

~~~
thras
It's even better if you read the Briffa letter to _Nature_ that he cited:
[http://www.nature.com/nature/journal/v391/n6668/abs/391678a0...](http://www.nature.com/nature/journal/v391/n6668/abs/391678a0.html)

Something completely _unknown_ is making tree rings too small after 1960. And
if you correlate your tree ring data to post-1960 temperatures it makes
historical temperature look way too high.

Bizarro-Briffa sent a similar letter to Bizarro-Nature warning of an unknown
divergence problem and saying not to correlate tree rings to pre-1960
temperatures because it makes historical temperature look way too low.

~~~
spoondan
I don't know what you're trying to argue. Tree rings underestimate current,
not historical, temperatures. This means that, yes, if you calibrate to
current temperature, the past will look comparatively hotter. This is a causal
relationship.
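That calibration effect can be shown with a toy mean-matching example (made-up numbers again; real reconstructions use regression rather than this simple offset, so treat it as a sketch of the direction of the bias only): which window you calibrate against shifts the whole reconstruction.

```python
# Synthetic sketch: calibrating a proxy against a period where it reads
# low shifts the reconstructed past upward. Made-up numbers throughout.
years = list(range(1900, 2000))
temp = [10.0 + 0.01 * (y - 1900) for y in years]  # "true" temperature
# Proxy equals temperature before 1960 but reads 0.5 degrees low after.
proxy = [t if y < 1960 else t - 0.5 for y, t in zip(years, temp)]

def calibrate(start, end):
    """Mean-match the proxy to instrumental temperature over [start, end)."""
    idx = [i for i, y in enumerate(years) if start <= y < end]
    offset = sum(temp[i] - proxy[i] for i in idx) / len(idx)
    return [p + offset for p in proxy]

good = calibrate(1900, 1960)  # calibrated on the well-behaved period
bad = calibrate(1960, 2000)   # calibrated on the divergent period
# good[0] reproduces the true 1900 value (10.0);
# bad[0] comes out 0.5 degrees too warm (10.5).
```
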

~~~
waterlesscloud
At what other periods do tree rings not correctly correlate to temperature?
How do we know the last 50 years is the only time this has happened?

Since we do not know why tree rings have not correctly correlated with
temperature over the last 50 years, we can't really know when else that has
been the case, can we?

So why do we rely on tree rings for any period?

They are _proven_ to be unreliable over a significant period, and we don't know
if or how often that's also true of the past.

~~~
spoondan
You can cross-check historical reconstructions to assess their accuracy for
given periods. I don't know how often this has been done and if other
significant, unexplained, and unaccounted for divergences have been found. I
haven't heard of any, but I'm not an expert. For complete answers to your
questions, I'm afraid you may have to review the literature.

~~~
foldr
If you're not an expert, please don't claim that it's feasible to cross-check
this data against historical reconstructions -- you clearly do not know
whether it is or not.

~~~
smcq
I'm not an expert in climate change either, but I'm a journeyman in statistics
and I support his claim that it can be correlated.

~~~
foldr
How can you support that claim without having any relevant knowledge? I don't
see that a general knowledge of statistics is particularly relevant here --
we're all well aware that things can, in general, be correlated.

~~~
gnaritas
Scientists don't rely solely on tree ring data for historical temperatures;
it's one of many methods used. Thus the tree ring data can be correlated
against other measurements to check that it's accurate for past time periods.
This should be obvious.
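As a sketch of what that cross-check looks like (synthetic data, purely illustrative; the "ice_cores" series is a hypothetical second proxy, and real proxy networks are far messier): two independent proxies of the same underlying temperature should correlate strongly over their overlap.

```python
import math
import random

rng = random.Random(42)
years = list(range(1500, 1900))
# Underlying temperature: an oscillation plus a slow trend (illustrative).
temp = [10.0 + 0.4 * math.sin(y / 30.0) + 0.001 * (y - 1500) for y in years]
# Two hypothetical proxies measure it with independent noise.
tree_rings = [t + rng.gauss(0, 0.1) for t in temp]
ice_cores = [t + rng.gauss(0, 0.1) for t in temp]

def pearson(xs, ys):
    """Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

r = pearson(tree_rings, ice_cores)
# A high r over the overlap supports both proxies; a divergence confined
# to one window (as with post-1960 tree rings) shows up as a localized
# mismatch instead.
```
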

~~~
foldr
> This should be obvious.

Thus, either (a) it has already been done, or (b) it is more difficult than
you would expect for non-obvious reasons. So what is your point? Have you even
bothered to find out if anyone has done it or not?

~~~
smcq
Fallacy of bringing absolutely nothing to the debate and asserting to the
point of ad hominem that scientists would have skipped basic statistics on
their data. Please see the post immediately above yours and go demonstrate
that the method is flawed before continuing this debate.

The standard of evidence to show fault is yours. Assuming that papers
published in freaking NATURE have sound statistics is a reasonable thing for a
layman to do.

~~~
foldr
>Fallacy of bringing absolutely nothing to the debate and asserting to the
point of ad hominem that scientists would have skipped basic statistics on
their data.

I thought that's what you were asserting. My assumption is that scientists
have correlated tree ring data with other sources and found it to be
unreliable. But of course, neither my nor your assumptions about what's
possible and impossible in this area are worth a damn, because we __don't know
what we're talking about__. That is my point.

Also, all this talk about fallacies, ad hominem, etc. etc., is very
"internet". Especially since you only seem to have a vague grasp of what these
words actually mean. Do you think we could maybe leave all that stuff out and
have a serious discussion?

~~~
smcq
Moving the goalpost.

------
icodemyownshit
"My sincere apologies, I'm not an IDL programmer and I jumped the gun."

~~~
kscaldef
It's worth pointing out, for those who may not have RTFA or RTFComments, that
this statement was a retraction from a commenter who attacked the claim that
the code in question was dead code.

