
How a word sparked a four-year saga of climate fact-checking - nl
https://theconversation.com/how-a-single-word-sparked-a-four-year-saga-of-climate-fact-checking-and-blog-backlash-62174
======
wolfgke
I see a large problem in the fact that (as it seems) the original data and the
computer code were not published either. In my opinion this contradicts the
scientific principle that any result or conclusion has to be independently
verifiable as far as possible. Unfortunately, though, not publishing data and
computer programs (and ideally a virtual machine for independent execution of
the program code, too) is accepted practice in science.

If it were otherwise, any critic of the result could reapply the computer
program to the non-detrended data and see that the conclusion does not change.
As it stands, if the computer program and data are not published, one has to
trust the conclusions that the researchers came to. The only way for an
independent person to check the result is then to find indirect evidence that
the conclusion could be wrong. And finding out that a different data set than
the one (wrongly) claimed was used is exactly such indirect evidence (and, in
my opinion, nearly as far as one can get without access to the data and
computer programs that were used).

~~~
DasBlob
While I certainly share your sentiment that both data and analysis software
should be available, I do see a conflict of interest here that cannot be
easily resolved. On the one hand, we assert (wrongly, IMO) that competition
is a necessary element to drive scientific innovation, and consequently we
require scientific actors to adhere to economic principles, i.e. publish more
and better-received papers. On the other hand, allowing researchers to keep
their competitive advantage (and exclusive access to research data certainly
is one) contradicts the principle of reproducibility and unnecessarily
restricts further research that could be done based on that data. I don't see
a way to easily resolve that in the current scientific framework. One (less
than optimal) approach, frequently taken in astronomy, is to allow exclusive
access to telescope data to the members of the consortium or the principal
investigators for a certain amount of time, after which it is released to the
public. As for the software, it's easier: if the analysis method is
sufficiently well documented (and it should be, in any publication), people
with access to the original data can easily attempt to reproduce the results,
and either way they'll get a publication out of it. So there is no incentive
to withhold the analysis software, but there is a huge incentive to withhold
the original data.

~~~
tremon
_both data and analysis software should be available_

I think that goes too far. The data alone is sufficient for independent
verification; I don't think we should mandate that scientists publish their
tools. However, I would very much support scientists using open tools to begin
with. But if the tool is closed, that should not invalidate any results in the
paper.

~~~
elcapitan
Not sure if there is so much to support - they most likely run some archaic
Fortran code that only 5 people in the world understand, compiled for some
specific cluster, where it runs for multiple days on one dataset, and is only
numerically stable in that particular environment.

~~~
wolfgke
This is why in
[https://news.ycombinator.com/item?id=12099720](https://news.ycombinator.com/item?id=12099720)
I also suggested providing a virtual machine image where possible
(unfortunately, providing one is, in my opinion, not always possible). On the
other hand, if another team ran the code independently and we really saw that
it is numerically unstable in another environment, this would be, in my
opinion, a strong case _for_ the importance of providing the source code.

~~~
dalke
I don't think you understand what elcapitan means by numerically stable.

Numeric stability is a function of the environment. If I take an algorithm
designed for 128 bit IEEE binary floats and implement it on a system with 64
bit floats, then it may be unstable.

That doesn't say anything about the correctness of the original method for the
original environment. It only means the method wasn't designed for the new
environment.
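
To make that concrete, here is a minimal sketch (my own toy example, not from
any real climate code) of a method that is stable in one precision and
unstable in another, assuming an x86 long double with a 64-bit significand:

    #include <stdio.h>

    /* Naive one-pass variance, var = E[x^2] - E[x]^2.  With inputs
       near 1e9 the squared sums need about 60 significant bits:
       fine in x86 long double (64-bit significand), catastrophic
       cancellation in 64-bit IEEE double. */
    static double var_in_double(const double *x, int n) {
        double s = 0, s2 = 0;
        for (int i = 0; i < n; i++) { s += x[i]; s2 += x[i] * x[i]; }
        return s2 / n - (s / n) * (s / n);
    }

    static double var_in_long_double(const double *x, int n) {
        long double s = 0, s2 = 0;
        for (int i = 0; i < n; i++) { s += x[i]; s2 += (long double)x[i] * x[i]; }
        return (double)(s2 / n - (s / n) * (s / n));
    }

    int main(void) {
        double x[] = {1e9 + 1, 1e9 + 2, 1e9 + 3};  /* true variance: 2/3 */
        printf("64-bit double: %g\n", var_in_double(x, 3));      /* swamped by rounding */
        printf("long double:   %g\n", var_in_long_double(x, 3)); /* close to 2/3 */
        return 0;
    }

The same source text is a fine method in one environment and a broken one in
the other.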

You are used to a world where computers are a commodity, likely based around
the Intel architecture, and almost certainly using IEEE floats.

What do you do with a computer program written for Anton, a specialized
computer for doing molecular dynamics built using specialized ASICs?
[https://en.wikipedia.org/wiki/Anton_(computer)](https://en.wikipedia.org/wiki/Anton_\(computer\))

~~~
wolfgke
First: IEEE 754 defines exactly (bit for bit) what the result of any
floating-point operation is in each of the five rounding modes, as long as no
NaN is produced (and in that case only some of the bits are left unspecified,
but the result will still be a NaN).

Unfortunately, C allows some optimizations of floating-point code that violate
this principle of bit-for-bit reproducibility. If this causes problems, the
code can be considered numerically unstable.

In other words: writing code that adheres to the IEEE 754 "gold standard"
should be the goal for scientific results.

The correct way to allow independent checking, if we really need 128-bit
floating-point numbers, is to wrap these operations in a software
floating-point library.
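
As a toy illustration of the reproducibility point (my own example, nothing to
do with any real scientific code): floating-point addition is not associative,
so a compiler that is allowed to reassociate, e.g. gcc with -ffast-math, may
silently change results bit for bit:

    #include <stdio.h>

    int main(void) {
        /* IEEE 754 defines both results exactly, and they differ:
           floating-point addition is not associative.  A compiler
           allowed to reassociate (e.g. with -ffast-math) may turn
           one expression into the other and break bit-for-bit
           reproducibility. */
        double a = 1e16, b = -1e16, c = 1.0;
        printf("(a + b) + c = %g\n", (a + b) + c);  /* 1: exact */
        printf("a + (b + c) = %g\n", a + (b + c));  /* 0: c is absorbed into b */
        return 0;
    }

Compiled with default flags, both lines are fully determined by IEEE 754; with
-ffast-math the compiler is free to compute either grouping for both.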

> It only means the method wasn't designed for the new environment.

If this property of the algorithm (what kind of environment it depends on) is
not documented properly, together with an explanation of why the algorithm
needs that property to work correctly (say, a particular floating-point size,
especially an unusual one such as 128 bits), then an independent reviewer had
better not assume that the property holds. If we then find out that this
causes problems, that is a strong sign to me that there might be subtle errors
in the code. In other words: we should be really wary of the results that the
algorithm gave.

> What do you do with a computer program written for Anton, a specialized
> computer for doing molecular dynamics built using specialized ASICs?

I gave the answer in the post above: "[U]nluckily providing [a virtual
machine] is in my opinion not always possible".

~~~
dalke
I don't follow the point of your first four paragraphs. I specifically said
the code was numerically stable on 128 bits - why are you introducing those
irrelevant facts, like how C compiler details can make some algorithms
unstable?

Really it comes down to what you mean by "another environment" and what your
case is for access to the source code.

Let me give a simpler example. I wrote code which expects ILP32. It's stable
under ILP32. However, it produces different and non-deterministic answers
under LP64. That is "another environment."
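
For concreteness, a contrived sketch of that failure mode (not my actual code;
ILP32 means int, long, and pointers are all 32 bits, while LP64 widens long
and pointers to 64 - and it shows only the "different answers" half, not the
non-determinism):

    #include <stdio.h>

    /* A hash written assuming ILP32: the author relied on unsigned
       long arithmetic wrapping at 2^32.  Under LP64, long is 64 bits,
       the multiply no longer wraps, and the very same source code
       prints a different hash. */
    static unsigned long hash(const unsigned char *p) {
        unsigned long h = 2166136261UL;
        for (; *p; ++p)
            h = h * 16777619UL + *p;  /* wraps only when long is 32-bit */
        return h;
    }

    int main(void) {
        printf("sizeof(long) = %zu\n", sizeof(long));
        printf("hash = %lu\n", hash((const unsigned char *)"climate"));
        return 0;
    }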

Does the numerical instability under a different environment cast doubt on its
validity in the original environment? Why is it a strong sign that there may
be subtle errors?

------
jkot
> _Meanwhile, our team received a flurry of hate mail and an onslaught of
> time-consuming Freedom of Information requests for access to our raw data
> and years of our emails, in search of ammunition to undermine and discredit
> our team and results._

> _The mammoth process involved three extra rounds of peer-review and four new
> peer-reviewers. From the original submission on 3 November, 2011, to the
> paper’s re-acceptance on 26 April, 2016, the manuscript was reviewed by
> seven reviewers and two editors, underwent nine rounds of revisions, and was
> assessed a total of 21 times – not to mention the countless rounds of
> internal revisions made by our research team and data contributors. One
> reviewer even commented that we had done “a commendable, perhaps bordering
> on an insane, amount of work”._

So on one side you do this "mammoth" and "insane" task to review the paper. On
the other side you block people from doing an independent audit?

Climate studies decide how billions (if not trillions) of dollars are spent.
People should be allowed to cross-examine the raw data and program source code.

~~~
bertil
My understanding is that in climate science, a lot of effort is spent
debunking claims from well-financed groups who will publish without
independent peer review but with a clear goal in mind. Allowing those groups
access to the same data means that the researchers will have to spend a lot
more effort explaining why those claims are unfounded. That means the
researchers are not really encouraged to share the data.

There is a more general issue in science, or in any data-intensive practice
(Google ranking is another example), where the argument "people should be
allowed to cross-examine raw data" fails to acknowledge that 'people'
generally do not have the ability or know-how to do so, and those who can
generally have interested financial backing. In many cases open access is
positive (OpenStreetMap is a good example of that: my employer is interested
in identifying, say, bike parking spots), but there are cases where
de-anonymisation and competitive pressure make "sharing data" a pragmatic
question that needs more than principles.

Would more people who can contribute positively to climate science do so if
they had free access to those data? That is the case for astronomy; I am less
convinced for climate science.

~~~
jkot
Paranoid talk about "well-financed groups" has no place in science.

Climate science is far more important than astronomy. It should have much
higher standards for transparency, not the other way around.

~~~
XorNot
Get rid of copyright law, then everyone can have the data. People keep
ignoring the fact that you _can't_ just release data like that all over the
place, because researchers often do not have the IP rights to do so in a
general way (whereas within the field other researchers probably do have IP
rights to use the same data).

~~~
wolfgke
> People keep ignoring the fact that you _can't_ just release data like that
> all over the place, because researchers often do not have the IP rights to
> do so in a general way

While I agree that this is a problem in the current copyright law, there is
still a way for researchers to "circumvent" this copyright restriction without
violating the scientific principle of independent verifiability that I
outlined in
[https://news.ycombinator.com/item?id=12099720](https://news.ycombinator.com/item?id=12099720):

Let's say the data is generally distributed by the owner in a standard format,
say, a zip archive, which we will call foo.zip here (for simplicity). If you
can't distribute the data itself, you write into your research paper a short
explanation of the license terms that disallow the distribution of foo.zip.
But you also publish the SHA256 sum of foo.zip. If some person who also has
access to the data (say, at another university) has an interest in the
research result being independently checkable, they can leak foo.zip somewhere
(I can imagine that Sci-Hub would be willing to provide server space for
this).

This way access for independent verifiers can be "assured" without the
original researchers having to violate copyright law.
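
Computing the fingerprint is the easy part: `sha256sum foo.zip` on any Linux
machine, or, as a sketch in C using OpenSSL's EVP interface (compile with
-lcrypto):

    #include <stdio.h>
    #include <openssl/evp.h>

    /* Print the SHA-256 digest of foo.zip, for inclusion in a paper. */
    int main(void) {
        FILE *f = fopen("foo.zip", "rb");
        if (!f) { perror("foo.zip"); return 1; }

        EVP_MD_CTX *ctx = EVP_MD_CTX_new();
        EVP_DigestInit_ex(ctx, EVP_sha256(), NULL);

        unsigned char buf[4096];
        size_t n;
        while ((n = fread(buf, 1, sizeof buf, f)) > 0)
            EVP_DigestUpdate(ctx, buf, n);      /* hash the file in chunks */

        unsigned char md[EVP_MAX_MD_SIZE];
        unsigned int len;
        EVP_DigestFinal_ex(ctx, md, &len);
        EVP_MD_CTX_free(ctx);
        fclose(f);

        for (unsigned int i = 0; i < len; i++)
            printf("%02x", md[i]);
        printf("  foo.zip\n");
        return 0;
    }

Anyone holding a bit-identical foo.zip reproduces the same 64 hex digits,
which is all the paper needs to assert.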

------
frankmcsherry
I'm personally inclined to support the climate change folks, being a scientist
and stuff, but it's a bit crap to claim that it was a one word typo when the
original text apparently read:

> For predictor selection, both proxy climate and instrumental data were
> linearly detrended over the 1921–1990 period to avoid inflating the
> correlation coefficient due to the presence of the global warming signal
> present in the observed temperature record. Only records that were
> significantly (p < 0.05) correlated with the detrended instrumental
> target over the 1921–1990 period were selected for analysis.

That isn't a typo, it is just being wrong. I'm glad they re-did the work to
confirm it was still true, but it's hard to argue with folks who didn't think
the claim was supported by the data.

~~~
jasonellis
The one word typo was in the code, not the paper. It was setting DETREND to
FALSE instead of TRUE. Their assertions were based on thinking they were
working with detrended data instead of the raw data that they actually used.
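
To make the shape of the bug concrete, here is a hypothetical sketch (not the
authors' actual program): with one mis-set flag, every downstream calculation
silently runs on raw instead of detrended data.

    #include <stdbool.h>
    #include <stdio.h>

    #define DETREND false   /* the one-word bug: should be true */

    /* Remove a least-squares linear fit y = a + b*x (x = 0..n-1). */
    static void detrend(double *y, int n) {
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            sx += i; sy += y[i]; sxx += (double)i * i; sxy += i * y[i];
        }
        double b = (n * sxy - sx * sy) / (n * sxx - sx * sx);
        double a = (sy - b * sx) / n;
        for (int i = 0; i < n; i++)
            y[i] -= a + b * i;
    }

    int main(void) {
        double series[] = {0.1, 0.3, 0.2, 0.5, 0.6, 0.9};
        int n = (int)(sizeof series / sizeof *series);
        if (DETREND)              /* false: the raw series passes through */
            detrend(series, n);
        /* ...predictor screening would now correlate against a series
           that still contains the warming trend... */
        for (int i = 0; i < n; i++)
            printf("%g\n", series[i]);
        return 0;
    }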

~~~
frankmcsherry
In which case, one shouldn't write

> Instead of taking the easy way out and just correcting the single word in
> the page proof, we asked the publisher to put our paper on hold

------
dzdt
Global warming denialists are attempting a denial-of-service attack against
global warming science. The goal is to keep the science unsettled so no major
policy changes are made while those making the most profits from climate-
changing activities continue to do so.

~~~
socialist_coder
Yup; just like tobacco, acid rain, DDT, and the hole in the ozone layer. Very,
very sad. It shows you how powerful greed is, when people are willing to
sacrifice human lives and even the future of our planet just to maximize
profits.

This is one of the main failures of our capitalist economy, IMO. The system
needs a built-in way to assign a value to sustainability and human well-being,
rather than just profits.

------
barney54
For another view of this issue:
[https://climateaudit.org/tag/gergis/](https://climateaudit.org/tag/gergis/)

------
patrickg_zill
The general observation is that neither the mathematical models nor the data
are perfect.

For example, oceanic thermocline data, from what I understand, is not really
collected.

We have an imperfect model being fed incomplete data? Why be surprised at
debate over it?

People laugh today at ether and phlogiston debates, but such acrimonious
debate pushed forward science to the point where we had exactly the right
answer.

~~~
sxcurry
The problem is that we're not really having a debate, as this article makes
clear. We are seeing scientists attacked by two groups: climate deniers funded
by large corporate masters, and random groups of crackpots. Real debate would
be good, but I think we're well past that point.

------
SeanDav
No idea why this is a surprising, even controversial, result. We are on the
warming side of a recent (geologically speaking) ice age (actually a glacial
period), so we would expect long-term temperatures to be trending ever
upwards.

~~~
phaemon
No we're not. The complete opposite in fact: the planet would naturally be
cooling now were it not for the increased amount of greenhouse gases in the
atmosphere.

~~~
SeanDav
> _" Technically speaking, we’re living during an ‘Ice Age’ today – in the
> sense that we have a world with glaciers. However, the present time is a
> relatively warm phase within a period of geological time when glaciers and
> ice sheets have typically been larger and more extensive than now."_

Taken from:
[http://www.rgs.org/OurWork/Schools/Teaching+resources/Key+St...](http://www.rgs.org/OurWork/Schools/Teaching+resources/Key+Stage+3+resources/Glaciation+and+geological+timescales/Ice+Ages+and+geological+timescales.htm)

There are other sources, but this is one example that makes it clear we have
been in a warming phase for the last few thousand years.

~~~
a2l
Also from the same site:

> "Since the industrial revolution in the 18th century, the concentration of
> carbon dioxide in the atmosphere (an important greenhouse gas) has risen
> significantly above the level it would be naturally. It now stands at over
> 400 parts per million (ppm) in the atmosphere. Evidence from ice cores (see
> Lesson four) tells us that the normal level for CO2 in the atmosphere during
> ‘interglacial’ times (such as the Holocene in which we now live) is 270 to
> 290 ppm, and that at no time over the last 800,000 years (the time covered by ice
> cores) has the CO2 level been as high as it is now. (CO2 is thought to be at
> its highest level for three to five million years.)"

[http://www.rgs.org/OurWork/Schools/Teaching+resources/Key+St...](http://www.rgs.org/OurWork/Schools/Teaching+resources/Key+Stage+3+resources/Glaciation+and+geological+timescales/The+Ice+Age+postponed+-+Impacts+of+melting+ice+in+a+warming+world.htm)

~~~
SeanDav
You misunderstand me - I am not talking about man-made climate change. I am
saying that having data which shows current temperatures are the highest in
1000 years is not surprising, given that temperatures have been rising for the
last several thousand years.

If one had a time machine and went back 1000 years, took temperatures and
compared them to those from the previous thousand years, one would not be
surprised to see that on average they were higher.

~~~
socialist_coder
No one is misunderstanding you, you are just wrong.

Even if you were right and we were in a "warming" phase, the velocity of the
change should be on a geological scale; that is, very, very tiny. We are
seeing rates of warming orders of magnitude higher, which cannot possibly be
anything but man-made.

~~~
SeanDav
At no point have I mentioned scale of change or rate of change. That is
irrelevant to the point I am making: that during a warming phase, average
temperatures will be the highest they have ever been at the end of the time
period under consideration. So a result confirming that is hardly news.

4000 years ago there were still mammoths and far greater glaciation than now,
so clearly the climate has been warming over the last few thousand years.

~~~
SamPhillips
Sure. But if someone is saying "the house is on fire" and you reply, "Ah, yes,
clearly we should expect it to be getting warmer; after all, it's springtime
now", it's not much of an addition to the conversation.

