
Explaining to BBC Newsnight some of the problems with the CRU code - jgrahamc
http://news.bbc.co.uk/1/hi/programmes/newsnight/8395514.stm
======
motters
Having source code reviews via Newsnight is the ultimate confirmation in my
mind that all of this should have been open source from the beginning. It
would be trivial, and cost CRU nothing, to set up an open source project which
would render scandals such as this impossible.

A subject like global warming is potentially of such high importance that it
shouldn't be left to cowboy programmers to process the data. I could have said
"amateur" programmers, but there are many amateurs out there who produce very
high quality code.

~~~
cabalamat
All software used in science should be open source, because if it isn't, the
science isn't repeatable.

~~~
gloob
Not true. For instance, if a report is originally typed in a closed-source,
commercial text editor, that has no impact on the repeatability of the
science. Similarly, Mathematica is (to my knowledge) a closed-source program,
but calculations done by it can still be independently verified (so long as
those calculations are made public).

You're on the right track, but need to narrow down the wording some.

~~~
cabalamat
> You're on the right track, but need to narrow down the wording some.

You're right. I was actually thinking that as I typed it.

How about this: when scientists take raw input data and use complex processes
to convert it (by complex I mean more than you could do with a pocket
calculator in a minute), then the programs doing the conversion must be open
source, so that others may repeat the work done in processing.

I'm not familiar with Mathematica; how complex is the work that it does? And
how easy is it to verify that its results are correct?

------
tome
Great presentation, John. You made your points very clearly, and it should be
very helpful for the non-experts watching.

------
jgrahamc
The only thing they got wrong was where I said I was 'shocked'. I'm not
shocked, I'm surprised, and disappointed by the quality of the code.

~~~
tome
No, I'm not shocked either. Mathematicians and scientists have no formal
training in programming (at least in the UK). They're expected just to pick it
up and get on with it.

They don't generally put in the necessary hours like the "self-taught
programmer" does, because they're generally not interested in programming
itself. It's just a calculational annoyance that gets in the way.

~~~
Silhouette
> Mathematicians and scientists have no formal training in programming (at
> least in the UK). They're expected just to pick it up and get on with it.

That's an unreasonable generalisation.

For one thing, many university courses in science and mathematics do seem to
incorporate basic programming training these days, either as part of the
course, or via other facilities that the university makes available to those
taking such courses. Any programming skills acquired may or may not contribute
directly to the grade of degree awarded, but that is a different issue.

For another thing, I can attest that some scientists do make an effort,
because I have personally been paid by PhD students to tutor them in the
appropriate programming techniques to implement their models soundly.

~~~
tome
I'm talking from my experience of attending the most highly regarded
mathematics undergraduate programme in the UK. There was basic programming
training, but the emphasis was on the "basic". There was absolutely no
emphasis placed on writing maintainable code. We (in fact "they": I didn't
attend the classes) were simply taught some technical details of C. In fact I
think the number of teaching hours was in single figures.

Now I can't speak for the engineers at the same university. As far as I know
they were given excellent training in C++. But simply, the mathematicians did
not take programming seriously, so I'm not surprised by the low quality of the
implemented models.

~~~
Silhouette
> I'm talking from my experience of attending the most highly regarded
> mathematics undergraduate programme in the UK.

That's a bold claim. Which undergraduate programme would that be, and by whom
is it the most highly regarded in the UK?

~~~
pmjordan
I don't know which university the parent is referring to, but I can confirm
that this was also the situation at the University of York, a top-10 UK
university, at least at the time (90%+ of students there were people who hadn't
quite made it into the top 2, hence the nicknames "University of Dork" and
"University of Oxford/Cambridge Rejects").

The problem existed not only in mathematics, where I ended up giving extra
programming classes to friends because the teaching was practically
non-existent, but also in physics, including the computer simulation and
theoretical branches of physics, which I happened to be studying. In fact, we
were only taught rudimentary Fortran (!) syntax, and, laughably, OpenMP and MPI
later on. Nobody who made it through the course alive did so without having
either learned to program earlier or invested a large amount of time in
self-study and then "winging it" - after all, the final project depended on
being able to deliver a simulation of whatever physical system was being
investigated.

I started programming when I was 10, so I had no trouble; together with a
colleague who had a similar programming history to mine, we introduced the rest
of our year to basic good programming practices and workflows (split your
functionality into stand-alone functions and files; use version control; avoid
global variables; the merits of code reviews; etc.). A lot of people had
already switched to experimental physics before we started doing this.
Obviously, expecting people to understand concurrent programming at that level
is pretty dangerous. In my experience, this sort of attitude breeds a view
that computer programs are magical things, and once they compile and don't
crash or hang, you can trust their results. This exact attitude is what I'm
reminded of when I read about this CRU incident.

I don't think any of the staff ever took the issue seriously, though. Even if
they weren't prepared to teach it, they should have told everyone to teach
themselves over the holidays and provided exercises or whatever. At no point
did anyone point out that programming was a difficult craft, and that mastery
of it would be expected.

For context to those unfamiliar with the majority of UK science degree
programmes: they are typically not as modular or flexible as in other
countries. If you failed a module, you had one chance to re-sit the exam a few
weeks later. There was no option of re-taking the module including lectures
during the next term/semester; each term had a pre-assigned set of modules
with no flexibility in when they could be taken, so you couldn't defer
your concurrency project until you'd taken some more programming classes or
so.

~~~
tome
_split your functionality into stand-alone functions and files; use version
control; avoid global variables; the merits of code reviews_

Yes, these are exactly the kinds of things I'm talking about that I have not
seen taught in undergraduate mathematics programmes, but they are _vital_ to
writing good quality scientific code.

_At no point did anyone point out that programming was a difficult craft, and
that mastery of it would be expected_

It's not only difficult: even experienced programmers introduce serious bugs by
accident. Often they only find them because they do extensive testing! Scientific
code is rarely tested to that degree.

------
Luyt
The release of the CRU materials has now also caused Al Gore to cancel his talk
at Copenhagen. [http://www.examiner.com/x-11224-Baltimore-Weather-Examiner~y...](http://www.examiner.com/x-11224-Baltimore-Weather-Examiner~y2009m12d3-Climategate-emails-force-Al-Gore-to-cancel-talk-at-Copenhagen)

~~~
motters
An inconvenient cancellation.

~~~
dantheman
Responses like this, while quite popular on reddit, hurt the overall quality
of HN.

------
cdavid
This has to be the stupidest criticism I have seen yet in this whole affair. In
particular, complaining that the code is not commercial quality is laughable.

Most of the scientific code is awful, in climate science and elsewhere,
because the constraints are totally different from those on commercial code.
Maintainability is not so much a concern, and the code is only a tool to obtain
a result, not an end in itself. People don't care about the code quality most of
the time.

There is another concern, which is that you rarely know exactly what you want
when programming for research. You don't always have the time to design a
reasonable API, because the requirements keep changing - and keep in mind that
you cannot spend too much time on this because you have to write papers, do
some actual research, etc...

Even in my field, which is engineering and where people are expected to be
more "programming methods-aware", most of the code is awful. My own code for
research is awfully bad compared to what I release as open source for
scientific programming: when I release something, I easily spend several times
as much effort on it, because of documentation issues, etc... Doing this
for all my code is simply not possible.

If you want to dismiss climate science because of code quality, be ready to
throw away the majority of science, from biology to AI through EE. Just look
at the LAPACK code: the code is awful, using goto instead of structured control
flow, etc... (e.g. <http://www.netlib.org/lapack/double/dsgesv.f>). And it is
used by 99% of code which does linear algebra, including commercial software.

~~~
AndrewO
That's a pretty cavalier attitude to have when so much depends on the results
(and I mean research programming in general, not just the CRU case).
Maintainability may not be a concern, but verifiability certainly should be.

~~~
cdavid
Verifiability in science is not a matter of having one program checked by
different people. The reality is much more complex than that. This so-called
Climategate says more about the perception of research than about climate
science (although it has certainly become a PR nightmare for climate science).
I think it is just another case of people having an idealistic, unrealistic
idea of how science works. Verifiability is less of a concern in research than
most people think. In an ideal world it would be, but that isn't how science
has been done - I would argue it has never been the case.

Scientists are actually very conservative for the most part, exactly for this
reason: in practice, it is just not possible to rely on consistent
repeatability, etc... When you have an established, widely shared view on a
topic, it takes great leaps to break it. Most research is not done to break
the consensus, but to reinforce it. That's why today, if you want to show that
climate change is not caused by human activity, you will need to present data
that are not merely as good as the data we have now, but much, much better. So
it is most likely true that if you want grants, etc... the easy way, going
along with the consensus is the right path.

That may not be how people think science should be done, but that's how it is
done in practice in my experience. I think that for most researchers, all this
affair sounds like a storm in a teacup. The behavior may look shady at times,
and the experiments are not always great, but that's exactly why, for a new
consensus to be created, you need better proof than the existing consensus has. And
yes, politics come into play, yes a lot of it is mean and some of it is not
ethical. But this does not invalidate science in general - it invalidates how
people think science works.

------
gsiener
Is this code available anywhere? Is there an effort to embrace and improve
this codebase?

