
Why we can ignore reviews of scientific code by commercial software developers - brunojppb
https://philbull.wordpress.com/2020/05/10/why-you-can-ignore-reviews-of-scientific-code-by-commercial-software-developers/
======
matthewdgreen
I read the critique in depth. Let me summarize the claims:

1. The developers had a couple of bugs that cause non-deterministic outputs
in weird settings. It's a randomized model so this isn't a disaster, but
obviously it would be better to have all non-determinism derived from a single
seed. However, none of the bugs point to scientific issues. Still, these are
real bugs that should be fixed. And indeed: another team pointed out some
issues, they triaged the report, and it appears that they've been fixing these
issues.
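
(For context, "all non-determinism derived from a single seed" is straightforward to arrange in most languages. A minimal Python sketch of the idea, with illustrative names not taken from the actual codebase:)

```python
import numpy as np

def make_generators(master_seed: int, n_components: int):
    """Derive one independent RNG per model component from a single seed.

    SeedSequence.spawn() produces statistically independent child streams,
    so every source of randomness in the simulation traces back to
    master_seed and runs are bit-for-bit reproducible.
    """
    root = np.random.SeedSequence(master_seed)
    return [np.random.default_rng(s) for s in root.spawn(n_components)]

# Two runs with the same master seed produce identical draws.
gens_a = make_generators(42, 3)
gens_b = make_generators(42, 3)
draws_a = [g.random() for g in gens_a]
draws_b = [g.random() for g in gens_b]
assert draws_a == draws_b
```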

2. The code doesn't run well multi-threaded, and so the scientists run it on
a single core. This seems like a feature the developers have de-prioritized.
The critics make a HUGE deal about how using a single core is inefficient, but
at the end of the day: if scientists can get simulations done in a reasonable
time using a single core, this seems like a valid prioritization of limited
development effort. The critics know better, obviously.

3. There aren't a bunch of unit tests. But as the authors point out, the code
is in development and being altered continuously to reflect new developments
in society and new knowledge about the virus, so it seems hard to develop unit
tests that won't break as the underlying components change.

4. The critics seem to have a strong grounding in epidemiology. I'm just
kidding, they claim to be pure software engineers, but still insist on
spending a significant fraction of their critique disputing the calculation of
R0 and lecturing the epidemiologists on how they think it should be
calculated. This is a giant red flag given that the critics' sole stated
expertise is software engineering.

5. The developers are continuing to add features to simulate new aspects of
the ongoing pandemic (e.g., new code to simulate the impact of contact tracing
apps) instead of fixing every minor issue in the code. Given that we're in the
midst of a deadly epidemic that is putting human beings into the ground, this
seems like a completely unavoidable resource allocation decision to me.

6. The critics can't even be bothered to pretend their criticism is
objective. E.g., the piece concludes with "on a personal level, I’d go further
and suggest that all academic epidemiology be defunded." Yeah, sure, you're a
totally disinterested software expert here.

I guarantee you if you look at any piece of software -- commercial or
scientific, no matter the budget -- with the politically-motivated goal of
tearing apart the author, you will succeed. I can say this having spent years
doing security and expert witness code reviews for some of the biggest and
smallest companies in the software industry. This piece isn't a critique, it's
just a hatchet job. And not even a particularly creative one.

~~~
thu2111
What is point 6 meant to mean? The author concludes that the code is unfit for
purpose, because it was written by academics who ultimately have no incentive
to write high quality code even when it's being used to make massive safety-
critical decisions.

This is hardly an unexpected conclusion given articles like this one and
comments like yours, in which people defend buggy code on the grounds that
academics shouldn't even be expected to produce work that isn't buggy.
Academics are claiming they structurally can't produce correct, well
documented code, not even when it's being used to advise governments.

Hence the conclusion that maybe academics shouldn't be writing that sort of
code: how is that not an obvious inference to draw? You seem to believe that
any criticism of academia is "politically motivated". It's the same attitude
that can be seen in Imperial College's non-answer to the problems: you can't
state the private sector has higher standards than academia despite
overwhelming evidence it does, because then you're partisan and should be
automatically ignored?

You're also an academic, aren't you? Surely that means you're the politically
biased one here. How could it be otherwise?

~~~
lakis

The suggestion was not that academics should not write this kind of code. The
suggestion was:

"suggest that all academic epidemiology be defunded"

Not "do not write bad code", but that all epidemiology be defunded. So who
will do epidemiology studies and develop models of deadly diseases? The
software engineers? ;-)

------
Silhouette
The sheer arrogance displayed in this article is breathtaking. Scientists are
so super-smart that not only are they experts on the science, they are also
better at the entirely orthogonal skill of programming than _the entire
programming industry_!

The results from using this software have directly influenced political
decisions that have resulted in _the greatest mass curtailment of personal
liberty in modern history_, not to mention the rest of the response to a
public health situation that is going to cost hundreds of thousands of lives
around the world.

The idea that code that demonstrably contains bugs and exhibits non-
deterministic behaviour, written in a style that has proven time and again to
be resistant to mechanical testing, formal verification or expert peer review,
should be defended on the grounds that none of this matters because it was
written by _scientists_ is... I don't even have strong enough words. It's
laughable. It's patently absurd.

Sorry for the slightly ranty comment, but I am genuinely furious that people
are trying to defend this work in this way. It is insulting to me as a
professional software developer who has also worked on many scientific and
engineering programs over the years. It is the antithesis of good scientific
practice, where methods are transparently disclosed and peer review and
attempts to reproduce (or fail to reproduce) important results are actively
encouraged. And it is downright offensive to me as someone who can't look
after close family members right now because laws made in no small part
because of this software prevent it.

------
preordained
So, I looked at the article they are refuting (or one of them?) and expected a
bunch of superficial criticism as suggested in this refutation, but it does
not look to me at all that "superficial" criticism is being defended against:
[https://lockdownsceptics.org/code-review-of-fergusons-
model/](https://lockdownsceptics.org/code-review-of-fergusons-model/)

Swallow your politics if you can for a moment and read. At a minimum it seems
that the rebuttal here was being argued in very bad faith. Given the stage
set, I was expecting bikeshedding about the Law of Demeter or JSON vs. other
formats, etc...

------
VMG
This is just a list of excuses.

Following good practices has nothing to do with whether it is "commercial
software" or "front-end software", as the author claims.

~~~
mekane8
Agreed, especially the stuff about end users. Of course you want to prevent
bugs that will impact users, but many of the other things being discussed have
nothing to do with whether there is a non-technical person using the software.

------
returningfory
The article says:

> While these checks could also be handled by unit tests, most scientists
> generally just end up with their own weird set of ad hoc test outputs and
> print statements. It’s ugly, and not infallible, but it tends to work well
> given the intensive nature of our result-testing behaviour and community
> cross-checking.

The sense in which this statement is plainly wrong is actually addressed in
the original article:

> Regressions like that are common when working on a complex piece of
> software, which is why industrial software-engineering teams write automated
> regression tests.

This is the whole point. You can only run your "ad hoc" print-statement-based
tests as one-offs, which is why such "tests" are useless for catching
regressions.
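
(To make the distinction concrete, here is a minimal regression-test sketch in
pytest style. The model function and numbers are illustrative, not from the
Imperial codebase: the point is that a fixed seed plus a recorded "golden"
output turns a one-off print-statement check into a test that reruns on every
change.)

```python
import random

def simulate_peak_infections(r0: float, seed: int) -> float:
    """Toy stand-in for a stochastic model run (illustrative only)."""
    rng = random.Random(seed)
    return r0 * 1000 * (0.9 + 0.2 * rng.random())

def test_peak_infections_regression():
    # Golden value recorded from a known-good run with seed=1.
    expected = simulate_peak_infections(2.4, seed=1)
    # Any later refactor must reproduce it exactly for the same seed;
    # if it doesn't, the test suite flags the regression automatically.
    assert simulate_peak_infections(2.4, seed=1) == expected
```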

------
unsrsly
This debate is sort of in the weeds. The models are not that important. We
have plenty of real-world evidence to be very concerned about Covid19. The
following facts are not contradictory:

1. The code is bad, maybe even unreliable.

2. Scientific software should adhere to best practices for software
development.

3. At a bare minimum, complex simulation software should be able to generate
correct answers for well-understood toy models.

4. Covid19 is a dangerous disease that can spread exponentially and kill a
lot of people if mitigation measures are not taken (look at Wuhan, Lombardy,
NYC).
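
(The "toy model" check is cheap to state. A discrete SIR model has simple
invariants, such as population conservation and no transmission when the
contact rate is zero, that any full epidemic simulator should reproduce. A
minimal sketch, with illustrative parameter names:)

```python
def sir(beta: float, gamma: float, s0: float, i0: float, steps: int):
    """Discrete-time SIR model: susceptible, infected, recovered."""
    s, i, r = s0, i0, 0.0
    n = s0 + i0
    for _ in range(steps):
        new_inf = beta * s * i / n   # new infections this step
        new_rec = gamma * i          # new recoveries this step
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
    return s, i, r

s, i, r = sir(beta=0.3, gamma=0.1, s0=999, i0=1, steps=500)
assert abs((s + i + r) - 1000) < 1e-6   # population is conserved

s2, i2, r2 = sir(beta=0.0, gamma=0.1, s0=999, i0=1, steps=500)
assert abs(s2 - 999) < 1e-9             # beta=0 means no new infections
```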

------
matthewdgreen
Scientific code is no different than prototypes of commercial products. The
goal is to iterate quickly, rather than to make a perfect releasable product.
Take a look at the early internal sketches of the iPhone and iPad and you'll
see they look massively less elegant than commercial products: but nobody
writes breathless articles about how Apple Does Not Know How To Design
Consumer Products because of it.

~~~
electrograv
I think the problem here is when buggy “science code” / “prototype code” is
used to inform life-and-death decisions made in the real world.

It is therefore rather irresponsible of the scientific community to allow such
a bad state of code technical debt to live on for so long, when it’s dealing
with a field that is inherently tied to the study of life-or-death scenarios
(like we are facing right now).

The only way to avoid this inevitable conclusion of irresponsibility would be
if the scientists behind the model had warned that it cannot be trusted at
all due to the potential for a large number of unknown bugs (did they?).
Otherwise, either the scientists are irresponsibly incapable of understanding
how to write reliable complex code, or they were not honest about the
trustworthiness of their code.

~~~
matthewdgreen
We're in the middle of a rapidly-evolving epidemic that is killing thousands
of people per day. Leaders need information, even imperfect information right
now. There is no time to hire a team of production engineers so they can
release a slightly more stable multi-core version 1.0 in 2023.

If there is ever an asteroid heading towards the earth, I swear that some
segment of HN will write angry articles about how the hastily built nuclear-
powered spaceship we sent to intercept it doesn't have proper unit tests.
They'll demand that we delay the launch until 300 days after impact.

~~~
Silhouette
_We're in the middle of a rapidly-evolving epidemic that is killing thousands
of people per day. Leaders need information, even imperfect information right
now. There is no time to hire a team of production engineers so they can
release a slightly more stable multi-core version 1.0 in 2023._

We knew this was coming for months. The code isn't that large, it's just
horribly written. If the scientists had worked in collaboration with an expert
team of technical programmers, they could have handed off their mathematical
models and had a well implemented, extensively verified program to run those
models within a useful timeframe.

That might sound extreme, but once again, I will reiterate that expert advice
apparently based in part on the output from this program is being used to
inform decisions that will affect whether thousands live or die and that are
severely curtailing the normal lives of the entire population with other
adverse consequences that nobody fully understands yet. Powerful responses
require powerful justifications.

~~~
matthewdgreen
> If the scientists had worked in collaboration with an expert team of
> technical programmers

If you look at the original article, you'll note that for at least a month the
code has been receiving attention and refactoring from a team at Microsoft.
Your criticism is that an academic team should have somehow convinced
Microsoft to donate these resources _before_ the epidemic became severe.

I'm going to go way out on the thinnest limb and propose that major software
companies were not chomping at the bit to donate resources to epidemic
modeling teams until very recently, and this isn't the fault of the epidemic
modeling teams -- as much as you seem to think it is.

~~~
Silhouette
To clarify, I do not believe any of this is the fault of the academics who
were operating on shoestring budgets before the pandemic became the big issue
that it now is (edit: unless they wilfully misrepresented their capabilities I
suppose). I do think however that the government, with several months of
advance warning that this issue could become very important and having
demonstrably identified key advisors including those academics, failed to
provide more than a shoestring budget to upgrade the capabilities of their
expert team in such an obvious way at a much earlier stage.

I'm also curious about why they brought in the people they did to do whatever
cleaning up was done before publishing this code. There are several businesses
in the UK that do this kind of work as a major part of their activity in one
way or another. Current or former staff with that sort of background would
have been the obvious people to tap for assistance. To the best of my
knowledge, the people who have actually been identified as having helped have
no such specialist background.

------
ken
> A software developer will care more about maintainability and end-user
> experience than a scientific coder, who will likely prize flexibility and
> control instead. Importantly, this means that programming patterns and norms
> that work for one may not work for the other — exhortations to keep code
> simple, to remove cruft, to limit the number of parameters and settings,
> might actually interfere with the intended applications of a scientific code
> for example.

Granted, but when _publishing_, these need to be primary concerns. Source
code, then, is a communications medium.

Many famous scientists like Einstein and Feynman would do their work in their
heads, not by filling blackboards with equations. When they needed to publish,
they'd write down equations because that is the medium of communications. You
make it as simple as possible, so people can follow you. E=mc^2 is brilliant
_because_ it is so simple. Maxwell's equations got a lot more traction once
they were simplified down to just 4.

Scientists using computers for research or analysis are welcome to use any
messy code they want, but if they wish for others to follow their work, they
need to clean it up to the standards of the industry.

It was beat into my head from a young age that research is only part of being
a scientist. You also need to be able to communicate your ideas -- hence all
of the writing class requirements. Scientists need to take this to heart, and
if they wish to publish in terms of source code, to write source code that
others can read.

When even professionals in the field have trouble verifying your work, you've
got a problem. It's been a while since I worked in scientific computing, but I
never saw a single published paper with an attached program where the program
even matched the description in the paper.

~~~
pvg
There's no quality of communication that can magically circumvent a motivated-
reasoning sort of critique. That's why this is an issue with the critique and
not the other way round.

As a trivial historical example - an entire world superpower regularly and
persistently engaged in and acted on large-scale unscientific motivated
reasoning _against its own self-interest_ and the problem with that was not
that the best scientifically verifiable ideas available were inadequately
communicated.

------
gammarator
Whether the Imperial code is good or bad doesn't actually matter: you can
derive the headline numbers that motivated the lockdowns analytically, without
any simulation at all [1].

(The real problem with the worst case estimate is that it assumes people don't
individually change their behavior in the face of a pandemic.)

[1]
[https://twitter.com/trvrb/status/1258879531022082049](https://twitter.com/trvrb/status/1258879531022082049)

------
zwieback
In my time in research I've come across this contemptuous attitude scientists
have for coders and engineers again and again. It's just arrogance and
insecurity mixed together and has no place in today's research environments
where results depend more and more on complex models.

------
wcoenen
This reminds me of those code reviews where you have a colleague that prefers
to spend hours trying to refute reasonable minor comments, instead of 5
minutes to address them.

~~~
lostmsu
> refute reasonable minor comments

If they are truly minor, they should not block the change.

I am guessing the colleague is trying to refute their reasonableness, which
totally makes sense, because if in every review they have to fix minor
unreasonable comments to progress, that stalls things completely.

------
guscost
We should probably ignore most "scientific code" altogether. The jig is up.

[https://cacm.acm.org/magazines/2010/9/98038-science-has-
only...](https://cacm.acm.org/magazines/2010/9/98038-science-has-only-two-
legs/)

[https://guscost.com/2020/05/12/pandemic-
woo/](https://guscost.com/2020/05/12/pandemic-woo/)

------
turbine29
This is just a list of excuses. Yes, there are different standards for
scientific code compared to a website, but any number that is used for a
decision should be up to a standard, including unit tests, readability and
cleanliness. If it's just a model and 'doesn't need these', then it should
remain in the lab and far from policy decisions.

------
techopoly
I dislike this philosophy, that a person in one field of expertise can have
nothing constructive to say about another, or can't learn it.

Reminds me of "don't roll your own cryptography." Well, all the standard
cryptographic algorithms were rolled by someone, and they had to have thrown
out their first pancake too.

~~~
turbine29
Especially when there is such a big overlap between software engineering and
scientific computing. Not all software engineers write to-do apps in angular

------
OneGuy123
Garbage article, quote from it:

"They aren’t worried about it — not because this is a disaster they are trying
to cover up, but because this is a routine bug that doesn’t really affect
anything important."

If you have a bug you can never know the side effects that it can be causing.

So neither the article author nor the guys who wrote the covid sim understand
complex systems.

It would do them much good to read a few works by Taleb. Their problem is that
they don't know what they don't know, they think they know what they don't
know, but they don't.

Also, saying that it's "ok that scientific code is crappily
formatted/structured" clearly demonstrates that neither these people nor the
author of the article truly understands, intuitively, how quickly things can
(and always do) go wrong with programs, because they are complex beasts.

This reads as an academic without truly intuitive knowledge of how things can
go badly defending other academics.

If a product is meant to put millions into quarantine it must be damn well
tested, not a quick demo playground.

A good product can obviously evolve from a quick academic playground, but
academics so rarely finish a product fully that they don't understand that
unless they test it thoroughly it will have bugs.

~~~
matthewdgreen
Most scientific software developers have been developing and working with
enormously complex codebases for years. You're clearly starting from the
perspective that they're eggheads who have no experience with this area; that
assumption is almost certainly wrong.

~~~
OneGuy123
"Working with an enormous codebase" and "actually ensuring it works" are two
very different things.

Academic "scientific software developers" (lol) never have to prove anything
works in production long term, in the majority of cases. They make a one-off
project and never touch it again. So this generalization stands.

And yes, I am saying that they are eggheads.

Because this article and the covid sim clearly shows that "scientific software
developers" (lol) must not be trusted with real world-affecting systems
without some other real world developer checking it.

The fact is that the covid sim is garbage and anyone who tries to defend it
does not understand programming and complex systems well enough.

~~~
matthewdgreen
Scientists have to undergo extremely rigorous peer review from other experts
who are massively incentivized to challenge them and show that their results
are wrong. The individual code may not be reviewed, but spurious results will
be noticed and called out.

The closest thing we have to this in industry is bug reporting and software
security evaluation, and this is sporadic at best. Most commercial software
developers keep their code closed source and do everything they can to stifle
public knowledge of bugs, even after they've been fixed.

~~~
thu2111
Why are you defending this work Matthew? You have your own reputation to
defend as well.

This code has never been peer reviewed, ever. That is indisputable because
Ferguson has claimed multiple times that the code is/was entirely
undocumented, and nobody outside him and his few colleagues understands how it
works. The moment it started being _actually_ reviewed, people found a
breathtaking number of bugs: out-of-bounds reads, typos in random number
generator constants, etc. Broken RNGs! Of all the things I'd hope an expert
in cryptography would be worried by, a missing digit in an RNG constant
should surely be one of them?

