
To software engineers criticizing Neil Ferguson’s epidemics simulation code - todsacerdoti
http://blog.khinsen.net/posts/2020/05/18/an-open-letter-to-software-engineers-criticizing-neil-ferguson-s-epidemics-simulation-code/
======
tkiley
I poked at the github repo for a bit. The ugliness of the code doesn't bother
me, but the quantity of parameters does.

Here's one params file that specifies some of the inputs to a run of the
model:

[https://github.com/mrc-ide/covid-sim/blob/master/data/param_files/p_PC7_CI_HQ_SD.txt](https://github.com/mrc-ide/covid-sim/blob/master/data/param_files/p_PC7_CI_HQ_SD.txt)

Here's another one:

[https://github.com/mrc-ide/covid-sim/blob/master/data/admin_units/United_States_admin.txt](https://github.com/mrc-ide/covid-sim/blob/master/data/admin_units/United_States_admin.txt)

There are hundreds of constants in there. A lot of them appear to be wild-ass
guesses. Presumably, all of them affect the output of the model in some way.

When a model has enough parameters for which you can make unsubstantiated
guesses, you have a _ton_ of wiggle room to generate whatever particular
output you want. I'd like to see policy and public discussion focus more on
the key parameters (R-naught, hospitalization rate, fatality rate) and less on
overly-sophisticated models.

~~~
wbhart
The problem is, unsophisticated models do not predict anything. You apply them
in one country and they do ok, and apply them in another and they get it
totally and completely wrong.

Unless all important factors are accounted for, they are going to result in
incorrect information for someone. Public policy will then be based on
incorrect predictions. People will grow tired of the predictions being wrong
and they'll give up on data science entirely.

It's already quite bad that people think they can choose their reality by
finding numbers that agree with them and ignoring the ones that don't.

I do understand the point you are making, which is like the epicycles
argument. But in global warming and epidemics alike, more parameters are
actually needed to model reality.

I do agree, though, that those parameters should be based on actual data, not
guesses. But what value of R would you pick? Is that actually well-constrained?

~~~
datastoat
I would pick a value of R that shows itself to have good predictive accuracy.

The way to test predictive models is always to look for their predictive
accuracy on holdout data. Machine learning has this ingrained. Classic
statistics does this too -- AIC is used to compare models, and it's
(asymptotically) leave-one-out cross validation [1].

There's nothing intrinsically wrong with models that have millions of
parameters; they might overfit in which case they will have poor predictive
accuracy on holdout data, or they might predict well.
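
As a toy illustration of what that looks like in practice (Python with made-up
case counts and a deliberately crude growth model, nothing to do with the
Imperial code): choose candidate R values and score them purely on how well
they predict the held-out later days.

    # Illustrative only: compare candidate R0 values by holdout error on
    # fake case counts, using a deliberately crude exponential growth model.
    import numpy as np

    days = np.arange(20)
    observed = 100 * 1.25 ** days               # pretend daily case counts
    holdout = observed[14:]                     # last 6 days held out

    def predict(r0, serial_interval=5.0):
        growth = r0 ** (1.0 / serial_interval)  # implied per-day growth factor
        return observed[0] * growth ** days

    for r0 in (2.0, 2.4, 3.0):
        err = np.mean((np.log(predict(r0)[14:]) - np.log(holdout)) ** 2)
        print(f"R0={r0}: holdout log-error {err:.3f}")

The point is just that the ranking comes from data the fit never saw, not from
how plausible the parameter sounds.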

I agree with the original article that software engineer scrutiny isn't
appropriate for this sort of code -- but I would argue instead that it needs a
general-purpose statistician or data scientist or ML expert to evaluate its
predictive accuracy. You can't possibly figure this out from a simulator
codebase.

At the time the model was published, and acted on by the UK government, there
was very little data on which to test predictive accuracy. That's fine -- all
it means is that the predictions should have been presented with gigantic
confidence intervals.

[1]
[http://www.stats.ox.ac.uk/~ripley/Nelder80.pdf](http://www.stats.ox.ac.uk/~ripley/Nelder80.pdf)

~~~
rainforest
The model isn't predictive though - it's a simulator. If we'd waited until we
had enough data to make predictions with it (which I doubt you could given the
sheer number of parameters) it'd be too late to use any of the interventions.

How would you ethically collect training data for the interventions?

~~~
datastoat
The outputs of the model _were_ being treated as predictions.

The Ferguson paper from 16 March used the language of prediction: "In the
(unlikely) absence of any control measures [...] given an estimated R0 of 2.4,
we predict 81% of the GB and US populations would be infected over the course
of the epidemic." [1]. The news coverage also used that language: "Imperial
researchers model likely impact of public health measures" [2]. And look at
the rest of the comments in this discussion, and count how many times
"predict" appears!

> If we'd waited until we had enough data to make predictions with it

This is like the drunk looking for their keys under a streetlight. "Did you
lose the keys here?" "No, but the light is much better here." \-- "How
confident are you in your model's predictions?" "I have no idea, but it's the
model I have."

Also -- the Ferguson model made predictions, based on the parameters they
picked. You don't need to wait for data to make predictions; you only need
data to validate your predictions.

> How would you ethically collect training data for the interventions?

You don't. You (as a scientist who influences public policy) should publish
validated confidence intervals for your predictions. You (as a government)
should understand that there is a huge margin of uncertainty in the
predictions, and accept that sometimes you just have to make decisions in the
absence of knowledge. You (both the scientist and the government) do not go
around spouting "Our decisions are led by science".

[1]
[https://spiral.imperial.ac.uk:8443/bitstream/10044/1/77482/1...](https://spiral.imperial.ac.uk:8443/bitstream/10044/1/77482/14/2020-03-16-COVID19-Report-9.pdf)

[2] [https://www.imperial.ac.uk/news/196234/covid19-imperial-researchers-model-likely-impact/](https://www.imperial.ac.uk/news/196234/covid19-imperial-researchers-model-likely-impact/)

~~~
rainforest
How do you validate the predictions for the number of infected cases in May
for scenarios that don't happen?

------
llarsson
Not a comment on the specific repo in question, but I just want to note that I
have seen utter monstrosities of academic code written in Python, MATLAB, and
R - languages that are ostensibly "easier" than C++. I do not think that poor
code quality is due to the many footguns C++ admittedly gives you.

I am sure that the main research is not in the implemented code. But with
unclear code, it is exceedingly hard to know that there are no mistakes: that
the researched models have been properly encoded.

That, I believe, is what software engineers are afraid of.

~~~
w0utert
Exactly. The C++ code in the GitHub repo is absolutely frightening, but I'm
100% sure it would be pretty much the same in e.g. Python, probably even worse.
From my own experience, code quality is inversely related to barrier of entry,
which means I see a lot more terrible MATLAB and Python code than terrible C++
code.

The only conclusion you can draw from a repo like this, is the conclusion I've
drawn countless times when the next piece of modeling code developed by some
math or physics expert lands on my desk: these kinds of things can not be
developed by domain experts alone, you always need to include skilled software
engineers to translate the prototypes they develop (preferably in something
like MATLAB) into production code.

~~~
tensor
How do you propose identifying these "skilled software engineers"? University
doesn't teach "professional software engineering" so all these hotshots coming
out of school into industry can't be what you're looking for.

There is no standards body defining the skills one needs to be considered
professional. There is no responsibility on practitioners unlike other
engineering professions. If a mechanical engineer screws up and causes death,
it's going to be bad for them. Software routinely screws up with no
consequence.

Hell, take two "software engineers" from FAANG and give them a piece of code
and they won't agree on whether it's good or bad.

None of this is to excuse poorly written scientific code, but if "professional
software engineers" want to throw rocks, maybe they should fix their own glass
house first.

------
ak217
Many senior members of the academic community rely on their reputation as
researchers to brush aside basic issues with the software that they develop
for scientific purposes. These include the lack of testability, debuggability,
reproducibility, separation of concerns, documentation, or usability. The lack
of focus on research software quality among senior PIs, funding committees,
and article reviewers is a huge problem in academia. The problems with the
Ferguson model and codebase are just the latest prominent example of this.

The tools that academics have access to are excellent. If you compare the free
support academic software developers receive from the rest of their community
to other engineering disciplines, it's beyond great. I think the OP is wrong
to suggest otherwise.

The problems with the Ferguson model are an opportunity to educate more
members of the community about the fact that good software development
practices are not optional for good science, and that senior members of the
community jeopardize their own reputation by not paying attention to them.

~~~
chrisseaton
> These include the lack of testability, debuggability, reproducibility,
> separation of concerns, documentation, or usability.

Or maybe these things aren't actually as important as we think they are in
professional software development?

If they're able to produce useful scientific results (in general, not
specifically in this case) without those things then maybe they don't matter
as much as we think they do?

~~~
csours
These qualities are important for exactly the same reasons they are in
production: Without those qualities, your code is brittle, your deploys are
brittle, changes are brittle.

It's just like saying "It runs on my machine". The scientific term for this is
"Replication crisis [0]"

0 -
[https://en.wikipedia.org/wiki/Replication_crisis](https://en.wikipedia.org/wiki/Replication_crisis)

~~~
chrisseaton
> Without those qualities, your code is brittle, your deploys are brittle,
> changes are brittle.

Is this brittleness stopping the scientists achieving what they need to
achieve?

Are you sure that writing tests makes science better? Or are you just assuming
that?

They aren't idiots and they aren't ignorant of how professional software
developers work.

~~~
mcv
If the results aren't reproducible, they can't be assumed to be true. Then
they're only useful if you only care about publication and not about whether
the results are actually true.

And yes, this is a serious problem in science.

~~~
chrisseaton
> If the results aren't reproducible, they can't be assumed to be true.

I don't understand why having brittle code would mean that the results are not
reproducible? You don't need to modify the program to do a reproducibility
study.

~~~
mcv
Reproducibility is one of the issues mentioned earlier in the thread. And
being able to audit the code and understand what it actually does seems rather
important too.

------
orbifold
Science is inherently a low stakes game. Most scientists are competing for
budgets of no more than a few million a year, if they are really successful.
There is little reward for producing excellent code, but a lot of reward for
producing atrocious code to publish in prestigious journals.

If you want to be truly horrified look into the practices surrounding NEURON,
a simulation tool that has been used to publish simulations since the 80s. You
have to write code in a domain specific language called Hoc, that code is
typically littered with hard coded constants. Code is copied over from one
paper to the next, etc. Code is a means to an end, not something that is held
in high regard. Moreover improving someone else's code style or quality won't
earn you a degree or praise. Starting a competing "higher quality" project is
bound to annoy or anger the established players. It will be hard to get
funding for it, because those players will likely review your proposal. In any
case there is typically a 10-20 year gap between the people making the
decisions and the people actually doing the practical work.

Finally in some cases code is a competitive advantage, I'm sure that Neil
Ferguson has churned out quite a few papers with the same framework by simply
tweaking some things here and there. In that case you are ill advised to share
the code with anyone.

------
rjtobin
I'm somewhat skeptical. Firstly, I don't think the language is the problem
with scientific code. You can write messy code in any language. So the warning
then has to be about writing software in general. In that case, I think a
warning like "don't try to write software unless you have years of training"
is a bit much. Many people with no training learn to write nice code. Many
projects made by amateurs might have ugly code but still add something to the
world (eg. many games).

The problem here is the project is influencing decisions in healthcare.

Having worked in HPC and academia, I've seen code like this a lot. There are
two archetypes I've noticed: (1) the well-meaning older academic maintaining
legacy code, who has often done a lot of convergence testing but still has
code that isn't up to modern engineering practices, and (2) the domain experts
with the attitude that "programming is much easier than my area of domain
expertise". These are problems that require attitude changes within academia,
not better warnings on online tutorials. The second group are going to ignore
the warnings anyway.

Remember many of the people writing this academic code also teach programming
courses in their departments! They view themselves as programming experts.

~~~
silveraxe93
This model didn't just influence decisions in healthcare. It single-handedly
changed the UK government's strategy over this pandemic.

From what I understand the UK was planning on beating COVID by creating herd
immunity, similarly to Sweden. Then this model came out and everyone started
yelling that Boris wanted to kill your grandma.

The problem is that it's impossible to have an intelligent discussion over
this. This pandemic became a partisan issue. We're not discussing whether one
of the most impactful decisions made by a government this generation should be
based on absolute trash code. You're either uncritical of the lockdown or
"anti-science".

~~~
kwhitefoot
> creating herd immunity, similarly to Sweden.
> ...
> The problem is that it's impossible to have an intelligent discussion over
> this.

As far as I can tell the Swedish government never had this plan. It was
mentioned in an interview and dismissed as unworkable, journalists
misunderstood.

On the other hand the UK government appears to have had no plans whatever
until jolted into action by the fear that public opinion would turn against
them.

What Sweden has done is similar to Norway, where I live, which relies largely
on voluntary changes in behaviour and temporary closure of institutions and
businesses that require close contact between employees and customers. But
Sweden took longer to implement those measures, and Swedish society is also
different from Norway's; anecdotally, Swedes seem to me to be more urban and
more gregarious than Norwegians.

Exactly why Sweden has a much higher death rate, 36/100k inhabitants versus
4.3/100k in Norway, is unclear at the moment partly because of different
definitions but also because of differing conditions, and the epidemic being
at different stages in the two countries.

~~~
silveraxe93
It seems the government was following this document at the start:
[https://assets.publishing.service.gov.uk/government/uploads/...](https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/213717/dh_131040.pdf)

The reason it seemed they were doing nothing is these passages:

ii. Minimise the potential impact of a pandemic on society and the economy by:

• Supporting the continuity of essential services, including the supply of
medicines, and protecting critical national infrastructure as far as possible.

• Supporting the continuation of everyday activities as far as practicable.

• Upholding the rule of law and the democratic process.

• Preparing to cope with the possibility of significant numbers of additional
deaths.

• Promoting a return to normality and the restoration of disrupted services at
the earliest opportunity.

There's way more, but I've honestly not read it all. But there _was_ a plan,
drafted before this epidemic.

Public opinion was turning against the government, but it actually kept course
for some time. Something I was honestly impressed with. What made it drop the
plan was Neil Ferguson's study.

There are many reasons for criticising the plan. This article is pretty good.
[https://www.theguardian.com/politics/2020/mar/29/uk-strategy-to-address-pandemic-threat-not-properly-implemented](https://www.theguardian.com/politics/2020/mar/29/uk-strategy-to-address-pandemic-threat-not-properly-implemented)

What really gets me is that if the lockdown was the correct decision, we
arrived there for the wrong reasons.

This paper had such an outsized impact that it should be held to a higher
standard. And it's scary (but not really unexpected) that the government is
making decisions of this magnitude based on such a shaky foundation.

------
Luc
John Carmack has reviewed the code and didn't seem to find it all that bad, so
there's that.

[https://twitter.com/ID_AA_Carmack/status/1258192134752145412](https://twitter.com/ID_AA_Carmack/status/1258192134752145412)

[https://twitter.com/ID_AA_Carmack/status/1244302925855326209](https://twitter.com/ID_AA_Carmack/status/1244302925855326209)

~~~
brigandish
Really?

> Imperial are trying to have their cake and eat it. Reports of random results
> are dismissed with responses like “that’s not a problem, just run it a lot
> of times and take the average”, but at the same time, they’re fixing such
> bugs when they find them. They know their code can’t withstand scrutiny, so
> they hid it until professionals had a chance to fix it, but the damage from
> over a decade of amateur hobby programming is so extensive that even
> Microsoft were unable to make it run right.

That's from the first analysis[1]. There's a follow up[2]:

> Sadly it shows that Imperial have been making some false statements.

and

> It’s clear that the changes made over the past month and a half have
> radically altered the predictions of the model. It will probably never be
> possible to replicate the numbers in Report 9.

[1] [https://lockdownsceptics.org/code-review-of-fergusons-model/](https://lockdownsceptics.org/code-review-of-fergusons-model/)

[2] [https://lockdownsceptics.org/second-analysis-of-fergusons-model/](https://lockdownsceptics.org/second-analysis-of-fergusons-model/)

~~~
Joeboy
Yes. John Carmack says the code is OK, some anonymous person on the
lockdownsceptics.org website says it's a flaming heap of garbage.

For what it's worth I looked into some of the tickets linked in those articles
and concluded the author is, broadly speaking, full of shit. I am nobody in
particular though.

~~~
brigandish
> John Carmack says the code is OK

He doesn't say it's okay; he engages in a weird kind of whataboutery, like
"Heck, professional software engineering struggles mightily with just making
completely reproducable builds". I struggle to find a single part of the
article by the "retired software engineer" (as if that has any relevance
either) that he deals with specifically.

But since it's John Carmack we must let him wave his hand and say it is so.
The Github issues are also far more enlightening than Carmack's tweets on
this, but again, who cares for precision and points argued with evidence when
we have a _name_ giving their opinion?

~~~
Joeboy
> The Github issues are also far more enlightening than Carmack's tweets on
> this

Totally agree. The "lockdown skeptics" articles significantly misrepresent the
github issues. They imply that there are mysterious uncertainties creeping
into the results, when the actual issues relate to things like failures to set
RNG seeds consistently, or a checksum failing in a test due to floating point
rounding differences in Cray supercomputers' native instructions. Most readers
aren't going to investigate the github issues though.
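
A tiny generic illustration of that class of seeding bug (Python, not the
covid-sim code): a run is only reproducible if every source of randomness is
derived from the seed you set; one stray unseeded stream breaks "same seed,
same result" without necessarily making the statistics wrong.

    # Illustrative only: reproducibility requires *all* randomness to flow
    # from the seed; the names and numbers here are made up.
    import random
    import numpy as np

    def run_leaky(seed):
        rng = np.random.default_rng(seed)
        infections = int(rng.poisson(10, size=100).sum())
        imported = random.randint(0, 50)   # oops: global RNG, ignores the seed
        return infections + imported

    def run_sealed(seed):
        streams = [np.random.default_rng(s)
                   for s in np.random.SeedSequence(seed).spawn(2)]
        infections = int(streams[0].poisson(10, size=100).sum())
        imported = int(streams[1].integers(0, 51))
        return infections + imported

    print(run_leaky(42), run_leaky(42))    # usually differ
    print(run_sealed(42), run_sealed(42))  # always identical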

------
flankstaek
This letter feels as though it is overlooking a large point of contention.

>The scientists who wrote this horrible code most probably had no training in
software engineering, and no funding to hire software engineers

Shouldn't the argument be that, for research reliant on coded models, funding
should be allocated to experts who can assist in creating said models
(software engineers)?

~~~
klenwell
The conclusions from the first critical code review cited:

 _All papers based on this code should be retracted immediately. Imperial’s
modelling efforts should be reset with a new team that isn’t under Professor
Ferguson, and which has a commitment to replicable results with published code
from day one.

On a personal level, I’d go further and suggest that all academic epidemiology
be defunded. This sort of work is best done by the insurance sector. Insurers
employ modellers and data scientists, but also employ managers whose job is to
decide whether a model is accurate enough for real world usage and
professional software engineers to ensure model software is properly tested,
understandable and so on. Academic efforts don’t have these people, and the
results speak for themselves._

[https://lockdownsceptics.org/code-review-of-fergusons-model/](https://lockdownsceptics.org/code-review-of-fergusons-model/)

~~~
lern_too_spel
That was a shitty code review. Seeding issues like the ones cited don't affect
the results of a Monte Carlo simulation, and there are tests in the repo, just
not automated ones.

The section you quoted shows the reason for the review's sloppiness. The
reviewer set out to find a way to justify their own beliefs instead of
actually reading the code.

~~~
thu2111
Seriously?

Doing a Monte Carlo simulation means you adjust the seeds to get different
runs. It doesn't mean your program can read uninitialised memory or reuse
variables that weren't reset to zero and still be correct.

Where are people getting this idea that you can just average away the results
of out-of-bounds reads and race conditions?

~~~
lern_too_spel
What does reading uninitialized memory or reusing variables that weren't set
to zero have to do with seeding issues? Read my comment again and reply to its
content instead of making up a comment that you would like to reply to.

------
silveraxe93
I disagree so strongly with this that I had a visceral reaction while reading
it.

C++ is a tool, not an end product. If you're not qualified to use a tool
correctly it's not the manufacturer's fault, it's yours.

Why do so many people believe that good software development is not part of
their job? If you write code then you're a developer, no matter your job
title. If you write shit software, saying you're a researcher is not an
excuse.

~~~
lucideer
This is such a ridiculous comment I don't know where to start.

Are you actually proposing that being a fully experienced & knowledgeable
software engineer should be a base requirement for all academic research (in
any field)?

If you, working as a software engineer, were told tomorrow by your manager
that you needed to perform heart surgery, and that that now fell under your
responsibilities in your current role, would your argument be the same?

The private sector's typical response to the problems in the article would be
to _hire_ qualified software engineers to assist researchers. Funding dictates
this is impossible, so they make do.

~~~
pbourke
> Funding dictates this is impossible, so they make do.

Put another way: software quality is not valued in academia. Let’s say the
code in question was in great shape. Would it have mattered at all in the
trajectory of anyone’s career? No. This is probably the one academic code base
in a million that has received any negative reputational hit due to its
quality.

~~~
TeMPOraL
> _software quality is not valued in academia._

And in general, it shouldn't. Code written by scientists is almost always
throwaway prototypes. So it absolutely doesn't matter how decoupled, or easy
to extend it is. The only aspect of quality that matters here is the
simplicity, in the sense "it obviously has no bugs" (and related practices
like testing). This is something that could be taught, but I think it should
be also addressed from the other end of the spectrum - more pressure put on
verification and reproducibility of results.

~~~
mcv
It doesn't have to be decoupled or easy to extend, but it should be readable
enough to ensure it does what it's supposed to, and well-tested enough to
verify it does what it's supposed to.

~~~
kwhitefoot
I think Blaise Pascal's (or whoever really said it) remark is apposite here:

    If I Had More Time, I Would Have Written a Shorter Letter

[https://quoteinvestigator.com/2012/04/28/shorter-letter/](https://quoteinvestigator.com/2012/04/28/shorter-letter/)

------
bJGVygG7MQVF8c
this is quite off the mark. the author sets the bar too low for himself by
criticizing the most easily (and to be fair legitimately) dismissed criticisms
of the Imperial College model.

here's a better laid-out critique that the OP doesn't speak to:

 _The Imperial College modelers released the source code a couple of days ago
to the model that shut down the world economy. It's not the original model
code but was rather original source code turned over to volunteer programmers
who re-wrote it so that is more readable. I have done some model review of
financial models in the past but without the source code I would not be able
to do a full review of the Imperial College model. Now that we have the source
code (sort of), I can._

 _Any such model ought to have been independently reviewed before it is ever
used for real policy decisions. Policy analysis is awash in models but no one
ever really checks them. Going forward, health policy makers should ask for
and disclose independent validation of any model before using its results to
make recommendations of any consequence._

 _Normally, model reviews are long technical documents but there would also be
a summary section. Here's what I think a summary should have looked like._

 _Overall conclusion: this model cannot be relied on to guide coronavirus
policy. Even if the documentation, coding, and testing problems were fixed,
the model logic is fatally flawed, which is evidenced by its poor forecasting
performance._

[https://www.facebook.com/scarlett.strong.1/posts/25243721950...](https://www.facebook.com/scarlett.strong.1/posts/252437219500976)

~~~
Majromax
> Any such model ought to have been independently reviewed before it is ever
> used for real policy decisions. Policy analysis is awash in models but no
> one ever really checks them. Going forward, health policy makers should ask
> for and disclose independent validation of any model before using its
> results to make recommendations of any consequence.

That's ignoring the time-limited nature of a virus response. "No decision" _is
itself a decision_. Delay has a real cost, and using a potentially imperfect
model is simply using the best information at hand.

I think everyone would agree that the ideal is to have well-documented,
thoroughly-tested, easy-to-use, and bulletproof models to inform public
response to every emergency. However, those models can't be built instantly,
and that kind of bulletproofing is not directly relevant to the day-to-day
research work aided by such models.

Adding research capacity to ensure that governments have a stable of well-
researched, thoroughly-vetted models for emergencies would be a great thing,
but it would also be quite expensive. Keeping any single model up to spec
might be the job of 1FTE (so an additional ~$150k/grant/yr -- large for a
research budget but small for a government), but that would have to be
multiplied by every area where the government might possibly want research-
informed decisionmaking on short notice.

> Even if the documentation, coding, and testing problems were fixed, the
> model logic is fatally flawed, which is evidenced by its poor forecasting
> performance.

That sounds like a great scientific criticism! Model validation is the
cornerstone of research on computer models of things, and finding a poor
forecast opens the door to many great research questions.

But without that further research, "poor performance" sounds a (loud) note of
caution, but isn't necessarily fatal. The leading-order problem is to
understand _why_ the model performed poorly: was it improperly calibrated with
information known at the time (such as if the virus behaves differently than
assumed)? Was there some out-of-sample feature of the forecast that the model
would not expect to do well on (e.g., low death rates in an open society
because everything was shut down for severe weather anyway?) Is an overall
trend correct but the timing in error?

"Flawed," I think, can be easily shown, and this should probably be expected
in a research model. "Fatally flawed," however, is a stronger claim that must
pass a greater burden of proof.

~~~
enitihas
The quickest decision is random number generation. Should people go with that?
Governments across the world had loads and loads of time.

> Adding research capacity to ensure that governments have a stable of well-
> researched, thoroughly-vetted models for emergencies would be a great thing,
> but it would also be quite expensive. Keeping any single model up to spec
> might be the job of 1FTE (so an additional ~$150k/grant/yr -- large for a
> research budget but small for a government), but that would have to be
> multiplied by every area where the government might possibly want research-
> informed decisionmaking on short notice.

If you have a once in a century crisis, and still are thinking about saving
some millions, while being sure your crisis response will be above billions,
your policy is already flawed and not much can be done to help you.

~~~
Majromax
> If you have a once in a century crisis, and still are thinking about saving
> some millions, while being sure your crisis response will be above billions,
> your policy is already flawed and not much can be done to help you.

By the time you're in the crisis, _it's too late_ to make software
bulletproof. Not only does that process take time, but it also needs to be
integrated with the whole of the research effort up to that point.

So you can't pick and choose topics: you need the funding to make bulletproof
_every_ potential policy-related model you might need in an emergency. That's
where the costs add up, since we don't have a time machine to go back and fund
exactly the lines of research we in fact need at the moment.

~~~
enitihas
But nobody is asking for bulletproof software here. Nobody is asking for
perfect MC/DC coverage.

If governments are so incompetent that they can't foresee a crisis like this
even one month in advance, when even sanitizer hoarders have better foresight,
then of course nothing can really help. Governments spend billions and
trillions on national defence, and an issue like this, when it's almost at the
door, should be given the same priority as national defence.

------
hamandcheese
> Consider what you, as a client, expect from engineers in other domains. You
> expect cars to be safe to use by anyone with a driver’s license. You expect
> household appliances to be safe to use for anyone after a cursory glance at
> the instruction manuals. It is reasonable then to expect your clients to
> become proficient in your work just to be able to use your products
> responsibly? Worse, is it reasonable to make that expectation tacitly?

I don’t buy this argument. Cars are safe, yes. C++ isn’t the car, though. C++
is the dangerous machine shop you use to build the car. A better analogy would
be comparing a car to a web browser, and indeed incredible effort has been put
into keeping web browsers secure.

------
veddox
Well, it looks like one of the biggest problems in computational biology has
finally blown up in our face.

I work in a very similar field to the one discussed here - ecosystem
modelling. And much of the code I see is probably of a similar quality to Neil
Ferguson's model. (Although I haven't had a detailed look at his work.) For
all you angry devs out there, I have three comments:

1\. Yes, we have a problem with code quality, and no, this should not be
acceptable where policy-relevant decisions of such magnitude are concerned.
(Although one should bear in mind that all science is flawed and its achieved
reliability will always be limited by constraints of time/budget/etc.)

2\. However, changing the _status quo_ is really hard. Specifically, the two
greatest changes that are needed are to teach proper software development
practices to students in the natural sciences (just like we teach lab
technique), and/or make it easier to get funding for software developer
positions in a research team. This sounds easy, and ought to be easy, but both
run counter to some pretty entrenched ideas held by the "old folks at the top"
in universities and funding bodies. (Believe me, I'm speaking from experience
:-/ )

3\. But yes, _we are working on it_. The debate is growing, and there's a new
generation of computational biologists with much closer ties to computer
science who are trying to shake things up a bit. In some sense, we are several
decades behind the wider CS world in the techniques we use, but we're catching
up. And at the same time, we're starting to develop some of our own methods of
quality control (such as pattern-oriented modelling).

~~~
veddox
If you're interested, here is some further reading:

* DeAngelis, D. L., & Grimm, V. (2014). Individual-based models in ecology after four decades. F1000prime Reports, 6(June), 39. [https://doi.org/10.12703/P6-39](https://doi.org/10.12703/P6-39)

* Grimm, V., Berger, U., DeAngelis, D. L., Polhill, J. G., Giske, J., & Railsback, S. F. (2010). The ODD protocol: A review and first update. Ecological Modelling, 221, 2760–2768. [https://doi.org/10.1016/j.ecolmodel.2010.08.019](https://doi.org/10.1016/j.ecolmodel.2010.08.019)

* Grimm, V., & Railsback, S. F. (2011). Pattern-oriented modelling: A ‘multi-scope’ for predictive systems ecology. Philosophical Transactions of the Royal Society B, 367(1586), 298–310. [https://doi.org/10.1098/rstb.2011.0180](https://doi.org/10.1098/rstb.2011.0180)

* Nowogrodzki, A. (2019). How to support open-source software and stay sane. Nature, 571(7763), 133–134. [https://doi.org/10.1038/d41586-019-02046-0](https://doi.org/10.1038/d41586-019-02046-0)

* Wilson, G., Bryan, J., Cranston, K., Kitzes, J., Nederbragt, L., & Teal, T. K. (2016). Good Enough Practices in Scientific Computing. 1–30. [http://arxiv.org/abs/1609.00037](http://arxiv.org/abs/1609.00037)

------
dekhn
I chuckled when I saw the author of this article. Many years ago I was a
scientist (PhD student) writing code in Python, but wanted to use features in
Mathematica like symbolic integration. Turns out Mathematica has a C API that
lets you send Math expressions to it and evaluate them. I wrote a clunky
Python interface and shared it with Konrad Hinsen, who looked at it and
suggested an elegant recursive object representation of Math objects in Python
that led to automatic conversion between Python and Mathematica, massively
simplifying the code and making it more elegant. I got slightly better at
software engineering that day.

------
motohagiography
I'm trying to be charitable on this code base issue, but the institutions will
need to accept more scrutiny of the inputs to their policy recommendations.

Both R and Python are taught in high school and undergraduate courses for
scientific data analysis. Ferguson and other scientists did not need software
engineers. A free co-op student, or three weeks spent learning a language that
matched the level of abstraction he was working at, would have sufficed.

The culture that allows code as described to be acceptable for policymaking is
one that intentionally produces complex black boxes that obfuscate risk and
attribution, and launders decision accountability through technology. I've
seen this in other institutional code as well.

However, while the policy recommendations that resulted from his model may (or
may not) have saved tens of thousands of lives, they did so at the risk of
losing the credibility to do it again.

Deflecting blame to nebulous software engineers is disingenuous and serves
mainly to exacerbate the suspicions of reasonable people, and further polarize
those most harmed by the policy response.

~~~
disgruntledphd2
I honestly don't think R and Python would have worked here.

While I was able to clone the repo and run it for Ireland, the UK requires at
least 26 GB of RAM, which is not common on most personal computers. The US
requires much, much more.

And given that it's pretty slow when written in C++, imagine how slow it would
have been in R or Python?

I agree with you in principle with respect to this stuff being better, but the
incentives skew otherwise at the moment.

~~~
silveraxe93
Any serious attempt at modelling this in Python would use the pydata stack
(numpy, pandas, etc), which runs on top of C++ anyway.

~~~
disgruntledphd2
Yeah of course, apologies if that wasn't clear.

The best solution here would probably be to package up the core routines into
a library and use this from either R or Python.

~~~
silveraxe93
Sorry if I came off a bit snippy there. But yeah, I assumed you meant
python without numpy, etc.

A lot of the criticism I saw was because the core routines did not need to be
packaged up. There were a lot of common data structures reimplemented, etc.

I don't think the model had many novel routines. It could be built just using
industry standard and tested tools in python, R, Julia (if you really want
speed) etc. But it reinvented the whole ecosystem in one big ball of C.

tbh, this should have been built on STAN or similar. There are so many
variables and assumptions that the output is completely dominated by the
parameters
chosen. Seeing the distribution of outcomes instead of a point estimate would
be actually useful.
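
As a rough sketch of the difference (Python with invented distributions, not
anything from the actual model): sample the uncertain inputs and report the
spread of outcomes instead of a single number.

    # Illustrative only: every distribution and constant below is invented.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000
    population = 67e6                             # rough GB population
    attack_rate = rng.beta(8, 2, n)               # uncertain fraction infected
    ifr = rng.lognormal(np.log(0.009), 0.3, n)    # uncertain fatality rate

    deaths = population * attack_rate * ifr
    lo, mid, hi = np.percentile(deaths, [5, 50, 95])
    print(f"median {mid:,.0f}, 90% interval {lo:,.0f} - {hi:,.0f}")

A point estimate with the same means would hide how much of that spread comes
from the parameter guesses rather than from the model structure.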

~~~
disgruntledphd2
True, I think that's worth noting. To be fair though, this codebase is pretty
old, and it's unlikely that the technology landscape looked much like today
(especially in terms of R and Python), so I can see how they ended up here.

I'd _love_ it if this were written in R, Python or Stan so I could contribute,
but that's probably not the researchers' focus ;)

While Stan is amazing, I shudder to think as to how long this model would have
taken to run using MCMC (1 week plus maybe?).

------
gammarator
Whether the Imperial code is good or bad doesn't actually matter: you can
derive the headline numbers analytically, without any simulation at all [1].

(The real problem with the worst case estimate is that it assumes people don't
individually change their behavior in the face of a pandemic.)

A better critique of the software engineering criticism is [2].

[1]
[https://twitter.com/trvrb/status/1258879531022082049](https://twitter.com/trvrb/status/1258879531022082049)

[2] [https://philbull.wordpress.com/2020/05/10/why-you-can-ignore-reviews-of-scientific-code-by-commercial-software-developers/](https://philbull.wordpress.com/2020/05/10/why-you-can-ignore-reviews-of-scientific-code-by-commercial-software-developers/)

------
chasd00
they need to defer to the experts of software development.

I, as a software engineer, wouldn't try to design an epidemic model; I would
defer to experts. Epidemic experts shouldn't be writing the implementation of
their model; they should defer to the experts.

Also, "It’s you, the software engineering community, that is responsible for
tools like C++ that look as if they were designed for shooting yourself in the
foot. It’s also you, the software engineering community, that has made no
effort to warn the non-expert public of the dangers of these tools"

that's a bit much. If someone buys a lathe, gets it home, flips the switch and
tears off an arm, are machinists to blame?

~~~
veddox
> Epidemic experts shouldn't be writing the implementation of their model,
> they should defer to the experts.

Believe me, a lot of scientists would love to. But as pointed out in the OP:
it is next to impossible to get funding for a paid software developer position
in a research team. It's a problem the scientific community is increasingly
aware of, but changing funding guidelines takes a long time. (Source:
computational biologist)

~~~
chasd00
That's a fair point. Also, passing judgement on source code is a very easy
thing to do. May he who writes/maintains perfect code cast the first stone.

~~~
downerending
Most scientific software isn't of subtly poor quality. Most scientific
software is a stinking, flaming dumpster fire that looks like it was written
by packs of drunk kindergartners.

------
mkirch
Absolute cringe, this was hard to read. Seems like a blatant shift of
responsibility. At what point do people have to take responsibility and stand
by their creations, regardless of what has been spoon-fed to them?

------
madhadron
I just took a quick skim through the repository and I'm not quite sure what
everyone is so upset about. It looks like simulation code.

I would argue that the real problem isn't C++ but tooling that is aimed at
producing source code as an artifact as opposed to repeatable executions as
artifacts. There are effectively lots of models in this system, and the code
represents all of them tangled together. But that means that you have source
code, parameters, tracing, and output as a thing.

------
LoSboccacc
> It’s you, the software engineering community, that is responsible for tools
> like C++ that look as if they were designed for shooting yourself in the
> foot.

wait what. a fraction of the community might be a pretentious bunch of purists,
but I don't think it's fair to criticize them for the tooling selection

one wouldn't pick an excavator just because it's the industry standard for
moving earth to plant some tulips in a vase

and even if funding is scarce, google is free: the first result for planting
tulips returns a bulb planter, not a bulldozer.

------
raverbashing
What we're missing is an "open letter" of scientists criticizing software
engineers evaluation of this model.

Because it seems none of these critics have ever coded anything to do with
stochastic forecasting or simulations.

"The code produces different results between runs" well, I would be very
worried if it didn't, unless your RNG has a fixed seed, and all randomness in
your model derives from it (so, if you use multiple threads, there goes your
repeatability - which is _fine_ ).

Also, you want multiple runs to be different, so you can establish best/worst
case scenarios (also tuning some of the parameters).

Does the model pass basic sniff tests? (For example, no more infected people
than the population of a place? Does it follow known pandemic curves? Can the
parameters be reasonably tuned to fit its behaviour in existing places?) And
yes, the model was checked against a different model from what I remember.

In essence, if you take Population x (percentage for herd immunity) x IFR you
can get a very good estimate of the worst case scenario for deaths. Then you
can see if your model goes to that value given the known parameters.
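
For instance (round, illustrative figures only, not numbers taken from any
report):

    # ~67M people, ~80% infected to reach herd immunity, ~1% IFR -- all
    # illustrative round numbers.
    population = 67e6
    herd_immunity_fraction = 0.8
    ifr = 0.01
    print(f"{population * herd_immunity_fraction * ifr:,.0f} deaths")  # ~536,000

If the model, fed plausible parameters, never approaches that ballpark,
something is off; that is the sniff test being described.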

Yes, the code is ugly. Yes, you probably wrote worse code at some point in
your career.

------
Nursie
I think the reason I find this so objectionable is that he's focused on a tool
for doing the job of software engineering, and then blamed software engineers
for not making it easy for him.

There's a spot-on comment on there at the moment - this isn't the car that's
ready to use by anyone with a license. This is the welding gear we use to put
bits of car together. You walked into the machine shop here, used some of our
tools and made something that wasn't safe.

If scientists are not software engineers, and do not have the time or the
motivation to become software engineers, perhaps they shouldn't play at being
software engineers?

------
luord
> _Consider what you, as a client, expect from engineers in other domains. You
> expect cars to be safe to use by anyone with a driver’s license. You expect
> household appliances to be safe to use for anyone after a cursory glance at
> the instruction manuals. It is reasonable then to expect your clients to
> become proficient in your work just to be able to use your products
> responsibly? Worse, is it reasonable to make that expectation tacitly?_

That doesn't seem like a proper analogy. I certainly would expect a civil
engineer to think me an idiot if I tried to do his job for him. And that's how
most code written by non software engineers/computer scientists ends up
looking.

Software engineers don't produce programming languages, but programs.
Languages (and libraries, etc) are our _tools_ , not our end products. Much
like an architect's job is not to produce rulers and pencils, but plans.

It just so happens that (more so for computer scientists than for software
engineers) we create our own tools.

------
htk
I get what he’s saying, but the “non-experts” using code to represent their
models should also let outside help come when the need arises.

When your model predicts an apocalyptical scenario and government is taking
drastic measures based on it, it’s a good time to expose your “non-expert
code” to the software engineering communities (and all other associated
fields) to take a look.

~~~
munchbunny
_should also let outside help come when the need arises._

 _expose your “non-expert code” to the software engineering communities (and
all other associated fields) to take a look._

1\. Software engineers are expensive. Hiring them to write your code is how
you end up needing even more money to do your science, and I think the
software engineering world if anything can appreciate prioritizing being
scrappy to get more done.

2\. Open source is slow and doesn't produce consistent results in the
timeframes you need in order to get things implemented in one-offs. In fact,
the value you get from others looking at your code is pretty anemic unless
other software engineers find your code useful, at which point they have real
incentive to help you improve existing functionality instead of reinventing
the wheel. I do think code should be published alongside papers as a matter of
reproducibility, but I don't think opening up the code beforehand will
accomplish much.

Which is all to say that while I agree with you in principle, I don't think
your recommendations are practical.

I think this letter has it right: we should make better tools to help non-
experts do less foot shooting. C++ is very, very foot shooty, and the usual
answer of "well get better at C++" is a non-starter for non-experts.

I think there are other solutions too - software engineers with partial
specializations in academic fields, volunteers, etc.

~~~
rainforest
I believe this model is quite old (at least in some form) so there have been
opportunities to review it. I confess I haven't looked, but I haven't heard
any defects have actually been identified. If the model stood up to peer
review I assume the results it produces are at least consistent with the
expectations of the people who wrote the mathematical model.

Hopefully this will be a watershed moment that makes it easier to cost a
research software engineer onto a grant in the future.

~~~
thu2111
Many defects have been identified. A small collection are linked to from here:

[https://lockdownsceptics.org/second-analysis-of-fergusons-model/](https://lockdownsceptics.org/second-analysis-of-fergusons-model/)

------
1_over_n
The scientific method is supposed to be reproducible. Others should be able to
follow the steps and arrive at the same outcome. Scientists working in wet labs
are not publishing papers on what they do in a language only they can
understand and expecting others to blindly trust the results and conclusions.

~~~
JamesBarney
This is definitely happening, except the language is hard-to-follow Excel.

------
lukeex
Bridges are designed, built, tested, and verified as fit for purpose. Planes
are designed, built, tested, and verified as fit for purpose. Why should
software models that inform policy be excluded from those requirements? What
is the expected cost of failure in the case that they are wrong? It seems that
critical risk controls are missing from academia.

The case in hand is problematic because it highlights the lack of controls
that should probably be present when building models and simulations to inform
policy.

Quality control is important in nearly every other industry - the lack of
quality controls in academia appears to me to be the root cause, rather than
particular language/build choices.

------
jstewartmobile
With all due respect, Konrad can go f himself. There are plenty of performant,
non-footgun alternatives to C++ these days. The choice of that language, and
any abuses of it, fall entirely upon Ferguson and his team.

And for this nugget:

“ _It’s also you, the software engineering community, that has made no effort
to warn the non-expert public of the dangers of these tools._ ”

I say he deserves it without lube. Hardly a day goes by on Lobsters or HN
where we (including many C++ devs) don’t complain at great length on what a
reeking dumpster fire C++ is.

------
m0llusk
That is a reasonable problem statement and proposed solution, but the problems
with software engineering go very deep. That is why so many software projects
fail outright. Just as we acknowledge that aviation engineering has different
risks and constraints from basic tool building, we all need to understand that,
at least for now and likely for some time, software development will be
inherently messy and risky for all who dare it.

------
btrettel
As a researcher, if I want to find a software engineer willing to review my
code for free (I have no budget for this), how should I find one?

The article says

> We can’t ask software experts for a code review every time we do something
> important.

but I think there are people who'd be willing to give at least 15 minutes of
their time to review scientific software once.

------
cheerlessbog
The author makes it sound like a warning label is missing from the C++ tin.
Maybe. But what tool should he have used? I haven't seen this code but is
there any doubt it would look just as bad in Java or Python, maybe with fewer
segfaults? Or FORTRAN.

~~~
dirtydroog
I'm genuinely surprised it wasn't an Excel spreadsheet.

~~~
DavidHm
A well executed Excel model is much easier to explain to a non-technical
person.

To be frank, for all the snobbery towards Excel, it has done a marvelous job
at getting millions of people to think in more quantitative ways instead of
"business acumen".

------
commandlinefan
> A clear message saying “Unless you are willing to train for many years to
> become a software engineer yourself, this tool is not for you.”

If you say anything along those lines, you'll be dismissed as a "gatekeeper".

------
stephc_int13
Could someone give a short example of some ugly code in this repo?

I've quickly looked and I did not find anything clearly uglier than what I
have seen in 99% of online repositories...

Quite the contrary, to be honest.

------
growlist
Laughable nonsense, and especially glib when there are lives on the line due
to the question of competence/lack of it.

One thing I'm surprised not to have seen yet - as an MSc Geographic
Information Science grad - is criticism of the methodology from a GI science
perspective. And
the question that I have is: where is the proof that this model's results are
in any way representative of reality? Surely Professor Ferguson - before
releasing results that have the potential to turn economies upside down, and
with thousands of lives at stake - would have run this against some real world
data to validate the model. Or not? If not, why should we listen to anything
he says ever again?

------
dahnhiller
Publicly funded science should in almost all cases make the software
underpinning it publicly available. Otherwise, how are results reproducible,
verifiable or open to peer review?

------
jonnadul
Instead of complaining about code quality, shouldn't SE's jump in and support
academia in writing better code?

------
mcv
What a collection of terrible arguments. I'm afraid I can't help myself here.
I've got to eviscerate this USENET-style:

> _" The scientists who wrote this horrible code most probably had no training
> in software engineering, and no funding to hire software engineers. And the
> senior or former scientists who decided to give tax-payer money to this
> research group are probably even more ignorant of the importance of code for
> science. Otherwise they would surely have attributed money for software
> development, and verified the application of best practices."_

Let's start with the observation that this is indeed the main problem here.
Planning research that requires software to be developed, without accounting
for software development, is a terrible idea. There's a good reason why
scientific programmers exist: to help researchers write the code they need for
their scientific projects. Hire one if you need one. Don't blame someone else
if you forget to do so.

> _" It’s you, the software engineering community, that is responsible for
> tools like C++ that look as if they were designed for shooting yourself in
> the foot. It’s also you, the software engineering community, that has made
> no effort to warn the non-expert public of the dangers of these tools."_

Excuse me? The internet is riddled with jokes about how easy it is to shoot
yourself in the foot with C++. Of course you can shoot yourself in the foot
with any programming language, but C++ excels at it. Many programmers avoid it
because they don't want to have to manage their own memory. Follow their
example and use something that focuses on the problem area you want to focus
on.

> _" You know, the kind of warning that every instruction manual for a
> microwave oven starts with: don’t use this to dry your dog after a bath."_

Anyone who needs that kind of warning is a danger to themselves and others. A
scientist who lacks this level of common sense should seek guidance from
someone who has it.

> _" A clear message saying “Unless you are willing to train for many years to
> become a software engineer yourself, this tool is not for you.”"_

Software engineering is a serious profession that requires lots of training.
Should any idiot expect to whip up their own epidemic simulation without
knowing what they're doing and get reasonable results?

> _" But power comes with responsibility. If you want scientists to construct
> reliable implementations of models that matter for public health decisions,
> the best you can do is make good tools for that task, but the very least you
> must do is put clear warning signs on tools that you do not want scientists
> to use"_

Is it unreasonable to expect someone who needs tools to either research what
tools are suitable for their needs, or otherwise ask advice from an expert on
those tools? Nobody grabs just a random tool from their toolbox to solve a
specific problem. If it's a nail, you grab a hammer; if it's a screw, you use
a screwdriver.

> _" scientists are not software engineers, and have neither the time nor the
> motivation to become software engineers."_

Then hire one. Don't blame your lack of motivation on others.

> _" Consider what you, as a client, expect from engineers in other domains.
> You expect cars to be safe to use by anyone with a driver’s license."_

Yeah, with a driver's license. We don't let random idiots drive off in a car,
we expect them to learn how to drive first. If you want to use C++, you're
going to have to learn about memory management. If you don't want to do that,
get a different language.

> _" You expect household appliances to be safe to use for anyone after a
> cursory glance at the instruction manuals."_

Is that cursory glance going to be enough to tell them not to put their dog in
a microwave? At least C++ is not going to kill your dog.

> _" It is reasonable then to expect your clients to become proficient in your
> work just to be able to use your products responsibly?"_

They can use our products just fine, but if they want to use our tools, they
need to learn how to use them.

Should toolboxes now contain warning labels not to build your own car from
scratch? Or not to use these tools to repair a nuclear reactor? It's
absolutely valuable to learn how to use tools, but if you just grab random
power tools without knowing what you're doing, and without being willing to
learn to use them, you're likely to lose a limb.

Finally, if you're really looking for a programming language to let scientists
play around with, try Python. It's designed to be easy to learn, and it's also
very suitable for all sorts of scientific modeling. There's a good reason it's
popular in research. I still recommend putting some effort in learning to use
it.

------
s9w
C++ is hard, therefore they're allowed to publish such rubbish and destroy
trillions of dollars as a consequence? Critical code is written all the time;
it can be expected when decisions of this magnitude are at stake.

------
asddsfgdfsh
Everything is fine. You get what you pay for.

------
airlines55
Oh no, not the heckin scientists. They are always right! Dang it guys, it's our
fault, not Science. I just wish it was 1801 again so we could parade around
streets once more in worship of the god of Rationality!

------
zozbot234
Look, the software engineering community has switched to Rust by now. If
you're still using C++, that's _your_ problem. Don't blame us.

------
enitihas
So government can spend billions and trillions on implementing covid19 related
policies, but can't spend any money on getting good data for the same? The
fact that government policy is driven by some unpublished, undocumented,
non-repeatable code, affecting the lives of an entire nation, should be
treated as
a national defence issue. Anyone saying the person writing this was not a
software engineer is giving a poor excuse. Government shouldn't appoint random
people to do random jobs and in case of poor results say they weren't a
trained XYZ. The people deserve a better process.

