
Professors’ unprofessional programs have created a new profession - Nuance
https://www.economist.com/news/science-and-technology/21695377-professors-unprofessional-programs-have-created-new-profession-more
======
greydius
I do this kind of work. It's frustrating how awful most of the code is. You'd
think people smart enough to pursue STEM PhDs would understand basic
programming abstractions, but that is often not the case. Another issue is
that researchers don't normally use good software engineering practices. I
have yet to be given any code that has even a single unit test. Source control
is being used these days, but the repositories are usually unorganized messes
with unhelpful commit histories. No one keeps track of system dependencies,
and few understand build systems. I can spend a week just trying to get some
software to build.

I could keep complaining, but I don't want to give people the impression that
I don't like what I do. It beats the hell out of writing CRUD apps.

~~~
britworst
Agree with everything except for unit tests, they're something of a cargo-cult
fad. Any of my developers caught wasting their time on these would get a stern
talking to.

~~~
keldaris
As a physics PhD spending most of my time on writing code for numerical
simulations, I generally agree that unit tests are a waste of time for most
things. Unfortunately, most people nowadays seem to think you either fetishize
unit test coverage percentages or don't care about code quality at all.

In practice, at least for simulation code, functional, integration and
regression tests are useful when employed judiciously. Most importantly, you
verify your results using published benchmarks in the scientific literature or
analytic results where possible. Obsessively covering every trivial bit of
code with a unit test of its own has always struck me as rather a weird fad.
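
A sketch of what verifying against an analytic result can look like, in Python for concreteness (the solver and test names here are hypothetical, not from any particular codebase):

```python
import math

# Hypothetical toy solver: leapfrog (velocity Verlet) integration of a
# simple harmonic oscillator, x'' = -omega^2 * x.
def simulate_sho(x0, v0, omega, dt, steps):
    x, v = x0, v0
    for _ in range(steps):
        v += -omega ** 2 * x * (dt / 2)  # half kick
        x += v * dt                      # drift
        v += -omega ** 2 * x * (dt / 2)  # half kick
    return x

# Verification against the analytic solution x(t) = x0 * cos(omega * t)
# (valid for v0 = 0), in the spirit described above: one end-to-end check
# against known physics rather than a unit test per helper function.
def test_matches_analytic_solution():
    omega, dt, steps = 1.0, 1e-3, 1000
    x = simulate_sho(1.0, 0.0, omega, dt, steps)
    assert abs(x - math.cos(omega * dt * steps)) < 1e-5

test_matches_analytic_solution()
```

The same pattern applies with a published benchmark in place of the closed-form solution.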

~~~
fsloth
"Obsessively covering every trivial bit of code with a unit test of its own
has always struct me as rather a weird fad."

The advantage unit tests have over regression tests is that the time between
the moment a developer implements a change and the moment they find out they
broke something is as small as possible. When tens of people work on the same
codebase, this saves an enormous amount of effort.

Fetishizing some rule for its own sake is silly, of course. My org would
waste a lot of money and time without unit tests.

Another feature unit tests provide: think of unit tests as living
documentation. Breaking something leaves a breadcrumb trail to the invariant
that was violated.
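
A minimal sketch of that idea in Python (the function and its invariants are hypothetical; in a real project these would live in a pytest or unittest suite):

```python
import math

def normalize_angle(theta):
    """Map an angle in radians into [0, 2*pi)."""
    return theta % (2.0 * math.pi)

# Each test documents an invariant the rest of the code relies on.
# If a later change breaks one, the failing test's name is the breadcrumb
# pointing at exactly which guarantee was violated.

def test_result_always_in_range():
    for theta in (-7.0, -0.1, 0.0, 3.0, 100.0):
        assert 0.0 <= normalize_angle(theta) < 2.0 * math.pi

def test_values_already_in_range_pass_through():
    assert normalize_angle(1.5) == 1.5

test_result_always_in_range()
test_values_already_in_range_pass_through()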

~~~
keldaris
I'll happily grant the point that unit tests are, to a considerable extent,
the price you pay for putting more developers in charge of the same parts of
the codebase. In my line of work that's usually not a concern, but I can see
merit in the idea for large teams. I also think that statement scales down in
the sense that unit tests rarely make sense for code written and maintained by
one or two people.

~~~
fsloth
"I also think that statement scales down in the sense that unit tests rarely
make sense for code written and maintained by one or two people."

I maintain several end-user-critical projects related to data transformation
and transport, mostly by myself. I have no idea how I could develop them as
efficiently without unit tests to verify that my changes don't break some
corner case as new features are added.

The cadence for changes can be fairly low - I might return to some component
after doing something else for six months, and I probably need to add some
feature without breaking something in the process.

Sure, we have integration testing and smoke testing, but the further a bug
travels the release pipeline from the developer the more expensive it is to
fix. This cost cascade is quite easy to visualize - the work of whoever
caught the bug is stopped, and depending on where they are located in the
product ecosystem, their stall can cause lots of work for other people. If
they are a
tester they need to file a report. If they are a customer, their work is
interrupted, they contact local sales, who then contact global helpdesk, who
then identify and log the bug.

Much simpler and easier if there is a unit test to catch the bug in the first
place.

Now, there can be domain-specific flavors to this. My domain is computational
geometry and the transport of 3D modeling data between domains. But in my
domain, anyone _not_ securing their code with unit tests is wasting their
employer's money and putting their end users at risk by increasing the
likelihood of bugs.

There is a lot of cargo-cult nonsense in software engineering. Unit testing
is not part of it. It saves time and effort by catching a range of bugs that
aren't caught by, e.g., the compiler in a statically typed language, and it
secures the program logic against future changes.

------
dasmoth
Something that goes against the grain of the contemporary "reproducibility"
mindset but is worth keeping in mind: there's always the option of reading the
paper, extracting the core ideas (but not necessarily the exact algorithm),
then sitting down and writing your own implementation. To me, that seems like
what "replication" should really mean -- not downloading a Docker image and
re-running someone else's analysis verbatim.

This is far from an original thought:

 _In the good old days physicists repeated each other's experiments, just to
be sure. Today they stick to FORTRAN, so that they can share each other's
programs, bugs included._

    -- E. W. Dijkstra, 1975, apparently

~~~
pps43
So you re-write the code from core ideas and get a different result. Now what?
Did you make a mistake? Did the original author make a mistake?

If you have the original dataset and the .Rmd file, you can bisect both
analyses to find the bug.

------
fabian2k
One big problem with software developed in academia is that there is almost no
incentive for continued development and maintenance.

Grants are time-limited, and at some point usually the money for developing
the software runs out. The PhD students working on the software move on to
something else, and you have yet another abandoned piece of scientific
software.

There are of course exceptions, but in general it's much harder to get money
for maintaining and improving software over a longer timeframe than for
building something new.

~~~
roel_v
Software gets maintained (somewhat) by those who make a career out of it.
Build something in your PhD, get a postdoc where you can add some more of your
own things to it, and get funding for follow on work in which you get your own
PhD's to work on it. That's the only model I've seen work, but it's usually
very implicit, because when you get to the tenure-track stage (or even
beyond), you have an incentive to make it look like you're not yet again
re-using that thing you were doing 10 years ago. So it's not called
'maintenance' or even 'upgrading' -- you call it something else that sounds
new, yet has enough links to the old thing to give you validation because
people have heard of it. In this way you drag this mutant bastard code through
5 or 10 iterations over 25 years until your career is 'done' and you can
finish off the last years in standby mode.

This sounds negative but I don't mean it that way; it's just how it is, no
judgement meant. But it's not something you really hear anyone teaching new
PhD students as an option, and even if they would, it's highly uncertain and
success depends on many factors out of your control. So I wouldn't call it a
career path, or even viable career advice.

~~~
Balgair
> Build something in your PhD, get a postdoc where you can add some more of
> your own things to it, and get funding for follow on work in which you get
> your own PhD's to work on it.

Dear God how do you manage to do this?!

~~~
roel_v
It's usually not planned, just something that happens to come together. I know
several professors who build their careers this way.

~~~
Balgair
Sooo, luck?

~~~
roel_v
Well yes, but luck favors the prepared of course.

------
amelius
In my experience there is little money to be made in scientific software.
Perhaps it depends on the field (I worked in design automation for electronic
circuits). Scientists often have tight budgets (they depend on funding), and
they often expect software to be free (it's not budgeted for). I'm curious
about the experience of others.

PS: I couldn't read the article, presumably because of some anti-adblocker
mechanism.

~~~
akuji1993
One good example that I know of (since I studied in this direction) is POLARIS
([http://www.graphics.stanford.edu/projects/polaris/](http://www.graphics.stanford.edu/projects/polaris/)),
which turned into the now pretty well-known Tableau Software.

> I couldn't read the article, presumably because of some anti-adblocker
> mechanism.

That is such a stupid thing for news sites to do. The only thing it makes me
do is leave the site immediately.

~~~
krapp
>That is such a stupid thing for news sites to do. The only thing it makes me
do is leave the site immediately.

It's almost as if they don't want anyone to read the content without also
viewing the ads...

~~~
keithpeter
I was able to read the page fine in Firefox with javascript disabled in the
about:config settings. No ads visible on this particular page, but many sites
have 'calm' banner style ads visible with javascript switched off.

I keep Chromium in 'full fat' mode for sites with rich content and for coping
with those public wifi connections where you have a javascript based terms and
conditions page.

~~~
tome
Did you know you can set up multiple profiles for Firefox and run one in no-JS
mode and another in "full fat" mode (at the same time!)?

(I think it's "firefox -profileManager")

~~~
keithpeter
Yup and I had a go at that but it got weird with the javascript state
appearing to be the same on both profiles. Might look at it again.

------
plaidfuji
I'm as guilty of this as anyone, but I'll tell you how to fix it:

Make code review part of peer review.

I think the next generation of scientists at least understand that things like
unit tests are useful, but passing peer review is the only incentive to do
anything, and reviewers don't read code.

~~~
newen
Beyond impractical. Besides it taking 10x as long to peer review something, I
can't even imagine my advisor willing to do this. Most professors don't even
touch code, they leave it to the graduate students. Hell, I know professors
(in EE) who teach code heavy classes that don't even know how to run a single
line of the code that they teach; but that's another topic.

------
musgravepeter
I did a project like this to get three body physics for an iOS/Android app
Three Body.
([http://nbodyphysics.com/blog/2015/12/](http://nbodyphysics.com/blog/2015/12/)).
Converting Fortran from 1973 into C# for Unity. This code (only 1000 or so
lines) is very dense.

I also cleaned up my own mess from 1996, when I recently re-released a General
Relativity package for Maple, GRTensorIII
([https://github.com/grtensor/grtensor](https://github.com/grtensor/grtensor))
that was part of my PhD work. The 1996 code is not _that_ tragic, but is far
from my best work - although I did earn my PhD.

------
MurrayHill1980
People usually respond to how they are measured and rewarded. This applies to
software written by graduate students. (Most professors code in English using
students.)

------
dasmoth
A bit of a counterpoint:

[http://yosefk.com/blog/why-bad-scientific-code-beats-code-
fo...](http://yosefk.com/blog/why-bad-scientific-code-beats-code-following-
best-practices.html)

~~~
belusidaty
So, I'm a professor and, like a lot of things, think there's truth to both
sides.

Computer science is something that seems underrecognized in the field I work in.
People clearly acknowledge its importance, but then turn around and basically
ignore it when talking to potential grad students or mentoring undergrads in
prep for grad school. We don't offer any kind of course like "programming for
X" even though most of the students need it.

My experience closely parallels the "why bad scientific code beats code
following best practices" post. I've had comp sci students come in, and what
happened is they clearly understood Python and Java but had difficulty
understanding the problems with inheritance and wrapping their heads around
the other, more functional languages we were using. They were also unfamiliar
with the
statistical/content areas, so had difficulty implementing things. I had
thought it would be great to have comp sci students involved (and still do)
but it didn't solve my problems like I thought--so instead of having students
who understood the concepts but not the programming, now I had students who
understood the programming but not the concepts.

When you're dealing with really intense math and statistics, it's difficult to
separate out the programming from the math. It's not like web development
where you have an "insert text here" kind of approach that works often; the
algorithms and the problems are really wrapped up in one another. This might
all be changing with data science DL and AI and that kind of stuff
infiltrating comp sci's assumed territory, but I'm not really seeing it much
so far.

It seems the prototypical situation in software design is some software that's
team-developed for mass consumption. In science, you have the reverse often,
which is software designed by small units that might be a one-off thing. These
constraints put different kinds of pressures on the process, such as intense
pressure to get something to work correctly at all costs, even at the cost of
elegant design.

Also, the unit testing thing is kind of confusing to me. Every time discussion
about a new language comes up in the context of numerical/scientific
computing, one of the big questions is "does it have a REPL"? It seems one of
the big reasons for doing this is basically unit testing. It might not be unit
testing in the formal sense that you might have at some software design
companies, but anything somewhat complicated involves feeding each tiny
separable part of the code something with known expected output, sometimes in
strange, boundary-testing ways, so that seems pretty similar to me. There's
also a plethora of test-case datasets out there for this very purpose.
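
For instance, a quick REPL check and its formalized counterpart differ mainly in that the latter is kept around and re-run (Pearson correlation is just an arbitrary example here):

```python
import math

# In a REPL you might check interactively:
#   >>> pearson_r([1, 2, 3], [2, 4, 6])
# and eyeball that a perfectly linear pair gives r close to 1.

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# The same check, kept as a permanent test instead of a throwaway session:
def test_perfect_linear_correlation():
    assert abs(pearson_r([1, 2, 3], [2, 4, 6]) - 1.0) < 1e-12

test_perfect_linear_correlation()
```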

To me the bigger problem is homogenization in software in science, that is, a
domain being dominated by a single piece of software. I think it leads to
unrecognized errors due to lack of replication across implementations, and
problems typical of monopolies (even when something is open source). There's a
kind of development benefit:cost supply:demand problem that leads to dominance
of single works of code that is really unhealthy for science (replicating with
standardized methods is good too, but to me that's a slightly different
issue).

~~~
dasmoth
Thanks for the in-depth reply, a lot here that I agree with, particularly on
the homogenisation/monoculture issue (which I wish I could offer a compelling
answer to -- telling people not to share their code would clearly be throwing
the baby out with the bathwater).

Unit tests _vs_ REPLs is an interesting one. Agree that there are similarities
there (although I'd argue that your tooling needs to be pretty damn good for
unit tests to offer the responsiveness that a good REPL can). For me, I think
part of the difference is that a REPL session is personal and nobody sees the
blind alleys, while unit tests are an enduring part of the product and
something others will see, use, and potentially critique. So while they can
address the same kinds of questions, I'm not too shocked that people feel
differently about them.

------
todd8
Over the course of my career, I've had the job of working with environmental
models of lake ecosystems, aided environmental/civil engineers in debugging
the then state-of-the-art flood water analysis programs, given (brief)
guidance to Ph.D. graduate engineering students on the programing for their
dissertations, and worked with mechanical engineers that insisted on doing the
real-time machine control programming themselves because the software guys
took too long.

In each case, the people being given responsibility for the programs were in
way over their heads. They were smart and educated and generally viewed
programming as a skill somewhat akin to typing, something anyone could learn
to do adequately, if not quickly, in a few days of practice.

The ecosystem simulation made unnecessary oversimplifications (assuming that
an exponential relationship could be modeled as a linear one, because the
engineers didn't know how to handle the integration of an exponential(!!!) in
a program).

The flood control modeling program, used by the Federal government, was some
of the worst code I had ever seen. It was written in a fashion where variables
were treated a bit like a small finite set of registers. Ten or twelve global
variables in the program were reused over and over for different purposes,
sometimes to return computed answers, sometimes as temporaries inside of some
function, and sometimes as iteration counters. It was a complete mess.

A graduate student that had never learned C or C++ was being given a previous
grad student's C++ simulation code to use as the basis for his dissertation
work. That code was pretty useless and poorly documented. Software engineering
principles played no part in the work these students were doing; they were
from a different discipline.

In the case of the real time machine control, programs were written in
thousands of lines of assembly code and would control perhaps a dozen
asynchronous activities through a combination of interrupt handlers, polling
loops, and time outs. They were frustrated that the machines would simply hang
every day or two. What a mess.

------
c517402
I think there is a perspective that has been overlooked in these comments. It
is that the "professors' unprofessional programs" are probably very
understandable to someone who deeply understands the science/engineering
concepts the programs represent. Especially, if you get some history about
what was developed first and what order features were added, the spaghetti
starts to become much more understandable. Of course, after passing through
the hands of multiple grad students the code should probably be refactored,
but how many of you would recode a big framework just because it isn't perfect
for your situation?

~~~
geebee
I'm not a professor, but I started my career as an industrial engineering grad
(MS, not PhD) who was writing code to build and solve a linear program for a
supply chain application.

It was definitely one of those programs that does this, then that, then
sometimes the other thing, and it went on forever. I had trouble understanding
it myself.

I didn't end up rewriting it because, well, startups. The whole thing went
kaput, and nah, didn't have much to do with the code.

I would agree with you that these programs are _more_ understandable to
someone with expertise, and that can be why it's so hard to refactor them.
It's difficult to become this expert in a branch of science and also take the
time to learn good software design and programming practices.

But overall, I'd say that these programs could become vastly more manageable
and easier to understand through better programming practices.

------
SeanDav
Pretty much this is the problem:

> _" Those who do put effort into producing good code risk being seen by their
> colleagues as time-wasters."_

Producing good code takes time and effort. It would seem a complete culture
shift is required before any significant changes will happen.

------
zwischenzug
Interesting to me (tho I can't view the article) as I'm doing work on the side
to train mathematicians in git.

I wrote this course as a basis:

[http://learngitthehardway.tk/learngitthehardway.pdf](http://learngitthehardway.tk/learngitthehardway.pdf)

Finding the right pitch point for someone to learn in the right way is really
hard. People come to things like git and build systems from all sorts of
angles.

------
georgeecollins
I have a lot of sympathy for this. My first job out of school was working in a
lab, writing a lot of software for them that was for cortical mapping. It was
the kind of job where they expected you to join a PhD program. You got to
co-author papers. It was the first time I programmed a lot. I'm sure everything I
wrote was pretty ugly.

------
chris_wot
You think this is just in academia? Can I direct you to the LibreOffice
codebase?

------
gaius
[https://news.ycombinator.com/item?id=11359181](https://news.ycombinator.com/item?id=11359181)
dupe

~~~
klez
I wouldn't consider it a dupe after more than a year and a half.

But maybe a (2015) in the title would be a good idea.

~~~
Nanite
True, but happy he pointed it out. The older thread basically covers
everything there is to be said about that article.

------
j45
Refactoring professors' code could be compared to fixing MVPs.

Both appear to be so hyper-focused on solving the problem that little else
matters.

------
roel_v
I do exactly this, AM(A)A.

~~~
eric_bullington
> I do exactly this, AM(A)A.

Thanks! Presumably you do this kind of work as a contractor?

Do you also typically write documentation for the projects you're working on
(aside from inline comments)?

How did you get started in this line of work?

~~~
roel_v
Technically as a contractor, yes - but I'm not like 'most' contractors where I
go in and out for a few months at a time. Rather, I'm part of the research
grant and as such work alongside the scientists to make their prototypes into
something more robust (sometimes market ready, sometimes not yet, depends),
usually in years-long projects.

No I don't usually write docs - I get the authors or maintainers of the models
I work on to do that. I do guide them, provide templates and examples, and ask
for clarification when they're missing parts. I have standardized
methodologies ready for that, which I mostly developed myself (this is one of
my USPs, as long as I manage to convince people of the value, which is quite
hard and which I often fail at). I don't think it's good practice or very
efficient for programmers to reverse-engineer the whole thing because you have
to become a domain expert to do so. I also think that this is why it's not for
everybody - too many people let themselves get sucked in too deep, making it
very time intensive. I understand the temptation, it's much more
intellectually satisfying to go deep yourself. But I think you need to be as
much project manager as programmer, so that you can get the actual domain
experts to figure out the complicated (domain-knowledge) parts, and limit
yourself to factoring out or replacing the plumbing and introducing good
software engineering practices. Those usually don't last after you (I) leave,
though, so
it's also important not to get too worked up about that.

I started out at a research group that found itself accidentally too heavy on
software people, the group got into projects doing software stuff because of
that, developed a reputation for being 'the software guys' and failed as a
research group because of that (it's a lot more complicated than that, this is
the Cliff's Notes obviously). Through many coincidences that can't be
replicated on purpose, I'm now hyper-specialized in doing the thing the OP
describes in a tiny, narrow field.

The 'trick' (well it's not a trick really) to get work is to be very well
connected and work hard to remain that way (being well connected is not
something you find yourself in; it's the result of many years of thankless,
feedback-less grinding), make your work visible to the outside (i.e.
marketing, although obviously the 'buy Adwords' type of marketing is 100%
useless here) and to know the science funding processes very, very well to
understand incentives of all parties involved. This last part is vastly
underestimated; not just for what I do, but also for researchers themselves.
For example, the reason I'm usually in is when the project asks for something
with demonstrable real-world application (this is a very common requirement
the last decades, even for highly theoretical fields). So knowing how to put a
veneer on theoretical work is a very non-obvious but highly valuable skill.
('veneer' is not 'hiding things' or 'faking', which will work maybe once or
twice - I'm talking about (essentially) science communication more than
'writing papers' science).

Furthermore, being realistic is also important. I'm never going to be rich
doing this, nor will I ever employ large amounts of people (or any people at
all apart from the occasional 1 or 2 day freelance subcontractor). It's also
something for the long haul - 10 years to become established. Other downsides
are the sometimes infuriating academic politics, the eternal 'I'm a
math/physics/CS PhD so I'm God's gift to mankind and everything that cannot be
distilled down to a theorem is 100% useless' characters you run into
(they're not that common tbh but they still annoy me endlessly), and not
having clear goals or even goals at all. It's like being a PhD student except
worse, and with no end in sight or even a thesis to work towards. The upsides
are long-term contracts, lots of freedom, intellectually more stimulating work
than writing marketing websites or CRUD apps, and being respected as an expert
at something even if it's a tiny sliver you're considered to be an expert at.

~~~
chubot
Wow this sounds really great! I'm impressed that you managed to navigate this
area. It sounds like you have a very intelligent approach.

It's something I'd like to do, but the networking/politics might keep me out
of it. (My family are in the sciences, so I'm familiar with how academia
works.)

It's a shame it doesn't pay more. On the other hand, getting exposed to a lot
of different things is more valuable long term than specializing in a
corporate niche.

I left a pretty high-paying job to work on open source (for $0) so I can
relate.

~~~
roel_v
"I'm impressed that you managed to navigate this area. It sounds like you have
a very intelligent approach."

To be honest, it's likely more survivor bias than skill or successful
execution of a preconceived plan. This is important to recognize as I'm asked
a few times a year how to end up in the place I'm in, but I don't think there
is a solid path to do so, nor am I in a position to give advice beyond the
standard 'think hard, work hard, keep your eyes out for opportunities, pay
yourself first'.

Just saying - in case anyone is reading this in the hope of steering their
career one way or the other, don't take the answers I gave here as anything
more than an anecdote :)

