
Professors’ unprofessional programs have created a new profession - Athas
http://www.economist.com/news/science-and-technology/21695377-professors-unprofessional-programs-have-created-new-profession-more
======
Animats
Most academic code won't get repurposed. If a rewrite is necessary for
commercial use, so be it.

I once did a job like that in 1996. I was building a collision detection
engine for an animation system. I had Lin's "I-Collide" to look at, and the
paper that goes with it. This was one of the first fast many-vs-many collision
detection systems. There's a bounding box layer for culling, and then a
detailed layer for precision collision detection between pairs of objects.

The code was in C, but written strangely for C. Everything was lists, handled
with raw pointer manipulation. When I saw something being appended to a list
by reversing the list, adding a new member at the head, and reversing the list
again, I realized that this had been written by a LISP programmer. As I found
out later, it was originally in LISP, but had been rewritten in C to speed it
up.
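
For readers who haven't seen the idiom, the append-by-double-reversal pattern translates roughly like this (a hypothetical Python sketch for illustration; the original C isn't shown):

```python
def append_lisp_style(lst, item):
    """Append by reversing the list, consing the new member onto the
    head, and reversing again - the idiom a Lisp programmer might
    transliterate into C. Each append walks the list twice (O(n))."""
    rev = list(reversed(lst))   # reverse the list
    rev.insert(0, item)         # add the new member at the head
    return list(reversed(rev))  # reverse it back

def append_direct(lst, item):
    """The straightforward equivalent: amortized O(1) per append."""
    lst.append(item)
    return lst
```

Building a list of n items the first way costs O(n^2) overall, which is part of why a literal translation to C didn't buy much speed.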

So I started over in C++, using classes for the geometry (Face, Edge, Vertex,
Polyhedron, etc.) and collection classes (Microsoft's, this was pre-STL) for
all the links between the geometry objects. This is one of those problems that
maps well to an object model. I used a standard library of inlines for all the
geometric calculations, added lots of checking for geometric inconsistencies,
and commented the code properly. I noticed that the original implementation
used a bubble sort when starting up the axis-oriented bounding box level,
which made for a slow startup. I replaced the original separating vector
algorithm with GJK. (I think mine was the first collision engine to use axis
oriented bounding boxes with incremental GJK, which is fairly standard now.) I
wrote a different approach to managing pairs of objects, so that pairs were
created when objects got close together and released when they were no longer
close. (I-Collide generated pairs, but never released them, which was a memory
leak.) I added code to detect fast fly-throughs, so that a fast moving object
couldn't get through another object between frame times. I hooked the system
up to QHull to generate convex polyhedra from input geometry, and OpenGL so I
could see what was going on. The end result was faster and much more robust
than the original.
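
The create-when-close, release-when-apart pair management might be sketched like this (a toy Python illustration, not the original engine; the brute-force pair loop here stands in for the sorted sweep along each axis):

```python
# Toy sketch of the pair-management idea: candidate pairs are created
# when axis-aligned bounding boxes start to overlap and released when
# they separate, so the pair set cannot grow without bound the way
# I-Collide's did.

def aabb_overlap(a, b):
    """a, b: ((xmin, ymin, zmin), (xmax, ymax, zmax))"""
    return all(a[0][i] <= b[1][i] and b[0][i] <= a[1][i] for i in range(3))

def update_pairs(boxes, active_pairs):
    """boxes: dict id -> AABB. Mutates and returns the active pair set;
    only active pairs need the expensive narrow-phase (GJK) test."""
    ids = sorted(boxes)
    for i, p in enumerate(ids):
        for q in ids[i + 1:]:
            pair = (p, q)
            if aabb_overlap(boxes[p], boxes[q]):
                active_pairs.add(pair)      # objects got close: create pair
            else:
                active_pairs.discard(pair)  # no longer close: release pair
    return active_pairs
```

Releasing pairs as objects separate is what fixes the memory leak described above; the real broad phase would update sorted endpoint lists incrementally instead of testing all pairs.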

That's the difference between academic proof of concept code and production
code. Having the I-Collide implementation to look at and run was very helpful
for insight into the algorithms, but in the end, the job required all new
code. Making academics do this is inappropriate. Really, very little code
needs to undergo such a process.

~~~
dreamcompiler
Just to be clear, the Lisp code you describe sounds very amateurish. Like the
code described in the post, its quality is a function of the programmer, not a
function of Lisp. No professional Lisp programmer would append to lists by
double-reverse consing in a serious application. He or she would use a tail-
consing structure, an array, or (if it was Common Lisp) objects, as you did.
The fact that they tried to speed this up by translating everything literally
to C, rather than first trying better algorithms and data structures, is
another red flag suggesting inexperience.

------
vanderZwan
> _Those who do put effort into producing good code risk being seen by their
> colleagues as time-wasters._

I wonder how important this cultural factor is. Is there peer pressure among
academics to not put effort into learning proper programming?

(As an aside, on a personal level it's kind of funny reading this, because I
was hired as a programmer by professor Sten Linnarsson (via a "Who is hiring?"
thread, so through this very website). During the job interview I got the
impression that he is a very capable programmer but simply has to budget his
time.)

~~~
Fomite
It's not peer pressure - indeed, if anything, the people making noise are the
ones espousing _good_ coding practices. The problem is that there's intense
selective pressure not to spend time on great code. There are a number of
reasons for this: lots of code is only run once "for reals," or a small number
of times. It's not necessarily going to be visible to the outside world. And
there are other time pressures.

To use two examples from my own work:

1\. Right now, I'm working on some code that could really use improvement.
It's hard coded to a specific project, when it could be generic. The variable
names are a mess. It's run on a single thread, when it's practically begging
to be parallelized.

Doing all that, for me, will take some time. In the meantime, several things
that will actually matter for my tenure portfolio someday are due. Things that
will support my lab and its projects, whereas this code...might come up again
in a few months? And as I'm not writing ground-breaking software, "Authored
X, Y, Z package..." is not going to be something a tenure committee even
considers. Guess which one I'm going to work on?

2\. I worked on some forecasting code for Ebola that I _was_ actually proud
of. Cleaned up a messy, ad-hoc, "And then I go in and change some variables
based on what you put up in a Google Doc..." style system. Fully documented
it. Made the variables arguments instead of being hard coded. Designed it so
it could be used with minimal expertise.

It was used for a week before the epidemic crested and forecasting stopped
being of particular interest. There are papers I could have written instead.

~~~
tikhonj
I understand this perspective, and I've succumbed to the same sort of pressure
in the past, but doesn't it feel like you actually end up losing net time in
the long run by doing this?

I mean, you save a bit of time up-front to hit a deadline, but then you have
to spend hours of frantic debugging, when you should be sleeping, to rerun a
slightly modified benchmark for some other deadline. Or somebody suggests a
good idea that you can't test quickly because the code is a mess. And then you
either spend a bunch of extra time on it or pass on something that _should_
have been easy because it wasn't.

I only worked on a CS research project as an undergrad, but there we had the
same dynamic. On the one hand, it made sense not to spend too much time
writing good code, automating benchmarks, etc. On the other, we spent too
much time chasing down stupid bugs as the project went on. I wouldn't be
surprised if, had we been slightly calmer and more methodical up front, we'd
have saved time overall _that month_, if not in the _same week_.

And that's not counting how code like that makes research worse. Like, I see
why writing a framework to optimize your benchmarks could be a waste of time,
but it could also be the difference between running a bunch of tests to
evaluate the project instead of just shoving the first ones we thought of into
the paper because running more would take too much effort.

It's not unique to research either: I ran into exactly the same sort of
tradeoff working at an early-stage startup. We cared about putting out
features above all, until things came tumbling down. Spending a week or two
chasing down bugs and making new features more expensive was bad, but then
again we did need to test things and establish product/market fit quickly...

My intuition is that these sorts of pressure-based decisions end up a net loss
of time even in the medium term, maybe even on the scale of weeks. But I don't
know how to actually measure this sort of thing: we never measured how much
time we spent on different tasks, and time is probably the wrong thing to
measure anyhow. Debugging stupid code saps creative energy and lucidity even
more than time, and how do I know what potential things didn't get done
because the code was too ugly or, on the flip side, what things I could have
had done in the time I spent making code better?

How do you deal with this sort of trade-off? I'm not being rhetorical in the
least, by the way: I genuinely don't know. But I really feel that incentives
are geared towards short-term benefits in a way that is simply worse overall
even across relatively short time horizons.

~~~
BookmarkSaver
>It's not unique to research either: I ran into exactly the same sort of
tradeoff working at an early stage startup.

This is what you are missing. It is research. What they are saying is that the
vast majority of the code academics write is "throwaway". It will only ever be
used for a brief time or in a very limited capacity, and then never again.
This isn't software development. There is no technical debt incurred, because
they almost never actually have to go back and run the code or repurpose it.
There's no point in putting in the effort "up-front" for a deadline, because
there is no "later" that they usually have to deal with.

This article is about the exceptions. And making sure to write cleanly
factored, well documented, maintainable and flexible code every single time
you need a program isn't worth it for the very rare instances when someone
else actually needs to continue developing it.

~~~
dragandj
You are right, but...

I, as an academic, have another perspective. Often the very reason the code is
throwaway is that it is so bad. That's also one of the reasons it is so
difficult to replicate research. If code were more reusable, maybe there would
not be as much need to throw it away, because, well, we could use the code
from one research project in another. But it is difficult to teach old dogs
new tricks, I guess...

------
vorg
> In the rarefied atmosphere of academia, [spaghetti code] is generally good
> enough. For commercial applications, though, it is intolerable.

Whoever wrote that article doesn't know what really goes on in commercial IT
outfits. A lot of business code gets written by "programmers" who lied on
their CVs, unit tested by the business users after rollout, with programmers
coming in on after-hours callouts for years afterwards to "firm up the
business requirements".

------
JorgeGT
As a researcher (who codes as the need arises), I've long enjoyed the blog of
Mike Croucher
([http://www.walkingrandomly.com/](http://www.walkingrandomly.com/)), who was
Head of Scientific Applications Software Support at The University of
Manchester and blogs accordingly, with interesting tips and insights on this
issue. Highly recommended!

------
skierscott
This article describes what I'm doing right now.

The academic group I am a part of has developed an adaptive, or active,
machine-learning system. This code is being used by the New Yorker to run
their caption contest (example contest at [1]). It tries to find the funniest
caption and uses previous answers to decide which question to ask next.

The code used to create this contest is a _mess._ Developing a new experiment
type (not a new algorithm) meant copying and pasting roughly 1500 lines and
changing 20.

We have since rewritten it to make it much better, and the shared code is in
only one file. Developing a new experiment type takes only 120 lines of code
now :)

[1]:[http://www.newyorker.com/cartoons/vote](http://www.newyorker.com/cartoons/vote)
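
The contest's actual code isn't shown here, but the adaptive idea - use previous answers to decide what to ask next - can be sketched with a simple upper-confidence-bound rule (all names and the scoring scheme are hypothetical, not the real system):

```python
import math

# Toy sketch of adaptive question selection: keep a running rating for
# each caption and ask next about the one with the highest upper
# confidence bound, so voting effort concentrates on plausibly-funny
# captions instead of being spread uniformly.

def next_caption(ratings, counts, total):
    """ratings: mean score per caption; counts: times each was shown;
    total: total votes so far."""
    def ucb(c):
        if counts[c] == 0:
            return float("inf")  # always try unseen captions first
        return ratings[c] + math.sqrt(2 * math.log(total) / counts[c])
    return max(ratings, key=ucb)

def record_vote(ratings, counts, caption, score):
    """Incrementally update the running mean rating for a caption."""
    counts[caption] += 1
    ratings[caption] += (score - ratings[caption]) / counts[caption]
```

The exploration bonus shrinks as a caption accumulates votes, which is the basic bandit-style tradeoff an active learning system like this has to make.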

------
vbtemp
The funny thing is, writing good code takes no longer than writing crappy
code. For anything non-trivial, writing good code takes less effort, less
time, minimizes aggravation, and saves you so much more time to work on other
things that are important.

People have always struck me as masochistic when it comes to this. I think
people write crappy software because they've become so habituated to acting
"busy" that writing crappy code is just the thing that tricks them into
feeling like they're doing what they ought to do (instead of just doing it
right the first time and going at a relaxed pace).

~~~
JSDave
Learning how to write good code takes much longer than just writing crappy
code.

~~~
dave_sullivan
You've got to write a lot of crappy code before you can write good code, so it
works out.

------
mwest
If you're involved with scientific research and coding in the UK, both the
SSI ([http://www.software.ac.uk/](http://www.software.ac.uk/)) and the
Research Software Engineers community
([http://www.rse.ac.uk/](http://www.rse.ac.uk/)) provide some excellent
resources.

------
jernfrost
Sounds interesting. I think it would be cool to fix shitty science software.

~~~
Fomite
Someday, I aspire to have a well-funded enough lab to hire someone to actually
code my stuff well.

~~~
mwest
Keep an eye out for funding calls like this one:
[https://www.epsrc.ac.uk/funding/calls/rsefellowships/](https://www.epsrc.ac.uk/funding/calls/rsefellowships/)

------
pc2g4d
Spaghetti code is not exclusive to academia. I actually felt like we produced
some decent code in my lab, and we utilized some academically-produced
libraries that weren't half bad, either. But that C89 government code we used
was another matter altogether....

------
jimmahoney
Having done a variety of data analysis and computational modeling while
working as a PhD physics student and postdoc, I can attest that many of my
colleagues wrote terrible code.

The problem wasn't that their code didn't have a nice user interface, or
wasn't designed for reuse. The problem was that it was often impossible to
really know whether it worked correctly or not.

In my experience, whether the code you're writing is going to run once or go
into production, it needs to be testable and understandable - enough that you
have confidence that it does what you think it's doing. And that means
building it in small enough pieces to analyze. Otherwise you just don't know.
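
As a hypothetical illustration of what "small enough pieces to analyze" can mean in practice: pull the core computation into a pure function and check it against a case you can verify by hand before trusting it on real data.

```python
# Hypothetical example: the core of an analysis as a small pure function,
# checked against a hand-computable case.

def mean_squared_error(predicted, observed):
    """Mean of squared differences between paired values."""
    assert len(predicted) == len(observed)
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)

# Hand-checkable: errors of 1 and 3 give (1 + 9) / 2 = 5.0
assert mean_squared_error([1.0, 3.0], [2.0, 6.0]) == 5.0
```

A monolithic script that computes the same number inline gives you no such handle on whether it's right.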

------
xamuel
Another contributing factor is a certain phenomenon in the academic job
market. There's such an oversupply of new STEM PhDs looking for postdoc
positions that it's becoming common for PIs to hire them as super-cheap
programmers.

On paper, they're academic postdoc researchers. But in practice, they're
full-time programmers.

So you end up with people who have PhDs in theoretical astrophysics or
whatever, and who learned somewhere on the side how to hack things together
in Matlab, working full-time as programmers, and you get the inevitable
result.

------
tjl
A former Ph.D. student of my supervisor(s) in grad school designed his linear
algebra code in such a way that he was able to separate it out and create a
business from it. I've seen the rest of the code, and it's a bit of a tangle
of C++, but the linear algebra classes were really well designed.

------
merraksh
Turning (possibly closed-source) academic code into (closed-source) business
code does not necessarily make it more reliable, readable, or efficient. The
new profession is made possible by the openness of the code, which is not
necessarily spaghetti.

I would rather think spaghetti academic code is written by people who never
wrote code before (I've witnessed a few such examples), and these people are
often students, not professors.

------
alloyed
That headline really is something.

------
julie1
Education is based on a mandarinate.

Teachers often never left universities. Teachers never faced the problems of
working. Teachers often steal the work of their students. Teachers don't care
about readable code. Teachers don't care about security.

How can they make students good at something they themselves don't know?

Our education system sucks as a whole. Private, public, they all suck, big
time. Education requires critical thinking. Passing exams requires obedience.

And there was never any evidence in the first place that a CS degree is
related in any way to better quality or productivity in coding. Not a single
study can prove it. Yet students take loans and bankrupt themselves for life
pursuing these studies.

Universities need to be reformed.

PS: there are non-CS specialties that require getting the job done with
computers, and they do a better job of teaching CS than the specialists...

