
Why bad scientific code beats code following “best practices” (2014) - ingve
http://yosefk.com/blog/why-bad-scientific-code-beats-code-following-best-practices.html
======
modeless
I think there is a growing rebellion against the kind of software development
"best practices" that result in the kind of problems noted in the article. I
see senior developers in the game industry coming out against sacred
principles like object orientation and function size limits. A few examples:

Casey Muratori on "Compression Oriented Programming":
[https://mollyrocket.com/casey/stream_0019.html](https://mollyrocket.com/casey/stream_0019.html)

John Carmack on inlined code: [http://number-none.com/blow/john_carmack_on_inlined_code.html](http://number-none.com/blow/john_carmack_on_inlined_code.html)

Mike Acton on "Data-Oriented Design and C++" [video]:
[https://www.youtube.com/watch?v=rX0ItVEVjHc](https://www.youtube.com/watch?v=rX0ItVEVjHc)

Jonathan Blow on Software Quality [video]:
[https://www.youtube.com/watch?v=k56wra39lwA](https://www.youtube.com/watch?v=k56wra39lwA)

~~~
quotemstr
For ages now, I've been telling people that the best code, produced by
the most experienced people, tends to look like novice code that happens to
work --- no unnecessary abstractions, limited anticipated extensibility
points, encapsulation only where it makes sense. "Best practices", blindly
applied, need to die. The GoF book is a bestiary, not an example of sterling
software design. IME, it's much more expensive to deal with unnecessary
abstraction than to add abstractions as necessary.

People, think for yourselves! Don't just blindly do what some "Effective
$Language" book tells you to do.

(For starters, stop blindly making getters and setters for data fields!
Public access is okay! If you really need some kind of access logic, change
the damn field name, and the compiler will tell you all the places you need to
update.)
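
(A rough sketch of that progression in Python -- names made up; start with the
plain field and only promote it if access logic actually shows up:)

    class Account:
        def __init__(self, balance):
            self.balance = balance  # plain public field: zero ceremony

    # Later, if access logic becomes genuinely necessary:
    class Account:
        def __init__(self, balance):
            self._balance = balance

        @property
        def balance(self):
            assert self._balance >= 0  # hypothetical access logic
            return self._balance       # call sites still read acct.balance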

~~~
tigershark
Are you seriously saying that the best code is an untestable mess of big God
classes? Because in my experience this is _by far_ the type of code written by
inexperienced programmers. Abstractions and interfaces are the best way to
make a system testable and extensible, and it has nothing to do with using a
pattern just because you read about it in the GoF book 5 minutes ago. And
using public fields in a non-trivial project is a sure recipe for disaster.

~~~
et1337
> using public fields in a non-trivial project is a sure recipe for disaster.

This is just dogma. Every Python project in existence has 100% public fields.
Some are disasters, some are beautiful. Only a Sith deals in absolutes.

~~~
eximius

      Better than ugly, beautiful is.
      Better than implicit, explicit is.
      Better than complex, simple is.
      Better than complicated, complex is.
      Better than nested, flat is.
      Better than dense, sparse is.
      Counts, readability does.
      Special enough to break the rules, special cases are not.
      But beaten by practicality, purity is.
      Silently passed, an error should never be.
      Unless explicitly, is it silenced.
      In the face of ambiguity, the temptation to guess, refuse you must.
      One, preferably only one, way to do it, there should be.
      Not obvious, it might be. 
      Better than never, is now.
      But often better than right now, is never.
      If hard to explain, bad it is.
      If easy to explain, good it may be.
      Namespaces are a honking good idea - more of them we should do!
    

\-- The Zen of Python, Yoda

~~~
kazinator
> _Namespaces are a honking good idea - we should do more of them!_

That should be:

    english.namespaces english.verbs.are english.articles.a american.english.vernacular.adjective.honking ...

~~~
pyre
You would prefer PHP[1] where the standard library has _no_ namespaces?

[1] I refer to "classic" PHP. No clue if anything in PHP5+ fixed this, though I
doubt that they would make such a breaking change even across major
revisions.

~~~
kazinator
Yes, I would. And though PHP is widely agreed to be a piece of shit (it seems: I
don't work with PHP, so I'm only relaying this popular sentiment), that
doesn't tarnish the idea by association (which is what I sense you might be
trying to do).

ISO C and POSIX also have a flat library namespace, _together_ with the
programs written on top. Yet, people write big applications and everything is
cool. Another example is that every darned non-static file-scope identifier in
the Linux kernel that isn't in a module is in the same global namespace.

Namespaces are uglifying and an idiotic solution in search of a problem. They
amount to run-time cutting and pasting (one of the things which the article
author is against). Because if you have some foo.bar.baz, often things are
configured in the program so that just the short name baz is used in a given
scope. So effectively the _language_ is gluing together "foo.bar" and "baz" to
resolve the unqualified reference. The result is that when you see "baz" in
the code, you don't know which "baz" in what namespace that is.

The ISO C + POSIX solution is far better: read, fread, aio_read, open, fopen,
sem_open, ...

You never set up a scope where "sem_" is implicit so that "open" means
"sem_open".

Just use "sem_open" when you want "sem_open". Then I can put the cursor on it
and get a man page in one keystroke.

Keep the prefixes short and sweet and everything is cool.
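
(Sketched in Python, for what it's worth -- the same trade-off in miniature:)

    # Namespaced style: the language glues the short name together for you.
    from os.path import join
    join("a", "b")           # 500 lines later: which "join" is this?

    # Prefix style: the full name travels with every use.
    import os.path
    os.path.join("a", "b")   # unambiguous, greppable, man-page-able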

I was a big believer in namespaces 20 years ago when they started to be used
in C++. I believed the spiel about it providing isolation for large scale
projects. I don't believe it that much any more, because projects in un-
namespaced C have gotten a lot larger since then, and the sky did not fall.

Scoping is the real solution. Componentize the software. Keep the component-
private identifiers completely private. (For instance, if you're making shared
libs, don't export any non-API symbols for dynamic linking at all.) Expose
only APIs with a well-considered naming scheme that is unlikely to clash with
anything.

------
jcoffland
This article is anecdotal and ranty but I will respond anyway. I've spent the
last 15 years working on various projects involving cleaning up scientific
code bases. Messy unengineered code is fine if only a very few people ever use
it. However, if the code base is meant to evolve over time you need good
software engineering or it will become fragile and unmaintainable.

That said, there are many "programmers" who apply design concepts willy-nilly
without really understanding why. They often make a bigger mess of things.
There is an art to quality software engineering which takes time to learn and
is a skill which must be continually improved.

The claim in the article that programmers have too much free time on their
hands because they aren't doing real work, like a scientist does, is obviously
ridiculous. Any programmer worth their salt is busy as hell and spends a lot
of thought on optimizing their time.

In conclusion: scientists should work with software engineers for projects
that are meant to grow into something larger, but should hire programmers with
a proven track record of creating maintainable software.

~~~
sixbrx
I've had similar experience with scientific software. When I'm told that the
existing software is "OK because it works", I ask "how do you know it works?",
because typically there are no unit tests, or, for that matter, tests of any
sort for individual stages.

I've found that scientists tend to assume "it works" when they like the
results they see such as R^2 values high enough to publish.

Recently I converted some scientific software that was using correlation^2
(calling it R^2) as a measure for model predictions, as opposed to something
more appropriate like PRESS-derived R^2s (correlation is totally inappropriate
for judging predictions because it's translation- and scale-independent on both
the observed and predicted sides). Nobody went looking for the problem because
the results seemed good and reasonable. Converting to a proper prediction R^2,
some of the results are now negative, meaning the models are doing worse than a
simple constant-mean function. Yikes.
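
(A toy numpy illustration of the gap -- numbers made up, and using a plain
prediction R^2 rather than the full PRESS construction:)

    import numpy as np

    obs  = np.array([1.0, 2.0, 3.0, 4.0])
    pred = 10 * obs + 5   # perfectly correlated, wildly wrong predictions

    corr_sq = np.corrcoef(obs, pred)[0, 1] ** 2               # 1.0: "perfect"
    pred_r_sq = 1 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2)
    # pred_r_sq is about -685: far worse than just predicting the mean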

------
ThePhysicist
What most people seem to forget is that "best practices" are not universal:
Depending on the size and scope of the software project, some best practices
are actually worst practices and can slow you down. For example, unit testing
and extensive documentation might be irrelevant for a short term project /
prototype while they will be indispensable for code that should be understood
and used by other people. Also, for software projects that have an exploratory
nature (which is often the case for scientific projects) it's usually no use
trying to define a complete code architecture at the start of the project, as
the assumptions about how the code should work and how to structure it will
probably change during the project as you get a better understanding of the
problem that you try to solve. Trying to follow a given paradigm here (e.g.
OOP or MVC) can even lead to architecture-induced damage.

The size of the project is also a very important factor. From my own
experience, most software engineering methods start to have a positive return-
on-investment only as you go beyond 5,000-10,000 lines of code, as at this
point the code base is usually too large to be understandable by a single
person (depending on the complexity of course), so making changes will be much
easier with a good suite of unit tests that makes sure you don't break
anything when you change code (this is especially true for dynamically typed
languages).
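
(To make the unit-test point concrete, the safety net can be as small as this
made-up pytest-style example:)

    def normalize(xs):
        # Toy function under test: scale a list so it sums to 1.
        total = sum(xs)
        return [x / total for x in xs]

    def test_normalize_sums_to_one():
        # If a later "improvement" breaks normalize, this fails loudly.
        assert abs(sum(normalize([3.0, 1.0])) - 1.0) < 1e-12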

So I'd say that instead of memorizing best practices you need to develop a
good feeling for how code bases behave at different sizes and complexities
(including how they react to changes), as this will allow you to make a good
decision on which "best practices" to adopt.

Also, scientists are, in my experience, not always the worst software
developers, as they are less hindered by most of the paradigms / cargo cults
that the modern programmer has to put up with (being test-driven, agile,
always separating concerns, doing MVP, using OOP [or not], being scalable,
...). They therefore tend to approach projects in a more naive and playful
way, which is not always a bad thing.

~~~
ben_jones
Steve Ballmer on "KLOCs" [1]. Not saying you're taking that extreme but LOC
value is certainly debatable...

[1]:
[https://www.youtube.com/watch?v=kHI7RTKhlz0](https://www.youtube.com/watch?v=kHI7RTKhlz0)

~~~
mikekchar
Complexity is related to size, but is also related to coding style.
Comparisons of LOC are meaningless outside of a context, but surprisingly useful
inside of a context (and as long as you don't use them as metrics, because
they can be gamed too easily).

If you want to see this in action, write a script that will trawl your code
base and count the total number of uncommented lines of code every day. Draw a
graph. Even without knowing anything about your project, I think you will find
a very interesting thing -- namely, that the code base grows consistently and
that the amount it grows per day is a random variable with a normal
distribution. (Obviously this only works if you have a consistent number of
developers.)

If you then do a rolling average (say, every 2 weeks), I think you will find
something even more interesting: the rate of change will be going in one
direction or the other -- either higher or lower -- and it will be doing it
consistently (normalizing for the number of developers is a bit easier here).
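
(A back-of-the-envelope version of that script in Python -- it assumes a git
repo of .py files, and the comment filter is deliberately naive:)

    import datetime, os, subprocess

    def count_loc(root="."):
        # Count non-blank, non-comment lines across tracked .py files.
        files = subprocess.run(["git", "ls-files", "*.py"], cwd=root,
                               capture_output=True, text=True).stdout.split()
        total = 0
        for path in files:
            with open(os.path.join(root, path)) as f:
                total += sum(1 for line in f
                             if line.strip() and not line.strip().startswith("#"))
        return total

    # Run daily (e.g. from cron) and graph the log; a 2-week rolling mean
    # of the day-over-day deltas gives the trend described above.
    with open("loc.log", "a") as log:
        log.write(f"{datetime.date.today()},{count_loc()}\n")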

Once you have verified that, you can ponder about what it all means.

------
whorleater
Disclosure: I'm a recent astronomy grad who specialized in computational
astrophysics. Definitely biased.

The issue is that at least for many scientists and mathematicians,
mathematical abstraction and code abstraction are topics that oftentimes run
orthogonal to each other.

Mathematical abstractions (integration, mathematical vernacular, etc) are
abstractions hundreds of years old, with an extremely precise, austere, and
well defined domain, _meant to manage complexity in a mathematical manner_.
Code abstractions are recent, flexible, and much more prone to wiggly
definitions, _meant to manage complexity in an architectural manner_.

Scientists have oftentimes already solved a problem using mathematical
abstractions, e.g. each step of the Runge-Kutta method [1]. The integrations
and function values for each step are well defined, and this results in
scientists wanting to map these steps one-to-one onto their code, oftentimes
resulting in blobs of code with if/else statements strewn about. This is awful
by software engineering standards, but in the view of the scientist, the code
simply follows the abstraction laid out by the mathematics itself. This is also
why it's oftentimes correct to trust results derived from spaghetti code,
since the methods that the code implements are themselves oftentimes
verified.
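
(For instance, one classical RK4 step in Python, written to mirror the
textbook symbols k1..k4 and h one-to-one rather than any software
abstraction:)

    def rk4_step(f, t, y, h):
        # Each k is exactly the intermediate slope from the published method.
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h * k1 / 2)
        k3 = f(t + h / 2, y + h * k2 / 2)
        k4 = f(t + h, y + h * k3)
        return y + (h / 6) * (k1 + 2 * k2 + 2 * k3 + k4)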

Software engineers see this complexity as something that's malleable,
something that should be able to handle future changes. This is why code
abstractions play bumper cars with mathematical abstractions, simply because
mathematical abstractions are meant to be _unchanging_ by default, which makes
tools like inheritance, templates, and even naming standards poorly suited for
scientific applications. It's extremely unlikely I'll ever rewrite a step of
symplectic integrators [2], meaning that I won't need to worry about whether
this code is future proof against architectural changes or not. Functions, by
and large in mathematics, are meant to be immutable.

Tl; dr: Scientists want to play with Hot Wheels tracks while software
engineers want to play with Lego blocks.

[1]:
[https://en.wikipedia.org/wiki/Runge–Kutta_methods](https://en.wikipedia.org/wiki/Runge–Kutta_methods)

[2]:
[https://en.wikipedia.org/wiki/Symplectic_integrator](https://en.wikipedia.org/wiki/Symplectic_integrator)

~~~
moron4hire
meeeeh, come on. You can't say the sloppy code can be trusted because the
clean math it is based on is verified. The sloppiness of the code prevents
validation that it properly implements that precious math of yours.

The problem is that you want to treat the code as not your "real" job. Your
real job is getting correct answers into published papers, and providing a
proof of that correctness. If your code, on which your results rely, is too
sloppy for anyone else to understand (and note that "anyone else" can include
"you, in 6 months"), then you've not proven correctness at all.

~~~
whorleater
>you want to treat code as not your "real" job

I'm not treating anything; it's because coding _isn't_ my job. The job of a
scientist is to do research, and coding is nothing more than a tool towards
that goal.

>your code, on which your results rely, is too sloppy for anyone else to
understand...then you've not proven correctness at all

No, my results rely on my experimental methods, my mathematical models, and my
code. Correctness can be proven _in spite of sloppy code_. Would you dispute a
claim on the basis that calculations done on a calculator can't be seen by
others?

Furthermore, the burden of proof after peer review in academia is on the
person disproving it. If my code is wrong at a basic level, what good does
it do for anyone? If someone is to disprove my paper, they should reimplement
the code in order to account for errors.

Does this excuse spaghetti level code that often accompanies papers? Of course
not. Scientists have a lot to learn from software engineering about proper
programming skills, but programming is simply another tool in the repertoire,
not something that should be put on a pedestal.

~~~
ben_jones
> coding is nothing more than a tool towards that goal.

That's an important lesson that most devs need to learn at some point in
their career, but don't. It's not even exclusive to business goals; it applies
to sanity and complexity ones as well.

------
mcguire
" _Crashes (null pointers, bounds errors), largely mitigated by valgrind
/massive testing_"

Once upon a time I had lunch with a friend-of-a-friend whose entire job, as a
contractor for NASA, was running one program, a launch vehicle simulation.
People would contact her, give her the parameters (payload, etc.) and she
would provide the results, including launch parameters for how to get the
launch to work. Now, you may be thinking, that seems a little suboptimal. Why
couldn't they run the program themselves; they're rocket scientists, after
all?

Unfortunately, running the program was a dark art. The knowledge of the right
initial parameter settings had to be learned before the back end would
provide, well, reasonable results. One example: she
had to tell the simulation to "turn off" the atmosphere above a certain
altitude or the simulation would simply crash. She had one funny story about a
group at Georgia Tech who wanted to use the program, so they dutifully packed
off a copy to them. They came back wondering why they couldn't match the
results she was getting. It turns out that they had sent the grad students a
later version of the program than she was using.

Anyway, who's up for a trip to Mars?

~~~
noobermin
Here's the thing that grinds my gears. Let's see scientists apply that same
attitude toward papers. Let them label a bunch of equations poorly, and not
label a few, have them explain concepts out of turn in different places in the
document, have them produce shitty, unreadable figures, and let's see how that
turns out.

The issue is that the code which eventually leads to their results _isn't_
public, and they don't have their reputation riding on it, so they can pretend
they understand what they're talking about when they come to publishing -- but
one or two looks at their code lets you know how much they bullshit. But when
it comes to a paper, well, they will be judged on that, so they can't be messy there.

It's okay if it's a one-off code for one group; that's fine. But when a code
is vital for so many people, should it really be that terrible and inaccessible?

Simple solution: if you are funded by the taxpayer, what you produce should
be accessible by the taxpayer (absent defense restrictions). Demanding
accessibility for gov't funded papers is good but I feel the same restriction
should apply to code.

~~~
munin
> Let them label a bunch of equations poorly, and not label a few, have them
> explain concepts out of turn in different places in the document, have them
> produce shitty, unreadable figures, let's see how that turns out.

This is what they already do, though...

------
sseagull
His first list really, really hand-waves the problems that style of coding can
cause. Just use better tools or run valgrind? It's never that simple.

One aspect of scientific coding is that it can have very long lifetimes. I
sometimes work on code more than 20 years old. Technology can change a lot in
that time frame. For example, using global data (common back then) can
completely destroy parallel capability.

The 'old' style also makes the code sensitive to small changes in theory. Need
to support a new theory that is basically the same as the old one with a few
tweaks? Copy and paste, change a few things, and get working on that paper!
Who cares if you just copied a whole bunch of global data - you successfully
avoided the conflict by putting "2" at the end of every variable. You've got
better things to do than proper coding.

Obviously, over-engineering is a problem. But science does need a bit of
"engineering" to begin with.

Anecdote: A friend of mine wanted my help with parsing some outputs and
replacing some text in input files. Simple stuff. He showed me what he had. It
was written in Fortran, because that's what his advisor knew :(

Note: I'm currently part of a group trying to help with best practices in
computational chemistry. We'll see how it goes, but the field seems kind of
open to the idea (i.e., there is starting to be funding for software
maintenance, etc.).

~~~
luthaf
> there is starting to be funding for software maintenance

Any reference concerning this point? I am interested!

~~~
sseagull
Here is a starting point for one big movement in my field:

[https://www.nsf.gov/news/news_summ.jsp?org=NSF&cntn_id=18934...](https://www.nsf.gov/news/news_summ.jsp?org=NSF&cntn_id=189347&preview=false)

It's not quite "maintenance", but is definitely a step away from just writing
software to get an answer and then abandoning it.

Also, anecdotally, there is a movement towards more open-source software.
Slowly but surely, things are moving in the right direction.

------
The_suffocated
I think some of the author's criticisms are misplaced.

Long functions — Yes, functions in scientific programming tend to be longer
than your usual ones, but that's often because they cannot be split into
smaller functions that are meaningful on their own. In other words, there's
simply nothing to "refactor". Splitting them into smaller chunks would simply
result in a lot of small functions with unclear purposes. Every function
should be made as small as possible, but not smaller.

Bad names — The author gives 'm' and 'k' as examples of bad variable names. I
think this is a very misplaced criticism. Unless we are talking about a
scientific library, many scientific programs are just implementations of some
algorithms that appear in published papers. For such programs, the MAIN
documentation is not in the comments but in the published papers themselves.
The correct way to name the variables is to use exactly the symbols in the
paper, not your favourite Hungarian or Utopian notations. (Some
programming languages such as Rust or Ruby are by design very inconvenient in
this respect.) As for long variable names, I think they are rather infrequent
(except in Java code); the author was perhaps unlucky enough to meet many.

------
mamon
This is so true:

"Many programmers have no real substance in their work – the job is trivial –
so they have too much time on their hands, which they use to dwell on "API
design" and thus monstrosities are born"

It also explains proliferation of "cool" MVC and web frameworks, like Node.js,
Angular, React, Backbone, Ember, etc.

~~~
sotojuan
I agree, except (to be pedantic) I think Node is misplaced—it's just a runtime
with a tiny standard library, nothing to do with MVC. I've actually found Node
can be simple and nice by mostly using streams and a couple of small
libraries—something most Node programmers ignore.

I actually think another problem is that programmers spend too much time
following what the "big players" do and mistakenly apply that stuff to their
so-called trivial work. I've wasted hours trying to sift through code from
companies who thought they needed Facebook/Google-tier infrastructure with
stuff like Relay/GraphQL. A simple CRUD Rails/Django/Phoenix/Node app would've
been fine.

~~~
ddebernardy
As much as it's not an MVC framework, I'd argue Node fits right in: it was
created by devs with waaaay too much time on their hands.

------
adrianratnapala
Mostly I agree: bad naive code is better than bad sophisticated code.

Also, science very frequently only requires small programs that are used for one
analysis and then thrown away. It's OK to have a snarl of bad Fortran or Numpy
if it's only 400 lines long.

BUT: scientific projects are often (in my old field, usually) also engineering
projects. Such experiments are complex automated data-gathering machines, and
the hardware takes roughly similar data runs tens of thousands of times.

There should be some engineering professionalism at the start to design and
plan such a machine. Especially the software, since it is mostly a question of
integrating off-the-shelf hardware.

But PIs think:

(A) engineering is done most cheaply by PhD students -- a penny-pinching
fallacy.

(B) that their needs will grow unpredictably over time.

B is true, but it is actually a reason to have a good custom platform designed
at the start, so that changes are less costly. Your part-time programmer is
going to develop many thousands of lines of code no one can understand or
extend. (I've done it, I should know.)

~~~
shitgoose
Even B is false a lot of the time. Just look at most of this 'big data' - it all
could fit on my mobile phone.

------
ska
I believe this post is fundamentally misguided, but I can see how the author
got there. In fact I see it as a sort of category error. When you talk about a
style of programming being "good" or "bad", I always want to ask "for what?".
I wonder if the author has thought about what would happen if everyone adopted
the "scientific" style they are alluding to.

Most of what the author describes as the problems of code generated by
scientists are what I would call symptoms. The real problems are things like
incorrect abstractions, deep coupling, and overly clever approaches with unclear
implicit assumptions. Of course this causes maintenance and debugging to be
more difficult than it should be, but the real problem is that such code does not
scale well and is poor at managing the complexity of the code base.

So long as your code (if not necessarily its domain) is simple, you are fine.
Luckily this describes a huge swath of scientific code. However, system
complexity is largely limited by the tools and approaches you use ... all
systems eventually grow to become almost unmodifiable.

The point is, this will happen to you faster if you follow the "scientific
coder" approaches the author describes. Now it turns out that programmers have
come up with architectural approaches that help manage complexity over the
last several decades. The bad news for scientific coders is that to be
successful with these techniques you actually have to dedicate some
significant amount of time to learning to become a better programmer and
designer, and learning how to use these techniques. It also often has a cost
in terms of the amount of time needed to introduce a small change. And
sometimes you make design choices that don't help your development at all.
They help your ability to release, or audit for regulatory purposes, or build
cross-platform, or ... you get the idea. So these approaches _absolutely_ have
costs. You have to ask yourself what you are buying with this cost, and do you
need it for your project.

The real pain comes when you have people who only understand the "scientific"
style already bumping up against their system's ability to handle complexity,
but doubling down on the approach and just doing it harder. Those systems
really aren't any fun to repair.

------
raverbashing
It's an interesting discussion, and as the article points out, "Software
Engineer" code has some issues as well

There's also an issue that code ends up reflecting the initial process of the
scientific calculation needed, which might not be a good idea (but if you
depart from that, it causes other problems as well)

Also, I'm going to be honest: a lot of software engineers are bad at math (or
just don't care). In theory a/b + c/b is the same as (a+c)/b; in practice you
might be near some precision edge that you can't deal with directly, and hence
you need to calculate it in another way.
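
(One concrete edge, in a Python REPL -- here overflow is the edge:)

    >>> a = c = 1e308; b = 2.0
    >>> a / b + c / b      # fine
    1e+308
    >>> (a + c) / b        # a + c overflows to inf before the divide
    inf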

Try solving a PDE in C/C++ for extra fun

~~~
ska
It's worse than you say (and think?). For example: in general, floating point
equality isn't transitive, and addition isn't even associative.
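
(E.g., in any IEEE-754 language:)

    >>> (0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3)
    False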

Not only do those "bad at math" software engineers get this wrong; most of the
scientists do too. These two groups often make different types of errors, true
-- but nearly everybody who hasn't studied numerical computation with some care
is just bad at it.

------
dibanez
I'm 80% "software engineer" and 20% "researcher" and have to play both roles
to write supercomputer code (I'm in the minority; most peers are more
researchers). These issues are important right now, as the govt is investing
in software engineering due to recent hardware changes that require porting
efforts. We recognize the pitfalls of naive software engineering applied to
scientific code, and would like to do things more carefully. I don't think we
should have to choose one or the other; with proper communication we can
achieve a better balance.

------
joseraul
In his excellent book [1], Andy Hunt explains what expertise is with a multi-
level model [2], where a novice needs rules that describe what to do (to get
started) while an expert chooses patterns according to his goal.

So, "best practices" are patterns that work in most situations, and an expert
can adapt to several (and new) situations.

[1] [https://pragprog.com/book/ahptl/pragmatic-thinking-and-learning](https://pragprog.com/book/ahptl/pragmatic-thinking-and-learning)

[2]
[https://en.wikipedia.org/wiki/Dreyfus_model_of_skill_acquisi...](https://en.wikipedia.org/wiki/Dreyfus_model_of_skill_acquisition)

------
nolemurs
The title of this article should really be "Why bad scientific code beats bad
software engineer code."

It contrasts a bunch of bad things scientific coders do, and a bunch of bad
things bad software engineers do. There's no "best practices" to be seen on
either side.

------
pyrale
The article overlooks a massive source of problems: the problems he describes
in engineers' code usually start to become annoying at larger scale. The
problems he describes in scientists' code rarely happen at scale, because such
code can't be extended significantly. I feel it's weird to compare codebases
that probably count in the thousands of lines and codebases that count in the
hundreds of thousands or millions of lines of code.

Also it is worth noting that every single problem he has with engineers' code
is described at length in the literature (Working Effectively with Legacy
Code, the DDD blue book, etc.). Of course these problems exist. But this is linked
to the fact that hiring bad programmers still yields benefits. I believe this
is not something that we can change, but if the guy is interested in reducing
his pain with crappy code, there are solutions out there.

------
lilbobbytables
> Long functions

This isn't the worst thing, as long as it gets refactored when there is a need
for parts of that function to be used in multiple places.

> Bad names (m, k,
> longWindedNameThatYouCantReallyReadBTWProgrammersDoThatALotToo)

I can live with long-winded names; while slightly annoying, they at least still
help with figuring out what's going on.

What I can't stand are one or two letter variable names. They're just so
unnecessary. Be mildly descriptive and your code becomes so much easier to
follow, compared to alphabet soup.

What annoys me about stuff like this is that it just feels like pure laziness
and disregard for others. Having done code reviews of data scientists, I find
they just don't want to hear it. They adamantly don't care -- compared to my
software engineer compatriots, who would at least sit there and consider it.

But this is just my own anecdotal experience.

~~~
toufka
As a poster above pointed out, a lot of scientific code is an implementation
of a mathematical device. And the scientist is trying to make their equations
come to life. And in math, many equations are simplified to their variables in
order to avoid insane complexity. Many of the scientists actually are thinking
in terms of 'S', 't' and 'v', etc. What are the particle's x, y, t coordinates,
and how do those get me v, p and l? So that they can write out:

v = ((x2 - x1)^2 + (y2 - y1)^2)^(1/2) / t

rather than:

velocity = sqrt(pow((locationX2 - locationX1), 2) + pow((locationY2 - locationY1), 2)) / duration

The latter is AWFUL mathematics, and very real code. (And that is an easy
equation. I've had to implement very, very complicated calculus in
Objective-C code, and it is absolutely horrid what comes out as 'code', as
clean as that code might be. It in no way whatsoever resembles the elegance of
the math that birthed it.)

When I first started, I naively tried to write math code with the natural
Objective-C objects and ended up on the very wrong side of the language. I
realize the mistake now, but it's very awkward to ask the (scientist)
programmer to go along programming with the language's tutorialed objects,
then to tell them, "btw, that 'NSNumber' you have, can't be used as an
exponent, along with that 'float' over there. And you can't add NSNumbers and
'integers'. Oh, you want to multiply two NSNumbers together? You want to write
an equation with NSNumbers on one line!? Go for it. Oh, and you want to do a
cross-product on a matrix? Ha!".

------
xapata
The meat is in the footnote, as always.

> (In fact, when the job is far from trivial technically and/or socially,
> programmers' horrible training shifts their focus away from their immediate
> duty – is the goddamn thing actually working, nice to use, efficient/cheap,
> etc.? – and instead they declare themselves as responsible for nothing but
> the sacred APIs which they proceed to complexify beyond belief. Meanwhile,
> functionally the thing barely works.)

It seems the author has been plagued with programmers who avoid taking
responsibility. One strategy for creating job security is to build a system
too complex for anyone else to maintain it. Perhaps the author's colleagues
are using this strategy.

It's hard to take complaints about "best practices" seriously when the
practices described are not best.

------
thearn4
Working in this area (and coming from a math background), the biggest issues
that I have with most scientific and engineering code are:

1) lack of version control

2) lack of testing

Everything else (including the occasional bad language fit) is usually a
distant 3rd.

------
taeric

        > Simple-minded, care-free near-incompetence can be
        > better than industrial-strength good intentions 
        > paving a superhighway to hell.
    

Love this line.

I think the thing about bad scientific code that makes it good is that you can
often get really good walls around what goes in and what comes out. To the
point that you can then mitigate the danger of bad code to just that
component.

Software architects, on the other hand, often try to pull everything into the
single "program" so that, in the end, you sum all of the weak parts. All too
often, I have seen workflows where people who used to postprocess output data
get pulled into doing it in the same run as the generation of the data.

~~~
mattkrause
As always, the right way is somewhere down the middle.

I recently inherited a blob of "scientific code" with basically no
abstraction. Need to indicate the sampling period? That'll never change -- just
type .0001. Need to read some files? Just blindly open a hardcoded list of
filenames and assume they're okay -- it'll always be like that, right?
And of course, these files are in _that_ format and there's no need to check.
Of course, after this code was written, we bought new hardware. It gathers
similar data, but samples at a completely different frequency, has a different
number of channels, and records the data in a totally different way.

We _could_ fork the code, find-and-replace the sampling rates, and all that,
and maintain a version for each device we buy. Or we could write a DataReader
interface, some derived versions for each data source, and maybe even the
dreaded DataReaderFactory to automatically detect the filetypes.

Guess which approach will work better in a few years?

~~~
LyndsySimon
In my experience, there is a middle path. Hard-code the sampling period, but
put it in a constant `SAMPLING_PERIOD`. Then when the hardware changes and
things break, refactor the I/O code into a DataReader object. If and only if
you need to support several formats, either implement your DataReaderFactory,
or write a class for each filetype.
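
(In Python terms, something like this -- all names hypothetical:)

    # Step 1: name the magic number; don't architect anything yet.
    SAMPLING_PERIOD = 0.0001

    # Step 2, only once a second device actually exists:
    class LegacyBoxReader:
        period = 0.0001
        def samples(self, path):
            ...  # parse the old format

    class NewBoxReader:
        period = 0.00025
        def samples(self, path):
            ...  # parse the new format

    # Step 3, if and only if the formats multiply: detect and dispatch.
    def open_reader(path):
        return NewBoxReader() if path.endswith(".nbx") else LegacyBoxReader()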

------
Rainymood
I recently took a course on "Principles of Programming for Econometrics",
and although I already knew a lot about programming, I learned a lot about
being structured and about documentation. The professor ran some example code
which he wrote 10 years ago! He wasn't really sure what the function did anymore,
and BAM, it was there in the documentation (i.e. the comment header of the function).

I used to just hack stuff together in either R or Python, but that course
really got me thinking about what I want to accomplish first. Write that down
on paper. And then, and only then, after you have the whole program outlined in
your head, start writing functions with well-defined inputs and outputs.

~~~
wintermute42
Why not use the computer to help you define and understand the problem? It
will be much faster to iterate quickly at a REPL and then write the cleaned-up
version later, rather than trying to model the whole thing in your head first.

------
cdevs
I know a lot of math majors thrown into C++ jobs who write unreadable code,
almost forgetting they are allowed to use words and not just single letters
(though they would probably be fine in the functional programming scene).
There's a learning curve either way; write like your co-workers unless you
have the experience to know your co-workers suck.

------
CyberDildonics
This has nothing to do with scientific programming and everything to do with
"best practices" being mind blowingly awful. Coupling execution and data is
good for data structure initialization, cleanup, and interface. Everywhere
else they should just be kept separate. Data structures should be as
absolutely simple as possible, not as full and generic as possible.

Where people get into trouble many times is thinking that every transformation
or modification of data should be inside one data structure or another, when
really none of them should be except for a minimal interface.

------
parenthephobia
I'd love to hear about the tools which almost completely mitigate parallelism
errors.

The author's list of things that are wrong with "software engineers"' code is
50% "things that are just language features" and 50% "bad ways to use language
features that nobody thinks is best practice in software engineering".

Part of the irony is that a lot of the more hairy software engineering
techniques that he decries are used by the people writing the platforms and
libraries that scientist programmers use, to make it possible for their "value
of everything, cost of nothing" code to actually run well.

There is a big difference in attitude between scientist programmers and
software engineers.

Often, a scientist already has the solution to the problem, and is just
transcribing it into a program. The program doesn't need to be easy to
understand in isolation, because a scientist doesn't read programs to
understand somebody else's science, she reads the published peer-reviewed
paper. After all, if you wanted to understand Newtonian dynamics, you wouldn't
start by reading Bullet's source, even if it's very well written. (I don't
know if it is.)

Conversely, for a software engineer the program is a tool for finding the
solution. Even though they're in a scientific field, if it's accurate to call
them software engineers they'll be from a background where the program itself
is the product, rather than the knowledge underlying the program.

------
gpderetta
I was introduced to the concept of "Don't hide power" by Tanenbaum's OS book
(although it seems that Lampson is actually the original source [1]).

It always seemed to me a good design rule, but, even after 10 years of
professional programming, it never actually clicked until last year.

I had always rigorously applied encapsulation, decoupling and information
hiding, exposing only the minimal interface necessary to do the job [2].

While this leads to elegant designs which may even be efficient, in my
experience it makes them hard to extend. You might need access to some
implementation detail of a lower-level layer to either simplify the system or
improve performance, but then you either violate encapsulation, break existing
interfaces (violating the open-closed principle), implement the higher level
directly inside the lower level (this is very common), or simply live with
the inferior implementation.

I've now given up on complete encapsulation. I expose as many implementation
details as possible, hiding only what's necessary to preserve basic
invariants, and pushing abstractions only to the consumer side of interfaces.

Paraphrasing Knuth, premature generalization is the root of all evil.

[1]
[http://www.vendian.org/mncharity/dir3/hints_lampson/](http://www.vendian.org/mncharity/dir3/hints_lampson/)

[2] These are common rules for OO designs, but by no means restricted to it. In
fact, very little of what I've worked on could be called OO.

------
amai
As a former scientist and now professional software developer I can confirm
some of the observations of the article. This is because enterprise developers
do premature flexibilization. And "Premature Flexibilization Is The Root of
Whatever Evil Is Left", see

[http://product.hubspot.com/blog/bid/7271/premature-flexibilization-is-the-root-of-whatever-evil-is-left](http://product.hubspot.com/blog/bid/7271/premature-flexibilization-is-the-root-of-whatever-evil-is-left)

But on the other side, most scientific code I've seen is simple (not in
readability, but in that it uses simple abstractions), highly optimised, and
delivers irreproducible results.

Why? Because most scientists don't write any kind of test. Nobody teaches
scientists test-driven development, and most don't know about unit or
integration tests, with which one could make sure a program generates
consistent results. Scientists are happy if a program runs on their machine and
produces a nice graph for their paper. If you ever want to reproduce the
results of a non-trivial scientific simulation, good luck. You will discover
that the results are highly dependent on the type and version of CPU, GPU,
compiler, operating system, time zone, language, random generator seed, version
of the programming language(s), versions of self-written libraries (which most
often don't even have a version number), version of the build system (if one is
used at all), etc. And that's why you will never see scientific code running
outside of the scientist's computer.

TL;DR: Both (scientists and professional developers) can learn from each other.

------
overgard
I think the fundamental problem is that programmers have been taught that
"abstract = good" in all things.

How often do you hear someone say they "abstracted" a piece of code or
"generalized" it, without anyone asking why? Or how often do people "refactor"
things by taking a piece of code that did something specific, and giving it
the unused potential to do more things while creating a lot of complexity? The
problem with "abstracting" things is it means behaviors that were previously
statically decidable can now only be determined by testing run-time behavior,
or the key behaviors are now all driven from outside the system
(configuration, data, etc.)

Also by making things more flexible, your verbs suddenly become a lot more
general and so readability suffers.

Kind of an aside, but whenever I see code where a single class is split into
one interface and one "impl" I've taken to calling it code acne (because Impl
rhymes with pimple). If you're only using an interface for ONE class it's a
huge waste of time to edit two files! The defense is always something like
"well what if we need a mock version for tests". Fine, write the interface
when you actually do that.
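
(The Python rendition of the acne, names made up:)

    from abc import ABC, abstractmethod

    class UserService(ABC):               # the "interface" file
        @abstractmethod
        def get_user(self, user_id): ...

    class UserServiceImpl(UserService):   # the one and only "impl" file
        def get_user(self, user_id):
            return {"id": user_id}        # nothing ever needed the split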

------
bastijn
This article irks me on so many levels. It is not directly wrong, but it is
definitely not the truth either. I expected more from someone who claims to be
a scientist.

The main issue I have with the piece is the oversimplification of the equation,
to such an extent that important variables of the equation are removed without
mention or explanation of their removal.

An example would be project size. Yes, for FizzBuzz, globals are probably fine,
and FizzBuzz Enterprise shows beautifully that overengineering is a thing
([https://github.com/EnterpriseQualityCoding/FizzBuzzEnterpris...](https://github.com/EnterpriseQualityCoding/FizzBuzzEnterpriseEdition)).
All of the author's statements would hold here. We all agree and smile. But the
same architectural choices make sense in many large enterprise projects. Take
the comment on the large number of small files, for example. This gives fewer
merge conflicts (amongst many other things). Yes, you, working alone on your
tiny project, won't notice, but try working in a single large file with 100 devs
committing to it. Good luck with the merge conflicts! Large methods? Same
issue. Everybody has to change code in that one method: merge conflict.
Inheritance? A nice thing if you build an SDK for others to use and want to hand
them a default base version. They can extend and override your virtual methods
to get custom behaviour. No code duplication which you have to maintain and
keep in sync! Wow!

Next up, I would like to address the difficult naming. Everybody nods along:
that was bad. Nice to write it down. However, from a scientist I would expect
a disclaimer that this was based on personal experience with programmers and
is not the ground truth for programmers, or a citation of a credible source.
I'd say there is only a small fraction who do that. My own disclaimer: both
programmer and scientist should work together if one side does not understand
the naming conventions for the project.

Simple-minded, care-free coding can give you a prototype, which is the
scientist's job. Enterprise programmers (who often are computer scientists)
give you your product.

Tl;dr: stop comparing apples and oranges. Or, as a true scientist, at least
describe the context and the omission of various variables. Oh, and share your
goddamn ugly code so we don't need to read your papers and implement it
ourselves from scratch. That's the true waste here ;).

------
gcc_programmer
I am sure that the person who wrote this article did it for a reason and has
been frustrated by "programmers". However, this is very anecdotal and, to be
honest, doesn't deserve more than a mere acknowledgement - yes, blindly
applying software practices and adding more indirections is not always good,
but creating robust, maintainable, non-ad-hoc software requires abstractions,
indirections, and programmers.

------
moron4hire
The worst thing to ever happen to "best practices" was when managers found out
about them. Suddenly, we could no longer just think for ourselves and solve the
problem at hand; we also had to figure out what "best practice" to use to
implement our solution.

And it's not like you can argue against "best practices". They're the "best"
after all. So that makes you less than best, to oppose them!

------
firethief
The inexperienced-CS-grad errors he describes are a maintenance nightmare, but
those non-programmer errors cast a lot more doubt on the accuracy of the
results. The importance of correctness depends on the problem I guess.

------
BurningFrog
I think the article makes a decent case for "simple bad code" for small
projects. In a bigger project this approach collapses, but in small to
medium-sized ones you can do fine, and the ugliness of the code is "shallow",
as I like to call it. That is, the problems are local and done in simple,
straightforward ways.

The "software engineer" code he describes sounds like the over engineered crap
most of us did when getting out of the clever novice stage and learned about
cool and sophisticated patterns which we then applied EVERYWHERE.

I guess some never come out of that phase, but the code of real master
programmers is simple and readable, only uses complex patterns when truly
needed, and has no need to show off how clever the author is.

You know, the people who made it necessary to invent "POJO"
([http://www.martinfowler.com/bliki/POJO.html](http://www.martinfowler.com/bliki/POJO.html)).

~~~
mattkrause
The part where scientific code gets nasty is when the "simple bad code" from a
proof-of-concept abruptly gets promoted to the core of some pipeline.
This happens _a lot_.

------
okket
Previous discussion:
[https://news.ycombinator.com/item?id=7731624](https://news.ycombinator.com/item?id=7731624)
(2 years ago, 168 comments)

~~~
thr0waway1239
If the URL is exactly the same, I wish HN would just surface the old comment
thread automatically at the top of the page.

------
shitgoose
I know what you mean by 'messy scientific code'; hairy stuff. I deal with it
almost on a daily basis. 10-element tuples, weird names, etc. Makes you wanna
puke at the beginning. But then, as I get to understand what they are trying
to say (i.e. the Business Purpose), things get easier. Somehow I remember what
the 6th element in the tuple is and where approximately in a 2000-LOC function
I should look for something. BUT... when it comes to a 'properly engineered'
piece of infrastructure OOP shit filled with frameworks and factories, I have
no idea. No matter how hard I try, I cannot remember nor understand what the
fuck they are trying to say. My guess: this is because they have got nothing to
say, really.

------
collyw
The examples he gives seem like using complex features of programming
languages for the sake of it rather than best practices.

------
makecheck
Remember that _algorithms_, data structure design and API experience are also
crucial parts of coding. These are not necessarily things that will be learned
by iterative hacking.

Scientific data sets can be _huge_, and there are all kinds of ways to write
code that doesn’t scale well.

If the scientific code is trying to display graphics, then you _really_ have
to know all the tricks for the APIs you are using, how to minimize updates,
how to arrange data in a way that gives quick access to a subset of objects in
a rectangle, etc.

------
ef4
This is describing two stages in the growth of programmer skill.

The researchers are at beginner stage and make classic beginner-stage
mistakes. The developers are at intermediate stage, and they make classic
intermediate-stage mistakes.

There is a later stage of people who can avoid both, but the author probably
hasn't worked with anyone in that stage. Which is not surprising, because once
you're that experienced there are big financial incentives to get out of
academia.

------
levbrie
I think you can reframe this debate pragmatically and widen its applicability
significantly: at what point is "bad" code more effective than the
alternatives? If you get down into a debate about "best practices" you'll have
to concede that anyone writing the code the author is talking about might be
using "best practices" in some explicit way, but isn't "following best
practices", which are designed to avoid precisely the difficulties he
outlines. On the other hand, it's true that most code out there is bad code,
and that heavily architecting a system with bad code can be even more of a
nightmare than more straightforward bad code. The real question is: when
should scientists favor bad code? I'm a huge fan of best practices and of
thoughtful and elegant coding, but I could see an argument being made that in
most circumstances, scientific code is better off being bad code, as long as
you keep it isolated. I'd love to see someone make that argument.

------
alecbenzer
From the comments:

> Of course, design patterns have no place in a simple data-driven pipeline or
> in your numerical recipes-inspired PDE solver. But the same people that
> write this sort of simple code are also not the ones that write the next
> Facebook or Google.

> post author: Google is kinda more about PageRank than "design patterns".

wut

~~~
parenthephobia
What's the problem?

Google's initial success can be attributed to the effectiveness of the
PageRank algorithm, not the quality of the code that implemented it. It
doesn't matter if the first implementation (or even the current
implementation) was a horrible mess of gotos in a single
ten-thousand-line-long function, from the point-of-view of its users.

~~~
alecbenzer
Obviously if code works it doesn't matter what it looks like to users, but the
whole point of code design is making code easier for programmers to work with
and maintain _so that_ it stays working for users. Writing and maintaining
Google-scale software without design principles would be a nightmare.

------
zby
OK - not to defend all professional programmers - but it seems quite
reasonable that the tasks where people are hired specifically to write code
are perhaps bigger and more complicated than the programming tasks that are
completed by people who do programming only as a small part of their job.

------
dankohn1
They're talking about a different kind of best practices, but I highly
recommend taking a look at the Core Infrastructure Initiative's Best Practices
Project [0], which was created partially in response to the Heartbleed
disaster. It's a list of 66 practices that all open source software, including
scientific software, should be following.

[0] [https://github.com/linuxfoundation/cii-best-practices-badge/#core-infrastructure-initiative-best-practices-badge](https://github.com/linuxfoundation/cii-best-practices-badge/#core-infrastructure-initiative-best-practices-badge)

(Disclosure: I'm a co-founder of the project. It's completely free and open
source, and the online BadgeApp itself earns the best practices badge.)

------
jlarocco
Both sides of this argument are correct because both sets of practices are
used for different purposes.

A mid-sized or large software project (say, 100k+ LOC) with single-letter
variables all over, global variables, etc. would be an absolute maintenance
nightmare. So the software engineering perspective is correct there. And in
large projects it really is helpful to split projects up into multiple
directories, use higher level abstractions, etc.

At the same time, most scientific code bases are not in that category. They
don't have dozens (or hundreds) of people working on them, they're not going
to be expanded much beyond their original use case, and they're mostly used by
the people writing the code and/or a small group around those people.

------
engine_jim
This is a debate I engage in often. You can write "prototype" code to solve an
"algorithmic" or "scientific" problem and it can be sloppy, but if you are
planning on integrating it into a large project your team will run into
problems unless the code is extremely contained.

It's true that there is a growing rebellion against best practices and design
patterns, and I think in many cases some practices are dogmatic. However, the
part that disturbs me is that inexperienced programmers are using it as an
excuse to not apply basic principles they don't understand in the first place.

I've seen experienced software engineers that are lazy and spend more time
criticizing the work of others than actually producing anything themselves,
and I've seen novices that have poor fundamentals but grind for weeks to solve
difficult "scientific" problems albeit with horrendous code that proves to be
not maintainable in the long run. I find that in the latter case (I'll call
them "grinders"), the programmer takes much longer to solve their problem
because they have such limited coding experience (I've been asked many times
to help debug trivial problems that result from not understanding basic
concepts like how recursion works).

The author of this article does a good job at identifying the characteristics
of this low-quality "scientific" code, especially that it uses a lot of
globals, has bugs from parallelism, and has other bugs and crashes that are not
understood. The author seems to insinuate that testing is the way to mitigate
the bugs and crashes. This is partially true, but it's better to write code you
understand in the first place than to rely on testing to fix everything,
so you don't continually introduce new bugs.

Grinders can benefit from understanding best practices and learning
programming and computer science fundamentals. That way they can make their
code more robust, code faster, and truly understand when they should and
shouldn't apply a best practice. Software engineers can improve by matching
the work ethic of the grinders and explaining where the grinders are making
mistakes.

------
JamesBarney
Two points

1. All his developer errors are not best practices.

2. Writing domain logic is sooo much easier than writing the
plumbing/integration logic that comprises most enterprise development. One of
the hardest things in software is defining the right abstractions and names.
But in domain logic 80% of those names and abstractions have already been
created.

Oh, what am I gonna call this thing my company owns that generates profits
and losses for us and generally resides at a single location? Maybe I'll
call it a "Store". Versus: what do I call this thing that decides whether we
pull information from Bing or Google based on complicated rules around
performance, cost, and time of day? BingGoogleApiDecider?
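
A hypothetical sketch of the contrast (all names invented here):

    from dataclasses import dataclass

    # Domain logic: the business handed us this name for free.
    @dataclass
    class Store:
        location: str
        monthly_profit: float

    # Plumbing: the name and abstraction had to be invented from scratch.
    class SearchProviderSelector:
        """Picks a search API based on cost and time of day."""
        def choose(self, hour: int) -> str:
            # Invented rule: cheaper provider off-peak.
            return "bing" if hour < 8 or hour > 20 else "google"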

------
jeffdavis
"try rather hard to keep things boringly simple"

Good engineering does mean keeping things boringly simple. You should only
make things complex to hit a performance target, match complex requirements,
or avoid greater complexity somewhere else.

Some types of complexity are subjective. If you need to parse something,
bison/yacc is often a great choice; but for a simple grammar I could see how
someone who doesn't know it could say it introduces needless complexity.
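
For a tiny grammar, a hand-rolled recursive-descent parser can stay boringly
simple. A hypothetical sketch (integers with + and * only):

    import re

    def tokenize(s):
        return re.findall(r"\d+|[+*]", s)

    def parse_expr(tokens):
        """expr := term ('+' term)*"""
        value = parse_term(tokens)
        while tokens and tokens[0] == "+":
            tokens.pop(0)
            value += parse_term(tokens)
        return value

    def parse_term(tokens):
        """term := NUMBER ('*' NUMBER)*"""
        value = int(tokens.pop(0))
        while tokens and tokens[0] == "*":
            tokens.pop(0)
            value *= int(tokens.pop(0))
        return value

    print(parse_expr(tokenize("2+3*4")))  # 14

Past a certain grammar size the generator wins; below it, this is simpler
for someone who doesn't know bison/yacc.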

Programming is writing, and like all writing, you are communicating with an
audience (in the case of software, other developers). If you lose track of
who you are writing for, you won't succeed.

------
p4wnc6
This is a mess of an essay and does little to persuade me that giving domain
experts free rein to make software messes is in any way a good idea.

One of the criticisms applied to software engineers -- the one about bad
abstractions like "DriverController" and "ControllerManager" etc. -- is a huge
pet peeve of mine because it's basically a manifestation of Conway's Law [0].
It indicates that the communication channels of the organization are
problematically ill-suited for the type of system that is needed. The
organization won't be able to design it right because it is constrained by its
own internal communication hierarchy, and so everyone is thinking in terms of
"Handlers" and "Managers" and pieces of code literally end up becoming
reflections of the specific humans and committees to which certain
deliverables are due for judgement. This is not a problem regarding best
practices at all -- it's a sociological problem with the way companies manage
developers.

Domain specific programmers aren't immune to this either. You'll get things
like "ModelFactory" and "FactoryManager" and "EquationObject" or
"OptimizerHandler" or whatever. It's precisely the same problem, except that
the manager sitting above the domain-specific programmers is some diehard
quadratic programming PhD from the 70s who made a name by solving some crazy
finite element physics problem using solely FORTRAN or pure C, and so that
defines the communication hierarchy that the domain scientists are embedded
in, and hence defines the possible design space their minds can gravitate
towards.
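
A hypothetical sketch of what that looks like in code (names invented):

    # Org-chart-shaped indirection around a one-line answer.
    class Optimizer:
        def minimize(self, a, b):
            return -b / (2 * a)  # argmin of a*x**2 + b*x

    class OptimizerFactory:
        def create(self):
            return Optimizer()

    class OptimizerHandler:
        def handle(self, a, b):
            return OptimizerFactory().create().minimize(a, b)

    # What the code actually needed:
    def minimize(a, b):
        return -b / (2 * a)

    print(OptimizerHandler().handle(1.0, -4.0), minimize(1.0, -4.0))  # 2.0 2.0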

There is definitely a risk on the software development side of over-
engineering -- I think this is what the essay is getting at with the cheeky
comments about too much abstraction or too much tricky run-time dispatching or
dynamic behavior. But this is part of the learning path for crafting good
code. You go through a period when everything you do balloons in scope because
you are a sweaty hot mess of stereotyped design ideas, and then slowly you
learn how only one or two things are needed at a time, how it's just as much
about what to leave out as what to put in. The domain programmers who are
given free rein to be terrible and are never made to wear the programming
equivalent of orthopedic shoes to fix their bad patterns will never go through
that phase and never get any better.

[0] [https://en.wikipedia.org/wiki/Conway%27s_law](https://en.wikipedia.org/wiki/Conway%27s_law)

------
DanielBMarkham
This is funny because the author is exactly right, but I think he's
misidentified the poor coders. The folks he's complaining about are _academic
coders without a lot of commercial experience_, who tend to make all of those
errors.

He also nails it when he says "idleness is the source of much trouble".

In the commercial world, you code to _do something_, not just to code
(hopefully). So you get in there, get it done right, then all go out for a
beer. You don't sit around wondering if there's some cool CS construct that
might be fun to try out here (at least, hopefully not!). Clever code is
dangerous code.

Good essay.

~~~
pixie_
I've seen tons of over-engineered code in the real world. People with the
title of 'architect' abstract every last bit of code so that it's impossible
to make sense of. Everyone starts off wanting to make a 'powerful framework'
that can do everything, but ends up with an overcomplicated mess of
configuration that makes it too difficult to do anything.

I've seen this happen multiple times at multiple companies.

~~~
collyw
I saw a really good / fun post showing a Hello World app written at each
year of programming experience.

The first example was

print "Hello world!".

It gradually added functions, inheritance, and other features with each year
of experience. Then after ten years it went back to print "Hello World".

That's been the reality for me: understand the complex features of a
language, but more importantly learn when they are appropriate.
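
Roughly like this, as I remember it (my own hypothetical reconstruction, not
the original post):

    # Year 1:
    print("Hello world!")

    # Year 5:
    class Greeting:
        def __init__(self, target):
            self.target = target
        def render(self):
            return "Hello %s!" % self.target

    class GreetingFactory:
        def create(self, target):
            return Greeting(target)

    print(GreetingFactory().create("world").render())

    # Year 10:
    print("Hello world!")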

~~~
mamon
Are you referring to this? :)

[http://www.connexin.net/computer-software-humor-
jokes/progra...](http://www.connexin.net/computer-software-humor-
jokes/programmer-evolution.html)

~~~
collyw
Almost. I've tried to find it a few times since I first stumbled upon it,
but never managed to. The funniest part was the end result (the most
experienced programmer) being exactly the same as the first, which this
version is missing.

~~~
SmurfJuggler
[https://medium.com/@webseanhickey/the-evolution-of-a-
softwar...](https://medium.com/@webseanhickey/the-evolution-of-a-software-
engineer-db854689243)

~~~
collyw
Yes, that's the one!

------
hyperion2010
> the products of my misguided cleverness.

To me this is the take-home. For a long time I would try to find clever
solutions to problems, or just try to be clever in general, and it's not
just other people but your own future self that has to deal with it. This
applies to other parts of academic life as well, such as grant writing. Code
is also about communicating with other people, and if you are clever then
you had better be able to explain your cleverness in a way others can
understand. KISS.

------
Smaointe
It's because the programmers aren't involved in the science being
undertaken. They're put in a position where they are just programming for
programming's sake.

------
fitzwatermellow
Might have been true a decade ago, when simulations performed on a laptop in
Matlab were enough for dissertation-quality research. But data set sizes
have exploded. If you are currently in school, learn how to move your
research to the cloud, and learn some cloud best practices. It's the best
prep for the future to come. And if you decide to leave academia, you can
possibly nab an interview at Netflix ;)

------
mwest
Flaws uncovered in the software researchers use to analyze fMRI data:
[https://news.ycombinator.com/item?id=12378791](https://news.ycombinator.com/item?id=12378791)

Wonder whether the software followed "best practices"...

------
cerisier
Not sure how this article offers constructive critique... Comparing the
hardly avoidable issues brought on by the specific scope and priorities of
scientific work with dumb "bad practices" has little value to me...

------
jimjimjim
Programming is subject to fashions just like everything else.

Every few years something comes along and eventually gets recognition, then
a following, and then it becomes the 'one true way of doing things', and
those who don't do it are mocked as out-of-date, old-fashioned, or clueless.

And then, another bunch of years later, after the 'one true way' has been
applied everywhere it shouldn't be, people point out the flaws and the cycle
starts again with a new thing.

OO, patterns, CORBA/COM, FactoryFactoryFactory.

I'm personally waiting for Agile to finish its run.

------
mkagenius
> so they have too much time on their hands, which they use to dwell on "API
> design" and thus monstrosities are born.

In their free time, they mostly go for refactoring the code, don't they?

~~~
Klockan
Adding unnecessary abstractions is refactoring.

------
darksky13
As someone who feels like I'm always complaining about quality, I feel like
I don't know how to actually write quality code. All code eventually turns
into a nightmare. A lot of the code I see from coworkers and myself is super
hacky. I really wonder if we're all just terrible programmers or if that's
the natural evolution of code.

Apart from having a mentor, what are the best ways to learn about code
quality? Books to read, for example, that I can then use to look at my own
code and fix it? I really have no idea, when making decisions, what ends up
being best over the long run.

------
Sylos
Well, yeah, because Software Engineers are trained for building large projects
and those "best practices" are aimed at exactly that, too.

Long functions, bad names, accesses all over the place, and use of complex
libraries are errors which are acceptable at a small scale but become
horrendous when you build a larger project.

Many abstraction layers and a detailed folder structure might add a lot of
complexity in the beginning, but there's not much worse than having to
restructure your entire project at a later date.

------
hifier
This person has obviously never worked on a project of any scale. See where
your ad-hoc practices get you when you have millions of LOC.

Can we all agree that there is good code and bad code, that the difference
between the two is often contextual, and then move on? Geez.

------
geromek
I sometimes give a talk to startup companies, in which I tell them why their
code should be horrible. It's an intentionally provocative thing to say, but
there is reasoning behind it, and some of the same reasoning applies to a lot
of scientific code. The linked article has a few comments that tangentially
touch on my reasoning, but none that really spell it out. So here goes...

Software development is about building software. Software engineering is about
building software with respect to cost. Different solutions can be more or
less expensive, and it's the engineer's job to figure out which solution is
the least expensive for the given situation. The situation includes many
things: available materials and tools, available personnel and deadlines, the
nature and details of the problem, etc. But the situation also includes the
anticipated duration of the solution. In other words, how long will this
particular solution be solving this particular problem? This is called the
"expected service lifetime".

Generally speaking, with relatively long expected service lifetimes for
software, best practices are more important, because the expected number of
times a given segment of code will be modified increases. Putting effort into
maintainability has a positive ROI. On the other hand, with relatively short
expected service lifetimes for software, functionality trumps best practices,
because existing code will be revisited less frequently.

Think of the extremes. Consider a program that will be run only once before
being discarded. Would we care more that it has no violations, or would we
care more that it has no defects? (Hint: defects.) That concern flips at some
point for long-lived software projects. Each bug becomes less of a priority;
yes, each one has a cost (weighted by frequency and effect), but a code
segment with poor maintainability is more costly over the long term, since
that code is responsible for the cumulative costs of all potential bugs
(weighted by probability) that will be introduced over the lifetime of the
project because of it.
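
To put rough, entirely invented numbers on it:

    # Hypothetical back-of-envelope, just to show the shape of the trade-off.
    upfront_cleanup = 10        # hours spent making the code maintainable
    per_change_messy = 4        # hours per future change in messy code
    per_change_clean = 1        # hours per future change in clean code

    for expected_changes in (1, 2, 5, 20):
        messy = expected_changes * per_change_messy
        clean = upfront_cleanup + expected_changes * per_change_clean
        print(expected_changes, "changes:",
              "clean up front" if clean < messy else "just make it work")

With one or two expected modifications the cleanup never pays back; with
twenty, it dominates.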

So, short expected service lifetimes for software, prioritize correct behavior
over maintainability; long expected service lifetimes for software, prioritize
maintainability over correct behavior. The source code written by a brand-new
company will be around for six months (maybe) before it gets factored away, or
torn out and rewritten. During that time, less-experienced coders will be
getting to know new technologies with foreign best practices, and those best
practices will be violated frequently but unknowingly. Attempting to learn and
retroactively apply best practices for code that will likely last a short
period of time is simply more expensive (on average) than just making things
work. The same applies to scientific code, which gets run for a graduate
degree or two before being discarded. If the code wasn't horrible, I'd think
that effort was being expended in the wrong places.

In my experience, most "fights" about best practices (whether a technique
should be considered a best practice, or whether a best practice should be
applied) usually boil down to people who have different expected service
lifetimes in mind. (One of those people is probably considering an expected
service lifetime of infinity.)

