Why bad scientific code beats code following “best practices” (2014) (yosefk.com)
283 points by ingve 329 days ago | 261 comments



I think there is a growing rebellion against the kind of software development "best practices" that result in the kind of problems noted in the article. I see senior developers in the game industry coming out against sacred principles like object orientation and function size limits. A few examples:

Casey Muratori on "Compression Oriented Programming": https://mollyrocket.com/casey/stream_0019.html

John Carmack on inlined code: http://number-none.com/blow/john_carmack_on_inlined_code.htm...

Mike Acton on "Data-Oriented Design and C++" [video]: https://www.youtube.com/watch?v=rX0ItVEVjHc

Jonathan Blow on Software Quality [video]: https://www.youtube.com/watch?v=k56wra39lwA


For ages now, I've been telling people that the best code, produced by the most experienced people, tends to look like novice code that happens to work --- no unnecessary abstractions, limited anticipated extensibility points, encapsulation only where it makes sense. "Best practices", blindly applied, need to die. The GoF book is a bestiary, not an example of sterling software design. IME, it's much more expensive to deal with unnecessary abstraction than to add abstractions as necessary.

People, think for yourselves! Don't just blindly do what some "Effective $Language" book tells you to do.

(For starters, stop blindly making getters and setters for data fields! Public access is okay! If you really need some kind of access logic, change the damn field name, and the compiler will tell you all the places you need to update.)


>the best code, produced by the most experienced people, tends to look like novice code that happens to work

Could it be that best practices are designed to make sure mediocre programmers working together produce decent code?

After all, actual novice programmers write code similar to the best programmers except that it doesn't work.


"It took me four years to paint like Raphael, but a lifetime to paint like a child." - Picasso


> Could it be that best practices are designed to make sure mediocre programmers working together produce decent code?

Yes, it is, but the issue is that the industry should move away from the idea that software can be done on an assembly line. It is better to have a few highly qualified people capable of writing complex software than a thousand mediocre programmers that use GoF patterns everywhere.


Why? The assembly line doesn't need elite hackers, and there aren't that many elite hackers around. There is a place for cost-effective development.


I'm a security analyst, not a developer so please forgive my ignorance, but how do you become one of the highly qualified coders without first spending some time being a mediocre developer?


You don't. There's no way around having to spend some time in the trenches --- but we can at least minimize the amount of time developers spend in the sophomoric architecture-astronaut intermediate phase by not glorifying excess complexity. Ultimately, though, there's not a good way to teach good taste except to provide good examples.


"Best" practices may be a misnomer, but I don't believe it's possible to execute large projects without some kind of standardization. It is inevitable that in some cases that standardization will hinder the optimal strategy, but it will still have a net positive impact. Perhaps if we started calling it "standard practices" people would stop acting like it's a groundbreaking revelation to point out that these practices are not always ideal.


I think it's more that the best practices were adopted for specific reasons, but they are not understood or transmitted in a way that makes those reasons very clear. That is, a 'best practice' tends to solve a specific kind of problem under a certain set of constraints.

Nobody remembers what the original constraints are or if they even apply in their current situation, even if they are actually trying to solve the same problem, which they might not be.


This spirals as well: I've spent quite a lot of time recently helping people at the early stages of learning programming, and it's taken me a while to stop becoming frustrated with their misunderstandings. Sometimes it is just because something basic isn't clicking for them, but a lot of the time it's down to me trying to explain things through a prism of accumulated patterns that are seen as best practice/common sense, but which, stepping back and viewing objectively, are opaque and sometimes nonsensical outside of very specific scenarios. There's a tendency to massively overcomplicate, but you forget very quickly how complicated; then you build further patterns to deal with the complexity, ad infinitum.


Yep, the biggest issue we have in software is that every developer knows the Single Responsibility Principle, and no developer knows why. Everyone knows to decouple and increase cohesion, but few, if any, know what cohesion is.


That's it. "Best practices" is, essentially, coding bureaucracy. That's not a pejorative; bureaucracy is quite necessary.

I have a rough idea of "why OO?" but in practice, it can be pretty hard on things like certain kinds of scaling, projects that require a goodly amount of serialization/configurability and the like.


There is a spectrum from "it just works" to "I've applied every best practice". Bad programmers who spend huge amounts of time on best practices will end up wasting a lot of time over-optimizing, but it may have the benefit of reducing risk where they lack deep understanding, or it may shine a light on that risk; that is, if they apply the practices correctly.


You're ignoring how "best practices" frequently add negative value. Design style isn't a trade-off between proper design and expedience. It's a matter of experience, taste, and parsimony.


Those "Effective $Language" books make arguments for their advice, which you are free to find compelling, or not. Back when I first read Effective Java, maybe about a decade ago, I thought it was over-complicated hogwash that I knew better than. When I read it again starting about a year ago, I found the arguments for most of its advice very compelling, based on problems I've run into time and again in my own, and other people's, code, and not just in Java. YMMV I suppose!

The backlash against "engineered" software definitely seems real, and I think that's great - questioning assumptions is critical - but I think a lot of the insinuations about peoples' motivations and talents are unnecessary and honestly kind of silly. Most of us are just trying to find ways to avoid issues we've seen become problems in other projects in the past, it's not some nefarious conspiracy against simple code.


Are you seriously saying that the best code is an untestable mess of big God classes? Because in my experience this is by far the type of code written by inexperienced programmers. Abstractions and interfaces are the best way to make a system testable and extensible, and it has nothing to do with using a pattern just because you read about it in the GoF book 5 minutes ago. And using public fields in a non-trivial project is a sure recipe for disaster.


> using public fields in a non-trivial project is a sure recipe for disaster.

This is just dogma. Every Python project in existence has 100% public fields. Some are disasters, some are beautiful. Only a Sith deals in absolutes.


  Better than ugly, beautiful is.
  Better than implicit, explicit is.
  Better than complex, simple is.
  Better than complicated, complex is.
  Better than nested, flat is.
  Better than dense, sparse is.
  Counts, readability does.
  Special enough to break the rules, special cases are not.
  But beaten by practicality, purity is.
  Silently passed, an error should never be.
  Unless explicitly, is it silenced.
  In the face of ambiguity, the temptation to guess, refuse you must.
  One, preferably only one, way to do it, there should be.
  Not obvious, it might be. 
  Better than never, is now.
  But often better than right now, is never.
  If hard to explain, bad it is.
  If easy to explain, good it may be.
  Namespaces are a honking good idea - more of them we should do!
-- The Zen of Python, Yoda


> Namespaces are a honking good idea - we should do more of them!

That should be:

   english.namespaces english.verbs.are english.articles.a american.english.vernacular.adjective.honking ...


No, that's Java. Flat is better than nested: no self-respecting Python programmer would write a hierarchy that deep.


No, Java looks like the famous quotation about the nail and the horse from:

http://steve-yegge.blogspot.com/2006/03/execution-in-kingdom...


So which rule takes precedence here? Namespaces are obviously not flat; at least, they're more nested than not having namespaces at all.

That's the problem I have with people praising the Zen of Python as if it means anything. It's like a bible: you just pick the verse you like to justify your actions, even if it conflicts with other verses. Then you praise the whole thing for being so wise.


It means a lot: good python code follows it. However, yes, some of the verses conflict, because it turns out that good advice sometimes contradicts itself. All I can say is don't go too far in either direction.


Much of the point of the Zen of Python is that it's self-contradictory.


Balanced, you must be.


You would prefer PHP[1] where the standard library has no namespaces?

[1] I refer to "classic" PHP. No clue if anything PHP5+ fixed this, though I doubt that they would make such a breaking change even across major revisions.


Yes, I would. And though PHP is widely agreed to be a piece of shit (it seems; I don't work with PHP, so I'm only relaying this popular sentiment), that doesn't tarnish the idea by association (which is what I sense you might be trying to do).

ISO C and POSIX also have a flat library namespace, together with the programs written on top. Yet, people write big applications and everything is cool. Another example is that every darned non-static file-scope identifier in the Linux kernel that isn't in a module is in the same global namespace.

Namespaces are uglifying and an idiotic solution in search of a problem. They amount to run-time cutting and pasting (one of the things which the article author is against). Because if you have some foo.bar.baz, often things are configured in the program so that just the short name baz is used in a given scope. So effectively the language is gluing together "foo.bar" and "baz" to resolve the unqualified reference. The result is that when you see "baz" in the code, you don't know which "baz" in what namespace that is.
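To make the run-time cut-and-paste concrete, here's a small Python sketch of the same complaint (Python chosen just for illustration): an unqualified import hides which namespace a short name came from, while the qualified form keeps it visible at the call site.

```python
from os.path import join   # "join" now floats free of its namespace;
                           # a reader must trace the import to see which join

import os.path             # the prefixed style argued for above,
                           # analogous to sem_open vs. plain open

# Both resolve to the same function; only the second says so at the call site.
assert join("a", "b") == os.path.join("a", "b")
```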

The ISO C + POSIX solution is far better: read, fread, aio_read, open, fopen, sem_open, ...

You never set up a scope where "sem_" is implicit so that "open" means "sem_open".

Just use "sem_open" when you want "sem_open". Then I can put the cursor on it and get a man page in one keystroke.

Keep the prefixes short and sweet and everything is cool.

I was a big believer in namespaces 20 years ago when they started to be used in C++. I believed the spiel about it providing isolation for large scale projects. I don't believe it that much any more, because projects in un-namespaced C have gotten a lot larger since then, and the sky did not fall.

Scoping is the real solution. Componentize the software. Keep the component-private identifiers completely private. (For instance, if you're making shared libs, don't export any non-API symbols for dynamic linking at all.) Expose only API's with a well-considered naming scheme that is unlikely to clash with anything.


PHP namespacing in many ways ruined the language. I'm not just talking about the poorly chosen use of the backslash '\' path separator, or the fact that namespaces aren't automatically inferred, which forces me to write "use" endless times at the top of the file and destroys my productivity when working outside an IDE.

I'm talking about the heart of PHP which is stream processing. Why in the world would you destroy the notion of simply including other source files in order to wedge in this C++ centric notion of a namespace? Before "namespace" and "use", the idea was that all of the files included together can be treated as one large file, and sadly that conceptual simplicity has been lost.

Also the lost opportunity of having objects be associative arrays like Javascript, combined with namespacing, have convinced me that perhaps PHP should be forked to a language more in-line with its roots. I haven't tried Hack or PHP7 yet but I am apprehensive that they probably make things worse in their own ways.

I think of PHP as a not-just-write-only version of Perl, lacking the learning curve of Ruby, with far surpassed forgiveness and access to system APIs over Javascript/NodeJS. Which is why it's still my favorite language, even though the curators have been asleep at the wheel at the most basic levels.


The standard library isn't namespaced. If you want to use a stdlib function inside an NS, you can without issue. If you want to use a stdlib class, or any top level class, it needs backslash before its name or you need to import it.

There is a community move towards a standard namespacing with PHP-FIG. This is useful and we are seeing lots of progress on internals thanks to the work by the community.

Like a lot of things, PHP namespaces are or were a big mess. Lots of progress is being made to improve them, though I think a lot of mess must remain.

I've thought about creating a project that packages up various categories in the stdlib into namespaces with consistent inputs and outputs. That would be nice but the process isn't a lot of fun.


(an attempt to translate to standard english grammar, for the benefit of other non-native English readers, who may also struggle to parse this)

  Beautiful is better than ugly.
  Explicit is better than implicit.
  Simple is better than complex.
  Complex is better than complicated.
  Flat is better than nested.
  Sparse is better than dense.
  Readability counts.
  Special cases are not special enough to break the rules.
  But purity is beaten by practicality.
  An error should never be silently passed.
  Unless it is silenced explicitly.
  You must refuse the temptation to guess in the face of ambiguity.
  There should be one, preferably only one, way to do it.
  It might not be obvious. 
  Now is better than never.
  But never is often better than right now.
  It is bad if it is hard to explain.
  It may be good if it is easy to explain.
  Namespaces are a honking good idea - we should do more of them!


Instead of offering your own translation, we can go back to the original English. Run 'python -m this' and you will see:

The Zen of Python, by Tim Peters

Beautiful is better than ugly.

Explicit is better than implicit.

Simple is better than complex.

Complex is better than complicated.

Flat is better than nested.

Sparse is better than dense.

Readability counts.

Special cases aren't special enough to break the rules.

Although practicality beats purity.

Errors should never pass silently.

Unless explicitly silenced.

In the face of ambiguity, refuse the temptation to guess.

There should be one-- and preferably only one --obvious way to do it.

Although that way may not be obvious at first unless you're Dutch.

Now is better than never.

Although never is often better than right now.

If the implementation is hard to explain, it's a bad idea.

If the implementation is easy to explain, it may be a good idea.

Namespaces are one honking great idea -- let's do more of those!


Now those are some best practices I can live by.


I think the advice is more relevant in the context of the specific language; it's not universal.

In python you can go from attribute access to using a property (getter/setter) without breaking anything.

The same is not true in a language like Java where obj.foo is always a direct field access distinct from calling a method like obj.getFoo(), so going from public fields to getters is not backwards compatible and can be painful.
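As a small sketch of that difference (the `Account` class here is invented for illustration), Python lets access logic be retrofitted behind an existing attribute name without touching callers:

```python
class Account:
    """`balance` began life as a plain public attribute; the property was
    added later, and callers still read and write `acct.balance` unchanged."""

    def __init__(self, balance):
        self.balance = balance  # routed through the setter below

    @property
    def balance(self):
        return self._balance

    @balance.setter
    def balance(self, value):
        if value < 0:
            raise ValueError("balance cannot be negative")
        self._balance = value
```

In Java, the same change would turn every `obj.balance` field access into an `obj.getBalance()` call, which breaks existing callers.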


>In python you can go from attribute access to using a property (getter/setter) without breaking anything.

True, but that should be avoided if possible. Python 'properties' violate the principle of "explicit is better than implicit". Once you realize "Oops, I need an accessor function here", the lazy programmer says "Aww, grepping for all uses of .foo and replacing them with .getFoo() will take 20 minutes. Instead, I'll just redefine it as a property and no one will notice." If you care about quality, go the extra mile: make it clear to the people reading your code that a function is being called.

Properties are a kinda nice language feature, but they are so frequently misused that I think the language would have been better off without them. They encourage bad habits.


However, it's still a simple enough change that you shouldn't build getter/setters unless you're already pretty sure it'll be changed. OTOH, if you're using a language with generated getter/setters (ruby, smalltalk, lisp), just do it.


Using find & sed is not painful.

I use getters and setters, but still think you will waste more time arguing about this issue than leaving it be and finding out you have to change them down the line.

Edit: also, especially for getters, I like my accessors to be simple accessors. Hiding too much code behind them can be unpleasantly surprising, so as they deviate further from accessors I like to rename them - e.g. CalculateXxx rather than GetXxx. Fewer surprises. Given that, I potentially have an issue with continuing to call it GetXxx or SetXxx in the face of certain changes.


> Using find & sed is not painful.

Using an IDE with an understanding of the language you have is even less painful. Having the IDE automatically refactor a public field to use getters and setters is a breeze with managed languages like Java and C#.


The same can be said about internal interfaces with only one implementation. If there really is a need for an interface, why not add it later, when it's actually needed, instead of creating additional overhead for something we may never need?


If there are inexperienced programmers working on it, as in my preamble, then almost surely it is in a language that doesn't prohibit mutating objects indiscriminately. In F#, Haskell, and other languages that enforce immutability it obviously isn't a problem; in Python it most surely is.


Eh. Inexperienced programmers will find a way to screw things up somehow. Probably by over-engineering, like the original article describes.


Code can be underabstracted, but it can also be overabstracted - and abstracted with the wrong abstractions. And fixing the latter sometimes involves a temporary stay at "untestable mess of big God classes" when you remove the bad abstractions to clear the way for creating better ones. Not because it's the best code - far from it - but because it's slightly less terrible code.

> And using public fields in a non-trivial project is a sure recipe for disaster.

Ergo, all non trivial C projects are disasters? Well, maybe, but I disagree on the reasons.

Language-enforced encapsulation is a useful tool, but some people take it to the deep end and assume that if their math library's 3D vector doesn't hide "float z;" in favor of "void set_z(float new_z);" and "float get_z() const;" (one of which will probably have a copy+paste bug because hey, more stupid boilerplate code), they'll have a sure recipe for disaster. Which I suspect you'd agree is nonsense - but would also follow from reading your words a little too literally.
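Translated into Python for brevity (a hypothetical `Vec3`), the contrast being parodied is roughly:

```python
class Vec3:
    """Plain public fields: this is the entire class."""
    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z

class Vec3Encapsulated:
    """The boilerplate alternative, shown for just one of the three fields."""
    def __init__(self, x, y, z):
        self._x, self._y, self._z = x, y, z

    def get_z(self):
        return self._z

    def set_z(self, new_z):
        self._z = new_z
    # ...plus four more near-identical methods, each a copy+paste hazard
```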


In my experience, in quite big projects, people had a tendency to mutate objects in the wrong place and for the wrong reasons. A field encapsulated behind a getter without a setter certainly helps.
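A minimal Python sketch of that idea (names invented here): a getter with no setter keeps the field readable everywhere but makes unplanned mutation an error.

```python
class Order:
    def __init__(self, items):
        self._items = tuple(items)  # immutable defensive copy

    @property
    def total(self):
        # readable anywhere; with no setter defined, `order.total = x` raises
        return sum(self._items)
```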


But if it has to be able to be mutated in some cases, you're stuck. This is a place where Scheme's parameterize may be useful: Define a closure with the setter in scope, and write it so that when it's called with a lambda as an arg, it will raise an error unless those certain conditions are met, in which case, it will use parameterize to make the setter available in the lambda's scope. Or I suppose it could pass it in, which would be slightly simpler...


That would be a final field. But generally I agree with you, and it's by far the pattern I use most.


The issue is if you have, for example, a class encapsulating a bunch of flags, allowing the user either to set each flag separately or to set a bitfield representing all of them.

In C, you might be able to do some magic, but in Java, you’ll need setters and getters there – you can’t even do final fields there.

Luckily, in C#, you can just use accessors, and maybe in Java with Lombok, too.
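Sketched in Python for concreteness (flag names invented), properties over a shared bitfield give both views without explicit getFoo/setFoo pairs, which is roughly what C# accessors provide:

```python
class Flags:
    READ, WRITE = 1, 2  # bit masks

    def __init__(self, bits=0):
        self.bits = bits  # the whole bitfield, settable at once

    def _get(self, mask):
        return bool(self.bits & mask)

    def _set(self, mask, on):
        self.bits = (self.bits | mask) if on else (self.bits & ~mask)

    # each flag is a property view over the shared bitfield
    read = property(lambda s: s._get(s.READ), lambda s, v: s._set(s.READ, v))
    write = property(lambda s: s._get(s.WRITE), lambda s, v: s._set(s.WRITE, v))
```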


I think he is rather referring to cascades of function calls/types and "custom solutions". Of course abstractions are a great tool when used properly. Thinking of a best practice from Object oriented design: low coupling but high cohesion.

When you use abstractions, you get low coupling. But like everything in life, this has a price: you may need an extra line of code to instantiate the abstraction, and sometimes you have to write an extra getter because it's not perfect yet. That's a fine price to pay unless you use way too many abstractions and they are nested deeply. It may have some aesthetics, but it can be hell to debug and make code overly complicated to extend.

So that's why one must also focus on high cohesion. I really like the modern JavaScript way, the imports/requires are for low coupling and the build automation is for cohesion. Anyways, these things are best practices as well but not sure if the author took those into account... ;)


I don't think that's seriously what he's saying. You've over-interpreted to project the "logical" extreme of the POV expressed on the author. This is a trope in online debate that needs to die.


I've returned recently ( for domain-specific reasons ) to sets of (opaque) god-classes with good interfaces when needed. No state in a god-class is visible except through the interface.

Since the domain requires a great deal of serialization, the interfaces are usually strings over that. In some cases, it's even easier to just open a socket to the serialization interface.

So far, I'm able to pub/sub periodic data streams, but it'd be pretty easy to add "wake me when" operators and such.

It forces the use of one central table mapping names to callbacks (which can be grown dynamically), but it's very, very nice to work with.

YMMV. The domain is instrumentation and industrial control, which just fits this pattern nicely. All use cases can be, and are, specified as message sequences.


I've been using the delegation pattern a lot lately as a nice way to combine the best bits of god classes (few dependencies) with the best of SRP (small easy to test parts).

This way A, B, C (and many others) only depends on X, the delegator (which exposes a number of interfaces for practically everything), but X depends on everything and the kitchen sink, X contains no real functionality.
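A minimal Python sketch of that arrangement (class names invented): each small part stays independently testable, while A, B, and C would depend only on the delegator's facade.

```python
class Storage:
    def save(self, item):
        return f"saved {item}"

class Mailer:
    def notify(self, msg):
        return f"mailed {msg}"

class App:
    """The delegator: one facade that owns the parts but holds no real logic."""
    def __init__(self):
        self._storage = Storage()
        self._mailer = Mailer()

    def save(self, item):          # pure pass-through
        return self._storage.save(item)

    def notify(self, msg):         # pure pass-through
        return self._mailer.notify(msg)
```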


"Delegator". Now I know what to call it :)


>And using public fields in a non-trivial project is a sure recipe for disaster.

Why, though? Surely it's the programmers' job to access what they need and leave alone what they don't. As someone else said, Python has no concept of public/private and it works okay.


If you make the statically typed field public you lose the ability to change the implementation without breaking the clients. (Getters/setters have the same problems, but not as pronounced). And no, you don't always have the ability to recompile everything that's using your code.


Python doesn't really "work okay" for ongoing projects. Framework updates become a 6-month migration job.


An argument for properties is that they hide the implementation details. They keep idiots from changing variables they shouldn't (if they really want to, there's sometimes reflection), and they hide implementation-detail variables from the autocomplete box.


Make your code testable, but public fields are fine.


Blind application needs to die in general. SOLID is a good idea (it should just be SID, IMHO, but anyways), and so are design patterns... when they make sense. When you think "I need to delegate object construction to different subclasses contextually", use a factory. Don't use a factory when you don't think that. When you need to choose an approach that should be given by the caller, no, for the love of god, don't use the Strategy pattern. The Strategy pattern is a hack that belongs in the past, and you should use lambdas instead because it's the 21st freaking century, not 1996, and we all have lambdas now. So don't use Strategy, unless you're stuck behind, and if you do, feel bad about it.

/rant

Anyways, yeah, just use your brain, insist on defined interfaces, and if something might be suboptimal, allow for it to be changed, and you'll be fine.


> The strategy pattern is a hack that belongs in the past, and you should use lambdas instead

In that case, your lambda object is your strategy. One of the good points that the "Design Patterns" authors make and that everyone else ignores is that these patterns are names for recurring structures that pop up independently of implementation language and environment. Some silly functor object is one way of implementing the more abstract "Strategy" pattern. Your lambda is a better one in many cases. It's the same high-level concept.
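For example, in Python the whole pattern collapses to a function parameter; the lambda passed in is the Strategy object:

```python
def total_price(items, pricing):
    """`pricing` is the strategy: any callable from price -> price."""
    return sum(pricing(p) for p in items)

prices = [10.0, 20.0]
regular = total_price(prices, lambda p: p)        # identity strategy
sale = total_price(prices, lambda p: p * 0.9)     # 10%-off strategy
```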


Well, yes, but if you ever write a class for C#, modern Java, or pretty much any language save C++ or Python (and even then you should write functions) that has the word "strategy" in it, you're doing it wrong, and should be punished for reinventing language features using OO methodology.

Was it Steve Yegge who mentioned the Perl community calling Design Patterns "FP for Java"?


I don't know, but this Yegge essay is one of my favorites ever: http://steve-yegge.blogspot.com/2006/03/execution-in-kingdom...


I hadn't read that one. The line about the meaning of lambda is hilarious. Anyways, yeah the line was in "Singleton Considered Stupid."

https://sites.google.com/site/steveyegge2/singleton-consider...


There is a name for "strategy pattern" that existed long before the GoF book: higher-order function. But what's worse, in that it causes confusion, is when names are repurposed, "Functor" is a useful abstraction in programming, but is not related to your usage above.


Well, a Functor isn't an HOF, and an HOF needn't be an implementation of Strategy.


I never said a Functor was a HOF, but it's not far wrong. The implementation of the morphism-mapping part of a functor is a higher order function. Show me an example of Strategy that is not essentially a HOF.


All Strategy is HOF; not all HOF is Strategy. Observe:

  (define (make-counter n)
    (lambda (c)
      (set! n (+ n c))
      n))
This is a HOF, but not an instance of Strategy.


I asked for an example of Strategy that wasn't just a HOF, but I now see you do agree with this. Yes your example is technically a HOF as it returns a function. So I guess your point is that HOF is not specific enough, although one might argue that in general usage it usually does imply function-value arguments.


I mean, that's not the general use I see, so we clearly have different social circles. The thing is, a HOF is a mechanism, the Strategy pattern is an intent. I would in fact argue that there is at least one HOF that takes functions as arguments and isn't an instance of strategy: call/cc.

call/cc does not actually use the passed in function to determine how any action should be done. All it does is provide a capture of the current continuation as an argument. That's it. So it's not a strategy, it's just a HOF.


Well, Java didn't add lambdas to write make-counter; it was map and fold that got them envious. In fact, IIUC, mathematically make-counter is first-order, counting the nesting of function types. In other words, I don't understand why many describe it as a HOF at all. Map and fold are second order. Callcc, ignoring its Scheme implementation as a macro, would be third order. One could argue that the second-order argument to callcc is a strategy, with the continuation being the strategy.


Call/cc isn't a macro. It's actually a special form, although in CPS-based schemes, it could probably be implemented as a function if there was a lower level continuation primitive.

And semantically, a continuation is pretty much never a strategy.

As for make-counter not being higher-order, that's just not true. A higher-order function takes and/or returns a function. make-counter returns a function: it's higher-order.


No, it's disputed, for the reasons I gave. But I've already accepted that many regard make-counter to be a HOF. I was careful and said make-counter is not a second-order HOF, which Strategy is.

Callcc can be implemented as a function, no continuation primitives needed. Haskell has examples in its libraries.

Lastly, regarding semantics, I see no difference between Strategy and a second-order HOF. Yes the scope of Strategy is supposed to much more limited, but I don't see value in this. I concede that others might do.


Having thought about this further, I think it is probably only correct of me to talk about the "order" of "functionals" as in functions to scalars in mathematics. The term "rank" is better suited here. Category theory supports the popular definition of higher-order function and gives a different meaning to "order". So make-counter is rank-1 but still higher-order. Apologies for the previous post.


Automatic factories are extremely useful to kill the ServiceLocator anti-pattern.


The ServiceLocator anti-pattern seems a bit silly: either just depend on the object directly, or if there may be different objects used throughout the system, pass one in.


One of my guiding principles as a programmer is to never add code because I might need it some day.


Still, apply common sense. Sometimes the probability of adding something is so high and the cost of making an extension point so low that it makes sense to design with extensibility in mind. These extension points are rare, though, and usually occur at major module boundaries (e.g., plugins), and don't need to be scattered through random bits of your code.


You also have to factor in the cost to maintain your extension which you currently have no use for, which most people forget.

It also might be easy to include now, but your extension might complicate a new feature request. Or the new feature might complicate your previously simple extension. If it's still unused, don't be afraid to throw it away then either.


I find that unneeded functionality also complicates refactoring efforts.


Still, why not wait to add it?


Because sometimes those choices have a large effect on the amount of work/mental tax in the future - for example, if you know that a new feature will have to interop with yours in the near future and the cost is low to implement the right extensibility point now, it would be absolutely stupid not to - I would call that bad engineering that costs time & effort.

Obviously one doesn't want to go down the rabbit hole too early, but the other extreme is just as bad.


Waiting is good when the cost model for the added code is much more in focus later.


The cost to add it is often much higher, once the code has high fan in (lots of code paths depend on it).

I am definitely a fan of not over engineering, but I'm more of a fan of thinking. Think about your problem and your use cases, and about your own ability to predict the future. If you can predict future changes with high probability, then go ahead and design for those. If you're not a domain expert, you probably can't predict the future at all (and you probably grossly underestimate how bad your predictions are), so you should stick to the bare minimum.


It's a slippery slope.


The GoF book is a dictionary. When it came out, we all got names for patterns we use when appropriate. The thing that must die is over-reliance on formalisms where they don't add value. (Because there absolutely are places where they do.)

Having the taste to use the right approach for each job is what sets experience apart!


Sorry, what is the GoF book?


Design Patterns: Elements of Reusable Object-Oriented Software, whose four authors are sometimes called the Gang of Four (GoF). It established a vocabulary of common patterns.

https://en.wikipedia.org/wiki/Design_Patterns


Thanks very much!


For ages now, I've been telling people that the best best code, produced by the most experienced people, tends to look like novice code that happens to work --- no unnecessary abstractions, limited anticipated extensibility points, encapsulation only where it makes sense.

I love this.

"Perfection is reached, not when there is nothing left to add, but when there is nothing left to take away." -- Antoine de Saint-Exupery

Have to disagree a little on getters and setters though. They're tremendously useful as places to set breakpoints when debugging. Well, setters are, anyway; I guess it's rarer that I'll use a getter for that. Anyway, perhaps we can agree that the need for these is a design flaw in Java; C# has a better way.


For ages now, I've been telling people that the best best code, produced by the most experienced people, tends to look like novice code that happens to work --- no unnecessary abstractions, limited anticipated extensibility points, encapsulation only where it makes sense.

In other words, the best code is the simplest code that works. It usually tends to be very flexible and extensible anyway, because there is so little of it that understanding it all and modifying it becomes easy. The most experienced programmers are the ones who can assess a problem and write code to capture its essence, and not waste time doing that which isn't necessary.

I've observed that there is a "spectrum of complexity" with two very distinct "styles" or "cultures" of software at either end; at one end, there is the side which heavily values simplicity and pragmatic design. Examples of these include most of the early UNIXes, as well as later developments like the BSD userland. The code is short, straightforward, and humble, aptly described by "novice code that happens to work".

At the other end, there's the culture and style of Enterprise Java and C#, where solving the simplest of problems turns into a huge "architected" application/framework/etc. with dozens of classes, design patterns, and liberal amounts of other bureaucratic indirection. The methodology also tends to be highly process-driven and rigid. I don't think it's a coincidence that the latter is heavily premised on and values "best practices" more than anything else.

Here's another one of the "rebellion" articles against "best practices": http://www.satisfice.com/blog/archives/27


^^^This. The tendency for a certain type of software engineer to go around telling everyone else they're doing it wrong reminds me strongly of this scene from Justified: https://www.youtube.com/watch?v=LG4hOjJ9tEs.


The problem with their advice is that they are not master Foo. They could tone it down a little, for their wisdom is not absolute. Some rules work for some people in some situations, and other rules are just practical conventions.

One good piece of advice I've found is to just plain ignore the status quo and follow common sense when the context demands it.


The getter/setter problem is solved very nicely by C#. You can change a field to a property at any time without changing the rest of the code, and a lot of the use cases for get/set can be handled compactly like public int X { get; private set; } instead of having to have two variables.


Getters/setters are important when the data structure is used from a different linkage unit than where it is defined, and binary compatibility across versions is important.

That does happen, but usually in fewer cases than most intermediate programmers realize. Instead, they see cases where it is used for real (like WinForms or Direct3D) and cargo-cult it into all the code they write.


> they see cases where it is used for real (like winforms or Direct3D) and cargo cult it into all code they write.

And a lot, a lot, of my CS prof colleagues believe it is deeply important, for reasons most of them are unable to articulate, that in an intro CS1 class all instance variables be private and accessed only through getters and setters. This is actually baked into the College Board's course description for AP CS A, and a point or two (out of 80) often hinges on it in every exam, so it is near-universally taught in high-school level CS classes (in the US). Sigh.
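
For concreteness, here's a hypothetical Java sketch (all names made up) of the accessor boilerplate the exam rewards, next to the plain-field version the earlier comments advocate:

```java
// The boilerplate the AP exam rewards: a private field plus a
// getter/setter pair that adds no logic whatsoever.
class Boxed {
    private int x;
    public int getX() { return x; }
    public void setX(int x) { this.x = x; }
}

// The plain-field alternative. If access logic is ever needed later,
// rename the field and the compiler flags every use site to update.
class Plain {
    public int x;
}

public class Demo {
    public static void main(String[] args) {
        Plain p = new Plain();
        p.x = 3;                 // direct access, no ceremony
        System.out.println(p.x);
    }
}
```

The two classes expose exactly the same capability; the first just costs four extra lines per field.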


So I have to write BS for AP CS A, and I have to do it in a bad language, and we'll never get Python or Lisp?

Okay, College Board, I warned you...

smack

this is for the above.

smack

and this is for teaching CS poorly in general.

smack

and this is for making us all buy overpriced graphing calculators, thus keeping demand high...


The kinds of things you need to do in order to maintain a stable ABI are not the kinds of things you should apply to all parts of your program. That way lies madness.

BTW, you don't need accessors even for public ABIs. stat(2), for example, doesn't need "accessors" for struct stat and it's been stable for decades. The same idea applies to Win32 core APIs.


Not a disagreement, but it's interesting that at least 3 of the 4 people you mentioned are game programmers (not sure about Mike Acton because his name doesn't ring a bell). Some, like Carmack, are definitely brilliant programmers. But game programming has very specific constraints, doesn't it? Speed and size are comparatively more important than in business/enterprise software, and maintenance is comparatively less important.

That said, I welcome anyone trying to knock OOP off its pedestal.


Mike Acton is also a game programmer. Maybe it's just because of the sources I tend to read, but I do think the game industry is leading the way here and I hope others will follow. Perhaps it's because of the focus on performance, but I think any industry could benefit from that. I curse the lack of attention to performance in modern software development every time the Twitter app takes multiple seconds to load on my 2 GHz smartphone.

> maintenance is comparatively less important

I don't agree with this. Certainly maintainability is less of a concern for the gameplay code specific to each game, but game development also encompasses engines and tools which span multiple titles and are used for many, many years. Also, the trend toward free-to-play and subscription games is making maintenance more of a concern even for single titles. World of Warcraft, Team Fortress 2, League of Legends, Clash of Clans; these titles are going to be maintained for years to come. Valve Corporation recently transplanted an entire game (DotA 2) from one game engine to another, while people continued to play.


>I curse the lack of attention to performance in modern software development every time the Twitter app takes multiple seconds to load on my 2 GHz smartphone.

I strongly agree with the sentiment, but also it should be noted that the load time of applications is related more to the storage speed, which is often abysmally bad in phones.


If they didn't have so much code (and data) to load in the first place, a lot of it probably unnecessary, applications would certainly load much faster.


People who aren't performance obsessed programmers enjoy detailed graphical interfaces, and those tend to require lumps of code and data. It's fine to decry it I guess, but it won't be changing any time soon.


The point is that you can have those without the fluff and layers of inefficiency that modern programming introduces.


They're allowed to do it because of the emphasis on "shipping" and the complete lack of "maintaining" they have to do afterwards. Most games are complete shit on release for a reason, with very few modern exceptions. Hell, it isn't uncommon for a studio to outsource ports and expansions. Making your code someone else's problem is not something non-game devs should aspire to.


Except there are these things called game "engines", which can have lifecycles of 10+ years (Unreal and id come to mind).


"Give a man a game engine and he delivers a game. Teach a man to make a game engine and he never delivers anything."

Most game developers do not build game engines.


All of the cited developers do, however.


Maintenance as in "changing the behavior of code on demand" in games is not solved by modifying the code but the data. As a game programmer I'd be terrified if I had to change the code every time requirements change, because the requirements in games change a dozen times a day. I'd rather give the tools to the game designers so they can implement their requirements themselves.

Maintenance as in "keeping the code around and reusing it for other projects" is quite important in games. We go to extra lengths to make code reusable by isolating it from 3rd-party libraries, compiler features, and such.


But this goes to reinforce the idea that the principles involved in coding games are very different from those for other software, doesn't it? Therefore, what works in games is not necessarily a good idea for business software, and vice versa.


I am not sure that modifying code instead of data is a good principle for anybody.

I do agree that it depends on the goal you are trying to achieve. E.g., if I had been running a custom software shop, or an embedded IT department developing some internal software, I too would make maintenance as complicated as possible so I could bill for more engineers/QA, or have more reports to increase my profit/bonuses/political weight, etc.


> I am not sure that modifying code instead of data is a good principle for anybody.

In my experience, for in-house software, modifying (or adding) code happens all the time. It's so common I'm puzzled that you don't consider it a good principle. In fact, the alternative -- making software so flexible every possible behavior can be customized by modifying data alone -- leads to the "enterprise software" antipattern (as often mocked in The Daily WTF), where everything is needlessly flexible and complicated. Or maybe even the "inner platform effect"!

Even for game development, I've read that teams (often?) hack the engines they buy, as was famously the case for Half-Life, which used a heavily modified Quake engine.


Look at how many people make a game and how many people make in-house software (if you don't know how many people make a game, go to mobygames.com and look up the credits for one). Consider the functional complexity of one and the other, then consider the actual complexity of the code. Jonathan Blow, quoted above, gives a few examples too (e.g., Facebook's client requiring a change in the OS because of hitting the class limit).

Game teams sure hack the engines and modify the code all the time but these modifications are not in response to changing requirements. It's 99% bug fixes, optimizations and planned feature implementation and 1% behavior changes requested by design.


I honestly don't understand where you're going re: number of people. Care to elaborate?

I won't argue with you about the 99% data, 1% behavior changes thing. I can't argue with unreferenced statistics :) I seriously doubt your percentages, though. In any case, for business/enterprise software development it's definitely NOT the case. Software changes are both very common and a reasonable activity, so common in fact that if you disagree with this I have to ask (if you don't mind me asking): do you work in software development, and if so, what kind of software?

Edit: and to answer a previous remark of yours: writing new code for feature/change requests has little to do with "making software as complicated as possible". It's likely the opposite: software that does just one thing is easier & faster to write and maintain than needlessly "flexible" and "customizable" systems. If I wanted to make a mess -- and myself indispensable -- I'd definitely go the "extremely customizable, this thing does everything you want sir!" route ;)


There are several orders of magnitude of (man hours)/(functional complexity) difference between game companies and enterprise. Changing code is incredibly expensive compared to changing data.

>Software changes are both very common and a reasonable activity, so common in fact that if you disagree with this I have to ask (if you don't mind me asking): do you work in software development, and if so, what kind of software?

I do work in software development, as I said above I am a game programmer.


Thanks for the reply!

Changing code is indeed expensive, but not prohibitively so. It's so reasonable an activity, in fact, that it's what I do in my day job, and what many others in the business/enterprise software industry also do (especially for in-house tools!). It's making "flexible" software that often turns out to be the costly option. YAGNI and other mottos apply (not blindly of course, but they often do apply). I've never seen in-house software that could be controlled entirely by data. Change requests most often require code changes, and it's not the end of the world.

I suspect your opinions are colored by the fact you're a game programmer. This ties back to my initial assertion: that game programming is different to other kinds of software development; that ease of maintenance, modularity and changing code are comparatively less important, and therefore some practices of the software engineering world are less relevant for game development, while others (raw speed, memory footprint, clever hacks to produce an interesting effect, etc) are more important. This means one has to take the advice of developers from the games industry with this in mind: that what works for games is not always best for the software industry at large, because the constraints & requirements are very different!


As I said above, requirements for games change a dozen times a day, every day. This is a solved problem in the games industry. Good for you if you can afford code changes, but that doesn't make such a solution better.

As of now there is no financial back pressure against the enterprise practices. It may stay like this forever. However, it's not entirely unlikely the situation will change in the future. For instance, in the '80s and '90s people used to be paid handsomely for developing things like an in-house email client or spreadsheet. Now those jobs are gone, because it's much cheaper to configure an off-the-shelf office suite than to keep developers on the payroll.


Let me ask you this: have you ever worked in anything but games development?

The rest of the industry is very different, and it's naive to dismiss it as caused by "lack of financial back pressure". The actual goals and constraints are different, even the program's life cycle, and it's only natural the development principles differ in turn! You'll notice a vast amount of literature about modularity, patterns, programming paradigms, encapsulation, etc. -- some of it misguided, some not. All of this literature is an attempt to cope with change and complexity in the software world; if the answer was "just modify data and never touch the code" I think someone might have noticed.

Do you by any chance consider modifying game scripts to be "modifying data"? If so, that would explain our disagreement on this matter :)


>Let me ask you this: have you ever worked in anything but games development?

I sure did; I wrote and supported in-house software. Though I don't see how it's relevant: 99% of the programming information available is written in reference to enterprise and custom software development anyway, since those industries employ so many programmers.

> The rest of the industry is very different,

I am in complete agreement with this.

> and it's naive to dismiss it as caused by "lack of financial back pressure"

How do you explain the difference in cost then?

> The actual goals and constraints are different, even the program's life cycle, and it's only natural the development principles differ in turn!

Let me ask you this: have you ever worked in games development? If not then how do you know it's different? From my experience the only difference is in the financials. Game studios sell their programs and have to make profit from it to stay afloat. In-house developers don't sell anything and are financed by the actual main business' profits. Custom software developers are one step removed from the in-house: they need to sign a customer first but it's done by sales people, after the customer has started paying it's the same smooth sailing. The actual programming in any case is the same - specs go in, code comes out.

> Do you by any change consider modifying game scripts as "modifying data"?

No, scripts are also code. I personally oppose scripting on principle and see the need for scripting as a failure of the programmers, but even teams relying on scripts do not spend much effort on the scripting because, as I said above, modifying code is incredibly expensive.


I asked the question because what you're arguing is completely at odds with the reality of software development outside the games industry. Surely you agree your position (I'll restate it here, just so we're in the same page: that it is a bad idea to modify code and a good idea to implement most changes as "changes in data") is completely non-mainstream? Can you grant me that?

I've never worked in the games industry (though I wrote my own naive videogames, starting with my C64; like many of us, I got into computers because I wanted to make games). However, I have many friends who either work or worked in that industry, and they told me how it is. I know enough about death marches to know I mostly don't want to work in videogames (other jobs I'll never take if I can help it: consulting / "staff augmentation" companies). I also know many games programmers don't write automated tests -- I'd be scared of changing anything too if that was the case!

I'm very curious now about your position. I'm sure I must have misunderstood it. If you don't believe in changing code and you don't believe in scripting, then how do you propose stuff like changes in unit behaviors in an RTS are implemented? Say you have to change the enemy AI, or add a new unit that behaves differently, or even fix the path-finding algorithm. How do you change that by modifying only data? I understand tweaking your game by changing data (new sprites, changing the max speed of a unit or the geometry of a 3D level, etc), but actually changing behavior? And what if you're the one who's actually building the game's engine?

> How do you explain the difference in cost then?

I don't follow. Are you arguing games are less or more expensive? The economy of making games is different to business software. Games are hit based. If I read all those articles correctly, most games sell a lot of units near release, then taper off and are forgotten. Yes, some games have multiplayer and the most successful of them may last many years, and some others get add-ons, but still. Business software is completely different, especially if it's in-house: you don't need a "hit", you don't get "sales" and because it's usually not a product in itself, it gets modified constantly as the end-users (who may or may not be programmers) discover new features they need. Software like this usually lasts years, and must therefore be designed using engineering principles which will help a team of (possibly changing) programmers to alter its source code over the course of many years.


>Surely you agree your position (I'll restate it here, just so we're in the same page: that it is a bad idea to modify code and a good idea to implement most changes as "changes in data") is completely non-mainstream?

It's been mainstream in the games industry for the past 10 years or so. Obviously it's not in the enterprise software.

>If you don't believe in changing code and you don't believe in scripting, then how do you propose stuff like changes in unit behaviors in an RTS are implemented? Say you have to change the enemy AI, or add a new unit that behaves differently, or even fix the path-finding algorithm.

You are conflating two things. Bug fixing obviously requires code changes to repair the defective code. Implementing a new unit or AI is better done by setting flags or adding components from a set. It's no different from business software recording transactions or entities in a database. Hopefully you don't write new code for each new order or each new SKU in the warehouse?
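
To illustrate the "new unit = new data, not new code" idea, here is a hypothetical Java sketch (all names invented): unit behavior comes from a set of components, so a designer adds a unit by adding a table row rather than a class.

```java
import java.util.EnumSet;
import java.util.List;

public class Units {
    // The fixed vocabulary of behaviors the engine implements once.
    enum Component { MOVES, FLIES, ATTACKS, HEALS }

    // A unit type is pure data: a name, some stats, and a component set.
    static class UnitType {
        final String name;
        final int hp;
        final EnumSet<Component> components;
        UnitType(String name, int hp, EnumSet<Component> components) {
            this.name = name; this.hp = hp; this.components = components;
        }
    }

    public static void main(String[] args) {
        // This roster could just as well be loaded from a data file,
        // so new units require no engine code changes at all.
        List<UnitType> roster = List.of(
            new UnitType("grunt",   50, EnumSet.of(Component.MOVES, Component.ATTACKS)),
            new UnitType("medic",   30, EnumSet.of(Component.MOVES, Component.HEALS)),
            new UnitType("gunship", 80, EnumSet.of(Component.MOVES, Component.FLIES, Component.ATTACKS))
        );
        for (UnitType u : roster) {
            System.out.println(u.name + " heals: " + u.components.contains(Component.HEALS));
        }
    }
}
```

The engine codes each component's behavior once; everything after that is configuration.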

>I don't follow. Are you arguing games are less or more expensive?

As I said above, games show orders of magnitude less cost per unit of functional complexity. A game with many more complex behaviors than a typical enterprise system takes far fewer man-hours to code and test.


It is, but every once in a while, when a game engine or library gets posted on, say, HN, you get the usual arguments about how the code is not unit-testable, or how functions pack a boatload of different behavior into themselves or take 20 arguments - because if it works for me in my Rails app, why shouldn't it work for a game engine?


A poor programmer uses the first abstractions and ideas to come into their head, and runs with it.

A mediocre programmer uses ideas and abstractions they've heard are good for this scenario, and just runs with it, occasionally rewriting as needed.

A good programmer carefully figures out what abstractions and ideas are appropriate for the job at hand, studying and rewriting until they're sure they've gotten them right, and uses them.

A master programmer uses the first abstractions and ideas to pop into their head: they've been at this long enough to know the right approach.


Yes. Unfortunately, as with driving, the vast majority of us think we're better than we are. It's good to have confidence, but it's near impossible to know when we're overestimating our own capabilities ... until it's too late.

In this case it's also more difficult because one doesn't always see the end results of their output, which may also be years down the road (no pun intended).


That's why becoming a master requires experience: you need experience to know what's good and bad down the line.


Ridiculously long functions are a maintainability problem but so is a ton of really small functions that do not provide a logical separation of concerns.

OO code can provide modularity, which can greatly improve the ability to make changes without breaking other code. On the other hand, when applied poorly it can have the opposite effect.

It's not the concepts, it's how they are applied.


A swarm of small functions is a worse maintainability problem --- it's not obvious how they interact to solve a particular problem, and the amount of plumbing you need to ship state between these different functions is frequently brutal. Sometimes it's easier to stick things in local variables and just have a long function.

Languages that make it easy to define "local" functions that operate on implicitly captured state can help --- e.g., C++, Lisp, sometimes Java --- but only where it makes sense. I don't believe in splitting functions solely because they're too long.
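
A minimal Java sketch of that style (hypothetical names): a lambda acts as a "local function" that reads the enclosing function's state directly, instead of a separate method fed through parameters.

```java
import java.util.function.IntSupplier;

public class Capture {
    public static void main(String[] args) {
        int base = 10; // local state the helper needs

        // A "local function": no parameter plumbing required, because
        // the lambda implicitly captures base from the enclosing scope.
        IntSupplier bumped = () -> base + 1;

        System.out.println(bumped.getAsInt());
    }
}
```

The same helper as a separate top-level method would need base passed in explicitly at every call site, which is exactly the plumbing the parent comment complains about.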


> Languages that make it easy to define "local" functions that operate on implicitly captured state can help

You don't even need that. In many languages, you can have {}-delimited blocks which cause variables inside of them to go out of scope when control flow exits them. I've used that to great effect in Perl to keep intrinsically large functions maintainable.
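
The same trick, sketched in Java for illustration (names made up): a bare block fences off the scratch variables of one phase of a long function.

```java
public class Phases {
    public static void main(String[] args) {
        int total = 0;

        { // phase 1: its temporaries are invisible past the closing brace
            int a = 2, b = 3;
            total += a * b;
        }

        // a and b are out of scope here; only total survives,
        // so later phases can't accidentally depend on them.
        System.out.println(total);
    }
}
```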


>- it's not obvious how they interact to solve a particular problem, and the amount of plumbing you need to ship state between these different functions is frequently brutal

this is a problem with the architecture, not a problem with small functions.


Not to mention that the more small functions you have, the worse your locality of reference (in terms of programmer cache, not CPU cache). In absurdum, software composed of 1-2 line functions which are then composed into higher and higher level 1-2 line functions is no better than software composed of one giant function with internal gotos for flow control.

Agreed that you should never split functions due purely to length, but a super long function smells bad because it suggests poor separation of concerns (if a function's super long then it's probably doing a lot more than one thing). Sometimes this is a problem, sometimes (like the case of Carmack's big main loop function) there's just a lot of small things to do sequentially and one big function is as good a way to represent that as any other.


Small functions (fewer dependencies) are easier to test.


Here's another bit of heresy: testing isn't everything. Very fine-grained test suites frequently break when the structure of code under test changes even when the new code still does its job. In the limit, it's the equivalent of just breaking if SHA256(old_code) != SHA256(new_code).

Very short functions, I've found, encourage this kind of over-testing and just add friction to code changes without actually improving system reliability.

You're better off testing at major functional boundaries, and if you do that, the length of functions matters less than the interface major modules provide to each other.


> You're better off testing at major functional boundaries, and if you do that, the length of functions matters less than the interface major modules provide to each other.

Otherwise known as integration testing, which is far more useful than unit testing because bugs, especially regression bugs, more often occur in the system, than in specific functions.

You can have as many unit tests as you want, but until you have integration test, you have zero coverage for the really complex part of your code.

Unit tests are great for algorithms though.


Small functions are generally uninteresting to test. As smallness approaches "one liner", you're just verifying the compiler. Yes, 1+1 == 2.


It depends on how small, and how swarm-y. If the function is hilariously simple, and improves readability (like a premade getter for an alist or cons based structure in lisp), just do it. If it's pretty big, and you'll never reuse it... No. If you must, keep it local.


Even if you don't reuse a function it still encapsulates certain things. By looking at the name you know what it does (self-documenting-code). The interface tells you what variables it depends on. And finally it logically segregates your code.

I think the real reason people don't like small functions is simply code navigation - which honestly is a poor excuse. That's an editor/IDE problem.


Yes, but encapsulation isn't always a good thing: it lessens your awareness of what's happening within the encapsulated environment. This can lead to bugs.


Agreed, but really long functions are an indication that the code could benefit from some judicious refactoring.


And yet invariably it's refactoring into a ton of small functions. Round and round we go...


Why? I've been recommending the middle ground all along.


The 90's and 00's fetishization of OO as magic incantation that makes your code better did generate a lot of anti-patterns, some built into the languages. Multiple/deeply-multilevel inheritance, singletons, constructors with side effects, IO objects, and all the more trivial code-bloating boilerplate like "put everything in a class" and getters and setters.


In regards to the length of functions, to me it comes down to whether it is preferable to have the entire content of the function visible to the programmer at the time any changes need to be made, or whether there are multiple things going on that can be assessed independently of each other. The idea of setting a hard rule that a function can only be as long as your screen height ignores the context of what is being done within the function, and encourages the programmer to make breaks in places where it may not make sense to do so.


Was it Atwood who said, modifying someone else's quote: "The two hardest problems in programming are cache invalidation, naming things, and off-by-one errors?"

Aside from separation of concerns when you make many, many functions, you have to come up with So Many Names.


A lot of those names will end up being the equivalent of nasty old assembly comments.

  add 10   ; add 10 to accumulator.
becomes

  int AddTen(int x) { return(x+10);}


Muratori's compression-oriented programming and Acton's data-oriented design have really helped me in writing HPC code. Carmack's arguments make sense for apps that have a clear main function, although they're less applicable to libraries.

I consider these "best practices" too; they are just better for performance than object-oriented practices applied to many small objects.


One other important thing to keep in mind when considering that crowd is that all of them are game programmers, who face a very different set of constraints than web developers (which is what I think a lot of people here are). That being said, I do like a lot of what they have said in the past.

Not to say that the above aren't all examples of skilled programmers, and likely much more practical than a lot of people, just that they have a very different experience of the world than, say, Uncle Bob or Martin Fowler (some of the more "best practices" developers).

I think an overarching trend is that programmers in general are realizing that "best practices" like the OOP design patterns (flyweight, adapter, etc.) are better when they fit into the language well, rather than when you have to go out of your way to accommodate them.

The movement of languages like Rust, Go and Elixir (what I've been able to investigate lately) away from class-based OOP by splitting it up into its various pieces (subtyping, polymorphism, code sharing, structured types) is a good trend for the programming industry IMO. I'm looking forward to more improvements in the ability to statically verify code a la Rust. Also exciting is the improvements that C# is getting from Joe Duffy's group to help it reduce allocations and GC pressure.

It's an exciting time to be in software development and to be following PL development; some meaningful progress seems to be happening.


I think a lot of this could be covered by two principles that are often quoted but overlooked. The first is KISS: keep it simple, stupid. The second is single responsibility.

When I design software, I apply both of these to every facet of the system (though I admit sometimes not as well as I should). The end result is I might not have a ton of interfaces and hierarchies. It might not handle curve balls as well as an abstract MachineFactoryFactory could. It does handle everything that we've thrown at it however.


I do so agree with that. Me, I am an old-school programmer. Started with BASIC, Pascal, COBOL, Clipper, dBase, VB, C++, PHP, JavaScript, Java. Always created the frameworks and libraries I needed. Straightforward pyramid-structured software. Everything was functions (it's coming back). Hardly any testers other than the client and yourself. Lots of that stuff is still running. Now I'm often lost in the complexity of the frameworks and the use of endless classes. Debugging takes ages because some class in a totally different environment is badly written. My advice: KEEP IT SIMPLE.


A sony dev on "Pitfalls of Object Oriented Programming": http://harmful.cat-v.org/software/OO_programming/_pdf/Pitfal...


Casey Muratori's article really hits home for me. That's how I've felt for a long time, and I'm glad to have a coherent article to point to, to explain this to others.


"The code has bugs and segfaults" is a clear, objective problem.

"I don't understand the code" isn't ... quite the same type of problem.


This article is anecdotal and ranty but I will respond anyway. I've spent the last 15 years working on various projects involving cleaning up scientific code bases. Messy unengineered code is fine if only a very few people ever use it. However, if the code base is meant to evolve over time you need good software engineering or it will become fragile and unmaintainable.

That said, there are many "programmers" who apply design concepts willy-nilly without really understanding why. They often make a bigger mess of things. There is an art to quality software engineering which takes time to learn and is a skill which must be continually improved.

The claim in the article that programmers have too much free time on their hands because they aren't doing real work, like a scientist does, is obviously ridiculous. Any programmer worth their salt is busy as hell and spends a lot of thought on optimizing their time.

Conclusion, scientists should work with software engineers for projects that are meant to grow into something larger but hire programmers with a proven track record of creating maintainable software.


I've had similar experience with scientific software. When I'm told that the existing software is "OK because it works", I ask "how do you know it works?" because typically there are no unit tests or tests of any sort of individual stages for that matter.

I've found that scientists tend to assume "it works" when they like the results they see such as R^2 values high enough to publish.

Recently I converted some scientific software that was using correlation^2 (calling it R^2) as a measure for model predictions, as opposed to something more appropriate like PRESS-derived R^2s (correlation is totally inappropriate for judging predictions because it's translation and scale independent on both observed and predicted sides). Nobody went looking for the problem because results seem good and reasonable. Converting to a proper prediction R^2, some of the results are now negative, meaning the models are doing worse than a simple constant-mean function. Yikes.
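To make the parent's point concrete, here is a synthetic sketch (made-up data, not the code base being described; it uses a plain prediction R², not a full leave-one-out PRESS statistic). A perfectly correlated but badly shifted and scaled prediction looks flawless under correlation² yet scores worse than a constant-mean model under prediction R²:

```python
import random
random.seed(0)

observed = [random.gauss(0.0, 1.0) for _ in range(200)]
predicted = [5.0 + 3.0 * x for x in observed]   # tracks the signal, badly shifted and scaled

def mean(xs):
    return sum(xs) / len(xs)

def corr(xs, ys):
    # Pearson correlation: invariant to translation and scaling of either side.
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

corr_r2 = corr(observed, predicted) ** 2   # == 1.0: looks publishable

# Prediction R^2 = 1 - SS_res / SS_tot: actually compares predictions to observations.
ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
ss_tot = sum((o - mean(observed)) ** 2 for o in observed)
pred_r2 = 1 - ss_res / ss_tot   # strongly negative: worse than predicting the mean
```

The first number says "perfect model"; the second says "worse than guessing the average" — for the same predictions.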


Yes, I work on a mixed team of physicists, engineers, and computer scientists, and the most frustrating part is trying to work with some of the physicists' code. For the most part it is fairly functional, but the problem is that it is almost unreadable. It is quite clear that they write it as fast as possible so they can do what the OP would call real work, without regard for others who will need to work with and maintain that code later on.


What most people seem to forget is that "best practices" are not universal: Depending on the size and scope of the software project, some best practices are actually worst practices and can slow you down. For example, unit testing and extensive documentation might be irrelevant for a short term project / prototype while they will be indispensable for code that should be understood and used by other people. Also, for software projects that have an exploratory nature (which is often the case for scientific projects) it's usually no use trying to define a complete code architecture at the start of the project, as the assumptions about how the code should work and how to structure it will probably change during the project as you get a better understanding of the problem that you try to solve. Trying to follow a given paradigm here (e.g. OOP or MVC) can even lead to architecture-induced damage.

The size of the project is also a very important factor. From my own experience, most software engineering methods start to have a positive return-on-investment only as you go beyond 5,000-10,000 lines of code, as at this point the code base is usually too large to be understandable by a single person (depending on the complexity of course), so making changes will be much easier with a good suite of unit tests that makes sure you don't break anything when you change code (this is especially true for dynamically typed languages).

So I'd say that instead of memorizing best practices you need to develop a good feeling for how code bases behave at different sizes and complexities (including how they react to changes), as this will allow you to make a good decision on which "best practices" to adopt.

Also, scientists are, from my own experience, not always the worst software developers, as they are less hindered by most of the paradigms / cargo cults that the modern programmer has to put up with (being test-driven, agile, always separating concerns, doing MVP, using OOP [or not], being scalable, ...). They therefore tend to approach projects in a more naive and playful way, which is not always a bad thing.


Steve Ballmer on "KLOCs" [1]. Not saying you're taking that extreme but LOC value is certainly debatable...

[1]: https://www.youtube.com/watch?v=kHI7RTKhlz0


Complexity is related to size, but is also related to coding style. Comparisons of LOC are meaningless outside of a context, but surprisingly useful inside of one (and as long as you don't use them as metrics, because they can be gamed too easily).

If you want to see this in action, write a script that will trawl your code base and count the total number of uncommented lines of code every day. Draw a graph. Even without knowing anything about your project, I think you will find a very interesting thing -- namely that the code base grows consistently and that the amount it grows per day is a random variable with a normal distribution. (Obviously this only works if you have a consistent number of developers.)
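A minimal sketch of such a script, assuming Python sources with `#`-style comments (the log filename and format are made up; adapt the glob and comment predicate to your language):

```python
import datetime
import pathlib

def count_loc(root="."):
    """Count uncommented, non-blank lines under root."""
    total = 0
    for path in pathlib.Path(root).rglob("*.py"):   # adapt the glob to your language
        for line in path.read_text(errors="ignore").splitlines():
            stripped = line.strip()
            if stripped and not stripped.startswith("#"):
                total += 1
    return total

if __name__ == "__main__":
    # Run daily (e.g. from cron) and graph the accumulated log afterwards.
    with open("loc_log.csv", "a") as log:
        log.write(f"{datetime.date.today()},{count_loc()}\n")
```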

If you then do a rolling average (say every 2 weeks), I think you will find something even more interesting: the rate of change will be going in one direction or another -- either higher or lower -- and it will be doing it consistently (normalizing for the number of developers is a bit easier here).

Once you have verified that, you can ponder about what it all means.


Amen!


Disclosure: I'm a recent astronomy grad who specialized in computational astrophysics. Definitely biased.

The issue is that at least for many scientists and mathematicians, mathematical abstraction and code abstraction are topics that oftentimes run orthogonal to each other.

Mathematical abstractions (integration, mathematical vernacular, etc) are abstractions hundreds of years old, with an extremely precise, austere, and well defined domain, meant to manage complexity in a mathematical manner. Code abstractions are recent, flexible, and much more prone to wiggly definitions, meant to manage complexity in an architectural manner.

Scientists oftentimes have already solved a problem using mathematical abstractions, e.g. each step of the Runge-Kutta [1] method. The integrations and function values for each step are well defined, which results in scientists wanting to map these steps one-to-one onto their code, oftentimes producing blobs of code with if/else statements strewn about. This is awful by software engineering standards, but in the view of the scientist, the code simply follows the abstraction laid out by the mathematics themselves. This is also why it's often correct to trust results derived from spaghetti code, since the methods that the code implements are themselves often verified.

Software engineers see this complexity as something that's malleable, something that should be able to handle future changes. This is why code abstractions play bumper cars with mathematical abstractions: mathematical abstractions are meant to be unchanging by default, which makes tools like inheritance, templates, and even naming standards poorly suited for scientific applications. It's extremely unlikely I'll ever rewrite a step of symplectic integrators [2], meaning that I won't need to worry about whether this code is future-proof against architectural changes or not. Functions, by and large in mathematics, are meant to be immutable.

Tl; dr: Scientists want to play with Hot Wheels tracks while software engineers want to play with Lego blocks.

[1]: https://en.wikipedia.org/wiki/Runge–Kutta_methods

[2]: https://en.wikipedia.org/wiki/Symplectic_integrator


> The issue is that at least for many scientists and mathematicians, mathematical abstraction and code abstraction are topics that oftentimes run orthogonal to each other.

Excellent observation. I'm an ex-physicist and on the few occasions that I had to use computers the only thing I cared about was how computer functions mapped into the mathematical abstractions that I cared about. Everything else was just noise.


>mathematical abstractions are meant to be unchanging by default

Let's say today I am doing RK2, and tomorrow I want RK4; how do I easily make my change? In my codes, it's a change of a single line and I get higher order convergence, etc. It is not a week- or month-long project, as it would be for many codes because of some of those abstractions you deride.

Also, computational math is an active area of research; the method you mentioned is not hundreds of years old, although yes, it was developed in the early 1900s. To this day, people are developing new methods that give higher order accuracy (orders above O(err^10), to abuse notation)... but as you can guess, no one uses them because changing the current codes is so difficult they just don't.[0] Of course, I agree O(err^4) is often enough, so the motivation to change codes now isn't that overpowering, but it again is something we lose by learning things a little bit outside our field which could be helpful.

[0] Instead we choose smaller and smaller mesh sizes and timesteps to deal with small-order error, and request millions of CPU hours, use electricity, kill trees and contribute to global warming.
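One way the "single line" claim can hold up is a stepper driven by a Butcher tableau; the sketch below is a hypothetical minimal version (not the commenter's actual code), where switching RK2 to RK4 is just a different `method` argument:

```python
# Butcher tableaus as (A, b, c) coefficients. "rk2" is the explicit midpoint
# method; "rk4" is the classic fourth-order Runge-Kutta scheme.
TABLEAUS = {
    "rk2": ([[0.0, 0.0],
             [0.5, 0.0]],
            [0.0, 1.0],
            [0.0, 0.5]),
    "rk4": ([[0.0, 0.0, 0.0, 0.0],
             [0.5, 0.0, 0.0, 0.0],
             [0.0, 0.5, 0.0, 0.0],
             [0.0, 0.0, 1.0, 0.0]],
            [1/6, 1/3, 1/3, 1/6],
            [0.0, 0.5, 0.5, 1.0]),
}

def rk_step(f, t, y, h, method="rk4"):
    """Advance y' = f(t, y) by one explicit Runge-Kutta step of size h."""
    A, b, c = TABLEAUS[method]
    k = []
    for i in range(len(b)):
        # Stage value: y plus the weighted, already-computed stage slopes.
        yi = y + h * sum(A[i][j] * k[j] for j in range(i))
        k.append(f(t + c[i] * h, yi))
    return y + h * sum(bi * ki for bi, ki in zip(b, k))
```

Integrating y' = y over [0, 1], changing `"rk2"` to `"rk4"` at the call site drops the error by several orders of magnitude without touching the integrator.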


Higher order integration methods are not always more accurate or more power efficient. They are typically worse in terms of stability if the step is too big, they require more computations per step, and they may have a higher error constant, so they actually often require smaller steps than low order methods. That's why in circuit simulators only methods of order up to 4 are really useful, and most of the time simple schemes of order 2 are used.


It sounds like you want a language like Haskell. Abstractions are based on mathematical (algebraic and category-theoretic) abstractions with well-defined laws. The language has immutable semantics and admits equational reasoning. Using libraries like Dimensional has made me better at physics; many fields of physics play fast and loose with units and dimensions and aren't even aware of it.


I'm glad someone picked up on it! I've definitely picked "FP is the truth for astrophysics programming" as my hill to die on, but the issue with Haskell is its combination of vernacular and tools that make it hard to approach for the novice scientific programmer. Almost all dedicated astrophysics programmers wind up using Fortran 90, which is sort of the de facto standard due to its imperative nature.


Modern Fortran is pretty awesome though; it has pure functions, it's fast, scales well, is easy to read and understand. Probably much more suited to most scientific fields than Haskell.


meeeeh, come on. You can't say the sloppy code can be trusted because the clean math it is based on is verified. The sloppiness of the code prevents validation that it properly implements that precious math of yours.

The problem is that you want to treat the code as not your "real" job. Your real job is getting correct answers into published papers, and providing a proof of that correctness. If your code, on which your results rely, is too sloppy for anyone else to understand (and note that "anyone else" can include "you, in 6 months"), then you've not proven correctness at all.


>you want to treat code as not your "real" job

I'm not treating anything, it's because coding isn't my job. The job of a scientist is to do research, and coding is nothing more than a tool towards that goal.

>your code, on which your results rely, is too sloppy for anyone else to understand...then you've not proven correctness at all

No, my results rely on my experimental methods, my mathematical models, and my code. Correctness can be proven in spite of sloppy code. Would you dispute a claim on the basis that calculations done on a calculator can't be seen by others?

Furthermore, the burden of proof after peer review in academia is on the person disproving it. If my code is wrong at a basic level, what good does it do for anyone? If someone is to disprove my paper, they should reimplement the code in order to account for errors.

Does this excuse spaghetti level code that often accompanies papers? Of course not. Scientists have a lot to learn from software engineering about proper programming skills, but programming is simply another tool in the repertoire, not something that should be put on a pedestal.


> coding is nothing more than a tool towards that goal.

That's an important insight that most devs need to understand at some point in their career, but don't. It's not even exclusive to business goals, but sanity and complexity ones as well.


Chances are, coding isn't your "real" job in a lot of cases (including most software engineers and programmers). Your real job is solving a problem for someone, using code. Good software architecture, coding style, etc. are there to help you achieve this goal, but the end user of the software doesn't care about them.


No more than a home owner cares whether or not their house's blueprints were printed on a napkin. If such a thing were to happen, it would be indicative of an incompetent contractor.


Unless you're the first owner of the house, you probably don't even have meaningful engineering drawings for it. And even if you did, they are not what you care about. (Well, it'd be nice to have some documentation about the wiring, plumbing, which walls are load bearing, etc. but that's another rant.)

What you care about as a homeowner is that your house is solid, watertight, safe to live in, acceptable to look at, and meets your needs as a tenant. Whether your builder designed it down to the last tack and cable-tie in SolidWorks, or sketched it on a napkin, or made it up as they went along, makes no difference to you as the homeowner.

Quality of process is only ever a proxy for quality of results.


But it's an extremely good proxy. You don't see good results coming out of bad tools on a regular basis.


If the contractor's blueprints had been meticulously peer reviewed, why does the blueprint medium matter?


You're speaking in absurdities.

There are things you just don't see together. You don't see quality blueprints printed on napkins, and you don't see quality code that is well-suited to its task written in a sloppy way. Technically speaking, you can lose weight eating all your meals from McDonald's. But the person who eats all their meals from McDonald's isn't going to exercise the portion control necessary to do it. It's just not a thing that happens often enough to bother considering it.

Sloppy code is an excellent indicator that the program is a buggy piece of junk. I don't care if it's technically "possible" to write a "good" program in a sloppy way. If your code is sloppy, you aren't that person who makes that program. The person who is capable of making good programs doesn't write sloppy code, even if they are technically capable of doing it.


So is something like Haskell a potentially better fit? When you use terms like "immutable" and "unchanging", makes me think of functional programming.


"Crashes (null pointers, bounds errors), largely mitigated by valgrind/massive testing"

Once upon a time I had lunch with a friend-of-a-friend whose entire job, as a contractor for NASA, was running one program, a launch vehicle simulation. People would contact her, give her the parameters (payload, etc.) and she would provide the results, including launch parameters for how to get the launch to work. Now, you may be thinking, that seems a little suboptimal. Why couldn't they run the program themselves; they're rocket scientists, after all?

Unfortunately, running the program was a dark art. The knowledge of initial parameter settings to get reasonable results out of the back end had to be learned before it would provide, well, reasonable results. One example: she had to tell the simulation to "turn off" the atmosphere above a certain altitude or the simulation would simply crash. She had one funny story about a group at Georgia Tech who wanted to use the program, so they dutifully packed off a copy to them. They came back wondering why they couldn't match the results she was getting. It turns out that they had sent the grad students a later version of the program than she was using.

Anyway, who's up for a trip to Mars?


Here's the thing that grinds my gears. Let's see scientists apply that same attitude toward papers. Let them label a bunch of equations poorly, and not label a few, have them explain concepts out of turn in different places in the document, have them produce shitty, unreadable figures, let's see how that turns out.

The issue is that the code which eventually leads to their results isn't public, their reputation isn't riding on it, and so they can pretend they understand what they're talking about when they come to publishing; one or two looks at their code would let you know how much they're bullshitting. But when it comes to a paper, well, they will be judged on that, so they can't be messy there.

It's okay if it's a one off code for one group, that's fine. But when a code is vital for so many people, for it to be that terrible and inaccessible?

Simple solution: if you are funded by the tax payer, what you produce should be accessible by the tax payer (absent defense restrictions). Demanding accessibility for gov't funded papers is good but I feel the same restriction should apply to code.


> Let them label a bunch of equations poorly, and not label a few, have them explain concepts out of turn in different places in the document, have them produce shitty, unreadable figures, let's see how that turns out.

This is what they already do, though...


That's incredibly common in enterprises. As the fad moves enterprises to an ITIL model, where work is done by a mix of matrixed in-house and outsourced teams, the overhead of "best practices" becomes important.


His first list really, really hand-waves the problems that style of coding can cause. Just use better tools or run valgrind? It never is that simple.

One aspect of scientific coding is that it can have very long lifetimes. I sometimes work on some code > 20 years old. Technology can change a lot in that time frame. For example, using global data (common back then) can completely destroy parallel capability.

The 'old' style also makes the code sensitive to small changes in theory. Need to support a new theory that is basically the same as the old one with a few tweaks? Copy and paste, change a few things, and get working on that paper! Who cares if you just copied a whole bunch of global data - you successfully avoided the conflict by putting "2" at the end of every variable. You've got better things to do than proper coding.

Obviously, over-engineering is a problem. But science does need a bit of "engineering" to begin with.

Anecdote: A friend of mine wanted my help with parsing some outputs and replacing some text in input files. Simple stuff. He showed me what he had. It was written in fortran because that's what his advisor knew :(

Note: I'm currently part of a group trying to help with best practices in computational chemistry. We'll see how it goes, but the field seems kind of open to the idea (ie, there is starting to be funding for software maintenance, etc).


> there is starting to be funding for software maintenance

Any reference concerning this point? I am interested!


Here is a starting point for one big movement in my field:

https://www.nsf.gov/news/news_summ.jsp?org=NSF&cntn_id=18934...

It's not quite "maintenance", but is definitely a step away from just writing software to get an answer and then abandoning it.

Also, anecdotally, there is a movement towards more open-source software. Slowly but surely, things are moving in the right direction.


I think some of the author's criticisms are misplaced.

Long functions — Yes, functions in scientific programming tend to be longer than your usual ones, but that's often because they cannot be split into smaller functions that are meaningful on their own. In other words, there's simply nothing to "refactor". Splitting them into smaller chunks would simply result in a lot of small functions with unclear purposes. Every function should be made as small as possible, but not smaller.

Bad names — The author gives 'm' and 'k' as examples of bad variable names. I think this is a very misplaced criticism. Unless we are talking about a scientific library, many scientific programs are just implementations of some algorithms that appear in published papers. For such programs, the MAIN documentations are not in the comments but the published papers themselves. The correct way to name the variables is to use exactly the symbols in the paper, but not to use your favourite Hungarian or Utopian notations. (Some programming languages such as Rust or Ruby are by design very inconvenient in this respect.) As for long variable names, I think they are rather infrequent (unless in Java code); the author was perhaps unlucky enough to meet many.


This is so true:

"Many programmers have no real substance in their work – the job is trivial – so they have too much time on their hands, which they use to dwell on "API design" and thus monstrosities are born"

It also explains proliferation of "cool" MVC and web frameworks, like Node.js, Angular, React, Backbone, Ember, etc.


I agree, except (to be pedantic) I think Node is misplaced: it's just a runtime with a tiny standard library, nothing to do with MVC. I've actually found Node able to be simple and nice by mostly using streams and a couple of small libraries, something most Node programmers ignore.

I actually think another problem is that programmers spend too much time following what the "big players" do and mistakenly apply that stuff to their so-called trivial work. I've wasted hours trying to sift through code from companies who thought they needed Facebook/Google-tier infrastructure with stuff like Relay/GraphQL. A simple CRUD Rails/Django/Phoenix/Node app would've been fine.


As much as it's not an MVC framework, I'd argue Node fits right in: it was created by devs with waaaay too much time on their hands.


Or enterprise environments like JavaEE? Or would complexity in that area, be explained by attempted vendor lock-in?

Node using Javascript at least has first-class functions, which makes the code more expressive and readable than a lot of Java + Spring XML.

And React + flux eliminates a huge class of bugs due to state side-effects by getting rid of stateful object graphs altogether. The motivation is more profound than bored programmers tinkering with the API because they have too much time on their hands.


Still, I think that these frameworks are important, if not for their successes, then for their failures.

The thing that's more disturbing is that we (as a profession) can't seem to learn from our failures. There is too little structured research on best practices.


Our profession is younger than most of our grandparents. I want to see what programming practices will be like in 200 years (assuming we're still programming in some way by then).


Mostly I agree: bad naive code is better than bad sophisticated code.

Also, science very frequently only requires small programs that are used for one analysis and then thrown away. It's OK to have a snarl of bad Fortran or Numpy if it's only 400 lines long.

BUT: scientific projects are often (in my old field, usually) also engineering projects. Such experiments are complex automated data-gathering machines, hardware that takes roughly similar data runs tens of thousands of times.

There should be some engineering professionalism at the start to design and plan such a machine. Especially the software, since it is mostly a question of integrating off-the-shelf hardware.

But PIs think:

(A) engineering is done most cheaply by PhD students -- a penny-pinching fallacy.

(B) that their needs will grow unpredictably over time.

B is true, but it is actually a reason to have a good custom platform designed at the start, so that changes are less costly. Your part-time programmer is going to develop many thousands of lines of code no one can understand or extend. (I've done it, I should know.)


Even B is false a lot of the time. Just look at most of this 'big data': it can all fit on my mobile phone.


I believe this post is fundamentally misguided, but I can see how the author got there. In fact I see it as a sort of category error. When you talk about a style of programming being "good" or "bad", I always want to ask "for what?". I wonder if the author has thought about what would happen if everyone adopted the "scientific" style they are alluding to.

Most of what the author describes as the problems of code generated by scientists are what I would call symptoms. The real problems are things like incorrect abstractions, deep coupling, and overly clever approaches with unclear implicit assumptions. Of course this makes maintenance and debugging more difficult than they should be, but the real problem is that such code does not scale well and is poor at managing the complexity of the code base.

So long as your code (if not necessarily its domain) is simple, you are fine. Luckily this describes a huge swath of scientific code. However, system complexity is largely limited by the tools and approaches you use... all systems eventually grow to become almost unmodifiable.

The point is, this will happen to you faster if you follow the "scientific coder" approaches the author describes. Now it turns out that programmers have come up with architectural approaches that help manage complexity over the last several decades. The bad news for scientific coders is that to be successful with these techniques you actually have to dedicate some significant amount of time to learning to become a better programmer and designer, and learning how to use these techniques. It also often has a cost in terms of the amount of time needed to introduce a small change. And sometimes you make design choices that don't help your development at all. They help your ability to release, or audit for regulatory purposes, or build cross-platform, or ... you get the idea. So these approaches absolutely have costs. You have to ask yourself what you are buying with this cost, and do you need it for your project.

The real pain comes when you have people who only understand the "scientific" style already bumping up against their systems ability to handle complexity, but doubling down on the approach and just doing it harder. Those systems really aren't any fun to repair.


It's an interesting discussion, and as the article points out, "software engineer" code has some issues as well.

There's also an issue that code ends up reflecting the initial process of the scientific calculation needed, which might not be a good idea (but if you depart from that, it causes other problems as well).

Also, I'm going to be honest: a lot of software engineers are bad at math (or just don't care). In theory a/b + c/b is the same as (a+c)/b; in practice you might be near some precision edge that you can't deal with directly, and hence you need to calculate this in another way.

Try solving a PDE in C/C++ for extra fun
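The precision point is easy to demonstrate with IEEE 754 doubles (a standalone sketch, not tied to any particular PDE):

```python
import math

# Floating-point addition is not associative: algebraically equal forms
# of the same sum round differently.
a = (0.1 + 0.2) + 0.3    # 0.6000000000000001
b = 0.1 + (0.2 + 0.3)    # 0.6

# The classic accumulation example, and a compensated sum that fixes it:
naive = sum([0.1] * 10)        # 0.9999999999999999
exact = math.fsum([0.1] * 10)  # 1.0
```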


It's worse than you say (and think?). For example: in general, floating point equality isn't transitive, and addition isn't even associative.

Not only do those "bad at math" software engineers get this wrong; most of the scientists do too. These two groups often make different types of errors, true, but nearly everybody who hasn't studied numerical computation with some care is just bad at it.


I'm 80% "software engineer" and 20% "researcher" and have to play both roles to write supercomputer code (I'm the minority, most peers are more researchers). These issues are important right now, as the govt is investing in software engineering due to recent hardware changes that require porting efforts. We recognize the pitfalls of naive software engineering applied to scientific code, and would like to do things more carefully. I don't think we should have to choose one or the other; with proper communication we can achieve a better balance.


In his excellent book [1], Andy Hunt explains what expertise is with a multi-level model [2], where a novice needs rules that describe what to do (to get started) while an expert chooses patterns according to his goal.

So, "best practices" are patterns that work in most situations, and an expert can adapt to several (and new) situations.

[1] https://pragprog.com/book/ahptl/pragmatic-thinking-and-learn...

[2] https://en.wikipedia.org/wiki/Dreyfus_model_of_skill_acquisi...


The title of this article should really be "Why bad scientific code beats bad software engineer code."

It contrasts a bunch of bad things scientific coders do, and a bunch of bad things bad software engineers do. There's no "best practices" to be seen on either side.


The article overlooks a massive source of problems: the problems he describes in engineers' code usually start to become annoying at larger scale. The problems he describes in scientists' code rarely happen at scale, because such code can't be extended significantly. I feel it's weird to compare codebases that probably count in the thousands of lines with codebases that count in the hundreds of thousands or millions of lines of code.

Also it is worth noting that every single problem he has with engineers' code is described at length in the literature (Working Effectively with Legacy Code, the DDD blue book, etc). Of course these problems exist. But this is linked to the fact that hiring bad programmers still yields benefits. I believe this is not something that we can change, but if the guy is interested in reducing his pain with crappy code, there are solutions out there.


> Long functions

This isn't the worst thing, as long as it gets refactored when parts of that function need to be used in multiple places.

> Bad names (m, k, longWindedNameThatYouCantReallyReadBTWProgrammersDoThatALotToo)

I can live with long-winded names; while slightly annoying, they at least still help with figuring out what's going on.

What I can't stand are one or two letter variable names. They're just so unnecessary. Be mildly descriptive and your code becomes so much easier to follow, compared to alphabet soup.

What annoys me about stuff like this is that it just feels like pure laziness and disregard for others. Having done code reviews of data scientists' code, I find they just don't want to hear it. They adamantly don't care, compared to my software engineer compatriots, who would at least sit there and consider it.

But this is just my own anecdotal experience.


As a poster above pointed out, a lot of scientific code is an implementation of a mathematical device, and the scientist is trying to make their equations come to life. In math, many equations are simplified to single-letter variables in order to avoid insane complexity. Many of the scientists actually are thinking in terms of 'S', 't' and 'v', etc. What are the particle's x, y, t coordinates, and how do they get me v, p and l? So that they can write out:

v = ((x2 - x1)^2 + (y2 - y1)^2) ^ (1/2) / t

rather than:

velocity = sqrt(pow((locationX2 - locationX1),2) + pow((locationY2 - locationY1),2)) / duration

The latter is AWFUL mathematics, and very real code. (and that is an easy equation. I've had to implement very very complicated calculus into objective-c code and it is absolutely horrid what comes out as 'code', as clean as that code might be. It in no way whatsoever resembles the elegance of the math that birthed it.)

When I first started, I naively tried to write math code with the natural Objective-C objects and ended up on the very wrong side of the language. I realize the mistake now, but it's very awkward to ask the (scientist) programmer to go along programming with the language's tutorialed objects, then to tell them, "btw, that 'NSNumber' you have, can't be used as an exponent, along with that 'float' over there. And you can't add NSNumbers and 'integers'. Oh, you want to multiply two NSNumbers together? You want to write an equation with NSNumbers on one line!? Go for it. Oh, and you want to do a cross-product on a matrix? Ha!".


It is tradition in mathematics (and physics, and maybe other sciences) to use single letter names. A function is f, a variable is x, a parameter is a. These short names are intuitive for the scientist who wrote the code, even if programmers have different conventions.


The meat is in the footnote, as always.

> (In fact, when the job is far from trivial technically and/or socially, programmers' horrible training shifts their focus away from their immediate duty – is the goddamn thing actually working, nice to use, efficient/cheap, etc.? – and instead they declare themselves as responsible for nothing but the sacred APIs which they proceed to complexify beyond belief. Meanwhile, functionally the thing barely works.)

It seems the author has been plagued with programmers who avoid taking responsibility. One strategy for creating job security is to build a system too complex for anyone else to maintain it. Perhaps the author's colleagues are using this strategy.

It's hard to take complaints about "best practices" seriously when the practices described are not best.


Working in this area (and coming from a math background), the biggest issues that I have with most scientific and engineering code are:

1) lack of version control

2) lack of testing

Everything else (including the occasional bad language fit) is usually a distant 3rd.


    > Simple-minded, care-free near-incompetence can be
    > better than industrial-strength good intentions 
    > paving a superhighway to hell.
Love this line.

I think the thing about bad scientific code that makes it good is that you can often put really good walls around what goes in and what comes out, to the point that you can confine the danger of the bad code to just that component.

Software architects, on the other hand, often try to pull everything into a single "program", so that in the end you sum all of the weak parts. All too often, I have seen workflows where people who used to postprocess output data get pulled into doing it in the same run that generates the data.


As always, the right way is somewhere down the middle.

I recently inherited a blob of "scientific code" with basically no abstraction. Need to indicate the sampling period? Just type .0001--that'll never change. Need to read some files? Just blindly open a hardcoded list of filenames and assume they're okay--it'll always be like that, right? And of course, these files are in that format and there's no need to check. Of course, after this code was written, we bought new hardware. It gathers similar data, but samples at a completely different frequency, has a different number of channels, and records the data in a totally different way.

We could fork the code, find-and-replace the sampling rates, and all that, and maintain a version for each device we buy. Or we could write a DataReader interface, some derived versions for each data source, and maybe even the dreaded DataReaderFactory to automatically detect the filetypes.

Guess which approach will work better in a few years?


In my experience, there is a middle path. Hard-code the sampling period, but put it in a constant `SAMPLING_PERIOD`. Then when the hardware changes and things break, refactor the I/O code into a DataReader object. If and only if you need to support several formats, either implement your DataReaderFactory, or write a class for each filetype.


Statistically, you aren't going to make it a few years.


I recently followed a course on "Principles of Programming for Econometrics", and although I knew a lot about programming already, I learned a lot about being structured and about documentation. The professor ran some example code which he wrote 10 years ago! He wasn't really sure what the function did anymore, and BAM, it was there in the documentation (i.e. the comment header of the function).

I used to just hack stuff together in either R or Python but that course really got me thinking about what I want to accomplish first. Write that down on paper. And then and only then after you have the whole program outlined in your head start writing functions with well defined inputs and outputs.


Why not use the computer to help you define and understand the problem? It will be much faster to iterate quickly at a REPL and then write the cleaned-up version later, rather than trying to model the whole thing in your head first.


I know a lot of math majors thrown into c++ jobs that write unreadable code almost forgetting they are allowed to use words and not just single letters (though they would probably be fine in the functional programming scene). There's a learning curve either way, write like your co-workers unless you have the experience to know your co-workers suck.


This has nothing to do with scientific programming and everything to do with "best practices" being mind blowingly awful. Coupling execution and data is good for data structure initialization, cleanup, and interface. Everywhere else they should just be kept separate. Data structures should be as absolutely simple as possible, not as full and generic as possible.

Where people get into trouble many times is thinking that every transformation or modification of data should be inside one data structure or another, when really none of them should be except for a minimal interface.


I was introduced to the concept of "Don't hide power" by Tanenbaum's OS book (although it seems that Lampson is actually the original source [1]).

It always seemed to me a good design rule, but, even after 10 years of professional programming, it never actually clicked until last year.

I had always rigorously applied encapsulation, decoupling and information hiding, exposing only the minimal interface necessary to do the job [2].

While this leads to elegant designs which may even be efficient, in my experience it makes them hard to extend. You might need access to some implementation detail of a lower-level layer to either simplify the system or improve performance, but then you either violate encapsulation, break existing interfaces (violating the open-closed principle), implement the higher level directly inside the lower level (this is very common), or simply live with the inferior implementation.

I've now given up on complete encapsulation. I expose as many implementation details as possible, hiding only what's necessary to preserve basic invariants, and pushing abstractions only to the consumer side of interfaces.
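A minimal Python sketch of exposing a plain field first and adding access logic only once an invariant actually needs protecting (the `Sensor` classes are made-up examples, not from any real system):

```python
# Day one: a plain public attribute. No getters, no setters.
class Sensor:
    def __init__(self):
        self.reading = 0.0

# Later, if an invariant appears, the attribute becomes a property --
# every existing caller of `sensor.reading` keeps working unchanged.
class CheckedSensor:
    def __init__(self):
        self._reading = 0.0

    @property
    def reading(self):
        return self._reading

    @reading.setter
    def reading(self, value):
        # the invariant that finally justified access logic
        if value < 0:
            raise ValueError("reading must be non-negative")
        self._reading = value
```

In languages with properties, the "what if we need validation later" argument for day-one getters and setters largely evaporates.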

Paraphrasing Knuth, premature generalization is the root of all evil.

[1] http://www.vendian.org/mncharity/dir3/hints_lampson/

[2] these are common rules for OO designs, but by no mean restricted to it. In fact very little of what I've worked on could be called OO.


As a former scientist and now professional software developer I can confirm some of the observations of the article. This is because enterprise developers do premature flexibilization. And "Premature Flexibilization Is The Root of Whatever Evil Is Left", see

http://product.hubspot.com/blog/bid/7271/premature-flexibili...

But on the other side, most scientific code I've seen is simple (not in readability, but in that it uses simple abstractions), highly optimised, and delivers irreproducible results.

Why? Because most scientists don't write any kind of test. Nobody teaches scientists test-driven development, and most don't know about unit or integration tests, so there is no way to make sure a program generates consistent results. Scientists are happy if a program runs on their machine and produces a nice graph for their paper. If you ever want to reproduce the results of a non-trivial scientific simulation, good luck. You will discover that the results are highly dependent on the type and version of CPU, GPU, compiler, operating system, time zone, language, random generator seed, version of the programming language(s), versions of self-written libraries (which most often don't even have a version number), version of the build system (if one is used at all), etc. And that's why you will never see scientific code running outside of the scientist's computer.
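The random-seed item on that list is the cheapest one to fix. A minimal Python sketch (the `simulate` function is purely illustrative):

```python
import random

def simulate(n, seed):
    # A dedicated, explicitly seeded generator: the same seed produces
    # the same "random" draws on every run. This is the smallest step
    # toward reproducibility (platform and library versions still matter).
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]
```

Recording the seed alongside the output, and pinning library versions (e.g. in a requirements or lock file), covers several more items on the same list.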

TLDR; Both (scientist and professional developers) can learn from each other.


I'd love to hear about the tools which almost completely mitigate parallelism errors.

The author's list of things that are wrong with "software engineers"' code is 50% "things that are just language features" and 50% "bad ways to use language features that nobody thinks is best practice in software engineering".

Part of the irony is that a lot of the more hairy software engineering techniques he decries are used by the people writing the platforms and libraries that scientist programmers rely on, to make it possible for their "value of everything, cost of nothing" code to actually run well.

There is a big difference in attitude between scientist programmers and software engineers.

Often, a scientist already has the solution to the problem, and is just transcribing it into a program. The program doesn't need to be easy to understand in isolation, because a scientist doesn't read programs to understand somebody else's science, she reads the published peer-reviewed paper. After all, if you wanted to understand Newtonian dynamics, you wouldn't start by reading Bullet's source, even if it's very well written. (I don't know if it is.)

Conversely, for a software engineer the program is a tool for finding the solution. Even though they're in a scientific field, if it's accurate to call them software engineers they'll be from a background where the program itself is the product, rather than the knowledge underlying the program.


I think the fundamental problem is that programmers have been taught that "abstract = good" in all things.

How often do you hear someone say they "abstracted" a piece of code or "generalized" it, without anyone asking why? Or how often do people "refactor" things by taking a piece of code that did something specific, and giving it the unused potential to do more things while creating a lot of complexity? The problem with "abstracting" things is it means behaviors that were previously statically decidable can now only be determined by testing run-time behavior, or the key behaviors are now all driven from outside the system (configuration, data, etc.)

Also by making things more flexible, your verbs suddenly become a lot more general and so readability suffers.

Kind of an aside, but whenever I see code where a single class is split into one interface and one "impl" I've taken to calling it code acne (because Impl rhymes with pimple). If you're only using an interface for ONE class it's a huge waste of time to edit two files! The defense is always something like "well what if we need a mock version for tests". Fine, write the interface when you actually do that.
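A hypothetical Python rendering of the "code acne" pattern next to the plain version (all the names here are invented for illustration):

```python
from abc import ABC, abstractmethod

# The "acne" version: an interface with exactly one implementation,
# written up front "just in case".
class ReportGeneratorInterface(ABC):
    @abstractmethod
    def generate(self, data): ...

class ReportGeneratorImpl(ReportGeneratorInterface):
    def generate(self, data):
        return ", ".join(str(d) for d in data)

# The plain version: one class. Extract an interface only when a second
# implementation (a test mock, a new backend) actually exists.
class ReportGenerator:
    def generate(self, data):
        return ", ".join(str(d) for d in data)
```

Both do the same work; the first just costs an extra type (and in Java, an extra file) before any second implementation has earned it.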


This article itches me on so many levels. It is not wrong directly but it is definitely not the truth either. I expected more from someone who claims to be a scientist.

The main issue I have with the piece is the oversimplification of the equation, to such an extent that important variables are removed without mention or explanation of their removal.

An example would be project size. Yes, for FizzBuzz globals are probably fine, and FizzBuzz Enterprise shows beautifully that overengineering is a thing (https://github.com/EnterpriseQualityCoding/FizzBuzzEnterpris...). All of the author's statements would hold here. We all agree and smile. But the same architectural choices make sense in many large enterprise projects. Take the comment on large numbers of small files, for example. This gives fewer merge conflicts (among many other things). Yes, working alone on your tiny project you won't notice, but try working in a single large file with 100 devs committing to it. Good luck with the merge conflicts! Large methods? Same issue: everybody has to change code in that one method, merge conflict. Inheritance? A nice thing if you build an SDK for others to use and want to hand them a default base version. They can extend and override your virtual methods to get custom behaviour. No code duplication which you have to maintain and keep in sync! Wow!

Next up I would like to address the difficult naming. Everybody nods that that was bad. Nice to write it down. However, from a scientist I would expect a disclaimer that this was based on personal experience with programmers and not the ground truth for programmers, or a citation of a credible source. I'd say there is only a small fraction who do that. Disclaimer on this: both programmer and scientist should work together if one side does not understand the naming conventions for the project.

Simple-minded, care-free coding can give you a prototype, which is the scientist's job. Enterprise programmers (who often are computer scientists) give you your product.

Tl;dr: stop comparing apples and oranges. Or, as a true scientist, at least describe the context and the omission of various variables. Oh, and share your goddamn ugly code so we don't need to read your papers and implement it ourselves from scratch. That's the true waste here ;).


I am sure that the person who wrote this article did it for a reason and has been frustrated by "programmers". However, this is very anecdotal and, to be honest, doesn't deserve more than a mere acknowledgement: yes, blindly applying software practices and adding more indirections is not always good, but creating robust, maintainable, non-ad-hoc software requires abstractions, indirections, and programmers.


The worst thing to ever happen to "best practices" was when managers found out about them. Suddenly, we were not allowed to think for ourselves and solve the problem at hand, we also had to figure out what "best practice" to use to implement our solution.

And it's not like you can argue against "best practices". They're the "best" after all. So that makes you less than best, to oppose them!


The inexperienced-CS-grad errors he describes are a maintenance nightmare, but those non-programmer errors cast a lot more doubt on the accuracy of the results. The importance of correctness depends on the problem I guess.


I think the article makes a decent case for "simple bad code" for small projects. In a bigger project this approach collapses, but in small to medium-sized ones you can do fine, and the ugliness of the code is "shallow", as I like to call it. That is, the problems are local and handled in simple, straightforward ways.

The "software engineer" code he describes sounds like the over engineered crap most of us did when getting out of the clever novice stage and learned about cool and sophisticated patterns which we then applied EVERYWHERE.

I guess some never come out of that phase, but the code of real master programmers is simple and readable, only uses complex patterns when truly needed, and has no need to show off how clever the author is in the code.

You know, the people who made it necessary to invent "POJO" (http://www.martinfowler.com/bliki/POJO.html).


The part where scientific code gets nasty is when the "simple bad code" from a proof-of-concept suddenly gets abruptly promoted to the core of some pipeline. This happens a lot.


Previous discussion: https://news.ycombinator.com/item?id=7731624 (2 years ago, 168 comments)


If the URL is exactly the same, I wish HN would just surface the old comment thread automatically at the top of the page.


I know what you mean by 'messy scientific code', hairy stuff. I deal with it almost on a daily basis. 10-element tuples, weird names, etc. Makes you wanna puke at the beginning. But then, as I get to understand what they are trying to say (i.e. the business purpose), things get easier. Somehow I remember what the 6th element in the tuple is and where, approximately, in a 2000 LOC function I should look for something. BUT... when it comes to a 'properly engineered' piece of infrastructure OOP shit filled with frameworks and factories, I have no idea. No matter how hard I try, I cannot remember nor understand what the fuck they are trying to say. My guess: this is because they have got nothing to say, really.


The examples he gives seem like using complex features of programming languages for the sake of it rather than best practices.


Remember that algorithms, data structure design and API experience are also crucial parts of coding. These are not necessarily things that will be learned by iterative hacking.

Scientific data sets can be huge, and there are all kinds of ways to write code that doesn’t scale well.

If the scientific code is trying to display graphics, then you really have to know all the tricks for the APIs you are using, how to minimize updates, how to arrange data in a way that gives quick access to a subset of objects in a rectangle, etc.


This is describing two stages in the growth of programmer skill.

The researchers are at beginner stage and make classic beginner-stage mistakes. The developers are at intermediate stage, and they make classic intermediate-stage mistakes.

There is a later stage of people who can avoid both, but the author probably hasn't worked with anyone in that stage. Which is not surprising, because once you're that experienced there are big financial incentives to get out of academia.


I think you can reframe this debate pragmatically and widen its applicability significantly: at what point is "bad" code more effective than the alternatives?

If you get down into a debate about "best practices", you'll have to concede that anyone writing the code the author is talking about might be using "best practices" in some explicit way, but isn't "following best practices", which are designed to avoid precisely the difficulties he outlines. On the other hand, it's true that most code out there is bad code, and that heavily architecting a system with bad code can be even more of a nightmare than more straightforward bad code.

The real question is: when should scientists favor bad code? I'm a huge fan of best practices and of thoughtful and elegant coding, but I could see an argument being made that in most circumstances, scientific code is better off being bad code, as long as you keep it isolated. I'd love to see someone make that argument.


From the comments:

> Of course, design patterns have no place in a simple data-driven pipeline or in your numerical recipes-inspired PDE solver. But the same people that write this sort of simple code are also not the ones that write the next Facebook or Google.

> post author: Google is kinda more about PageRank than "design patterns".

wut


What's the problem?

Google's initial success can be attributed to the effectiveness of the PageRank algorithm, not the quality of the code that implemented it. It doesn't matter if the first implementation (or even the current implementation) was a horrible mess of gotos in a single ten-thousand-line-long function, from the point-of-view of its users.


Obviously if code works it doesn't matter what it looks like to users, but the whole point of code design is making code easier for programmers to work with and maintain so that it stays working for users. Writing and maintaining Google scale software without design principles would be a nightmare.


OK - not to defend all professional programmers - but it seems quite reasonable that the tasks where people are hired specifically to write code are bigger and more complicated than the programming tasks completed by people who do programming only as a small part of their job.


Two points

1. All his developer errors are not best practices.

2. Writing domain logic is sooo much easier than writing the plumbing/integration logic that comprises most enterprise development. One of the hardest things in software is defining the right abstractions and names. But in domain logic 80% of those names and abstractions have already been created.

Oh what am I gonna call this thing my company owns that generates profit and losses for us and generally resides at a single location. Maybe I'll call it a "Store". Versus what do I call this thing that decides whether we pull information from bing or google based on complicated rules around performance, cost, and time of day. BingGoogleApiDecider?


They're talking about a different kind of best practices, but I highly recommend taking a look at the Core Infrastructure Initiative's Best Practices Project [0], which was created partially in response to the Heartbleed disaster. It's a list of 66 practices that all open source software, including scientific software, should be following.

[0] https://github.com/linuxfoundation/cii-best-practices-badge/...

(Disclosure: I'm a co-founder of the project. It's completely free and open source, and the online BadgeApp itself earns the best practices badge.)


Both sides of this argument are correct because both sets of practices are used for different purposes.

A mid-sized or large software project (say 100k+ LOC) with single letter variables all over, global variables, etc. would be an absolute maintenance nightmare. So the software engineering perspective is correct there. And in large projects it really is helpful to split projects up into multiple directories, use higher level abstractions, etc.

At the same time, most scientific code bases are not in that category. They don't have dozens (or hundreds) of people working on them, they're not going to be expanded much beyond their original use case, and they're mostly used by the people writing the code and/or a small group around those people.


This is a debate I engage in often. You can write "prototype" code to solve an "algorithmic" or "scientific" problem and it can be sloppy, but if you are planning on integrating it into a large project your team will run into problems unless the code is extremely contained.

It's true that there is a growing rebellion against best practices and design patterns, and I think in many cases some practices are dogmatic. However, the part that disturbs me is that inexperienced programmers are using it as an excuse to not apply basic principles they don't understand in the first place.

I've seen experienced software engineers that are lazy and spend more time criticizing the work of others than actually producing anything themselves, and I've seen novices that have poor fundamentals but grind for weeks to solve difficult "scientific" problems albeit with horrendous code that proves to be not maintainable in the long run. I find that in the latter case (I'll call them "grinders"), the programmer takes much longer to solve their problem because they have such limited coding experience (I've been asked many times to help debug trivial problems that result from not understanding basic concepts like how recursion works).

The author of this article does a good job of identifying the characteristics of this low-quality "scientific" code, especially that it uses a lot of globals, has bugs from parallelism, and has other bugs and crashes that are not understood. The author seems to insinuate that testing is the way to mitigate the bugs and crashes; this is partially true, but it's better to write code you understand in the first place, instead of relying on testing to fix everything so you don't continually introduce new bugs.

Grinders can benefit from understanding best practices and learning programming and computer science fundamentals. That way they can make their code more robust, code faster, and truly understand when they should and shouldn't apply a best practice. Software engineers can improve by matching the work ethic of the grinders and explaining where the grinders are making mistakes.


"try rather hard to keep things boringly simple"

Good engineering does mean keeping things boringly simple. You should only make things complex to hit a performance target, match complex requirements, or avoid greater complexity somewhere else.

Some types of complexity are subjective. If you need to parse something, bison/yacc is often a great choice; but for a simple grammar I could see how someone who doesn't know it could say it introduces needless complexity.
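To make the "simple grammar" case concrete: for a grammar as small as "integers joined by +", a parser generator buys nothing. A hedged Python sketch (the function name is invented):

```python
def eval_sums(expr):
    # "1 + 2 + 3" -> 6; for this grammar, split() is the whole parser.
    # int() tolerates the surrounding whitespace left by split("+").
    return sum(int(term) for term in expr.split("+"))
```

The moment the grammar grows precedence, parentheses, or error recovery, the bison/yacc "complexity" starts paying for itself; the judgment call is about where that line sits.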

Programming is writing, and like all writing, you are communicating with some audience (in the case of software, it's other developers). If you lose track of who you are writing for, you'll not succeed.


This is a mess of an essay, and it does little to persuade me that giving domain experts free rein to make software messes is in any way a good idea.

One of the criticisms applied to software engineers -- the one about bad abstractions like "DriverController" and "ControllerManager" etc. -- is a huge pet peeve of mine because it's basically a manifestation of Conway's Law [0]. It indicates that the communication channels of the organization are problematically ill-suited for the type of system that is needed. The organization won't be able to design it right because it is constrained by its own internal communication hierarchy, and so everyone is thinking in terms of "Handlers" and "Managers" and pieces of code literally end up becoming reflections of the specific humans and committees to which certain deliverables are due for judgement. This is not a problem regarding best practices at all -- it's a sociological problem with the way companies manage developers.

Domain specific programmers aren't immune to this either. You'll get things like "ModelFactory" and "FactoryManager" and "EquationObject" or "OptimizerHandler" or whatever. It's precisely the same problem, except that the manager sitting above the domain-specific programmers is some diehard quadratic programming PhD from the 70s who made a name by solving some crazy finite element physics problem using solely FORTRAN or pure C, and so that defines the communication hierarchy that the domain scientists are embedded in, and hence defines the possible design space their minds can gravitate towards.

There is definitely a risk on the software development side of over-engineering -- I think this is what the essay is getting at with the cheeky comments about too much abstraction or too much tricky run-time dispatching and dynamic behavior. But this is part of the learning path for crafting good code. You go through a period when everything you do balloons in scope because you are a sweaty hot mess of stereotyped design ideas, and then slowly you learn how only one or two things are needed at a time, how it's just as much about what to leave out as what to put in. The domain programmers who are given free rein to be terrible, and are never made to wear the programming equivalent of orthopedic shoes to fix their bad patterns, will never go through that phase and never get any better.

[0] < https://en.wikipedia.org/wiki/Conway%27s_law >


This is funny because the author is exactly right, but I think he's misidentified the poor coders. The folks he's complaining about are academic coders without a lot of commercial experience, which tend to make all of those errors.

He also nails it when he says "idleness is the source of much trouble"

In the commercial world, you code to do something, not just to code (hopefully). So you get in there, get it done right, then all go out for a beer. You don't sit around wondering if there's some cool CS construct that might be fun to try out here (At least hopefully not!) Clever code is dangerous code.

Good essay.


I've seen tons of over-engineered code in the real world. People with the title of 'architect' abstracting every last bit of code so that it's impossible to make sense of. Everyone starts off wanting to make a 'powerful framework' that can do everything, but end up with an over complicated mess of configuration that makes it too difficult to do anything with it.

I've seen this happen multiple times at multiple companies.


What can I say. I was trying to be kind.

It's a noob error -- and the worst part? People make these mistakes, create huge messes, then go on to other companies and do it all over again.

The general point is this: imperative, OO code has a known set of anti-patterns which good practitioners become familiar with and avoid. If you're fresh out of school, making these mistakes is just how you learn the ropes. Using those folks as an example was much kinder than complaining about the lousy state of some programmer/architects in general. At least the college kids have a good excuse :)

ADD: I'll say this a different way. Yes, I've seen tons of it in the wild. I do not consider these people to be professional programmers. I consider them folks who got technically stuck at some point in their education.


I saw a really good / fun post about writing a Hello World app by years of experience.

The first example was

print "Hello world!".

It gradually started adding functions, inheritance and other features with each year of experience. Then after ten years it went back to print "Hello World".

That's been the reality for me. Understand the complex features of a language, but more importantly, learn when they are appropriate.



Almost. I have tried to find it a few times but never been able to since I first stumbled upon it. The funniest part was the end result (the most experienced programmer) being exactly the same as the first - which this version is missing.



Yes, thats the one!


Actually it's not exactly the same: he added a README.


This truly cannot be overstated. These 'architects' have it in their minds that they can create an app framework that can handle _any_ change that comes along, and they sell the umpteenth rewrite of the software to management based on this fallacy.


In the commercial world, especially at the enterprise level, coders are only allowed to follow beautiful designs of UML layers created by business analysts and solution architects who haven't written a single line of code in the last 15 years.


Do places like that actually still exist?


Yes, but now the business analysts and architects call themselves scrum masters and product owners.


Yes, lots of them, especially at DAX level in projects with offshoring components.


> the products of my misguided cleverness.

To me this is the take-home. For a long time I would try to find clever solutions to problems, or just try to be clever in general, and it is not just other people but your own future self that has to deal with it. This also applies to other parts of academic life, such as grant writing. Code is also about communication with other people, and if you are clever then you had better be able to explain your cleverness in a way others can understand. KISS.


It's because the programmers aren't involved in the science being undertaken. They're put in a position where they are just programming for programming's sake.


Might have been true a decade ago. When simulations performed on a laptop in Matlab were enough for dissertation quality research. But data set size has exploded. If you are currently in school, learn how to move your research to the cloud, and learn some best cloud practices. Best prep for the future to come. And if you decide to leave academia you can possibly nab an interview at Netflix ;)


Flaws uncovered in the software researchers use to analyze fM.R.I. data: https://news.ycombinator.com/item?id=12378791

Wonder whether or not the software followed "best practices"...


Not sure how this article offers constructive criticism... Comparing the hardly avoidable issues caused by the specific scope and priorities of scientific work against plain "bad practices" has little value to me...


programming is subject to fashions just like everything else.

every few years something comes along and eventually gets recognition, then a following and then it becomes the 'one true way of doing things' and those that don't do it are mocked as out-of-date, old fashioned or clueless.

and then in another bunch of years, after the 'one true way' has been applied everywhere it shouldn't be, people point out the flaws and the cycle starts again with a new thing.

oo, patterns, corba/com, factoryfactoryfactory.

I'm personally waiting for Agile to finish its run


> so they have too much time on their hands, which they use to dwell on "API design" and thus monstrosities are born.

In their free time, they mostly go for refactoring the code, don't they?


Adding unnecessary abstractions is refactoring.


As someone who feels like I always complain about quality, I feel like I don't know how to actually write quality code. All code eventually turns into a nightmare. A lot of the code I see by coworkers and myself is super hacky. I really wonder if we're all just terrible programmers or if that's the natural evolution of code.

Apart from having a mentor, what are the best ways to learn about code quality? Books to read, for example, that I could then use to look at my own code and fix it? I really have no idea, when making decisions, what ends up being best over the long run.


Well, yeah, because Software Engineers are trained for building large projects and those "best practices" are aimed at exactly that, too.

Long functions, bad names, accesses all over the place, and reliance on complex libraries: those are errors which are acceptable at a small scale, but become horrendous when you build a larger project.

Many abstraction layers and a detailed folder structure might add a lot of complexity in the beginning, but there's not much worse than having to restructure your entire project at a later date.


This person has obviously never worked on a project of any scale. See where your ad-hoc practices get you when you have millions of LOC.

Can we all agree that there is good code and bad code and the difference between the two is often contextual, then move on. Geez.


I sometimes give a talk to startup companies, in which I tell them why their code should be horrible. It's an intentionally provocative thing to say, but there is reasoning behind it, and some of the same reasoning applies to a lot of scientific code. The linked article has a few comments that tangentially touch on my reasoning, but none that really spell it out. So here goes...

Software development is about building software. Software engineering is about building software with respect to cost. Different solutions can be more or less expensive, and it's the engineer's job to figure out which solution is the least expensive for the given situation. The situation includes many things: available materials and tools, available personnel and deadlines, the nature and details of the problem, etc. But the situation also includes the anticipated duration of the solution. In other words, how long will this particular solution be solving this particular problem? This is called the "expected service lifetime".

Generally speaking, with relatively long expected service lifetimes for software, best practices are more important, because the expected number of times a given segment of code will be modified increases. Putting effort into maintainability has a positive ROI. On the other hand, with relatively short expected service lifetimes for software, functionality trumps best practices, because existing code will be revisited less frequently.

Think of the extremes. Consider a program that will be run only once before being discarded. Would we care more that it has no violations, or would we care more that it has no defects? (Hint: defects.) That concern flips at some point for long-lived software projects. Each bug becomes less of a priority; yes, each one has a cost (weighted by frequency and effect), but a code segment with poor maintainability is more costly over the long term, since that code is responsible for the cumulative costs due to all potential bugs (weighted by probability) that will be introduced over the lifetime of the project due to that poor code.
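The break-even intuition above can be sketched with a toy cost model. All the numbers here are mine and purely illustrative (the commenter gives none); the point is only that the cheaper option flips as the expected service lifetime grows:

```python
# Toy cost model for the "expected service lifetime" argument.
# Numbers are made up for illustration; the crossover is the point.

def total_cost(upfront, cost_per_change, changes_per_year, years):
    """Upfront effort plus the cumulative cost of later modifications."""
    return upfront + cost_per_change * changes_per_year * years

# "Quick" code: cheap to write, expensive to modify later.
quick = lambda years: total_cost(upfront=1, cost_per_change=5, changes_per_year=4, years=years)
# "Maintainable" code: expensive to write, cheap to modify later.
clean = lambda years: total_cost(upfront=10, cost_per_change=1, changes_per_year=4, years=years)

for years in (0.25, 1, 5):
    winner = "quick" if quick(years) < clean(years) else "clean"
    print(f"{years:5.2f} yr: quick={quick(years):6.1f}  clean={clean(years):6.1f}  -> {winner}")
```

With these (arbitrary) numbers the quick version wins for lifetimes of a few months and the maintainable one wins after roughly half a year, which is the shape of the trade-off described above, whatever the real coefficients are in a given shop.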

So, short expected service lifetimes for software, prioritize correct behavior over maintainability; long expected service lifetimes for software, prioritize maintainability over correct behavior. The source code written by a brand-new company will be around for six months (maybe) before it gets factored away, or torn out and rewritten. During that time, less-experienced coders will be getting to know new technologies with foreign best practices, and those best practices will be violated frequently but unknowingly. Attempting to learn and retroactively apply best practices for code that will likely last a short period of time is simply more expensive (on average) than just making things work. The same applies to scientific code, which gets run for a graduate degree or two before being discarded. If the code wasn't horrible, I'd think that effort was being expended in the wrong places.

In my experience, most "fights" about best practices (whether a technique should be considered a best practice, or whether a best practice should be applied) usually boil down to people who have different expected service lifetimes in mind. (One of those people is probably considering an expected service lifetime of infinity.)




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | DMCA | Apply to YC | Contact

Search: