
Laws of Performant Software - sidcool
http://tagide.com/blog/advice/laws-of-peformant-software/
======
throwawayReply
> a cache can be added in 10 LOCs

Yes, and then 10 months debugging edge cases where communicating parts are
looking at different versions of the "same" data.

Caching is really important, but caching (and cache-invalidation) is really
difficult. Adding caching to an application that was never designed for it is
not "10 LOC and done".

~~~
huhtenberg
> caching (and cache-invalidation) is really difficult

That's an urban legend.

Needlessly complicated or over-abstracted general-purpose caching frameworks
are difficult, but your dumbest imaginable linear LRU fast lookup is both
_exceptionally_ useful and can indeed be done in 10 LoC in a lot of cases.
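For concreteness, here's a sketch of that dumbest-imaginable LRU in Python (names are illustrative; the stdlib's OrderedDict does the recency bookkeeping):

```python
from collections import OrderedDict

def make_lru(capacity):
    """Minimal LRU cache: a bounded mapping with least-recently-used eviction."""
    cache = OrderedDict()
    def get(key, compute):
        if key in cache:
            cache.move_to_end(key)       # mark as most recently used
            return cache[key]
        value = compute(key)
        cache[key] = value
        if len(cache) > capacity:
            cache.popitem(last=False)    # evict the least recently used entry
        return value
    return get

lookup = make_lru(2)
lookup("a", str.upper)   # miss: computes "A"
lookup("b", str.upper)   # miss
lookup("a", str.upper)   # hit
lookup("c", str.upper)   # miss: evicts "b"
```

It really is about 10 lines; as the surrounding comments note, the hard part is everything around it.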

~~~
throwawayReply
It's not the algorithm that's difficult, it's the effect of adding caching to
a system that wasn't built with caching in place at the start that is
difficult.

This isn't an "urban legend" it's first hand experience working with companies
trying to add caching. No, those companies aren't even trying to write caching
algorithms, they're just bundling in a caching layer and hoping that the
system behaves in the same way.

It only takes one place that writes data (perhaps in a way that bypasses the
caching layer, so the cache doesn't know it has changed) and quickly re-reads
it back, and software which used to work suddenly breaks.

Now you might look at that and go "omg refactor it! That's horrible code, that
should never ship" etc., but not everywhere is the S.V. bubble with an endless
supply of the best developers to throw at problems. Code which worked and
solved a business problem ended up shipping, possibly without testers and
probably without code reviews.

So adding a caching layer suddenly "breaks" those reports. Now who's going to
have to fix it? Not the person who wrote those reports, even though that very
pattern of side-effecting data and re-reading it from the db is precisely the
kind of data-layer slowness that led to wanting caching in the first place...

~~~
huhtenberg
Nobody's saying that one can just throw in a caching layer, touch nothing
else, and have it magically work. There's obviously some thought and due
consideration required, but it is NOT "really difficult". And in a lot of
cases it _is_ in fact as simple as adding a handful of lines of code.

PS. It is an urban legend, because "cache invalidation is hard" gets repeated
a lot, initially as a joke, but that doesn't stop people who aren't familiar
with the subject from taking it as fact and then repeating it as such.

Voice recognition from scratch is hard. Some lock-free data structures are
hard. Caching is not hard. Knowing what the heck you are actually doing, and
doing it well, is what's hard. By the same measure, C macros would be hard,
because some idiot can do _#define true false_ and everyone else will spend
the same 10 months trying to understand why the hell things break now and
then. Caching is "hard" when someone starts messing with other people's code
without fully understanding it. But then anything is "hard" under those
circumstances.

~~~
falcolas
The difficulty of implementing voice recognition and lock-free structures does
not make caching easy. Yes, you can implement caching in 10 lines of code.
Anybody can do that - it's writing the _correct_ 10 lines of code which is
very hard.

Sure, if you're implementing Fibonacci, memoization is simple. But if you're
trying to memoize the results out of a database, things are going to be a lot
more complicated, really fast.
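The easy case really is trivial. In Python, memoizing Fibonacci is one stdlib decorator:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Naive recursion is exponential; the cache makes it linear.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

fib(100)  # returns instantly; the uncached version would effectively never finish
```

Nothing here can go stale: fib(100) is fib(100) forever. A database row offers no such guarantee, which is where the other dozen bits of information come in.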

It requires knowledge of what can go wrong, what will go wrong, the use cases
associated with a bit of data to be cached, how the application will be
deployed, what other caching will be implemented across the system, and a
dozen other bits of information unique to each use-case.

Knowing what questions to ask, and of whom to ask them, requires experience.

------
critium
While this article is itself worth a read, remember the programming mantra:

      * Make it Work
      * Make it Right
      * Make it Fast

And this is the last step. This does _not_ apply to all projects, but it does
apply to the majority.

~~~
brianwawok
This is really an overly simplistic way to code. I urge people to think deeper
up front about performance.

Know your performance goals going in and code accordingly. If you require 100
microseconds average latency and you coded it in Node.js, step 3 will be a
rewrite.

For every single line of code I write, I can tell you my performance goals. If
it is a simple CRUD screen for one user, the goal may be "meh, document.ready
fires within 1 second when viewed over 100ms of browser lag". Backend trading
code would have different goals.

~~~
critium
My posit was that 80% of the software an engineer will write will not require
optimization, and based on the responses I've been getting, I should have said
"rule of thumb" rather than mantra. My intention was to warn the readers who
would say "oh, I gotta do all of this for every piece of software that I
write", which will undoubtedly lead to overly complex code when a simple
solution would have worked just as well.

    
    
      "Know your performance goals going in and code accordingly."
    

I this is key sentence here and its worth repeating. Know your performance
goals before your fingers touch the keyboard.

~~~
brianwawok
> My intention was to warn the ones that read this and say, oh i gotta do all
> of this for every piece of software that I write, which will undoubtedly
> lead to overly complex code when a simple solution would have worked just as
> well.

No, that is not true at all.

I am writing a crud app to be used by 1 person, some manager of a widget
factory. I say "my perf goal is to have page loads in 10 seconds or less". How
will that make my code more complex? If anything it will make my code MORE
simple, as I can relax all kinds of constraints like "making 72 database
queries per page is generally bad".

~~~
breischl
I believe the GP's point was that if you know that your performance goals are
very relaxed, then you can make the code appropriately simple from the
beginning. Conversely, if your performance goals are stringent, then you can
take an appropriate approach from the beginning.

------
adrianratnapala
What is the difference between "performant" and "fast"?

If we were talking about cars, then I would say performance includes not only
speed but all kinds of things to do with handling. But in computing it always
seems to mean "speed"[1], and "performant" always seems to be a bizarre
neologism for "fast".

I think people invented it because speed is a many-faceted thing. There is
latency, there is throughput, there is user-visible responsiveness, all at
multiple interacting levels. But these complexities and vaguenesses apply
equally to the quasi-word "performant".

[1] Actually the main article is a partial exception as she inconsistently
includes stability within "performance". In one sentence she says
_"...performance degradation, including crashes, and the unbounded use of
resources."_ But later she says _" Code that doesn’t perform, or that
crashes."_

~~~
TickleSteve
fast == absolute.

performant == relative (efficient on the hardware available).

e.g. an algorithm may be performant on a little Cortex-M but is certainly not
fast compared to the same code on an i7.

~~~
adrianratnapala
The problem with this definition is that it isn't true.

17km/hour is an absolute speed. It is fast for a runner, OK for a cyclist and
slow for a car.

"Fast" is relative.

~~~
TickleSteve
I agree in that context....

but in the context the original sentence was used, the statement stands.

English is not the most precise language....

------
YZF
\- Programming language does matter. You will not be able to get the same
performance in any programming language you choose.

\- Abstractions are the enemy of performance. You can't get high performance
through many layers of abstraction. This relates to my previous point.

\- Algorithmic complexity matters when dealing with large data sets.
Abstractions can hide the true complexity. (e.g. the famous string append
example from IIRC Joel Spolsky).
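The string append example (assuming it's the "Schlemiel the Painter" one) in Python terms: repeated concatenation looks O(1) per step, but it re-copies the accumulated string every time, so the abstraction hides an O(n²) loop:

```python
def concat_naive(parts):
    # Looks innocent, but each += copies the whole accumulated string,
    # so the loop is O(n^2) overall. (CPython sometimes optimizes this
    # in place, but that's an implementation detail you can't rely on,
    # which is exactly the point about abstractions hiding complexity.)
    s = ""
    for p in parts:
        s += p
    return s

def concat_bulk(parts):
    # join() sizes the result once and copies each part once: O(n).
    return "".join(parts)

assert concat_naive(["per", "form", "ant"]) == concat_bulk(["per", "form", "ant"])
```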

So the key to high performance software is:

\- As close as possible to the hardware.

\- The right data structures/layout (taking into account the hardware).

\- The right algorithms.

\- Measure properly and optimize.

~~~
sshumaker
I think your point around abstractions could be more nuanced. The wrong
abstractions can be deleterious to performance; thoughtfully chosen ones can
actually be beneficial. Every programming language is several layers of
abstraction above the hardware it runs on, which gives the compiler
opportunities for optimization, or even the superscalar CPU opportunities for
parallelization / reordering. And so on.

~~~
YZF
I've yet to see a compiler that can beat a human hand-optimizing. It's true
that for zero effort the compiler can usually do better, but if you _really_
care about performance you need to get around those abstractions. I worked on
a wavelet video decoder; after a while we had pretty optimized C code that
used SSE intrinsics for large portions of the decode. Some clever guy spent 3
months and rewrote the entire decoder in assembly. It involved careful data
layout and optimizing the instruction sequences. It ran more than twice as
fast as our hand-optimized C code. Same algorithm, input and output.

------
overgard
Rule #1 is bogus. If you're starting with a language that's 2-10x slower,
you're always going to be behind the curve. There's a reason C and Fortran
haven't gone anywhere.

~~~
squeaky-clean
> If you're starting with a language that's 2-10x slower you're always going
> to be behind the curve.

I got involved in a huge discussion about optimizing a web app recently.
Things that were learned from the conversation?

1) It turns out moving more frequent cases to the top of an 'if-else' chain in
JavaScript offers a greater speedup than I expected.

2) It doesn't matter if you shave 4ms off a request by optimizing your if-else
chain if a little later in your code you make 27,000 database queries when you
could have made 2.
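Point 2 in miniature, with sqlite3 standing in for the real database (table and column names invented for the example):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.executemany("INSERT INTO users VALUES (?, ?)",
               [(i, "user%d" % i) for i in range(500)])
ids = list(range(500))

# The 27,000-queries shape: one round trip per row.
slow = [db.execute("SELECT name FROM users WHERE id = ?", (i,)).fetchone()[0]
        for i in ids]

# The 2-queries shape: fetch the whole batch in one statement.
marks = ",".join("?" * len(ids))
fast = [row[0] for row in db.execute(
    "SELECT name FROM users WHERE id IN (%s) ORDER BY id" % marks, ids)]

assert slow == fast
```

Against an in-memory database the difference is small; add network latency per round trip and the batched version wins by orders of magnitude.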

Knowing how to write a performant app will always beat writing sloppy code in
the fastest language available.

~~~
overgard
That's kind of beside the point though. Yes, your bottleneck may not be the
CPU, but if it is, then language matters a lot.

~~~
squeaky-clean
It's exactly the point. If you don't know what makes an app performant or not,
moving to C++ is not going to magically make it any better if you're doing
brute force operations everywhere.

> yes your bottleneck may not be CPU, but if it is, then language matters a
> lot

Even if your bottleneck is the CPU, that does not mean it's the programming
language. A quicksort in Python is faster than a "while not sorted, randomly
shuffle this list" in C++.

Sure, moving to another language may be faster. But _that_ is beside the
point. The article even says "The programming language doesn’t matter as much
as the programmers’ awareness about the implementation of that language and
its libraries."

So this whole discussion is under the assumption that a developer does not
know what it takes to make an app fast. If you're using arrays in operations
where you have to insert into the middle a lot, and never have to iterate over
all of it sequentially, you probably should be using something like a linked
list. Choosing the proper data structure (and learning why) should take
priority over the faster language.
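As a concrete (if Python-flavored) sketch of that data-structure point: inserting at the front of a contiguous array shifts every element, while a linked structure does it in constant time. collections.deque is the stdlib's closest analogue to the linked list described above:

```python
from collections import deque

def front_inserts_list(n):
    xs = []
    for i in range(n):
        xs.insert(0, i)   # shifts all existing elements right: O(n) per insert
    return xs

def front_inserts_deque(n):
    xs = deque()
    for i in range(n):
        xs.appendleft(i)  # linked blocks: O(1) per insert
    return xs
```

Same results, wildly different scaling; timing both with timeit on a few hundred thousand elements makes the gap obvious.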

------
foxhop
Performant is not an adjective, it is a noun: "one who performs". Software can
perform poorly or well. "Performant" does not mean it performs well.

[http://weblogs.asp.net/jongalloway/performant-isn-t-a-word](http://weblogs.asp.net/jongalloway/performant-isn-t-a-word)

This matters. Software can perform well at its job (not crash, get the correct
answer) but may not perform efficiently (takes a long time, has unbounded
resource use, uses a brute force pattern).

~~~
ulber
If you subscribe to descriptive linguistics at all (which you must, if you
believe languages can naturally change), a quick Google search reveals that
"performant" is in fact very much an adjective, i.e. it is commonly used as
one, to the point that one would be considered obtuse to refuse to recognize
it as such.

To be fair one can also find a fair number of discussions about whether
performant is a word in that google search, so "performant as an adjective" is
clearly a newish use of the word.

Edit: after reading your reference, I can comment that journal/book editors
mostly do not subscribe to descriptive linguistics, and it is probably good
that they do not (so as not to take chances with parts of language that might
turn out to be fads).

~~~
gsnedders
There are plenty of published journals going back decades that accept
performant as a term of art.

------
daemonk
From a data analysis perspective (which some of her rules seem related to), my
number one rule is: work with the data, not with the tools. It can be really
easy to go down the rabbit hole of fussing over the perfect tool for your
analysis and end up with no results.

Optimization is important only insofar as you need it. If I can launch a bunch
of AWS instances to get a single-run job done in an hour, then I'll throw
hardware at the problem instead of worrying about my code. I care about the
analysis results, not necessarily how performant my method is.

If I'll need to run the analysis multiple times, or I plan to publish it as a
tool, then that's another story.

~~~
gwbas1c
A common recurring theme in my career is horribly-performing applications
whose original programmers worked with an ORM instead of the database.
Database code isn't hard! An HBM file is just as complicated as handling a
data reader! Stored procedures (or in-line SQL) are simpler in the long run!

~~~
ansgri
I also don't get the ORM stuff: typically you work either (1) with lots of
different objects organized in some kind of document (in the NoSQL sense), and
you want all these objects to be predictable, free from side effects and
cleanly serializable to e.g. JSON, or (2) with bulk data which you transform
and aggregate to come up with a small collection of fully constructed (also,
denormalized) objects, and for this purpose SQL is beautiful. Or else they
wouldn't've invented Linq to Collections.

------
tofflos
I believe there should be a stronger distinction between decent performance
and high performance. Most frameworks coupled with idiomatic code will give
you decent performance.

I find controlled experiments / microbenchmarks to be a useful method for
finding the initial bottlenecks of a system. Once I know the initial
bottlenecks, I can make an informed decision on whether that performance is
sufficient.

Controlled experiments / microbenchmarks are also essential in establishing a
ballpark number for the theoretical maximum performance of your system. From
that point on, the performance is yours to lose.

------
catnaroek
> It is critical that people question “how does this magic actually work?”

No. It's critical that people question “why do I have to rely on magic?” in
the first place. Even in high-level languages, perhaps _especially_ in high-
level languages, it's a good idea to write straightforward code.

> there is an appalling lack of candy in the C/C++ ecosystem (...) performance
> isn’t hidden

C and C++ are very different languages, and there is no shortage of C++
libraries full of incomprehensible magic. When debugging template-heavy C++
code, it can be very hard to determine where expensive and unnecessary object
copies are created. Thank you C++, for making copy construction implicit!

> Are you producing too much garbage unnecessarily?

The high-level programmer's best defense against creating too much garbage is
programming with values (whose redundant representations in memory can be
automatically deduplicated by the runtime system) rather than object
identities.

> Are your dictionaries too big to the point of being inefficient?

The problem isn't the dictionary abstraction, but rather the implementor's
choice of underlying data structure. If you have a really big dictionary whose
keys are strings (a common use case), you want tries, not hashtables.
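A minimal trie, for the idea (whether it actually beats a hashtable depends heavily on key distribution and implementation; this sketch itself leans on dicts for the child links, and assumes keys never contain "$"):

```python
def trie_insert(root, key, value):
    node = root
    for ch in key:
        node = node.setdefault(ch, {})  # one level of nesting per character
    node["$"] = value                   # "$" marks end-of-key

def trie_get(root, key):
    node = root
    for ch in key:
        node = node.get(ch)
        if node is None:
            return None
    return node.get("$")

root = {}
trie_insert(root, "cache", 1)
trie_insert(root, "cached", 2)
assert trie_get(root, "cache") == 1
assert trie_get(root, "cach") is None   # prefix, not a stored key
```

Lookup cost is proportional to key length, not table size, and shared prefixes are stored once.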

> string concats can be replaced by string builders in the same amount of
> lines,

Using string builders is a low-level chore, and defeats the point of using a
high-level language. A better alternative is a persistent list/string data
structure that actually supports efficient concatenation.

> Does your program start ad-hoc threads? Use a threadpool with fixed size.

Again, too low-level. A programmer working with a high-level language should
be able to spawn as many green threads as she wants to, and let the language's
runtime system handle multiplexing those green threads over OS threads.

> Unless you are 100% sure the lines are always of reasonable size, do not use
> readline.

This is terrible advice. If readline is causing you problems, you are using
the wrong string data structure.

\---

Okay, I'm done bashing what could be bashed. The last two items in the OP are
actually good.

~~~
perfmode
> Again, too low-level. A programmer working with a high-level language should
> be able to spawn as many green threads as she wants to, and let the
> language's runtime system handle multiplexing those green threads over OS
> threads.

Do this in Go and you'll be surprised how quickly you run out of memory.

~~~
catnaroek
Sounds like a reason not to use Go as a high-level concurrent language.

~~~
perfmode
Sounds like infinite threads is wishful thinking.

~~~
catnaroek
You understand the difference between “as many green threads as the programmer
wants” and “infinitely many threads”, right?

~~~
perfmode
Yes. Not finite. Unbounded. Finite == bounded == pool.

~~~
catnaroek
Unbounded isn't the same as infinite. For example, if you have a fair coin,
the number of times you have to flip it until you get 5 tails is unbounded but
finite.

The problem with creating infinitely many threads isn't even space. Even if
you had an infinite amount of memory, programs are supposed to complete their
tasks in a finite amount of time, so you shouldn't spawn infinitely many
threads, because it's an unreasonable thing to want. On the other hand,
spawning 100k green threads is a perfectly reasonable thing to want. It's the
language runtime's job to multiplex these 100k green threads over 4 or 8 or
however many OS threads make sense.
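In Python terms (with asyncio tasks playing the role of green threads, and scaled down to 10k here):

```python
import asyncio

async def worker(i):
    await asyncio.sleep(0)  # yield to the scheduler, as a real workload would
    return i * i

async def main(n):
    # n concurrent green threads, all multiplexed onto a single OS thread
    # by the event loop; no pool-sizing decision in sight.
    return sum(await asyncio.gather(*(worker(i) for i in range(n))))

total = asyncio.run(main(10_000))
```

Whether a given runtime does this well is another matter, as the Go discussion above shows.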

------
marknadal
Glad this submission is being upvoted but I think there is room for more
practical and direct advice. Here are some resources for JS:

\- [https://github.com/petkaantonov/bluebird/wiki/Optimization-k...](https://github.com/petkaantonov/bluebird/wiki/Optimization-killers)

\- [https://github.com/amark/gun/wiki/100000-ops-sec-in-IE6-on-2...](https://github.com/amark/gun/wiki/100000-ops-sec-in-IE6-on-2GB-Atom-CPU)

\- [http://danieltao.com/lazy.js/](http://danieltao.com/lazy.js/)

------
sillysaurus3
_Do threads pass data among themselves? Use blocking queues whose capacity is
a function of the max amount of data that can potentially be waiting without
exhausting the memory._

I.e. a ringbuffer. The Disruptor is an efficient and simple way to coordinate
this. [https://lmax-exchange.github.io/disruptor/](https://lmax-exchange.github.io/disruptor/)
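In stdlib terms, queue.Queue(maxsize=N) is exactly such a bounded blocking queue (a toy producer/consumer sketch; the Disruptor is a far more sophisticated relative of this):

```python
import queue
import threading

q = queue.Queue(maxsize=64)   # put() blocks once 64 items are waiting
DONE = object()

def producer():
    for i in range(1000):
        q.put(i)              # blocks here if the consumer falls behind
    q.put(DONE)

def consumer(out):
    while True:
        item = q.get()        # blocks here if the queue is empty
        if item is DONE:
            break
        out.append(item)

results = []
threads = [threading.Thread(target=producer),
           threading.Thread(target=consumer, args=(results,))]
for t in threads: t.start()
for t in threads: t.join()
assert results == list(range(1000))
```

The maxsize bound is what caps memory: the producer can never get more than 64 items ahead.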

~~~
brianwawok
Disruptor is nonblocking. I think the OP is talking about actual blocking
queues which are generally bad in high performance code.

~~~
sillysaurus3
Disruptor blocks when the ringbuffer fills up.

~~~
brianwawok
Touché. Most people size it large enough to never block on insert, so it is
usually considered a non-blocking queue. Reads never block, for example.

------
MyNameIsFred
While I agree with the author and like the article, I suspect that it would
only make sense to people who already accept these things as true.

~~~
hackits
If you chase two rabbits you will not catch either one.

A little bit of background before I begin: I did my comp-sci and mathematics
degree, and even though they have been useful to some degree or another, they
were both mostly a complete waste of time in my experience. The vast majority
of the clients and projects I've had since finishing university have revolved
around fixing framework problems or bugs within large code bases. Just today
I fixed a massive Land Titling (Enterprise Java/Oracle) bug where the original
underlying framework would leak connections/memory until it fell over after
3-4 hours of usage. Other bugs include off-by-one problems, data translation,
and data encoding. The vast majority of the time it's been incorrectly
designed frameworks, or replacing one framework with another.

Most of the work out there (I would guess 90% of it) is maintaining and
supporting clients to achieve their business goals. Completely uninteresting
stuff, but get a good reputation for getting stuff done and they don't even
wink at your asking price.

------
gens
gens's* law of writing software that performs well: Don't be smart.

Use simple data structures where you can: arrays, AoA, SoA, AoS. Process data
in bulk where you can. Use complex algorithms only where needed (a kd-tree
when doing something like raytracing; threading only when necessary, and even
then only with bulk processing; complex locking or timing mechanisms only when
you have to; etc.). Don't follow paradigms blindly.

Basically, data should be processed only in a functional way or in "waves".
When there are only a handful of variables, a functional style will keep them
in CPU registers. When there is more data, going over an array will make the
CPU load the data that is coming next into the cache.
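AoS vs. SoA, sketched in Python for brevity (the cache effects only really show up in a language like C, but the two layouts look like this):

```python
# Array of Structures: one record per element, fields interleaved.
aos = [{"x": i, "y": 2 * i} for i in range(1000)]

# Structure of Arrays: one contiguous array per field.
soa = {"x": [i for i in range(1000)],
       "y": [2 * i for i in range(1000)]}

# Summing one field drags every whole record through memory in AoS form...
sum_aos = sum(p["x"] for p in aos)
# ...but walks a single dense, prefetch-friendly array in SoA form.
sum_soa = sum(soa["x"])
assert sum_aos == sum_soa
```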

Oh, and write C.

[*] gens is a hobby programmer and possibly an idiot

------
kod
Isn't item 4 basically encouraging microbenchmarks? Microbenchmarks can be
misleading, and there's no guarantee you're benchmarking the important things.
The article mentions profilers in passing, but I find it more productive to
start with a profiler.

------
cs702
I would add another law: Code organized in small, easy-to-read chunks >> code
organized in long, dense chunks.

~~~
xixi77
I would even say code organized in small, easy-to-read chunks >> code
organized in long, dense chunks >> code organized in lots of tiny, impossible
to read (because calls go back and forth all over the place) chunks :)

~~~
cs702
OMG, Yes!

------
syngrog66
related to this you may want to check out my cheatsheet PDF on software
performance & scalability:

[http://synisma.neocities.org/perf_scale_cheatsheet.pdf](http://synisma.neocities.org/perf_scale_cheatsheet.pdf)

Feedback and ideas for things to add or change are welcome.

------
signa11
but there is no such word as 'performant'! wiktionary doesn't count :) note
that efficiency and performance are _not_ the same thing.

improving efficiency implies doing less work for the same end-goal. thus, when
a program is 'efficient', it is doing the minimum amount of work that the
computation demands. in other words, we have the best algorithm around, by
some kind of complexity argument, for the task at hand. an algorithm which is
efficient is not wasting anything.

performance, on the other hand, implies how quickly the work that is to be
done is actually done. basically, a performance improvement would allow you to
do the same amount of work faster (in time).

is the program, at any given point during its execution, doing work at
maximal speed? that doesn't seem to make much sense.

in fact, in _practical_ terms, what is the 'maximal speed' of work?
<theoretical constructs like Bremermann's Limit don't count :)>

programs can perform 'well enough', but that doesn't mean we are all done with
it...

~~~
wodenokoto
Performance is actually much, much broader than the speed at which a task is
completed.

The performance of a classifier can be anything from how fast it runs, to its
F1 classification score.

~~~
signa11
> The performance of a classifier can be anything from how fast it runs, to
> its F1 classification score.

but classification performance would be the efficiency of a classifier
algorithm, right? i.e. can I get better classifications / generalizations etc.
in less time.

~~~
wodenokoto
Yes and no. It can be better in less time, it can also be just better or just
faster (but worse) all depending on what is important.

Classifier A, which gets an F1 score of 0.93 on the test set, has lower
performance than classifier B, which gets an F1 score of 0.94, regardless of
time.

However, you can also say that classifier C, which only achieves an F1 score
of 0.929, has better performance, since it is four times as fast.

------
rahilb
Forget _laws_, I just want to know how the dlang forum is so fast.

