
The Fallacy of Premature Optimization (2009) - valand
https://ubiquity.acm.org/article.cfm?id=1513451
======
mewpmewp2
People always seem to talk in absolutes, in black and white, concerning this
topic.

In my experience most engineers take a situational approach. Sometimes it is
worth it to optimize early, sometimes it is not. Some things are worth
optimizing, some not. And some things are only worth optimizing to a certain
level.

These types of articles and claims conjure up a debate because everyone is
imagining a different scenario in their heads. It is entirely plausible that
each claim could be the best solution in a different scenario.

Do these articles assume that engineers are incapable of thinking flexibly and
need to follow some absolute truths that have to be debated? Or have I just
been around engineers good enough that I haven't noticed this issue?

~~~
tmcb
I read somewhere else yesterday that “it’s easier to throw hardware at a
problem than people.” It certainly does not hold true for all cases, but
turning this aphorism into a question seems to give us some good heuristics,
provided you know what the root cause of your performance issues could be, of
course.

~~~
Laakeri
It's easy to throw hardware at a problem that can be parallelized efficiently.
It's very hard to throw hardware at an inherently sequential solution.

~~~
TeMPOraL
It _was_ easy to throw hardware at a sequential problem, a decade ago. CPUs
were still improving their single-core performance year after year, and memory
access and all kinds of IO devices kept getting faster and better. I'm
guessing that's the era this idea originates from.

Today, things are different. Single-threaded performance isn't going to
improve much in the foreseeable future; the effort has shifted to improving
parallel performance and, more recently, power consumption. So if you write
slow code - perhaps by choosing a slow software stack - your code will
_remain_ slow.

------
chrisweekly
One thing I haven't seen in the comments yet is the simple observation that
premature optimization is, by definition, a mistake -- otherwise it'd merely
be "optimization". The real question is, what makes it "premature"?

I think Knuth (in his famous "... root of all evil" quote) was referring
specifically to programming, and to spending time on optimizations prior to
functional completion. The mantra, "First make it. Then make it work. Then
make it better." [or similar] holds water. But it's pointless to debate in
general terms what precisely constitutes a "premature" optimization, vs a
merely timely one, or adherence to best practices, given the infinite
combinations of circumstance and context relating to software development
projects.

~~~
username90
> "First make it. Then make it work. Then make it better."

That only works when you do simple things. For harder things, the "make it
better" part usually requires a complete rewrite if you didn't properly think
things through from the beginning.

A better mantra would be "First make it, then make it again, then make the
real version".

~~~
yahwrong
No. Engineering is an iterative process. Even if you're "rewriting"
everything, you're still not starting again from scratch; rather, you're
building on your previous approaches.

------
candiodari
So recently John Carmack posted:

[https://twitter.com/ID_AA_Carmack/status/1210997702152069120](https://twitter.com/ID_AA_Carmack/status/1210997702152069120)

"My formative memory of Python was when the Quake Live team used it for the
back end work, and we wound up having serious performance problems with a few
million users. My bias is that a lot (not all!) of complex “scalable” systems
can be done with a simple, single C++ server."

And so a lot of people seem to feel the need to once again drag the
discussion back into absolutes: optimisation is useless, or it is the only
thing that matters.

I've found the same as John Carmack, btw. Taking some component of a customer
system and translating it to C++ can make a system that needed 10+ servers
suddenly run a lot faster on a single server, meaning higher throughput AND
lower latency, because in addition to the raw speed advantage of C++ there's
so much you can do in C++ that's not really feasible in, say, Python (for
example, mmapping files).

But C++ is not exactly the first thing I reach for when something new springs
to mind, or I want some data analysis done, or ... (even though C++ is
excellent for running production data analysis jobs)

~~~
andy_ppp
Python is probably the wrong tool for this, though; that doesn't make it bad.
Writing this in Erlang, Elixir, Scala, or even Go would be a better choice.
That's not premature optimisation, that's understanding up front which of
your problems are requirements. Choosing most of these when you have zero
users is probably wrong; Python might be better, since with its libraries it
could get you to market faster!

------
dana321
Premature Optimization can hold back the release of software, and make it more
complicated to debug.

Say you already have a function that returns the list of children of a node,
but you need a function that brings back only the first child, if it exists.

1. Do you copy the function and return just the first item when you have it
(with the associated code around the function), for efficiency's sake?

2. Or do you call the existing function and return the first item if there is
at least one element?

1 will be the best answer if you have a million records to load from disk.
2 will be the best answer if you only have 20-30 records on average.

But 2 is the best answer before debugging: check that the code works properly
before duplicating it and modifying it.

Answer 3 would be to add a limit parameter and serve both callers with it:
if limit=0, return all records; if limit=1, return at most one record (see the
sketch below).
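
A minimal Python sketch of that third answer; the names (fetch_children,
node.children) are hypothetical, not from any particular codebase:

    # Sketch of answer 3: one fetch function with a limit,
    # plus thin wrappers for the two callers.
    def fetch_children(node, limit=0):
        # limit=0 means "no limit"; otherwise stop early, so a
        # million-record node isn't fully loaded just to get one child.
        children = []
        for child in node.children:
            children.append(child)
            if limit and len(children) >= limit:
                break
        return children

    def all_children(node):
        return fetch_children(node)

    def first_child(node):
        children = fetch_children(node, limit=1)
        return children[0] if children else None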


Sometimes the answer is to think differently about the problem altogether.

~~~
unlinked_dll
To be pedantic, the answer is neither. You use an abstraction that supports
either answer with similar ease, provided by the language/framework/tools you
used.

You bring up a very real point, but the most obvious cases are also the most
obvious to the people developing the stack below you, who have almost
certainly gone through the trouble of solving those problems.

This issue rears its head when edge cases are non-obvious and typically
manifest through profiling, not through design meetings.

------
londons_explore
When starting a project, I prefer to have a "performance budget".

For example "The application must start in under one second".

Then during development you can divide up the budget - 400 milliseconds for
loading dependencies, 200 milliseconds for rendering the UI, etc.

Then you can decide whether any given optimization is needed to meet your
goals.
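
For illustration, a toy sketch of such a budget check in Python (the phase
names and numbers are invented, not from this comment):

    import time

    BUDGET_MS = {"load_dependencies": 400, "render_ui": 200}

    def run_phase(name, fn):
        # Time one startup phase and flag it if it blows its budget slice.
        start = time.perf_counter()
        fn()
        elapsed_ms = (time.perf_counter() - start) * 1000
        if elapsed_ms > BUDGET_MS[name]:
            print(f"{name}: {elapsed_ms:.0f}ms over its "
                  f"{BUDGET_MS[name]}ms budget")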

~~~
chaboud
The problem with this way of thinking is that those budgets can lead to
pathological behaviors. Start time is a great example. Let's say you have 1
second to start, but everything you need to do takes 1.2 seconds.

Okay... We'll defer some work and show a dummy screen to be "started" in less
than 1 second, except now we're loading/drawing the dummy screen, so we're
usable in 1.4 seconds, now with a double-start flash; so let's smooth that
out with a transition to the interactive UI, and we're usable in 2.2 seconds,
but we "started" the app in 0.4.

Yay?

It may sound contrived, but I've seen this very response to strict start-time
KPIs before. People end up optimizing the micro-goal (the KPI) rather than the
macro-goal (a better experience).

The flip side problem is, when you have multiple people responsible for parts
of the work, people will stop optimizing when they hit their budgeted
allotment, rather than working to better the whole. A Dev with a 400ms budget
might sleep for 390ms to buy himself three years of squeezing out
optimizations...

More likely, he'll stop when he gets to 350ms, even though he could have
gotten to 250ms without too much effort.

~~~
thrower123
Reminds me of the apocryphal story of the game that was trying to get under
its memory budget to be able to run on one of the early consoles. After
scrimping and saving and crunching, they were still over, until one of the
senior engineers commented out an unused buffer that allocated a couple KB
"just in case".

~~~
pieterr
This one?

[https://news.ycombinator.com/item?id=776296](https://news.ycombinator.com/item?id=776296)

~~~
thrower123
That's the one, although I've seen many different variations on the theme.

------
cjfd
How much optimization you need depends on the situation. The first question is
how many concurrent users you are going to write it for. In most cases O(n^2)
is too much, unless you know that in practice n is always going to be less
than 10. Even then you should be quite certain that n is never going to become
bigger, which you very often aren't. Removing constant factors, i.e. turning
3n into n, should in most cases only be done if it is not difficult and not
bad for code quality. But at some point you should start removing these
constant factors - maybe if you need to handle more than 10000 requests a
second. These numbers are highly variable depending on the application, the
language and the use.
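
To make the O(n^2)-vs-O(n) point concrete, a small Python example (mine, not
the commenter's): both functions are correct, but the first collapses long
before the second as n grows.

    def has_duplicates_quadratic(items):
        # Compares every pair: fine for n < 10, hopeless for n in the millions.
        return any(items[i] == items[j]
                   for i in range(len(items))
                   for j in range(i + 1, len(items)))

    def has_duplicates_linear(items):
        # One pass with a set: the version worth writing up front.
        seen = set()
        for x in items:
            if x in seen:
                return True
            seen.add(x)
        return False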

~~~
wruza
Upvoted you for the first half. Hell, if an algorithm is cubic or exponential
out of pure simplicity, then as little as 3-10x the input will bring it to its
knees. It is not worth deferring that fix to a later revisit, with all the
surrounding costs, if the revisit is inevitable anyway.

Optimization is premature only if the code is not already fantastically
stupid, which may happen too often IRL to ignore.

Couldn't agree with constant factors though. These are pretty random,
n-independent (i.e. scalable), and the net expense of making code less obvious
is usually bigger than that of throwing more/better hardware at it, but YMMV.

~~~
samatman
This falls apart when you consider one common scenario:

We are loading a number of data files and the source of truth is remote. This
is O(n) in the amount of data.

There are two ways to do it: always request the remote data, or cache it and
only ask the remote for changes.

These differ by a constant factor, and it's always a good idea to maintain a
local cache when it's possible. Otherwise there's a very good chance that
startup time will be dominated by network requests.
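
One possible shape of the cache-and-ask-for-changes approach, sketched in
Python; it assumes the remote is an HTTP server that supports ETags, and the
URL and file names are placeholders:

    import json, os
    import requests

    CACHE, ETAG = "data.json", "data.etag"

    def load_data(url):
        headers = {}
        if os.path.exists(ETAG):
            headers["If-None-Match"] = open(ETAG).read()
        resp = requests.get(url, headers=headers)
        if resp.status_code == 304:
            # Unchanged since last run: skip the transfer, read the cache.
            with open(CACHE) as f:
                return json.load(f)
        with open(CACHE, "w") as f:
            f.write(resp.text)
        if "ETag" in resp.headers:
            with open(ETAG, "w") as f:
                f.write(resp.headers["ETag"])
        return resp.json()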

~~~
wruza
Yeah, it depends on how big that const is.

But this also falls into the "fantastically stupid" category. Just like all
the web 2.0 e-stores I have to use, which re-request their dataset every time
you touch a sort or filter control - when their largest category is 150 KB of
JSON and the entire site's JSON is 10x smaller than their UI/ad frameworks.

------
ncmncm
This article is more current than when it was written.

We are deep into the Post-Moore's Law era, where a new generation gives, now,
120% rather than 200% of the previous generation's performance. Next
generation we might get 115% of this, or 110%.

Another consequence is that the improvements we get have become dodgier and
less reliable, so that tiny, irrelevant-looking changes in the code may mean a
2x speedup, but more commonly a 2x slowdown. (This is in no way an
exaggeration.) The bargain we get from pervasive penetration of more kinds of
caches is that sometimes our programs are faster, but we no longer know how
fast they should be, or whether another factor of two or ten has been left on
the table.

Sorting algorithms are quite mature now, so that a 20% improvement in a
relevant case is important, yet a factor of 2 or more may come from the
compiler choosing one instruction over another.

Compiler regression bug reports now routinely complain of a 2x performance
loss from such an instruction choice, but fixing them would produce 2x losses
in some other set of programs instead. We get new unportable compiler
intrinsics to patch the failure, which often don't, for obscure reasons.

~~~
BubRoss
Moore's law was always about transistor density increases, not performance,
and definitely not about serial performance in a single core, no matter how
much people want to reframe it. Transistor density is still improving, just
not as quickly, and CPU speed is still increasing, just not on a single core.

~~~
ncmncm
Not so. Earlier generations came with higher, often doubled clock speeds. Many
programmers today have never seen such a doubling, yet policies still assume
them.

~~~
BubRoss
Clock speeds have nothing to do with Moore's law, which was about transistor
density. I'm not sure how you can say that a doubling in density hasn't
happened when there are "7nm" 64-core CPUs out there. Transistor density has
slowed but not stopped. Moore also predicted an exponential increase in the
price of density improvements, which also seems to have happened.

~~~
ncmncm
Previously, die shrinks enabled faster clocking. It is a shame you missed
those days, but to deny they happened is just sad.

We did not go from 1MHz 6502s to 4GHz Pentiums without a generous series of
doublings. Twelve, in fact.

~~~
BubRoss
I get that you are being intentionally condescending and obtuse here, but you
were talking about "the end of Moore's law", which only speaks to transistor
density increases and cost, and both of those are still improving. Clock
speeds, instructions per clock, cache, latency, prefetching, out-of-order
execution and many other aspects of CPU performance were not the trend Gordon
Moore outlined. You are conflating things that stem from transistor density
with Moore's Law.

~~~
ncmncm
While Moore's original, concise expression was in terms of transistor areal
density, the faster clock rate was implied and expected, just as lower latency
is implied by the faster clock. Thus, the stagnation of clock rates is rightly
recognized as the beginning of its end. Allocation of the newly available
transistors to caches, functional units, execution units, and ultimately extra
cores was also implied: _the transistors are not decorative_: they are there
to be used.

With feature sizes approaching a single lattice unit cell, its final stage
will be reached shortly.

To insist that transistor count is the only point of Moore's Law is to be
deliberately obtuse: it was exactly the extra value provided by the extra
transistors and the machinery built of them, and the faster clocks, that would
(and did) generate the capital investment needed to develop each succeeding,
more expensive, generation.

~~~
BubRoss
This is all rationalizing whatever you were trying to say about performance
stalling. People only talk about 'the intent' when they realize they have been
regurgitating news headlines and haven't looked any deeper. The point is that
performance hasn't stalled at all just because clock rates aren't going up.
Faster clocks are diminishing returns in performance due to memory latency.
Transistor density was the point and performance is still increasing due to
transistor density, even if you don't know how to use multiple cores.

~~~
ncmncm
It is an objective fact that performance increases are much, much smaller than
20 years ago. It is easy to see why, and why continuing (for now) shrinkage of
transistors is failing to deliver as much as before.

You are welcome to die on your "Moore's Law is about feature size and nothing
else" hill, but you will have sparse company there.

~~~
BubRoss
"Moore's law is the observation that the number of transistors in a dense
integrated circuit doubles about every two years. "

[https://en.m.wikipedia.org/wiki/Moore%27s_law](https://en.m.wikipedia.org/wiki/Moore%27s_law)

It doesn't matter how forcefully you repeat yourself without backing it up.

Also, performance has increased with transistor density, through more cores.

~~~
ncmncm
I see that you are convinced that Gordon Moore was an idiot who cared only for
counting transistors, and had no interest in how they are used.

I also see that you have confused "much smaller increase", which is observed
by everyone, with "no increase", which literally no one has said.

~~~
BubRoss
I quoted Moore's Law and linked you the actual history, you have some reading
to do.

~~~
ncmncm
I read it in the '80s. And understood it. You could, too, with some thought.
It's not too late.

~~~
BubRoss
Don't you think maybe you should take a step back when you repeat the same
things over and over, never back them up and have multiple people link you
different Wikipedia articles to correct you?

~~~
ncmncm
Now you are multiple people?

Anybody can read Wikipedia, but evidently not everybody can understand what it
says.

~~~
BubRoss
[https://news.ycombinator.com/item?id=21926724](https://news.ycombinator.com/item?id=21926724)

[https://en.m.wikipedia.org/wiki/Moore%27s_law](https://en.m.wikipedia.org/wiki/Moore%27s_law)

[https://en.m.wikipedia.org/wiki/Dennard_scaling](https://en.m.wikipedia.org/wiki/Dennard_scaling)

[https://en.m.wikipedia.org/wiki/Cognitive_dissonance](https://en.m.wikipedia.org/wiki/Cognitive_dissonance)

[https://en.m.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effec...](https://en.m.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect)

[https://en.m.wikipedia.org/wiki/Wishful_thinking](https://en.m.wikipedia.org/wiki/Wishful_thinking)

~~~
ncmncm
And anybody can link to Wikipedia, but very evidently need not understand what
it says.

~~~
BubRoss
[https://en.m.wikipedia.org/wiki/Proof_by_assertion](https://en.m.wikipedia.org/wiki/Proof_by_assertion)

~~~
ncmncm
Proof by Wikipedia link does not even make the list of fallacies.

2015, Gordon Moore: _"I see Moore's law dying here in the next decade or
so."_
[http://spectrum.ieee.org/computing/hardware/gordon-moore-the-man-whose-name-means-progress](http://spectrum.ieee.org/computing/hardware/gordon-moore-the-man-whose-name-means-progress)

Brian Krzanich, the former CEO of Intel: _"Our cadence today is closer to two
and a half years than two."_
[https://blogs.wsj.com/digits/2015/07/16/intel-rechisels-the-tablet-on-moores-law/](https://blogs.wsj.com/digits/2015/07/16/intel-rechisels-the-tablet-on-moores-law/)

John L. Hennessy; David A. Patterson (June 4, 2018): _"The ending of Dennard
Scaling and Moore’s Law also slowed this path; single core performance
improved only 3% last year!"_
[https://iscaconf.org/isca2018/turing_lecture.html](https://iscaconf.org/isca2018/turing_lecture.html)

~~~
BubRoss
You didn't confront anything I linked. Again, Moore's law is about transistor
density, which hasn't stopped yet and has gone into more cores. It was never
about single-core performance. Now it seems like you are including Dennard
scaling after someone else mentioned it.

You have a quote about the time frame increasing which has never been disputed
either.

I'm sure you can find links to some tech blogs that have the same
misunderstandings as you do, why don't you go hunt those down too?

~~~
ncmncm
I will pass your comments along to Mr. Moore.

~~~
BubRoss
I showed him already and he wondered why someone would say computers aren't
getting faster when AMD just released a 64-core desktop CPU.

------
chrisbennet
If optimization is going to be a concern, the thoughtful choice of the
algorithm(s) up front is not "premature".

Currently, I'm working on a project with a ring buffer (a kind of FIFO
buffer). It isn't fast (not lockless), but it has a well-defined interface so
I can swap in a fast one later if I need to.
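
The shape of such an interface might look like this (a sketch in Python, not
the poster's actual code; a coarse lock stands in for "not lockless"):

    import threading

    class RingBuffer:
        # Fixed-capacity FIFO. The lock makes it slow but correct; anything
        # exposing the same push/pop interface can replace it later.
        def __init__(self, capacity):
            self._buf = [None] * capacity
            self._head = 0          # index of the oldest element
            self._count = 0
            self._lock = threading.Lock()

        def push(self, item):
            with self._lock:
                if self._count == len(self._buf):
                    raise OverflowError("buffer full")
                tail = (self._head + self._count) % len(self._buf)
                self._buf[tail] = item
                self._count += 1

        def pop(self):
            with self._lock:
                if self._count == 0:
                    raise IndexError("buffer empty")
                item = self._buf[self._head]
                self._head = (self._head + 1) % len(self._buf)
                self._count -= 1
                return item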

------
loopz
Wikipedia has the quotes:
[https://en.wikipedia.org/wiki/Program_optimization#When_to_o...](https://en.wikipedia.org/wiki/Program_optimization#When_to_optimize)

We shouldn't reinvent the phrase. Premature optimization has always been about
programming efficiently, i.e. a newbie programmer or even an expert "elite"
programmer could easily fall into the trap of optimizing prematurely. That
means: spending too much time on very little performance gain, thinking it's
important to eke out every little drop of performance. It's a programming
lesson only. Experience even shows that premature, unproven tweaks can both
reduce performance and make code harder to read and maintain.

Then there's the business lesson: since for the last few decades we've had
Moore's Law and then some, it's been very hard to make programming for
performance pay off. So businesses have learnt to focus on reducing
programming time. Coincidentally, it's been shown that quick lead times are
beneficial in the competitive marketplace as well. This is a business lesson
(not about the programming phrase "premature optimization"!).

We're seeing a reintroduction of optimization and performance, because we are
hitting a ceiling in CPU cycles and need to utilize more cores efficiently. So
we have Golang, Rust, WebAssembly, etc. Having a snappy website and not
annoying your users will pay off as well (hello there, multiple-choice
agreement notices!).

But the programming lesson still stands: for many types of workloads, it
doesn't make sense to spend too much time optimizing, _performance-wise_,
unless you see a proven benefit outweighing the costs. In most cases, it's
actually better to optimize afterwards (i.e. prototyping and iterating),
unless one has a specific algorithm or solution in mind beforehand.

The business lesson is similar, with the added twist of being a criterion for
whether you make it or break it. For programming, you could just do it for fun
or as a hobby, and can prematurely optimize to your heart's content if you
like. But it'll be harder _now_ to fall into that trap instead of making
something more worthwhile! ;-)

------
mattxxx
I've never heard anyone actually say that optimization is evil... probably
because people actually seem to enjoy optimizing code.

What's "evil" is optimizing without benchmarks/profiles/etc. Naive
implementations are usually easier to write, and can give you a baseline for
how much there is to gain (as well as providing "correct examples" for more
complicated algorithms).

------
tylerjwilk00
For some parts of a system, if you write them correctly the first time, there
will be no need for optimization, premature or otherwise.

If you know you will need to cache something just cache it, it's not
premature, it's best practices.

Exercise the Pareto principle during development, and the areas that truly
need optimizing will present themselves in time.

------
honkycat
My team at work is extremely guilty of this, and half the time they aren't
even correct.

I have coined the term "voodoo optimization" in response to a point Martin
Fowler made in his book "Refactoring 2nd Edition": When running Javascript,
for example, you are working with a compiler and engine that has had thousands
of man-hours and millions of dollars invested into it.

The compiler is throwing away variable declarations. It's merging for-loops.
What you put in is not what comes out. You need concrete numbers and FACTS to
optimize your code. Deciding "this looks slow" is not an effective
optimization method.
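
Getting those concrete numbers can be as cheap as a quick measurement. The
comment is about JavaScript, but the same discipline in Python might look like
this (a generic sketch, not tied to any real codebase):

    import timeit

    setup = "data = list(range(10_000))"
    candidates = {
        "generator": "sum(x * x for x in data)",
        "list comp": "sum([x * x for x in data])",
    }

    # Measure instead of eyeballing: the "obviously slower" variant is
    # sometimes the faster one once the engine has had its say.
    for label, stmt in candidates.items():
        t = timeit.timeit(stmt, setup=setup, number=1_000)
        print(f"{label}: {t:.3f}s for 1000 runs")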

------
kerkeslager
This article is terrible, just full of straw-man arguments and holier-than-
thou attitude. Meanwhile, the article completely fails to present any data,
completely fails to identify why organizations fail to optimize, and
completely fails to provide any solutions. The article basically just pretends
there's some large anti-optimization movement so it can knock it down. And
even while proving the obvious (that optimization is important), the article
uses appeals to authority rather than actual data. Ugh.

> Today, it is not at all uncommon for software engineers to extend this maxim
> to "you should never optimize your code!" Funny, you don't hear too many
> computer application users making such statements.

Funny, you don't hear too many computer application users asking for
optimization, either. The rare, high-profile cases where people complain that
something is slow get a lot of attention, but the majority of the time, the
feedback you get from users is a resounding silence. I _wish_ users would give
me unsolicited actionable feedback on my software, but the reality is, it
takes effort to get feedback, and usually people don't complain about
performance, they just have a general feeling of malaise about the software
that doesn't rise to the level of an explicit complaint. In 11 years of
software development, I've had only a handful of users ever complain about
performance. The times I've optimized are almost always a result of
performance logging. Performance logging lets me say to stakeholders, "This DB
query is taking a full second, and 30% of users are exiting the application on
that screen--can I spend time to optimize this?"
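
That kind of performance logging can be a thin wrapper; a sketch in Python,
where run_query is a hypothetical stand-in for the real DB call:

    import logging, time

    log = logging.getLogger("perf")

    def logged_query(sql, run_query):
        # Time the query and record the slow ones, so "can I optimize this?"
        # comes with numbers attached instead of a hunch.
        start = time.perf_counter()
        result = run_query(sql)
        elapsed = time.perf_counter() - start
        if elapsed > 1.0:
            log.warning("slow query (%.2fs): %s", elapsed, sql)
        return result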

Funny, you don't _actually_ hear too many software engineers _actually_ saying
"you should never optimize your code", either.

I stopped reading at the "observations" section, which were just too full of
cringe.

I skimmed the rest, though, and noticed that the author suggests a few books
on assembly language. Good call, guy, I'll be sure to tie my application to a
specific processor architecture so I can make it _extra_ difficult to
understand why memory is getting thrashed or thread switches are happening at
such inopportune times.

------
BubRoss
I think a lot of programmers still don't realize that this idiom comes from
doing micro optimizations early.

Software does need to be architected for performance up front. If you aren't
able to work on data with a lot of locality, or if there is latency between
fundamental operations, you won't get fast software until you address those
issues.

If you are really starting from scratch you will barely understand the
problem, but once you do, you can make sure your architecture will align with
what you are doing. After that, optimization becomes much lower hanging fruit.

------
JoeAltmaier
On a modern full-sized laptop etc., sure, code efficiency doesn't matter much.
But on anything smaller, e.g. an embedded device, radio card, or cheap phone,
it means the world - the difference between a product and a failure.

Embedded programmers can probably be identified by their insistence on code
efficiency. Speed, space, storage, all critical to them.

~~~
FpUser
"On a modern full-sized laptop etc, sure code efficiency doesn't matter much"

Well, it really mattered to me when I lost my patience and replaced a piece of
code written in Python that was doing some data manipulation/processing with a
native one. Suddenly what used to take an hour got down to less than one
minute.

~~~
JoeAltmaier
Yeah, big data can always beat the CPU. For apps, not so important I guess.

~~~
FpUser
I have made quite a few enterprise-grade desktop apps. They'd normally involve
device control, real-time data processing, graphics, multimedia, etc., all
running in many threads communicating through my own internal
publish-subscribe and in-memory EAV database. For each one, performance
mattered a great deal.

Yes, I understand it may not matter much for many database front-end apps, but
the world does not end with those.

~~~
JoeAltmaier
Well, you're a saint. The average desktop app repaints with stuttering, hangs
for seconds, and has all-modal controls. So few care about these things any
more.

Is "performance" the right word for this? It's more like user experience or
latency - the controls/graphics should run on a different thread than the data
processing.

~~~
FpUser
_" Is 'performance' the right word for this? Its more like user experience or
latency..."_

It is both. All my desktop apps are strictly native and written with care.
Many use DirectX graphics for output. Also I've never bought into this
Microsoft's move to .NET on desktop propaganda. Electron etc. - would not
touch such frameworks with the wooden pole.

------
dr_j_
Brilliant article. I am currently implementing a programming language and have
had to think very hard about representation (in particular). I would highly
recommend this for any serious programmer wanting to learn a bit more about
performance.

------
valand
This is a reup of
[https://news.ycombinator.com/item?id=13505721](https://news.ycombinator.com/item?id=13505721)

I think this is important to share once in a while.

------
Nursie
Premature optimisation, to me, is spending days working out a new algorithm
for a 5% saving on something that looks inefficient to you, when you haven't
actually any idea whether it's really a problem.

I've never met any of these engineers who think optimization is always wrong.

~~~
mehrdadn
To you, premature optimization means spending days eking out 5%, but that's
not how everyone sees it. Lots of people I've seen would also call, e.g.,
spending 1 hour instead of 30 minutes on a solution that would be (say) 3x or
more faster "premature optimization", even when the difference lies in
fundamental design decisions (not micro-optimization).

~~~
rightbyte
Dogmas are a poison to programming.

"No gotos." "You ain't gonna need it." "No benefit for the customer."

And, of course, the topic of this link.

~~~
ahartmetz
Yes. Experts (and salespeople, when asked for a price) know that the answer to
almost everything is "it depends" ;)

