
Reflections on Software Performance - cespare
https://blog.nelhage.com/post/reflections-on-performance/
======
tluyben2
> It seems increasingly common these days to not worry about performance at
> all,

You don't even have to continue there. People who should know better assume
that 'modern cloud stuff' will make this trivial: just add some auto-scaling
and it can handle anything. Until it grinds to a halt because it cannot scale
past a bottleneck (most likely the relational database), or the credit card
runs dry trying to pull in more resources beyond the ridiculous amount
already being used for the (relatively) tiny number of users.

This will only get worse as people keep using 'premature optimization'
(delivering software for launch is not premature!) and 'people are more
expensive than more servers' (no they are not, once you have actual traffic
and O(n^2)-performing crap) as excuses not to even try to understand this
anymore. Same with storage space: with NoSQL, there are terabytes of data
growing out of nowhere because 'we don't care, it works and it's fast to
market' and, again, 'programmers are more expensive than more hardware!'.
Just run a script to fire up 500 AWS instances backed by Dynamo and fall
asleep.
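
To make the O(n^2) point concrete, here's a toy sketch (hypothetical C++,
not from any particular codebase) of the kind of accidental quadratic code
that no amount of auto-scaling fixes:

    #include <cstddef>
    #include <string>
    #include <unordered_set>
    #include <vector>

    // Accidental O(n^2): for each record, scan all previous records.
    // Fine at 1,000 users; grinds to a halt at 1,000,000.
    bool hasDuplicateQuadratic(const std::vector<std::string>& records) {
        for (std::size_t i = 0; i < records.size(); ++i)
            for (std::size_t j = 0; j < i; ++j)
                if (records[i] == records[j]) return true;
        return false;
    }

    // O(n): one hash-set lookup per record. No extra servers required.
    bool hasDuplicateLinear(const std::vector<std::string>& records) {
        std::unordered_set<std::string> seen;
        for (const auto& r : records)
            if (!seen.insert(r).second) return true;
        return false;
    }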

I am not so worried about premature optimization; I am more worried about
never optimizing at all. And beyond that, I'm really worried about my (mostly
younger) colleagues simply not caring because they believe it's a waste of
time.

~~~
vlovich123
There's also something to be said for building better tooling in this area.
Not everyone can achieve expertise in everything. Better tooling helps level
the playing field (& eventually outperform experts when the tooling becomes
indispensable).

You may think that's a cop-out, but consider something like coz[1]. SQLite is
managed and maintained by experts, with significant capital behind the
engineering effort invested in it. Yet better tooling still managed to locate
a 25% performance improvement in SQLite[2], and even 9% in memcached. Even
experts have their limits. Of course, these tools require expertise
themselves, so something like coz is still an expert-only tool. The
underlying concept will evolve toward mass adoption once it's possible to
translate "expert speak" into something that can be easily and simply
communicated to people who aren't CPU or compiler experts, meeting users at
their knowledge level so they can dig in as deep as they need or want to.

[1] [https://github.com/plasma-umass/coz](https://github.com/plasma-umass/coz)
[2] [https://arxiv.org/abs/1608.03676](https://arxiv.org/abs/1608.03676)
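
For the curious, a minimal sketch of what using coz looks like (my own toy
C++ example, not from the paper; file name and build flags are illustrative,
see the coz README for exact steps):

    // Build with debug info, e.g.: g++ -g -O2 demo.cc -o demo
    // Then profile with:          coz run --- ./demo
    #include <coz.h>   // progress-point macros from plasma-umass/coz

    #include <vector>

    int main() {
        std::vector<long> out;
        for (long i = 0; i < 50'000'000; ++i) {
            out.push_back(i * i);  // the work whose throughput we care about
            COZ_PROGRESS;          // progress point: coz's "virtual speedup"
                                   // experiments report which lines, if
                                   // optimized, would raise the rate at
                                   // which this point is reached
        }
        return out.empty() ? 1 : 0;
    }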

~~~
gameswithgo
>Not everyone can achieve expertise in everything.

But if every young or beginner programmer who asked a performance question on
Reddit or Stack Overflow could get good answers, instead of lectures about how
what they are doing is "premature optimization" every single time, the world
would collect quite a bit more expertise on making things perform well.

~~~
BubRoss
I try to remind people whenever I can that Knuth was talking about noodling
with loops in minuscule ways. Between optimizing compilers, out-of-order
superscalar CPUs, and the very different performance characteristics of
modern CPUs, what he was talking about basically doesn't exist anymore.

~~~
ska
I don't think this is a helpful/accurate view. At a high level, the type of
optimization activity Knuth was talking about is alive and well, although the
details of what people spend that time on have sometimes shifted.

I agree this quote is often abused, but the fundamental idea behind it is
intact and important. Sure, if you don't at least think architecturally about
performance early on as your problem domain reveals itself, you can make some
poor decisions with long-reaching performance implications. But on the other
hand, if you spend a bunch of time tuning code before you know what the usage
will look like, that time can be a dead loss.

This latter point was what Knuth was referring to, and in 2020 teams are
still prematurely optimizing; I suspect about as much as they were back then.

~~~
gameswithgo
This idea is important when trying to finish an actual product of some kind,
but when kids (or adults!) are learning, let them fiddle with the loops! Let
them learn some intricacies and encourage the curiosity.

~~~
ska
Nothing wrong with fiddling with loops either, in that context.

I was objecting to the idea that premature optimization isn't a real problem
anymore because technology. That's just not true.

~~~
BubRoss
No one is saying that. Knuth was frustrated with people wasting time on
micro-optimizations, and those don't really exist in the same form anymore.
Architecture is far more important to the speed of software, and that does
need to be dealt with up front.

The problem is when people YOLO their way through a program thinking
optimization is for suckers, because of an outdated quote from a different
context.

------
branko_d
Yes, performance is a feature.

You have to plan and architect for it; you can't just tack it on after the
fact by profiling a few hot code paths (though you should do that too).

Performance can be different from "scalability" though. Sometimes, there is
tension between the two.

~~~
sokoloff
As someone who has probably wasted more time than optimal agonizing over
performance (I used to be a game dev for console and PC): what you say is
absolutely true, but I think engineers have a tendency to think about
Facebook scale before they have triple digits of users. That is usually a
mistake.

~~~
GlitchMr
A web application doesn't have to be scalable. Stack Overflow, for instance,
could run on a single web server (source:
[https://nickcraver.com/blog/2016/02/17/stack-overflow-the-architecture-2016-edition/](https://nickcraver.com/blog/2016/02/17/stack-overflow-the-architecture-2016-edition/)),
and it is a very popular website with an Alexa rank of 39.

~~~
saagarjha
Hacker News runs on a single machine, apparently.

~~~
karatestomp
Your average "web scale" cloud system with Node and lambdas and VMs and
distributed databases galore feels slow and clunky as hell before it's even
under load, to those of us who remember "bad" old LAMP stacks running on a 1U
server.

Not that I want to go back to that, exactly, but our performance expectations
have gotten really screwy.

[EDIT] or, hell, take "Web 2.0". Piles of code and frameworks and virtual
DOMs and shit, all chasing and touting "performance", while full-page-loading
low-JS sites like Craigslist and Basic HTML Gmail (or HN) leave them in the
dust. Know what those are doing? Handing HTML to the browser and letting it
render it. No JS render step, no fetching JSON then passing it through Redux
and then making twenty function calls to eventually modify a virtual DOM that
later gets applied to the real one. The browser is fast. Your JS is what's
fucking slow and eating all my memory.

~~~
adossi
I would argue JS is a lot faster than you think, and the sluggishness you
feel is due to the massive number of files being downloaded: dozens of JS
libraries (think jQuery, Bootstrap, etc.), several CSS stylesheets, and a
hundred images or more. Even if each of those files is only a few kilobytes,
there is still a 10ms (or even 100ms) download time for each of them, and
unfortunately it's very common for these files to be downloaded sequentially.
JS on its own is quite performant.

~~~
FridgeSeal
Whilst JS might be somewhat fast, you know what’s even faster?

Designing your application so that it doesn’t need it. If I never have to
download, parse and execute the JS, I’m already way ahead. With better privacy
to boot.

~~~
adossi
I understand the appeal of rendering websites on the server side, in their
entirety (HTML and CSS), before the response is returned to the user's
browser, which is what would need to happen if we all decided to stop using
JavaScript today. However, using JavaScript in the user's browser for things
like dynamic construction of the UI, animations, etc. has its benefits. For
one, it leverages the user's CPU and reduces the CPU consumption of the
server. This can save cost and, when done correctly, provides an overall
better user experience. Things like AJAX (or XMLHttpRequest) are also a
blessing and vastly improve the usability of websites.

I'm comfortably sitting on the fence: I agree JS is used too often for things
that don't need to be done on the user's machine, and anything that can
easily be done on the server side should be done there, but there are times
when it is useful. Because of that, I disagree with disabling it or not using
it entirely.

------
simonw
This piece is excellent. I really love how it challenges the "optimize last"
philosophy by pointing out that performance is integral to how a tool will be
used and designing it in as part of the architecture from the very start can
produce a fundamentally different product, even if it appears to have the same
features.

~~~
jmull
I think premature optimization remains as bad as always.

But you design for performance. The proper time to address it is at design
time. That's not premature; that's the right moment.

I wish we could reserve the word "optimization" for the kinds of things you
can do after implementation to improve performance without significantly
changing the design.

That is, let's continue to optimize last, but not try to make the word
"optimize" mean "address performance in general". That's not what the word
means, after all.

~~~
gameswithgo
"Premature optimization is bad" is a tautology.

But does it ever happen? =)

~~~
jmull
I don't think you mean tautology.

~~~
gameswithgo
I definitely do, but maybe I am wrong! If the optimization wasn't bad, it
wouldn't have been premature.

~~~
jmull
I see. Well, when I said it was as bad as always, I was referencing the
quote, "Premature optimization is the root of all evil." I take that as
hyperbole, but the point isn't merely that premature optimization is bad. (So
I think I get what you mean: "premature" includes the idea of being bad; it's
one way of being bad, so in a sense "premature optimization is bad" means "a
bad kind of optimization is bad".) The point is that premature optimization
is particularly to be avoided. Treating it as a tautology misses the main
point.

Suppose you're a snake charmer talking to an old-timer and the old-timer says,
"The venomous king cobra is venomous." That's just a tautology. But if the
old-timer says, "The venomous king cobra is the most venomous snake you'll
ever handle and a single bite can kill an elephant," then hopefully you don't
get stuck on the tautology and can see the important, actionable warning in
there.

------
bcrosby95
I've heard that performance is a feature, but I feel like that understates
the effort involved in seeking performance for a piece of software.

If you want to call it a feature, it's closer to N features: one for each
feature you have. If you have 10 features and add performance, the effort
involved isn't like having 11 features; it's like having 20. The effect is
multiplicative.

This is because performance is a cross-cutting concern. Many cross-cutting
concerns are easy to inject or share effort across, but not performance. You
can't just add an @OptimizeThis annotation to speed up your code. Performance
tuning tends to be very specific to each chunk of code.

~~~
gameswithgo
If everyone on the team makes a habit of worrying about it, everyone gets
better at it. It becomes part of the review process: "this looks correct, but
is there a faster way?" or "this looks very fast, but we could make it a LOT
simpler and only a little slower; maybe we should."

------
luord
> And while the SQLite developers were able to do this work after the fact,
> the more 1% regressions you can avoid in the first place, the easier this
> work is.

That mention of regressions seems, IMO, a slightly out-of-left-field attempt
at dismissing how the SQLite example shows that you can, in fact, "make it
fast" _later_. Maybe he should've picked a different example entirely,
because it undermines his point a little bit.[1]

All in all, his entire thesis comes from talking about a typechecker, which
is indeed a piece of software in which every component contributes to the
performance of the whole. It isn't a set of disparate moving parts (at least,
from what I remember of my time studying parsers in college), so it's very
hard to optimize by sections, because all components mostly feed off each
other. Most software is _not_ a typechecking tool; plenty (dare I say, most)
of software does have specific bottlenecks.

Though I do agree that, even if we aren't focusing on it _right away_, we
should keep performance in mind from the beginning. If nothing else, by
making the application/system as modular as possible, so as to make it easier
to replace the slowest moving parts.

[1] Which is a good thing IMO, as it highlights how this is all about trade-
offs. "Premature optimization is the root of all evil", "CPU time is always
cheaper than an engineer's time", etc. are, in fact, mostly true, at least
when talking about consumer software/SaaS: it really doesn't matter how fast
your application is, because crafting fast software is slower than crafting
slow software, and your very performant tool will be used by no one because
everyone is already using the slower tool that came out first.

------
magicalhippo
> What is perhaps less apparent is that having faster tools changes how users
> use a tool or perform a task.

What's important here is that, for a user, "faster" means faster with respect
to achieving the goal.

At work we've created a module where, instead of punching in line items by
hand and augmenting the data from memory or web searches, the user can paste
data from Excel (or import it from OCR), and the system remembers the
mappings used for data augmentation.

After a couple of initial runs to build up the mapping table, our users can
process thousands of lines in 10 minutes or less, a task that used to take
the better part of a day.

It's not uncommon for new customers to need some follow-up support after they
start using this module, so I often get to follow the transformation from
before to after.

They also quickly get accustomed to it. We'll hear about it quickly if those
10 minutes grow to 20 from one build to the next; not much thought is given
to how 20 minutes is still a lot faster than punching in those 8000 lines by
hand :)
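
For readers curious about the mechanism, a hypothetical sketch of the core
idea in C++ (my guess at the shape of it, not our actual code; all names and
entries are made up): the mapping table is essentially a lookup keyed on the
pasted line, so only unseen items need manual attention.

    #include <iostream>
    #include <string>
    #include <unordered_map>
    #include <vector>

    int main() {
        // Mappings "learned" from the user's earlier manual corrections.
        std::unordered_map<std::string, std::string> learned = {
            {"ACME WIDGET 200", "item 10442"},   // hypothetical entries
            {"BOLT M6X20",      "item 20817"},
        };

        // Lines pasted from Excel (normally parsed from clipboard or OCR).
        std::vector<std::string> pasted = {"ACME WIDGET 200", "GASKET 55"};

        for (const auto& line : pasted) {
            auto it = learned.find(line);
            if (it != learned.end())
                std::cout << line << " -> " << it->second << "\n";
            else  // unseen item: user maps it once, entry is saved for next time
                std::cout << line << " -> needs manual mapping\n";
        }
    }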

------
ken
> the SQLite 3.8.7 release, which was 50% faster than the previous release

Nit: the link says it’s 10% faster than the previous release. It’s 50% faster
than some arbitrary point in the past, perhaps the time when they began their
CPU-based profile optimization.

------
alexeiz
Nice and clean static layout. A rarity these days when blog post web pages
tend to be overloaded with headers, footers, and various crappy interactive
elements.

------
igouy
> I’ve really strongly come to believe that…

I’ve come to believe really strongly that…

------
PouyaL
Great stuff. We need to work on this right now, though that tends to happen
as time goes by.

