

Stop Misquoting Donald Knuth - franzb
http://www.joshbarczak.com/blog/?m=201501

======
davidtgoldblatt
The longer I work on performance teams, the more I agree with the "performance
always matters" point of view[1]. In large binaries (or at least, in the large
binaries I've worked on), we don't see a few tight loops taking the bulk of
the time. Instead, there's a death by a thousand cuts in which many small
inefficiencies add up, and have to be clawed back slowly and painfully, often
by people with less understanding of the semantics of the code in question
than the original authors. Most performance work I see isn't stuff like "write
the tight loop in assembly; save 30% on execution time", it's stuff like
"reuse the locale object to avoid construction penalties; save 0.4%".
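The locale fix is a C++ example, but the pattern is general: hoist an
expensive-to-construct object out of the hot path and reuse it. A minimal
Python sketch, with a hypothetical `Formatter` class standing in for the
locale object:

```python
# Hypothetical: Formatter is expensive to construct (like a C++ locale).
class Formatter:
    def __init__(self):
        # Pricey one-time setup.
        self.table = {i: f"{i:,}" for i in range(10_000)}

    def fmt(self, n):
        return self.table.get(n, f"{n:,}")


def render_slow(values):
    # Constructs a fresh Formatter for every element.
    return [Formatter().fmt(v) for v in values]


_SHARED = Formatter()                 # construct once, reuse everywhere


def render_fast(values):
    return [_SHARED.fmt(v) for v in values]
```

Both produce identical output; only the construction cost differs, and that
difference is exactly the kind of 0.4% win described above.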

For reasons like this I'm skeptical of e.g. python advocates who say the speed
difference doesn't matter since you can always rewrite "the hot code" in C.
That works when you're truly using python as a scripting language; the glue
that ties your matrix multiplication routines or whatever together. But when
you're going to have a large, flat, performance profile, you're better off
just writing your program in C++ (or its friends) to begin with.

[1] So I suppose, you can take this entire comment as "person in specialty
thinks everyone else should change to make his life easier". Maybe I just have
a warped view of priorities.

~~~
nostrademons
The benefit of using Python to start out with is that you can get data on
actual usage patterns _before_ committing to an overall system architecture.
If you just rewrite individual hot segments of the code without considering
the overall system, you're missing a lot of the point.

To pick on Twitter since their evolution of the product is fairly public -
when they started, the product concept was "Oh, let's post a status through an
SMS and it'll be visible on a webpage." A database-backed Rails architecture
is perfectly reasonable for this. Except that then they were like "...and you
can follow people", and then people started using it as a broadcast medium,
and they wanted to search for trending topics, and then they opened it up to
developers who all wanted an API, and then they closed it to developers, and
now it's a big brand-advertising platform designed to engage directly with
your customers.

They ended up switching to the JVM, and got a huge amount of flak in the
meantime around "Why the _hell_ would you build a messaging architecture in
Ruby on Rails?" But the point is that they _didn't_ set out to build a
messaging architecture; they set out to build a site where you could post your
status update on a website. If they had built a site to do that in J2EE, they
would've been equally fucked, probably even more so. The reason they can build
an efficient system _now_ is because they have a lot of data about exactly how
it's going to be used, which operations need to be fast, which operations will
be performed frequently, and how much data total will be flowing through the
system.

~~~
SamReidHughes
You don't need to start off with an agonizingly slow language like Python in
order to quickly and flexibly build your software.

~~~
nostrademons
What's the alternative? Go? I've used both Python and Go for prototypes and in
terms of quickly and flexibly building your software, Go still falls very
short.

(If you were about to say Common Lisp or OCaml, you might have a point, but
these often have library issues that make them much slower than a more
mainstream language for getting a prototype out.)

~~~
jonsterling
ML.

------
carsongross
_"We should forget about small efficiencies, say about 97% of the time:
premature optimization is the root of all evil."_

As I've gotten older, I've developed the following interpretation of this
statement:

Getting system architecture right is hard, but crucial. When it is right,
there is very little code to it, as little as necessary for the job. Rather
than sprinting ahead and micro-optimizing your initial architecture, you
should go slow, focus on minimal-viable units within the system and then, once
contact with the real world has occurred, step back and ask if your
architecture is going with the problem, aikido style, or fighting it, J2EE
style. Then, and only then, consider the perf fixes appropriate to whichever
situation you find yourself in.

~~~
shasta
As I've gotten older, I've read what you've written here.

------
nostrademons
The author is very welcome to start a company that builds software according
to these principles. In some domains (databases, network stacks, middleware),
he could probably make a killing. In other domains (say, social web
applications), he'd get clobbered.

I think this is the problem with appeals to professional ethics. Software is
not one profession, and not one industry. It's labeled as such because the
software industry changes much more quickly than the job categorization
industry does. Nowadays, there's a huge difference between people who write
software that helps planes stay in the air vs. those who let you search the
web vs. those who write critical storage infrastructure for other companies
vs. those who write software that lets you throw sheep at your friends vs.
those who let you order from the grocery store with your phone. The best
practices and rules of thumb for one industry don't transfer over to another
industry. And it's not fair to consumers to make them deal with blindingly
fast software that doesn't actually do what they want when they're quite
willing to put up with slow, bloated software that does.

~~~
vezzy-fnord
_And it's not fair to consumers to make them deal with blindingly fast
software that doesn't actually do what they want when they're quite willing to
put up with slow, bloated software that does._

I'd dispute the "quite" part. Begrudgingly willing, certainly. I'm not aware
of any consumers/end users who actually enjoy low performance and bloat, as
those almost inevitably hinder getting the task in question done, via wasted
time and cognitive overload respectively. Depending on the use case, some
people might prefer going through a few more hoops in a shoddier UI than
gazing for long stretches at a slicker one, if total time spent is less, or
even if merely the _perception_ of total time spent is less (e.g. when
individual context switches are shorter, even if there are more of them).

If anything, it's programmers who are able to tolerate inefficiency far more,
especially if it's of their own doing and saves them some mental resources,
with the trap of complacency with mediocrity lurking at their side.

That said, you are correct about software being hugely diverse. Software is
the mechanism, not the end goal. Even still, I believe that _no_ field
employing software should value low performance as a virtue. Not social web
applications, people actually spend lots of time on those.

~~~
nostrademons
Which is where the "Start a company to develop software along these
principles" part comes in.

In _some_ industries, saying "It's just like [popular product you use] but
much faster" is enough to make the sale. That's the value proposition behind
uTorrent, Skylight.io, Apache Spark, Akamai, Cloudflare, and others. In other
industries, people don't care. I suspect that if you created a messenger
that's "Like WhatsApp, but faster", nobody would care - and making something
like AirBnB or Thumbtack faster gives only a marginal improvement when most of
the bottleneck is waiting for a human to respond.

By focusing on what _you_ can do to make a buck and exploit your technical
advantages instead of what _other_ people should do, you end up teasing out
the parts in the industry where it matters. That benefits everybody, while a
blanket pronouncement of what everyone should do doesn't.

~~~
VLM
"Like WhatsApp, but faster"

Machine cycles are time; they are also energy.

"Like WhatsApp, but your battery lasts an extra day longer"

That would be a killer app.

It would probably be a lie unless someone knows some space alien technology I
can't even imagine, but if it were possible it would be a killer app.

------
andrepd
I couldn't agree more. It's rather alarming, really, the mindset most
professional developers have on this matter. Making the developer's life
easier is the highest objective; low-level is derided; ease of use for the dev
is readily bought with inefficiency; massive layers of abstraction and bloat
are _everywhere_; and above all, performance in the end product is sacrificed
for the convenience of the developer.

Now, I'm not saying to spend substantial amounts of time and money chasing
small micro-optimizations. I'm saying that at times it seems like everybody
simply _stopped giving a shit about performance_, and we end up with orders
of magnitude of crud in our software.

~~~
userbinator
_Making the life of the developer easier is the highest objective_

It's particularly ironic when these developers themselves use software written
by other, wasteful developers. Developers are users too. They're basically
shooting themselves in the foot indirectly, but I could see how someone might
think it's better to spend the day being paid to mostly wait for software than
to be productive...

I think it's all rather selfish.

------
mikeash
I thought it was a bit silly until I saw that he's talking about C++.

The smart-everything, RAII-everything approach that most C++ code takes these
tends to make everything a little bit slow. In other languages, there's
usually a small number of places that are slow, and everything else is
inconsequential. In C++ you end up spending a lot of time spread among a
million different bits of code twiddling smart pointers and RAIIing things
that don't really need it.

I completely lost it at this paragraph, though:

"If you tell yourself, _it’s only a malloc, it’s nothing_ , and you do this
often enough, you will end up with 25000 temporary allocs for a single
keystroke. They may only take 50ns each, but I type 529 characters per
minute."

That works out to 1.1% CPU utilization when typing at full bore. (50ns * 25000
* 529/minute = 0.011.) In the abstract, that's high. But in a practical sense,
it's completely irrelevant.
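The arithmetic can be double-checked in a few lines, using the article's own
figures:

```python
alloc_ns = 50            # cost per temporary alloc, from the article
allocs_per_key = 25_000  # temporary allocs per keystroke
chars_per_min = 529      # the article's typing speed

sec_per_key = alloc_ns * 1e-9 * allocs_per_key   # 1.25 ms per keystroke
keys_per_sec = chars_per_min / 60                # ~8.8 keystrokes/sec
cpu_fraction = sec_per_key * keys_per_sec
print(f"{cpu_fraction:.4f}")                     # 0.0110, i.e. ~1.1% of a core
```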

Shaving CPU cycles off the keystroke handler of an app that only uses 1% CPU
in the hands of a fast typist is a massive waste of the programmer's time.
This paragraph is followed by:

"When these sorts of things are pointed out, people ought to respond by fixing
the problem, so that they can deliver a better product. Instead, they
typically start arguing with you about why it’s not that big a deal."

Well yes, because it's _not_ that big of a deal. There are better places to
spend your time. A product that uses 0.01% CPU while typing at full bore is
not noticeably better than one that uses 1% CPU. You won't "deliver a better
product" by addressing this, you'll waste a bunch of time you could have spent
building a product that's actually better.

~~~
VLM
If my engineering estimates and mental math are correct (which is always
suspect), that software bug you're describing that wastes 1% CPU on my 95 watt
TDP desktop, times 5 million users working full-time jobs (some kind of
keyboard driver, I suppose), means 5 tons of CO2 emissions per year just from
one little software bug.

Obviously most keyboard drivers have a lot more than 5 million full-time
users, but as a crude engineering estimate (again, assuming I didn't screw up
the math), 1% of wasted CPU for a million users is a ton of CO2 emissions per
year.

Little things add up... This is what leads to things like banning
transformer-based wall warts: each may only waste 10 watts, but eliminate
them from the world and you can, like, close an entire coal mine due to
reduced electrical demand.

~~~
GregorStocks
A ton of carbon sounds like a whole lot, but according to the EPA
([http://www.epa.gov/cleanenergy/energy-
resources/refs.html](http://www.epa.gov/cleanenergy/energy-
resources/refs.html)), a typical coal-fueled power plant puts out 3.8 million
metric tons of carbon per year. Kinda puts it in perspective.

~~~
Retra
That's one of the major complaints about consumer-focused environmentalism.
Sure, it helps to recycle your trash. But if you don't deal with industrial
waste, you're beating a dead horse on the margins and letting all the live
ones run by on a giant racetrack.

~~~
ams6110
It's why I stopped recycling. The amount I contribute to the waste stream is
insignificant. In fact I would wager that the fuel I use to drive to the
recycling center more than offsets the environmental benefit. To say nothing
of the wasted time, the hassle, and the low-level stress caused by having bags of
trash sitting around until there's enough to haul away.

~~~
mikeash
Simpsons did it, of course:

"A half-ton of newspaper and all we get is 75 cents? That won't even cover the
gas I used to go to the store to buy the twine to tie up the bundles."

I recycle a lot of stuff, but it's convenient for me, because my trash service
picks it up weekly. I think we actually recycle more than we throw away at
this point. But without people coming by regularly to pick it up for me, it's
hard to see how it would be worthwhile.

------
jkoudys
I'm glad this article exists, mainly because calling any and all optimization
work a "micro-optimization" has become highly fashionable among the lazy. For
every 1 developer who actually understands what Knuth was saying, there are 5
more who think that concern for performance during implementation is
"premature", and will take any opportunity to criticize others, loudly, so
that they appear intelligent.

I recently had someone chastise me for micro-optimizing, before even seeing
the code, understanding the use-case, or knowing that my load-tests already
established this as a bottleneck. I'd barely said more than "I'm looking for a
more efficient implementation of this" before being shamed as a micro-
optimizer. It's out of control.

------
alecco
They keep telling you:

> "We should forget about small efficiencies, say about 97% of the time:
> premature optimization is the root of all evil."

But they always omit the following part of the quote:

> Yet we should not pass up our opportunities in that critical 3%. A good
> programmer will not be lulled into complacency by such reasoning, he will be
> wise to look carefully at the critical code, but only after that code has
> been identified.

~~~
EugeneOZ
The issue isn't just the 3%. The whole system should be designed with
"performance in mind". It's about architectural decisions, not just 3% of the
code hidden in a few fancy functions. A web app should be designed so that you
can insert a caching layer, or replace the logging tool, the error handling,
or the email sending. When everything is hardcoded into one huge monolith that
"works", it won't matter much where the slow 3% lives. Only when each piece
can be optimized to be as performant as possible will the whole app be
awesomely performant. That isn't always necessary at the beginning, but to
make it possible, each part should be replaceable with something more
performant, and the whole architecture should be designed "with performance
in mind".

------
kenjackson
The key part of the quote is the word "premature". Optimization is important,
but you need to be optimizing the code that matters.

------
amelius
> There is a whole class of developer out there who rejects any optimization
> beyond occasional attention to asymptotic complexity. They believe that
> their responsibility ends once they’ve picked the theoretical best
> algorithm, and that at this point things are as good as they’re likely to
> get.

And there's also a whole class of developers that don't seem to care about
asymptotic complexity. Take for example the React crowd: in React, in its
rawest form, every little update to the DOM will require O(N) time to update
the display, even though the update is usually fast because of the diffing
algorithm. However, the O(N) complexity puts a strong limit on the maximum
complexity of the DOM tree, and it is imprudent to just ignore that.
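As a toy illustration (nothing like React's actual implementation), a diff
over a tree of N nodes visits every node even when only one of them changed:

```python
# Toy tree diff: even a single changed leaf forces a walk over the
# whole tree, so each update costs O(N) in the number of nodes.
def diff(old, new, path=""):
    """Return patch ops between two nested-list 'trees' of equal shape."""
    ops = []
    if isinstance(old, list) and isinstance(new, list):
        for i, (o, n) in enumerate(zip(old, new)):
            ops += diff(o, n, f"{path}/{i}")   # recurse into every child
    elif old != new:
        ops.append(("replace", path, new))
    return ops


old = ["a", ["b", "c"], "d"]
new = ["a", ["b", "x"], "d"]
print(diff(old, new))   # [('replace', '/1/1', 'x')]
```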

~~~
pdpi
I strongly suspect you can't do much better than O(n) on the actual redraw of
the DOM, so having an O(n) diff acting before isn't making the asymptotic
behaviour any worse.

------
neduma
A couple of quotes come to mind related to abstraction.

    
    
        Premature optimization is like a fart: premature abstraction is like taking a dump on another developer's desk.
    

A somewhat less blunt version of that:

    
    
        Premature optimization, that's like a sneeze. Premature abstraction is like Ebola; it makes my eyes bleed.

------
Houshalter
One example is Skype on Windows. It now takes several minutes to load when it
used to take a few seconds. I also find mobile hard to use for anything,
because there are 10-second delays around every trivial task, and it adds up.

------
nailer
Knuth:

> The conventional wisdom shared by many of today’s software engineers calls
> for ignoring efficiency in the small; but I believe this is simply an
> overreaction to the abuses they see being practiced by penny-wise- and-
> pound-foolish programmers, who can’t debug or maintain their “optimized”
> programs

I think ignoring efficiency is bad, but ignoring efficiency is very different
from not prematurely optimizing. The author seems to be misquoting Knuth to a
certain extent by equating the two.

Give me the fast-enough, no-performance-issues, no-repetition, maintainable
code with those intermediate variables, please.

------
zinxq
"Premature optimization" is a statement that sets itself up for negativity. Of
course "premature" optimization is bad; so is basically anything else that
happens prematurely.

That's not to deny thoughtful optimization when logical, even before the
program runs. I've often found Knuth's quote to be interpreted as "Performance
doesn't matter - just get it to run".

Which as Knuth points out, for "one shot" programs is probably fine, but for
anything in the longer term is damaging.

------
buster
Totally agree. I always read "but memory and CPU are not a concern" and I
could cry.

------
devy
I would really love to hear Mr. Donald Knuth clarify his quote again, 40 years
later.

~~~
pakled_engineer
He did, in one of those 'All Questions Answered' talks, though I can't
remember which one he answered it in:

[https://youtu.be/CDokMxVtB3k](https://youtu.be/CDokMxVtB3k)

[https://youtu.be/xLBvCB2kr4Q](https://youtu.be/xLBvCB2kr4Q)

[https://youtu.be/pa7sEVRYV7U](https://youtu.be/pa7sEVRYV7U)

Prof. Hegarty (Stanford) explains what premature optimization is and why you
shouldn't do it, to further clear this up.
[https://youtu.be/mFhiaTW2jgg](https://youtu.be/mFhiaTW2jgg)

------
sinwave
May I please have your faster-than-n-log-n sorting algorithm?

~~~
asgard1024
Sure thing:
[http://en.wikipedia.org/wiki/Radix_sort](http://en.wikipedia.org/wiki/Radix_sort)

~~~
alecco
Radix sort is amazing, but it's kinda O(n log n), see

[https://en.wikipedia.org/wiki/Radix_sort#Efficiency](https://en.wikipedia.org/wiki/Radix_sort#Efficiency)

    
    
      > Radix sort complexity is O(wn) for n keys which are integers of
      > word size w. Sometimes w is presented as a constant, which would
      > make radix sort better (for sufficiently large n) than the best
      > comparison-based sorting algorithms, which all perform O(n log n)
      > comparisons to sort n keys. However, in general w cannot be
      > considered a constant: if all n keys are distinct, then w has to
      > be at least log n for a random-access machine to be able to store
      > them in memory, which gives at best a time complexity O(n log n).[2]
      > That would seem to make radix sort at most equally efficient as the
      > best comparison-based sorts (and worse if keys are much longer than
      > log n).
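For concreteness, a minimal LSD radix sort sketch: the outer loop makes w/r
passes over all n keys, which is where the O(wn) bound comes from.

```python
def radix_sort(keys, word_bits=32, radix_bits=8):
    """LSD radix sort for non-negative integers: w/r stable bucket passes."""
    mask = (1 << radix_bits) - 1
    for shift in range(0, word_bits, radix_bits):
        buckets = [[] for _ in range(1 << radix_bits)]
        for k in keys:
            buckets[(k >> shift) & mask].append(k)   # stable bucket pass
        keys = [k for b in buckets for k in b]       # concatenate buckets
    return keys


print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
# [2, 24, 45, 66, 75, 90, 170, 802]
```

Each pass is a linear scan over the keys, which is also why radix sort plays
so well with hardware prefetching.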

~~~
vvanders
This is one of those cases where O(...) notation belies the actual performance
on real systems.

The big advantage of Radix sort is not O(wn), it's that Radix sort linearly
accesses its values in memory. This means that you can take full advantage of
DDR read speeds, since the prefetcher will run ahead of your cache misses to
get you the data (or, if you're smart, you can prefetch yourself).

DDR fetch latencies are on the order of hundreds to thousands of cycles, and
that's where Radix's performance gain comes from.

~~~
alecco
Yeah, BigO is kind of dumb outside CS classes and interview questions. A Ludic
Fallacy! Real top sorting records are most often based on radix (GPU) and
merge (CPU+SIMD).

------
mhurron
"No." -Donald Knuth

Edit: Now, now, I'm pretty sure he has said that at least once in his life.

