There's an interesting talk by one of bzr's (formerly Bazaar's) lead developers on why bzr failed and lost to git.[1] In his view, much of the blame rested on the decision to defer optimization until the product was more mature. But by the time it was mature, key architectural decisions were made that were hard to change. Here's a salient quote:
"Premature optimization" is a dangerous concept. It tends to imply you can
defer all consideration of performance until you have the behaviour and UI
you want.
But, it turns out if the UI, core APIs or formats are hard or impossible to
implement quickly, it is hard to unwind them.
Now I'd have a more nuanced view of what "avoid premature optimization"
means: something more like "avoid premature micro-optimization."
The key question is: how expensive will it be to come back and change to a
faster implementation later on, and how much does it save you to write a
simple or naive one now.
If there's one function or routine that is written in a straightforward way,
going back to rewrite it should cost only whatever time it takes you to
rewrite it, and then to test, debug and release that change. So it will
often make sense to defer that work until you’re sure the requirements for
the code really satisfy the user, your understanding of the problem is
correct, there are no unknown bugs or edge cases, etc.
However, there are some things that can make changing the code much harder
and we ran into several of them...
Having seen teams try and fail to optimize late, I fully buy the idea that it is never too early to optimize the architecture.
If you reach the point where you can't optimise because the fundamentals are too baked in to realistically change - cut your losses that day and start pasting the usable parts into a new project.
It's important to differentiate between architectural optimisation (which should be optimised from the outset) and code optimisation.
I do believe there is an argument for minimal code complexity (optimisation) in early phases - even if that leads to some latency - since it will assist refactoring and onboarding new developers. Later in the project, the team can revisit code that would benefit from being more complex.
> "Premature optimization" is a dangerous concept. It tends to imply you can defer all consideration of performance
At a deeper level, the problem here is cargo culting of rules or heuristics.
Another example is the proscription of "security through obscurity". It's perfectly valid as a component of a larger, more comprehensive security strategy. But you're gonna get hurt if it's the only component.
You'd be surprised (or maybe not) at the number of experts who argue against security through obscurity these days - because it's cool to do so.
One particular example is using non-standard database ports. I have to fight all the time.
"It's pointless. You make it harder for users (well, not really, the one-time setup of an application maybe), and it's not any more secure because it can be port scanned."
Personally I disagree because I think port scans are the kind of thing you should, you know, pick up through other software. Their response to this is that nobody tracks internal port scanning, and that it's always external.
I give up at that point. Because you can't complain about insider threats and then decide not to bother looking at anything internal.
Regarding your broader point, I think this is immensely important and a very powerful framing to use in many different pursuits. I'd describe it as always striving to acquire and refine the deepest cores of conceptual understanding that can be accurately and meaningfully applied in the widest set of scenarios.
And then for two equally useful mental models, prefer the more concise and accessible one. Kind of like the concept of "parsimony" as used in the physical sciences.
In software, one begins to see where "best practices" come from and how they can be valuable, but also where such heuristics break down.
Getting philosophical, I believe this is a lifelong search -- we will never reach full understanding. Perfect truth is irreducible, so the best we can hope for is a strong approximation.
Which is exactly what the quote says. If performance is going to matter, you absolutely have to get your overall architecture right so after-the-fact tuning with a profiler works.
If you get your architecture wrong, you're in for a world of hurt and might have to start over.
I'm not sure. I have vague recollection of pushback against git on Windows in the early days due to poor performance on that platform, which seems to have evaporated since.
Arguably, that's what happened with Python. The single-minded focus on programmer convenience for its first 20 years of existence made it a very difficult language to compile.
Python2 now compiles to Go.[0] It is quite the development and I consider it the future of Python. It's pretty great stuff, definitely recommend checking it out if you haven't already.
"Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified."
Above the famous quote:
"The conventional wisdom shared by many of today's software engineers calls for ignoring efficiency in the small; but I believe this is simply an overreaction to the abuses they see being practiced by penny- wise-and-pound-foolish programmers, who can't debug or maintain their "optimized" programs. In established engineering disciplines a 12% improvement, easily obtained, is never considered marginal; and I believe the same viewpoint should prevail in software engineering. Of course I wouldn't bother making such optimizations on a one-shot job, but when it's a question of preparing quality programs, I don't want to restrict myself to tools that deny me such efficiencies."
All this from "Structured Programming with Goto Statements"[1], which is an advocacy piece for optimization. And as I've written before, we as an industry typically squander many orders of magnitude of performance. An iPhone has significantly more CPU horsepower than a Cray 1 supercomputer, yet we actually think it's OK that programs have problems when their data-sets increase to over a hundred small entries (notes/tasks/etc.).
Anyway, I write a lot more about this in my upcoming book: "iOS and macOS Performance Tuning: Cocoa, Cocoa Touch, Objective-C, and Swift"[2]
In some Sharepoint 2010 deployments I worked on, it was possible to create Sharepoint workflows that would bog down, fail to process new entries, etc. once the list of items grew beyond 100-150 entries.
Admittedly, this was probably related to misconfiguration and database issues (i.e. having zero oversight or administrative maintenance of the underlying MS SQL Server). That specific local minimum might not apply to the context of the article (optimization in code and systems design).
I've seen Hive take minutes (OMG!) to count a table with 5 rows ... but (other) people still think it's OK, because it scales well. Its latency sucks for small data sets, but it can handle very large data sets.
It's true, the startup costs of a MapReduce job are immense. I'm surprised by minutes, but I'm not sure this counts, since there are and always will be different solutions and different tradeoffs for problems of different orders of magnitude. Any solution built for massive scale is often considered cumbersome for a small-scale problem.
For instance, I find test cases that spin up in-memory, in-process Spark extremely slow, but the spin up is quite fast overall in the context of a job that processes gigabytes of data per task.
It's interesting how much that reads like the exegesis of scripture, interpreting and reinterpreting quotes handed down by the prophets of some bygone age as if they're revealed wisdom.
I don't think it required all the verbiage about what Knuth said or what he really meant. Make the argument you want to make. Whether it's compatible with Knuth or not is rather irrelevant to whether it's correct.
From what I've seen, the problem of inefficiency most often manifests when the entire system of software is taken as a whole.
On a developer's machine with local SSD, a small database and a really small working set, even large inefficiencies can become invisible because the computer is just that fast. However, add in some network latency between the application and the database, some unpredictability about the timing of requests, and a much larger data set so that you won't hit data in memory 100% of the time, and suddenly code that was working fine grinds to a halt as it crosses a threshold where it can no longer process requests faster than they come in.
Unfortunately, with modern libraries helping "abstract away" underlying architecture like databases and the network, code like this is very easy to write without even realising that there might be a problem until it hits you in production.
>Unfortunately, with modern libraries helping "abstract away" underlying architecture like databases and the network, code like this is very easy to write without even realising that there might be a problem until it hits you in production.
I think that modern library abstractions are the tip of the iceberg. There is an entire copy-and-paste class of developer who doesn't understand what's going on under the hood with most anything they're doing. We're operating on a stack of abstractions a dozen layers deep, and people rarely take the time to look more than one layer down.
These folks are competent developers who can get things done. Even big things. But they are either incapable of envisioning how the entire system will interact, or they lack the appropriate functional details of how the abstractions work "behind the curtain" to be able to form an accurate mental model. As a result, they just wire things up without understanding how everything will interact.
The Mythical Man Month talked about the "surgeon" role on a team. The programmer that's now known as the "10x" developer. That's what's needed for any project that will eventually need to scale, or you're almost certainly going to have major inefficiencies that make everyone's lives difficult for years. I'm actually considering marketing my consulting services as being that person for startups -- the architect you really need to design a solid foundation, so that you don't accidentally create a house of cards that will fall over the first time you hit the front page of HN.
Not to say that anyone can foresee all interactions in a complex system. You still have to profile and test carefully. But I do consulting and have seen systems that had, to me, obvious bottlenecks that were poorly handled. One recommendation I made brought hosting costs for a small company down from $10k/month (and spiraling up out of control) to less than $500/month. And their product wasn't even that popular; it was not nearly at a Reddit, HN, or Quora level of usage. They peaked at about 2000 requests per second. I was handling more than that on a $10/month VM for my own backend (different load, and different requirements, but theirs could have been handled by a pair of app servers and a pair of database servers, two of each entirely for redundancy).
I wish more software provided appropriate metrics for performance analysis in live systems. I've seen people blaming the CPU or the disk or the database or the network for issues, but when you actually look at the graphs, the CPU is mostly idle, IO is almost nonexistent and the database sees a few dozen selects per second at worst, but somehow the problem is not in the software... And it's really difficult to get people to believe otherwise when all you have is data from around the software itself, not from inside it.
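To make that concrete, here's a minimal sketch (assuming a Go HTTP service) of exposing in-app counters with the standard library's expvar package; the counter names and the 100 ms threshold are made up for illustration. With something like this, "is the problem inside the software?" becomes a question you can answer from /debug/vars instead of arguing from the CPU and IO graphs around the process.

```go
package main

import (
	"expvar"
	"net/http"
	"time"
)

// Hypothetical counters; expvar publishes them as JSON at /debug/vars
// on the default mux, so you can see what the software itself is doing.
var (
	requestCount = expvar.NewInt("requests_total")
	slowRequests = expvar.NewInt("requests_slow")
)

func handler(w http.ResponseWriter, r *http.Request) {
	start := time.Now()
	requestCount.Add(1)

	// ... real work would go here ...
	w.Write([]byte("ok"))

	// Arbitrary threshold for illustration only.
	if time.Since(start) > 100*time.Millisecond {
		slowRequests.Add(1)
	}
}

func main() {
	http.HandleFunc("/", handler)
	http.ListenAndServe(":8080", nil) // /debug/vars is served by the default mux
}
```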
I never see the network blamed; it's usually ignored. I've seen way too many instances of slow database queries being replaced by slow cache reads when the problem wasn't slow queries but thousands of fast ones.
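A hedged illustration of that failure mode, as a Go sketch against a hypothetical orders/users schema (the driver choice and connection string are placeholders): each individual query is fast, but the loop multiplies the network round trips.

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq" // hypothetical choice of Postgres driver
)

// Hypothetical schema: orders(id, user_id) and users(id, name).
func loadOrderOwnersNPlusOne(db *sql.DB) error {
	// One query for the orders...
	rows, err := db.Query(`SELECT id, user_id FROM orders`)
	if err != nil {
		return err
	}
	defer rows.Close()

	for rows.Next() {
		var id, userID int
		if err := rows.Scan(&id, &userID); err != nil {
			return err
		}
		// ...then one more query per order. Each one is fast, but with a
		// few thousand orders and ~1 ms of network latency per round trip,
		// this loop costs seconds in production while looking instant on a
		// laptop talking to a local database.
		var name string
		if err := db.QueryRow(`SELECT name FROM users WHERE id = $1`, userID).Scan(&name); err != nil {
			return err
		}
	}
	return rows.Err()
}

// The same data in one round trip: the join the local dev setup quietly
// lets you avoid writing.
func loadOrderOwnersJoined(db *sql.DB) (*sql.Rows, error) {
	return db.Query(`
		SELECT o.id, u.name
		FROM orders o
		JOIN users u ON u.id = o.user_id`)
}

func main() {
	db, err := sql.Open("postgres", "postgres://localhost/app?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	_ = loadOrderOwnersNPlusOne(db)
}
```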
This is actually my biggest complaint about the commonly cited benchmarks at TechEmpower. [1]
Their "multiple queries" and "data updates" benchmarks seem to be more of a test of the database and the specific database bindings than anything like production differences in database access. If you look at the data, the database is clearly being pushed to its limits, and is the constraining factor for all of those tests.
It's fine to test the bindings for speed, but they should be clear that's what they're doing. Instead they claim to be testing the speed of the frameworks ("Web Framework Benchmarks" is the topmost title on the page). I think the binding speed is rarely the constraining factor in production, and that if you had a properly scaled database backend you'd find the same languages on top of that comparison that you find on top of the "PlainText" and "Single Query" benchmarks.
In a production environment you typically have a database that is running on different hardware, sometimes in shards, and that is scaled to handle your access patterns. In that environment all queries have higher latency than a database on localhost, and you can push a single application server (where the "framework" runs) a lot farther when you're using, say, Go or Node.js, which can more efficiently pipeline requests, than you can when you're using a thread-per-connection language like Ruby or PHP.
Finally, the TechEmpower "Fortunes" benchmark is nearly 100% a benchmark of the default templating approach in each framework; Go has template libraries that are literally 10x faster than the one in the standard library and could be among the top performers on that page as well.
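As a rough sketch of what that claim reduces to, here's how you might isolate the template-rendering cost with Go's standard html/template and a testing benchmark; the data and template shape are invented to loosely mirror a Fortunes-style page, and the faster third-party libraries mentioned above aren't shown here.

```go
package fortunes_test

import (
	"html/template"
	"io"
	"testing"
)

// Roughly the shape of a "Fortunes" page: a list of rows rendered into
// an HTML table. The data is made up.
type Fortune struct {
	ID      int
	Message string
}

var tmpl = template.Must(template.New("fortunes").Parse(`
<table>
{{range .}}<tr><td>{{.ID}}</td><td>{{.Message}}</td></tr>
{{end}}</table>`))

var rows = []Fortune{
	{1, "fortune favours the bold"},
	{2, "<script>alert('escaping matters')</script>"},
	{3, "premature optimization..."},
}

// BenchmarkStdlibTemplate measures only the rendering step. If this
// dominates the request time, the "framework" benchmark is really a
// template-engine benchmark.
func BenchmarkStdlibTemplate(b *testing.B) {
	for i := 0; i < b.N; i++ {
		if err := tmpl.Execute(io.Discard, rows); err != nil {
			b.Fatal(err)
		}
	}
}
```

Run with `go test -bench .` and compare against the same loop using whichever faster engine you care about.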
I guess the moral of the story is that I should contribute to the discussions around TechEmpower benchmarking... :)
Abstractions and libraries are surely not the problem. The higher the abstraction, the more opportunity for high-level optimizations, and those always trump low-level optimizations.
The problems are that people don't know how to optimize things or are blocked by the imposed architecture.
> Software engineers have been led to believe that their time is more valuable than CPU time; therefore, wasting CPU cycles in order to reduce development time is always a win. They've forgotten, however, that the application users' time is more valuable than their time.
I've no words to express how much I agree with this. Developers who ship bloated software that chugs painfully even on high-end computers, with the defense that "the developer's time is more valuable than the CPU's", and developers who prefer brutally slower but easier-to-use tools, need to be reminded that this is their job. Your objective shouldn't be to make your own job easier. It should be to deliver the best product you can, and the best experience to the users. Not make them suffer long loading times, slow applications, unresponsiveness, and massive size bloat so you can write your shitty app in your breezy-to-use language.
This is one of those classic statements that you only understand correctly after you have the experience it intends to impart. It teaches you nothing by itself. To really understand it, you need a clearer understanding of "premature" and "optimization". (Keeping track of the preceding text that clarifies that the quote is about small efficiencies doesn't hurt either.)
If a performance choice affects the overall architecture, a public API, the user experience, or the success of your product in the market, optimizing now is not "premature". Now is exactly the time to get it right, because if you don't, there might not be a later time to optimize. It is only premature if taking the slow path doesn't matter much today and the technical debt won't accrue quickly enough to really matter in the future.
If you can safely optimize it tomorrow, without it being significantly more costly to do so than it would be today, it's probably premature. Otherwise, it's probably not.
Likewise, "optimization" doesn't just mean any code where you think about performance at all. You should always be aware of the performance of your code, just like you're aware of its maintainability, robustness, and test coverage. Performance is a fundamental aspect of the craft.
Optimization is really about improving the performance of the code at the expense of other valuable things. Optimized code either takes significantly longer to develop (burning time you could have spent on other features or robustness), or is significantly harder to maintain (usually because the optimization relies on certain assumptions that you can no longer be flexible about) than non-optimized code.
If the faster version doesn't take much longer to develop and doesn't lead to a more calcified codebase, then writing it isn't optimization. Not doing it is just silly.
I've spent much of my career trying to optimize latency of long analytic-type queries; and I've been wrong on both sides of the issue. While it sounds stupidly tautological to say, I've come to think the important thing is being right in what I choose to optimize and when. With that bit of "wisdom" out of the way, there's a lot of ways the "premature optimization" phrase can be helpful:
* I might think I know what's important/critical-path and be wrong (it's happened)
* Users might be willing to live with slowness in some areas I didn't expect and demand performance in others
* More performant solutions might come to light later
* Particularly before product-market fit, features and their optimizations might be discarded
* New development might change the critical path
OTOH, I've found it's often beneficial to organize data and processing in a way that can take advantage of parallelism like multi-threading and vectorization. Often that's the kind of thing that's harder to do later.
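For example (a sketch with hypothetical function names): keeping the data in one flat slice and processing it in contiguous chunks makes it cheap to bolt goroutine-level parallelism on later, whereas a pointer-heavy layout chosen early tends to resist it.

```go
package main

import (
	"runtime"
	"sync"
)

// processChunk stands in for whatever per-record work the pipeline does.
func processChunk(chunk []float64) float64 {
	var sum float64
	for _, v := range chunk {
		sum += v * v
	}
	return sum
}

// sumSquares keeps the data in one flat slice and splits it into
// contiguous chunks, one per worker. Because the layout was chosen up
// front, adding parallelism is a local change rather than a rewrite.
func sumSquares(data []float64) float64 {
	workers := runtime.NumCPU()
	chunkSize := (len(data) + workers - 1) / workers

	partial := make([]float64, workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		lo := w * chunkSize
		if lo >= len(data) {
			break
		}
		hi := lo + chunkSize
		if hi > len(data) {
			hi = len(data)
		}
		wg.Add(1)
		go func(w int, chunk []float64) {
			defer wg.Done()
			partial[w] = processChunk(chunk)
		}(w, data[lo:hi])
	}
	wg.Wait()

	var total float64
	for _, p := range partial {
		total += p
	}
	return total
}

func main() {
	data := make([]float64, 1_000_000)
	for i := range data {
		data[i] = float64(i)
	}
	_ = sumSquares(data)
}
```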
What can be worse than premature optimization? Premature generalization :-)
What is so special about efficiency? Efficiency is just one of the concerns software development orgs face, together with cost, utility, usability, deployability, maintainability and so on. The tradeoff between these (and the ability to choose this tradeoff correctly early in the cycle) often defines the success of a project or an organization. Focusing on one while disregarding the others can be lethal.
And yet, the large organizations with R&D firms—making the most exciting greenfield software—almost exclusively focus their engineering efforts on squeezing the most software out of the lowest amount of hardware. They care deeply about every CPU cycle.
You could argue that the fallacy of premature optimization has felled many a line-of-business app. But if you're an engineer and you aspire to greatness, micro-optimize. That's the rarer, harder-earned, harder-to-learn skill.
1. Modern CPUs have "mini OS-es" in them; the difference between two generations of CPUs from the same manufacturer, even two models in the same generation, can mean that approach A is faster than approach B on one and slower on the other, let alone cross-platform and cross-architecture optimization.
2. Stuff you're running is usually not running in a vacuum but as part of a bigger system; up-front micro optimizing without measuring is again blind guessing, which ends up being wrong and a waste of time.
Micro optimization is a relic of the past, something that embedded/console/driver developers working on specific hardware need to care about - for the rest of us it's just a pointless guessing game. I've seen a few old C developers stuck in their ways try to do this to look smart, and when you actually measured it, they turned out to be wrong because their assumptions were outdated. Spending your time on high-level system optimization seems much more worthwhile (picking the right algorithms, exposing the right kind of interfaces for efficient consumption, etc.).
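For what it's worth, measuring a running Go service is cheap enough that there's little excuse to guess. A minimal sketch using the standard net/http/pprof package (the port and the empty service loop are placeholders for a real service):

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* on the default mux
)

func main() {
	// Expose the profiler on a local-only port so you can measure the
	// system as it actually runs, instead of guessing which code is hot.
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	// ... the real service would run here ...
	select {}
}
```

From there, `go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30` grabs a CPU profile of the live process.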
> embedded/console/driver developers working on specific hardware need to care about
In other words, things that great engineers need to care about :)
I'm not an embedded developer, but to reiterate, I'm just advocating for a focus on performance if you aspire to be a great engineer. This is different than being a great businessperson, or a great software developer kit / library designer.
>In other words, things that great engineers need to care about :)
Really? I feel like most cutting-edge engineering is done in building distributed systems, ML, high-performance computing, etc. Micro optimization is where it was last century.
>But, when you have to ship 50k IOT devices and they have to be under $20.
You're working on a couple-million-dollar project total? Meanwhile, a guy shaving a percent off performance in an FB data center saves them millions a year in power consumption alone.
But FB guys don't micro-optimize (they use PHP, FFS); they measure production systems and look for bottlenecks. Premature optimization is a gut-feeling-based guessing game, not quality engineering.
FB engineer here; worked on HHVM for several years. We absolutely do "micro-optimize", e.g., our string hashing code is written in hand-tuned assembly.
Where we apply those optimizations is, of course, driven by profiling, so those statements aren't entirely in conflict.
Well, they are in this context - the article talks about premature optimization - and the OP argues you should micro-optimize to be a good engineer.
What I'm saying is that the only way to do optimization correctly is by doing what you and everyone else does when it matters - measure before and after. The rest is just assumptions that rarely pan out the way you expect them to - at least that's my experience from when I was still doing game dev. You can't really do micro optimization up front.
And I've heard it said that lowering latency is harder than increasing throughput ... though whether it's more important probably depends on the context.
The Facebook app is a nightmare of terrible design, with something like 100 MB of generated code to do something that should take a couple of megabytes at most. And they are a large organization with R&D by any definition.
To your particular point, Facebook maintains multiple versions of its client, including a "basic mobile" version (https://mbasic.facebook.com) and a "touch" version (https://touch.facebook.com), among other idiosyncrasies between its various platforms. They put in a lot of engineering effort to make the best client for the appropriate device.
With respect to the backend of their main application, they created things like Cassandra to meet their unique database needs.
But in terms of R&D, you could argue that Oculus is their major public R&D project. That project requires lots of performance optimization. Indeed, that is their greatest obstacle.
Maybe you could call that computer graphics, but that's missing the point.
Making the sensors and signal processors work on battery-powered hardware (the end goal) is really challenging. And then consider all the backend services that will serve these peripherals.
This Premature Optimization Koan is really just a side effect of enumerative thinking. Like if you think about the problem hard enough, you'll be able to enumerate all of its aspects and select an appropriate solution.
That could mean enumerating all the possible applications of high performance software, or it could mean enumerating all of the strategic goals of a piece of business software and identifying that performance isn't one of them.
In the former case, you're going to feel like you've enumerated enough (e.g., three examples) and then come short of the actual space of high-performance problems.
In the latter case, in my personal experience developing a line of business application, it turns out that performance eventually starts to matter. It sort of becomes the hardest problem, like people have said elsewhere.
I'm sure at Facebook they share your concern. They probably work extremely hard to reduce the huge amount of code their iOS client is built from. I'm advocating that people train to improve performance, not to ignore the problem or dismiss it as "premature optimization," at least if they aspire to be engineers.
I do games for a living. I guess I assumed that people knew that games were optimized, though they usually don't fall under "enterprise."
Oculus falls under games for me, which is unusual to find in an enterprise company, and frankly probably isn't being run much like the rest of Facebook anyway, so the point is rather moot.
>That could mean enumerating all the possible applications of high performance software, or it could mean enumerating all of the strategic goals of a piece of business software and identifying that performance isn't one of them.
I'm a big fan of optimization, and yet I've done exactly the latter and come to the conclusion that a particular admin backend that would never have more than, say, a couple dozen users EVER would be fine to be written in Ruby. It was an internal tool, and the available engineers were efficient in Rails, so I gave a green light for them to write the admin tools in Rails. There wasn't even a debate. I just said yes, go.
But when you're writing a service that should have a solid front-end that can scale, you need to design scalability into the architecture. It seems to rarely happen.
>With respect to the backend of their main application, they created things like Cassandra to meet their unique database needs.
Cassandra is good and well optimized. Point.
>In the former case, you're going to feel like you've enumerated enough (e.g., three examples) and then come short of the actual space of high-performance problems.
Those are the things that came to mind. No, I'm sure I didn't hit all possible high-performance issues. But that's probably the 80th percentile.
And at least 95% of "enterprise" problems are not treated with the respect they should be toward optimizing. That I have from several personal sources.
>They put in a lot of engineering effort to make the best client for the appropriate device.
They put in 100x as much engineering effort as they should have needed to create a solid mobile app, and the result is a monstrosity of garbage.
Several indie developers have produced replacement Facebook apps that are far better, faster, and would actually be more robust if Facebook weren't actively trying to prevent them from working. So you can't tell me that a hundred developers writing generated code to create two apps (the main Facebook app and Facebook Messenger; the main app has message reading DISABLED, with a nag screen to download Messenger if you want to read messages) are even remotely competent as a group when some developer in their bedroom can crank out a better app in their spare time.
I've also written an entire app from the ground up that had about 80% of the Facebook app's functionality, written against some of the worst backend APIs you can imagine (a mix of non-standardized, inconsistent XMLRPC, JSON, and other APIs) and it took me, writing code entirely by myself, about 5 weeks. To a finished, polished product. And that's at only about 30 hours a week.
>In the latter case, in my personal experience developing a line of business application, it turns out that performance eventually starts to matter. It sort of becomes the hardest problem, like people have said elsewhere.
I don't dispute that they should pay attention to performance. I just dispute that enterprise developers (excluding exceptional situations like Cassandra) actually optimize architecturally. Sometimes they can't for structural reasons, since ideal optimization needs to happen across teams and domains, which can be difficult in an environment where each team controls a domain.
Your statement is absolutely wrong, at least for Google. Google, above anything, puts readability and architectural soundness first. And that pays off, as fixing a micro performance issue in a sound architecture is always cheap; the reverse (fixing a problematic architecture while maintaining good performance) is expensive.
Is that really why programs are slow: that programmers are misinterpreting a quote from Sir Tony Hoare? I thought it was because programmers tend to use fast computers, and they neglect to test on slow ones. Also they load their databases with test data that is much smaller than what the average user might have after using the program for a year or two.
Both Knuth and Hoare would never think that carefully choosing the optimal or near-optimal architecture or algorithm is premature.
Of course, if the number of things you are processing is small, it doesn't matter what you do and you may as well do the easiest thing. Or, if the optimal thing is super hairy, are you sure it will be worth it?
They are just arguing for a bit of common sense: identify the most important performance issues, and make sure you work on those. There are all kinds of optimizations that people bring up that are way down the list in terms of impact.
Whether to optimize or not, that's still no excuse to write poor code or to put no thought into your code at all. I deal with both ends of this all the time :(