Rules of optimization (humus.name)
281 points by benaadams 9 months ago | 162 comments

> But if performance work has been neglected for most of the product development cycle, chances are that when you finally fire up the profiler it’ll be a uniformly slow mess with deep systemic issues.

Totally true, and I have observed it IRL on large, old projects whose architecture was hostile to performance. And these products were doing computationally intense stuff.

> Performance is everyone’s responsibility and it needs to be part of the process along the way.


Everyone repeats "only optimize after profiling". It's true: if your program is running too slowly, you should always profile. But that rule only applies when your code is already slow. It doesn't actually say anything about the rest of the development process.

Developers should have a solid grasp of computer hardware, compilers, interpreters, operating systems, and architecture of high performance software. Obliviousness to these is the source of performance-hostile program designs. If you know these things, you will write better performing code without consciously thinking about it.

If this knowledge is weighed more heavily in the software dev community, people will put in the effort to learn it. It's not that complicated. If the average dev replaced half of their effort learning languages, libraries, and design patterns with effort learning these fundamentals, the field would be in a much better place.

Honestly, I think you're setting too high of a bar.

If developers learn 2 basic things:

1. How SQL queries work

2. Network latency exists. Moving stuff between your database and your server can get expensive.

You'll solve _most_ of the issues that I've seen in my career to this point. There are a few big ones that came from abusing APIs too, but it's really the same root issue:

You're either thinking about how much information is being moved around or you're not.
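To make the "how much is being moved around" point concrete, here is a minimal sketch of the classic N+1 query pattern next to a single JOIN. The schema (`authors`, `posts`) and the function names are made up for illustration, and an in-memory SQLite database has no network latency, so the round-trip counter only stands in for what each extra query would cost against a remote server.

```python
import sqlite3

def build_db():
    """Create a small in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    """)
    conn.executemany("INSERT INTO authors VALUES (?, ?)",
                     [(i, f"author{i}") for i in range(1, 51)])
    conn.executemany("INSERT INTO posts VALUES (?, ?, ?)",
                     [(i, (i % 50) + 1, f"post{i}") for i in range(1, 201)])
    return conn

def titles_n_plus_one(conn):
    """One query for the posts, then one extra query per post."""
    round_trips = 1
    posts = conn.execute(
        "SELECT author_id, title FROM posts ORDER BY id").fetchall()
    result = []
    for author_id, title in posts:
        name = conn.execute(
            "SELECT name FROM authors WHERE id = ?", (author_id,)).fetchone()[0]
        round_trips += 1
        result.append((title, name))
    return result, round_trips

def titles_joined(conn):
    """Same data, fetched in a single query."""
    rows = conn.execute("""
        SELECT p.title, a.name
        FROM posts p JOIN authors a ON a.id = p.author_id
        ORDER BY p.id
    """).fetchall()
    return rows, 1
```

Both functions return the same rows; the first issues 201 queries for 200 posts, the second issues one. With even a millisecond of latency per query, that difference tends to dominate everything else.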

For CPU bound stuff, the 80%/20% rule is to learn that an array/vector is typically much faster than a linked list, and this is due to how CPU caches work. (this is thinking about how much information is being moved around, but between the processor and RAM)

Also, for parallel stuff: locks tend to be bottlenecks. Writing parallel code is tricky overall. Stick to libraries that provide high level lockless data structures like mpsc queues if possible.

Who uses linked lists? Outside of niche algorithms that splice lots of lists together there are practically zero occasions where linked lists beat dynamic arrays.

I think it's a cultural thing rather than a technical one. I haven't used a linked list in 15+ years, because in C++ a std::vector is the 'default'. Before that, I wrote (some) C for Gnome, and the 'default' in Glib is a linked list. I don't know/remember if there is a reasonable, easy to use data structure that wraps a dynamically allocated array in Glib, but most of the 'sample' code at the time used the glib list. So that's what everybody kept on doing.

I've seen LinkedList often used as the go-to list in Java, instead of ArrayList. They're also common in C, for example, a huge number of structures in the Linux kernel are linked lists.

> for example, a huge number of structures in the Linux kernel are linked lists.

Which allocator is used for these? I'm not familiar with the Linux kernel, but since there is no malloc() in the kernel, I would guess that they allocate from an arena that's going to be page-aligned and thus exhibit the same locality characteristics as a vector.

Oh, they have malloc, it's just kalled kmalloc.

I think GArray is the wrapper for a dynamically allocated array.

Ah yes it is, thank you. I should have known about this 15 years ago :)

What about any time you're working with frequent adds/deletes in arbitrary positions, rather than just at the end of the list?

That's not a "niche algorithm" to me, but rather a reasonable use case. The add and delete operations should be faster than for the array case.

(Not that I actually use linked list much myself, but I mean I see the use of them.)

If the list fits in cache it is often faster to insert into a dynamic array and shift the elements. Of course, how expensive it is to create a new list node depends a bit on the allocator that you use.

That's a good point, thanks.

Even then, linked lists are only faster than arrays if the list has at least 100s of items.

That's pretty much a standard use case. IIRC the rule of thumb is that if the container isn't expected to hold many elements then the default choice is to use std::vector, no questions asked. Otherwise, random insertions and deletions in a sizable container require a std::list.

I use deques A LOT, e.g. interthread messaging, and decaying/windowed time series calculations.

Like, if you want a running mean of the last 250ms, you put a timestamp and value pair at the back of a deque, and on every update you check the front of the deque to see if it should be popped (i.e. the timestamp is older than 250ms) and the mean updated.

I suppose you could use a circular buffer with a vector as well, but you have to guess what your max size will ever be, or handle dynamic resizing. Maybe it would be worth it in some circumstances.
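A minimal sketch of the windowed running mean described above, using Python's `collections.deque`. The class name `WindowedMean` and the millisecond timestamps passed in by the caller are assumptions for illustration; a real implementation would also want to guard against floating-point drift in the running total.

```python
from collections import deque

class WindowedMean:
    """Running mean over the samples seen in the last `window_ms` milliseconds."""

    def __init__(self, window_ms=250):
        self.window_ms = window_ms
        self.samples = deque()  # (timestamp_ms, value), oldest at the front
        self.total = 0.0

    def update(self, timestamp_ms, value):
        # New samples go on the back.
        self.samples.append((timestamp_ms, value))
        self.total += value
        # Expired samples come off the front. The sample just added has
        # age 0, so the deque is never emptied here.
        while timestamp_ms - self.samples[0][0] > self.window_ms:
            _, old_value = self.samples.popleft()
            self.total -= old_value
        return self.total / len(self.samples)
```

Each update does a constant amount of work per sample added or expired, and no maximum size has to be guessed up front, which is the trade-off against a fixed circular buffer.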

If those lists are anywhere on your hot path, benchmarking with a deque built from two vectors, or a circular buffer as you suggested, would be worthwhile imho. The constants in linked lists are such that the O(1) operations are often slower than the O(n) operations in vectors, at least for lists that fit in cache, and depending on your allocator etc. Chasing pointers to more or less random memory locations is pretty much the worst case workload for a modern CPU.

Pretty much anyone using a functional language or immutable data structures.

The default immutable "list" types in current mainstream functional languages (e.g. Scala, Clojure) have much better performance characteristics than naive head:tail linked lists. A typical implementation is an n-ary balanced tree where n is typically on the order of 100, making operations "effectively O(1)" and also keeping cache behavior acceptable in many common cases.

Haskell still uses head:tail linked lists a lot. However, a lot of effort has been spent in optimizing away the data creation entirely. For example `sum [1..100]` will not actually allocate the list in memory.

Linked lists are very common, especially for non performance critical code. Dynamic arrays either carry a large performance cost at expansion, or are rather complicated to implement in an amortized way. A linked list in comparison is simple, and since it doesn't require a large contiguous block of memory, the memory pressure is smaller.

> Who uses linked lists?

Isn't the linked list pretty much the main collection data structure in the Linux kernel?

A dynamic array has an average (amortized) insertion time of O(1) but a worst case of O(n), which is hit whenever the next insert exceeds the current capacity and the whole array has to be copied. Furthermore, it needs a lot of RAM while copying the elements. A linked list insert is always O(1) and its RAM usage is consistent (although not better: normally it is two or three times as big).
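The amortized-versus-worst-case distinction can be checked with a small counting model. This is a sketch assuming a plain doubling strategy starting from capacity 1; real implementations (std::vector, Python lists, etc.) use their own growth factors, and the function name is made up for illustration.

```python
def copies_for_appends(n, growth=2):
    """Model n appends to a dynamic array that multiplies its capacity by
    `growth` when full. Returns (total elements copied, largest single copy)."""
    capacity, size = 1, 0
    total_copied, worst_single_copy = 0, 0
    for _ in range(n):
        if size == capacity:
            # Reallocation: every existing element is moved once.
            total_copied += size
            worst_single_copy = max(worst_single_copy, size)
            capacity *= growth
        size += 1
    return total_copied, worst_single_copy
```

For 1025 appends this model copies 2047 elements in total, so under two copies per append on average, but the final reallocation alone moves 1024 of them. That single expensive insert is the latency spike the comment is concerned about.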

A linked list is basically only better on embedded systems because you have much better knowledge about the used RAM/runtime. It’s also useful if you have a very large data structure and you need real time performance (a lag of a few hundred milliseconds is unacceptable for some applications such as games or financial applications). But whenever your insert becomes too expensive, a linked list is probably not the best choice either and you should consider why your data structure is so big.

Lastly a linked list is rather simple to implement but that’s also only really useful for embedded stuff.

> You're either thinking about how much information is being moved around or you're not.

Not to put too fine a point on it, but I think you meant how often information is being moved around. That's the scenario where latency has a disproportionate effect on performance.

(Bandwidth is usually not a problem these days unless you're doing something at the very high end.)

3. When [not] to use a database.

4. When [not] to use the network

A lot of the performance wins I have been able to pull off (1000x) were from removing unnecessary distribution and unnecessary databases.

I'm curious (maybe some lessons or good horror stories there), what are people using databases for that they shouldn't?

$current_product has a table with 3 rows that is used for config. Every other table has a foreign key to this one. It was designed as a multi-tenant application that never got over 3 tenants. Other tables are over 1 billion rows already. The 3-row table is mostly JSON columns, and is basically untouchable now.

There's more to computing than DB queries over a network.

Of course there is, but we're talking optimization here and most of the 3% is crappy DB queries.

Or just the fact there is no caching involved in a lot of cases, so you end up just banging the db so hard it just ends up with 5 minute response times.

IME when people don't know how sql queries work and how network latency works caching tends to just move the problem, not solve it. Instead of n+1 database queries you end up with n+1 cache queries. I deal with this death by a thousand cuts daily at the moment.

Every tiny bit of data was being loaded as needed, often from EAV tables. Obviously this sucked for performance, so we ended up with an elaborate memory cache that is effectively its own DBMS which uses a real database as a backing store. This didn't really solve the performance issues though, because hash maps might be fast but they are still far from free, and doing tens of thousands of lookups at a time is still quite slow.

Fast forward a decade and we have caches in front of caches in front of caches in front of our database instead of just loading coarser chunks of data at once.

The example we had to deal with in the real world was the onslaught of visitors during the American Thanksgiving holiday. The company I worked for would post the ads for Black Friday and Cyber Monday. Yeah, nothing like getting >500K RPS on a highly tuned 2 DB server setup (Main & Backup) that would still drop to its knees. The traffic we got didn't really turn into that much overall profit, which meant we couldn't have a huge cluster of DBs to handle the load. With a caching system using Memcached (120 second query cache), we were able to handle that load like it was nothing. We used caching on other parts of the system, but the DB cache is what saved us from having to leave the dinner table every 15 minutes.
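The idea behind a short-lived query cache can be sketched in a few lines. This is an in-process stand-in, not the Memcached client API: the class `TTLCache`, its `get_or_compute` method and the injectable `clock` parameter are all assumptions for illustration.

```python
import time

class TTLCache:
    """Tiny in-process cache whose entries expire after `ttl_seconds`."""

    def __init__(self, ttl_seconds=120, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so expiry can be tested
        self.entries = {}           # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        hit = self.entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]           # still fresh: skip the database
        value = compute()           # expired or missing: run the real query
        self.entries[key] = (now + self.ttl, value)
        return value
```

With a 120 second TTL, a query that is requested thousands of times per second reaches the database roughly once every two minutes per key, at the cost of serving results that may be up to two minutes stale.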

There is, but this captures a whole large class of software work - including pretty much the entirety of web development, probably most of mobile development, and also some desktop development.

We need a name for this. The old advice is to ignore performance because it’s not important. When it becomes important there is now very little you can do. You get the low hanging fruit (terrible analogy. Ask a fruit grower how stupid this is), or the tall tent poles.

Eventually you have an unending field of nearly identically tall poles, everyone claims victory (we’ve done all we can!). But your sales team is still mad because your product is several times slower than the competing products.

What do we call this? Picket fence? Blanket of doom? Death shroud? Forest for the trees? Stupidity?

> The old advice is to ignore performance because it’s not important.

The old advice is to optimize intelligently, based on actual data, not preconceived notions. If your program isn't correct, then it's hard to have valid performance data, so optimizing before your program is correct is problematic. Kent Beck has an anecdote about being called in to optimize the Chrysler C3 payroll system. He arrives, then asks if there is a valid input/output dataset. The Chrysler people tell him that the system isn't producing valid outputs yet. His reply: "Well in that case, I can make it Real Fast!"

(As it turns out, a lot of the performance problems came down to naive string concatenation!...O(n^2)...yadda yadda.)
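For anyone who hasn't run into the string concatenation trap: with immutable strings, each `acc = acc + piece` builds a new string of the combined length, so the total work grows quadratically. The sketch below just counts characters copied under that model; the function names are made up, and some runtimes can optimize repeated concatenation in certain cases, so treat it as a cost model rather than a benchmark.

```python
def chars_copied_by_repeated_concat(pieces):
    """Characters copied if every concatenation allocates a fresh string."""
    copied = 0
    length = 0
    for piece in pieces:
        length += len(piece)
        copied += length   # the new string holds everything built so far
    return copied

def chars_copied_by_join(pieces):
    """Characters copied when the result is assembled once at the end."""
    return sum(len(piece) for piece in pieces)

def build_with_join(pieces):
    """The usual linear-time idiom in Python."""
    return "".join(pieces)
```

For 1000 pieces of 10 characters each, the first model copies 5,005,000 characters and the second 10,000.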

That's like saying we can't design a fighter jet so we're going to melt a ball of steel into some shape, put an engine on it, and try to fly it, and only then start optimizing. That's not engineering, that's trial and error.

You can absolutely aim for both correctness and performance in your design. In fact performance is (can be) part of correctness, if it doesn't perform to spec then it's most likely not useful. If you don't care about performance, fine, don't design for it and don't optimize.

Your starting point for optimization has to be a solid design with specific performance goals in mind. If you have a solid design maybe you won't need to do any optimization, if you do it's going to be optimizing on top of the design.

> That's like saying we can't design a fighter jet so we're going to melt a ball of steel into some shape, put an engine on it, and try to fly it

Very bad analogy. We have 80 years of past knowledge around designing fighter jets. The advice is to use actual data and knowledge, not suppositions. There is actual data around building certain kinds of servers. What I'm advocating is using actual data and experience. What you are advocating is to "melt a ball of steel into some shape, put an engine on it, and try to fly it." There's a wealth of aerodynamic engineering knowledge. Why not apply that and make a better shape, based on past art designing aircraft of the kind you want to build?

However, you shouldn't try to flight test and speed optimize that airplane if it can't yet fly properly, like if it's a vaguely shaped ball of steel with an engine on it. That would be counterproductive, if not dangerous. That is what I'm advising against in my comment and with the Kent Beck anecdote. Are you advocating to flight test and speed optimize that airplane if it can't yet fly properly?

(EDIT: Note that, even with this built up knowledge, there is still a lot of testing done, because often what engineers think they've built is not what they've actually built. Isn't what you're advocating more like just assuming you've built what you think you've built?)

We also have a wealth of knowledge about computer architecture and what software design decisions will result in performance problems or benefits. At a high level, trying to design, eg, a computer vision application that has to upload gigabytes of data to the cloud, will result in poor performance. At the low level, iterating over linked lists is slower than memory contiguous arrays, especially if your data is too big to fit in cache. The low-level knowledge that inlined methods let the compiler eliminate branches and vectorize code more effectively, informs the mid level architects that if everything is done by inheritance you lose this advantage since everything has to go through the vtable.

For too long we've been able to make poor performance decisions and be rescued by Moore's law; no more. We will find in the upcoming years that performance is an architectural decision that can come back to bite us if we don't design for it up-front.

I'm advocating for coming up with a software design that we think can support all the requirements including performance. Not writing random code in the hope that it can be optimized to meet the performance requirements once the other requirements have been met.

The fighter jet design is upfront about all its requirements. Does the design require some tweaks once you start flight tests? You bet, but if it's on the completely wrong path the project is doomed. Just like software projects.

> That's like saying we can't design a fighter jet so we're going to melt a ball of steel into some shape, put an engine on it, and try to fly it, and only then start optimizing. That's not engineering, that's trial and error.

Sorry, but that's a terrible analogy. "Performance", by definition of the context of this general discussion, is outside of spec. A ball of steel with an engine strapped on that doesn't fly isn't within spec. A more appropriate analogy for a fighter jet would be cost optimization (and yes, I realize that's not perfect, but bear with me). The first priority is getting the jet doing what it needs to do. If you design the thing to be made of 90% gold and platinum, then yes, optimizing cost is going to be impossible. But if you keep things semi-reasonable materials-wise, you can (presumably) optimize the manufacturing process post-facto to be more economically friendly.

Performance can't be outside spec. Performance is part of the spec. Maybe it says "don't care" but it's gotta be in the spec and if it says "don't care" it better be you really don't care. If you have performance targets you have to design towards them - don't meet all the other goals and only then start thinking about performance.

Terribly OT, but the 'low hanging fruit' comes from a time where full sized fruit trees were the norm, which is something today's (professional) fruit grower wouldn't know anything about. In old-style orchards (i.e., with full size trees), it's actually quite common to just leave 'high hanging fruit' because there's no somewhat safe or practical way to get to it. Unless you're talking about something else, which I'd be interested in hearing about, as this is quite relevant to me at this time of the year.

It’s not a simple problem to be sure.

I rub elbows with landscapers in one of my hobbies and the general consensus seems to be that leaving fruit or leaves under your tree as pathogen vectors is a bad plan. That it’s better to hot compost everything you can’t grab before it hits the ground. We have orchard ladders (the ones with the flared base) and telescoping handheld pickers and dwarf varieties (trees with better modularity?) that keep most of the fruit within picking distance.

Also leaving ripe fruit in the trees just means the wildlife or storms have a chance to get it before you can.

I don’t have any mature fruit trees now but my plan is similar to my friend who has five apple trees. Harvest as much as I can process at a go (she makes cider). And then whatever the animals don’t get after that is a bonus.

Sure, fallen fruit can cause problems; which is why in 'neat' orchards (only short grass under trees, very little other vegetation that can house predators that eat the insects that are attracted by fallen fruit) you pick as much as possible and pick up and discard fallen fruit. And 'production' orchards use dwarf or semi dwarf varieties anyway, along with planting patterns that let them harvest with telescopic lifts.

My point was, the analogy is still valid. In full size orchards, you're not going to be able to get any of the non-low hanging fruit. I know of cherry trees 25+ m high with kilos and kilos of cherries in absolutely unreachable places. And on many fruit trees that I know of, only the low hanging fruit is picked at all most years.

My own apple trees are 6ish years old now, I'm hoping I'll get my first proper (if somewhat modest) harvest this year. But even now, I already know I won't be able to reach much of it.

Either way thanks for the comment, I think we're on the same page here - I was just wondering if there was something I missed in your original comment. I always like learning about peculiarities or practices of people in other places, although I think in this case there isn't much different after all.

Now if there were a way to turn all computer problems into low hanging fruit as well...

> The old advice is to ignore performance because it’s not important.

I think you misunderstood the instructions and anecdotes that you were given about how and when to optimize. Alternatively, your definition of "old" in "old advice" is a very short time in the past.

The only yardstick of teaching that means anything is to look at what people go and do with what you taught them.

I spent a big chunk of my early career doing performance analysis work in a language stereotyped as being slow. My thesis was that it was down to developer skill and not the language. By the time someone spoke Knuth's aphorism at me it was people who were standing between me and the strategic goals of the company. Using it as a shield to get out of doing the hard work to match a competitor or customer expectations.

As I quickly discovered, there are a whole set of performance optimizations that also improve readability, and that got around the friction with the foot dragging Knuth quoters. There's a host of other things that never show up in the literature. It's a shame, really.

A lot of this is due to misunderstanding the quote and not knowing the bounds where it applies. Premature optimization being the root of all evil means those misapplying the adage are using it as a crutch. Good code is both performant and also easy to scale because of how it's designed (as mentioned elsewhere in this thread).

See [0]. It is suspect that most folks ever got it to "Make it right" in the first place. Executive pressure and lack of support from managers need to be called out as well as contributing to problems with quality of code, as teams oftentimes are not given the chance to make it right under pressure to deliver under unrealistic deadlines.

[0] http://wiki.c2.com/?MakeItWorkMakeItRightMakeItFast

After a few hours to think about this, I’m afraid you may be right about nearly everything here.

But I have a big problem with developers needing permission from business to do the right thing. That’s not how ethics works, and I’ve caught too many devs in the lie when we do get free time and they still don’t want to do the hard work they dodged because of time constraints.

There are people who want to appear high minded but have no interests beyond virtue signaling.

There's some truth to that, but I'm still going to stick by my first line:

> The only yardstick of teaching that means anything is to look at what people go and do with what you taught them.

I consider "premature optimization" a spectacular failed experiment.

> The old advice is to ignore performance because it’s not important.

> The only yardstick of teaching that means anything is to look at what people go and do with what you taught them.

> I spent a big chunk of my early career doing performance analysis work in a language stereotyped as being slow.

Something doesn't compute with those statements. Can you profile what it might be?

If it ain’t broke don’t fix it. Even though we just spent ten minutes detailing how it’s broken.

As someone else stated, make it work make it right make it fast, but people never get past make it work. They fear changing the code. It’s one of the things I loved most about XP (and I barely got to use it before it was already going out of fashion). Get over the fear.

You replied to the wrong comment. None of that is related to what I wrote.

> My thesis was that it was down to developer skill and not the language.

It's not the language, it's that one first example that came with the language, and everybody has copied it.


...you are not wrong. Lack of imagination then?

Agreed that this advice is not very old. Always reminds me of the Story of Mel


That's always a great read. Thank you.

> Ask a fruit grower how stupid this is

I don't know a fruit grower. Can you tell me?

You want all of the nearly ripe fruit, not the stuff that is easy to reach. It's a tremendous waste of resources (and rotting fruit on the ground can carry pathogens or parasites that hurt the tree next year).

So either you pick all of the ripe fruit, or you pick everything. By shaking the tree and catching everything that falls out.

So the analogy breaks down in this:

That we say "low hanging fruit" to mean "it's easy and good enough; it's okay to stop and look for something else".

But a fruit farmer would say "low hanging fruit" to mean "you better design your system to get more than just that".

Given the costs of fruit farming, I think it would also be a terrible idea to come to pick the fruit from your tree twice. You better not go out till you're ready to get it all (whether that's "everything" or "all of the ripe fruit").

As for me, I don't think we need to be accurate though in our figures of speech. When someone speaks of "writing on the wall" it's no objection that someone's fate was announced verbally, or that customarily, writing on the wall has no relation to future events but serves to identify something in the vicinity of the wall. Likewise for "low hanging fruit".

The thing is, I have a strategy that looks a lot like tree shaking, works very well for years at a time, keeps the testers and business people very happy and makes me look like a miracle worker.

You pick one of the slowest modules and you concentrate on fixing all of the performance problems in it. Even the little 2% issues that would never make it onto the schedule. Then you tell your testers to retest “everything” about that one module. You get maybe a 20-30% overall improvement that way. Next cycle you do the next module. And the next. By the time you run out of modules you’ve had 2 years of steady quarterly improvements, 2 years of performance regressions, 2 years for the usage profile of your customers to shift, and 2 years to think up or learn new ideas about performance and code quality. You start over for another cycle of 15% gains.

I think the analogy can still make sense if you think about it differently.

A fruit farmer will collect a lot of fruit at once, but I, a fruit eater who is only hungry for one fruit, don’t want to collect extra fruit.

So instead of shaking the tree and causing more fruit than I need to drop to the ground, I grab for just one.

Because I am lazy I don’t want to bring a ladder to get one piece of fruit. So I reach for the low hanging fruit :)

Likewise, when I program I am looking for one thing to do. And sometimes that leads me to reach for the low hanging fruit there as well. Personally I like to do low hanging fruit as start of the day work only and then try to do something more important for the rest of the day until I get tired.

Me neither. As a fruit eater, I would definitely prefer picking low-hanging fruit than high-hanging fruit.

But then your yard ends up smelling like a failed brewery, and the dog keeps getting sick.

I always call it "death by a thousand cuts".

The cuts are all O(1). Nothing to worry about.

Software Engineer: After a week of hard work, I halved the runtime of our program for all possible inputs!

Computer Scientist: That's too bad. All that work and nothing to show for it.

In my Google interview, toward the end of the day I told the guy who I really didn’t want to work with anyway:

What’s the hardest thing about software development? Removing the constant time overhead factors from performance critical code.

He didn’t like that answer. But I had kinda already decided he was an asshole.

> The old advice is to ignore performance because it’s not important.

I've never heard that "old advice" and I've been doing this for 20 years. If you are referring to the quote about "premature optimization" you are misinterpreting it badly, or had it explained to you badly.

And I hear it all the time. Here on HN, in my work life, all the time I see people arguing that "yes optimization is important, but now is not the time", for any value of "now".

The gamedev people care, because in games, you have hard performance constraints (60+ FPS) and try to cram in as much stuff as possible, so you always have resource use on your mind. Embedded people probably care too (I hear they do, but don't have much personal experience there). Most other branches of this industry couldn't give two fucks about it, as evidenced by software we use daily on the web and our phones. There is no market pressure to optimize, so people can get away with releasing slow and resource-intensive software. General population has been conditioned to think that if an app or webpage works slowly, they need to buy a faster computer/phone. They should instead be concluding that the app is most likely garbage.

(Note that optimization is not just about CPU cycles. Other resources are important. These days, it's more likely you'd exhaust RAM and make everything slow by forcing system to swap too much than it is that your application will use 100% CPU.)

> Embedded people probably care too

Sometimes we do. Sometimes developer time is more important. Sometimes reliability is more important. Sometimes nothing is important (because so many embedded systems are woefully overpowered these days... Dual A9's running Linux to do some low-speed IO, etc).

Threads like this remind me that there are many, many kinds of programmers. Web guys face different issues than game devs who face different issues than embedded guys. I like to read these threads but usually leave frustrated at the myriad of comments from people who assume everyone's programming experience is roughly similar to theirs.

I still see more value in a slow but helpful app, than in a fast but nonexistent app, even as a user. Also, making it work first may help in prototyping, and I may even find out that the prototyped idea doesn't work, so I'll scrap it. Speed would again be wasted in this case. Or the customer may say he wants something changed. This is more critical to find out initially.

Paraphrasing Stroustrup, if you're complaining that something is slow, it means it's so valuable to you that you're still using it anyway. (Ok, there are limits to that, but I think you get my point.)

edit: Now that I think of it, I'd say that speed is a spectrum (i.e. non-discrete). One may be at various places on it; and one may have varying needs as to where it's acceptable to be.

> And I hear it all the time. Here on HN, in my work life, all the time I see people arguing that "yes optimization is important, but now is not the time", for any value of "now".

That's literally not what you said or what I was asking about.

But it seems the second part of my comment was accurate: " If you are referring to the quote about "premature optimization" you are misinterpreting it badly"

That's not Knuth's advice. People suggesting that have no sound reason to stand on and should be informed of their ignorance.

To avoid repeating myself, see my other reply.

I can count on my fingers the number of times I've heard Knuth quoted properly (and always to jr developers). And I've been at this for 25 years.

Architect for performance

> terrible analogy. Ask a fruit grower how stupid this is

What about the fruit rotting on the ground? That's one of our favorite analogies.

> Everyone repeats "only optimize after profiling". It's true: if your program is running too slowly, you should always profile. But that rule only applies when your code is already slow. It doesn't actually say anything about the rest of the development process.

Except for unusually throwaway or easy to make fast enough projects, software should also be profiled when it runs fast, before there are performance problems.

Profiling the application from the beginning ensures readiness when profiling is urgently needed, provides a reference point for what good performance looks like, and most importantly allows easy and early detection of performance mistakes and regressions, which can be troublesome and expensive in the long run, growing into "deep systemic issues" even if they are harmless when they first appear.
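A minimal sketch of what "profile from the beginning" could look like, using Python's standard `cProfile` and `pstats` modules. The workload (`lookup`, `workload`) and the helper names are hypothetical, and call counts are used here as a deterministic stand-in for timings, which are noisier and would need tolerances in a real regression check.

```python
import cProfile
import pstats

def lookup(table, key):
    return table[key]

def workload(n):
    """Hypothetical code under test: n dictionary lookups."""
    table = {i: i * i for i in range(n)}
    return sum(lookup(table, i) for i in range(n))

def profile_call(func, *args):
    """Run func under cProfile and return (result, pstats.Stats)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    return result, pstats.Stats(profiler)

def call_count(stats, func_name):
    """Total number of calls recorded for functions named func_name."""
    return sum(ncalls
               for (_file, _line, name), (_prim, ncalls, _tt, _ct, _callers)
               in stats.stats.items()
               if name == func_name)
```

Recording something like `call_count(stats, "lookup")` for a fixed input while the code is still fast gives a baseline. If a later change makes that number jump from n to n squared, the regression shows up in a test run long before it becomes a systemic problem.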

I thought it was "don't optimize the code prematurely", but at the architecture level, scale and performance must definitely be defined at design time.

My experience writing large scale search engine, NLP and computer vision services, generally as backend web services that have tight performance constraints for other client apps that consume from them, is basically fundamentally the opposite of what you say and what you quoted from the article.

It is virtually never the case that profiling reveals a uniformly slow mess of code, and in many cases this doesn’t even make sense as a possibility because you write modular performance tests and profiling tools the same as you write modular unit tests. You would never fire up the profiler and just naively profile a whole backend service all at once, apart from maybe getting extremely coarse latency statistics that only matter in terms of what stakeholder constraints you have to meet. For analyzing cases when you’re not just gathering whole-service stats for someone else, you’d always profile in a more granular, experiment-driven way.

> “Developers should have a solid grasp of computer hardware, compilers, interpreters, operating systems, and architecture of high performance software. Obliviousness to these is the source of performance-hostile program designs. If you know these things, you will write better performing code without consciously thinking about it.”

While it’s good to know more about more things generally, I think the specific claim that any of this will help you design better code in the first place is totally and emphatically wrong.

Instead it leads you down wrong tracks of wasted effort where someone’s “experience and intuition” about necessary optimizations or necessary architectural trade-offs ends up being irrelevant to the eventual end goal, the changing requirements from the sales team, etc. etc.

This happens so frequently and clashes so miserably with a basic YAGNI pragmatism that it really is a good heuristic to just say that early focus on optimization characteristics is always premature.

In the most optimization-critical applications I’ve worked on, getting primitive and poorly performing prototypes up and running for stakeholders has always been the most critical part to completing the project within the necessary performance constraints. Treating it like an iterative, empirical problem to solve with profiling is absolutely a sacred virtue for pragmatically solving the problem, and carries far, far less risk than precommitting to choices made on the speculative basis of performance characteristics borne out of someone else’s “experience” with performance-intensive concepts or techniques. Such experience really only matters later on, long after you’ve gathered evidence from profiling.

It sounds like the services you are describing basically take the form of "evaluate a pure function on an input". I am not going to argue with your stance there. I am more referring to programs that run interactively with some large mutable state (native GUI apps for me). The choice of how that state is structured has a huge impact on the performance ceiling of the app, and is often difficult to change after decisions have been made. Sibling example of a linked-list of polymorphic classes is the kind of thing I'm talking about. Once you have 100,000 lines of code that all operate on that linked list, you are stuck with it.

The services I’m talking about are not as you describe. It’s usually very stateful, often involving a complex queue system that collects information about the user’s request and processes it in ways that alter that user’s internal representation for recommender and collaborative filtering systems. It’s not like an RPC to a pure function, but is a very large-scale and multi-service backend orchestrating between a big variety of different machine learning services.

> “Once you have 100,000 lines of code that all operate on that linked list, you are stuck with it.”

No, I think this really is not true, and as long as the new “performant” redesigned linked list that you want to swap in can offer the same API, then this is a relatively easy refactoring problem. I’ve actually worked on problems like this where some deeply embedded and pivotal piece of code needs to be refactored. This is a known entity kind of problem. Unpleasant, sure, but very straightforward.
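A minimal Go sketch of the "same API" argument, under the assumption that callers only ever touch the container through a narrow interface. `IntStore`, `listStore` and `sliceStore` are hypothetical names invented for this example; only `container/list` is a real standard library package.

```go
package main

import (
	"container/list"
	"fmt"
)

// IntStore is a hypothetical narrow API that the rest of the code depends on.
type IntStore interface {
	Add(v int)
	Each(fn func(v int))
}

// listStore is the original linked-list implementation.
type listStore struct{ l *list.List }

func newListStore() *listStore { return &listStore{l: list.New()} }

func (s *listStore) Add(v int) { s.l.PushBack(v) }

func (s *listStore) Each(fn func(v int)) {
	for e := s.l.Front(); e != nil; e = e.Next() {
		fn(e.Value.(int))
	}
}

// sliceStore is the replacement backed by a contiguous slice.
type sliceStore struct{ items []int }

func (s *sliceStore) Add(v int) { s.items = append(s.items, v) }

func (s *sliceStore) Each(fn func(v int)) {
	for _, v := range s.items {
		fn(v)
	}
}

// total only knows about IntStore, so it is unaffected by the swap.
func total(s IntStore) int {
	sum := 0
	s.Each(func(v int) { sum += v })
	return sum
}

func main() {
	for _, s := range []IntStore{newListStore(), &sliceStore{}} {
		s.Add(1)
		s.Add(2)
		s.Add(3)
		fmt.Println(total(s)) // prints 6 for both implementations
	}
}
```

The swap is only this easy when the interface stays narrow. Code that reaches into the list nodes directly gets no such protection, which is the case the parent comment is worried about.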

You seem to discount the reverse version of this problem which I’ve found to be far more common and more nasty to refactor. In the reverse problem, instead of being “stuck” with a certain linked list, you end up being stuck with some mangled and indecipherable set of “critical sections” of code where someone does some unholy low-level performance optimization, and nobody is allowed to change it. You end up architecting huge chunks of the system around the required use of a particular data flow or a particular compiled extension module with a certain extra hacked data structure or something, and this stuff accumulates like cruft over time and gets deeply wedded into makefiles, macros, and deployment scripts, etc., all in the name of optimization.

Later on, when requirements change, or when the overall system faces new circumstances that might allow for trading away some performance in favor of a more optimized architecture or the use of a new third party tool or something, you just can’t, because the whole thing is a house of cards predicated on deep-seated assumptions about these kludged-together performance hacks. This is usually the death knell of that program and the point at which people reluctantly start looking into just rewriting it without allowing any overcommitment to optimization.

(One example where this was really terrible was some in-house optimizations for sparse matrices for text processing. Virtually every time someone wanted to extend those data structures and helper functions to match new features of comparable third-party tools, they would hit insane issues with unexpected performance problems, all boiling down to a chronic over-reliance on internal optimizations. It made the whole thing extremely inflexible. Finally, when the team decided to actually just route all our processing through the third party library anyway, it was then a huge chore to figure out what hacks could be stripped away and which ones were still needed. That particular lack of modularity truly was caused specifically by a suboptimal overcommitment to performance optimization.)

“Overcommitting” is usually a bad thing in software, whether it’s overcommitting to prioritize performance or overcommitting to prioritize convenience.

The difference is that trying to optimize performance ahead of time often leads to wasted effort, fast running things that don’t solve a problem anyone cares about anymore or lack that critical new feature which totally breaks the performance constraints.

Optimizing to make it easy to adapt, hack these things in, have a very extensible and malleable implementation is almost always the better bet because the business people will be happy you can accommodate rapid changes, and more able to negotiate compromises or delayed delivery dates if the flexibility runs into a performance problem that truly must be solved.

Your points are good. I did not intend to suggest that software developers should make the kind of decisions that lead to "mangled and indecipherable set of 'critical sections'". More that developers who understand their tools at a lower level are likely to choose architectures with better computational efficiency. Simple things like acyclic ownership, cache locality, knowing when to use value vs. reference semantics, treating aggregate data as aggregate instead of independent scalars, etc.

Often these choices do not present a serious dichotomy between readability and performance. You just need to make the right choice. Developers who understand their tools better are more likely to make the right choice.

My main argument is that this kind of background knowledge should be more widely taught. Performance is often presented as a tradeoff where you must sacrifice readability/modifiability to get it. I think this is not true. Choosing good data structures and program structure is not mutually exclusive with having "a very extensible and malleable implementation".

Why do you have 100kloc operating directly on a single shared mutable structure in the first place? That sounds like an architectural choice that's going to make any sort of change far more difficult than it needs to be.

Avoiding abstraction is another form of premature optimisation.

Ironic that you are displaying the exact degree of critique of the design that you should! Lack of thoughts like this is the issue.

Unfortunately this polymorphic list sort of design is far too common.

Much of the code that cares about this architecture is going to be in the implementation of these classes, not in some single and easily fixable place.

The quote you mention is specifically talking about the architecture / design of the optimal program.

Once a design such as a linked-list of polymorphic classes being iterated on in a loop is established, it is hard to refactor.

Those with performance experience would understand that this design will have caching, branch prediction and similar issues.

However, I totally agree with you that the overall implementation should be driven by iterative coding and profiling.

BUT when it is time to choose memory layouts, data models, etc - planning in advance with an eye to performance is very valuable.
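To make the layout point concrete, here is a small Go sketch contrasting the two designs being discussed. `Shape`, `Rect`, `Node`, `sumList` and `sumSlice` are hypothetical names for illustration. The sketch only shows the structural difference and makes no measured claim about how large the gap is on any given machine.

```go
package main

import "fmt"

// Shape and Node model the "linked list of polymorphic objects" design:
// each element is reached through a pointer, and each Area call goes
// through dynamic dispatch.
type Shape interface{ Area() float64 }

type Rect struct{ W, H float64 }

func (r Rect) Area() float64 { return r.W * r.H }

type Node struct {
	Value Shape
	Next  *Node
}

// sumList walks the list one pointer hop at a time.
func sumList(head *Node) float64 {
	total := 0.0
	for n := head; n != nil; n = n.Next {
		total += n.Value.Area()
	}
	return total
}

// sumSlice iterates values stored back to back in a single slice, with no
// pointer chasing and no interface call in the loop.
func sumSlice(rects []Rect) float64 {
	total := 0.0
	for _, r := range rects {
		total += r.W * r.H
	}
	return total
}

func main() {
	rects := []Rect{{1, 2}, {3, 4}, {5, 6}}

	// Build the linked list in the same order as the slice.
	var head *Node
	for i := len(rects) - 1; i >= 0; i-- {
		head = &Node{Value: rects[i], Next: head}
	}

	fmt.Println(sumList(head), sumSlice(rects)) // prints "44 44"
}
```

Both functions compute the same answer. The difference is in how memory is walked, which is why this kind of decision is cheap to get right up front and expensive to change once a lot of code assumes the list.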

How many of us actually have any knowledge of how our compiler works? I only have a rudimentary knowledge of the Java AOT and JIT compilers. It's enough that with some work I can read `PrintAssembly` output but really, I'm not nearly that familiar with the compiler.

I think you're right in that attempting to write code that specifically matches all these things is likely to prematurely optimize. I develop on Linux on an Intel consumer desktop processor, and OS X on an Intel consumer laptop processor, and I deploy to a containerized instance running on a virtualized Intel processor running on an Intel server processor. This lets me get to market quite fast. And I should be trying to figure out the wide instructions on the Intel server and if they're a good idea? I didn't even know about pcmp{e,i}str{i,m} until a couple of years ago and I have used it precisely zero times.

As it stands, 400k rps and logging every request, for an ad syncing platform, and it's not breaking the bank. Spending time to learn the intricacies of the JVM is not likely to have yielded improvements.

You might find this set of slides from Daniel J. Bernstein interesting

'The death of optimizing compilers'


Key quote:

"Except, uh, a lot of people have applications whose profiles are mostly flat, because they’ve spent a lot of time optimizing them.”

— This view is obsolete.

Flat profiles are dying. Already dead for most programs. Larger and larger fraction of code runs freezingly cold, while hot spots run hotter. Underlying phenomena: Optimization tends to converge. Data volume tends to diverge.

I think the issue here is that the large scale systems you cite usually have performance dominated by the relatively high cost of making a network hop. Network hops dominate runtime to such an extent that unless you are writing code which is >= O(n^2), your runtime is almost entirely determined by network latency and the number of hops you make. Also many problems in these domains are embarrassingly parallel and you can effectively summon a large number of machines to hold state for you in memory. Where it does get interesting is situations like games / garbage collectors where you want to minimise latency. On such problems you usually can't bolt on performance after the fact.

No, this is not at all what I’m talking about. I’m specifically talking about cases with tight latency requirements after the request has been received. Like receiving a POST request containing many images, and invoking complex image preprocessing with a mix of e.g. opencv and homemade algorithms and then feeding into a deep convolutional neural network. Latencies of 2-3 seconds per request might be the target, and a naive unoptimized implementation of just the backend code (totally unrelated to any network parts) might result in something 10x too slow.

In that case, it’s absolutely a best practice to start out with that 10x-too-slow-but-easy-to-implement-for-stakeholders version, and only harden or tighten your homemade computer vision code or use of opencv or some neural network tool after profiling.

> “Where it does get interesting is situations like games / garbage collectors where you want to minimise latency. On such problems you usually can't bolt on performance after the fact.”

No, that’s exactly the sort of case I’m talking about. It’s not only possible to refactor for performance after the fact, but dramatically easier to do so, less risky in terms of optimizing for the wrong things, and less prone to re-work.

> I’m specifically talking about cases with tight latency requirements after the request has been received.

Were these requirements only added/measured after working on the system for a year or two? Or were they there from the start?

Many were there from the start, and then some changed a few months later, and then the whole thing changed radically a year after that... which is part of the point. Business-case software is always a moving target.

So the intended latency of requests in your case is 2-3 seconds?

In one particular application, the required backend calculations were so difficult that a workflow with a queue system was created that would give around 3 seconds as an upper limit on the request time before negatively affecting user experience (it involved ajax stuff happening in a complex in-browser image processing studio).

The type of neural networks being invoked on the images have a state of the art runtime of maybe 1 second per image, after pulling out all the complex tricks like truncating the precision of the model weights, and using a backend cluster of GPUs to serve batch requests from the queue service, which adds a huge degree of deployment logic, runtime costs, testing complexity, fault tolerance issues, etc.

The naive way, using CPUs for everything and not bothering with any other optimizations, had a latency of about 10s per request. Definitely too slow for the final deliverable, but it allowed us to make demos and tools within which stakeholders could guide us about what needed to change, what performance issues were changing, usability, getting endpoint APIs stable, etc.

It would have slowed us down way too much to be practical if we had tried to keep an eye out for making our whole deployment pipeline abstract enough to just flip a switch to run with GPUs months before that performance gain was necessary or before we’d done diagnostics and profiled the actual speedup of GPUs for specific use cases and accounted for the extra batch processing overhead.

And this was all just one project. At the same time, another interconnected project on the text processing side had a latency requirement of ~75 milliseconds per request. And it was the same deal. Make a prototype essentially ignorant of performance just to get a tight feedback loop with stakeholders, then use very targeted, case-by-case profiling, and slowly iterate to the performance goal later.

That is sensible advice, but sometimes you cannot slowly iterate and getting to your performance target from a simple prototype realistically requires a step change in your architecture. Often you know enough of your problem that you can determine analytically that certain approaches will never deliver sufficient performance. In that case you might want a performance prototype after you created a functional prototype, then iterate by adding functionality while checking for performance regressions.

There must be some misunderstanding here, because what you're saying in this post and others just doesn't add up...

Specifically you've said that:

* it's wrong to think that knowing about HW, compilers/interpreters, OSes and high-perf architecture will in fact help you "design better (performing?) code in the first place".

* changing goals from the sales team has a big impact on the performance optimisations and this happens frequently

* in your experience taking a poorly performing prototype and optimising it iteratively was a successful strategy. Poorly performing, as an example, might be 10x too slow.

* "It’s not only possible to refactor for performance after the fact, but dramatically easier to do so, less risky in terms of optimizing for the wrong things, and less prone to re-work."

My experience doing C++, Java, JS mostly on embedded and mobile does not match your above statements. I'm not an optimisation expert and in fact this is not even my main work area, but I've seen plenty of catastrophic decisions which hobbled the performance of systems and applications without any hope of reaching a fluid processing workflow besides an effort-intensive rewrite. Furthermore, in those situations the development team was significantly slowed down by the performance problems.

In one case an architectural decision was made to use a specific persistency solution in a centralised way for the entire platform and this became a massive bottleneck. Good luck optimising that iteratively, sequentially or any other way. Languages can have a huge impact on performance: in one project an inappropriate language was chosen; non-native languages usually trade memory usage for speed, and this caused problems with the total memory consumption that were never really fixed. I would argue that memory is a typical problem in embedded, along with disk I/O, GPU usage and everything else. Not even close to all performance problems can be reduced to CPU usage (or using SQL properly as someone else unhelpfully quipped).

How many LoC do your projects typically have and what kind of languages and technologies do you use? I'd be interested to hear if the heavy lifting is done by library code or your own code and if you're able to easily scale horizontally to tackle perf problems.

And... how does changing sales goals every two weeks affect performance? Is this some stealth start-up situation? :)

The performance antagonistic design they’re talking about can’t benefit from what you’re talking about. Worse, it’s usually antagonistic to other things like well-contained modifications.

For instance, look at the Big Ball of Mud design. When you are looking at a Big Ball of Mud, these per module performance tests are difficult or impossible to maintain. It’s quite common to see a flame chart with high entropy. Nothing “stands out” after the second or third round of tweaking. Yes there are ways to proceed at this point, but they don’t appear in the literature.

I realized in responding to someone else that I typically employ a strategy pretty similar to yours. That is, going module by module, decoupling, fixing one compartment at a time, and leaving it better tested (including time constraints) for the next go round and to reduce performance regressions. On a smallish team this works amazingly well.

Haven’t had as much luck with larger teams. Too many chefs and more churn to deal with.

You've made no mention of what strikes me as a pretty central requirement: at least a basic understanding of data-structures and algorithms.

It's a plus to have an understanding of advanced topics like the workings of compilers, but when the foundations are lacking, you're shot.

When developers lack a basic grasp of basic data-structures, it harms not only performance, but complexity, readability, maintainability, code-density, and just about everything else we care about.

1999 called. It wants its yellow text on a blue background back. (On the other hand, today's hipster medium grey text on a light grey background isn't an improvement.)

Knuth made his comment about optimization back when computing was about inner loops. Speed up the inner loop, and you're done. That's no longer the case. Today, operational performance problems seem to come from doing too much stuff, some of which may be unnecessary. Especially for web work. So today, the first big question is often "do we really need to do that?" Followed by, "if we really need to do that, do we need to make the user wait for it?" Or "do we need to do that stuff every time, or just some of the time?"

These are big-scale questions, not staring at FOR statements.
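For the "do we need to do that stuff every time, or just some of the time?" question, one common answer is to cache the result of expensive work. Below is a minimal Go sketch. `Memo`, `NewMemo` and the `Calls` counter are hypothetical names for illustration, and the `len(s)` function is only a stand-in for real work.

```go
package main

import (
	"fmt"
	"sync"
)

// Memo caches the results of an expensive function, keyed by its input.
type Memo struct {
	mu      sync.Mutex
	compute func(string) int
	values  map[string]int
	Calls   int // number of times compute actually ran
}

// NewMemo wraps compute so that repeated keys are served from the cache.
func NewMemo(compute func(string) int) *Memo {
	return &Memo{compute: compute, values: make(map[string]int)}
}

// Get returns the cached value for key, computing it only on the first request.
func (m *Memo) Get(key string) int {
	m.mu.Lock()
	defer m.mu.Unlock()
	if v, ok := m.values[key]; ok {
		return v
	}
	m.Calls++
	v := m.compute(key)
	m.values[key] = v
	return v
}

func main() {
	m := NewMemo(func(s string) int { return len(s) }) // stand-in for real work
	m.Get("profile")
	m.Get("profile") // served from the cache
	m.Get("settings")
	fmt.Println(m.Calls) // prints 2
}
```

This version holds the lock while computing, which keeps it simple but serializes callers. It also never evicts anything, so it only fits cases where the set of keys is small and the results do not go stale.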

A lot of optimizing advice from old misses some of the realities of old. Things generally ran slow. Performance problems were much more in your face and very common. There was a wealth of "tricks" to do with performance, and people made code difficult and brittle in the name of optimization tricks. So there was a bit of push back and a lot more focus put on getting things correct / simple / well designed. But because things were slow, you often didn't need to profile: things were in-your-face slow. So you still needed to get it fast, and while you may not have jumped to tricks too quickly, you were still quite well aware of designing for performance. You had to consider data structures carefully.

These days, things are often just not so in your face and there is a lot of "don't worry about it" advice. Which is often not bad advice in terms of getting things working. But eventually you do find you have to worry about it. Sometimes those worries happen far too late in bigger projects. So I think these Rules are pretty good. I'd also add in benchmarking very simple aspects of the toolset you are using to get an expectation of how fast things should be able to go. Often I have found (especially in the web world) someone's expectation of how fast something can go is way too low because they have already performance-bloated their project and think it's approximately normal.

Replace optimization with security, good design, or any other important facet of software engineering, and you have the same story.

Good software is a multifaceted effort, and great software takes care of the important parts with attention to detail where relevant: great games libraries don't add significant overhead to frame time, great filesystem libraries don't increase read time, great security libraries don't expose users to common usage pitfalls creating less secure environments than had you used nothing at all.

It happens to be that optimization gets deprioritized in favor of other things, where "other things" in this context is some category I fail to pin down because PMs don't give a shit about what that other category could be, and instead just care that whatever you're working on is shipped to begin with.

Great software developers will respect the important parts, and still ship. And yes, it's always easier to do this from the start than it is to figure it out later. Many things in life are this way.

I have a soft spot for performance, though, so I care about this message. One day hardware will reach spatial, thermal, and economic limits and when that day comes, software engineers will have to give a shit, because too many sure as hell don't seem to give a shit now.

The Rule of Everything in Software Development:

Everything should be subject to cost/benefit!

Corollaries: (...don't do that!)

    - If you paint yourself into an untenable corner, you lose!
    - If you plant a time bomb that explodes and kills you, you lose!
    - If you sink yourself too deeply into technical debt, you die! You lose!

Avoiding pre-optimization is just the flip side of the intelligent cost/benefit analysis coin. Don't start writing hyper-optimized code everywhere. However, also don't architect your real-time game system such that making string comparisons is inextricably the most common operation.

Here's a solution that I've found. It's not foolproof, but it will get you out of trouble quite often. Code such that you can change your mind. HN user jerf's "microtyping" technique in Golang let me do that with my side project. I committed the egregious error I mentioned above, of using string identifiers for the UUID of my MMO game entities. But because I followed the "microtyping" technique and did not directly use type string, but rather used ShipID, I could simply redefine:

    //type ShipID string // old bad way
    type ShipID struct { // new way
        A int64
        B int64
    }

Where possible, your code changes should be bijections. Renamings should be bijections and never surjections. The bigger your changes, and the more automation is involved, the more you should hew to bijective changes.

what do you mean by bijection/bijective in the context of refactoring and iteration?

Remapping ShipID from string to a struct of two 64 bit integers is bijective. No information was lost in that transformation, and I can reverse it at will. The following is a bijection. It is reversible.

    A ---> a
    B ---> b

The following is not a bijection. It is not easily reversible.

    A ---> b
    B ---> b

So for example, if I rewrite all occurrences of "ShipID" to be "string," those occurrences get lost among all of the other uses of "string." I could always go back to git and back out the change, but what happens in the future, when those changes have become interspersed with other changes?
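To make the "no information was lost, and I can reverse it at will" claim concrete for the data itself, here is a minimal Go sketch of a reversible conversion between a string ID and the two-integer struct. It assumes the string form is exactly 32 lowercase hex digits (a UUID with the dashes removed). That format, along with `ParseShipID` and the `String` method, is an assumption for illustration and not the commenter's actual code.

```go
package main

import (
	"fmt"
	"strconv"
)

// ShipID mirrors the two-integer struct from the comment above.
type ShipID struct {
	A int64
	B int64
}

// ParseShipID converts 32 hex digits into a ShipID.
// Assumption: the canonical string form is lowercase hex with no dashes.
func ParseShipID(s string) (ShipID, error) {
	if len(s) != 32 {
		return ShipID{}, fmt.Errorf("ship id %q: want 32 hex digits, got %d characters", s, len(s))
	}
	hi, err := strconv.ParseUint(s[:16], 16, 64)
	if err != nil {
		return ShipID{}, fmt.Errorf("ship id %q: %w", s, err)
	}
	lo, err := strconv.ParseUint(s[16:], 16, 64)
	if err != nil {
		return ShipID{}, fmt.Errorf("ship id %q: %w", s, err)
	}
	// Reinterpret the bits as signed values; no information is lost.
	return ShipID{A: int64(hi), B: int64(lo)}, nil
}

// String is the inverse of ParseShipID for canonical lowercase input.
func (id ShipID) String() string {
	return fmt.Sprintf("%016x%016x", uint64(id.A), uint64(id.B))
}

func main() {
	const original = "0123456789abcdeffedcba9876543210"
	id, err := ParseShipID(original)
	if err != nil {
		panic(err)
	}
	fmt.Println(id.String() == original) // prints true
}
```

The round trip is only exact for the canonical form. Uppercase hex parses fine but comes back lowercase, so a real system would want to normalize IDs at the boundary.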

Presumably GP meant that the mapping from old ShipID to new ShipID must be injective. Bijective would also work, but if the mapping were merely surjective there could be multiple entities in the old code mapped to a single entity in the new code. That seems sure to cause problems.

Actually, a surjection might result in a refactoring in the pedantic sense. Also, some refactorings work out to be surjections.

Fully agreed. I feel for teams working on something that requires all of this. Things are much easier when you do less. For many many reasons.

This matters. I write emulators. (join me: https://news.ycombinator.com/item?id=17442484)

If you want to run something like Windows 10 in a full-system emulator that is something like valgrind, performance matters. For each instruction emulated, you might need to run a few thousand instructions, and you can't get magical hardware that runs at 4 THz. Right from the start, you're deeply in the hole for performance.

Consider the problem of emulating a current-generation game console or smartphone. You can't give an inch on performance. You need to fight for performance every step of the way.

Just recently, I did a rewrite of an ARM emulator. It was all object-oriented nonsense. I haven't even gotten to profiling yet and the new code is 20x faster. Used well, C99 with typical compiler extensions can be mighty good.

It's important to have a feel for how the processor works, such as what might make it mispredict or otherwise stall. It's important to have a feel for what the compiler can do, though you need not know how it works inside. You can treat the compiler as a black box, but it must be a black box that you are familiar with the behavior of. When I write code, I commonly disassemble it to see what the compiler is doing. I get a good feel for what the compiler is capable of, even without knowing much about compiler internals.

> Just recently, I did a rewrite of an ARM emulator. It was all object-oriented nonsense. I haven't even gotten to profiling yet and the new code is 20x faster.

The key thing in optimization is summed up by the old adage "you can't make a computer run faster; you can only make it do less." All of those things that make the dev experience nicer use CPU cycles. The more cycles that are needed, the longer your app takes to do things, and the slower it seems.

For most apps that doesn't actually matter because your app is idle 99.9% of the time, but it's good to be able to fix things when it does.

I always remember this write-up from the original author of GNU grep: https://lists.freebsd.org/pipermail/freebsd-current/2010-Aug.... "The key to making programs fast is to make them do practically nothing. ;-)"

> Not every piece of software needs a lot of performance work. Most of my tweet should be interpreted in the context of a team within an organization, and my perspective comes from a rendering engineer working in game development.

Tempers his universals a bit.

In general, when working on web apps, which is mostly what I do, you don't gotta be quite that ambitious I think. On the other hand, you can't be _totally blind_ either, I've worked on some web apps that were disasters performance-wise.

But in general, while I gotta keep performance in mind all the time (okay you're right OP), I don't really gotta be _measuring_ all the time. The top 3% usually totally gets it.

But, when I worked on an ETL project -- performance performance performance all the way. Dealing with millions of records and some expensive transformations, the difference between an end-to-end taking 6 hours and taking 4 hours (or taking one hour! or less!) is _huge_ to how useful or frustrating the software is. And I had to start at the beginning thinking about how basic architectural choices (that would be hard to change later) affected performance -- or it would have been doomed from the start.

Certainly a game engine renderer is also more the latter.

But I don't know if you need _that_ level of performance focus on every project.

I also work mainly on web stuff. I always think it's good to have a plan for how you would improve performance if necessary, but not necessarily actually do it. I'm currently working on a Laravel app, and beyond writing sensible queries, I've done little performance optimisation. But I have a good idea how I would if I needed to (probably rewrite the only 2 api's that are performance sensitive in a faster language/framework!)

Interestingly it was working on an ETL project that drove me to learning Rust (my first low-level language). I tried to implement it in node, and it wasn't anywhere close to being fast enough.

It depends on what level you think is important. It’s all contributing to user experience at the end of the day. Whether you stop at making your site load fast. Run at 60fps. Or make sure that the user is able to do more than run your website with their machine.

Web in particular as an ‘OS’ sitting on top of an actual OS often gets it in the neck for being a resource hog and it’s not just the browser teams at fault there. The underlying assumption of do whatever, things are good enough and we have enormous amounts of processor power doesn’t help.

I’d argue that if we’re truly making software for people it should have as small an impact on their machines as possible. Or how much energy would we save globally if our code in data centres was more efficient?

The real problem succinctly: people think that quality and quantity are mutually exclusive. Or to go further, that those are also mutually exclusive with inexpensive. That's why the industry is flooded with bugs and low quality, cheap labor. Note, I did not say "junior devs" because I've often found the attributes of a high quality developer are more innate (albeit possibly dormant) than taught. If I can write code twice as quickly, it performs twice as well, and it's much more maintainable, I have a real hard time empathizing with anyone's legacy/performance woes. It's like people forget the word premature in the common optimization quote. It's not premature to develop with an optimized mindset because it rarely costs anything more than your more expensive salary. You can have a reasonably optimized mindset without needing empirical proof on all but the most nuanced problems.

People who don’t like the word optimization carve out large tracts of land from that territory and call it Design or Architecture or even capacity planning.

And I agree with you on quantity ‘vs’ quality. A team that is faking it the whole time will never make it. Building the capacity of the team to deliver code at a certain quality bar (that is, to stop faking it) keeps the project from grinding to a halt in year three or four.

A refreshing and sane way to think about performance, in a culture of "performance and optimisation are evil and useless; now let us ship our 12MB webpage please".

>performance and optimisation are evil and useless; now let us ship our 12MB webpage please

Web 2.0 in a nutshell

I am not sure how picking a random metric is going to help. It is better to work towards performance metrics that matter to the end user.

This is not a "random metric", and I find that talking too much about "metrics that matter to the end user" in the abstract is muddying the waters. Here are the important performance metrics for end users:

- It's slower than my speed of thinking / inputting. If I can press keys faster than they appear on the screen, it's completely broken.

- I can no longer have all the applications I need opened simultaneously, because together, they exhaust all the system resources. Even though I have a faster computer now, and 5 years ago an equivalent set of applications worked together just fine.

- My laptop/phone battery seems to be down to 50% in an hour.

- Your webpage takes 5+ seconds to load, even though it has just a bunch of text on it.

- Your webpage is slowing down / hanging my browser.

- I'm on a train / in a shopping mall, and have spotty connection. I can't load your webpage even though if I serialized the DOM, it would be 2kb of HTML + some small images, because of all the JS nonsense it's loading and doing.

Most users cannot distinguish when websites are hogging resources. Metrics like the first paint, first meaningful paint and time to interactive are usually more important than memory usage.

Reducing memory usage sure will help, but it muddies the water more than more specific user-centric metrics. Sometimes, for business applications (Slack), feature richness and in-app performance are more important than memory usage. It all depends on the website/application, and throwing a tantrum about memory usage does not help.

> Most users cannot distinguish when websites are hogging resources.

Of course they can. The user happily works on their computer. They open your site. Their browser (or entire computer) starts to lag shortly after. Doesn't take a genius to connect the dots, especially after this happening several times.

> Metrics like the first paint, first meaningful paint and time to interactive

Those are metrics for maximizing amount of people not closing your site immediately; not for minimizing the misery they have to suffer through.

> Of course they can. The user happily works on their computer. They open your site. Their browser (or entire computer) starts to lag shortly after. Doesn't take a genius to connect the dots, especially after this happening several times.

Most users have 20 or more websites/tabs open at the same time. Most users do not even know what memory is. Funny enough, for most users, after the initial load a JavaScript-rich SPA will be perceived as snappier than a server-side rendered page, but it will use more resources on the client side.

> Those are metrics for maximizing amount of people not closing your site immediately; not for minimizing the misery they have to suffer through.

These metrics are what is important for users. Users do not care how many resources your application consumes. What matters is the performance perceived by the users.



Speed matters to end users. 12 MB will not load in under a second for most users. Unless you absolutely dominate the field or have some specific redeeming feature, your users will move to your competitors' services.

Yes! Very much this. This is a lesson that, for example, Apple learned the hard way with Tiger. They now have dedicated performance teams that look at everything throughout the release cycle.

I'd like to refine the advice given a little bit, an approach I like to call "mature optimization". What you need to do ahead of time is primarily to make sure your code is optimizable, which is largely an architectural affair. If you've done that, you will be able to (a) identify bottlenecks and (b) do something about them when the time comes.

Coming back to the Knuth quote for a second, not only does he go on to stress the importance of optimizing that 3% when found, he also specifies that "We should forget about small efficiencies, say about 97% of the time". He is speaking specifically about micro-optimizations, those are the ones that we should delay.

In fact the entire paper Structured Programming with goto Statements[1] is an ode to optimization in general and micro-optimization in particular. Here is another quote from that same paper:

“The conventional wisdom [..] calls for ignoring efficiency in the small; but I believe this is simply an overreaction [..] In established engineering disciplines a 12% improvement, easily obtained, is never considered marginal; and I believe the same viewpoint should prevail in software engineering."

That said, modern hardware is fast. Really fast. And the problems we try to solve with it tend towards the simple (JSON viewers come to mind). You can typically get away with layering several stupid things on top of each other, and the hardware will still bail you out. So most of the performance work I do for clients is removing 3 of the 6 layers of stupid things and they're good to go. It's rare that I have to go to the metal.

Anyway, if you're interested in this stuff, I've given talks[2] and written a book[3] about it.

[1] http://sbel.wisc.edu/Courses/ME964/Literature/knuthProgrammi...

[2] https://www.youtube.com/watch?v=kHG_zw75SjE&feature=youtu.be

[3] https://www.amazon.com/iOS-macOS-Performance-Tuning-Objectiv...

Having optimized others' code before: https://magento.stackexchange.com/a/13992/69

Those who slam spaghetti code. Posit: Lasagna is just spaghetti flavored cake. Too many abstracted layers to fulfill a request. Down the layers, & back up, just to fulfill a simple request.

Autoloading can be expensive as well for I/O until cache is involved, especially with large code pools. Caching isn't optimization.

Trim the fat, reduce layers, use leaner pasta & or less special sauce.

If you start by tweaking costs/time in the ingredients & cooking process (lower level: OS, daemons, etc.) and then move on to frameworks/libraries, it's less to consume & healthier. I.e., cache invalidation is much faster, which should be at the end of your optimization journey.

Fundamentals are being abstracted away daily, while it's great for rapid prototyping & easier maintenance of code. It's imperative to understand problem spaces before delivering a solution.

A very highly recommended book is:

The Elements of Computing Systems: Building a Modern Computer from First Principles


The demo scene is a very good example to attempt to mimic with the culture of optimization. It's been bragging rights way early on. https://www.pouet.net/ granted hardware is much faster than it was back in the olden days. But a quick note from admiring their creations you get a taste for older hardware & milking every cycle you can. Maybe forcing younger up & coming devs to use older hardware means they'll appreciate it more & you'll know on current hardware it'll fly.

Another thing to think of is the von Neumann bottleneck & how it was solved (partially) with cache levels or Harvard architecture. I/O is a basis for optimization.

"There's no sense in being precise when you don't even know what you're talking about."


With all of the above said, I'm agreeing wholeheartedly with the article & yourself, as it's nice to see others focused on this; it seems to be a dying breed.

I'll leave this here for web developers: http://motherfuckingwebsite.com

"Premature optimization is the root of all evil" is an amusing quote with some truth to it, but it's brought up as some kind of absolute law these days.

I've seen it given as an answer on StackOverflow, even when the question is not "should I optimize this?" but more like "is X faster than Y?"

We need to stop parroting these valuable, but not absolute mantras and use common sense.

Crucially people miss the word "premature", that I interpret as meaning that you should not optimize before you've measured that the piece of code you're considering is actually causing performance issues in your program, but if you have the data then go for it.

Agreed. They're going by the heuristic that every case they encounter is probably premature

The quote really should be something like "optimization without empirical data is the root of all evil" to avoid misunderstandings (first thing that came to mind, I'm sure there's a better but still short description)
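As a rough illustration of getting that empirical data before touching anything, here is a minimal Python sketch using the standard library's cProfile and pstats modules. The functions `slow_join` and `fast_join` and the helper `profile_call` are hypothetical names made up for this example; they are not from the article.

```python
import cProfile
import io
import pstats


def slow_join(n):
    # Hypothetical workload: repeated string concatenation.
    out = ""
    for i in range(n):
        out += str(i)
    return out


def fast_join(n):
    # Same result, built with a single join.
    return "".join(str(i) for i in range(n))


def profile_call(func, *args):
    """Run func under cProfile; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)  # show the top 5 entries
    return result, buffer.getvalue()


if __name__ == "__main__":
    _, report = profile_call(slow_join, 10_000)
    print(report)
```

Whether the "obviously slow" version actually matters depends on the runtime and the input size, which is the point: run both under the profiler and compare instead of guessing.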

"Fast is my favourite feature" --Someone, maybe from Google? Not sure.

If you look at data relating to user conversion, and users staying on and revisiting websites, fast would seem to be just about everyone's favorite feature!

Can you recommend where I can see such data? It would be a very useful reference.

For any website that's not already deeply entrenched and benefiting from network effects, fast is the only feature.

Both sides have merit. The trick is to find a point in between that works for you. What I tend to do after having to optimize after the fact on numerous projects amounts to:

  - write for clarity with an architecture that doesn't greatly impede performance
  - have good habits, always use datastructures that work well at both small and larger scales
    whenever readily available (e.g. hash tables, preallocating if size is known)
  - think longer about larger decisions (e.g. choice of datastore and schema, communication between major parts)
  - have some plans in mind if performance becomes an issue (e.g. upgrade instance sizes, number of instances)
    and be aware if you are currently at a limit where there isn't a quick throw money at the problem next level
  - measure and rewrite code only as necessary taking every opportunity to share both
    why and how with as many team members as feasible
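To make the "good habits" bullet concrete, here is a small Python sketch, assuming hashable elements. The function names are hypothetical and only illustrate the two habits mentioned above: a hash-based lookup instead of a linear scan, and allocating once when the size is known.

```python
def find_common_list(a, b):
    # O(len(a) * len(b)): every `in` scans the whole list b.
    return [x for x in a if x in b]


def find_common_set(a, b):
    # O(len(a) + len(b)) on average: build a hash set once,
    # then each membership test is a constant-time lookup.
    b_lookup = set(b)
    return [x for x in a if x in b_lookup]


def squares_preallocated(n):
    # The size is known up front, so allocate the list once
    # instead of growing it with repeated appends.
    out = [0] * n
    for i in range(n):
        out[i] = i * i
    return out
```

Both lookup versions return the same result and read about the same; the set version just keeps working when the inputs grow, which is why it costs nothing to make it the default.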

"write for clarity with an architecture that doesn't greatly impede performance"

I came here basically to say something similar to this. The most important metric is to have a design in the beginning that attempts to identify where the problems (Critical paths at a high level) are going to be and avoids them. That doesn't necessarily mean the initial versions are written to the final architecture, but that there is a plan for mutating the design along the way to minimize the overhead for the critical portions.

Nearly every application I've ever worked on has had some portion that we knew upfront was going to be the performance bottleneck. So we wrote those pieces in low level C/C++ generally really near (or in) the OS/Kernel, and then all the non performance critical portions in a higher level language/toolkit. This avoided many of the pitfalls I see in other projects that wrote everything in Java (or whatever) and the overhead of all the non-critical portions were interfering with the critical portions. In networking/storage it's the split between the "data path" and the "control path", some other products I worked on had a "transactional path" and a control/reporting path.

Combined with unit tests to validate algorithmic sections, frequently the behavior of the critical path could be predicted by a couple of the unit tests, or other simple metrics (inter-cluster message latency, etc.).

I find that the code that looks slow often isn't, and the really slow code is always a surprise.

I work on something that uses a lot of immutables with copy-modify operations. They never show up in a profiler as a hot spot. The most surprising hot spot was a default logger configuration setting that we didn't need. Other hot spots were file API calls that we didn't know were slow.

I think what's more important is to use common sense in the beginning, and optimize for your budget. Meaning: Know how to use your libraries / apis, don't pick stupid patterns, and only do as much optimization as you have time for.

Sometimes an extra server or shard is cheaper than an extra programmer, or gets you to market on time. Sometimes no one will notice that your operation takes an extra 20ms. Those are tradeoffs, and if you don't understand when to optimize for your budget, you'll either ship garbage or never ship at all.

I strongly agree.

> Sometimes an extra server or shard is cheaper than an extra programmer, or gets you to market on time.

Sure, you're bailing yourself out by burning money. That's fine if it's all contained on your backend. My problem starts when similar situation happens on frontend - in the JS code of the website, or in the mobile app. Too often developers (or their bosses) will say "fuck it, who cares", as if their application was the only thing their customers were using on their machines. The problem with user-end performance is that suddenly, I can only run 3 programs in parallel instead of 30, even though I just bought a new computer/smartphone - because each of those 3 programs think they have the machine for themselves.

An extra 20ms, or an extra server, is very different than sloppy JavaScript bogging down someone's computer. That's crap code.

I actually saw this tweet the other day. Amusing how often performance is neglected until it kills something.

I have also felt it would be a fun bingo game in a year to see when a famous quote of someone would come up. This Knuth quote would definitely be on there.

Not everyone is building a web browser, a compiler or even an e-commerce site. Even on a commerce-related website, only pages on the customer acquisition path and the buy path really matter.

Most of those pages will bubble up when you do your first profiling session anyway.

You can get away with good data structures/good sql queries and a little big O analysis almost everywhere.

Premature optimization, premature nines of stability, premature architecture and abstraction are as evil as ever. They all distract you from moving forward and shipping.

Of course, if your product is a BLAS library, database, compiler, web browser, operating system or AAA video game, that does not apply. I mean, for most of us "profile often" is terrible advice.

(edit: spelling, clarifications)

>You can get away with good data structures/good sql queries and a little big O analysis almost everywhere.

So design databases and algorithms for performance from day 1 & understand how the data needs to be structured? You'd have to verify the assumptions you made about the data were correct and profile often if you didn't want to worry about performance regressions. Bad enough regressions are a failed product and you can't ship something that could never get fast enough without a rewrite.

I don't think everyone in this thread has the same concept of what optimisation means. It doesn't all have to be obfuscating bit-fiddling. Even identifying the important paths is optimisation.

This article reminds me of something I read a couple years back that stuck with me.

The 10x dev is the dev that creates 10% the problems other devs create.

Thinking ahead is a skill the industry at large unfortunately seems to lack

Good performance comes down to using suitable algorithms - not optimizations after the fact. Thoughtful algorithm choices are never premature.

There are also a lot of times when it doesn’t matter, possibly the majority of the time in some domains. I’m working on a project now where the answer takes a couple of seconds to generate but it isn’t needed for minutes, so spending time to make it faster would be a waste of my client’s money.

> But if performance work has been neglected for most of the product development cycle, chances are that when you finally fire up the profiler it’ll be a uniformly slow mess with deep systemic issues.

Hrmm. In my experience very good programs also have very flat profiles. I don’t think a flat profile is indicative of bad performance culture.

While I think that writing simple code is preferred to writing optimized code given a choice, I just hate writing obviously non-optimal code, it leaves bad taste in my mouth. I'm trying to find some land in between, even if I'm sure that those optimization efforts won't yield any observable gains.

Something I find helpful is to performance and memory profile your projects on a semi-regular basis and establish a baseline. When things suddenly deviate (esp. memory usage) you catch it early before the problem has time to grow.
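One way this kind of baseline check could look, as a minimal Python sketch using the standard library's tracemalloc module. The helper names, the `build_table` workload, and the 25% tolerance are all assumptions chosen for illustration; a real project would measure its own code paths and store the baseline somewhere persistent.

```python
import tracemalloc


def peak_memory_bytes(func, *args):
    """Return (result, peak bytes allocated by Python while func ran)."""
    tracemalloc.start()
    try:
        result = func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def exceeds_baseline(measured, baseline, tolerance=0.25):
    """True if measured is more than `tolerance` (as a fraction) above baseline."""
    return measured > baseline * (1 + tolerance)


def build_table(n):
    # Hypothetical workload standing in for a real code path.
    return {i: str(i) for i in range(n)}


if __name__ == "__main__":
    _, baseline = peak_memory_bytes(build_table, 10_000)
    _, current = peak_memory_bytes(build_table, 10_000)
    print("regression" if exceeds_baseline(current, baseline) else "within baseline")
```

Run on a schedule or in CI, a check like this flags a sudden jump when it is introduced, rather than months later when several regressions have piled up.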

Nice article, but my eyes hurt from the colors :) Switching to reader mode in Firefox helps a lot.

Rules of optimization:

1. Optimize only if needed.

2. Premature optimization is the root of all evil.

I don't know why you are being downvoted, as these are famous rules and contribute to this thread.

I saw rule 2. ignored almost daily at a job - putting caches where it doesn't matter, taking days on a feature because of analysis paralysis, colleagues thinking about using int instead of Integer or not creating another function because of 'overhead'.

I think these are wise rules - of course not when you use a 2 MB js library for an uppercase() function. That is madness.

But the decision if you should use an arraylist or a linkedlist is unimportant almost every time. Of course there is that 1% when it matters and these rules are about the other 99%.

I downvoted it because it adds nothing. If you've read the article, you'd know that those classic arguments are addressed very early on. If you haven't read the article, you learn nothing about what's actually in the article or why it might interest you.

the article had interesting points but i decided to stop reading because the yellow wall of text with black background was not super readable.

It's a dark blue. I think your color balance may be way off. With a more accurate color balance it's readable.

Some people just don’t like light text on dark background. It’s often not something you can fix with a minor tweak.

Agreed, but calling the blue black indicates a seriously misconfigured monitor.

Rule #9. Optimize webpage colors so it doesn't hurt people's eyes.

Interesting title but the text was far too small to read on my pixel 2 and wouldn't let me pinch-zoom in. Optimization of some other metric perhaps?

I don't give a fuck, I tell people to buy faster hardware.

Rules of software (or code) optimization maybe? I clicked on this thinking it was going to be about gradient methods.

It's interesting to me that people's opinions on this subject are much like politics. You either make optimization a priority, or you don't optimize prematurely. You write clever well written and commented imperative code, or clear concise functional code.

They're two completely different schools of thought and may work well in either scenario. It depends a lot on your background, and your current context what way you are going to write code.

I find your comment to be much like politics here. Presenting a false dichotomy. People who are obsessed over optimization vs. people who aren't. People who would prefer writing "clever well written and commented imperative code" vs. "clear concise functional code".

The truth is, you have at least three distinctive groups on the performance spectrum: people who are obsessed, people who treat it as one of the things to prioritize, and people who never do it (and tell you it's never the time to do it). The truth is, a lot of performance problems come from imperative code, and functional doesn't mean slow if you know what you're doing. The truth is, most big performance issues can be solved by not doing stupid shit and taking an occasional look at the resource usage. The former requires a little knowledge of how computers and your tools work. The latter requires care about end users, which seems to be in short supply.

I'm not sure quite what you're getting at. I actually agree with you, I'm one of those who does not like imperative code. I think it's completely counterintuitive.

I was making an effort to be unbiased, by making clear the "good attributes" of each side.

I wasn't at all trying to make that harsh of a distinction. In fact I was trying to point out the ridiculousness of it. Instead of trying to make the point with irony, I should have said clearly what my opinion is:

If you're too concerned with speed, and you optimize early, I think your code style is less than ideal. Essentially your code is messed up to the point that it's too difficult to go back to make optimizations if necessary.

I think the reason functional programmers don't talk about optimization as much is because they don't need to. It's a completely different paradigm.

So much like politics, if you only listen to one side, you're going to get messed up ideas of the way the world works.

I hope this offers a better explanation
