Also, when profiling production code, it can be hard to FIND the slow function since the optimizer may inline things.
There is perhaps value in understanding your language and ecosystem very well, so that you have decent intuition for what is fast, and in making default-fast choices even if there is a small readability cost. That cost may be made up for by not having to crunch on performance later. Besides, many performance choices make code simpler: after all, the goal is to do less, and less is less.
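One example of such a default-fast choice in Rust (a hedged sketch; `count_words` is a hypothetical function, not from the article): accepting a borrowed `&str` instead of an owned `String` costs nothing in readability, but means callers never have to allocate or clone just to call you.

```rust
// Default-fast choice: take &str rather than String, so the caller
// passes a cheap borrow instead of allocating an owned copy.
fn count_words(text: &str) -> usize {
    text.split_whitespace().count()
}

fn main() {
    let owned = String::from("the quick brown fox");
    // Works with string literals and with owned Strings, no copies made.
    assert_eq!(count_words("hello world"), 2);
    assert_eq!(count_words(&owned), 4);
}
```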
You won't see these aspects of poor design in a sampling profiler. You will see them by running e.g. perf on Linux and finding pitifully low IPC and high cache-miss numbers.
This contributes to Rust programs generally having good performance characteristics without spending time on optimizations.
That being said, of course, in almost all of these cases you can restructure your program so you don't need to box the values, but if it's not performance critical, why bother? Repeat that a couple dozen times across a large codebase and you have the same pointer-chasing issues.
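To make the trade-off concrete, here is a small sketch (the `Shape` types are hypothetical, not from the article) of both sides: boxed trait objects, where every element is a separate heap allocation reached through a pointer, versus the restructured version using an enum, where the values sit inline in the Vec's buffer.

```rust
use std::f64::consts::PI;

// Boxed version: each element is its own heap allocation, and
// iterating the Vec chases one pointer per item.
trait Shape { fn area(&self) -> f64; }
struct Circle { r: f64 }
struct Square { s: f64 }
impl Shape for Circle { fn area(&self) -> f64 { PI * self.r * self.r } }
impl Shape for Square { fn area(&self) -> f64 { self.s * self.s } }

// Restructured version: an enum stores the variants inline in the
// Vec's buffer, so iteration is a linear scan with no indirection.
enum ShapeE { Circle { r: f64 }, Square { s: f64 } }
impl ShapeE {
    fn area(&self) -> f64 {
        match self {
            ShapeE::Circle { r } => PI * r * r,
            ShapeE::Square { s } => s * s,
        }
    }
}

fn main() {
    let boxed: Vec<Box<dyn Shape>> =
        vec![Box::new(Circle { r: 1.0 }), Box::new(Square { s: 2.0 })];
    let inline = vec![ShapeE::Circle { r: 1.0 }, ShapeE::Square { s: 2.0 }];
    let a: f64 = boxed.iter().map(|s| s.area()).sum();
    let b: f64 = inline.iter().map(|s| s.area()).sum();
    assert!((a - b).abs() < 1e-9); // same answer, different memory layout
}
```

Neither version is wrong; the point is that the boxed one is often the path of least resistance, and the cost only shows up when the pattern is repeated everywhere.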
Some patterns of writing code will be really awkward to express, but there are usually "more rusty" solutions that you start to apply without even noticing. Once you write code with the desired ownership semantics in mind, it's often (relatively) frictionless.
Could you clarify this? It seems like the opposite to me. Borrowing requires lifetimes in type signatures, but boxing yields owned objects, which can be passed around easily like value types.
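A minimal sketch of the contrast being asked about (the `Greeter` types are hypothetical): boxing produces an owned value whose signature carries no lifetime, while borrowing ties the returned reference to its source via a lifetime parameter.

```rust
trait Greeter { fn greet(&self) -> String; }
struct English;
impl Greeter for English {
    fn greet(&self) -> String { "hello".to_string() }
}

// Boxing: the caller receives an owned value. No lifetime appears in
// the signature, and the result can be stored or moved freely.
fn make_greeter() -> Box<dyn Greeter> {
    Box::new(English)
}

// Borrowing: the signature carries a lifetime tying the returned
// reference to the argument, which the caller must respect.
fn pick_greeter<'a>(g: &'a English) -> &'a dyn Greeter {
    g
}

fn main() {
    let owned = make_greeter();      // lives as long as we like
    let e = English;
    let borrowed = pick_greeter(&e); // cannot outlive `e`
    assert_eq!(owned.greet(), borrowed.greet());
}
```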
Too many people have taken the “make it correct then make it fast” advice too far. The definition of “correct” needs to include performance parameters from the beginning because usually the bottlenecks that cause real issues are architectural.
What I will say is that code that is architecturally correct for the necessary performance requirements is generally no less understandable or maintainable than code that is incorrectly designed for its performance context. You don't end up with the kinds of harder-to-read optimizations you see in this article in those cases any more than you do in the other case.
Another consideration is that if your architecture is wrong for your performance space, nothing will help but a rewrite. If your code is optimized in the small too early, you can always rewrite it in a way that is clearer.
Your design should indeed take into account performance requirements. But micro-optimizations like these (almost all of these changes are to avoid linear numbers of reallocations, with the exception of string builder) don't give you order of magnitude speedups unless they're in hot loops anyway.
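For reference, the "avoid linear numbers of reallocations" fix being discussed usually amounts to reserving capacity up front (a sketch; `squares_prealloc` is a hypothetical name): without it, each time a Vec or String outgrows its buffer it reallocates and copies, and the same idea covers the string-builder case.

```rust
// Growing a Vec one push at a time triggers repeated reallocations as
// the buffer doubles; reserving capacity up front avoids all of them.
fn squares_prealloc(n: usize) -> Vec<u64> {
    let mut v = Vec::with_capacity(n); // one allocation
    for i in 0..n as u64 {
        v.push(i * i); // never reallocates: capacity was reserved
    }
    v
}

fn main() {
    let v = squares_prealloc(4);
    assert_eq!(v, vec![0, 1, 4, 9]);
    assert!(v.capacity() >= 4); // with_capacity guarantees at least n
}
```

Outside a hot loop this buys you very little, which is the point being made: it is a constant-factor fix, not an order-of-magnitude one.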
"Profile then optimize" means profile, then optimize. Designing good software isn't optimization, it's designing good software.
You shouldn't micro-optimize before profiling, because it likely won't matter. Bluntly, if you have a flat profile, none of the optimizations in this article are relevant anyway. You'll be able to pull out single digit percentage speedups, maybe.
The profile-then-optimize argument isn't meant to be about architecture. Yes, you should build performant architecture. Yes, you should take time to plan a performant architecture before building[+]. But the question of profile then optimize is never (except in the strange way you're bringing it up) about doing macro-optimizations before you've written a line of code. It's almost always in the context of "don't just try to optimize what you think is slow, because you're almost always wrong".
Big-O style speedups from architectural changes aren't micro-optimizations, they generally sit outside of that conversation entirely.
As an aside, flat profiles are in practice exceedingly rare. Most (useful) programs do the same thing many times. It's very unusual to see a program that isn't, in essence, a loop. And the area inside the loop is going to be hot. The Pareto principle applies to execution time too.
[+]: Maybe startups who gotta ship it to survive as the exception.
Glad we agree.
> The profile-then-optimize argument isn’t meant to be about architecture
Glad we agree. If only all the people who tell me “correct before performant” agreed with us. In practice, in my experience, this is not the case. People use it in day to day conversations at the earliest parts of conversations about architecture all the time. If they didn’t I wouldn’t have nearly the problems I do with the statement.
> As an aside, flat profiles are in practice exceedingly rare
This seems to be the most controversial part of our disagreement. In my experience, that is flatly untrue. Especially when talking about systems where the performance does not meet the requirements. I can count on one hand the number of times I’ve seen systems go from “unacceptable” performance to “acceptable” via micro-optimizations. I’ve never seen one go to “great”. I don’t know how to quantify this though, so I’m willing to leave this in the realm of my experience is different from yours.
All that is to say, my experience says that systems that don’t treat performance as first class requirements don’t tend to meet their performance expectations.
All of which is neither here nor there based on the article but is directly related to the question of ‘what do you do with a flat profile’?
Edited to add: Since apparently you also work at Google, you should walk over to Svilen's desk and just ask him if profiles of production software are generally flat, or if they generally have hot spots.
That's what you get after you profile and optimize.