Hacker News

Most of these techniques sacrifice readability, so before implementing them you should really profile to make sure the code in question is a hot spot in terms of allocations or CPU usage.

That said, the example using sync.Pool is not quite right. The New function should always return a pointer type to avoid incurring an additional allocation on Get [0].

The code should look like:

  var bufpool = sync.Pool{
      New: func() interface{} {
          buf := make([]byte, 512)
          return &buf
      },
  }

  b := *bufpool.Get().(*[]byte)

  defer bufpool.Put(&b)
[0] see example in https://golang.org/pkg/sync/#Pool

There's a little at the bottom that encourages caution. In a previous draft, I wrote about the importance of proving hot spots before optimising, but took it out as I wanted this to be more a list of techniques.

On sync.Pool: good spot. I'll update it - thanks.

Unfortunately, that's still wrong (I think). Go will still allocate a temporary slice-header on the heap and then de-allocate it when it's dereferenced.

I believe the only way to do this without unsafe code is to pool these slice-headers as in https://github.com/libp2p/go-buffer-pool/blob/master/pool.go.
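If I understand the objection, the fix is to hold on to the *[]byte from Get and Put that same pointer back, rather than taking the address of a fresh local (which is what escapes). A sketch of that pattern (variable names are my own, not from the article):

```go
package main

import (
	"fmt"
	"sync"
)

// Pool stores *[]byte so that Get/Put never need to re-box the slice header.
var bufpool = sync.Pool{
	New: func() interface{} {
		buf := make([]byte, 512)
		return &buf
	},
}

func main() {
	bp := bufpool.Get().(*[]byte) // keep the pointer around
	defer bufpool.Put(bp)         // return the same pointer; no new address is taken

	b := *bp // local copy of the slice header; doesn't escape
	b = append(b[:0], "hello"...)
	fmt.Println(string(b))
}
```

The point is that Put receives the pointer that came out of the pool, so nothing new has to be heap-allocated on the way back in.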

The risk of write-profile-optimize is that you might write a big slow program with a totally flat profile. If it doesn’t meet performance requirements at that point, then what?

In your hypothetical scenario, you then optimize whatever is easiest to optimize.

I think there is a difference between architectural changes that would inhibit the overall performance of an application vs language specific optimizations like the article is referring to.

My comment re write-profile-optimize was geared towards the latter: using strconv is usually uglier to read than fmt.Sprintf, so you should probably leave it alone unless you've profiled and found it to be a problem.

Does that ever happen?

I've seen examples of actual production systems where there is just a general proliferation of slow practices that make the system slow without any one function standing out, yes. In C#, such practices could include using interfaces (for dependency injection), LINQ (for readability), and reflection. All of these have fast alternatives with varying levels of downsides. The downsides aren't as big when you are practiced at them.

Also, when profiling production code, it can be hard to FIND the slow function since the optimizer may inline things.

There is perhaps value in understanding your language and ecosystem very well, so that you have decent intuition for what is fast, and making default-fast choices even if there is a small readability cost. That cost may be made up for by not having to crunch on performance later. As well, many performance choices make code simpler; after all, the goal is to do less, and less is less.

Yes. This is the plague of modern software. It is all over the place. So much software today spends all its time chasing pointers, using the cache poorly, and branching wildly.

You won't see these aspects of poor design on a sampling profiler. You will see them by running e.g. perf on Linux and seeing pitifully low IPC and high cache-miss numbers.

This is one of the reasons I really love Rust: it not-so-subtly nudges you towards avoiding heap allocations by adding extra boilerplate and making them obvious in type signatures (unless you hide them behind wrapper types).

This contributes to Rust programs generally having good performance characteristics without spending time on optimizations.

I find the exact opposite for Rust, as it often encourages boxing random objects to satisfy odd lifetimes.

That being said, of course, in almost all of these cases you can restructure your program so you don't need to box the values, but if it's not performance critical, why bother? Repeat that a couple dozen times across a large codebase and you have the same pointer-chasing issues.

The NLL improvements of last year have improved things quite a bit. It's still not perfect, but in my opinion it has reached a point where this is mostly an issue for Rust beginners.

Some patterns of writing code will be really awkward to realize, but there are usually "more rusty" solutions that you start to apply without even noticing. Once you write code with the desired ownership semantics in mind, it's often (relatively) frictionless.

It is still an issue for UI-related code, due to the way such systems are commonly designed.

> This is one of the reasons I really love Rust: it not-so-subtly nudges you towards avoiding heap allocations by adding extra boilerplate and making them obvious in type signatures (unless you hide them behind wrapper types).

Could you clarify this? It seems like the opposite to me. Borrowing requires lifetimes in type signatures, but boxing yields owned objects, which can be passed around easily like value types.

I’d say it happens more frequently than being able to optimize yourself out of a perf problem in my experience.

Too many people have taken the “make it correct then make it fast” advice too far. The definition of “correct” needs to include performance parameters from the beginning because usually the bottlenecks that cause real issues are architectural.

It is my extremely studied experience that this is rarely (read: almost never) the case, and that the far greater risk is of radically less- or entirely un-maintainable code produced by developers who have incorrect intuitions about what needs to be fast.

I suppose this is why they call it anecdotal evidence, as our experiences don’t agree.

What I will say is that code that is architecturally correct for the necessary performance requirements is largely no less understandable or maintainable than code that is incorrectly designed for its performance context. You don't end up with the kinds of harder-to-read optimizations you see in this article in those cases any more than you do under the other case.

Another consideration is if your architecture is wrong for your performance space, nothing will help but a rewrite. If your code is optimized in the small too early, you can always rewrite in a way that is more clear.

You're talking about architectural changes; this article is about, essentially, micro-optimizations. Good architecture is often faster than bad.

Your design should indeed take into account performance requirements. But micro-optimizations like these (almost all of these changes are to avoid linear numbers of reallocations, with the exception of string builder) don't give you order of magnitude speedups unless they're in hot loops anyway.

Profile then optimize means profile, then optimize. Designing good software isn't optimization; it's designing good software.

It makes a big difference if all your function signatures in a Go program take a string parameter or a []byte parameter, or if they take some other expensive type by value. Refactoring this later can be close to impossible. I would say rather than micro-optimizations these are the choices that must be made correctly in the early phases of development to avoid having an unfixable performance problem later.
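As a concrete illustration of the signature choice (my own example, hypothetical function names): an API that takes string forces callers who hold a []byte to copy on every call, whereas a []byte parameter lets them pass the buffer through untouched.

```go
package main

import "fmt"

// Taking string forces callers holding a []byte to convert (copy) on each call.
func hashString(s string) int {
	h := 0
	for i := 0; i < len(s); i++ {
		h = h*31 + int(s[i])
	}
	return h
}

// Taking []byte lets callers reuse buffers with no conversion at all.
func hashBytes(b []byte) int {
	h := 0
	for i := 0; i < len(b); i++ {
		h = h*31 + int(b[i])
	}
	return h
}

func main() {
	buf := []byte("hello")
	// string(buf) allocates and copies; hashBytes(buf) does not.
	fmt.Println(hashString(string(buf)) == hashBytes(buf)) // prints "true"
}
```

Once every function in a codebase has settled on one of these, flipping the other way means touching every call site, which is why it's so hard to fix later.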

Literally the thread I'm responding to is asking what the value of optimize-then-profile is. My experience is that you are likely hosed if you've gotten to this point in the 80% case. That isn't to say these techniques have no value; I've used many of them myself. But the question was, "how often does it happen that your optimizations are for a flat profile". For me, the answer is "most of the time".

And the broader context is when talking about micro-optimizations versus reasonable architectures.

You shouldn't micro-optimize before profiling, because it likely won't matter. Bluntly, if you have a flat profile, none of the optimizations in this article are relevant anyway. You'll be able to pull out single digit percentage speedups, maybe.

The optimize-then-profile argument isn't meant to be about architecture. Yes, you should build performant architecture. Yes, you should take time to plan a performant architecture before building[+]. But the question of profile-then-optimize is never (except in the strange way you're bringing it up) about doing macro-optimizations before you've written a line of code. It's almost always in the context of "don't just try to optimize what you think is slow, because you're almost always wrong".

Big-O style speedups from architectural changes aren't micro-optimizations, they generally sit outside of that conversation entirely.

As an aside, flat profiles are in practice exceedingly rare. Most (useful) programs do the same thing many times. It's very unusual to see a program that isn't, in essence, a loop. And the area inside the loop is going to be hot. The Pareto principle applies to execution time too.

[+]: Maybe startups who gotta ship it to survive as the exception.

> You shouldn't micro-optimize before profiling, because it likely won't matter. Bluntly, if you have a flat profile, none of the optimizations in this article are relevant anyway. You'll be able to pull out single digit percentage speedups, maybe.

Glad we agree.

> The optimize then profile argument isn’t meant to be about architecture

Glad we agree. If only all the people who tell me "correct before performant" agreed with us. In practice, in my experience, this is not the case. People use it day to day, at the earliest stages of conversations about architecture, all the time. If they didn't, I wouldn't have nearly the problems I do with the statement.

> As an aside, flat profiles are in practice exceedingly rare

This seems to be the most controversial part of our disagreement. In my experience, that is flatly untrue. Especially when talking about systems where the performance does not meet the requirements. I can count on 1 hand the number of times I’ve seen systems go from “unacceptable” performance to “acceptable” via micro optimizations. I’ve never seen one go to “great”. I don’t know how to quantify this though, so I’m willing to leave this in the realm of my experience is different than yours.

All that is to say, my experience says that systems that don’t treat performance as first class requirements don’t tend to meet their performance expectations.

All of which is neither here nor there based on the article but is directly related to the question of ‘what do you do with a flat profile’?

Kanev[1] disagrees. Flat profiles are the common case in actual practice.

Edited to add: Since apparently you also work at Google, you should walk over to Svilen's desk and just ask him if profiles of production software are generally flat, or if they generally have hot spots.

1: https://static.googleusercontent.com/media/research.google.c...

You're citing a paper using, as an example, an already highly hand optimized and performance-guided optimized binary.

That's what you get after you profile and optimize.

For me, all the time. But the two points about "write, profile, optimize" that people usually leave out is that profiling/benchmarking is mostly useful as a comparison tool (are your optimizations actually doing anything?) and that part of profiling is asking the question, "how fast can this go?" which is usually a difficult question to answer, but when you get really granular you can usually reason about it.

Your example doesn't seem right. Doesn't the Get perform an allocation anyway because you access the value of the pointer and store it in a new variable b? It's very likely b will escape to the heap.

I agree with nhooyr's analysis. The interface{} will anyway transparently "contain" a pointer-to-the-[]byte, in other words, the []byte value itself will be heap allocated.

(Note for anyone new to this that the "[]byte-value" - we say "the byte slice" - is a distinct thing from the "values stored-in-the-byte-slice", which is a heap allocated backing array)

So maybe we should sacrifice compiler speed for having a better compiler optimize more readable code?

Most compilers skip optimizations when compiling in debug mode, so that it is still quick to iterate/test, then perform the optimizations which take longer to compile for release builds. Obviously it means one can only profile code when it has been compiled in release mode, which I guess has its drawbacks...

I do wish Go would do this. The fast compiler thing is fantastic, but I think we can have the best of all worlds by supporting a --release flag, and I think Go is large enough now that we can justify the extra maintenance cost of dealing with two compilation modes.

Is a "fast compiler" really all that impressive though when a lot of the speed comes from not checking things that other compilers do?

A lot of the speed also comes from decisions about the language architecture. Go packages can be compiled separately, and the compiled result, in binary form, can simply be reused by code importing those packages; usually no recompilation of them is necessary.

The speed isn't from "not checking things that other compilers do"; it's from not performing more aggressive optimizations.

Does it really matter how the compiler does it? Faster compilation speed makes programmers happier. And it's not like you can convince a C++ or Rust or Scala compiler to just trust you with the types this once and compile your program ten times faster.

I would say that yes, it does matter how the compiler is faster. Turbo Pascal, for instance, had a fast compiler (still faster than Go's last time I checked), and it was fast because it was hand-optimized in assembly to be so. That's impressive; especially so back in the days of the 286.

Golang's compiler is fast because they don't check everything they could, the language itself was intentionally dumbed down, and because it doesn't bother dealing with dependencies as tightly (or as well) as other languages. I'd say it is important to know that.

Type checking is a very small part of Rust compile cost; most of it is due to the extensive use of monomorphized code (resulting in code bloat which is not always well-controlled, to put it mildly), as well as overhead in the LLVM-dependent part of compilation. Some features in the pipeline for Rust should enable some improvements, particularly const-generics and specialization.

Go is specifically designed for fast compilation. You can't just skip some checks and get similar results.

Nobody is expecting Rust compile times to ever match those of a language whose premier guiding principle is fast compile times, but the question is whether or not Rust compile times can become fast enough to keep Rust programmers satisfied. I'd say that programmer attention span is generally sufficient to tolerate iteration times of five seconds or less for interactive editing. Who knows if Rust will manage to get there, but the Rust developers I've spoken to seem to earnestly believe that an order of magnitude improvement is possible (over the course of several years of development, however).

I think what people are trying to get at in this thread is that they'd be happy if their Go compiles took 0.5 seconds instead of 0.1 seconds, if it also meant that it lessened the need to manually employ the techniques in the OP. Of course, keeping such tradeoffs within reason is a difficult prospect.

> the Rust developers I've spoken to seem to earnestly believe that an order of magnitude improvement is possible

I'd like to believe this is the case, but has that ever actually happened, as in a programming language seeing that much speedup in its compiler? There are quite a few cases of languages that are generally taken to have long compile times (C++, Haskell, Scala, etc.), but I don't know of a case where any has been able to improve by as much as an order of magnitude.

Often speedups of that magnitude involve "cheating" by somehow rethinking the problem space. An example of this applied to compilers is the venerable ccache, which absolutely does provide a 10x improvement or more compared to naive use of the compiler (in this case the "cheating" is caching build artifacts so that most things don't need to be re-built at all; this is often devastatingly effective). I also suspect that the reason we don't see compiler developers regularly advertising order-of-magnitude improvements is that these improvements often span a decade or more of work. Clang 2019 may not be 10x faster than Clang 2018, but it very well might be compared to Clang 2004.

As for Rust specifically, a benchmark I saw a while back estimated that modern rustc compilation is about 3x faster than it was as of the 1.0 release in 2015. Other "cheating" approaches to reducing build times for Rust are incremental recompilation (which is kind of like built-in ccache), parallel codegen units (the original rustc, ironically, was single-threaded), a compilation mode that stops after typechecking (thereby avoiding codegen and LLVM entirely; this is invoked via `cargo check` and suffices for much of one's interactive development), and an alternative debug-build backend (Cranelift, still very WIP) which is optimized for extreme compilation speed (like a browser JS engine) rather than compilation quality (LLVM being largely designed for the inverse).

Indeed, the first step to making a fast compiler is "design your language from the start such that it can be compiled quickly" (which can be seen in the design of Go). I can think of a few things in Rust that might have been designed slightly differently (in the name of compilation speed) if the authors had the compiler-building wisdom they do today. But there are still plenty of opportunities to improve here.

C++ precompiled headers easily can give you your factor of ten.

Also, taking ‘its compiler’ literally: Turbo Pascal was over ten times as fast as earlier pascal compilers for the same hardware, which were multi-pass.

Isn’t that the entire point of higher order programming languages?

What are "higher order programming languages"? Did you mean "higher level programming languages"? If so, no, the point is that you can program more abstractly, almost always at the expense of performance.

EDIT: Downvoters, I'm terribly curious which of these points you disagree with? Do you believe higher level languages (like Python, JS, or even Go) are faster than lower level languages (like C, C++, etc)? Or do you believe that HLLs are actually faster (or meant to be faster) than LLLs? Or maybe you are downvoting because I don't know what a "higher order language" is?

The downvotes are because you have chosen to make an irritating pedantic point about word use when the meaning was entirely clear in context.

Wasn't trying to be pedantic; whatever is "entirely clear in context" is not entirely clear to me at all. Did he use a term that I'm unfamiliar with or did he use the term I'm familiar with in a way that is incongruent with my understanding?

Informally, there is no difference between higher order programming languages and higher level programming languages.

Thanks for the explanation; I hadn't heard that term before. :)
