
Simple techniques to optimise Go programs - sjwhitworth
https://stephen.sh/posts/quick-go-performance-improvements
======
kevinconaway
Most of these techniques sacrifice readability so before implementing them,
you should really profile to make sure that the code in question is a hot spot
in terms of allocations or CPU usage.
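To make the "profile first" advice concrete, here is a minimal sketch using the standard runtime/pprof package (the file name, `work` function, and iteration counts are arbitrary placeholders):

```go
package main

import (
	"os"
	"runtime/pprof"
	"strings"
)

// work is a stand-in for the code you suspect is a hot spot.
func work() string {
	var sb strings.Builder
	for i := 0; i < 1000; i++ {
		sb.WriteString("x")
	}
	return sb.String()
}

func main() {
	// Write a CPU profile; inspect it later with `go tool pprof cpu.out`.
	f, err := os.Create("cpu.out")
	if err != nil {
		panic(err)
	}
	defer f.Close()
	if err := pprof.StartCPUProfile(f); err != nil {
		panic(err)
	}
	defer pprof.StopCPUProfile()

	for i := 0; i < 10000; i++ {
		work()
	}
}
```

For allocation hot spots, the same package exposes a heap profile via pprof.WriteHeapProfile, and `go test -bench` accepts -cpuprofile/-memprofile flags.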

That said, the example using sync.Pool is not quite right. The _New_ function
should always return a pointer type to avoid incurring an additional
allocation on _Get_ [0]

The code should look like:

    
    
      var bufpool = sync.Pool{
          New: func() interface{} {
              buf := make([]byte, 512)
              return &buf
          },
      }
    
      b := *bufpool.Get().(*[]byte)
      defer bufpool.Put(&b)
    

[0] see example in
[https://golang.org/pkg/sync/#Pool](https://golang.org/pkg/sync/#Pool)

~~~
shereadsthenews
The risk of write-profile-optimize is that you might write a big slow program
with a totally flat profile. If it doesn’t meet performance requirements at
that point, then what?

~~~
firethief
Does that ever happen?

~~~
ryanworl
Yes. This is the plague of modern software. It is all over the place. So much
software today spends all its time chasing pointers, using the cache poorly,
and branching wildly.

You won't see these aspects of poor design on a sampling profiler. You will
see it by running e.g. perf on Linux and seeing pitifully low IPC and cache
miss numbers.

~~~
the_duke
This is one of the reasons I really love Rust: it not-so-subtly nudges you
towards avoiding heap allocations by adding extra boilerplate and making them
obvious in type signatures (unless you hide them behind wrapper types).

This contributes to Rust programs generally having good performance
characteristics without spending time on optimizations.

~~~
tick_tock_tick
I find the exact opposite for Rust, as it often encourages boxing random
objects to satisfy odd lifetimes.

That being said, in almost all of these cases you can of course restructure
your program so you don't need to box the values, but if it's not
performance-critical, why bother? Repeat that a couple dozen times across a
large codebase and you have the same pointer-chasing issues.

~~~
the_duke
The NLL improvements of last year have improved things quite a bit. It's still
not perfect, but in my opinion it has reached a point where this is mostly an
issue for Rust beginners.

Some patterns of writing code will be really awkward to realize, but there are
usually "more rusty" solutions that you start to apply without even noticing.
Once you write code with the desired ownership semantics in mind, it's often
(relatively) frictionless.

~~~
pjmlp
It is still an issue for doing UI related coding, due to the way it is common
to design such kind of systems.

------
kstenerud
The JSON marshaler actually caches struct serializations so that you don't
incur the reflection cost on every run.

------
cafxx
> a previous version of this blog post did not specify that the New() function
> should return a pointer type. This avoids an extra allocation when returning
> through the interface{} type.

While this is good advice, it's not entirely correct. Even with the current Go
compiler there are ways to use sync.Pool with non-pointer values without
incurring the extra allocation, e.g. by using a second sync.Pool to reuse the
interface{} values, although I would not recommend it: it's slower and much
less maintainable.

> The safe way to ensure you always zero memory is to do so explicitly:
    
    
      // reset resets all fields of the AuthenticationResponse before pooling it.
      func (a *AuthenticationResponse) reset() {
          a.Token = ""
          a.UserID = ""
      }
    
    

I think this is safer in the face of modifications to the
AuthenticationResponse structure, and much clearer in its intent:

    
    
      // reset resets all fields of the AuthenticationResponse before pooling it.
      func (a *AuthenticationResponse) reset() {
          *a = AuthenticationResponse{}
      }

------
pcwalton
> During a garbage collection, the runtime scans objects containing pointers,
> and chases them. If you have a very large map[string]int, the GC has to
> check every string within the map, every GC, as strings contain pointers.

This would, of course, be much less of an issue with a generational GC, which
doesn't have to scan the entire heap on every collection.
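The effect is easy to observe: Go's GC can skip the buckets of a map whose key and value types contain no pointers, while a map[string]int must be scanned entry by entry. A rough sketch (the map sizes are arbitrary, and timings will vary by machine):

```go
package main

import (
	"fmt"
	"runtime"
	"strconv"
	"time"
)

// buildStringMap fills a map whose keys contain pointers (string headers),
// forcing the GC to visit every entry.
func buildStringMap(n int) map[string]int {
	m := make(map[string]int, n)
	for i := 0; i < n; i++ {
		m[strconv.Itoa(i)] = i
	}
	return m
}

// buildIntMap fills a pointer-free map, which the GC can skip entirely.
func buildIntMap(n int) map[int64]int64 {
	m := make(map[int64]int64, n)
	for i := int64(0); i < int64(n); i++ {
		m[i] = i
	}
	return m
}

// gcTime forces a collection and reports how long it took.
func gcTime() time.Duration {
	start := time.Now()
	runtime.GC()
	return time.Since(start)
}

func main() {
	const n = 2_000_000

	strMap := buildStringMap(n)
	fmt.Println("GC with map[string]int live: ", gcTime())
	runtime.KeepAlive(strMap)
	strMap = nil

	intMap := buildIntMap(n)
	fmt.Println("GC with map[int64]int64 live:", gcTime())
	runtime.KeepAlive(intMap)
}
```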

------
kchr
> In A/B tests, we tried delaying the page in increments of 100 milliseconds
> and found that even very small delays would result in substantial and costly
> drops in revenue. - Greg Linden, Amazon.com

Just curious, do vendors on Amazon get reimbursed for the drops in revenue
during tests like this?

~~~
dymk
They'll be imperceptible to individual vendors, as the loss is distributed
across thousands of them. "Substantial" is extrapolated from a small (but just
large enough to draw strong statistical conclusions from) A/B rollout.

Should Amazon charge vendors more when, as a result of these A/B tests, they
identify where to invest more engineering effort, eventually leading to far
higher revenue?

------
tedunangst
Replacing strings with ints would be more believable with a real world
example.

~~~
ryanworl
Dictionary encoding strings to integers is a common compression technique. In
GC'd languages it fakes out the garbage collector because your dictionary
codes are essentially pointers but not actually pointers from the GC's
perspective. You also have the benefit of potentially using a smaller integer
than a pointer if you know roughly the number of values you're encoding
beforehand.
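A minimal sketch of the technique (the `Dict` type and its method names are illustrative, not from any particular library): strings are interned once, and everything downstream stores dense integer codes instead of string headers.

```go
package main

import "fmt"

// Dict interns strings as dense integer codes.
type Dict struct {
	codes   map[string]uint32
	strings []string
}

func NewDict() *Dict {
	return &Dict{codes: make(map[string]uint32)}
}

// Encode returns the code for s, assigning a new one on first sight.
func (d *Dict) Encode(s string) uint32 {
	if c, ok := d.codes[s]; ok {
		return c
	}
	c := uint32(len(d.strings))
	d.codes[s] = c
	d.strings = append(d.strings, s)
	return c
}

// Decode maps a code back to its string.
func (d *Dict) Decode(c uint32) string {
	return d.strings[c]
}

func main() {
	d := NewDict()
	// Large collections can now store []uint32 instead of []string:
	// the GC sees no pointers in the encoded slice, and a uint32 is
	// half the size of a 64-bit pointer.
	encoded := []uint32{d.Encode("apple"), d.Encode("banana"), d.Encode("apple")}
	fmt.Println(encoded)              // codes, not pointers
	fmt.Println(d.Decode(encoded[2])) // "apple"
}
```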

~~~
weberc2
Does this hold for Go? The tracing is the only relevant cost and tracing is
very, very fast, so I would expect this to be negligible? Or am I
misunderstanding the hypothetical scenario?

------
astockwell
> Allocate capacity in make to avoid re-allocation

Am I the only one who has done this, used append() within a loop, and ended up
with a slice 2x as long as the desired length, with the first 1x of items all
empty?

I quickly caught it and fixed the approach (instead of append, I used the `i`
as the insertion index into my new empty slice which already had capacity
allocated), but the ease with which that subtlety could be overlooked turned
me off to this approach unless I profile and really find it's a hot path.

Edit to add: I tried his code, and it resulted in the new slice being as
expected.

~~~
capo64
The make function takes the type, length and capacity. If you don’t supply
capacity, it defaults to length. So if you want to use append, do something
like: make([]int, 0, length)

That will allocate a slice that can fit length ints but has a length of 0 (so
append starts at index 0).

You can call len(slice) and cap(slice) to see the difference; append will
insert an element at index len(slice), growing the capacity if necessary.

~~~
astockwell
Ahh thanks for the great explanation! Love HN.

------
0815test
Missing technique: rewrite the performance-critical parts of your Go programs
in a different language, and use Cgo to make them accessible to Go code via
the standard C ABI. K.I.S.S.

~~~
justinclift
Hopefully that's /s, as debugging CGo stuff is a royal pita. ;)

~~~
vishvananda
also the overhead of calling out to C can actually be quite high:
[https://www.cockroachlabs.com/blog/the-cost-and-complexity-of-cgo/](https://www.cockroachlabs.com/blog/the-cost-and-complexity-of-cgo/)

~~~
0815test
While some overhead and increased debugging effort may be inevitable, it's a
mistake to blame them on the use of Cgo itself; the root cause is Go's custom
ABI and user-level-threading ("goroutines") model. And I really have to
dispute OP's claim that the techniques they mention do not "require
significant effort, or large changes to program structure" of their own. In
many ways, rewriting some portion of the code can actually be simpler.

~~~
jerf
Well, while it may be true that the "blame" for why CGo is slow is up for
debate, it's just a fact that it's rather slow to use.

Go isn't unique in this; there are many languages with runtimes that require
similarly intensive amounts of copying and context conversion before C code
can run on whatever the data is. However, most of those languages, like
CPython, are themselves slow enough that the penalty isn't as noticeable
against the general background noise. (In general CPython requires a lot more
copying too; Go is closer to C struct and array semantics and can more often
get by with some form of memcpy, the internals of Python look nothing like
that.) Go is fast enough that it's much easier to get into scenarios where in
a tight loop you're spending 90% on CGo overhead if you're not careful with
data flow. For those languages that have to copy a lot out of their runtime
and are also fairly fast, they'll face the exact same issues. It's not really
a "Go" issue per se, but the challenges faced by any language that wants a
runtime significantly different from C.

(One of the miracles of Rust is building an environment and runtime that isn't
stuck on C's limitations but at the same time can still speak to C really,
really cheaply. Plenty of languages have one or the other of those, but there
aren't very many that have both. I'm not sure there's any other language that
has threaded that particular needle so cleanly.)

~~~
weberc2
Well said. Also worth noting that most of the languages like CPython are very
slow because many optimizations are now prohibitively difficult given the
"easy C-interop" constraint.

~~~
firethief
Isn't that more because of the choice to support C interop by giving C code
access to interpreter internals? That's the easiest option to implement, but
not the only way to get easy-to-use C interop.

~~~
weberc2
That could be. Although I'm not familiar with any languages that are fast and
have a GC and easy C interop. At least none of the major VM languages or (non-
embedded) scripting languages. Maybe D or some other similarly non-mainstream
language.

~~~
taeric
Lisp? Common Lisp's FFI facilities are good, from all I've heard.

~~~
weberc2
Could be. I’m not that familiar.

------
innocentoldguy
For me, the best way to optimize a Go program is to write it in Rust.

~~~
dang
Please don't do this here.

