

An exercise in profiling a Go program - f2f
http://thornydev.blogspot.com/2015/07/an-exercise-in-profiling-go-program.html

======
alanctgardner3
In my experience, almost all overhead in Go programs will result from being
bitten by memmove, alloc and GC. Profiling any program that uses interface{}
and byte buffers will quickly expose you to how much the runtime loves to
reallocate and copy objects. In one instance I got 10x throughput improvement
by switching from a generic interface (take an interface{} and detect the
type) to using typed methods and hand-rolling a shim that called the
appropriate methods.

~~~
ihsw
Do people actually use `interface{}` so liberally? I've never encountered it
in the wild (except by newbies or as an example of what _not_ to do) and I've
always been under the impression that it's been heavily discouraged for a very
long time.

~~~
midpeter444
interface{} is heavily used in many of the Go std libraries. Particularly for
what I call "convenience methods" where you want to be able to pass in various
types and it will do a type switch for you and then determine what to do.

Part of what I was discovering in this blog post is that if you look under the
hood at these convenience methods, and you know in advance what type you are
passing (i.e., you don't need interface{}), you can often find the "direct"
call that takes a concrete type and likely get better performance out of it.

------
Sphax
Nice read. I love the pprof tool: being able to pinpoint which line of code
(or line of assembly code even) takes the most time is awesome.
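A minimal sketch of collecting a CPU profile with the standard `runtime/pprof` package; `busyWork` is a hypothetical stand-in for real code. After running the binary, `go tool pprof ./prog cpu.prof` opens the interactive prompt, where `top` lists the hottest functions, `list busyWork` annotates source lines, and `disasm busyWork` annotates the assembly.

```go
package main

import (
	"log"
	"os"
	"runtime/pprof"
)

// busyWork is a hypothetical CPU-bound function that gives the profiler
// something to sample.
func busyWork(n int) int {
	sum := 0
	for i := 0; i < n; i++ {
		sum += i % 7
	}
	return sum
}

func main() {
	f, err := os.Create("cpu.prof")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	if err := pprof.StartCPUProfile(f); err != nil {
		log.Fatal(err)
	}
	// Deferred calls run in reverse order: the profile is stopped and
	// flushed before the file is closed.
	defer pprof.StopCPUProfile()

	log.Println(busyWork(200_000_000))
}
```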

~~~
masklinn
While very neat, in my experience this has two issues (I'm using Python's
line_profiler, so pprof may not have the same problems):

1\. It adds significant overhead to execution speed, which means savings under
line_profiler and savings without it may be only distantly related. I'm not
sure how much overhead pprof adds.

2\. It requires knowing which functions should be line-profiled, because when
you have thousands or millions of LOC, you've got no idea what to
line-profile.

I've never found "usual" whole-program profilers to be great at the latter. It
may just be that I'm bad at reading them, but they never really click, and
when you've got a few "leaf functions" called from basically everywhere, they
end up having a low SNR. Recently, however, I've started using sampling
profilers and flamegraph representations[0] (or sunbursts, but there's no
standard tool for that one) and found them to be a significantly superior way
of identifying bottlenecks, with very high SNR.

[0]
[http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html](http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html)
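A minimal sketch of why flame graphs help with leaf functions called from everywhere: a sampling profiler records whole call stacks, and identical stacks are "folded" into `frame;frame;frame count` lines, the text format that `flamegraph.pl` from the linked page consumes. The stacks below are made up for illustration. A flat profile would just report `memmove` with 3 samples; the folded form keeps which caller each sample came from.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Hypothetical stack samples, root frame first.
var samples = [][]string{
	{"main", "handle", "parse", "memmove"},
	{"main", "handle", "parse", "memmove"},
	{"main", "handle", "encode", "memmove"},
	{"main", "gc"},
}

// fold collapses identical stacks into "frame;frame;frame count" lines.
func fold(samples [][]string) []string {
	counts := map[string]int{}
	for _, s := range samples {
		counts[strings.Join(s, ";")]++
	}
	lines := make([]string, 0, len(counts))
	for stack, n := range counts {
		lines = append(lines, fmt.Sprintf("%s %d", stack, n))
	}
	sort.Strings(lines) // map iteration order is random; sort for stable output
	return lines
}

func main() {
	for _, l := range fold(samples) {
		fmt.Println(l)
	}
	// main;gc 1
	// main;handle;encode;memmove 1
	// main;handle;parse;memmove 2
}
```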

~~~
jerf
"While very neat in my experience this has two issues (using Python's
line_profiler so it may not have the same problems)... Recently however I've
started using sampling profilers"

pprof is a sampling profiler. You can get it from Google for C, too. I'm not
sure if Go has a "port" or just wrote something with the same ideas, but,
well, I guess you could say I don't know precisely because it hardly matters.

~~~
masklinn
> pprof is a sampling profiler.

Good to know.

------
andrewmwatson
You should look into the GODEBUG env var
([https://golang.org/pkg/runtime/](https://golang.org/pkg/runtime/)) for ways
to get an idea of what all your goroutines are doing.

------
zupa-hu
Comments on the page seem to go into a black hole.

~~~
midpeter444
I'm the blog post author. I haven't disabled comments (I welcome them) and I
don't see any pending ones that I need to approve. I'm not super happy with
Blogger, so I'm going to blame it on that. There is one other comment on this
post, so the problem probably has to do with the "Comment As" choice and
whether you are logged in to that service.

~~~
zupa-hu
I tried posting before authenticating with my Google account; it sent me to
Google to log in, I did, I returned, and the comment was nowhere, but I _was_
logged in this time. So I went ahead and commented again; it didn't appear
either. Weird.

Btw, I read somewhere that it's best to use N-1 CPUs for actual work and spare
the last one for the scheduler, so that goroutines aren't constantly
reassigned to different threads (and thus CPUs), because that means lots of
memory movement. I suspect that was your 9th goroutine.

