
Performance patches in Go 1.11 - valyala
https://docs.google.com/presentation/d/1tpeJZFObkeick4CF-mx0L3CeCgvT15B96aJeRpxEPcE/edit?usp=sharing
======
obl
Cool stuff. It's nice to have a somewhat mainstream language using its own
backend instead of LLVM. You can in fact get pretty far by tailoring your
compiler to your specific needs: you end up with much less code, and it's also
easier to make it fast.

You do cut yourself off from the more advanced optimizations, simply for man-
hour reasons. I'm guessing it's a trade-off they're willing to make.

I have to say that I really dislike the kind of large pattern matching they
added for the hash table clear. C compilers do that as well, e.g. for
memset/memcpy or bit-rotate instructions.

IME it's really brittle and would be more beneficial as some kind of lint
("you could replace this bunch of code with a call to clear()").

It makes the code simpler and won't inexplicably slow down next time someone
does a tautological refactoring.

~~~
dilap
Go actually doesn't have any other way to clear a map!
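For context, a minimal sketch of the idiom in question: the delete-in-range loop is the only way to empty a map in place, and as of Go 1.11 the compiler pattern-matches this exact loop and replaces it with a single runtime map-clear call.

```go
package main

import "fmt"

// clearMap empties m in place. This delete-in-range loop is the
// pattern the Go 1.11 compiler recognizes and rewrites into one
// runtime map-clear call instead of per-key deletes.
func clearMap(m map[string]int) {
	for k := range m {
		delete(m, k)
	}
}

func main() {
	m := map[string]int{"a": 1, "b": 2, "c": 3}
	clearMap(m)
	fmt.Println(len(m)) // 0
}
```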

~~~
stcredzero
_Go actually doesn't have any other way to clear a map!_

Which, actually, is brilliant design! Now, I can find all map clearings with
one operation. (I need to implement that syntactic search/rewrite tool I miss
from Smalltalk.)

~~~
apta
How is not having map.clear() a brilliant design?

~~~
stcredzero
In a large project, every possible way of doing a given thing will show up
somewhere in the code base. Reducing that number to one means that searching
for all occurrences takes a single operation, and if you are working on a
large project, searching for usages/senders/occurrences is one of the most
common tasks you do. In fact, using the same kind of escape analysis used for
"Extract Method" refactorings, you can even catch clearing loops interspersed
with other code in the for-range body.

A 50%+ reduction in programmer work is not to be sneezed at.

------
morrbo
These are some really cool optimizations. It's nice that they're letting these
in. It has always kind of bugged me that they won't do some basic optimization
of their util functions. For example, ReadDir sorts by filename by default;
removing one line of code can speed it up ~800% on large directories IIRC, but
they won't honour the pull requests. The argument is "it's a util function; if
you don't like it, don't use it." Whilst this is true, IMO the stdlib's common
functions should be as quick as possible, with anything extra bolted on
(within reason - sorting names being an obvious one). I wouldn't expect to
have to read the actual source code in order to optimize, but I guess I'm
wrong here ¯\\_(ツ)_/¯

~~~
grey-area
They take the Go 1 compatibility promise quite seriously and do try not to
change existing behaviour, which is a good thing. I'm quite happy about that,
because otherwise we'd see far more breaking changes, and it'd be painful to
develop large apps on top of the stdlib. Perhaps in Go 2 they'd be open to
changing it or moving ioutil out of the stdlib, but I'm quite sure someone
somewhere depends on the ordering. Besides, you're complaining about a 10-line
function; you could just copy it out and remove the offending line:

    
    
    func ReadDir(dirname string) ([]os.FileInfo, error) {
    	f, err := os.Open(dirname)
    	if err != nil {
    		return nil, err
    	}
    	list, err := f.Readdir(-1)
    	f.Close()
    	if err != nil {
    		return nil, err
    	}
    	return list, nil
    }
    

I much prefer a language and stdlib that stays stable over years, even at the
cost of some nice new features/improvements - the place for code to move fast
and break things (if you like that sort of thing) is much higher up the stack.

~~~
morrbo
That's exactly what I did. The problem I was trying to address is that you
shouldn't have to read into the stdlib source to optimize your code. Adding a
ReadDirNoSort (as suggested in the initial 2012 pull request) wouldn't hurt
anyone.

~~~
Scaevolus
There are more optimizations than that you could probably use if you want a
fast ReadDir.

Golang's ReadDir reads directory entries, stats all files, and sorts the list.
If all you want is the directory entries directly, try godirwalk (also much
faster for walking!):
[https://godoc.org/github.com/karrick/godirwalk#ReadDirents](https://godoc.org/github.com/karrick/godirwalk#ReadDirents)
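To illustrate what skipping the stat and sort buys you, here's a stdlib-only sketch (not from the slides): `Readdirnames` returns just the entry names from the directory read, avoiding both the per-entry `Lstat` calls and the sorting that ioutil.ReadDir does. godirwalk goes further by exposing the raw dirents, including file type, from the same syscall pass.

```go
package main

import (
	"fmt"
	"os"
)

// listNames returns directory entry names without the per-entry
// stat calls and the sort that ioutil.ReadDir performs. On large
// directories this avoids most of ReadDir's cost.
func listNames(dirname string) ([]string, error) {
	f, err := os.Open(dirname)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	// -1 means "read all remaining directory entries".
	return f.Readdirnames(-1)
}

func main() {
	names, err := listNames(".")
	if err != nil {
		panic(err)
	}
	fmt.Println(len(names))
}
```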

------
kodablah
The ModInverse and GCD improvements will be very welcome in a demo I just
wrote for trustless card playing[0].

I'm beginning to wonder whether there's value in "lifting" binaries to LLVM
IR, applying all of its optimizations, and recompiling. McSema does this[1],
albeit you have to set up IDA and whatnot. Has anyone applied this to Go
programs and measured the benefits for their specific use cases?

0 - [https://github.com/cretz/go-mental-poker](https://github.com/cretz/go-mental-poker)

1 - [https://github.com/trailofbits/mcsema](https://github.com/trailofbits/mcsema)

------
shawabawa3
They mention VictoriaMetrics, a time-series metrics database - is it open
source? I couldn't find any reference to it on Google.

~~~
ckocagil
It even claims it's the fastest time series db. Is it faster than kdb+? I'm
skeptical.

------
MaxBarraclough
Looking at Slide 7: doesn't LLVM do function inlining, presumably with quite
well-tuned heuristics? What's the advantage of having the frontend side of the
compiler do it too?

Or am I wrong, and does LLVM not do inlining?

 _Edit: I'm a silly person, Go doesn't use LLVM. I've been wrong about that
for some time!_

~~~
masklinn
LLVM does do inlining (it's the most important optimisation you can implement,
at least for native codegen), but the reference Go implementation (gc) does
not use LLVM, the entire toolchain is custom.

On the one hand, that allows very fast compilation and codegen; on the other,
the optimisations are fairly limited. It also doesn't benefit from
improvements made by other users, but it doesn't saddle you with their
issues either[0].

There's also a Go implementation on top of GCC (gccgo); however, while it
generates high-quality code, historically it wasn't really better than gc due
to lacking escape analysis (so it did a ton more heap allocations than the
reference implementation) and having a lackluster GC.

[0] Rust has had its own LLVM fork for as long as the project has existed,
both for fixing issues it hit which hadn't yet been fixed upstream and for
implementing features it needs which have not been or cannot be upstreamed.
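To make the escape-analysis point concrete, here's a small sketch (my own illustration, not from the thread): gc keeps values on the stack when it can prove nothing outlives the call, and only heap-allocates when a reference escapes. A compiler without this analysis would heap-allocate in both cases.

```go
package main

import "fmt"

// sum's slice header and local accumulator stay on the stack:
// escape analysis proves nothing here outlives the call.
func sum(xs []int) int {
	total := 0
	for _, x := range xs {
		total += x
	}
	return total
}

// leak's local escapes to the heap, because a pointer to it is
// returned and so must outlive the stack frame.
func leak() *int {
	n := 42
	return &n
}

func main() {
	// You can inspect the compiler's escape decisions with:
	//   go build -gcflags=-m
	fmt.Println(sum([]int{1, 2, 3})) // 6
	fmt.Println(*leak())             // 42
}
```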

~~~
pjmlp
I am always torn about such approaches: on one hand it means throwing away
decades of code-optimization work.

On the other hand, in a world where devs only believe in things once others
actually use them in production, it is nice to see that their toolchain is
self-sufficient, without the usual C or C++ dependency.

Many language designers opt for implementing their toolchains in C or C++
purely for convenience, and that's okay.

However, it creates the wrong perception among those without compiler
development knowledge: that C and C++ are the only way to implement a
toolchain.

~~~
masklinn
Seems to me these are orthogonal concerns: LLVM is in C++, but the question I
answered was about using LLVM versus implementing a custom toolchain, not
implementing a custom toolchain in C++ versus some other language.

~~~
pjmlp
Agreed, maybe I expressed myself badly, but that was still kind of my point.

------
agnivade
The newly introduced prove pass is probably the biggest game changer, and it
provides the groundwork for new optimizations to be built on top of it.

~~~
giovannibajo1
Prove was already there; what we did was improve it to eliminate even more
bounds checks.

PS: I'm the author of those prove patches.
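To show what bounds-check elimination looks like in practice, here's a sketch of the well-known early-check idiom used throughout the stdlib (e.g. in encoding/binary); this is my own illustrative example, not one of the patched cases:

```go
package main

import "fmt"

// readUint32 decodes a little-endian uint32. The `_ = b[3]` line is
// a bounds-check hint: after it, the prove pass knows len(b) >= 4
// and can drop the bounds check on each of the four loads below.
// Remaining checks can be inspected with:
//   go build -gcflags="-d=ssa/check_bce/debug=1"
func readUint32(b []byte) uint32 {
	_ = b[3] // bounds check hint to the compiler
	return uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16 | uint32(b[3])<<24
}

func main() {
	fmt.Println(readUint32([]byte{1, 0, 0, 0})) // 1
}
```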

~~~
hectormalot
This is what I like about HN :)

How did you prioritize which BCEs to include? (Or were these all the ones that
had a ticket?)

~~~
giovannibajo1
I wrote a Nintendo DS emulator in Go (see my GitHub) and ended up seeing lots
of missing optimizations in the generated code. I opened a few tickets and
then focused on BCE. I realized that one of the main problems was the pass's
missing transitivity, so I ended up writing a largish piece of code
implementing a partial-order set (see the poset data structure) and then
picking a few of the low-hanging fruits it opened up (searching for open
tickets on GitHub issues). These were my first contributions to the SSA
passes, btw.

------
bitmapbrother
This was my favorite part:

>This is fun and is easier than optimizing gcc / llvm, since the majority of
the Go compiler, runtime and standard library is written in Go

------
benhoyt
That's great! I'm curious how much these optimizations slow down the compiler,
if at all? I remember something about Go only adding optimizations if the
speedup they gave to compiled binaries meant that compilation itself was sped
up too (but I don't have a source for that right now, and doubt it's generally
true).

------
ax00x
Is there a recording of this presentation by any chance?

~~~
valyala
Unfortunately no. The presentation was given at the Kyiv Go Meetup -
[https://www.meetup.com/uagolang/events/251712321/](https://www.meetup.com/uagolang/events/251712321/)
and it was in Russian. I didn't notice anyone recording during the
presentation.

~~~
ax00x
Too bad, thank you regardless :)

------
georgewfraser
Is there a good guide to the go compiler anywhere, that outlines all the
stages from source code to assembler to binary?

~~~
valyala
See
[https://github.com/golang/go/tree/master/src/cmd/compile](https://github.com/golang/go/tree/master/src/cmd/compile)
and
[https://github.com/golang/go/blob/master/src/cmd/compile/int...](https://github.com/golang/go/blob/master/src/cmd/compile/internal/ssa/README.md)

------
isaachier
IIRC the mapaccess improvement had to be rolled back. There was a bug in it.
Not sure if it will make this release.

~~~
valyala
Not yet. There is a fix for the bug -
[https://go-review.googlesource.com/c/go/+/120255](https://go-review.googlesource.com/c/go/+/120255).

------
cup-of-tea
Is this really the best format to post this in?

~~~
MaxBarraclough
Imperfect, but not so bad.

If the slides are available today, and are viewable in a web-browser, I don't
mind too much. Presumably it's either this or nothing.

If there's already a finished blog post in conventional format, then sure, no
sense linking to a slideshow.

~~~
masklinn
> Imperfect, but not so bad.

It's pretty bad. More so because Google Slides has a PDF export, but despite
every browser having a built-in PDF viewer, it forces a download.

~~~
MaxBarraclough
But what's the problem?

It's JavaScript-heavy and all that, but it works fine, no?

~~~
masklinn
That the slides are very content-sparse. Being able to display the deck as a
single continuous document, with several slides per screen, would make what
little information it contains much easier to go through (and go back to if
necessary).

