Things that make Go fast (cheney.net)
136 points by davecheney on June 7, 2014 | 78 comments



Escape analysis, dead code elimination, and function inlining are standard optimizations taught in an undergraduate compilers course. Go is cool, but I wouldn't really cite those as justifications for why.


Yes for dead code elimination and function inlining, not so sure about escape analysis. The author acknowledges that, but there's a detail in Go: it does the function inlining at compile time (unlike e.g. Java JITs), but still manages to inline across compilation units (unlike C++, modulo LTO).

That's nice, and presumably what he wanted to point out. It's also nice that in Go, these things are very straightforward due to the overall simplicity of the system (unlike C++). The dead code elimination is just a supporting fact for why that's useful, and again works across compilation boundaries.

I'm not sure about your assertion about escape analysis; at least Java JITs only recently learned that trick, and are still pretty bad at it. C++ again suffers from cross-compilation-unit visibility; even if your LTO can detect an inlineable call, it's AFAIK not possible at that time to move heap allocations to the stack.
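To make the escape analysis point concrete, here is a minimal sketch of my own (not from the article). Building with "go build -gcflags=-m" asks the Go compiler to print its inlining and escape analysis decisions (the exact output format varies by Go version):

    // escape.go -- build with: go build -gcflags=-m escape.go
    package main

    type point struct{ x, y int }

    // The value never outlives the call, so it can stay on the stack.
    func sumOnStack() int {
        p := point{1, 2}
        return p.x + p.y
    }

    // The pointer is returned to the caller, so p must escape to the heap.
    func newPoint() *point {
        p := point{3, 4}
        return &p
    }

    func main() {
        _ = sumOnStack()
        _ = newPoint()
    }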

This is an interesting pattern in Go: the longer one looks at it, the more one understands that it's a whole bunch of good decisions in various subsystems coming together.


> C++ again suffers from cross-compilation-unit visibility; even if your LTO can detect an inlineable call, it's AFAIK not possible at that time to move heap allocations to the stack.

Sure it is. Why not?

C++ compilers don't usually do this because it doesn't help much—explicit memory management encourages people to not allocate unless necessary in the first place.


> Sure it is. Why not?

Do you have a reference for that? I'd expect this to be hard: at linking time you no longer have the C++ source, so it's much harder to make such decisions.


You don't need the C++ source, just the IR. LLVM already removes mallocs if they are unused:

http://lists.cs.uiuc.edu/pipermail/llvmdev/2010-July/033017....


> at least Java JITs only recently learned that trick, and are still pretty bad at it.

Did the author claim that Go was faster than Java? As far as I can tell, the JITs are still kicking the Go compiler's butt on "effectiveness."

> C++ again suffers from cross-compilation-unit visibility; even if your LTO can detect an inlineable call, it's AFAIK not possible at that time to move heap allocations to the stack.

Is the author really trying to explain why Go is faster than C++?

> This is an interesting pattern in Go: the longer one looks at it, the more one understands that it's a whole bunch of good decisions in various subsystems coming together.

One cannot validate these patterns yet, because Go is still slower than the languages whose performance techniques it supposedly innovates on.


Go is almost as fast as Java, which is to say Go is respectably fast. The team deserves kudos on the developer-man-hours metric: considering how new the language is, how few people have worked on it, and the results achieved, it counts as an accomplishment.

It's a small and elegant way to achieve a particular pragmatic result. That always deserves respect in my book. (The JVM deserves it too for slightly different reasons.)


There is a difference between gold stars and empirical results. Go is fine enough as a language, but it can't claim any performance crowns; you can't describe how Go is faster than Java when it's not.


Go is fine enough as a language, but it can't claim any performance crowns; you can't describe how Go is faster than Java when it's not.

Did I miss something in the reading? I didn't see the claim that Go is faster than Java. Speed isn't everything, and the rabble who don't have the wherewithal to know that, and to know that Java is quite a fast managed language, don't matter. Take it from an old Smalltalker. (Who remembers the days when Smalltalk VMs were kicking the JVM's butt, but people still thought it was 1985.)


That is the weird part of the essay...they talk about why Go is faster than Java, without mentioning that it really isn't.


[Citation Needed]

Just re-read it. Again, I didn't see anyplace where it's actually said that Go is faster than Java. I can see many places where someone touchy and defensive might take it that way, however. Either the apparent knee-jerk reaction is a deliberate attempt to troll by creating a false argument, or it's simply a cringeworthy false argument worthy of a Picard meme.


A blog post talking about why to use Go... for performance... and here is how it is better than Java at performance, even though it's not really better than Java. I don't care about Java, my language of choice is C#, but this smacks of the worst kind of anti-logic promotion. It seems to be quite endemic in the Go community also.


Again, is this a deliberate troll, or is it intellectual sloppiness? The post talks about why Go is fast. There are memory footprint comparisons. AFAICT, there are no language speed comparisons.

There is only text that a genuinely touchy and defensive person or a troll might misconstrue. Did someone steal seanmcdirmid's HN password?


If one talks about why Go is fast, and then compares techniques against other languages ANYWAYS, what the heck is the reader supposed to think? In an academic paper, this would be torpedoed right away: you aren't allowed to take the 5th if you want to beg the question.


If one talks about why Go is fast, and then compares techniques against other languages ANYWAYS, what the heck is the reader supposed to think?

That a short discussion of language feature implementation techniques is provided as a point of comparison with regards to different types of overheads?

In an academic paper, this would be torpedoed right away

Well, since I read "this" in the context of this sentence as an assertion imagined by you and unsupported by facts, then yes, you are obviously right. That would be torpedoed. Note that it's not a product of who you thought it was.


This article explains a few reasons why Go is fast. This article doesn't claim it uses compiler optimization technology unavailable to other languages, nor does it claim it does anything novel.

Go is fast, and this article does a pretty good job of explaining why. It makes no judgement on other languages, and all comparisons with other languages are only done to exemplify a particular feature of the Go implementation.

This article also doesn't do any Go advocacy, it doesn't claim Go is superior to other languages because of these reasons, it just offers an explanation for some observed performance characteristics to anyone who is interested. Don't attack the article on what it is not.

I also think it's very dishonest to single out a few things from the article which you say don't matter (because they are "standard") while ignoring the most important bits. Go's real-world performance comes from its value semantics. Yes, Java has one of the best JITs in the world, yet the very unoptimized Go compiler blows it away in real-life code just because of this.


The article is dishonest or uninformed; it does shady things like comparing boxed primitives in Java to unboxed primitives in Go, when Go has boxing and Java has unboxed primitives as well. What!?

Like this crap:

> Similar to Go, the Java int type consumes 4 bytes of memory to store this value. However, to use this value in a collection like a List or Map, the compiler must convert it into an Integer object.

...

> So an integer in Java frequently looks more like this and consumes between 16 and 24 bytes of memory.

> This is a Location type which holds the location of some object in three dimensional space. It is written in Go, so each Location consumes exactly 24 bytes of storage. We can use this type to construct an array type of 1,000 Locations, which consumes exactly 24,000 bytes of memory.

You could have constructed an array of 3,000 ints in Java for 24,000 bytes of memory plus a header, too.

Go has been around for 4+ years now. Their compiler implementation is roughly behind where Java was around 2000, but perhaps Google isn't throwing that many resources at it... or dynamic adaptive compilation really does have an intrinsic advantage. Who knows...


Unfortunately, Go's compiler is not as fast as it could be; most of the optimizations presented here were already made by compilers in the 80s.

The fact that modern compilers are really complex pieces of software that took dozens of years to write and improve to the state we are at doesn't help. Hopefully, switching to a compiler written in pure Go in Go 1.4 (IIRC) will allow code maintainers to benefit from Go's simplicity.


The comparison between Go and Java seems unfair, given they compare a primitive variable with an object... which has methods and a bunch of other things to increase its size (for good reason).

Sure, Go may be quick... but a JIT'ed Java program will run at native C speed... because it's been compiled down to native code at that point... (and most language performance comparisons I've seen pop up generally ignore this fact and measure "performance" by timing a run which includes the JVM firing up and executing cold/non-JIT'ed code... not real-world scenarios for high-performance code.)


> The comparison between Go and Java seems unfair, given they compare a primitive variable with an object... which has methods and a bunch of other things to increase its size (for good reason).

Honest question: will it be JIT'ed to the point where you no longer need 24 bytes to store an Integer in a List on a 64-bit JVM? Because if not, that comparison doesn't strike me as unfair, given that he explicitly mentions memory-bound situations. Furthermore, will an array of Integers be a tightly packed set of integers, or references to Integer objects? In general Go's equivalent to objects are structs, which seem to avoid this overhead of Java objects. Again, honest question; if Java can JIT all that away, awesome! :)
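For reference, a minimal sketch of the packed layout Go gives you here (the Location field names are my guess; the article only says it holds a three-dimensional position):

    package main

    import (
        "fmt"
        "unsafe"
    )

    // Roughly the article's example: a plain value type, no object header.
    type Location struct {
        X, Y, Z float64
    }

    func main() {
        fmt.Println(unsafe.Sizeof(Location{})) // 24 bytes

        // The array stores the 1,000 values inline, not 1,000 references.
        var locs [1000]Location
        fmt.Println(unsafe.Sizeof(locs)) // 24000 bytes
    }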

For the record, the people on the Go-nuts board are pretty adamant about microbenchmarks not being all that representative, and that people should give the JVM some time to do its optimisation (Robert Griesemer was one of the original programmers on the HotSpot compiler after all).


Well, the JDK's ArrayList, for example, will box ints into Integers, but there are primitive int lists that won't. With Go, AFAIK, the problem is exactly the same. Can you write, say, a heap or a red-black tree that can store either ints or strings, and that will embed the ints? Also, value types are making it into the JVM in Java 9. I don't think that will solve the packed-representation-in-a-generic-data-structure problem, but neither does Go. OTOH, Java does optimizations that Go never can. For example, Java can inline interface calls, which Go can't (and due to the way Go defines interfaces, interface method calls are slower than in Java even when neither is inlined).

Having said that, there are things about Go that I like better than Java: 1) no implicit primitive promotions (widening conversions), 2) a small standalone executable, 3) excellent startup time. The latter two make Go suitable for quick, small programs, like command-line tools etc.

However, there are more things I like about Java that Go doesn't have: 1) dynamic linking, 2) final variables, 3) concurrent data structures, 4) awesome monitoring and profiling tools, 5) generics, 6) state-of-the-art GC, 7) a polyglot VM (I also prefer Java's exceptions and explicit interface implementation, but these are minor considerations). So whenever I need serious, long-running, server-side software, I'd stick with Java over Go. Particularly now that Java has fibers (goroutines)[1] (I'm the main author).

[1]: https://github.com/puniverse/quasar


> Can you write, say, a heap or a red-black tree that can store either ints or strings, and that will embed the ints?

I'm not sure I understand this sentence. Do you mean a union type? Or a generic data structure? We all know Go doesn't have generics - that topic has been beaten to death, so let's not derail the discussion and just say this can be a valid reason for Go not fitting your use-case. But just for comparison, here's an implementation of a "generic" LLRB tree in Go using interfaces and runtime reflection:

https://github.com/petar/GoLLRB

I suppose you could hand-write it to fit the value type of choice? Does that answer your question?
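For readers who haven't seen it, the interface-based pattern looks roughly like this (a sketch of the general approach, not necessarily GoLLRB's exact API):

    package main

    import "fmt"

    // Item is the "generic" element: anything that can order itself.
    type Item interface {
        Less(than Item) bool
    }

    // Int adapts a plain int to the Item interface. Values stored behind
    // the interface carry an interface header, so they are not packed
    // the way a plain []int is.
    type Int int

    func (a Int) Less(b Item) bool { return a < b.(Int) }

    func main() {
        var x, y Item = Int(1), Int(2)
        fmt.Println(x.Less(y)) // true
    }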


> We all know Go doesn't have generics - that topic has been beaten to death, so let's not derail the discussion and just say this can be a valid reason for Go not fitting your use-case.

Parent's point was that Go will still box primitives in a generic collection class... just like Java does. So this isn't a special advantage of Go at all (it has nothing to do with Go not having type parameters). The only languages that don't box in generic collections are C++ and perhaps C# (it doesn't have to because of its template-like type parameter semantics, but still might, to reduce code duplication).


Right. Although the template method has its own issues. For example, once the element type size exceeds 64 bits, you can no longer assume an atomic write/read. This isn't a big problem considering that concurrent data structures don't usually rely on the write/read atomicity of the elements themselves (though they do rely on atomicity of modifying internal, structural data), but that's just something to think about.


Rust doesn't, and (I think) D doesn't.


Right. I should have qualified my statement with languages that I'm aware of.


You couldn't write a generic tree that packs the ints closely together, no. But you could write a non-generic tree that does.

Also, Dave was talking about Lists, which you certainly can have packed with ints trivially in Go. And every piece of software I've worked on uses approximately a million times as many lists as trees. Obviously, this is application-specific, but I think it's pretty fair to say that most programs will use a lot more lists than trees. If your application uses a lot of trees, Go is probably the wrong language for you, unless you're willing to make a tradeoff by using code generation to generate type-specific implementations.
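As a sketch of what "non-generic" means here: an int-only tree just stores the values inline in its nodes, with no per-value boxing (the node layout below is illustrative).

    package main

    type intNode struct {
        value       int // stored by value, directly in the node
        left, right *intNode
    }

    func insert(n *intNode, v int) *intNode {
        if n == nil {
            return &intNode{value: v}
        }
        if v < n.value {
            n.left = insert(n.left, v)
        } else {
            n.right = insert(n.right, v)
        }
        return n
    }

    func main() {
        var root *intNode
        for _, v := range []int{5, 2, 8} {
            root = insert(root, v)
        }
        _ = root
    }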


> But you could write a non-generic tree that does.

As you could in Java.

> Also, Dave was talking about Lists, which you certainly can have packed with ints trivially in Go.

What lists? Array lists? Linked lists? Skip lists? Concurrent array lists? Concurrent linked lists? Concurrent skip lists? Go gives you packed ints for only one kind of list.

> If your application uses a lot of trees, Go is probably the wrong language for you.

Or if it uses any advanced data structure that Go simply doesn't have, let alone concurrent data types (Java has such a rich collection of data structures, usually with state-of-the-art implementations). Or if you need dynamic code loading. Or embedding a scripting language.

I actually think Go is a very nice language, but Java is much more appropriate for serious, long-running, high-performance server-side apps.


Not that, but most Java native-code compilers (both JIT and AOT) do a relatively good job of escape analysis and transform those new allocations into stack allocations when possible.

Additionally, value types are already available as extensions in some JVMs like IBM's J9.

http://www.websphereusergroup.org.uk/wug/files/presentations...

Oracle is also pursuing their metacircular VM project Graal[1], formerly Maxine, which has an improved escape analysis algorithm.

https://wiki.openjdk.java.net/display/Graal/Graal+Partial+Es...

A big mistake many people make when talking about Java is equating Java with OpenJDK, when there are lots of other implementations to choose from.

[1] Graal is planned to eventually replace HotSpot and is being used for GPGPU support in Java 9.


> The comparison between Go and Java seems unfair, given they compare a primitive variable with an object... which has methods and a bunch of other things to increase its size (for good reason)

Nope, there's no "good reason" why java.lang.Integer is a full-blown Object with a heavyweight header and on-heap allocation. It doesn't need reference semantics or polymorphism, it can't be inherited, and it has a monitor that isn't terribly useful since the value is immutable, etc. The only reason Integer is a java.lang.Object is that this is a constraint imposed by the type system, so you can put the Integer in a collection or pass it to other kinds of code that manipulate "any object".


The comparison is completely fair; I don't think he's making the general statement that Go is faster than Java. He can legitimately make the claim that the overhead of a slice of ints is far smaller than an array of ints in Java.

You can add methods to an int in Go without any overhead by simply defining your own type that is backed by an int and then adding methods to that type. It still costs the same.
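For example, a minimal sketch (the type and method names are mine):

    package main

    import "fmt"

    // Celsius is backed by an int: same size, same representation, no header.
    type Celsius int

    // The method is defined on the value itself; no boxing is involved.
    func (c Celsius) Fahrenheit() int { return int(c)*9/5 + 32 }

    func main() {
        temps := []Celsius{0, 100}         // packed exactly like []int
        fmt.Println(temps[1].Fahrenheit()) // 212
    }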

Java performance for real-world, long-running, well-written applications is greatly enhanced by an excellent JIT that improves performance over time. Go can be faster for looping over a list of integers, while Java can win at branch prediction in complex code paths.


> He can legitimately make the claim that the overhead of a slice of ints is far smaller than an array of ints in Java.

No he can't. He was comparing a generic list of ints in Java with a non-generic list of ints in Go, when in reality you could reverse the comparison between the languages and draw the same conclusion.


You don't need <generic-equivalent-boxing> in Go to get the power of slices, and use library search/sort. Are you saying there's an equivalently powerful set of facilities in Java that doesn't use generics?


Yes, they are called...slices. There is no library implementation in Java, but it is completely implementable (I did the whole thing for Scala way back). Say:

    class Slice {
      int[] original;
      int begin;
      int end; // exclusive
      int get(int index) {
        if (index < 0 || index >= end - begin) throw new IndexOutOfBoundsException();
        return original[begin + index];
      }
    }
Perhaps Go slices are native and they can eliminate one extra bounds check? Or that you don't have to implement a custom slice class for N primitive/value types (though Java doesn't support value types yet)? Since the code is easily templated, you could just use SugarJ or some other macro system for Java.

It is ugly without generics though, and... beyond defining slices, Go has most of the same usability problems as Java (all operations on slices have to be duplicated!). C#-style generics fix these problems, of course... meaning you can have your slice of cake and eat it also.
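For comparison, a minimal sketch of what the built-in Go slice gives you: re-slicing shares the backing array, so no wrapper class is needed.

    package main

    import "fmt"

    func main() {
        original := []int{10, 20, 30, 40, 50}
        window := original[1:4] // a view of {20, 30, 40}, not a copy
        window[0] = 99          // writes through to the backing array

        fmt.Println(original[1])              // 99
        fmt.Println(len(window), cap(window)) // 3 4
    }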


Yes, they are called...slices. There is no library implementation in Java, but it is completely implementable...

Perhaps Go slices are native and they can eliminate one extra bounds check? Or that you don't have to implement a custom slice class for N primitive/value types (though Java doesn't support value types yet)? Since the code is easily templated, you could just use SugarJ or some other macro system for Java.

You definitely lost the conversation context, or is this some kind of debating maneuver? The point is that the memory overhead for the degree of functionality offered by slices in Go is quite small by design. Yes, you can implement it in Java, but various kinds of overhead (not just memory, btw) in Go are small by design. Then comparative implementation details are discussed to provide context for programmers who may not concentrate on language implementation.

You're also kind of barking up the wrong tree. Go is not really meant to replace Java. It's really meant to "replace" C, or more accurately bridge a gap a bit higher-level than C and lower level than "High Level Languages," particularly where concurrency is of key importance. In the contexts where it seems to try and replace Java or C++, it's really that a locus of language implementation tradeoffs wasn't immediately identified and languages with slightly too many features were used to fill the gap.

C#-style generics fix these problems, of course... meaning you can have your slice of cake and eat it also.

No one is arguing that C#-style generics don't fix this, or that certain things are less pretty without them, or that other languages can't do certain things more elegantly. In particular, yes, I know C# is very underrated. Camp Smalltalk attendees knew about the cool things about the Common Language Runtime about a year and a half before the public. (One camper declared that it basically meant "Smalltalk had won.") We hear you and believe you and you are right. Validated? Now please stop repeating yourself. That's simply not the key "point" which was made by the post and is not being understood in these threads. That point is not being made, nor even being addressed, by you.

Take it from an old Smalltalker: doing cool stuff yields 100X more benefit than comparing languages and tooting your own horn. Goes even more for objecting to others tooting theirs.

EDIT: When I say "no one would say that," I really mean, "no one reasonable would say that." As you and I well know, this precludes many programmers while they are in a holy language-war discussion.


What is Java's startup and JIT overhead? Go seems to be a good replacement for when you need a faster Python. For large, long running programs the JIT probably has better optimizations than the current Go compiler.


> What is Java's startup and JIT overhead?

Quite fast if you use an AOT compiler.

On the server side, it usually doesn't matter that much. And when it does, there are JVMs that cache JITed code.


This is true for PCs and servers. However, Java startup time on the Raspberry Pi is horrific.

I recently saw a small server go from 3 seconds startup on my PC to 4 minutes on a Raspberry.


That's pretty much because the CPU on the Pi is awful. I mean really bad. The CPU came with the SoC they could get their hands on, rather than being selected as optimal for a desktop/server role.


Not only that, but it's easier to find out what is going on inside a JVM than inside a static Go binary. It's not all about performance.


Absolutely, but it also has better performance than Go.


For some things. For many things they are pretty equivalent.

http://benchmarksgame.alioth.debian.org/u64q/benchmark.php?t...

And Java uses a lot more memory, which is what the article was talking about as adversely impacting execution speed.


It uses memory differently is probably a better explanation.



Beating Java in memory usage is like beating a tied-up, blindfolded turtle you gave cement shoes and dropped into a toxic vat in the 100 meters walking.

You know, if the turtle was disqualified ahead of time.

Now let's compare how many lines it takes to read two CSVs into structs, sort them by their normals and write their vector products out into a new file. I bet Java will do < 100 lines and Go will be 10 files and > 1000 lines. It'd be about as honest.


> What is Java's startup and JIT overhead?

Wouldn't that depend on the program?

For a tiny program (meteor contest) that only needs a fraction of a second on a 6 year old quad-core desktop, maybe 99%

For a tiny program (nbody) that needs a few tens of seconds on the same machine, maybe 0.4%

http://benchmarksgame.alioth.debian.org/play.php#java


Function calls aren't that slow in an OoO processor - they're perfectly predictable branches, so it can just start decoding from over there. There might be a cache miss, but there might also be fewer cache misses, or even better the CPU might skip decoding with a µop cache.

Really, the purpose of inlining is so inline functions can be specialized for their new context, which can easily make the total code size smaller. On x86, size/speed tradeoffs just don't happen like they used to.


That's not the whole story. There are other costs associated with calls such as spilling and imprecision of data flow analyses around a call site.


Things that make Go fast*

*compared to non-native languages like Python and Java.

Could people please stop calling their favourite language fast just because it beats an interpreted/VM language?


Not only that. Usually these comparisons cleverly leave out AOT compilers for said languages to make theirs look better.

In Java's case there are quite a few JVMs, many of those with AOT compilation to choose from, even implemented in Java itself.


For Java can you name any AOT compilers besides Excelsior JET and GCJ?


GCJ is dead.

Yes, CodenameOne, JamaicaVM, Aonix Perc and J9 all support AOT compilation besides normal JIT.

The Oracle HotSpot replacement project, Graal, allows for AOT compilation via SubstrateVM.

Then there is RoboVM for targeting iOS applications, with WP support getting added now.

Android is replacing Dalvik with ART, which does AOT compilation at installation time.

Probably a few more that I am not aware of.


Thanks for the info; interesting that the first four you mention are commercial products. Two you may find interesting: Avian VM, and XML VM, which at one point could translate JVM bytecode to C for compilation with GCC.


No, compared to nothing. Go is fast (or at least that's what Go programmers feel when they are using it). If they want to know why, they can read this article. This article is not Go vs. Java vs. C++ advocacy; don't try to make it look like it is. It makes no claim that Go employs better optimization technology than C++.


Is Go even faster than Java at this point? Last I saw, it wasn't quite there.

The only comparison with regards to Java was with the representation of integers, IIRC, which isn't really a native vs VM issue.


For pure processing speed, Go is pretty comparable to Java for code that does not heavily rely on generics.

http://benchmarksgame.alioth.debian.org/u64q/benchmark.php?t...

What Dave's presentation mentions is the memory... note the huge difference in the memory footprints of the programs. That's where Java loses heavily, and that can affect speed as well, which was his point.


> for code that does not heavily rely on generics

Does adding generics-heavy code move the needle in Java's favor?

As a Scala developer, your comment just jumped out at me because, even doing a lot of Akka this past week, there's hardly a line of code that doesn't instantiate or manipulate a generic.

Just curious. I'm sure Go will get faster. I'm sure it's better suited to certain problems (memory constrained especially).


No. If you don't use generics in Java, you get the same memory footprint as not using generics in Go (which doesn't have type parameters anyway). So the author is making an unfair apples-to-oranges comparison when the apples-to-apples comparison is quite obvious: one can write an int-list in Java just as easily as in Go.

Frankly, this is just strong intellectual dishonesty.


Frankly, this is just strong intellectual dishonesty.

If you're going to harp on rigor, then you need to consider, and then eliminate, the prospect of an honest mistake.


Intellectual dishonesty doesn't mean lying, it could just mean "honest mistakes" that fail to apply proper reasoning and comparisons.


Right now, you seem to be making an "honest mistake" with my referents. You seem (atypically) weirdly consistent with such honest mistakes in these threads. What is the lie, exactly?


I was surprised to see stack-check preambles mentioned here. Does that really happen on every function call? Or does it happen on a context switch? Usually stack-checking on function entry is considered something that makes code slow.


Yes, it happens on every function call. It costs 3 machine instructions. That is nothing.

There is no other "context-switch" other than the one triggered by this check (and other similar mechanisms), Go is cooperatively scheduled; all preemption is voluntary.


Wow, what can you do in three instructions and what happens when the stack check fails? Sounds intriguing, think I'll read up on that...


If the stack check fails it has to 'allocate' a new stack and copy over the old one and point to the new stack?


Let's take a look at Linux. Other systems are similar.

    ; go tool objdump -s main.main a
    TEXT main.main(SB) /private/tmp/a/a.go
    	a.go:9	0x400c10	64488b0c25f0ffffff	FS MOVQ FS:0xfffffff0, CX
    	a.go:9	0x400c19	483b21			CMPQ 0(CX), SP
    	a.go:9	0x400c1c	7707			JA 0x400c25
    	a.go:9	0x400c1e	e8ddf90100		CALL runtime.morestack00_noctxt(SB)
    	a.go:9	0x400c23	ebeb			JMP main.main(SB)
    	a.go:10	0x400c25	e8d6ffffff		CALL main.foo(SB)
    	a.go:11	0x400c2a	c3			RET
    	a.go:11	0x400c2b	0000			ADDL AL, 0(AX)
    	a.go:11	0x400c2d	0000			ADDL AL, 0(AX)
    	a.go:11	0x400c2f	00			?
On linux/amd64 we can use the Local Executables TLS access procedure. In particular, we use a negative offset from the FS segment register to get a TLS slot (our job is simpler because we are always the main executable).

    MOVQ FS:0xfffffff0, CX
We make use of two TLS variables, g and m (soon we will only use one); a pointer to g is at -16(FS). We access it in this first instruction.

g is an instance of struct G, see go/src/pkg/runtime/runtime.h:/struct.G. It contains many things, but it starts like this:

    struct	G
    {
    	uintptr	stackguard0;
    	uintptr	stackbase;
    ...
In particular the first word (at offset zero) is the stackguard, which indicates the stack limit (it is also used for voluntary preemption, but that doesn't matter here).

This instruction in the stack check preamble:

    CMPQ 0(CX), SP
Compares the current stack pointer with the stackguard. In most cases we have enough stack, so the next instruction just skips past the preamble to the real function code.

    JA 0x400c25
When we don't have enough stack, we call a function in the runtime (one of the runtime.morestack functions). This function allocates a new stack segment (from the heap). Currently we use contiguous stacks, so if we have complete type information in the current stack we can just copy the old stack to the new stack segment fixing any pointers as dictated by the type information, and then we switch the stack pointer.

If we don't have enough type information (or in previous Go versions), we use segmented stacks. We allocate a new stack segment, but we don't copy the stack; we just switch the stack pointer and we take care to be able to do the reverse operation when we return from the function.

Take a look at the next instruction after the call to runtime.morestack.

    JMP main.main(SB)
We just jump to the beginning of the function like nothing has happened. Then the algorithm repeats, but we won't fail the stack limit check again, so it will skip it. Why it jumps to the beginning of the function instead of just continuing in the body of the function is left as an exercise to the reader.

We used the Local Executables TLS access model here; sometimes we have to use the Initial Executable model. If we ever allow Go programs to be loaded by C programs as dynamic objects, we would have to use more complicated models.

On ARM we just use a register instead of using any form of TLS. On most systems Go binaries set the FSbase register to some value on the heap, but when we use cgo, or on platforms that don't support static binaries, we don't touch FSbase, as it was already set up by libc.

Functions that use little stack (under 120 bytes) can be excepted from this stack check.


Thanks, nice writeup.


Hey, as long as we're talking about Go performance: can we please, please get some kind of wide vector intrinsics (i.e., no cgo overhead) in a library, or at least aggressive compiler generation of vector ops that actually use AVX and the ARM NEON equivalent?

Right now peak floating-point performance isn't even within half of what it should be on a very recent CPU, and I'd love to be able to deploy Go that exposes machine learning models to a network interface.


Go should replace Java in Android Development


Go should replace Dalvik and the half-arsed Java runtime implementation on Android, yes, but I'd take a proper mature JVM over both on any device.


I am looking forward to the Google IO presentation about ART.

Looking at the official languages in both the iOS and WP 8.x SDKs, Google should at least give first-class support to all major JVM languages.


Not likely to happen.

- The ticket requesting support has been open since 2012;

- The Android team is very Java-biased and sees the NDK as something that was imposed on them;

- The Go team is still discussing how to add dynamic loading support to make it work on Android.


Rust and Python should, for speed, GC-lessness, memory efficiency and ease of programming.


Nitpick: goroutine context switch can also happen at function calls (when the stack is being enlarged).


I guess taking advantage of that could be tricky. If your function gets inlined...D'oh!


If your function is inlined (which happens at compile time), then there won't be any stack growth and the point is moot.


This is a form of pre-emptive scheduling. It doesn't happen when the stack size needs to increase; rather, it works by causing the stack check to fail. A bit of classic Dmitry cleverness: https://docs.google.com/document/d/1ETuA2IOmnaQ4j81AtTGT40Y4... http://golang.org/doc/go1.2#preemption

Anyway, I was just offering this scenario up as a bit of curious humour, where somebody might think they are providing an escape hatch but the compiler inlines their call, foiling their plans :)


Curious humor == Classic too clever by half.



