
This is pretty strange if it's true; shouldn't a switch (especially one that simple) compile into a jump table? Otherwise what's the point? Does Go just linearly search through case conditions in a switch?

EDIT: Missed the link: https://groups.google.com/forum/#!msg/golang-nuts/IURR4Z2SY7...

Seems like a reasonable argument, but then again I don't see why they even bothered adding a switch given those constraints.



I think the Go compiler handles switches more like a series of "if else" statements (a few languages do that, in fact), but since Go supports first-class functions, you can structure the dispatch somewhat differently using those.

Ken Thompson does go into more detail in a link off the SO question I posted earlier.

[edit: i see you've already spotted that link. Sorry about that]


There is a general form to switch statements in Go that really is syntactic sugar over if/else chains.
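For example (a minimal sketch with made-up names), the expressionless form of switch evaluates boolean case expressions top to bottom, exactly like an if/else chain:

```go
package main

import "fmt"

// classify uses Go's expressionless switch, which behaves like
// "switch true": each case is an arbitrary boolean expression,
// tried in order -- the same semantics as an if/else chain.
func classify(n int) string {
	switch {
	case n < 0:
		return "negative"
	case n == 0:
		return "zero"
	default:
		return "positive"
	}
}

func main() {
	fmt.Println(classify(-5), classify(0), classify(7)) // prints: negative zero positive
}
```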

I get that serious emulators invest effort in making dispatch fast, and that a naive for/switch loop is not the fastest way to dispatch instructions, but it's nice for getting the emulator working. :)
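For the curious, here's roughly what that naive shape looks like in Go (a toy sketch with an invented two-opcode instruction set, not any real CPU):

```go
package main

import "fmt"

// A toy machine, just to show the naive fetch/decode/execute shape:
// fetch an opcode, switch on it, repeat.
const (
	opInc  = 0x01 // increment the accumulator
	opHalt = 0xFF // stop the loop
)

func run(program []byte) byte {
	var acc byte
	for pc := 0; pc < len(program); pc++ {
		switch program[pc] {
		case opInc:
			acc++
		case opHalt:
			return acc
		}
	}
	return acc
}

func main() {
	fmt.Println(run([]byte{opInc, opInc, opInc, opHalt})) // prints 3
}
```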


> I get that serious emulators invest effort in making dispatch fast, and that a naive for/switch loop is not the fastest way to dispatch instructions

If you're the kind of masochist that enjoys optimizing dispatch loops (hi, you're not alone), check this out: http://www.emulators.com/docs/nx25_nostradamus.htm One of the best resources out there for this sort of stuff.


Thanks, I hadn't seen that before. My go-to point for all this stuff has always been the papers by Ertl and others http://www.complang.tuwien.ac.at/forth/threaded-code.html


Dispatching a table of functions should have roughly similar performance to dispatching a large switch table, yes, but implementing opcodes as a table of functions allows you to restructure your bytecode interpreter in a way that significantly improves performance. Here's a much better explanation than I could give of the dispatch-per-opcode technique, how it works, and why it's faster (tl;dr it's all about branch prediction): http://eli.thegreenplace.net/2012/07/12/computed-goto-for-ef...
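To make the table-of-functions variant concrete, here's a minimal Go sketch (the vm type, opcode values, and handlers are all invented for illustration): the big switch is replaced by indexing an array of handlers and making an indirect call:

```go
package main

import "fmt"

// vm holds the interpreter state; handlers mutate it directly.
type vm struct {
	acc    byte
	pc     int
	halted bool
}

// opFunc is the signature every opcode handler shares.
type opFunc func(*vm)

// table maps opcode byte -> handler. Indexing this table and
// calling through it replaces the big switch.
var table [256]opFunc

func init() {
	for i := range table {
		table[i] = func(m *vm) { m.halted = true } // unknown opcode: halt
	}
	table[0x01] = func(m *vm) { m.acc++ }         // INC
	table[0xFF] = func(m *vm) { m.halted = true } // HLT
}

func run(program []byte) byte {
	m := &vm{}
	for !m.halted && m.pc < len(program) {
		op := program[m.pc]
		m.pc++
		table[op](m) // indirect call is the dispatch
	}
	return m.acc
}

func main() {
	fmt.Println(run([]byte{0x01, 0x01, 0xFF, 0x01})) // prints 2 (halts before the last INC)
}
```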

That article explains how to implement the technique for C programs using a "computed goto"/"labels as values" gcc extension. While Go lacks that feature, dispatching a table of functions as the last statement in each opcode implementation should yield a similar result. As long as Go supports "tail-call optimization" in the trivial case of functions that take no parameters and return nothing calling similar functions, it should work just fine. Googling suggests that Go does not support TCO, but at least this program didn't explode the stack:

EDIT: Yes, it does.

  package main
  import "fmt"
  var acc byte
  func main() {
  	fmt.Println(acc)
  	acc++
  	main()
  }


> I'm not familiar with the implementation of Go, but at least this program didn't explode the stack:

It does, it's just that Go is so slow printing to the console that it would take years to run out of stack space. If you redirect to null it will use up all your memory and swap space in a few minutes:

./main > /dev/null

In most other languages this same code would run for a short time and then abort after exhausting the stack. That is the better behavior: algorithms that use unbounded memory are exactly the ones where you must handle out-of-memory errors and set limits, and using too much stack space is an error that should be caught quickly, not postponed. Go, on the other hand, uses a growable stack, so the code you gave will use up all available memory and swap before finally crashing.

Go uses a growable stack so that programs can run many goroutines on 32-bit machines. This hurts performance: each function call checks whether the stack needs to be grown or shrunk, actually growing or shrinking it has overhead, and the CPU data cache is used less efficiently. It also complicates calling functions in other languages. It seems like a modern language should work best on 64-bit and make trade-offs for 32-bit, not the other way around.


You are mistaken. You don't need extra checks to determine when to grow the stack. You just need to leave some unmapped space after the stack and correctly handle a SIGBUS error. There is no extra overhead above what the operating system is already doing.

Growable stacks aren't about getting optimal performance on 32 bit architectures. That is explicitly a non-goal of Go. They're about minimizing memory consumption for goroutines which don't use very much stack space, which is expected to be most goroutines. You can't have hundreds of thousands of goroutines if you have a high fixed amount of memory per goroutine.

As for your argument that growable stacks make it harder to determine program correctness, it seems like nonsense to me. I could make the same arguments about heap space, but nobody thinks a low fixed limit on heap sizes is a great idea. If you want to test your program under low memory conditions, try mlocking a lot of memory and then running your program. Alternately you could try something involving cgroups or virtual machines.


Go uses a growable stack so that each goroutine's stack consumes less memory, not less address space.


On my system that Go program does not seem to be tail call optimized. Look at the memory consumption over time. You aren't seeing an OOM because Go performs stack growth as necessary.


Oops, you're right, I was looking at the wrong process in htop: the "go run test.go" process used to build and launch it, not the actual test process, which indeed burned up memory very quickly (but not fast enough to crash when I ran it for a minute earlier). I should have sorted by CPU usage instead of searching by process name.

Can anyone think of any other way to implement this kind of dispatch system in Go? If it isn't possible, I don't think Go deserves the status of being "a good language to write emulators in," as this is a pretty important technique for improving performance of CPU emulation.


If you want the fastest possible speed, what you need is a technique called dynamic recompilation. Essentially, you compile the 6502 opcodes (or whatever) to native x86_64 code, and execute that. It's considered the gold standard of emulation. Nowadays, it's often done by using LLVM to JIT-compile native code, then jumping to that.

By the way, I did enjoy reading your link about computed gotos. It sounds like what it comes down to is that the C switch statement does bounds checking, and the computed goto can avoid that. At the end of the day, though, no matter how many hacks you pile on, an interpreter will always have overhead over a dynarec.

N.B. Tail call optimization has its costs as well as benefits. Rust recently announced that it won't be supporting it. I don't know if the Go guys have made any statement about this, but I would imagine they might not consider it, for the same reasons. More here: https://mail.mozilla.org/pipermail/rust-dev/2013-April/00355...


I'm surprised that even ran. I've had issues in the past where go build refused to compile because it detected the potential for a recursive function call (this was in a table of functions where one func referenced another, so there wasn't any practical risk of getting locked in a loop even though the compiler detected a theoretical risk).

I'd probably rank that compile-time 'error' as one of the most annoying I've seen to date, because the code was actually fine; the compiler was just pre-empting a non-existent risk. So I had to rewrite a chunk of code for the sake of over-eager error catching. (This is also why I wish go build would just warn about one or two trivial issues instead of flat-out failing.)



