A lot of snarky and petty comments here, I must say. Regardless of how the performance improvement was achieved, I think it was quite interesting to see that you could convert big chunks of C code to Go code with minimal effort and still get good performance.
I think too many here read the story as some kind of competition between Go and C, when it should be read more as a report on how well the conversion process went. For anyone interested in going down this route, surely this is useful information.
We don't have unlimited time in this industry, and thus if something like this can be done with minimal effort while retaining the expected performance, then I would say that's a good thing.
Stop nitpicking as if the author tried to have the final word on which language is faster.
> A lot of snarky and petty comments here I must say.
"Be kind. Don't be snarky. Have curious conversation; don't cross-examine. Please don't fulminate. Please don't sneer, including at the rest of the community." https://news.ycombinator.com/newsguidelines.html
I hoped it was clear in the blog post: this is a very specific case where I only needed to convert coordinates to an h3 index, and with this specific goal the transpiled version outperforms the cgo version.
The comment "probably due to Go runtime scaling on multiple cores" was me still seeing very close performance with GOMAXPROCS set to 1, but observing the process behaving differently than the pure C version.
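For anyone who wants to repeat that check, the single-core run looks roughly like this (the helper below is a placeholder, not the exact benchmark from the post):

```go
package goh3bench

import (
	"runtime"
	"testing"
)

// fromGeo is a stand-in for the transpiled helper (the real one lives in
// github.com/akhenakh/goh3); only the shape of the benchmark matters here.
var fromGeo = func(lat, lng float64, res int) uint64 { return 0 }

func BenchmarkFromGeoSingleCore(b *testing.B) {
	runtime.GOMAXPROCS(1) // same effect as running with GOMAXPROCS=1 in the env
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		_ = fromGeo(48.8583, 2.2945, 9) // example coordinates and resolution
	}
}
```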
The fact it was not thread safe has been discussed already on Reddit and is not relevant nor an issue when used in a batch scenario like this.
ccgo is awesome, but feel free to discuss something else ;)
Heh, I wonder if anyone has tried transpiling the C generated by Chicken Scheme to Go.
Would be interesting to see how (horrifically?) the garbage collector would fare in performance once it's transpiled to a garbage collected language like Go.
What would also be nice is an ML or Lisp transpiled to Go. Go has a nice compiler with a lot of money thrown into it, but it's locked into one syntax, and set of language features that isn't optimal for every use case, so it's a good target for transpilation.
It would be fun to see Go with a terser syntax, better type inference, nicer error handling, etc.
Something like that would be amazing, like the godsend that Scala was for the JVM back when Java was very, very imperative and totally stagnant, though Scala didn't/doesn't transpile to Java, but compiles to JVM bytecode directly. I don't think the same would be possible for Go, which is a shame, but AFAIK there is no such intermediate to target. And given how limited Go itself is, I wouldn't be all that optimistic that a transpiling approach could support many advanced features without tanking performance because you just can't model them efficiently in Go at all, even if you limit enforcement of most guarantees to the transpiler.
I'm not who you are responding to, but: optimizing recursion (e.g. tail-call elimination), loop fusion, etc. Functional languages (at least Haskell) do plenty of optimizations not generally possible in imperative code.
Tail call elimination is possible by converting the tail-recursive function to a loop. In fact Go could support it, but the team doesn't want to do that.
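For anyone unfamiliar with the term, a minimal sketch of the transformation (made-up functions, nothing to do with the article's code):

```go
// sumTo is tail-recursive: the recursive call is the last thing that happens,
// so a TCE-capable compiler could reuse the current stack frame.
func sumTo(n, acc int) int {
	if n == 0 {
		return acc
	}
	return sumTo(n-1, acc+n)
}

// sumToLoop is the loop such a compiler would effectively rewrite it into;
// in Go today you have to do this rewrite by hand.
func sumToLoop(n, acc int) int {
	for n != 0 {
		n, acc = n-1, acc+n
	}
	return acc
}
```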
I agree that everything Haskell does would be hard, but everything e.g. Clojure does should be possible
Compiling functional languages to other high-level languages generally has pretty awful performance compared to simple native compilation. Even compiling to C has its limitations.
The fact that Go has a nice compiler doesn't really help, because the code generated by the functional compiler will have usage patterns that are very different from what the Go compiler is designed around.
I’ve used all of those except SML. Go is much easier to set up and get productive in, in my opinion. And if you care about compilation speed, you can forget about F#.
> the old word already perfectly describes translating from one formal language into another
The word “compiler” actually doesn’t do that (like, say, “translator” would). “Compiler” originally referred to something more like what is today called a linker. It “compiled” (put together) a set of subroutines (in assembly/machine code) into a combined executable.
You were arguing that the word “perfectly describes translating from one formal language into another”. The word doesn’t do that, as it’s not descriptive in that sense.
Furthermore, translating from (say) JSON to XML is also “translating from one formal language into another”, but that’s not an example of what we mean by “compiler”.
So what do we mean? That’s actually not easy to define. Defining it as translating from one programming language into a different programming language isn’t quite correct, because machine-code binaries aren’t a programming language. Defining it as translating from a programming language into a different representation preserving the program semantics is also not quite correct, because e.g. pretty-printing to HTML would arguably fit that definition.
Clearly there’s some general definition of what we mean by “compiler”, even if I’m failing to find a precise and correct wording for it here. But there’s also the more narrow meaning of “translating from a programming language into object code”, which is what is usually meant in the majority of cases. Because we don’t have a separate word for that narrower definition, there is some ambiguity. The word “transpiler” resolves the ambiguity in one direction. In the other direction, the ambiguity is usually resolved by assuming that “compiler” means “to object code” by default when the context doesn’t specify otherwise.
> machine-code binaries aren’t a programming language
Machine-code binaries aren't (they are programs), but ISA of a CPU is.
I'd plug in a requirement of Turing-completeness for the source and destination language, and at that point the "semantics-preserving" variant makes perfect sense to me. Although canonically textbooks and Wikipedia don't seem to require that, so I'm fine accepting a definition not requiring Turing-completeness, just preservation of semantics.
It makes sense to drop Turing-completeness. Many interesting languages are not Turing complete.
Perhaps the most practical example: just-in-time compilation of regular expressions to native code via LLVM. Regular expressions are not Turing complete.
Another example: take Agda as a source language. Agda is deliberately not Turing-complete, but can express almost any program you might be interested in. (Basically, Agda only allows you to express terminating programs. That's why it's not Turing complete.)
> Furthermore, translating from (say) JSON to XML is also “translating from one formal language into another”, but that’s not an example of what we mean by “compiler”.
I'd be perfectly happy to describe that as a compiler.
Transpiler is a pretty old word too. My understanding is that it's a compiler whose output is meant to be human readable; readability is more about whether it is an explicit goal, since strictly speaking all compiler output is readable by some humans.
I don't think anyone would say Unity's il2cpp is a transpiler even though it compiles .net bytecode into C++: producing readable C++ is a non-goal for il2cpp.
> Also why do people invent a new word 'transpile'
Not this ignorant myth again: it isn't a new word, it's been in use since the mid-60s. A transpiler is a form of compiler from high-level language to high-level language. A compiler didn't even originally translate, but just link.
The same way ‘navy’ is a kind of blue but we can say navy and add a little more specific information. Nobody rails against navy saying ‘but it’s just a shade of blue!’
> Also why do people invent a new word 'transpile', when the old word already perfectly describes translating from one formal language into another?
The answer is... the old word ('compiler') didn't mean what it does today, back in the 60s, when 'transpiler' was coined. Compiler with its current meaning isn't a much older word than transpiler - they're near contemporaries (50s and 60s). Before that a compiler meant a linker.
Ok, that makes sense. I had thought 'compiler' was from the 1950s and already had its modern meaning.
Of course, if we had more rational vocabulary, we'd call them 'translators' instead. (In fact, that's what German does. Perhaps they coined their corresponding term a bit later, when things had already shaken out a bit more.)
That is absolutely not true. Just because early compilers acted more like linker/loader doesn't mean they used the word compiler to mean "linker". When the term was coined it absolutely meant translating mathematical formulas into machine code. Compiler very much had the same meaning it has today.
Sorry that's not my understanding of the history - early compilers were more like what we'd call today template compilers - linking blocks of pre-defined machine code together.
Yes, that is very much what they were. However, they used the term compiler in very much the same way it's used today. It didn't mean something different. The concept of the "compiler" was to translate mathematical symbols (or predefined machine code representing that math) and later English like words into programs a machine could execute.
The selling point of Go is readability, feature minimalism and code that might be boring to write but gets you readability in return. Putting a Lisp on top of that throws these away.
I'm not saying there's nothing left - channels, maybe the standard library. But putting a Lisp on top of Go feels like going strongly against the grain.
But Go is not that readable. In fact, across all languages I know, I would rank Go towards the bottom end of readability, mostly because of its insane error handling overhead. A lot of function bodies are like reading a book where every sentence is followed by three that have no meaning.
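To make "overhead" concrete, here is the typical shape (names made up for illustration); roughly half the lines are error plumbing:

```go
package config

import (
	"encoding/json"
	"io"
	"os"
)

type Config struct {
	Addr string `json:"addr"`
}

// loadConfig opens, reads and decodes a JSON config file; every step gets
// its own three-line error check.
func loadConfig(path string) (*Config, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	raw, err := io.ReadAll(f)
	if err != nil {
		return nil, err
	}

	var cfg Config
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return nil, err
	}
	return &cfg, nil
}
```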
The fact that GoLand automatically collapses error handling blocks is solid evidence that enough developers find Go's error handling blocks invasive.
I agree. My point is that you can't objectively define readability, so it will always be a matter of preference. I've met my share of Go developers who think it is very readable. I personally can't stand it.
You don't understand 'readability' the same as the language creators.
More text to read doesn't necessarily make a file or function less readable. Quite the opposite usually.
IntelliJ collapses the repeatedly occurring err != nil blocks into expandable grey placeholders to aid readability. Not the ultimate evidence, but it says something when a commercial product spends effort on such a feature.
I don't know; I understand most assembly lines I see in isolation, yet often have no idea what the whole does. Sometimes too-primitive primitives actually hinder readability/understanding.
I know many don't feel that way, but functional stream manipulation is a very good example of that: I'd much rather deal with filter.map.reduce or whatever than 3 nested for loops with random breaks inside, even though the latter may be easier to reason about line-by-line.
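To make the comparison concrete, a sketch with hypothetical generic helpers (Go's standard library doesn't ship Map/Filter; these are purely illustrative):

```go
// Filter and Map are not standard library functions; they are the kind of
// helpers the "stream" style relies on.
func Filter[T any](in []T, keep func(T) bool) []T {
	var out []T
	for _, v := range in {
		if keep(v) {
			out = append(out, v)
		}
	}
	return out
}

func Map[T, U any](in []T, f func(T) U) []U {
	out := make([]U, 0, len(in))
	for _, v := range in {
		out = append(out, f(v))
	}
	return out
}

// Stream style: lengths of the non-empty lines.
// lens := Map(Filter(lines, func(s string) bool { return s != "" }),
//	func(s string) int { return len(s) })
```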
Not necessarily no. It's a balance. And Go skews so far that it's no longer reasonable. It's so bad that the most popular Go IDE by default removes the error handling code from view.
That's a myth. Go read some random Go. Trivial functions often do nothing but call a couple of other functions, check errors, and return. But non-trivial functions use relatively fewer lines for error handling.
> The selling point of Go is readability, feature minimalism and code that might be boring to write but gets you readability in return.
That is one selling point. Another is that it's a GC'd language that compiles to native on all major platforms and is more popular than alternatives like OCaml, Haskell, D, or Common Lisp. Just like Java was used as a platform on which to build languages: Groovy, Scala, Clojure and Kotlin, to take only the most popular.
Besides the (subjective) advantages that you've mentioned, the selling points of Go include standalone binaries, fast compilation, a good standard library, and crucially, strong corporate support. A transpiled language could benefit from those.
I've been dinking around with generating Go from Lisp macros. For me the motivation is that infra teams in my company have been supporting Go but dropping the ball on Java/Scala, so I want to generate review-ready code for them while letting my experienced team work in some more complete language (maybe Java though that seems like a lot of work for a prototype).
Ecosystem, you forgot ecosystem. Go has a lot of mature libraries, so a new language will not have to create yet another library for an HTTP server, JSON parsing, SSH connections, a key-value DB, an SQLite connection, etc.
And network applications can be written without relying on async.
JVM: 'bloated' or at least the perception of it. Now improving quite fast but also Oracle.
Dot net: still Microsoft.
C/C++: Not similarly high level languages
Scripting languages: lol slow.
Turns out the Go ecosystem isn't that tiny at all. Yeah, Java is larger, but it also comes with a bad perception among users of other programming languages. So after Java, if you're targeting a general-purpose audience, it's Go, I think.
Never really understood this company problem: if we squint hard enough, then perhaps .NET really is too Microsoft-centred (though improving rapidly in recent years), but Java has multiple, fully independent implementations and a specification. It is much, much more open than Go in this regard, though again, I think the problem is often overblown.
Not trying to start a flame war, but.. is the Go compiler nice? Other than being fast, and static compilation what good features does it have compared to say the LLVM toolchain? It barely does any optimizations.
I have a large C codebase for an embedded device, and I've often thought about the work involved in one day porting it to Embedded Linux with a higher level & safer language.
The transpiled Go code seems to end up having to use lots of unsafe code, as even this little sample shows... it would probably be impossible to translate C without doing that. Is the resulting binary actually memory safe in any meaningful way, or even safer than the original C?
The unsafe parts, surely not. That's not entirely the point though. Dropping a dependency on C is a big deal in Go: faster calls, no CGO means faster builds and static binaries.
Go also lets you hand-write assembly and link it in, usually for performance reasons, which very much lacks safety guarantees but has proven to be incredibly useful for optimizing hot code paths.
The gc compiler produces notoriously large executables but Go programs need not be so large. A small program compiled using the tinygo compiler can be as small as 10s of kilobytes.
This certainly doesn't look like an embedded device though:
> The reference hardware platform is the PC Engines™ apu2c4 system board. It features a 1 GHz quad core amd64 CPU, 4 GB of RAM, 3 Ethernet ports and a DB9 serial port.
I didn't mean to imply this. I consider most router hardware embedded, but they do include CPUs; it's just that they're CPUs you would generally consider embedded (like a SoC).
And BTW, my reply was mostly because of the context: people were talking about binary sizes for Go, but if you're using PC-like hardware you wouldn't care about binary sizes anyway (you can simply add more storage; even something like a 1 GB SD card would be way bigger than something like OpenWRT, which can run on hardware with as little as 8 MB of flash memory).
I know the author ended up with code that was the same speed at the end of all this, but if you're ever intentionally refactoring code just to make testing easier (and not to aid in the readability or overall architecture) isn't that usually a bad design decision? Further, if you're intentionally refactoring code that you know may slow it down 2x just to aid in writing tests, isn't that a bad reason to refactor that code?
> The result library is often slower than pure C (~2x magnitude) but still having a native Go port could simplify some aspects of the workflow like testing.
In my experience, being difficult to test is a sign that a refactor is needed anyways.
Difficulty testing also creates a higher barrier to adding new features. If you can't test something, it reduces confidence that your change won't break it.
If it's an extremely performance-sensitive component, then you'd obviously want to be very cautious about big changes. However, that's usually not the case.
Nope, if code wasn't testable, the code was less valuable. Overall architecture doesn't bring clients, doesn't ease testing, doesn't speed up development, etc.: it can be just a pretty picture on the architect's wall while everyone is miserable. Refactor for testing.
Agreed; if it's gonna double the time spent per unit of work, compromise. Don't worry about the architect's pretty picture; he's not solving the clients' problem, you are.
> What an amazing tool that can completely change function names when it converts from C to Go.
How can one read the code of the benchmark, then switch into virulent sarcasm mode without trying to understand the code? And seeing "+1" comments without any effort to understand is also disheartening.
The blog post had a link about the Go helper functions the author used. It lands on https://github.com/akhenakh/goh3/blob/main/h3.go This shows that the `FromGeo()` function used by the Go benchmark is a helper that calls transpiled functions. The benchmark code itself was of course not transpiled, so the sarcasm was unneeded and wrong.
The C and Go code may not be strictly identical, but they seem pretty close. Enough to suppose the blog post was sincere.
I believe the right thing to do is to ask the author why they used a custom rewrite of `latLngToCell()` in Go instead of calling the transpiled function.
> The Go function `XgeoToH3()` allocates a TLS then calls the same functions
And when the author 'batches' the thread local storage, he changes the code from thread-safe (the C version) to not thread-safe. The transpiled code is littered with TLS alloc and free calls to emulate stack variables; these must be matched in order and use the heap for temporary storage rather than the stack, with different cache access patterns that can very much affect tiny benchmarks. Contrary to what the author supposed about the performance, the fast Go version can't even be run over multiple threads.
This is analogous to converting locks to NOPs and improving performance on a single-threaded microbenchmark, but doing so isn't an appropriate comparison even though the code may look identical except for one or two lines.
> Contrary to what the author supposed about the performance, the fast Go version can't even be run over multiple threads.
Since Go deliberately hides OS threads entirely, this TLS emulation would be per green thread (per goroutine).
I don't see why the go code could not spin up say N goroutines, each one with one NewBatch() call to initialize the simulated TLS for that green thread, followed by repeatedly calling XgeoToH3() (which should be cleaning up its TLS usage to be back to 0 bytes used upon return).
Needing to call NewBatch once per thread changes the API a little from the cgo API, but I don't think it is otherwise thread-unsafe.
You'd allocate one TLS per thread, or have a reusable pool, if you wanted to run this multithreaded. Allocating and freeing a TLS per coord is not how the C code would run. If you created a new pthread per point in C that would be pretty slow too.
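A rough sketch of what a reusable pool could look like with ccgo's runtime (`libc.NewTLS` comes from modernc.org/libc; the transpiled function and its signature below are stand-ins, not the actual generated code):

```go
package h3pool

import (
	"sync"

	"modernc.org/libc"
)

// xLatLngToCell stands in for whatever entry point ccgo actually emits;
// the real generated function and its argument types will differ.
func xLatLngToCell(tls *libc.TLS, lat, lng float64, res int32) uint64 {
	_ = tls
	return 0
}

// tlsPool reuses libc TLS objects across calls instead of allocating and
// freeing one per coordinate.
var tlsPool = sync.Pool{
	New: func() any { return libc.NewTLS() },
}

// ToCell borrows a TLS for the duration of one call and returns it afterwards.
func ToCell(lat, lng float64, res int32) uint64 {
	tls := tlsPool.Get().(*libc.TLS)
	defer tlsPool.Put(tls)
	return xLatLngToCell(tls, lat, lng, res)
}
```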
Yes, you can do this, but why didn't the code do this in the first place? Because you also want an API that's not annoying and difficult to use.
It's a real drawback of this transpiler that it takes thread-safe, purely functional code and turns it into code with side effects that have to be carefully managed. Probably due to Go not having a fixed-in-place stack guarantee.
There are many parts of the code that are functional in C and have side effects in the transpiled version.
Look, ultimately what we have is a C version and a cgo version that are roughly the same speed, and a transpiled version that is 1/6th the speed, where the caller can be in any thread in any of those. Then there's a different API, where the caller has to manage storage, that's on par with the first two, but that's not the same thing. "If you jump through these hoops you can be on par" is a different claim from what the blog author made.
Now it's possible that the wrapper functions could be made fast by storing the TLS object in a Go thread local storage so the API is the same, but the author didn't do this.
I will translate from sarcasm to productive comment:
"The analysis appears to be flawed. The C code and Go code ultimately call different functions, making performance comparisons unreliable at best. Furthermore, the code is single-threaded and will not benefit from 'Go runtime scaling on multiple cores.'"
It's the default mode of twitter and other places where conversation is a wasteland.
Sure, I'll take a cynical terrorist over a true-believer murdering people, but here we're debating conversation among non-terrorists with and without sarcasm.
I appreciate your reply, but there are different tones to it. I see most technical cynicism as frustration and ambition for technical excellence.
Most emotionally detached debate gives off undertones of something creepy that I can't really put my finger on. I would not waste sweat on a vim vs emacs war :-) but if you don't get your heart pounding on a Windows vs Linux vs Mac debate, for example, there's something unsettling about that too. :-)
Quoting from the article below:
"..So as cynicism about many – perhaps most – things rises, so too does our appreciation and affection for what is good and true. Cynicism leads to more tender feelings towards what is truly lovable..."
Honesty, and raw criticism of things that deserve it, is valuable. I think sarcasm, though, is rarely the right tactic.
Sarcasm is, most of the time, the mode of either impotence or petty resentment. If you actually want to effect change or genuinely communicate, it is a poor tool.
Occasionally, yes, it can be a righteous blow of justified revolt. And of course this is how the sarcastic imagine themselves, almost always. But far more often they are just indulging their worst impulses.
It’s pure chaff when we are actually trying to have conversation and not write reddit/twitter gotchas.
In this case, the de-snarked version is a much better response and it leads to a conversation towards clarity. e.g. the author could respond to why the functions were changed.
The original snark doesn’t do that. It just casts judgement instead of seeking clarity. It’s childish and something I would do in my shameful Reddit-arguing days.
There are other choices besides cynicism and sarcasm vs being over-seriously stern. I wouldn't downvote on this basis, but earnest sincerity certainly makes for better conversations — imho.
While I agree with you for in-person communication, sarcasm and cynicism generally don't translate well to a text-based medium. This is why there exists Irony Punctuation[1] such as /s
Sarcasm aside, it is possible for Go to beat C in some scenarios.
The secret is the garbage collector. In high memory allocation applications a GC can be more efficient than direct heap allocations. This is especially true if the C/C++ application is making heavy use of reference counting. A GC will ultimately have lower overhead than managing the reference counting.
All that said, in pure CPU-calculation scenarios with low allocation, C will almost certainly beat the pants off of Go. Go's compiler is... subpar. Yes, it compiles fast, but it does that by ditching optimizations (perhaps this changes with LLVMGo).
> In high memory allocation applications a GC can be more efficient than direct heap allocations.
Anybody who cares about allocation speed in C or C++ (or Rust or <insert low level language>) will use arena / bump / slab allocators. GC is more efficient when you're writing naive code and you don't get to write naive code if you care about speed / memory usage.
I'm not saying it's not possible to reach or surpass C-like speeds with Go, but when comparing C to higher level languages with GC there should be an asterisk next to the statement you're making.
While not always the case, it is often the case that a good GC will beat all of those. A generational copying GC with parallel collection will treat most allocations as pointer bumps. Giving them the speed of a bump allocator with periodic fragmentation fixes when collection happens. It will beat both slab and arena allocators.
> naive code and you don't get to write naive code if you care about speed / memory usage.
I mean, you can certainly always implement a GC in C (most GCs are implemented in C/C++). Which does suggest that anything you can do with a GCed language can ultimately be replicated in a manually memory-managed language.
However, the point of a GC is that naive code IS faster and easier to reason about. You don't have to create and manage elaborate data specific object pools in order to get the same speeds as if you did.
The other thing to consider about GCs is that they enable algorithms that are really hard to replicate in manually memory-managed languages. Concurrent algorithms are WAY easier to reason about because you don't have to wrap your head around tying object lifetimes to thread lifetimes.
> While not always the case, it is often the case that a good GC will beat all of those. A generational copying GC with parallel collection will treat most allocations as pointer bumps. Giving them the speed of a bump allocator with periodic fragmentation fixes when collection happens. It will beat both slab and arena allocators.
Unfortunately Go's allocator is none of those. It has slow allocation performance because they don't do generational GC.
Very nicely stated. Being able to more easily reason about code, as enabled by GC, allows the kinds of large refactorings and algorithmic optimizations that people working on C codebases tend to shy away from.
That said, I've definitely seen use cases, particularly in games, where memory pools are clearly optimal. And certainly the deterministic nature of such pools has advantages as well.
> While not always the case, it is often the case that a good GC will beat all of those. A generational copying GC with parallel collection will treat most allocations as pointer bumps. Giving them the speed of a bump allocator with periodic fragmentation fixes when collection happens. It will beat both slab and arena allocators.
Wouldn't a memory barrier add significant overhead in this case? (Compared to an arena needing no memory synchronization at all if it's only used on a single thread.)
The short answer is that for Java at least (and I suspect GCs in other languages too, but I haven't confirmed) there's a thread-local allocation buffer, which is just the pointer bumps.
But this is hardly like comparing to Java as Go gives you a lot of flexibility in how you organize, allocate and use memory thanks to pointer support and composite value types. Making an Arena allocator is something you could do in Go as well.
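A minimal bump-arena sketch, just to show the pattern isn't C-only; note that without unsafe tricks a byte-slice arena like this only really works for pointer-free data:

```go
// Arena hands out slices from one big allocation and reclaims them all at
// once with Reset. Purely illustrative, not a library API.
type Arena struct {
	buf []byte
	off int
}

func NewArena(size int) *Arena { return &Arena{buf: make([]byte, size)} }

// Alloc returns n bytes from the arena's buffer.
func (a *Arena) Alloc(n int) []byte {
	if a.off+n > len(a.buf) {
		panic("arena exhausted") // a real implementation would grow or chain blocks
	}
	p := a.buf[a.off : a.off+n : a.off+n]
	a.off += n
	return p
}

// Reset reclaims everything the arena handed out, in one step.
func (a *Arena) Reset() { a.off = 0 }
```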
But it is hard to manage memory effectively everywhere in C without creating a lot of overhead for the programmer.
Does escape analysis play into it? This could allow the Go compiler to reduce the number of heap allocations (which are not free in either language), which in turn also reduces the load on the GC.
I'm not familiar with the state of go escape analysis. It would play into it, but then you'd also argue that you don't need it in C because you'd simply put that sort of thing on the stack rather than the heap.
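For what it's worth, `go build -gcflags='-m'` prints the compiler's escape-analysis decisions; a toy illustration of the stack-vs-heap difference (hypothetical code, not from the article):

```go
type point struct{ x, y int }

// p stays on the stack: nothing keeps a reference to it after the return.
func onStack() int {
	p := point{1, 2}
	return p.x + p.y
}

// p escapes to the heap: a pointer to it outlives the function, so the GC
// has to manage it.
func escapes() *point {
	p := point{1, 2}
	return &p
}
```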
GC will win when heap allocations are frequent, short lived, and unavoidable.
> a GC can be more efficient than direct heap allocations
Unoptimised, naive allocations perhaps. I'm old enough to still remember when Microsoft used a similar line of reasoning to create artificial cases where C# beat C++ performance.
I would wager that the majority of C/C++ codebases (e.g. internal business code) use naive allocation. In C++ particularly, naive allocation (I might define this as straight up "has-a" ownership-based heap allocations and deletions) is considered best practice by many.
Not only that, shared_ptr is also popular in such codebases, and shared_ptr usually has an additional allocation to store reference count in the "control block".
Given that most C++ code in modern Windows applications is for juggling COM instances, there's no need for artificial cases when every couple of lines there is a Release() or AddRef() call, with possible OS IPC in the middle.
I think your pushback is good - I didn't think the explanation about multiple cores made sense either. The Go runtime doesn't magically spread things across cores. However, I don't think your sarcasm helps the conversation.
What? Magic would be if Go could run single-threaded code on multiple threads automatically and efficiently. The `go` keyword is just a green-thread start. There is nothing magical about it.
To be even less charitable, TFAA changed the code from apparently establishing a TLS connection per coordinate (so 1 million connections) to establishing just one and sending a million requests over it.
I’m not surprised this is a 90% perf gain. The issue is not initialising the libc.