Hacker News new | comments | show | ask | jobs | submit login
Go Performance Tales (jmoiron.net)
296 points by signa11 on Apr 17, 2014 | hide | past | web | favorite | 41 comments

I found the bit about Go using AES native instructions to accelerate map key hashing most interesting. This accounted for a >50% throughput increase when he found aws c1.xlarge instances which had AES-NI as compared to those that didn't.¹

This is the kind of detail most developers would not be aware of, and to be fair, even now knowing it exists the only reference I can google up at golang.org is the C source code of runtime/alg.c where you will see…

    23		if(use_aeshash) {
    24			runtime·aeshash(h, s, a);
    25			return;
    26		}
… no hint that it might reduce your hosting costs by 33% or account for some huge variation in performance between one test machine and the next, or even individual runs if you are spinning up cloud instances to do your testing.

¹ Does your CPU have the AES, SSE3 and SSE4.1 cpu capability bits all turned on? If so, you will hash mightily! Do you even know where to look to check?

I had a similar realization recently, where I would occasionally get an ec2 instance that would cause my server to core dump.

Turns out that a library I was using set `gcc -Mnative` by default, which was emitting some instructions which weren't valid everywhere. I was requesting the same virtual hardware, but the physical variations caught me off guard.


I find it interesting how insightful, technical articles like this receive hardly any comments while the usual "softer" articles that tend to dominate the Hacker News frontpage these days receive dozens if not hundreds of comments. I wonder what this says about us, HN readers.

http://en.wikipedia.org/wiki/Parkinson's_law_of_triviality (whence the "bikeshedding" expression comes from, AFAICT)

There's a difference between comments and upvotes. With articles like this (which I really liked), I certainly upvoted it but because it's so technically detailed, I don't have anything useful to add.

Same here. I really enjoyed reading it, but I don't understand a lot of what it is talking about. I also bookmarked it. When the post had 0 comments I was tempted to post something to the effect of what I just said, but I figured it didn't really add any value.

That the audience is diverse, but common “softer” articles appeal to a wider demographic.

While I'd love to see more of that thing, I don't really think of HN as a technical venue. More like a virtual pub frequented by a variety of folks who participate in the VC ecosystem in one way or another.

Is there a better technical venue?

Hint: it probably isn't /r/programming :)

I'm not aware of one that's broadly-based though I'd love to find it. Slashdot used to be, although I haven't frequented it in many years.

https://lobste.rs has some good content, although the bulk is already on HN.

This is common on every aggregator: most content is not selected for depth or insightfulness, but rather ease of consumption. The extreme of this is Reddit's proliferation of images and memes.

To be fair, HN doesn't feel like it has a hacker spirit to it. It's too preoccupied with business crap and self-promotion.

Maybe that's a good thing.

The softer the topic, the more room for subjectivity. The more room for subjectivity, the more room for controversy. Mountain long threads tend to be highly emotional/opinionated bickering that doesn't go anywhere, because the topic is so subjective that people can easily stick to their own biased worldview, reiterating the same old talking points.

case in point: this branch of comment tree, and your post in particular.

oh, you.

Just a note on the zlib optimization patches. The blog post is linking to an old version, there's a newer one from a month ago. Also, the patch still appears to be a bit buggy (e.g. corrupt output being generated by the new deflate strategy), so don't plan on actually deploying it.

If you want more details on Go profiling, this Go Lang blog post is a great place to look


I've been looking into false sharing and other pitfalls of parallelism on multi-core systems.


It seems like our multi-core hardware isn't that well thought out yet. In particular, locks can cause false sharing. Even go channels are based on locks!

I would love to see some kind of annotation in golang for padding certain structs to cache-line boundaries. This value can be read from most processors, so it could be done in a cross-platform fashion. The gringo ring-channel library has to do its own padding to avoid false sharing.


    type Gringo struct {
        padding1 [8]uint64
        lastCommittedIndex uint64
        padding2 [8]uint64
        nextFreeIndex uint64
        padding3 [8]uint64
        readerIndex uint64
        padding4 [8]uint64
        contents [queueSize]Payload
        padding5 [8]uint64
There was a similar technique for padding Java objects, but it now turns out that the JVM is smart enough to optimize the padding out!

Interesting point. I doubt go would let something as low level like struct padding bubble up into the language though, perhaps there could be some kind of library level "Unsafe" field annotation, though I'd prefer the compiler to optimize this automatically.

My first thought is that it's going to be pretty challenging for a compiler to optimize this without runtime information.

Here's a good article on solving false sharing in Java with cache-line padding:


The biggest performance issues I think that some people run into with Go involve reflection which seems to be slow. Something that does a lot of JSON parsing for example maybe could be much slower in Go than in Java, Python or JavaScript I think. I don't have any data, but I've known people to complain about it that work with Go. I wonder if a JIT or AOT compiler might help.

There are libraries that generate specialized marshallers to fix precisely this issue: https://journal.paul.querna.org/articles/2014/03/31/ffjson-f...

That is an interesting project but won't yet help to speed up the (very slow) JSON unmarshalling.

Indeed, that's on the TODO list at the bottom.

I have also played around trying to achieve a high performance trie in Go.[1] My approach is to use the Ternary Search Trie structure. Unfortunately, I have not yet approached either the performance of the native hash table or a hash table in Go (although Sedgewick tells us you should be able to beat a hash table). My TST does not yet have the Patricia Trie optimization (of collapsing long internal runs). Perhaps with that addition it will get closer to hash table performance.

Also everything he said about channels also holds true in my experience. I haven't tried writing a C library for Go yet but his discovery is pretty interesting for when I dive into that.

[1] : https://github.com/timtadh/data-structures/blob/master/trie/...

Good article. It closely reflects the experience we've had at SoundCloud in our more heavily-stressed services.

do you still use ruby and how?

Sure, the monolithic app at the center is still Rails. We're working to deprecate it and go all SOA/microservice, but many of those services are JRuby.

tl;dr still very polyglot here.

> All of our existing indexes were based on string data which had associated integer IDs in our backend

You already have a perfect hash function :).

The IDs might be sparse though.

Sparse is fine if they are more-or-less randomly distributed.

I was mostly aiming for unique -> no hash collisions.

This is probably a stupid question, but I wonder if the author could have used slices instead of maps with integer keys. It would have used more memory, but it would probably also have been significantly faster. A significant proportion of the performance issues I see raised on the Go mailing list seem to involve maps.

Probably no. Author says that he uses IDs as keys. I can assume that they are 32 or 64 bit long. So even if they are 32 bits, you would need a 4GB slice to hold any potential key. Or make a dynamic array (well, slice) that would give even more overhead than a map (and would closely resemble what map does internally better anyways). So only if those keys were really small, that would be a possible alternative.

If 'select min(keycolumn) from footable' is relatively stable and keys are created ascending (databases do this, right?), a resizing slice (with 0-min() elided from the bottom) seems like a good alternative to a hashing structure.

If the key-space is relatively sparse, this will waste more memory than the hash, of course.

Do they have to use zlib? For on the wire compression LZ4 is better, for archiving lzma2 or zpaq is better.

> Using a map[int]* Metric instead of a map[string]struct{} would give us that integer key we knew would be faster while keeping access to the strings we needed for the indexes. Indeed, it was much faster: the overall throughput doubled.

I'm a little sceptical of this -- type assertions are fast but it's an extra step to initializing a struct. It would have been nice to see tests done comparing map[string]struct{} to map[int]struct{} and comparing map[string]* Metric to map[int]* Metric.

Also, there is no way to escape an asterisk, so I apologize for the awkward space after each one.

My understanding of this section was, that in * Metric they keep the original string ID and struct{}. If they went straight to map[int] struct{}, they would lose the string ID.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact