Small Tables with Go generics (tbray.org)
58 points by metadat on June 28, 2022 | 15 comments



One thing we've discovered is that something like

    ceilings []byte
looks deceptively "small", but if the median number of transitions is really something like 10, you're paying 24 bytes of slice-header overhead (pointer, length, and capacity on 64-bit) to "save memory". Depending on your priorities, something like

    ceilings [16]byte
    ceilingsExtend *[]byte
will be the same size with better locality and GC interaction for the common case, or even two separate structures with the same interface (above 24 it sounds like you'd probably want to switch to map anyway). Or just make a hard limit of e.g. 24 transitions per.
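
For illustration, a minimal Go sketch of that hybrid layout (the type and field names here are hypothetical, not from the article; the element count is assumed to be tracked by the caller):

    // smallCeilings keeps up to 16 ceilings inline; the rare larger
    // tables spill into a separately allocated slice. The struct is
    // still 24 bytes on 64-bit, the same as a bare slice header.
    type smallCeilings struct {
        ceilings [16]byte // inline storage, no extra allocation
        extend   *[]byte  // overflow past 16; nil in the common case
    }

    func (c *smallCeilings) at(i int) byte {
        if i < len(c.ceilings) {
            return c.ceilings[i]
        }
        return (*c.extend)[i-len(c.ceilings)]
    }

In the common case there's no pointer chase and no separate heap object for the GC to track; only oversized tables pay for the indirection.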


Cool idea, thanks.


I feel like this should use the term sentinel value somewhere. Looks neat, though! Would love to see some benchmarks of it in use.


Out of curiosity, during the Great "Write The Stupidly Fastest HTTP Router For No Reason" Go craze 3 or 4 years ago, I benched a simple always-worst-case scan over a byte slice (read "array" if you don't know what this is in Go) using the obvious for loop (no attempts at SIMD shenanigans or anything) vs. a lookup in a map. The performance cutoff was on the order of around 10 entries, where the map finally became faster. Which is actually somewhat impressive for the map; it's pretty easy for a hashing routine to push it higher than that. And that was again with the answer always being at the end of the array.
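
A rough reconstruction of that kind of micro-benchmark (hypothetical code, not the original):

    package scan_test

    import "testing"

    var sink int // prevents the compiler from eliding the lookups

    // scanIndex is the obvious worst-case linear scan: the match is
    // placed at the end, so every element gets examined.
    func scanIndex(haystack []byte, needle byte) int {
        for i, b := range haystack {
            if b == needle {
                return i
            }
        }
        return -1
    }

    func BenchmarkScan(b *testing.B) {
        data := []byte("abcdefghij") // 10 entries, match at the end
        for i := 0; i < b.N; i++ {
            sink = scanIndex(data, 'j')
        }
    }

    func BenchmarkMap(b *testing.B) {
        m := make(map[byte]int, 10)
        for i, c := range []byte("abcdefghij") {
            m[c] = i
        }
        for i := 0; i < b.N; i++ {
            sink = m['j']
        }
    }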

That was just for scanning a byte slice and seeing what index a match was at. This is a bit more complicated and we're down in "count the cycles" territory, so it may not quite be at the same cutoff as my code, but I'd still expect competitive behavior at 5 or 6 entries at a minimum.

For this use case, where you also want to be able to do ranging, I would expect there to be places where this is slower for the very complicated checks, but I'd also expect those complicated checks to constitute such a small fraction of the checks that this would not generally be slower. And then at scale, having more stuff in the various CPU caches would be a huge win.

This would actually be a challenge to benchmark and prove, because you'd really need to do something less like "route the same thing a billion times and check the time" and more like "load a production config and run a recorded trace of queries against it", because of cache concerns. (A lot of naive benchmark code of the "run this tight loop" variety is excessively optimistic because it benches the code assuming it fits into L1; not a useless number, but not necessarily what you can expect to see in real life.) The performance of the code in the article will most likely look quite a bit better in real life than naive looping benchmarks would suggest, because those benchmarks are very likely to fail to expose the cache weaknesses of using fully populated maps everywhere.


> Great "Write The Stupidly Fastest HTTP Router For No Reason" Go craze 3 or 4 years ago

Heh, I remember that craze as peaking in 2016 or 2017.


My brain does not index on time like that at all. If I was only off by 50%, that's pretty good for me. I remember events from my past just fine, but putting them into a timeline next to world events or a calendar is generally impossible for me. So I accept your correction, as I've learned to do in such cases. :)


> "Write The Stupidly Fastest HTTP Router For No Reason"

Do you have a link? My google-fu fails me.


It was not a single coherent link. It was just a bizarrely recurring set of posts on /r/golang, if not other places, where it basically became a meme to benchmark routers against each other and claim yours was the fastest, combined with new programmers asking which of the routers they should use because this one claims to be 25% faster than that one but doesn't have this feature, or whatever.

Eventually the community finally got it through its collective head that speeding up your router is really not a terribly cost-effective thing to do unless routing requests really is a big cost in the profile of your real website. Protip: it's not. For those edge cases where I'm wrong, you can start worrying. But first you need a website that can even be optimized to the point that routing matters (e.g., if you do a couple of things with a database, you've already passed the point where it matters), then you have to optimize everything else to the point that your router really is the biggest problem, and only then do you finally have this problem.

Edge case problems are an attractive nuisance. When speaking generally, I can't deny they exist, but when designing a system, worry about the problems you have, not the ones someone else might hypothetically have.


I'm surprised that the map wins around 10, to be honest. How big was the data you were searching? I have vague memories of linear scanning beating binary searches on fairly large data sets. Can't remember how they compared to hashing, though.


Hashing often has much better constant factors than binary search because it isn't super mean to branch predictors.


The spread of data in the backing store usually hits similar penalties for hash tables, though. Especially in benchmark environments.

10, in particular, just feels odd. That many items almost surely fits in a single cache line.

Granted, if you are storing compound data such that each item is a reference elsewhere anyway, I can see this. Speed is likely dominated by the comparison procedures at those sizes.


Remember the "data items" in this case are bytes. Lots of them in a cache line.


Right. Which is why 10 feels like an odd number. I thought the benchmark referenced was possibly something else.

That is, an unrolled loop on bytes can be N conditional increments on an index, with a check at the end to see if it's still in bounds. Assuming N is small, it will be hard to compete with that, honestly. The hash would be spreading the data wider than the linear search would be. Though I agree I'd expect it to still be on a cache line for bytes.
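
A sketch of that loop shape (hypothetical, written as a loop with N fixed at 8 for brevity; the unrolled version has the same structure):

    // scan8 does 8 conditional increments and one bounds check at the
    // end, with no early-exit branch inside the loop. The hope is that
    // the compiler turns the guarded increment into a conditional move.
    func scan8(data *[8]byte, needle byte) int {
        i := 0
        for k := 0; k < 8; k++ {
            if data[i] != needle {
                i++ // advance only while no match has been seen yet
            }
        }
        if i == 8 {
            return -1 // not found
        }
        return i
    }

Once the needle is found, data[i] keeps comparing equal, so the index stops advancing; if it's never found, the index walks off the end and the final check catches it.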


Problem is that the automaton-matching itself is pretty cheap compared with event preparation and I/O and so on, so it's hard to benchmark; the next comment says smart things on the subject.


Right. I'm expecting it to be nuanced with a point towards memory use compared to the naive maps. Also, just fun. I did not mean it as a criticism.



