looks deceptively "small", but if the median number of transitions is really something like 10, you're paying 24 bytes of overhead to "save memory". Depending on your priorities, something like
    ceilings       [16]byte
    ceilingsExtend *[]byte
will be the same size with better locality and GC interaction for the common case, or even two separate structures with the same interface (above 24 it sounds like you'd probably want to switch to a map anyway). Or just set a hard limit of, e.g., 24 transitions per node.
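For concreteness, a minimal sketch of the two layouts (the struct and field names here are made up for illustration). On a 64-bit platform both come out to 24 bytes per node, but the inline version avoids the second allocation and the pointer chase in the common case:

    package main

    import (
        "fmt"
        "unsafe"
    )

    // Hypothetical node layouts for illustration only.
    type sliceNode struct {
        ceilings []byte // slice header: pointer + len + cap = 24 bytes, plus a separate backing array
    }

    type inlineNode struct {
        ceilings       [16]byte // inline storage; no extra allocation for <= 16 transitions
        ceilingsExtend *[]byte  // nil in the common case, only set for the rare big node
    }

    func main() {
        fmt.Println(unsafe.Sizeof(sliceNode{}))  // 24 on 64-bit
        fmt.Println(unsafe.Sizeof(inlineNode{})) // 24 on 64-bit
    }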
Out of curiosity during the Great "Write The Stupidly Fastest HTTP Router For No Reason" Go craze 3 or 4 years ago, I benched a simple always-worst-case scan over a byte slice (read "array" if you don't know what this is in Go) using the obvious for loop (no attempts at SIMD shenanigans or anything) vs. a lookup in a map. The performance cutoff was around 10 entries, where the map finally became faster. Which is actually somewhat impressive for the map. It's pretty easy for a hashing routine to push it higher than that. And that was again for the answer always being at the end of the array.
That was just for scanning a byte slice and seeing what index a match was at. This is a bit more complicated and we're down in "count the cycles" territory, so it may not quite be at the same cutoff as my code, but I'd still expect competitive behavior at 5 or 6 entries at a minimum.
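A rough sketch of that kind of comparison, for anyone who wants to reproduce it; the entry count and key values are assumptions, not the original benchmark. The scan is forced to the worst case by always searching for the last key:

    package scan

    import "testing"

    // Sketch only: 10 distinct byte keys, scanned vs. looked up in a map.
    const n = 10

    var (
        keys   []byte
        lookup map[byte]int
        sink   int // assigned so the compiler can't discard the work
    )

    func init() {
        keys = make([]byte, n)
        lookup = make(map[byte]int, n)
        for i := 0; i < n; i++ {
            keys[i] = byte(i * 7) // arbitrary distinct byte values
            lookup[keys[i]] = i
        }
    }

    func BenchmarkByteScan(b *testing.B) {
        target := keys[n-1] // worst case: the match is at the end
        for i := 0; i < b.N; i++ {
            for j := 0; j < len(keys); j++ {
                if keys[j] == target {
                    sink = j
                    break
                }
            }
        }
    }

    func BenchmarkByteMap(b *testing.B) {
        target := keys[n-1]
        for i := 0; i < b.N; i++ {
            sink = lookup[target]
        }
    }

Varying n is enough to find the crossover point on a given machine.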
For this use case, where you also want to be able to do ranging, I would expect there to be places where this is slower for the very complicated checks, but I'd also expect those complicated checks to be such a small fraction of all checks that this would not generally be slower. And then at scale, having more stuff fit in the various CPU caches would be a huge win.

This would actually be a challenge to benchmark and prove, because you'd really need to do something less like "route the same thing a billion times and check the time" and more like "load a production config and run a recorded trace of queries against it", because of cache concerns. (A lot of naive benchmark code of the "run this tight loop" variety is excessively optimistic because it benches the code assuming it fits into L1; not a useless number, but not necessarily what you can expect to see in real life.) The performance of the code in the article will most likely look quite a bit better in real life than naive looping benchmarks would suggest, because naive looping benchmarks are very likely to fail to expose the cache weaknesses of using fully-populated maps everywhere.
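To make the cache point concrete, here is the difference between the two benchmark shapes, with a plain map standing in for a routing table (everything below is an assumption for illustration, not the article's code). The first benchmark keeps the hot buckets in L1 and reports flattering numbers; the second forces the lookups to wander the whole table:

    package routebench

    import (
        "fmt"
        "testing"
    )

    // Toy stand-in for a large routing table.
    var (
        routes = map[string]int{}
        paths  []string
        hit    int
    )

    func init() {
        for i := 0; i < 10000; i++ {
            p := fmt.Sprintf("/api/v1/resource/%d", i)
            routes[p] = i
            paths = append(paths, p)
        }
    }

    // "Route the same thing a billion times": the relevant buckets stay hot in cache.
    func BenchmarkSamePath(b *testing.B) {
        for i := 0; i < b.N; i++ {
            hit = routes["/api/v1/resource/42"]
        }
    }

    // Crude approximation of a recorded trace: many distinct paths, so the
    // lookups actually have to walk all over the table.
    func BenchmarkVariedPaths(b *testing.B) {
        for i := 0; i < b.N; i++ {
            hit = routes[paths[i%len(paths)]]
        }
    }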
My brain does not index on time like that at all. If I was only off by 50%, that's pretty good for me. I remember events from my past just fine, but putting them into a timeline next to world events or a calendar is generally impossible for me. So I accept your correction, as I've learned to do in such cases. :)
It was not a single coherent link. It was just a bizarrely-recurring set of posts on /r/golang, if not other places, where basically it just became a meme to benchmark routers against each other and claim it was the fastest, combined with new programmers asking which of the routers they should use because this one claims to be 25% faster than that one, but doesn't have this feature or whatever.
Eventually the community finally got it through its collective head that speeding up your router is really not a terribly cost-effective thing to do unless routing requests really is a big cost in the profile of your real website. Protip: It's not. For those edge cases where I'm wrong, you can start worrying. But first you need a website that can even be optimized to the point where that matters (e.g., if you do a couple of things with a database, you've already passed the point where it matters), then you have to optimize everything else to the point where your router really is the biggest problem, and then finally you have this problem.
Edge case problems are an attractive nuisance. When speaking generally, I can't deny they exist, but when designing a system, worry about the problems you have, not the ones someone else might hypothetically have.
I'm surprised that the map wins around 10, to be honest. How big was the data you were searching? I have vague memories of linear scanning beating binary searches on fairly large data sets. Can't remember how they compared to hashing, though.
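For reference, the shape of that linear-vs-binary comparison would be something like this; the element count is an arbitrary assumption, not a claimed crossover point:

    package search

    import (
        "sort"
        "testing"
    )

    // Sketch of a linear-vs-binary comparison over a sorted slice.
    const m = 64

    var (
        sorted [m]int
        found  int
    )

    func init() {
        for i := range sorted {
            sorted[i] = i * 3
        }
    }

    func BenchmarkIntScan(b *testing.B) {
        target := sorted[m-1] // worst case for the scan
        for i := 0; i < b.N; i++ {
            for j, v := range sorted {
                if v == target {
                    found = j
                    break
                }
            }
        }
    }

    func BenchmarkIntBinarySearch(b *testing.B) {
        target := sorted[m-1]
        for i := 0; i < b.N; i++ {
            found = sort.SearchInts(sorted[:], target)
        }
    }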
The spread of data in the backing store usually hits hash tables with similar penalties, though, especially in benchmark environments.
10, in particular, just feels odd. That many items almost surely fit in a single cache line.
Granted, if you are storing compound data such that each item is a reference elsewhere anyway, I can see this. Speed is likely dominated by the comparison procedures at those sizes.
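Rough numbers behind the cache-line point: ten bytes is a small fraction of a 64-byte line, while ten reference-sized items are already a couple of hundred bytes of headers pointing at data somewhere else. The element types below are just examples:

    package main

    import (
        "fmt"
        "unsafe"
    )

    // "Plain bytes" vs. "items that are references to data elsewhere".
    func main() {
        var raw [10]byte
        var refs [10]string
        fmt.Println(unsafe.Sizeof(raw))  // 10 bytes: well inside one 64-byte cache line
        fmt.Println(unsafe.Sizeof(refs)) // 160 bytes of string headers on 64-bit, each pointing elsewhere
    }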
Right. Which is why 10 feels an odd number. I thought the benchmark referenced was possibly something else.
That is, an unrolled loop on bytes can be N conditional increments on an index, with a check at the end to see if the index is still in bounds. Assuming N is small, it will be hard to compete with that, honestly. The hash would spread the data wider than the linear search would. That said, I agree I'd expect it to still fit in a cache line for bytes.
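A minimal sketch of that unrolled shape in Go, assuming a made-up four-entry node: the index only advances until the first match, and there is a single bounds check at the end.

    package node

    // indexOf4 returns the index of target in keys, or -1 if absent.
    // The name and the fixed size of 4 are assumptions for illustration.
    func indexOf4(keys *[4]byte, target byte) int {
        i := 0
        // Four conditional increments: i stops advancing at the first match.
        if keys[i] != target {
            i++
        }
        if keys[i] != target {
            i++
        }
        if keys[i] != target {
            i++
        }
        if keys[i] != target {
            i++
        }
        // Single check at the end: i < 4 means keys[i] == target.
        if i < len(keys) {
            return i
        }
        return -1
    }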
The problem is that the automaton matching itself is pretty cheap compared with event preparation and I/O and so on, so it's hard to benchmark - the next comment says smart things on the subject.