
Cache tables - atesti
https://fgiesen.wordpress.com/2019/02/11/cache-tables/
======
antirez
I use this technique often and I love its simplicity. If you want to see a
real-world usage, there is one in
[https://github.com/antirez/dump1090](https://github.com/antirez/dump1090):

        /* Add the specified entry to the cache of recently seen ICAO addresses.
         * Note that we also add a timestamp so that we can make sure that the
         * entry is only valid for MODES_ICAO_CACHE_TTL seconds. */
        void addRecentlySeenICAOAddr(uint32_t addr) {
            uint32_t h = ICAOCacheHashAddress(addr);
            Modes.icao_cache[h*2] = addr;
            Modes.icao_cache[h*2+1] = (uint32_t) time(NULL);
        }
    
        /* Returns 1 if the specified ICAO address was seen in a DF format with
         * proper checksum (not XORed with the address) no more than
         * MODES_ICAO_CACHE_TTL seconds ago. Otherwise returns 0. */
        int ICAOAddressWasRecentlySeen(uint32_t addr) {
            uint32_t h = ICAOCacheHashAddress(addr);
            uint32_t a = Modes.icao_cache[h*2];
            uint32_t t = Modes.icao_cache[h*2+1];
    
            return a && a == addr && time(NULL)-t <= MODES_ICAO_CACHE_TTL;
        }
    

As you can see, by adding a timestamp you can give the cached entries a TTL
without any additional effort: you never need to purge old entries from the
hash table, because the cache table is fixed size and entries simply get
replaced.

~~~
jnordwick
I don't really see how this applies to the post, or am I missing something?

The linked article is about having multiple entries per slot of an
open-addressed hash table, and sizing those slots to take advantage of
cache-line sizes.

Your code basically turns each open-addressed slot into a struct of
{address, timestamp} and then checks the timestamp on lookup. This just isn't
obvious at first because you store the timestamp at odd table offsets, after
shifting the hash left by one bit.

You could actually combine your address/timestamp pairs with the article's
cache-line packing (at the cost of half the line going to timestamps);
something like the sketch below.
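
A rough sketch, assuming 64-byte cache lines (the names are made up, not
taken from dump1090):

    /* One cache-line-aligned bucket holds 8 (address, timestamp) pairs:
     * 32 bytes of addresses plus 32 bytes of timestamps = 64 bytes, so
     * both insert and lookup touch a single cache line. */
    #include <stdint.h>
    #include <time.h>

    #define WAYS 8
    #define TTL  60   /* seconds, stand-in for MODES_ICAO_CACHE_TTL */

    struct __attribute__((aligned(64))) bucket {
        uint32_t addr[WAYS];
        uint32_t seen[WAYS];
    };

    static struct bucket cache[1024];

    static uint32_t bucket_of(uint32_t addr) {
        return (addr * 2654435761u) & 1023;   /* toy hash, 1024 buckets */
    }

    void add_addr(uint32_t addr) {
        struct bucket *b = &cache[bucket_of(addr)];
        int victim = 0;
        for (int i = 0; i < WAYS; i++) {
            if (b->addr[i] == addr) { victim = i; break; }  /* refresh */
            if (b->seen[i] < b->seen[victim]) victim = i;   /* oldest */
        }
        b->addr[victim] = addr;
        b->seen[victim] = (uint32_t) time(NULL);
    }

    int addr_was_seen(uint32_t addr) {
        struct bucket *b = &cache[bucket_of(addr)];
        for (int i = 0; i < WAYS; i++)
            if (b->addr[i] == addr &&
                (uint32_t) time(NULL) - b->seen[i] <= TTL)
                return 1;
        return 0;
    }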

~~~
antirez
I think the main point of the blog post is that instead of an exact match
with O(N) worst-case time, you change the algorithm to replace the entry on
insertion, so that lookup time is always O(1) _and the size is bounded_. The
cache-line thing is just an optimization to get the best out of the single
cache miss you incur. The table offset I use is the manual version of what
you would do with a struct; the generated code is more or less the same,
since the entry and the timestamp are stored at sequential addresses.

~~~
jnordwick
I can see that a little. It is a good article regardless of what you drew
from it. I read it more as a lossy set-membership test that used packed cache
lines to help performance (over chaining), but I don't really know what his
use case was. Good write-up nonetheless.

------
jorangreef
The author's distinction between the "Las Vegas" and "Monte Carlo" classes of
algorithms is a useful one, as is the suggestion of using a cache table as a
lookup table for compression match finding.
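
For example, a match finder can keep one table of "last position where this
4-byte prefix hash occurred" and simply overwrite on collision; a stale or
colliding entry only costs a failed match check, which is what makes the
lossy table safe here. A rough sketch (names illustrative):

    /* Cache table as an LZ match finder: head[] remembers the last
     * position for each prefix hash, replacing on insert. The caller
     * must verify the candidate bytes, so false hits are harmless. */
    #include <stdint.h>
    #include <string.h>

    #define HASH_BITS 15

    static uint32_t head[1 << HASH_BITS];   /* stores position + 1, 0 = empty */

    static uint32_t hash4(const uint8_t *p) {
        uint32_t v;
        memcpy(&v, p, 4);
        return (v * 2654435761u) >> (32 - HASH_BITS);
    }

    /* Returns an earlier candidate position for the 4 bytes at pos, or -1. */
    long find_candidate(const uint8_t *buf, uint32_t pos) {
        uint32_t h = hash4(buf + pos);
        long cand = (long) head[h] - 1;
        head[h] = pos + 1;                  /* replace on insert, never probe */
        return cand;
    }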

If anyone is interested, I wrote @ronomon/hash-table as a fast, scalable
cuckoo hash table for Node.js.

It has a high load factor and constant worst-case lookup time, and it uses a
non-recursive variant of the cuckoo probing scheme, together with a bloom
filter that reduces cache misses to the alternative cuckoo bucket by an order
of magnitude.

Last but not least, caching is supported using the CLOCK LRU algorithm, which
is elegant and effective.

Benchmarks and design decisions are in the README:

[https://github.com/ronomon/hash-table](https://github.com/ronomon/hash-table)
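
To give the flavor of the bloom filter trick, here is a toy sketch of the
general idea (not the library's actual code or layout): a per-bucket bloom
byte records which keys were displaced to their alternate bucket, so most
negative lookups never touch a second cache line.

    #include <stdint.h>

    #define SLOTS 4

    struct bucket {
        uint32_t key[SLOTS];
        uint8_t  used[SLOTS];
        uint8_t  bloom;   /* insert does bloom |= fp(k) when displacing k */
    };

    static struct bucket table[1 << 16];

    static uint32_t h1(uint32_t k) { return (k * 2654435761u) >> 16; }
    static uint32_t h2(uint32_t k) { return (k * 0x85ebca6bu) >> 16; }
    static uint8_t  fp(uint32_t k) { return (uint8_t)(1u << (k & 7)); }

    int lookup(uint32_t k) {
        struct bucket *b = &table[h1(k)];
        for (int i = 0; i < SLOTS; i++)
            if (b->used[i] && b->key[i] == k) return 1;
        if (!(b->bloom & fp(k)))   /* never displaced: skip the 2nd bucket */
            return 0;
        struct bucket *alt = &table[h2(k)];
        for (int i = 0; i < SLOTS; i++)
            if (alt->used[i] && alt->key[i] == k) return 1;
        return 0;
    }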

------
CoolGuySteve
If lookups are not uniformly distributed across keys, then instead of random
replacement you can bubble-sort the bucket values into an LRU eviction order.
Each time a lookup succeeds and the matching element isn't at index 0, swap
it with the element at index - 1. Replacement should always evict the last
element in the bucket vector.

This has two advantages:

- scanning the bucket amortizes to Θ(1) if your searches follow a Pareto
distribution

- popular keys are less likely to be evicted

Because your buckets stay more relevant, you can also make them smaller,
reducing the k factor in the O(nk) cost of n searches and cutting down on
cache thrashing.
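
A rough sketch of what I mean, with fixed four-way buckets (illustrative
names):

    #include <stdint.h>

    #define WAYS 4

    struct bucket {
        uint32_t key[WAYS];
        uint32_t val[WAYS];
    };

    int lookup(struct bucket *b, uint32_t key, uint32_t *val) {
        for (int i = 0; i < WAYS; i++) {
            if (b->key[i] != key) continue;
            if (i > 0) {   /* hit: bubble one step toward index 0 */
                uint32_t tk = b->key[i-1], tv = b->val[i-1];
                b->key[i-1] = key;  b->val[i-1] = b->val[i];
                b->key[i]   = tk;   b->val[i]   = tv;
                i--;   /* the hit entry now lives one slot earlier */
            }
            *val = b->val[i];
            return 1;
        }
        return 0;
    }

    /* Always evict the last slot; hot keys have bubbled away from it.
     * (A real insert would first check whether key is already present.) */
    void insert(struct bucket *b, uint32_t key, uint32_t val) {
        b->key[WAYS-1] = key;
        b->val[WAYS-1] = val;
    }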

~~~
jorangreef
If I understand you correctly, CLOCK LRU works like this, except without the
memcpy swapping cost.

[https://en.wikipedia.org/wiki/Page_replacement_algorithm#Clock](https://en.wikipedia.org/wiki/Page_replacement_algorithm#Clock)
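
Roughly, within one fixed-size bucket (a toy sketch of the CLOCK idea, not
the library's actual code):

    #include <stdint.h>

    #define WAYS 4

    struct clock_bucket {
        uint32_t key[WAYS];
        uint8_t  ref[WAYS];   /* reference bit, set on every hit */
        uint8_t  hand;        /* next eviction candidate */
    };

    int lookup(struct clock_bucket *b, uint32_t key) {
        for (int i = 0; i < WAYS; i++) {
            if (b->key[i] == key) {
                b->ref[i] = 1;   /* the whole "promotion" is one bit write */
                return 1;
            }
        }
        return 0;
    }

    /* Sweep the hand: clear reference bits until an unreferenced slot
     * is found, giving recently hit entries a second chance. */
    int evict_slot(struct clock_bucket *b) {
        for (;;) {
            int i = b->hand;
            b->hand = (uint8_t)((b->hand + 1) % WAYS);
            if (!b->ref[i]) return i;
            b->ref[i] = 0;
        }
    }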

~~~
CoolGuySteve
In order for the scan to be most efficient in cache, you need to store
key-reference pairs in the buckets, so the swap cost should only be a couple
of instructions rather than a memcpy.

------
jnordwick
I've written a couple of cache-aware hash tables for work (fintech),
including an always-correct version of this that rehashed when buckets
filled. I always call them multi-buckets, for lack of a better term.

Most data structures that rely on a chain of items can be improved by
bucketing for cache lines (even linked lists).
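
For example, an unrolled list node sized to roughly one cache line
(illustrative):

    #include <stdint.h>

    /* 64 bytes on a typical 64-bit ABI: a scan touches one cache line
     * per 13 elements instead of one per element. */
    struct unrolled_node {
        uint8_t count;                  /* elements in use, <= 13 */
        uint32_t elem[13];
        struct unrolled_node *next;
    };

    int contains(const struct unrolled_node *n, uint32_t x) {
        for (; n; n = n->next)
            for (int i = 0; i < n->count; i++)
                if (n->elem[i] == x) return 1;
        return 0;
    }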

I've also implemented 32-byte multi-buckets for a cuckoo hashing scheme, to
try to balance search time and memory usage (you can get extremely high load
factors with this approach). I don't think this approach is any better than a
few other similar multiple-hashing schemes, though.

I have tried keeping the hashes together in memory and using SSE compare
instructions to do the comparison/masking of an entire bucket in a couple of
instructions, but I have found the technique hit-or-miss.
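
The probe itself is only a few instructions (SSE2 sketch):

    #include <stdint.h>
    #include <emmintrin.h>   /* SSE2 */

    struct __attribute__((aligned(16))) sse_bucket {
        uint32_t hash[4];
        /* ... keys/values live alongside ... */
    };

    /* Compare all four slot hashes against h at once; bit i of the
     * returned mask is set if hash[i] == h. */
    int probe(const struct sse_bucket *b, uint32_t h) {
        __m128i needle = _mm_set1_epi32((int32_t) h);
        __m128i slots  = _mm_load_si128((const __m128i *) b->hash);
        __m128i eq     = _mm_cmpeq_epi32(needle, slots);
        return _mm_movemask_ps(_mm_castsi128_ps(eq));
    }

The mask still has to be verified against the full keys, and with buckets
this small the scalar loop is already cheap, which is probably why the win is
inconsistent.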

