
Ristretto: A High Performance, Concurrent, Memory-Bound Go Cache - ngaut
https://blog.dgraph.io/post/introducing-ristretto-high-perf-go-cache/
======
sethammons
This is a seriously great write up. So many links to research and papers.
There were some truly new-to-me concepts, and they implemented optimizations
I've not had to deal with yet despite the scale at which I work; it is fun to
read about those. Seriously awesome work -- I would love to optimize as much
as they have. Two things stand out. One, they could maybe reduce thrashing at
full capacity by evicting more than one key at a time (say, n% of total keys).
Two, my reading here suggests this is not production-ready and is dangerous to
use.

> ... then used this hash as not only the stored key but also to figure out
> the shard the key should go into. This does introduce a chance of key
> collision, that’s something we plan to deal with later.

If a Get and Set can produce a collision and it is not dealt with, then it is
unsafe for production. Imagine requesting user foo's data and getting user
bar's data back instead. That could leak PII and go unnoticed. Generally, on a
collision, you drop down to a list and sacrifice runtime to ensure accuracy. I
need a correct cache over a fast cache.

Unless I misunderstood, this is not production-ready. Again, amazing work; it
is just not done yet.

~~~
mrjn
We will fix that in the near future. Known problem with our design choice.
Wanted to ensure we put it out there first, before solving it. FWIW, note that
the probability of a collision with a 64-bit hash is pretty low. Still, it
needs to be solved.

~~~
grumpydba
Could this be solved by some sort of cuckoo hashing?

~~~
mrjn
The benefit of the current hashing algorithm is that the Go team is
maintaining it ;-). It's written in assembly and runs in single-digit
nanoseconds.

One way to check for collisions would be to either store the key as well or use
another hashing algorithm (which would further decrease the probability of
collision, but not eliminate it). Either would introduce some slowness,
because every Get would need to be checked.

~~~
sethammons
Storing the key with the value is a solid workaround.
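
A minimal sketch of that workaround (hypothetical code, not Ristretto's actual
implementation): keep the original key in the entry and compare it on every
Get, so a collision degrades to a miss rather than wrong data.

```go
package main

import "fmt"

// entry keeps the original key alongside the value so Get can verify
// it: a hash collision then becomes a miss instead of returning the
// wrong user's data. A toy sketch, not Ristretto's actual code.
type entry struct {
	key   string
	value interface{}
}

type cache struct {
	m map[uint64]entry // keyed by a 64-bit hash of the key
}

// fnv1a is a stand-in for the runtime's assembly hasher.
func fnv1a(s string) uint64 {
	h := uint64(14695981039346656037)
	for i := 0; i < len(s); i++ {
		h = (h ^ uint64(s[i])) * 1099511628211
	}
	return h
}

func (c *cache) Set(key string, value interface{}) {
	c.m[fnv1a(key)] = entry{key, value}
}

func (c *cache) Get(key string) (interface{}, bool) {
	e, ok := c.m[fnv1a(key)]
	if !ok || e.key != key { // collision detected: treat as a miss
		return nil, false
	}
	return e.value, true
}

func main() {
	c := &cache{m: map[uint64]entry{}}
	c.Set("foo", "foo's data")
	fmt.Println(c.Get("foo")) // foo's data true
	fmt.Println(c.Get("bar")) // <nil> false
}
```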

------
stephen
Wow. That is a great write-up. I was admittedly skeptical of "writing your own
cache" but these guys are super-OCD and obviously capable of producing best-
in-class implementations. Kudos.

...kind of wondering: they hit at least three "well, Go doesn't do ..."
(lock-less data structures, thread-local storage, channels aren't high-perf)
rabbit holes, which makes me wonder whether Go was really the right implementation
language for their project (iirc they've also had articles go by about writing
best-in-class go versions of rocksdb/etc.).

But, they do seem very capable of rolling their own infra, and I suppose
someone has to do it to bootstrap ecosystems in each new/upcoming language.

~~~
karlmcguire
Thank you for the kind words!

You're right about our rabbit holes. Our initial "why" for this project was
essentially: "Caffeine is written in Java with excellent performance. Go
should be (and usually is) competitive with Java. Why is there no Go
equivalent?"

With such a high bar, we were disappointed by naive, idiomatic Go solutions,
but we knew there had to be some way to close the gaps.

I look forward to continuing to do so. We have a few things on the roadmap
such as adaptivity (see Ben Manes' work for more info) and managing our own
memory.

------
twotwotwo
TinyLFU is kind of magical--most of the win you'd get from expensive tracking
of key frequencies, in much less space. The bloom filter in front seems like a
clever hack.
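
For anyone unfamiliar, the doorkeeper idea is roughly this (a toy sketch; real
TinyLFU uses a bloom filter plus a Count-Min Sketch with periodic aging of
counts, and maps stand in for both here):

```go
package main

import "fmt"

// A toy sketch of TinyLFU's "doorkeeper": a small filter in front of
// the frequency counters so one-hit wonders never occupy counter space.
type doorkeeper struct {
	seenOnce map[uint64]bool // stand-in for the bloom filter
	counts   map[uint64]int  // stand-in for the Count-Min Sketch
}

func (d *doorkeeper) record(keyHash uint64) {
	if !d.seenOnce[keyHash] {
		d.seenOnce[keyHash] = true // first sighting: note it and stop
		return
	}
	d.counts[keyHash]++ // repeat visitor: worth a real counter
}

func main() {
	d := &doorkeeper{seenOnce: map[uint64]bool{}, counts: map[uint64]int{}}
	for _, h := range []uint64{1, 2, 2, 3, 2} {
		d.record(h)
	}
	fmt.Println(d.counts) // map[2:2] -- only the repeated key is counted
}
```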

Not an expectation/criticism/etc. of Ristretto, but out of curiosity, does
anybody know of work about making the cache aware of the expected cost of a
cache miss for each key?

To spell out what I mean: if I'm sending a query result to the cache, I could
also pass along how long the query took to run. Then, in theory, the cache
could try to drop keys with low (expected number of fetches * expected cost
per miss), not just low (expected number of fetches). Of course that's
rough/imperfect, and for some caching scenarios the miss cost doesn't vary by
key.
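
As a toy illustration of that score (all names made up):

```go
package main

import (
	"fmt"
	"time"
)

// score is the hypothetical eviction metric from above: expected
// fetches weighted by the per-miss cost. Lower scores make better
// eviction candidates.
func score(expectedFetches int64, missCost time.Duration) float64 {
	return float64(expectedFetches) * missCost.Seconds()
}

func main() {
	// A rarely-fetched but expensive query can outrank a hot, cheap one.
	fmt.Println(score(3, 2*time.Second))        // 6
	fmt.Println(score(100, 5*time.Millisecond)) // 0.5
}
```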

I guess mostly the question is whether it's enough of a help to justify the
metadata and complexity, which would require a trace of gets/sets from
something whose miss costs vary by key.

The intuition is just that now that we're pretty good at optimizing hit rate,
maybe expanding the problem to "minimize total miss cost" could be an
interesting direction for the use cases where there's a difference.

~~~
NovaX
There is not a lot of research in this area, and I briefly outlined my
understanding of the current state when responding to a GitHub issue:

[https://github.com/ben-manes/caffeine/issues/345#issuecomment-530571533](https://github.com/ben-manes/caffeine/issues/345#issuecomment-530571533)

~~~
twotwotwo
Oh, fascinating. My intuition re: size was that you'd look at worth _per_ byte
for something like query results, a whole different kind of thing from the CDN
use case that's been studied.

Makes me wish I could justify adding instrumentation to our 'memoize' wrappers
at work, although our caching doesn't really match the use case for something
like Ristretto or Caffeine. (Little enough stuff and the latency of a
networked cache is fine for us, so "just set aside lots of RAM on some cache
servers" works well enough.)

------
aloknnikhil
Good work! Caching is one of the 2 hardest problems in Computer science. Not
because building a cache is hard, but cache invalidation/replacement is
complex when you consider the system as a whole. Integrating the cache and
preventing users from seeing stale values is not a trivial task (if you want
to support highly concurrent access). MySQL removed query caching from 8.0
because of the mess that query caches are. Enforcing transaction isolation
levels in a database with caching is non-trivial.

~~~
mrjn
Exactly. Doing caching in a transactional MVCC system is hard. We plan to do
careful data caching in Dgraph with this. A block-level cache in Badger would
be easy, and so would a key-plus-version-level cache.

------
boulos
Disclaimer: I admit to only reading the blog post and not looking at the code
yet.

I really enjoyed this write-up as well! But I couldn't tell from the post:
does it have a distributed mode like groupcache? (I think the answer is no). I
think nearly all decisions in groupcache are because of the word "group" :).

Similarly, it would be great if there's a "safe mode" instead of the (IIUC)
"start dropping Set requests when overloaded". You'd naturally want most
applications to be prepared for your cache to get blown away, so you should be
prepared for it to be lossy, but I'd certainly want a knob for that :).

~~~
karlmcguire
Glad you liked it!

> does it have a distributed mode like groupcache?

Not at the moment, no. Though it could, theoretically, be used within
groupcache. We'll look into that as it has been brought up a few times now.

> it would be great if there's a "safe mode" instead of the (IIUC) "start
> dropping Set requests when overloaded".

We could definitely look into adding a config flag for that. I'd love to know:
in what situations would lossless Sets be important to you? We already have
guaranteed updates, so the only Sets that are dropped are new items.

~~~
mrjn
Also, if we guarantee that every Set is accepted, it goes against the idea
that the Estimate of the incoming key must be higher than the Estimate of the
key getting evicted.

Of course, one way to ensure every key goes in is to have a cache of infinite
MaxCost. But, that’s an orthogonal idea, I guess.

------
ArtWomb
Kudos on the informative blogs. It's always better when engineering teams
share their knowledge ;)

In setting up the cache, can I assign values to be arbitrary-length,
binary-safe strings? Can this be used to store indexed blob data of large
size? Not
saying this is a good idea btw...

~~~
mrjn
Values can be arbitrary length, yes. We plan to store SSTable blocks from
Badger in Ristretto. They can be strings, byte slices, whatever. That's the
reason we chose interface{} as the input.

~~~
ArtWomb
Excellent! thnx so much for building

------
marktangotango
Does anyone know of a similar cache that lets you restrict memory usage by
“customer”, for example? I used Redis scripting to implement this use case,
but it really feels like a hack.

~~~
hinkley
Why are you sharing caches between customers?

~~~
marktangotango
Why would anyone do that?

[https://en.wikipedia.org/wiki/Principle_of_charity](https://en.wikipedia.org/wiki/Principle_of_charity)

------
KAdot
> we used the natural randomness provided by Go map iteration to pick a sample
> of keys and loop over them to find a key

The iteration order of Go maps is randomized, but it's far from being evenly
distributed (it could be good enough for the Ristretto use case though). Here
is a test that shows that some keys are 6 times more likely to be selected on
the first iteration
[https://play.golang.org/p/dT1CWuqoHEM](https://play.golang.org/p/dT1CWuqoHEM).

~~~
mrjn
Based on this behavior, the two scenarios for Ristretto would be:

1. If the chosen sample keys have high Estimates, the incoming keys could be
rejected, affecting the Hit ratios. However, our Hit ratio benchmarks didn't
show such behavior (within 1% of exact LFU).

> With this approach, the hit ratios are within 1% of the exact LFU policies
> for a variety of workloads.

2. The chosen keys have low Estimates, in which case they'd be removed and
hence won't suffer from repeated selection.

So, yeah. Doesn't affect Ristretto. But, good thing to keep in mind for other
workloads.
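
For reference, the sampling being discussed is roughly this shape (a
simplified sketch, not Ristretto's actual eviction code):

```go
package main

import "fmt"

// sampleMin relies on Go's randomized map iteration order to grab a
// small sample of keys and returns the one with the lowest Estimate.
func sampleMin(estimates map[string]int64, sampleSize int) (string, int64) {
	var minKey string
	minEst := int64(-1) // estimates are non-negative, so -1 means "unset"
	seen := 0
	for k, e := range estimates {
		if minEst == -1 || e < minEst {
			minKey, minEst = k, e
		}
		seen++
		if seen >= sampleSize {
			break
		}
	}
	return minKey, minEst
}

func main() {
	ests := map[string]int64{"a": 10, "b": 2, "c": 7, "d": 1}
	k, e := sampleMin(ests, 3)
	fmt.Printf("eviction candidate %q (Estimate %d)\n", k, e)
}
```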

------
kasey_junk
This is really nice. The sync.Pool based ring buffer is a really clever way to
get around the thing that always kills my low latency work in golang.

I wonder how much the type casting will cost in real-world situations?
I’ve not benchmarked that generally but it’s always something I worry about
with ‘generic’ data structures like this.

~~~
karlmcguire
Thank you. The sync.Pool lossy ring buffers are probably my favorite
innovation, especially because of how little code it is.

I was worried about type casting as well and generally avoided bare interfaces
like the plague. However, in all the benchmarks I've run for throughput, type
casting was a small percentage of CPU time. I'll see if I can dig up a
flamegraph.

Of course, the only reason sync.Pool performs so well is because of its
internal usage of thread local storage. If the Go team exposed that, we could
make our own typed sync.Pool implementations... I won't hold my breath,
though.
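
For the curious, the shape of the idea is something like this (a simplified
sketch under my own naming, not the actual Ristretto code):

```go
package main

import (
	"fmt"
	"sync"
)

// A sketch of a sync.Pool-backed lossy buffer: sync.Pool hands out
// effectively per-P stripes, so Adds rarely contend. When a stripe
// fills, we try to hand it to the consumer and drop the batch if the
// consumer is busy. Names and sizes are illustrative.
type lossyBuffer struct {
	pool *sync.Pool
	sink chan []uint64 // the consumer drains full stripes from here
}

func newLossyBuffer(stripeCap int) *lossyBuffer {
	return &lossyBuffer{
		pool: &sync.Pool{New: func() interface{} {
			return make([]uint64, 0, stripeCap)
		}},
		sink: make(chan []uint64, 4),
	}
}

func (b *lossyBuffer) Add(keyHash uint64) {
	stripe := b.pool.Get().([]uint64)
	stripe = append(stripe, keyHash)
	if len(stripe) < cap(stripe) {
		b.pool.Put(stripe)
		return
	}
	select {
	case b.sink <- stripe:
		// The stripe now belongs to the consumer; recycle a fresh one.
		b.pool.Put(make([]uint64, 0, cap(stripe)))
	default:
		// Consumer busy: drop the whole batch. Lossy by design.
		b.pool.Put(stripe[:0])
	}
}

func main() {
	buf := newLossyBuffer(4)
	for i := uint64(0); i < 9; i++ {
		buf.Add(i)
	}
	fmt.Println("full stripes queued:", len(buf.sink)) // 2
}
```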

------
ysw0
Great write-up and tool; can't wait to try it out in my projects. Enjoyed your
last write-up as well!

------
jeffomatic
Question about the buffered writes and invalidation: is there anything that
prevents race conditions between a Del() and a prior Set() on the same key?
Just glancing at the source, it looks like Set() ends up in a channel send, whereas
Del() seems to synchronously update the internal map.

~~~
mrjn
Feel free to file an issue. That could cause a correctness issue in Ristretto
as of today.

~~~
mrjn
This issue would be fixed by this PR:
[https://github.com/dgraph-io/ristretto/pull/62](https://github.com/dgraph-io/ristretto/pull/62)

------
wiradikusuma
Go is usually used for microservices, and for that scenario, Ristretto is not
applicable since it's a library, and not a "shared server" like Redis. Am I
right?

~~~
wrnr
Very much to the contrary: why go over the network to get a value from a
key-value store when you can have a mutex-protected map that holds only the
values that were accessed by that node? The "let's have shared memory between
our services" architecture makes things easy only to make them much harder
again.

~~~
hinkley
I’ve been trying to convince my coworkers that we should have a cache per
node, but so far they aren’t biting.

Alternatively, I think within the decade we are going to expect every service
to contain its own fault-prevention logic, not just the web tier, at which
point every service will have its own reverse caching proxy in front of it.

------
cgag
Wow, I was just researching go caches like a week ago and read your previous
summary of the state of the world. Didn't realize this had been posted.
Excellent timing.

------
uvtc
Oh, same name as the Xfce image viewer:
[https://docs.xfce.org/apps/ristretto/start](https://docs.xfce.org/apps/ristretto/start)

~~~
reaperhulk
Name collisions are unfortunate but inevitable. This name makes me think of
ristretto255 ([https://ristretto.group/](https://ristretto.group/)), but
that’s the world I live in. I suspect this project is likely to see more name
recognition over time than either of our examples.

~~~
baby
indeed, I thought this was a post about the crypto stuff :c

