Hacker News new | past | comments | ask | show | jobs | submit login
The State of Caching in Go (dgraph.io)
201 points by boyter 11 days ago | hide | past | web | favorite | 49 comments

Great article (or start of article)!

The GitHub stars obsession is too much, though. I read the request in the main content area, then a nag banner popped up at the bottom of the page requesting it again. How annoyingly desperate and uncool.

Enough already, I'm trying to read!

Or was, anyway..

Plus I'm not even logged in to GitHub at present and cannot without getting to another device for 2FA.

No offense, but now I'll likely never give that star.

Arrr... Sorry about that. That was my idea, so my bad. We will get rid of the pop up banner.

Please do. Same here... read the initial request, but when the lower banner thing popped up also begging for a star my perception of that was negative.

Clicking on the [x] in the corner of the pop up (to get rid of it) then launched a new window (well, new tab in this case).

Didn't read the rest of the post.

Do you mind sharing what browser/OS were you using? Clicking on the [x] shouldn't do that.

(i'm the person who implemented that widget)

On firefox on android, the X opens github

thanks, will look into that.

Thanks matee.

(co-author here) Thanks for sharing this post! We've elaborated the various issues we encountered trying to use a concurrent LRU cache in Dgraph and our dissatisfaction at existing choices which fail to provide memory management, high concurrency and hit ratios.

With guidance from Ben Manes, we're planning to work on a new concurrent Go cache based on Caffeine (Java). If you're interested in helping (or already have something similar), do reach out to us!

Meanwhile, check out Dgraph, our distributed graph database which is what keeps us busy: https://github.com/dgraph-io/dgraph

What are you thinking about for the API signatures? Without beating a dead horse this is where go’s lack of generics really becomes burdensome.

If it were an on disk cache []byte would likely make sense but given it’s in memory I’m not sure. If you use interface{} you’ll need to measure that cost as well.

As mentioned down in this thread, the key-val would be bytes-bytes or string-bytes. While this is more expensive in terms of CPU, it has the advantage that the size of elements in the cache are carefully understood and do not need the cost of GC sweeps.

When de-serialized, in-memory structs take up more memory on heap and it gets hard to compute that -- we've tried that in vain in the past.

Having said that, there's still a general need to store pointers as values and that's something we'll look at once this byte-based implementation is ready.

Our current plan is to store []byte given that badger stores blobs of bytes. Using interface{} may have some overhead, but we are no so worried about that right now.

So the []byte solution requires running through a serielization step which for most in memory uses will be expensive. Did you choose that because in you use case it eventually ends up setialized in any case?


I'm a go novice, and don't have any particular knowledge about dgraph, but is this a case where auto-generated code could be a good solution? (Assuming it's a bounded set of types)

I don't see why it's a problem, caching solutions like Redis also store bytes. If you cache a JSON blob you don't have much choice.

For an in memory cache (ie one not going to disk/network) you have to unnecessarily serialize your in memory representation to bytes. This is very frequently the most expensive (computational) step in a system.

If you are just going to send that stuff to nic/disk eventually anyway its perfectly fine (which is why something like Redis which is an out of memory cache can get away with it).

You might want to take a look at https://github.com/LiveAsynchronousVisualizedArchitecture/si...

It is a completely lock free key value store, though does not have eviction built in.

Its memory management is done using a free list of blocks, which is part of how it maintains lock free concurrency.

Very cool, I have missed Caffeine when making stuff in go!

Cache hit ratio increases with the size of cache and decreases with the number of "hot" items regardless of cache eviction policy - LRU, FIFO, etc. Cache hit ratio reaches 100% when cache size becomes large enough to hold all the "hot" data. It would be great to see https://github.com/VictoriaMetrics/fastcache in the benchmark results. It is faster and it uses less memory comparing to other solutions for caching the same amount of data.

I think it is a well put together analysis, but I feel I'm missing something.

> FreeCache and GroupCache reads are not lock-free and don’t scale after a point (20 concurrent accesses). (lower value is better on y axis)

If you look at the graphs, they level out and seem to work best with more concurrent load, the exact opposite of what they say. Am I missing something?

Also, I think it would be valuable to see higher concurrent requests: with what I'm used to, 60 concurrent requests would be low - I'm interested in maybe 1-5k concurrent requests.

I agree, those graphs were pretty confusing. As far as I can tell, they are not showing the average time each operation takes (ie average elapsed wall time), they must be taking the length of the test and dividing by the number of operations. It would be much easier to parse if they showed operations/second instead.

In other words, 9 concurrent pregnant mothers would show 1 month/baby on those charts.

If I have that right, their comment makes more sense. After 20 concurrent requests, the overall throughput of operations/second does not increase with additional concurrency.

This is true that the latency of each operation won't change much as we increase concurrent accesses, but the amortized latency would (be expected to) reduce as the graphs show. When we did benchmarks, the Go benchmark framework provides us with ns/ops which we directly used to plot the graphs. We could calculate ops/sec too but that would just be another step of calculation using throughput = 1/(time taken/ops), may be more clear to understand though. The graphs would still flatten out after 20 concurrent accesses.

We just updated the graphs to show throughput numbers instead. Should be more clear now.

The post speaks about go not having a thriving ecosystem of packages. My view is that there are actually a lot of Go packages out there but certainly go’s indifference to packages and the fact it is just now getting an “official” package capability (RIP dep) has contributed to slower growth of the ecosystem. NPM has had many many growing pains but I sure do love the ease of use and discoverability of packages. I sincerely hope go catches up. I love the language and tooling.

Go has lots of packages. They are easily discoverable on godoc.org or just some googling.

There is a cultural skepticism about the overuse of packages. There are occasions where the STD lib copied a function instead of adding an import for example. Like many things in Go this pushback is needed but sometimes goes too far.

But maybe single function packages in npm we're a bad idea.

Anyway caching is a mixed bag in my opinion. Local per-thread caching is often better than a global cache as it avoids contention. It's also trivial to implement with a map and requires no special coordination.

It does require rethinking how you design a solution to a problem. FWIW that design process tends to lead you down a better direction anyway for producing distributed systems.

For example one giant, randomly distributed Kafka topic with a global redis db for a cache is probably a lot worse off than a system with more predictable data locality on the consumers.

(co-author here) The number of Go libraries tend to be just slightly below the number of their users (dramatically speaking).

Libraries are hardened by repeated usage by many different users who each bring their own special use cases and improve them to bring them to production quality. Go has no platform where certain well-written libraries can be recommended and get more exposure. Thus, they don't tend to mature more than the specific use case they get written for, provided they are still being maintained.

Arch Linux is a great example of giving well-doing packages more exposure by upgrading an AUR to community to core/extra. That way, more users gather around packages improving them even further.

To the second point about per-thread caching, Go does not expose threads to end-users. So, there's no concept of thread-local. What you're describing results in lock striping, which has contention issues as described in the post.

> Go has no platform where certain well-written libraries can be recommended and get more exposure.

Godoc.org is that platform, it serves the purpose. Others in the module universe are under development.

> To the second point about per-thread caching, Go does not expose threads to end-users. So, there's no concept of thread-local. What you're describing results in lock striping, which has contention issues as described in the post.

In Go you'd have to orchestrate this locality yourself, i.e. a fixed worker (goroutine) pool each with its own cache. This is probably less work than it sounds.

Generally, Go does encourage you to author solutions to your specific problem, rather than adapting a general-purpose library. This is almost as much a part of the ethos of Go as implicitly-satisfied interfaces, or "share memory by communicating", or any of the other proverbs. Maybe Go takes it too far. But effectively no other language exists at this point on the spectrum, and I'm happy that we have at least some representation over here; I think lots of programmers live here, too, and appreciate the tradeoffs.

> Godoc.org is that platform, it serves the purpose.

Godoc.org is great. But (beyond documentation) it is at best a search engine, providing equal platform to all libraries however production ready or broken they might be. Unless I'm missing something, it does not intend to promote certain libraries over others, the same way as AUR -> community -> core works in Arch (which is the model I think is missing in Go).

> In Go you'd have to orchestrate this locality yourself, i.e. a fixed worker (goroutine) pool each with its own cache.

Having many small caches within the same process would result in more misses per key, which if it results in disk accesses would not be ideal or might be worse than contention.

Moreover, being able to spin Goroutines as and when required to branch off a big job into smaller tasks is the beauty and benefit of Go compared to other languages like C++ or Java, where you must start a thread pool upfront and shoot tasks off to it.

>the same way as AUR -> community -> core works in Arch (which is the model I think is missing in Go).

Arch packager here.

It doesn't quite work like that. The distinction between community and extra is largely based on who packages it, it used to be defined as above 5% usage. The distinction between that and core is that core is considered essential to the distribution. No package is going to go from AUR to core without replacing some integral part of the system.

I see. Then I guess there's a different model that can more accurately represent the idea of giving certain well-written/actively-maintained/popular packages a special platform than others to make it more enticing for wider community to adopt those.

The underlying issue is that building a production ready library doesn't just happen -- it needs one or few initial authors and (along with a large group of users) a group of dedicated power users who can then continue to maintain the library and optimize it for their usage, in turn developing it enough to be considered production ready.

> Thus, they don't tend to mature more than the specific use case they get written for, provided they are still being maintained.

This is all a bit abstract. There are plenty of go libraries which are suitably mature for production use. For example Google's cloud libraries or Amazon's AWS libraries. There's a platform for communication of issues and releases: Github. I guess I haven't really had an issue with it... The only problem I've had is the messiness of versioning, but there's been solutions to that for a long time (godep, govendor, glock, etc...)

> To the second point about per-thread caching, Go does not expose threads to end-users. So, there's no concept of thread-local. What you're describing results in lock striping, which has contention issues as described in the post.

Sorry for the confusion. I did not mean thread-local variables. I meant breaking up your workload across mutiple goroutines so that you don't have to use shared state.

For example suppose you receive a message with an org ID. You start 8 workers, all data for (id % 8 == 0) goes to worker 0, (id % 8 == 1) goes to worker 1, and so on.

Each worker maintains a local map of data and no mutex is needed to coordinate access, because only one goroutine can use the data at a time. So there's no lock-striping and no contention issues.

This is the recommended approach for Go concurrency:

> Do not communicate by sharing memory; instead, share memory by communicating.

I understand that this sort of approach doesn't work for every problem, there's not always an easy way to split up the data like this (you might end up with hotspots for example). But I feel like that conclusion needs to come after a bit of reflection, instead of reflexively reaching for a global object + mutex. (Not saying that's what you guys are doing... just a common issue I've seen working on distributed systems with developers new to Go)

Not to nitpick here, but Google's cloud libraries and Amazon's AWS libraries work because there're probably a dedicated team of engineers who're assigned to these projects, considering cloud platforms are supposed to make these giant orgs a ton of money.

When Google needs something, it gets built, corrected and optimized quickly (e.g. Go context library). Open source / crowd sourcing doesn't work that way.

A thread local cache is not going to be as performant esp. w.r.t. large data sets. There is a huge area between global redis db for a cache and thread local cache. At some point you have a large data set that is accessed by many different actors at different intervals. Having each application cache (and re-cache if they go down) is not always sustainable.

My feeling is that there's a good ecosystem in most areas, but that go discourages you pretty hard from producing collection types of your own, so it's particularly underdeveloped in this area.

It’s the lack of generics that makes code reuse hard in Go.

It is true that Go doesn't lend itself to general packages often. You are either left with interface{} parameters and return values (bye bye compile time type check) or working against an interface like the ol' sort package. That said, I've only run up against this a couple of times in multiple years of working in the language, and it so happens that caching is one of them.

Is there a need for an npm or a rubygems style central repository? Seems like it would help the language out, based on your comments.

I'm not very familiar with go but one thing I don't understand is why the caches need to be distributed. I wonder, why not just have one cache per thread?

You're probably imagining an async, thread-per-core model. One cache per thread might be reasonable then (although having more, smaller caches decreases hit rate, so might fail their requirement #5).

Go programs are written in a synchronous, thread-per-request model. You'd end up with _tons_ of small, very cold caches and a miserable hit rate.

You could approximate the former in Go by just having an array of N caches and picking from them randomly. This is similar to their "lock striping" with less contention (no stripe is a hot spot) but a lower hit rate.

This was wonderfully simple and concise. Thank you.

Cache invalidation is pretty hard if you need to clear data from local caches for every thread.

That requires N times more memory, where N is the number of threads. 32x more memory??

Given the Zipf distribution, I wonder if just a small LRU cache per thread might not be a terrible thing?

Juggling caches on databases is a challenging thing, and there has been some back-and-forth on best practices. MySQL for the longest time shipped with a query cache. As of MySQL 5.7.20 it was deprecated, and has now been removed in MySQL 8, largely because it was as likely to hurt you badly as help you, particularly with correctly sized InnoDB buffering.

Zipf is a little idealistic. It is perfect for a quick analysis as the base case that a cache should excel at. Unfortunately LRU doesn't because it can be easily polluted. (examples: https://github.com/ben-manes/caffeine/wiki/Efficiency)

A database will often scan many records which would flush an LRU. Postgres uses small LRU buffer caches in your per-thread model, backed by a larger LRU-like cache. The buffer caches are easily flushed by scans, but protect the shared cache from this noise. That shared cache could probably benefit from a smarter policy and this is an on going topic.

This is assuming that all threads need a global view of the data. When this is not the case, it is sufficient to have many async goroutines ingest on a channel and fill their own local map, lock-free. This is our general strategy for high-performance streaming routines.

Right, fair enough. I only thought of that after I posted. Not used to thinking that memory is limited lol

The author did indicate that they had problems with oom and memory eviction so it probably is a memory limited use case.

> LSB 9-16

What is this? Google thinks it's some sort of metal stamp.

Least significant bits

Makes sense now, thanks!

Applications are open for YC Summer 2019

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact