
Groupcache: an alternative to memcached, written in Go - sferik
https://github.com/golang/groupcache#readme
======
joshfraser
For those that didn't catch it, this is from Brad Fitzpatrick, the same guy
who made memcache

~~~
adamnemecek
AFAIK, he wrote the original memcached in Perl, but what everyone is using now
is a rewrite of memcached in C by Anatoly Vorobey [1]. Not that it diminishes
his efforts or anything, just pointing it out.

[1]
[http://en.wikipedia.org/wiki/Memcached#History](http://en.wikipedia.org/wiki/Memcached#History)

~~~
bradfitz
And then I rewrote it in C++ for App Engine and other users within Google
(Blogger, Plus, hundreds others).

I also then ported the C++ memcached (called memcacheg) to Go, for profiling
Go vs C++, and named that one memcachego. So I've written it about 4 times.

~~~
groby_b
"I've written it 4 times, and I've gotten exceedingly good at it" :)

Kidding aside, did anything you learned during the rewrites cause noticeable
changes to the design, or is it still the same thing, just transliterated?

------
chetanahuja
I'd be curious to hear performance numbers (assuming a reasonable front-end
server to this library). I get it that the replicated in-memory caching part
is valuable. But (from painful experiences with Java) I also fear that a
GC-based memory management system is anti-optimal for an in-memory cache of
small objects, especially as the size of the heap grows beyond a couple of GB (*).

(* )
[http://cdn.parleys.com/p/5148922a0364bc17fc56c60f/GarbageCol...](http://cdn.parleys.com/p/5148922a0364bc17fc56c60f/GarbageCollection.pdf)

~~~
thrownaway2424
Go's garbage collector sucks pretty hard for large heaps. A 16GB heap that's
mostly live objects will take tens of seconds per round, during which no
goroutines will run (it's a stop-the-world collector). Basically it's 15 years
behind Java in this regard.

If you're worried about this, stick with C++.

~~~
jerf
I couldn't find in the code where the objects are actually stored, but
theoretically there's nothing stopping you in Go from allocating yourself up
an array of X bytes (for X in the mega or gigabytes) and completely managing
it yourself, leaving the GC merely for incidental activities, which themselves
can be minimized through various design patterns and some care.

It is true that you should describe Go as a garbage-collected language, but
unlike a lot of other such languages, Go has C-like mutable arrays readily
available. In practice it's more like a sort of hybrid, in that even though
it's garbage collected you have a lot of opportunities to write code that
still doesn't really use it, without having to "drop down to" C or something.
I still hope to see some improvements in it before I could make a big
commitment to it, but I've certainly got my eye on it.
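The approach described above can be sketched in a few lines, purely as an illustration (this is not how groupcache itself stores values): pre-allocate one large byte slice and track offsets into it yourself, so the GC sees a single long-lived allocation rather than one object per cached value.

```go
package main

import "fmt"

// slabCache is a toy illustration: values live inside one large
// pre-allocated byte slice, so the GC tracks a single allocation
// instead of one heap object per cached value.
type slabCache struct {
	slab    []byte
	used    int
	offsets map[string][2]int // key -> {start, length} within slab
}

func newSlabCache(size int) *slabCache {
	return &slabCache{
		slab:    make([]byte, size),
		offsets: make(map[string][2]int),
	}
}

// put copies the value into the slab; it fails when the slab is full.
func (c *slabCache) put(key string, val []byte) bool {
	if c.used+len(val) > len(c.slab) {
		return false // a real cache would evict something here
	}
	start := c.used
	copy(c.slab[start:], val)
	c.used += len(val)
	c.offsets[key] = [2]int{start, len(val)}
	return true
}

// get returns a view into the slab (no per-value allocation).
func (c *slabCache) get(key string) ([]byte, bool) {
	loc, ok := c.offsets[key]
	if !ok {
		return nil, false
	}
	return c.slab[loc[0] : loc[0]+loc[1]], true
}

func main() {
	c := newSlabCache(1 << 20) // one 1 MiB slab up front
	c.put("greeting", []byte("hello"))
	v, ok := c.get("greeting")
	fmt.Println(ok, string(v)) // true hello
}
```

The map of offsets still involves the GC, of course; the point is only that the cached payloads themselves stop being individually-tracked heap objects.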

~~~
coldtea
> _I couldn't find in the code where the objects are actually stored, but
> theoretically there's nothing stopping you in Go from allocating yourself up
> an array of X bytes (for X in the mega or gigabytes) and completely managing
> it yourself,_

Nothing stops you, except:

- The determination NOT to do something the computer can do
- Previous exposure to a proper, modern GC
- The fact that you used Go to get away from this kind of shit in the first place

~~~
jerf
I did not mean this is a technique you'd use in general. I meant specifically
in the context of groupcache, where an array of bytes that you manage yourself
is a reasonable approach to the problem.

If you don't want to do that, don't. I'd never start the prototype with that
functionality, for instance. But when you discover that you need it, Go
permits it in a way that Python or Perl do not. Go is GC'ed and you are free
to use that to the extent you want, but it's easier to escape from the GC than
it is in many GC'ed languages. It may not be obvious from a casual reading of
the spec, but there's a lot of ways around garbage in Go. Not like, say, Rust,
and certainly not the same level of Raw Unstoppable Power as C++ (at a
corresponding huge complexity cost), but it's a very interesting middle
ground.

Note that if you were writing groupcache in Java, you'd probably end up doing
the same thing and just allocating an expanse of bytes which you manage
yourself.

------
borlak
I really wouldn't call this an alternative. If you are running memcached, it's
very unlikely you can switch to Groupcache.

Parts of your application may rely on the expiration feature. But the biggest
change is the inability to overwrite a current cache key. Every application
I've used does this constantly (object updates).

Groupcache in its current form is useful for a very narrow set of
applications.

~~~
arscan
I would think that the real limiting factor right now is that it is only
available to Go applications (it's a Go library, not a standalone server).
There are comparatively few Go applications in the wild that could use
this... whereas a memcached server can interface with just about anything. Or
am I missing something?

~~~
drbawb
Groupcache being a library is what made me so excited about it in the first
place.

I'm writing a torrent tracker in Go that could make great use of a distributed
cache. This would enable you to put an arbitrary number of "tracker nodes"
behind a load-balancer.

One of the goals of my project is that it's easy to configure: everything
except the RDBMS is hosted inside a single binary and configured with a single
file. In the simplest case, that binary is responsible for two web servers, a
process-local cache, and a DB connection pool. (In more complex cases, each
binary simply acts as another node behind a load-balancer, and they attempt to
cooperate.)

It's useful to avoid hitting the disk by having the tracker cache torrents
that are active. When you start adding more tracker nodes, this is where
having a distributed cache would be nice.

Without groupcache, that means I've just forced my users to install and
configure memcached. With groupcache, the configuration and server are simply
part of my application. (As an added bonus, since Go is statically linked, I
haven't even added any external dependencies on shared libraries.)

---

I agree that the use-cases for groupcache are far more limited in scope, but I
was still glad to hear that this is a library, and not a standalone server. I
think we need _more_ distributed caches implemented as libraries, not fewer.

There are many use-cases where it may not be preferable to have a separate
caching server. In my career, getting approval to install memcached on a
customer's server could add _weeks_ to our deploy time simply because of red
tape; in my personal programming endeavors, setting up memcached is just an
added layer of complexity that could easily be avoided by using libraries
instead of servers.

I'd argue that we should leave memcached [more specifically: standalone
distributed caches] for the more complicated scenarios where a dedicated
solution is called for. Bring on the distributed-cache libraries where a much
simpler, ad-hoc solution would be more beneficial to end-users.

------
happyhappy
I want to use this, but since the keys are immutable, how can I store data
like sessions which can change and would sometimes have to be invalidated from
the server side (i.e. you can't simply change the session ID in the cookie and
use a new cache entry, because bad-guy could still be holding on to an old
stolen session ID)?

In general, how can one learn to think in an immutable fashion to effectively
exploit this?

~~~
lucian1900
The simple solution is to always version everything. There's no such thing as
an update, merely a new version of a thing.
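As a hypothetical sketch of that versioning idea (the key scheme below is invented for illustration, not part of groupcache): the bulky value lives under an immutable, version-stamped key, and the only mutable state is a tiny "current version" pointer kept outside the cache, e.g. in your database.

```go
package main

import "fmt"

// cacheKey derives an immutable cache key from an entity ID and a
// version number. "Updating" the entity means bumping the version and
// filling a brand-new key; the old key's entry simply ages out.
func cacheKey(userID, version int64) string {
	return fmt.Sprintf("user:%d:v%d", userID, version)
}

func main() {
	fmt.Println(cacheKey(42, 1)) // user:42:v1
	fmt.Println(cacheKey(42, 2)) // user:42:v2 -- a different, new key
}
```

The lookup of "which version is current?" is the one small mutable read you still need, but it is cheap compared to caching the value itself.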

~~~
FooBarWidget
So without any mutable storage, how do you verify that the version that the
client requested is the latest version?

~~~
justincormack
If the client has an old link serve an old version. But have links to what
will be future versions and make the client walk them. It is doable but
different. Lots of stuff is static anyway, like CSS, so you can use for this
and have a different process for stuff that varies.

~~~
FooBarWidget
But as the grandparent said, serving the old version may result in a security
vulnerability. There are cases where you MUST serve the latest version, and
the latest version only.

~~~
samnardoni
Then don't use groupcache.

------
reeses
Am I reading this as a distributed, immutable, weak hash table rather than
what one would consider a 'cache'?

Mind you, doing so avoids the hardest parts of caching (and especially
distributed caching, which otherwise begins to underperform around ≥ 5-7
nodes), so I can see significant upside. No surprise stales, distribution
update clogging, etc.

------
willvarfar
I noticed this when he talked about speeding up the Google download servers.
Very interesting :)

It's an alternative to memcache but not a direct replacement. I hope he adds
CAS etc.

I hope they start using the kernel's buffer cache as the backing store, or
explain why it's not a good idea:
[http://williamedwardscoder.tumblr.com/post/13363076806/buffc...](http://williamedwardscoder.tumblr.com/post/13363076806/buffcacher)

~~~
fooyc
Apparently there is no CAS by design; this allows groupcache to replicate hot
keys on multiple hosts.

~~~
willvarfar
Yes, but if it was opt-in...

It's nice if you can use groupcache as a memcache replacement even if some
use-cases tie the hands of the implementation, e.g. replication of hot items.

~~~
happyhappy
I've been thinking about something similar. I don't see how timed expiration
would conflict with the two most important features - the filling mechanism
and the replication of hot items. Am I missing something that would make timed
expiration impossible?

~~~
willvarfar
Yeah on the first pass of the problem you seem right.

The CAS must have an authoritative node (my mind wanders thinking about
replication and failover) but the key it protects - with the version baked in
- can be replicated surely?

------
buro9
[http://talks.golang.org/2013/oscon-dl.slide#46](http://talks.golang.org/2013/oscon-dl.slide#46)

> 64 MB max per-node memory usage

So this is best used as an LRU cache of hot items.

It doesn't compete/replace memcache comprehensively, but it does attack the
use of memcache as a relief for hot items.

I can see myself mixing my Go programs with both groupcache and memcache.

Edit: I have glanced through the code and cannot see where the 64 MB per-node
limit comes in. Anyone see that?

~~~
nknighthb
It's a configurable limit. 64MB is just what he set the limit to for that
particular group; that's what the "64<<20" in the NewGroup call is for. You
could configure it to use many gigabytes.
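For readers unfamiliar with the idiom, `64<<20` is just the byte budget spelled as a bit shift:

```go
package main

import "fmt"

func main() {
	// 64 << 20 shifts 64 left by 20 bits, i.e. 64 * 2^20 bytes = 64 MiB.
	const cacheBytes = 64 << 20
	fmt.Println(cacheBytes) // 67108864

	// A multi-gigabyte budget is just a bigger shift expression:
	const bigCacheBytes = 8 << 30 // 8 GiB
	fmt.Println(bigCacheBytes) // 8589934592
}
```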

~~~
buro9
Ah, that's what I was not seeing when I was glancing through the source.

In that case... for me it fully replaces memcache for how I'm using memcache
today.

------
casperc
How does its sharding-by-key algorithm work, and how does it handle adding new
peers? I was looking for it in the source, but couldn't find anything related
to it.

~~~
dsymonds
You need to implement that part yourself. Look in peers.go. It'll be dependent
on how you discover peers in general.

------
azth
[https://github.com/golang/groupcache/blob/master/sinks.go#L5...](https://github.com/golang/groupcache/blob/master/sinks.go#L59)

Uh oh. The dreaded cast we see here:
[http://how-bazaar.blogspot.co.nz/2013/07/stunned-by-go.html](http://how-bazaar.blogspot.co.nz/2013/07/stunned-by-go.html)

------
justinhj
I'm a bit confused about how you use this system with immutable keys. At face
value it's a great idea, but I need a simple example of how it is used to,
say, retrieve a piece of data and then later update it to a new value.

Is this anything like how vector clocks are used, where the client uses the
clocks to figure out which is the right state in a distributed system?

~~~
pjscott
The use-case for which this was originally designed was a download server: the
values were chunks of files, each of which gets its own unique identifier. If
you want to update a file, you simply create a new file for your new version,
and point clients at it instead of the old version.

------
j_baker
I like the idea, but it seems like it would make deployment a pain. How do I
spin up a new server without rebalancing and/or restarting the world? Not to
mention that now when I _do_ need to restart the world, I can't do so without
also clearing my cache.

~~~
somethingnew
[http://en.wikipedia.org/wiki/Consistent_hashing](http://en.wikipedia.org/wiki/Consistent_hashing)
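A minimal consistent-hash ring can be sketched with the standard library alone. (This is a simplified illustration, not groupcache's code; groupcache ships its own consistenthash package using a similar virtual-node scheme.) The point is that adding a peer only remaps the keys whose ring points now land on it, so you don't rebalance the world.

```go
package main

import (
	"fmt"
	"hash/crc32"
	"sort"
	"strconv"
)

// ring is a toy consistent-hash ring: each peer is hashed onto the
// ring at several "virtual node" points to smooth the distribution.
type ring struct {
	hashes []uint32          // sorted points on the ring
	nodes  map[uint32]string // ring point -> peer name
}

func newRing(peers ...string) *ring {
	r := &ring{nodes: make(map[uint32]string)}
	const replicas = 50 // virtual nodes per peer
	for _, p := range peers {
		for i := 0; i < replicas; i++ {
			h := crc32.ChecksumIEEE([]byte(strconv.Itoa(i) + p))
			r.hashes = append(r.hashes, h)
			r.nodes[h] = p
		}
	}
	sort.Slice(r.hashes, func(i, j int) bool { return r.hashes[i] < r.hashes[j] })
	return r
}

// pick returns the owner of key: the first ring point at or after the
// key's hash, wrapping around to the start of the ring.
func (r *ring) pick(key string) string {
	h := crc32.ChecksumIEEE([]byte(key))
	i := sort.Search(len(r.hashes), func(i int) bool { return r.hashes[i] >= h })
	if i == len(r.hashes) {
		i = 0
	}
	return r.nodes[r.hashes[i]]
}

func main() {
	r := newRing("peer1", "peer2", "peer3")
	fmt.Println(r.pick("some-cache-key"))
}
```

When a new peer joins, only the keys falling between its ring points and their predecessors change owner; every other key keeps hitting the node that already has it cached.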

------
RyanZAG
How does this compare to Hazelcast[1]? Seems like the same idea, but far less
features?

[1] [http://www.hazelcast.com/](http://www.hazelcast.com/)

~~~
arohner
Hazelcast is incredibly buggy. Avoid at all costs.

~~~
tacticus
Got any links/talks/papers on this? A dev group at my work just started using
it.

~~~
arohner
Just personal experience, sorry.

* It deadlocks. Often.

* It often doesn't recover from network partitions. Symptoms include both sides of the partition never merging, and deadlocks. Recovering requires rebooting the entire cluster at once; due to their locking strategy, a rolling reboot of the cluster isn't enough.

* Some of the developers didn't seem to understand races or distributed systems. When I reported racing bugs, they asked if I could attach reliable unit tests.

------
justinmares
It'd be interesting to compare this alternative to other solutions like Redis.

Has anyone used both/either?

~~~
joeblau
If I were to speculate, I would say that Redis is probably faster.

~~~
throwit1979
TIL that waiting for a network roundtrip is faster than accessing an
in-process hash map directly.

------
otterley
How do you set an item's TTL? How do you set the maximum size of the process's
cache?

~~~
lcampbell
> How do you set an item's TTL?

Values stored are immutable, and expire when they are no longer used. There is
no concept of a TTL.

> How do you set the maximum size of the process's cache?

The cache is broken into a set of named groups, each with its own separate
cache and cache-filling method. The size of each group's cache is set when the
group is created via NewGroup [1].

--

[1]
[http://godoc.org/github.com/golang/groupcache#NewGroup](http://godoc.org/github.com/golang/groupcache#NewGroup)
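To make the group-plus-getter shape concrete, here is a stdlib-only toy that mirrors the pattern the godoc describes: a named group with a byte budget and a cache-filling callback that runs only on a miss. All type names below are simplified stand-ins invented for illustration, not groupcache's actual types.

```go
package main

import "fmt"

// GetterFunc is a toy stand-in for the cache-filling callback:
// it is invoked only when the group doesn't already hold the key.
type GetterFunc func(key string) ([]byte, error)

// Group is a toy named cache with a byte budget and a filler.
type Group struct {
	name       string
	cacheBytes int64
	getter     GetterFunc
	cache      map[string][]byte
	used       int64
}

func NewGroup(name string, cacheBytes int64, getter GetterFunc) *Group {
	return &Group{name: name, cacheBytes: cacheBytes, getter: getter,
		cache: make(map[string][]byte)}
}

func (g *Group) Get(key string) ([]byte, error) {
	if v, ok := g.cache[key]; ok {
		return v, nil // cache hit: the getter never runs
	}
	v, err := g.getter(key)
	if err != nil {
		return nil, err
	}
	if g.used+int64(len(v)) <= g.cacheBytes { // naive budget; the real thing uses LRU eviction
		g.cache[key] = v
		g.used += int64(len(v))
	}
	return v, nil
}

func main() {
	fills := 0
	g := NewGroup("thumbnails", 64<<20, func(key string) ([]byte, error) {
		fills++ // e.g. render the thumbnail or read it from disk
		return []byte("data-for-" + key), nil
	})
	g.Get("cat.png")
	g.Get("cat.png") // second call is served from cache
	fmt.Println(fills) // 1
}
```

The real library adds the distributed parts on top of this shape: peers, hot-key replication, and deduplication of concurrent fills for the same key.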

