
Writing a very fast cache service with millions of entries in Go - janisz
http://allegro.tech/2016/03/writing-fast-cache-service-in-go.html
======
pcwalton
From the article:

> [Go] also has managed memory, so it looks safer and easier to use than
> C/C++.

But most of the post describes a sophisticated way to work around the garbage
collector, totally reliant on a specific implementation detail of the current
Go GC (skipping of pointer-free data types), documented in a GitHub issue. It
seems easier to not have, or to not use, the GC in the first place for this
specific project if most of the engineering effort is going to go into an
elaborate workaround for it. In particular, I don't understand the reason for
rejecting offheap: the article suggests "a cache which relied on those
functions would need to be implemented", but surely this is less complex to
implement than the cache that relies on hiding what are effectively pointers
behind offsets so the GC won't think to scan them.

(This isn't a suggestion to not use Golang at all, to be clear. Nor do I mean
to suggest that GCs are bad things in general.)
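
For anyone who hasn't seen the trick: below is a toy sketch (my own, not the article's code) of hiding entries behind offsets into one big byte slice. The `map[uint64]uint32` index contains no pointers, so the GC skips scanning it entirely.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// offheapCache stores values length-prefixed in a single byte slice.
// Both fields are pointer-free from the GC's point of view, so the
// collector never walks the individual entries.
type offheapCache struct {
	index map[uint64]uint32 // hashed key -> offset into data
	data  []byte
}

func newOffheapCache() *offheapCache {
	return &offheapCache{index: make(map[uint64]uint32)}
}

func (c *offheapCache) set(key uint64, val []byte) {
	c.index[key] = uint32(len(c.data))
	var hdr [4]byte
	binary.LittleEndian.PutUint32(hdr[:], uint32(len(val)))
	c.data = append(c.data, hdr[:]...)
	c.data = append(c.data, val...)
}

func (c *offheapCache) get(key uint64) ([]byte, bool) {
	off, ok := c.index[key]
	if !ok {
		return nil, false
	}
	n := binary.LittleEndian.Uint32(c.data[off : off+4])
	return c.data[off+4 : off+4+n], true
}

func main() {
	c := newOffheapCache()
	c.set(42, []byte("hello"))
	v, _ := c.get(42)
	fmt.Println(string(v)) // prints "hello"
}
```

Deletion and compaction are the hard parts this sketch leaves out, which is exactly where the engineering effort goes.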

~~~
vog
Maybe this type of program is better suited for a language like Rust.

However, without a GC you have to take care of all memory issues yourself, in
a way that convinces the compiler your code won't ever blow up.

It would be very interesting to have such a comparison, so we could see
whether it's easier to work around the GC, or easier to write bullet-proof
code with manual memory management.

I'd expect the Rust version to be more robust with regard to performance, but
the question is: would it also pay off in terms of code complexity?

~~~
mhd
I'm not sure how the current implementation of Go handles it, but its
spiritual relatives Modula-3/Oberon handled this quite well, with a GC for
most occasions and ways to bypass this with "unsafe" modules that allowed for
untracked allocations/deallocations and pointer arithmetic. It's not really an
either/or situation by (language) definition...

~~~
stcredzero
I'm puzzled as to why every mature GC doesn't have a "permspace."

~~~
schmichael
Go's GC is non-moving and therefore cannot be generational.

[http://llvm.cc/t/go-1-4-garbage-collection-plan-and-roadmap-...](http://llvm.cc/t/go-1-4-garbage-collection-plan-and-roadmap-golang-org/33)

~~~
pcwalton
Nitpick: you don't need moving GC for generational GC. But you lose most (but
not all) of the benefit of generational GC without moving GC.

------
endymi0n
I still don’t get why people try writing their own data store, especially in a
language that's simply not very well suited to that task (and we're an almost
100% Golang shop here). Seems to be a rite of passage.

The requirements are literally <100 LOC of wrapping an HTTP interface around
Redis with a TTL - Redis has been battle-tested for years and is both rock
solid and ridiculously fast.

Public service announcement: Don't write your own data store. Repeat after me:
Don't write your own data store, except if you want to experimentally find out
how to build data stores. It's arbitrarily hard and gets even more so at every
layer. Plus, you leave behind an unmaintainable mess for the people after you.
There's already a great OSS data store optimized for every use case and
storage medium I could possibly imagine.

~~~
im_down_w_otp
There are two datastores I would really like that don't exist.

1) An efficient, high-performance distributed persistence store for
arbitrarily large CRDT's.

2) A Kafka-like highly-available distributed binary log w/ cheap topics, that
doesn't require external coordination, and doesn't lose acknowledged writes
(which I'll happily give up any shape of linearization guarantee for).
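
To make (1) concrete for readers who haven't met CRDTs: replicas must merge without coordination, so merge has to be commutative, associative, and idempotent. The simplest example is a G-Counter (illustrative sketch, nothing to do with any particular store):

```go
package main

import "fmt"

// gcounter maps replica ID -> that replica's local increment count.
type gcounter map[string]int

func (g gcounter) inc(replica string) { g[replica]++ }

// value is the sum of all replicas' counts.
func (g gcounter) value() int {
	sum := 0
	for _, n := range g {
		sum += n
	}
	return sum
}

// merge takes the per-replica maximum; it is commutative, associative,
// and idempotent -- the CRDT property that makes coordination unnecessary.
func merge(a, b gcounter) gcounter {
	out := gcounter{}
	for id, n := range a {
		out[id] = n
	}
	for id, n := range b {
		if n > out[id] {
			out[id] = n
		}
	}
	return out
}

func main() {
	a, b := gcounter{}, gcounter{}
	a.inc("a")
	a.inc("a")
	b.inc("b")
	fmt.Println(merge(a, b).value()) // prints 3
}
```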

~~~
chris_va
(1) Spanner does exist, though it's not really available outside of Google

~~~
im_down_w_otp
Yes, Spanner does exist. Spanner also does not satisfy what I said in #1.

~~~
QuercusMax
Can you elaborate on this?

~~~
im_down_w_otp
Well, for starters Google Spanner has almost nothing to do with CRDTs.

For another it would be really weird for Spanner to expose something CRDT
shaped since Spanner is a strongly-consistent distributed database which
depends heavily on global order and read/write locks driven by meticulously
coordinated multi-site global wall-clock time and consensus groups.

In some ways it's almost the opposite of what I want... with exception of the
fact that it's also distributed. I want extreme availability with lazy, weak
coordination.

~~~
chris_va
Read/write locks...

No, it's slightly lazier than that. It does timestamp coordination, but
retrospectively.

~~~
im_down_w_otp
You mean except for the lock table that's maintained by every paxos leader?

------
matthewmacleod
I wonder if I'm missing the point here, but why are Redis or
Memcached—implementations of exactly this service which are battle-tested and
well-used—not suitable due to "additional time needed on the network", but
this service _is_ suitable? Is it just down to the requirement for a HTTP API?

One thing I've noticed is an extreme demand for making internal services
available over HTTP. It has its benefits, but the obvious downside is that
the overhead and complexity of HTTP are total overkill for things like a key-
value store of this nature.

~~~
jzwinck
I had to read the relevant sections three times to sort this out. I believe
what they are saying is that their service had to speak HTTP in some specific
way, and since Redis doesn't speak that way directly, they would have needed
to proxy requests from their service to Redis, which would mean one network
hop to reach their HTTP service plus one hop to reach Redis.

Of course, Redis and Memcached support Unix domain sockets, which do not use
the network and do not suffer from the overhead of TCP. The authors do not
address this at all, suggesting they weren't aware of UDS, or of the fact
that TCP within a single Linux host never touches the network anyway.

Adding to the confusion is that even given an in-memory cache library which
overcame their objections to the "network" based ones, they still elected to
write their own low-level cache.

So the comment about time needed on the network was either spurious or
misinformed. And one thing it was not: measured.

~~~
bluetech
Quoting the Redis docs
([http://redis.io/topics/benchmarks](http://redis.io/topics/benchmarks)):

> When the server and client benchmark programs run on the same box, both the
> TCP/IP loopback and unix domain sockets can be used. Depending on the
> platform, unix domain sockets can achieve around 50% more throughput than
> the TCP/IP loopback (on Linux for instance). The default behavior of redis-
> benchmark is to use the TCP/IP loopback.

Haven't measured myself though.

------
Rezo
Redis eats millions of entries for breakfast, is pleasant to work with, has TTL
expiration of keys built in and is available as a managed AWS ElastiCache
service when you get into serious data sizes: up to 32 core 237 GiB nodes, and
then you shard to add more. Redis is also super as a local cache, and simple
to deploy and manage together with the app that uses it.

Since you obviously ran some quick benchmarks and concluded that running it
locally over a unix socket (confused why you would mention "time needed on the
network"... you tested with local sockets, right?) was too slow, you should at
least let Antirez know you've run into a new mysterious performance bug ;)
Writing a cache service can be a fun side project, but I doubt you gained
anything by doing so except another homegrown part to maintain.

~~~
krenoten
redis is single threaded, so 32 cores doesn't mean much without sharding.

I generally prefer memcache unless you have a super locked-down infrastructure
(no engineers to deploy a KEYS operation that destroys a shard and all the
systems that rely on the data inside until it's finished). Multithreaded +
simpler API is great for multitenancy when you have to provide infrastructure
to engineers who don't want to learn about infrastructure.

------
Artemis2
Why did they have to invent "a very fast cache service with millions of
entries"? Are they the first company to ever need one so they had to write it?
Can't Redis or Memcached be fast (despite the "additional time needed on the
network" — even though Redis uses raw TCP for transactions, while this uses
HTTP + JSON)?

As others pointed out, Go is also a very poor choice if you need to work
around every part of the language.

~~~
mixedbit
Probably because they wanted full control and understanding of the source
code. Large companies often prefer to develop things in-house, even if there
already exist good alternatives.

~~~
nl
_Large companies often prefer to develop things in-house, even if there
already exist good alternatives._

Fair point. Except they are a small consulting company.

~~~
serafin_allegro
Nope. We are not :) We are an e-commerce platform - part of the Naspers Group:
[http://www.naspers.com/page.html?pageID=3](http://www.naspers.com/page.html?pageID=3)

------
sulam
Sorry, I stopped reading at 10K rps and p50 of 5ms. In this day and age these
numbers are pretty bad, especially for a cache where presumably all accesses
are constant time. Every single caching solution listed out-performs this
handily.

~~~
delroth
I had the same feeling when reading this article. That's nothing like "high
performance", especially the 5ms median latency. memcached is a whole two
orders of magnitude lower in terms of median latency.

------
inglor
> Considering the first point we decided to give up external caches like
> Redis, Memcached or Couchbase mainly because of additional time needed on
> the network.

Uh... we regularly get better performance than what the OP has described in
their "needs" with Redis. It can run in memory on the same machine, and it
has plenty of HTTP frontends you can put in front of it (not a lot of
networking).

------
roel_v
So essentially, to meet their requirements, they had to work around the Go
garbage collector and use a non-standard HTTP server and JSON parser. Why not
just write it in C++?

~~~
smt88
> _Why not just write it in C++?_

No need. Someone already wrote Redis.

~~~
iveqy
Redis is written in C, afaik.

------
st0p
Why not use Varnish? POST messages of 500 bytes could easily be rewritten /
proxied to GET requests. That might not be 100% RESTful but seems like a lot
less work. On our production environment Varnish always responds in less than
2 ms. Even on my development VM I never see response times > 5 ms. It has all
the other requirements they state.

Perhaps I'm prejudiced because Varnish has proven to be such an awesome
caching mechanism (we still use Redis as a key/value store), but this seems
like NIH.

~~~
leetrout
Varnish is actually difficult to use for POST requests unless you know C and
can tweak the existing vmods. I just came off the same basic requirements and
ended up going with Nginx (OpenResty) with Lua & Redis.

------
osweiller
As a user and advocate of Go (in my case primarily as glue code that is
beneficial because it's easy and efficient to get to results), articles like
this do the platform a disservice.

This implementation is far from fast (two orders of magnitude better
performance would make it credible as "very fast"), and it is non-idiomatic,
specifically doing things to avoid the benefits of Go.

As an aside -- HTTP and serialization are both costly. In many, many cases
where I've seen them in effect, they were a significant expense for little to
no architectural gain.

~~~
nly
> HTTP and serialization are both costly.

But both can be done very fast (introducing very little latency), and they
are essentially pure operations, so they parallelize very well (for
throughput). Of course, this doesn't account for dev costs, and doesn't make
your architectural point invalid.

------
golergka
Straightforward, simple functionality, extreme performance requirements, and
an aversion to allocations and GC?

That sounds like a model use case for good old C. I love Go and am currently
actively learning it, but why take a memory-safe, GC'd language and then
build your own ad-hoc memory management on top of it to avoid the GC?

~~~
fleitz
This is probably why the creators of memcached and redis chose C.

------
c0l0
I admit I only skimmed the article, but I think dropping the requirement of
an HTTP API with JSON as the message format for transporting IDs over the
wire, and just using Redis and its wire protocol instead, would probably have
been the more time-efficient way to arrive at an (at least) acceptable
solution to the problem they were looking to solve. But yeah, where's the NIH
in that? ;)

------
iagooar
I wonder if your team considered leveraging Nginx with some good caching
policy; for most use cases it should already have everything you need, and
it'll probably be tough to beat Nginx's performance.

------
NicoJuicy
I found the mention of ffjson interesting: a faster serializer than the
standard built-in JSON serializer (2x - 3x as fast) ==>
[https://github.com/pquerna/ffjson](https://github.com/pquerna/ffjson)

~~~
LeonidBugaev
You should check
[https://github.com/buger/jsonparser](https://github.com/buger/jsonparser) and
[https://github.com/mailru/easyjson](https://github.com/mailru/easyjson) as
well. They are even faster.

------
barrkel
Manual memory management isn't an esoteric technology. If that's what's
required to reduce GC overhead, it's an acceptable choice, especially when
it's in a fairly simple system like a cache (a hash table with an eviction
policy).
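
To underline how simple the core is, here's a minimal LRU cache (an illustrative sketch, not the article's design) built on the standard container/list:

```go
package main

import (
	"container/list"
	"fmt"
)

type entry struct {
	key, val string
}

// lru is a hash table plus an eviction policy: the map gives O(1)
// lookup, the list tracks recency (front = most recently used).
type lru struct {
	cap   int
	order *list.List
	items map[string]*list.Element
}

func newLRU(cap int) *lru {
	return &lru{cap: cap, order: list.New(), items: make(map[string]*list.Element)}
}

func (c *lru) Get(key string) (string, bool) {
	if el, ok := c.items[key]; ok {
		c.order.MoveToFront(el)
		return el.Value.(*entry).val, true
	}
	return "", false
}

func (c *lru) Set(key, val string) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).val = val
		c.order.MoveToFront(el)
		return
	}
	if c.order.Len() >= c.cap { // full: evict the least recently used
		back := c.order.Back()
		delete(c.items, back.Value.(*entry).key)
		c.order.Remove(back)
	}
	c.items[key] = c.order.PushFront(&entry{key, val})
}

func main() {
	c := newLRU(2)
	c.Set("a", "1")
	c.Set("b", "2")
	c.Get("a")      // touch "a" so "b" is now least recent
	c.Set("c", "3") // evicts "b"
	_, ok := c.Get("b")
	fmt.Println(ok) // prints false
}
```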

------
HereBeBeasties
If you think "very fast" is 5ms, over a LAN, then truly I despair.

I mean, if they've paid you to write this, I presume you have reasonable
hardware to run it on. Bog standard BSD sockets TCP on Linux is down around
the 10 microsecond range now. What on earth are you doing with the other
4.99ms?

------
JulianMorrison
"basically everything in Go is built on pointers: structs, slices, even fixed
arrays"

As I understand it, while that does apply to pointers-to-structs, slices, and
probably strings, it isn't true for naked structs and naked arrays. Those
both behave as value types, like int.
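
A quick demonstration of that split, easy to verify yourself:

```go
package main

import "fmt"

type point struct{ x, y int }

func main() {
	p1 := point{1, 2}
	p2 := p1 // structs copy on assignment
	p2.x = 99
	fmt.Println(p1.x) // prints 1: p1 is unaffected

	a1 := [3]int{1, 2, 3}
	a2 := a1 // fixed arrays copy too
	a2[0] = 99
	fmt.Println(a1[0]) // prints 1

	s1 := []int{1, 2, 3}
	s2 := s1 // slices share the backing array
	s2[0] = 99
	fmt.Println(s1[0]) // prints 99
}
```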

~~~
jakub_h
True, one of the prime considerations for Go's design was that despite having
pointers, structs could be embedded as values into other structs. That's the
reason it doesn't look half like Java.

------
tyingq
I'm curious why they didn't use something like mmap. That would have skipped
the off heap approach, and also allowed for management, statistics, etc, to
run as a separate process.

Edit: Apparently the offheap package does use mmap if you pass a path to their
Malloc.

------
xor-xor
Regarding the comments here asking "why not Redis via UDS/loopback?" etc. -
there are some cases where such a solution won't fit - e.g., running your
service on a Mesos cluster.

------
Symmetry
Is this the sort of problem where optimistic concurrency would be a better fit
than standard locks?

------
elcct
very fast. much entries. so cache. such go. wow

~~~
twic
Go is web scale.

