
Clarifications about Redis and Memcached - juanfatas
http://antirez.com/news/94
======
cheald
I dropped Memcached in favor of Redis for caching a _long_ time ago, because
as far as it matters for my purposes, Redis is a strictly-superior superset of
Memcached's functionality, and I have no desire to maintain multiple pieces of
software in the stack if one will do the job.

I'm sure there are extreme cases where Memcached is in fact the better tool
for the job over Redis for caching workloads. I also expect that 99%+ of
people trying to decide between Redis and Memcached will never get into that
territory. Redis is _so_ fast that unless you're doing O(n) operations on very
large data sets, you're unlikely to notice any substantial differences.

The other thing about caching is that the data is, by its nature, disposable
and rebuildable. So even in the extreme minority case where Redis would no
longer be sufficient, migration from one KV cache system to another is about
as easy as it gets. Pre-optimizing your caching layer stack for Facebook
levels of traffic isn't even justifiable from a lock-in standpoint like it
might be with other data storage needs.

In the case of your average Sidekiq user, serving cache fragments for a Rails
app, memcached vs redis for your caching layer is almost certainly an
inconsequential choice WRT the performance of your application, and the choice
of Redis reduces your ops and client library overhead. The choice should be
pretty clear in those circumstances.

------
jtchang
This is a really good blog post. I find it very neutral even when the author
happens to be the one who created Redis.

What I really like about both pieces of software is that they are dead simple
to set up. If I were making a decision today for a new project that is just
starting to scale, I'd probably take a good hard look at Redis and ask whether
I'd need any of the extra features. I could see a lot of projects eventually
needing those features and leveraging Redis. I really like the pub/sub feature
in particular. Persistence is always nice if you don't have to pay too much
for it.

Does anyone know if amazon has a redis/memcache service? I can never remember
what they name things.

~~~
kgrin
They do - ElastiCache. And as it happens, you can choose either Redis or
Memcached!

------
gtrubetskoy
I'm curious about the "threaded redis" reference. Back a couple of years ago I
built a Collaborative Filtering recommendation system that used Redis for
graph storage and relied heavily on sorted sets to compute the recommendation
right in Redis.

I _really_ needed some kind of parallelism, so I hacked together
[http://thredis.org/](http://thredis.org/) (and then, mostly for fun, added SQL
operations to it by linking it with SQLite).

Since then I've mostly abandoned this project and moved on to other things,
but I still think that there is a valid case for some form of parallelism in
Redis. I learned some tough lessons while hacking on Thredis, such as the
importance of lock ordering and having retry strategies, and AFAIK there are
still bugs that can cause it to crash. But the takeaway was that it's doable
- I was a newbie at it then; today I'd probably do a much better job. In my
(not so scientific) testing, Thredis was only slightly slower than Redis.

~~~
antirez
I remember thredis very well! But it's different from what memcached
does and what Redis has planned: memcached only threads the I/O part, not the
access to the key space, which is serialized via a mutex. However, what you had
in mind is also in our long-term plans... and was addressed in another blog
post here: [http://antirez.com/news/93](http://antirez.com/news/93)

~~~
NovaX
The locks are becoming more fine-grained in memcached [1], so that should be
less of a problem now.

It is possible to remove lock contention on the read path [2] if a concurrent
hash table is used. This can be done while using an O(1) eviction policy that
outperforms LRU [3].

[1] [https://github.com/memcached/memcached/pull/97](https://github.com/memcached/memcached/pull/97)
[2] [https://github.com/ben-manes/caffeine/wiki/Design](https://github.com/ben-manes/caffeine/wiki/Design)
[3] [https://github.com/ben-manes/caffeine/wiki/Efficiency](https://github.com/ben-manes/caffeine/wiki/Efficiency)

~~~
antirez
NovaX: thanks for the interesting references. The question is: is it worth it
for memcached to avoid a global lock on the hash table, given the number of
cores currently deployed machines have? I would expect to see very little
contention. The concurrent hash table looks like a good idea for memcached;
a mutex per key, for sure, would likely be overkill in terms of memory usage.
I'll read the links you provided with care, thank you.

~~~
NovaX
As you said elsewhere, the network I/O is the primary bottleneck. There are a
lot of different hashtable designs (so per-key locks not required), but fine
grained locking of the table/LRU is probably enough. Since an in-app cache has
a different perf profile, the latter two links summarize my work.
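
The fine-grained locking discussed above is often done with lock striping:
instead of one global mutex, the table is split into stripes, each guarded by
its own lock, so two threads only contend when their keys hash to the same
stripe. A minimal Python sketch of the idea (illustrative only - memcached's
actual implementation is in C and considerably more involved):

```python
import threading

class StripedLockMap:
    """Hash map with lock striping: contention is limited to one
    stripe rather than a single global lock."""

    def __init__(self, stripes=16):
        self._locks = [threading.Lock() for _ in range(stripes)]
        self._buckets = [dict() for _ in range(stripes)]

    def _stripe(self, key):
        # Pick the stripe (and its lock) from the key's hash.
        return hash(key) % len(self._locks)

    def get(self, key):
        i = self._stripe(key)
        with self._locks[i]:
            return self._buckets[i].get(key)

    def set(self, key, value):
        i = self._stripe(key)
        with self._locks[i]:
            self._buckets[i][key] = value

m = StripedLockMap()
m.set("a", 1)
```

With 16 stripes, two threads touching different keys usually proceed in
parallel; a single global mutex would serialize them unconditionally.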

------
LukaAl
I tried both and I prefer Redis over Memcached except for very simple use
cases. It is true that for a pure caching system Memcached is easier to set up,
easier to scale, and faster - but only by a small margin. On the other side,
Redis provides a lot of very fast primitives, implemented in C, that allow for
more complex use cases. For instance, in one of the projects I was following,
the ability to merge two or more sorted sets was pivotal in implementing a more
efficient warm-up scheme. I could get the same result with Memcached and a
software layer, but it would be less efficient in terms of latency and
bandwidth. Given these huge advantages, my choice is almost always Redis,
because I prefer to manage one system instead of two.
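
For reference, the merge described above maps to Redis's ZUNIONSTORE command
(`ZUNIONSTORE dest 2 set1 set2`). Here is a pure-Python sketch of the
aggregation semantics, with sorted sets modeled as member→score dicts (the
key names are made up for illustration):

```python
def zunion(sets, aggregate=sum):
    """Merge sorted sets, combining scores for members present in
    more than one set - what ZUNIONSTORE does server-side."""
    merged = {}
    for s in sets:
        for member, score in s.items():
            merged.setdefault(member, []).append(score)
    return {m: aggregate(scores) for m, scores in merged.items()}

hot_items = {"a": 3.0, "b": 1.0}
warm_items = {"b": 2.0, "c": 5.0}
# "b" appears in both sets, so its scores are summed
# (SUM is ZUNIONSTORE's default aggregation; MIN/MAX also exist).
combined = zunion([hot_items, warm_items])
```

Doing this server-side is exactly the latency/bandwidth win mentioned above:
the members never have to cross the network to be merged.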

------
devit
If data access is evenly distributed across keys, then running multiple Redis
instances (with sharding) is perfect, since ultimately you want each key to be
serviced by a single core for optimal performance on a system with CPU caches
and NUMA.

However, if data access is read-heavy and not evenly distributed across keys,
then having shared memory (i.e. multithreading) is quite essential, since you
want all cores to operate on the same data, and shared memory is much better
than sending the data across the network, especially if the heavily accessed
keys vary quickly over time.

I'm not sure if memcached handles this well (it requires a lightweight
rwlock/RCU/MVCC mechanism, for instance), but a shared-nothing system like
Redis cannot provide good performance in all cases.
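
Client-side sharding across multiple Redis instances is commonly done with a
consistent-hash ring, so that each key deterministically maps to one instance.
A minimal sketch (the instance names are hypothetical, and real clients such
as twemproxy or Redis Cluster use their own, more elaborate schemes):

```python
import hashlib
from bisect import bisect

class ConsistentHashRing:
    """Minimal consistent-hash ring for sharding keys across instances."""

    def __init__(self, nodes, vnodes=100):
        # Each node gets `vnodes` points on the ring to smooth the
        # distribution of keys.
        self._ring = sorted(
            (self._hash(f"{node}:{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key):
        # First ring point clockwise from the key's hash (wrapping around).
        idx = bisect(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["redis-1:6379", "redis-2:6379", "redis-3:6379"])
shard = ring.node_for("user:42")  # same key always maps to the same instance
```

The appeal of a ring over plain `hash(key) % n` is that adding or removing an
instance only remaps a fraction of the keys instead of nearly all of them.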

~~~
antirez
In a networked server fetching data from memory, the time required to access
the data itself is negligible, so real performance is mostly a function of how
much I/O a thread can sustain (with heavy pipelining, Redis will handle at
least 500k ops/sec on the same hardware where it handles 100k ops/sec without
pipelining). There is still the case where the load is heavily biased towards,
say, 5% of the keys. But those keys are very likely to be distributed across
all the processes.

However, this is not true when you have, say, 2-3 super-hot keys that are
requested a lot more than any other. But in that case what enables scalability
is replication with many read replicas.
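
The pipelining effect can be illustrated with a toy model that counts network
round trips (this is not redis-py code - with redis-py you would batch
commands via `r.pipeline()` - just a sketch of why per-thread throughput is
dominated by I/O rather than data access):

```python
class FakeServer:
    """Toy in-memory KV server that counts network round trips."""

    def __init__(self):
        self.store = {}
        self.round_trips = 0

    def execute(self, commands):
        # One round trip carries any number of pipelined commands.
        self.round_trips += 1
        replies = []
        for op, key, *rest in commands:
            if op == "SET":
                self.store[key] = rest[0]
                replies.append("OK")
            else:  # GET
                replies.append(self.store.get(key))
        return replies

server = FakeServer()

# Unpipelined: one round trip per command.
for i in range(100):
    server.execute([("SET", f"k{i}", i)])
unpipelined = server.round_trips

server.round_trips = 0
# Pipelined: all 100 commands in a single round trip.
server.execute([("SET", f"k{i}", i) for i in range(100)])
pipelined = server.round_trips
```

100 round trips versus 1 for the same work: when each round trip costs tens of
microseconds of network latency, that ratio, not memory access time, is what
sets the throughput ceiling.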

~~~
vitalyd
I think once you implement threaded i/o, requests for hot keys will hit in the
cpu cache and you'll become NIC limited. At that point, read replicas _is_ the
best solution rather than shared memory since contention will move to NIC and
adding more cpus won't help.

Edit: Salvatore, you should also look at the Seastar/ScyllaDB design (if you
haven't yet) - that architecture would work well for redis as well. And if
user has access to DPDK (or other kernel bypass enabled NICs, like
Solarflare), their performance will go up even further.

~~~
natebrennand
The group behind Arrakis did some testing bypassing the kernel for NIC access,
and the results were pretty amazing. They also made modifications to memcached
and got similarly great results, FWIW.

[https://www.usenix.org/system/files/conference/osdi14/osdi14...](https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-peter_simon.pdf)

------
mperham
I'm glad antirez spoke up and clarified. I do think Redis is perfectly
acceptable as a cache if you make a few config tweaks and run a second
instance.

Many of the developers I work with are unfamiliar with Redis. Maybe they've
heard of it and they install it to use Sidekiq and that's it. From the
perspective of a newcomer, memcached is "safer" to run because it'll be memory
limited automatically with literally zero config necessary. Redis does require
the non-default LRU and memory tuning to be a safe cache.
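
Concretely, the two non-default settings that make Redis behave as a bounded,
memcached-style cache are the following (the memory value here is just a
placeholder):

```
# redis.conf
maxmemory 2gb
maxmemory-policy allkeys-lru
```

The same settings can be applied at runtime with `CONFIG SET maxmemory 2gb`
and `CONFIG SET maxmemory-policy allkeys-lru`, so an existing instance can be
converted without a restart.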

It sounds like my performance concerns are primarily a thing of the past. Glad
to hear it and I look forward to the threaded I/O coming after lazyfree.

------
yuliyp
For a post that attempts to compare Redis to memcached for caching, it's
amazing how few actual numbers appear in the post.

~~~
patsplat
"Actual numbers" for caching services only have meaning in the context of a
particular application.

~~~
yuliyp
Yes, but as-is the post is pure hand-waving. At least some measurements could
confirm the theories being stated. Did someone actually take a use
case where they had memcached and replace it with Redis? What actually
happened?

~~~
patsplat
Ok then. Build an application heavily dependent on caching. Restart memcached
in the middle of a production workload. How long does it take to get out from
under the dog pile?

EDIT: This started out a bit flippant. I wanted to make the point that antirez
is not just handwaving.

For example, one of the points made without elaboration was that 'there are
“pure caching” use cases where persistence and replication are important'.
Sometimes cache warming isn't feasible, e.g. when you need to restart the
cache in the middle of a production workload. With an entirely volatile cache
you can see extended downtime from the dog pile of requests waiting for the
cache to warm up. Persistence can be an attractive form of insurance against
this scenario.
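
The dog pile is usually mitigated by letting exactly one caller rebuild a
missing entry while the others wait for the result. A minimal in-process
sketch of that pattern (a distributed setup would use a Redis-based lock
rather than `threading.Lock`; names here are illustrative):

```python
import threading

class DogpileCache:
    """Tiny cache where only one thread recomputes a missing key;
    concurrent callers block instead of piling onto the backend."""

    def __init__(self):
        self._data = {}
        self._locks = {}
        self._meta = threading.Lock()
        self.recomputes = 0  # instrumentation for the example

    def get_or_compute(self, key, compute):
        if key in self._data:
            return self._data[key]
        # One lock per key, created under a small metadata lock.
        with self._meta:
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:
            if key not in self._data:  # re-check after acquiring
                self.recomputes += 1
                self._data[key] = compute()
            return self._data[key]
```

If ten threads request the same cold key at once, one runs the expensive
`compute` and the other nine wait on its lock and then read the cached value,
so the backend sees one rebuild instead of ten.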

------
jack9
Some minor points

> Redis in its special case of memcached replacement, may be addressed
> executing multiple processes

This is always less optimal than a single process. Simpler is better.

> Redis is very very observable

This has a cost... trashing your phenomenal read and write performance. Running
a second Redis instance, just for monitoring, is a hack that I have used to
keep leveraging the observability.

~~~
skuhn
I agree that simpler is often better, but running multiple cache processes may
be a requirement for either memcached or redis if you want to get consistent
latency from them. I have observed a significant improvement in performance
(at large scale) when binding memcached to a single NUMA zone.

NUMA lets you address non-local memory, but memcached and redis don't use
libnuma, so they don't know whether a given allocation is local or not. The
entire system's memory is available as one contiguous blob, but some of it is
a lot slower than the rest (depending on which CPU core you're running on).
To get around this, on most servers you'll need to run two processes (on two
different ports) and bind each to the appropriate CPU core set and memory.
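
As a sketch, the binding described above might look like this on a two-node
box (node count, ports, and memory sizes are hypothetical; `numactl` ships in
the standard numactl package):

```shell
# One memcached per NUMA node, each pinned to local cores and local memory;
# clients then shard keys across the two ports.
numactl --cpunodebind=0 --membind=0 memcached -d -p 11211 -m 4096
numactl --cpunodebind=1 --membind=1 memcached -d -p 11212 -m 4096
```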

~~~
antirez
Thanks for the info, I think similar results were reproduced with Redis too.

------
rubiquity
I would guess that Mike Perham's point of view is based on how Sidekiq uses
Redis for persisting jobs. If you don't configure Redis properly and you use
it as a job queue as well as a cache, then you risk your job queue entries
getting evicted, depending on your Redis config and version. That type of
thing happening frequently would be very annoying for someone who built a job
queue that relies on Redis for persistence of jobs.

Caching web content can grow and grow until you run out of memory. If you're
only using Redis for data that can be easily regenerated, then go ahead and
use it as a cache. But if you aren't, I think it will give you operational
peace of mind to segregate where you store your background jobs from where you
cache content.

~~~
thezilch
That makes zero sense. Why would you run them out of the same instance / with
the same configuration? Mike is already running a second process for caching;
he'd simply replace it with a different instance of Redis.

~~~
rubiquity
You're right. It doesn't make sense. Does that prevent tons of people from
doing it? Nope.

------
seamusabshere
I cache with redis:

[https://github.com/seamusabshere/lock_and_cache](https://github.com/seamusabshere/lock_and_cache)

(and lock too, with Redis Redlock)

------
hackerboos
Last time I looked, Redis would balk if it ran out of memory, whilst memcached
just pushed out the oldest entries.

I guess that's not a problem anymore..?

~~~
skuhn
By changing the eviction policy and using the _maxmemory_ directive, you can
effectively reproduce memcached's behavior in redis.

[http://redis.io/topics/lru-cache](http://redis.io/topics/lru-cache)

------
avdicius
Both are relatively small projects, and there really isn't much to compare.

I'm slowly working on my own memcached clone and plan to add persistence
eventually. I'm not so sure about replication; it might be too hard. In
principle, after I'm done with a version of memcached that is better both
feature-wise and performance-wise, I can also add Redis protocol support,
time permitting.

That way, it will be possible to have both in a single package and not worry
about which one is better.

This is my project, if anyone is interested:
[https://github.com/ademakov/MainMemory](https://github.com/ademakov/MainMemory)

[Update: my statement was only about cache-related functionality; admittedly,
Redis supports very interesting data structures, persistence, and replication.
But as just a very fast in-memory cache, there is nothing particularly advanced
in either case. On the other hand, there are projects like RAMCloud and Seastar
that I find inspiring as I work on my own project.]

