

Redis VS Memcached (slightly better bench) - dlsspy
http://dormando.livejournal.com/525147.html

======
antirez
Just tested the N-instances approach with Redis in practice: what I get on
similar hardware is 100k SET/GET operations per second per core, so using the
same box dormando used for the test we should get 400k operations per second.

Note that you'll see these numbers for both SETs and GETs, as with the
single-process approach of Redis there is no contention. With memcached,
instead, you see different numbers for SETs and GETs; I guess this is due to
some kind of locking. It's a tradeoff: with threads you have more power
against a single "object", but at the cost of more complexity and less linear
scalability. Anyway, it's still possible to run one memcached process per
core; I would like to see those results as well.

Currently I don't have access to a box with 8 cores like the one used in the
test, so I can't fully replicate it; I hope dormando will update the results
with a run that uses four Redis instances.

------
postfuturist
I don't get it; Redis isn't optimized to be a key/val store like Memcached.
Redis does a hell of a lot that Memcached doesn't do, so if we added those
operations to the chart, Memcached would be flatlined. If I need a fast
key/val cache, Memcached is the obvious choice. If I need something different,
like a fast set of in-memory, disk-backed queues, or hash tables, or sets,
I'll use Redis. It's like saying: look, this hammer puts nails in the wall
faster than banging them in with this multitool.

------
antirez
Hello,

I'm not sure why memcached can't saturate all the threads with the async
benchmark, but if you want to maximize everything in your test involving
multiple processes you should also run four Redis nodes at the same time, and
run every redis-benchmark against a different instance.

We tried, and this way you'll get very high numbers for Redis, but this is
still wrong, as it starts to depend heavily on which core is running the
benchmark and whether it is the same one running the server. A better setup is
two boxes linked with gigabit Ethernet, running the N clients on one box and
the N server threads (be it a single memcached process with N threads, or N
Redis processes) on the other box.
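The N-instances setup described above can be sketched as the command lines one would run; the port numbers, request counts, and client counts below are illustrative assumptions (redis-server's `--port` and redis-benchmark's `-p`, `-n`, `-c` flags are real options):

```python
# Sketch: build command lines for the N-instances test, one redis-server per
# core and one redis-benchmark pinned to each instance. Ports and counts are
# illustrative assumptions, not values from the original benchmark.

def instance_commands(n_cores, base_port=6379, requests=100_000, clients=50):
    servers = [
        f"redis-server --port {base_port + i}"
        for i in range(n_cores)
    ]
    benchmarks = [
        f"redis-benchmark -p {base_port + i} -n {requests} -c {clients}"
        for i in range(n_cores)
    ]
    return servers, benchmarks

servers, benches = instance_commands(4)
for cmd in servers + benches:
    print(cmd)
```

Each benchmark targets exactly one instance, so the four pairs run independently with no shared state between them.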

~~~
dlsspy
Running four instances of redis is not the same as running one instance of
memcached with four threads.

Four times the number of processes makes for 1/4 the effectiveness of multiget
or multiset on average since your keys are now spread across several process
boundaries. In good clients, that's the difference between a single packet to
a single destination and four packets to four different destinations plus
result processing.

Scaling horizontally is great, but scaling vertically is also useful in
practice. It's giving the process more memory to handle more requests on fewer
connections.

Running a bit more than one thread per core is faster. This is demonstrated by
dormando's run of your test, as well as by comparing the throughput of your
test vs. my mc-hammer tool (<http://github.com/dustin/mc-hammer>) that I use
for memcached. It can push well over twice as many ops on my dual-core laptop
as it can with a single core (which is in the neighborhood of a single
instance of your test app).

It wouldn't take a _lot_ of work to get your benchmark tool natively
supporting multiple threads. Kill off the global config, split out one per
thread, and then allow each connection to find its config. Use atomics for
recording stats, and there should be very nearly no cross-thread
communication (this is what I did for mc-hammer).

That'd be useful for your own test cases as well as apples-to-apples
comparison with memcached.
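The structure suggested above (per-thread config, no shared mutable state on the hot path, stats merged at the end) might look roughly like this; this is a sketch, with a counter increment standing in for issuing a real request, and per-thread result slots standing in for the atomics mentioned:

```python
import threading

def worker(config, results, idx):
    # Each thread gets its own config and its own result slot: no shared
    # mutable state during the run, so no locking on the hot path.
    count = 0
    for _ in range(config["ops"]):
        count += 1  # stand-in for issuing one request and reading the reply
    results[idx] = count  # written once at the end; merged by the main thread

def run(n_threads, ops_per_thread):
    results = [0] * n_threads
    threads = [
        threading.Thread(target=worker,
                         args=({"ops": ops_per_thread}, results, i))
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)

print(run(4, 1000))  # → 4000
```

Because each thread only touches its own slot, the only cross-thread coordination is the final join-and-sum, which matches the "quite near no cross-thread communication" goal.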

~~~
antirez
Hello, your reasoning is, I guess, what led memcached to use a multi-threaded
approach. I don't agree with this design choice, for a few reasons: in Redis
we saw that multi-gets are anything but a heavily used primitive. You can
still group things that are often requested together so that they'll be in
the same instance (we have an explicit concept for this, called hash tags).
We have hashes for objects composed of many fields, stored under a single
key, so any kind of hashing algorithm will still lead to the object being
stored together.
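The hash-tag idea can be sketched like this; the `{...}` syntax below matches the convention Redis clients (and later Redis Cluster) use, but treat the exact extraction rules here as an illustrative assumption:

```python
import zlib

def hash_tag(key):
    # If the key contains a non-empty {tag}, hash only the tag so that
    # related keys land on the same instance; otherwise hash the whole key.
    start = key.find("{")
    end = key.find("}", start + 1)
    if start != -1 and end != -1 and end > start + 1:
        return key[start + 1:end]
    return key

def shard(key, n):
    # Toy client-side sharding over n instances (CRC32 is an assumption).
    return zlib.crc32(hash_tag(key).encode()) % n

# All keys sharing the tag {user:1000} map to the same instance:
print(shard("{user:1000}:name", 4) == shard("{user:1000}:friends", 4))  # → True
```

This is how a client can keep multi-key operations on related data within a single instance even when keys are otherwise spread by hashing.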

In contrast, see the benefits of a single-threaded approach: zero contention,
the ability to manipulate complex data structures without locking, and more
stability (fewer bugs) for the same amount of features.

In my opinion it's a no-brainer... but it's also a matter of taste / vision /
priorities.

About benchmarking with a multi-threaded tool: this is something I'm going to
investigate today, because something is strange here.

Let's assume that a multiplexing benchmark is not enough; why was it still
able to saturate Redis better than memcached? I don't have a good answer to
this question, but I want to understand what's happening.

Also, running multi-threaded benchmarks against multi-threaded servers is
going to produce results that are very different from what you'll get in the
real world, where there is a network link in between: when both the benchmark
thread and the server thread are on the same core, I think what happens is
that the I/O system calls become much faster.

It's a shame I don't have the hardware to perform the two-boxes test, as this
could be really cool for Redis and memcached: a much better indicator of what
the limits currently are with these two systems, and with TCP servers in
general.

------
mateuszb
This is quite a funny response :) seeing one person talk about toilet software
and make an oversight when writing a 'better' test :)

