
How Fast can A Single Instance of Redis be? - eblenkarn
https://docs.keydb.dev/blog/2019/06/17/blog-post
======
dvirsky
IIUC the benchmark did not use pipelines and the numbers show that. Redis
itself can outperform this by a factor of 10 with pipelining, and I'm betting
the difference from the module won't be that big given such a setup (which is
how you should ideally work anyway).
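For anyone unfamiliar: pipelining just means concatenating many commands into one socket write and reading all the replies back in one go, amortizing the per-round-trip cost. A minimal sketch of what that looks like on the wire, hand-encoding the RESP protocol in Python (no client library or server assumed):

```python
def resp_encode(*parts):
    """Encode one Redis command as a RESP array of bulk strings."""
    out = [f"*{len(parts)}\r\n".encode()]
    for p in parts:
        b = p if isinstance(p, bytes) else str(p).encode()
        out.append(b"$%d\r\n%s\r\n" % (len(b), b))
    return b"".join(out)

# A pipeline is just many commands concatenated into a single write;
# the server replies to all of them before the client reads anything.
pipeline = b"".join(resp_encode("SET", f"key:{i}", i) for i in range(3))
```

A real client would then do one `sock.sendall(pipeline)` and parse three replies, instead of three send/recv round trips.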

Still, nice to see the modules API being put to interesting use cases.

~~~
privateSFacct
Yeah - it'd be interesting to see the real use case here. Pipelining + scripting, if
needed, normally go a long way. I'm curious who needs even more performance beyond
this, and whether this module actually provides it.

------
caymanjim
Anecdotally, my experience running Redis servers on AWS (both standalone EC2
instances and ElastiCache dedicated Redis instances) is that network latency
is likely to become a barrier before anything else. We struggled with the same
performance problems on both small-footprint small-payload Redis DBs and on
large ones, and paying for the next tier of network connectivity (between our
applications on EC2 and our Redis servers on EC2/ElastiCache) did the most to
alleviate delays.

~~~
KenanSulayman
This is very accurate. We're seeing variation in GET / SET operations with
ElastiCache of up to 10,000μs, which is way longer than the actual execution
time. After moving complex operations into transactions & even Lua scripts,
performance is fairly acceptable again.

I assume the AWS network is pretty noisy.

------
specialist
I'd like to see the client code, see how it manages backpressure.

(I'm probably being thick; the benchmark code is probably linked and I'm just
not seeing it.)

I recently maintained some nodejs & expressjs stuff. Neither the Redis clients
nor the original developers of our stuff had any concept of backpressure
(throttling). In our case, each HTTP request received would trigger a Redis
request. For whatever reason, the expressjs (or nodejs) event loop prioritizes
processing incoming requests over other work. So Redis responses would pile up
in Redis itself (in its client output buffers), causing Redis to ABEND (out of
memory).

Trying to explain backpressure, queueing, throttling to a bunch of junior devs
who _LOVE LOVE LOVE_ nodejs... Definitely was not my favorite gig.
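For what it's worth, the usual fix is a bounded in-flight limit in front of the client. A hypothetical sketch in plain Python (no real Redis client; `fn` stands in for whatever issues the Redis call):

```python
import threading

class Throttle:
    """Apply backpressure: refuse new work once max_in_flight calls are pending."""
    def __init__(self, max_in_flight):
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def submit(self, fn, *args):
        if not self._slots.acquire(blocking=False):
            # Shed load at the edge instead of letting replies pile up in Redis.
            raise RuntimeError("backpressure: too many in-flight requests")
        try:
            return fn(*args)  # stand-in for the actual Redis call
        finally:
            self._slots.release()
```

In the HTTP-handler scenario above, the rejection would translate into a 429/503 response rather than an unbounded queue.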

--

PS- I started using Redis a few gigs ago. I really wanted to hate it. I'm
primarily a Java dev and was esthetically offended by the NoSQL fad. But turns
out Redis is awesome. And antirez is now a personal hero. I truly wish I was
more like him.

~~~
ariosto
We've been running into an issue where our Redis instance(s) randomly die. I
haven't been able to pinpoint the problem (nodejs+redis). Would love to hear
your thoughts on some gotchas to look out for.

~~~
hinkley
Not OP, but I wonder what the collectd plugin for Redis would tell you.

------
lossolo
Well, to be honest, if you are IO/network bound and the kernel TCP stack is the
bottleneck, then user-space networking like DPDK can help in any application.
It depends on the application, but sometimes the additional complexity of
introducing DPDK is just not worth it, and spinning up another instance/server
is a better choice. Look also at the Seastar numbers for ScyllaDB, with and
without DPDK.

Just remember you need to give DPDK a whole NIC, and that it uses polling, so
expect 100% CPU usage on the polling cores.

~~~
jdsully
DPDK does poll, but you can have it sleep in low-traffic situations, at the
expense of a little extra latency.

~~~
hinkley
I always figured the right solution here is a queue that sends any time the
request queue is over N elements or every M microseconds, whichever happens
first. Haven’t seen many implementations though.

Nagle’s algorithm is the oldest example, maybe a couple of others, possibly
some code I’ve written.
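The size-or-deadline flush described above can be sketched in a few lines of Python (hypothetical; it flushes on size immediately, and the time deadline is only checked on the next push, so a real implementation would add a background timer):

```python
import time

class BatchQueue:
    """Flush when the queue reaches max_items, or max_wait seconds after the
    first item arrived, whichever comes first."""
    def __init__(self, flush, max_items=16, max_wait=0.001):
        self.flush, self.max_items, self.max_wait = flush, max_items, max_wait
        self.items, self.deadline = [], None

    def push(self, item):
        if not self.items:
            self.deadline = time.monotonic() + self.max_wait
        self.items.append(item)
        if len(self.items) >= self.max_items or time.monotonic() >= self.deadline:
            self._flush()

    def _flush(self):
        batch, self.items, self.deadline = self.items, [], None
        self.flush(batch)  # e.g. send one pipelined request for the whole batch
```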

------
z3t4
These numbers seem off. 4ms to store something in memory? Even if that includes
network latency, it should be possible to improve further.

~~~
jdsully
Most of the bottleneck is in the TCP stack. That's why user-mode networking
like this module uses helps a lot.

~~~
fh973
TCP latency is on the order of tens of microseconds, not milliseconds.

Valid question I'd say.

~~~
jdsully
TCP latency increases when you have high load and buffers in the way. Over 50%
of CPU time is spent in the TCP stack for Redis, with the bulk of the rest in
query parsing.

------
tinktank
How is this different from what Solarflare
([https://twitter.com/Solarflare_Comm/status/11134717313798430...](https://twitter.com/Solarflare_Comm/status/1113471731379843072))
is doing with Cloud Onload? From what I understand they don't require any
application changes and they work on any networking application.

~~~
neomantra
OpenOnload transparently replaces the UDP/TCP network stack and epoll calls of
an application with highly tunable userspace components. It can work with any
application that uses these system calls -- including, of course, Redis. If you
search around, I've commented on using them together.

This is a Redis module, so will only work with Redis. Although I don't see its
implementation (?), it appears to connect Redis' Unix Socket interface to a
network stack running on DPDK (a user-space low-level network interface).

In the SolarFlare world, ef_vi is their counterpart to DPDK -- a packet buffer
interface. On top of that they have OpenOnload (transparent acceleration that
cooperates well with the kernel) and TCPDirect, a proprietary userspace TCP/UDP
library with its own interface. TCPDirect is even higher performance than
OpenOnload because it doesn't have to coordinate with the kernel; you manage
the sharing of the network resources yourself.

SolarFlare has a DPDK driver too. OpenOnload doesn't accelerate UNIX sockets.

One thing this module doesn't accelerate is epoll... I think a properly tuned
SolarFlare solution would be higher performing -- especially on the same
machine with TCP-loopback acceleration. But you don't know until you try it...

edit: added note about epoll

~~~
jing
Yep. And Mellanox does it too fwiw -

[https://community.mellanox.com/s/article/vma-improves-redis-transaction-rate-and-latency--memtier-benchmark-x](https://community.mellanox.com/s/article/vma-improves-redis-transaction-rate-and-latency--memtier-benchmark-x)

------
lousken
[https://archive.fo/4WMYg](https://archive.fo/4WMYg)

------
paprikawuerzung
Nice! I am also working on developing an extension for Redis and tried
creating FlameGraphs as well but am not able to get them to work properly.
Could you please share the commands you executed for the Flame Graphs? Would
be greatly appreciated!

Already tried using '-fno-omit-frame-pointer' and '-O0'

# $CMD is a command starting a redis-server and creating traffic

perf record --freq=10000 --all-cpus -g -- $CMD

perf script --input=perf.data | ./stackcollapse-perf.pl > out.perf-folded

./flamegraph.pl out.perf-folded > perf-redis.svg

~~~
namibj
I can recommend hotspot from KDAB. Use

    perf record --call-graph lbr -F 999 -- $CMD

or, if that does not work or you are on something older than Haswell:

    perf record --call-graph dwarf,32768 -F 999 -- $CMD

Be careful with the frequency. Use cycles:up as the event (with -e) for general
CPU time, and other events such as LLC-load-misses or
cycle_activity.stalls_l3_miss (an example from a Kaby Lake system). Use

    perf list

to search for the right event name. On a Broadwell i5 (dual-core + HT) laptop I
see cycle_activity.stalls_l2_miss as the equivalent, due to it apparently not
having an L3 cache. cycle_activity.stalls_mem_any highlights code where the CPU
is doing nothing while waiting on memory.

For de-inlining, I found simpleperf from the android-ndk to be the only tool
that doesn't wastefully spawn one addr2line process per address. Yes, that
takes ages to process. Yes, I gave up and used simpleperf, which caches this.
And yes, I considered patching perf-tools to use the pipe-based interface to
addr2line.

Hotspot unfortunately appears unable to distinguish time spent in a function
between the different inline stacks inside that function, so I had to forgo
heavy link-time optimization that gained 5-10% without much else, because there
was no meaningful insight left into which part spent how long computing.

And please, please refrain from -O0 when you want performance, whether to use
it or to measure it. Instead add -g or -ggdb in there, to force DWARF debug
info for line info and stack-frame unwinding, the latter without relying on the
frame pointer. Though, for the unwinding itself, lbr does a great job. Just
keep in mind that its max depth is limited by the CPU generation and can't be
bypassed/increased.

------
the_duke
Hackernews hug of death already?

~~~
blaisio
Maybe they should use this module that I can't read about to power their site.
:P

~~~
mpweiher
Makes me feel a little better about my libµhttp/Objective-C/Objective-
Smalltalk server (serving [http://objective.st](http://objective.st)), which
held up just fine to the HN hug of death, running on the smallest digital
ocean droplet. :-)

~~~
majewsky
A simple nginx also holds up very well:
[https://xyrillian.de/thoughts/posts/latency-matters-aftermath.html](https://xyrillian.de/thoughts/posts/latency-matters-aftermath.html)

------
ngaut
How does it compare to Redis threaded I/O?
([https://twitter.com/antirez/status/1110973404226772995?lang=...](https://twitter.com/antirez/status/1110973404226772995?lang=en))

Using 4 threads it is simple to get 2x the performance of single-threaded
Redis (even if the reading part is not yet threaded).

------
kraftman
Impressive! Would be great if you could share the SVG of the flamegraphs too.

~~~
ericblenkarn
No problem: [https://download.keydb.dev/Redis-Normal-FlameGraph.svg](https://download.keydb.dev/Redis-Normal-FlameGraph.svg)

[https://download.keydb.dev/Redis-plus-Module-FlameGraph.svg](https://download.keydb.dev/Redis-plus-Module-FlameGraph.svg)

------
ksec
Well the site is down.

~~~
ericblenkarn
We are back up, had to move to a bigger server haha

~~~
overcast
I'm always curious what kind of setup people run that can't handle a thousand
people connecting over HTTP.

~~~
lioeters
I did a bit of sleuthing in the dev console.

The site is a static HTML site generated by Docusaurus [0]. That seems like it
should be very fast and lightweight for serving tons of concurrent requests.

In the page's response header, it shows that the server is NGINX on Ubuntu.
Ahhh, and "X-Powered-By: Express". I suspect that's the bottleneck, serving
static assets via Node.js+Express, instead of directly with NGINX.

I also see in the document footer that it tries to load a script from
[http://localhost:35729/livereload.js](http://localhost:35729/livereload.js).
That could be a sign that the site is a development build, not optimized for
production...?

[0] [https://docusaurus.io/en/](https://docusaurus.io/en/)

------
espeed
To maximize reach, have you explored what it would take to build a first-class
Redis wasm [1] implementation, maybe pairing it with Terra [2] for Lua
scripting?

[1]
[http://webassembly.github.io/spec/core/exec/index.html](http://webassembly.github.io/spec/core/exec/index.html)

[2] [https://github.com/zdevito/terra](https://github.com/zdevito/terra)

------
segmondy
"Redis is known as one of the fastest databases out there."

Redis is not a database. Let's begin with that.

~~~
PopeDotNinja
Depends on how you define database. Wikipedia starts with...

”A database is an organized collection of data, generally stored and accessed
electronically from a computer system. Where databases are more complex they
are often developed using formal design and modeling techniques." [1]

...which makes Redis sound like a database to me. If I am not technically
correct, feel free to educate me.

Side note: I try to focus on what problems we're trying to solve, because it's
hard enough to get people to sync up on that. I've found focusing on being
right is inversely proportional to the health of my relationships.

[1]
[https://en.wikipedia.org/wiki/Database](https://en.wikipedia.org/wiki/Database)

~~~
segmondy
Your filesystem then is a database. You can even stretch it to include your
text editor. Redis is not a database.

~~~
im3w1l
I think there is much to be learned by comparing databases and filesystems.

One thing I think that filesystems can learn from databases is the notion of a
compound primary key. It would be neat if app-files were identified by an
(app, type, id) tuple. This would bring the advantages of both the posix and
the windows filesystem layouts.

For instance, if we had (app=firefox, type=/usr/bin, id=main), then we could
easily find all firefox files by querying by app. Or we could easily find all
binaries in PATH by querying by type.

ps. I think this would work better than the overly general tag-based
filesystem people sometimes propose.

~~~
dvirsky
Well, that's basically what directories are.

~~~
im3w1l
The posix filesystem layout doesn't provide an easy way of enumerating all
files belonging to a given program. The traditional windows (98?) layout of one
app in one folder conversely doesn't easily allow enumerating all binaries, or
all manual entries, etc.

~~~
dvirsky
Ok, so let's say you arrange your folders to accommodate that: you have
/files/$user/$program/$file. That's basically what a primary key in a database
looks like. If you want a secondary index, what databases often do is just
create a second table with a different primary key, with the values being the
primary keys of the main table.

We can model that in a filesystem as well, of course. So if I want one index
filtering by file type and another by month of creation, I can create
/files/$program/$file and then ln -s /files/$program/$file
/files/month/$month/$file
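The symlink-as-secondary-index idea can be sketched programmatically too; here's a hypothetical Python version (paths and names made up for illustration), where the month index entries are just symlinks back to the primary path:

```python
import os
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())

def put(program, month, name, data="data"):
    """Primary 'key' is /files/$program/$file; a month index is kept as
    symlinks, like a secondary table whose values are primary keys."""
    primary = root / "files" / program / name
    primary.parent.mkdir(parents=True, exist_ok=True)
    primary.write_text(data)
    index = root / "files" / "month" / month / name
    index.parent.mkdir(parents=True, exist_ok=True)
    index.symlink_to(primary)  # equivalent of `ln -s $primary $index`
    return primary, index

primary, index = put("firefox", "2019-06", "main")
```

Querying "all files for a program" is then a directory listing of the primary path, and "all files for a month" a listing of the index directory, with each entry resolving back to the primary record.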

