
An update on Redis Streams development - djanowski
http://antirez.com/news/116
======
neovintage
I believe Redis has a great developer experience. It's easy to get set up and
use. When I see the work on Redis streams, I think it's going to bring a much
better getting-started experience for developers who want to start using
evented architectures. This might be a turning point where we see more
developers and applications utilizing those types of architectures. In the
end, this might give Apache Kafka a run for its money, or not, who knows. I've
tried using Apache Kafka, and it can be a bear to set up, given its
dependencies, and to administer.

Put another way, I think Kafka solved a problem for enterprises and was a top-
down approach to the problem. Redis streams are a bottom-up approach to
implementing evented architectures. Maybe there's room for both products in
the market?

~~~
antirez
Thanks for your interesting POV on the Redis/Kafka intersection. What could be
interesting is that while Redis Streams _are certainly_ totally inspired by
Kafka streams, the inspiration is only conceptual, so the two things can act
in very different ways in practice.

For instance, consider that I never touched a Kafka system in my life, never
used it, and don't even know the API; I only read all the documentation they
have on the site about the design, to get the higher-level picture and combine
it with my own ideas about fixing the fact that Redis was lacking a "log" data
structure.

Pub/Sub + other data structures were not able to provide time series and
streaming, yet Redis streams remain an ADT (abstract data type), while Kafka
is a tool to solve a very specific business case. So the applications have
some intersection, but are also very far apart.

For instance, you can create a huge number of small keys holding streams, so
Redis is good for many IoT usages where you receive data from many small
devices. Redis Streams also emphasize range queries, so you can combine time
series with millisecond-precision range queries.
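A minimal sketch of what those ID-based range queries look like conceptually:
an in-memory toy model (names are hypothetical, not the real Redis API) where
each entry gets a (milliseconds, sequence) ID like the auto-generated IDs of
XADD, so a time-range query becomes a binary search over the sorted log,
roughly what XRANGE does:

```python
import bisect

class TinyStream:
    """Toy in-memory model of a Redis stream (for illustration only).

    Entries get IDs of the form (milliseconds, sequence); IDs are
    monotonic, and a range query by time is just a binary search
    over the sorted log."""

    def __init__(self):
        self.entries = []      # list of ((ms, seq), payload), kept sorted
        self.last_id = (0, 0)

    def add(self, payload, now_ms):
        ms, seq = self.last_id
        if now_ms <= ms:
            new_id = (ms, seq + 1)   # same millisecond: bump the sequence
        else:
            new_id = (now_ms, 0)
        self.last_id = new_id
        self.entries.append((new_id, payload))
        return new_id

    def range(self, start_ms, end_ms):
        """Return all entries whose ID falls within [start_ms, end_ms]."""
        lo = bisect.bisect_left(self.entries, ((start_ms, 0),))
        hi = bisect.bisect_right(self.entries, ((end_ms, float("inf")),))
        return self.entries[lo:hi]
```

For example, two sensor readings arriving in the same millisecond get IDs
(100, 0) and (100, 1), and `range(100, 100)` returns both.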

However, yes, the fact that I also added consumer groups is a way to turn
this "80% streaming" into a more usable streaming system, more similar to Kafka,
for the use cases where:

1) The memory limits.

2) The speed.

3) The consistency guarantees of Redis make sense.
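For readers unfamiliar with consumer groups: the idea is that each entry in a
shared stream is delivered to exactly one consumer in the group, and stays in
that consumer's pending list until acknowledged. A toy in-memory sketch of
that contract (hypothetical names, loosely mirroring the XREADGROUP/XACK
semantics, not the real API):

```python
class TinyConsumerGroup:
    """Toy model of a stream consumer group (for illustration only).

    A group-level cursor hands each log entry to exactly one consumer;
    delivered entries are tracked as pending until acknowledged."""

    def __init__(self, log):
        self.log = log        # shared list of (entry_id, payload)
        self.next_index = 0   # group-level delivery cursor
        self.pending = {}     # entry_id -> consumer name

    def read(self, consumer, count=1):
        """Deliver up to `count` new entries to `consumer`."""
        delivered = []
        while self.next_index < len(self.log) and len(delivered) < count:
            entry_id, payload = self.log[self.next_index]
            self.pending[entry_id] = consumer
            delivered.append((entry_id, payload))
            self.next_index += 1
        return delivered

    def ack(self, entry_id):
        """Acknowledge an entry, removing it from the pending list."""
        self.pending.pop(entry_id, None)
```

The pending list is what lets a group detect entries whose consumer died
before acknowledging, so they can be handed to someone else.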

However, at the same time, it was a great challenge and pleasure to do what I
always try to do, that is, to create an API for developers as if I'm designing
an iPhone, and not some terrible engineering thing which does what it should
but is terrible to use (I'm not referring to Kafka, which I _do not know_).
So I really hope that what you say, "easy to set up and use", will be what
developers will feel :-)

~~~
j_s
I'm a big fan and I hope you see this comment before the edit window passes
(unlikely, unfortunately):

You need to get yourself some whitespace in this reply!

~~~
antirez
Thanks! Done :-)

------
pselbert
I must admit, for the past couple months I've been digging for status info
about Redis Streams. They would fit a use case we have perfectly, but we use
cloud providers for Redis so manual compilation with modules isn't possible.

Really pleased to see that I'm not the only one digging for info and that work
is ongoing.

~~~
antirez
Thank you, I believe it's my fault that many potential users remained
wondering. Sometimes I forget that the world is not inside Twitter... I should
instead blog more and tweet less, both for myself, because writing a blog post
gives me a much greater sense of accomplishing something, and for people
interested in Redis, who will find information more readily, including when
searching via Google and so forth.

~~~
j_s
Why not both?!! (summarizing tweets occasionally)

Perhaps a resurrection of the Redis Watch newsletter is in order! Are there
any existing alternatives?

------
acidus
I wonder if this impacts the plan of releasing Disque as a plugin in Redis
4.2. I always thought that Disque could have a great impact in the field of
job queues.

~~~
antirez
Hello, no change... The original plan was:

4.0 (done) ->

Streams back ported to 4.0 (Work in progress) ->

4.2 (or 5.0) with Disque + Cluster improvements + Modules improvements, ...

It's just a renaming:

4.0 + Streams backported is now called 5.0

What was to be 4.2 is going to be called 6.0

Why am I choosing to go with integer version numbers? Because I believe that
releases like 4.2 should be for minor, mostly operational improvements, but
adding the first new data structure after ages deserves 5.0; similarly, having
a reshaped Redis Cluster + Disque deserves 6.0. I also get confused as a user
of other systems when they advance like 1.4, 2.3, 2.7, ... It's simpler to
talk about Redis 4, Redis 5, Redis 6, ...

~~~
jrajav
Hi, since you're here, just a quick question not really worth a GitHub issue:
what is the reason for the 1-second resolution of the DELAY parameter in
Disque, and will that ever get finer? We currently use Bull as a job queue
on Redis, and the delay is a key and visible part of our application, so that
alone kind of eliminates Disque as a consideration for us.

~~~
antirez
Hello, thanks, I'll keep this in mind. The resolution could be changed to
accept milliseconds, and it should be simple to honor it with an error of,
for example, 50 milliseconds or so; but to get true 1 ms resolution requires
non-trivial changes, since the system would have to act like a real-time
scheduler in some way...
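A sketch of why a ~50 ms error bound is cheap while true 1 ms accuracy is
not: if delayed jobs sit in a min-heap keyed by their ready time and a worker
polls every tick, a job is released at most one tick late; guaranteeing 1 ms
would instead require real-time wakeups. All names here are hypothetical, for
illustration only:

```python
import heapq

class DelayedQueue:
    """Toy delayed-job queue (for illustration only).

    Jobs live in a min-heap ordered by ready time. A worker that
    polls every `tick` ms sees a job at most one tick after it is
    due, so a 50 ms tick gives a cheap ~50 ms error bound."""

    def __init__(self):
        self._heap = []
        self._n = 0  # tiebreaker so equal ready times never compare jobs

    def add(self, job, delay_ms, now_ms):
        heapq.heappush(self._heap, (now_ms + delay_ms, self._n, job))
        self._n += 1

    def pop_ready(self, now_ms):
        """Return all jobs whose ready time has passed (one poll)."""
        ready = []
        while self._heap and self._heap[0][0] <= now_ms:
            ready.append(heapq.heappop(self._heap)[2])
        return ready
```

For example, a job with a 120 ms delay polled at t = 0, 50, 100, 150 becomes
visible at t = 150, i.e. 30 ms late, well within the one-tick bound.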

------
epaulson
Are there any examples of good web APIs that offer something like a unified
log as an abstraction? I'm not looking for systems, but actual companies that
have some kind of "streaming" data feed where you can (re)connect to an
endpoint and say "give me everything from [logical] timestamp X". Ideally one
where you stay connected and get longpoll SSE/WebSockets/MQTT-style streaming
responses.

I kind of want the opposite of webhooks.

~~~
jkarneges
There are certainly "client oriented" (as opposed to webhooks) push APIs out
there. However, such APIs that let you specify a starting position to read
from are rare.

Sometimes APIs will give you tokens to use for resumption (e.g. SSE event IDs,
or any long-polling API), but typically these are for a time-limited session
rather than a stateless query against any point in a long-lived log.

Years ago, services like Friendfeed, Livefyre, and Convore had stateless long-
polling APIs that returned a log of data, I believe. These kinds of APIs seem
to have fallen out of fashion, though. There are still stateless long-polling
APIs, but most of the ones I'm aware of don't return logs of data. For
example, Dropbox and Box will let you query for a change notification against
a starting position, but then you have to fetch the actual data separately.

That said, just because streaming APIs that let you set a starting position
are rare doesn't mean they're impossible to make. My company
([https://fanout.io](https://fanout.io)) has built tools to help with this.

Edit: since you asked for a real example, Superfeedr is one such API:
[https://documentation.superfeedr.com/subscribers.html#stream...](https://documentation.superfeedr.com/subscribers.html#streaming-rss)

------
manigandham
Great to see. We actually ended up using Redis for our event stream after
trying Kafka and the rest. Using a list extension module's multiple-element
list pop functions lets us get to 2 Gbps of throughput on a single Redis node
with AOF persistence. Using streams would make things even simpler and faster.

~~~
somethingsimple
Did you have issues with Kafka? Curious to hear because I’m about to try and
start using it for something at work.

~~~
manigandham
It's just much more work to install and maintain Kafka, and it has issues with
load balancing and recovery due to the design tying cluster ownership to
partition data. With AOF persistence and a replica, Redis is durable enough
for us and extremely fast with no maintenance.

If you absolutely need Kafka then it's still a good option, although I'd
recommend looking at Apache Pulsar [1] for a better design. It separates
storage and compute for better performance and scalability while giving you
features like per-message acknowledgements.

[1] - [https://pulsar.apache.org](https://pulsar.apache.org)

------
netgusto
Could someone ELI5 what a Redis stream is? I thought that pub/sub mechanisms
were some kind of a stream already.

~~~
luhn
Redis Pubsub is fire-and-forget, so if you aren't listening when a message is
fired, you'll never receive it. Redis Streams store messages, so you can
connect and read all the messages since you last checked. It's a similar model
to Kafka.
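A toy illustration of the difference (hypothetical in-memory classes, not the
Redis API): in pub/sub, a message published while nobody is listening is
simply gone, while a stream keeps a log that late readers can replay from any
position:

```python
class PubSub:
    """Fire-and-forget: only currently subscribed inboxes see a message."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self):
        inbox = []
        self.subscribers.append(inbox)
        return inbox

    def publish(self, msg):
        # Delivered only to subscribers that exist right now.
        for inbox in self.subscribers:
            inbox.append(msg)

class Stream:
    """Log-based: messages are stored, so readers can catch up later."""

    def __init__(self):
        self.log = []

    def add(self, msg):
        self.log.append(msg)

    def read(self, since=0):
        # Replay everything after a given offset.
        return self.log[since:]
```

With pub/sub, a message published before `subscribe()` is lost forever; with
the stream, a reader that connects afterwards still gets the full history.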

~~~
eropple
I've not been following Redis a lot, so this one is new to me. I read
antirez's original Streams blog post[0] but it feels like it's missing some
stuff--is it known out there how this interacts with Redis Sentinel or,
separately, Redis Cluster?

[0] - [http://antirez.com/news/114](http://antirez.com/news/114)

~~~
itamarhaber
In the Redis sense, streams are just a new data structure (the value type of
a key), so there shouldn't be any special concerns with regard to Sentinel,
partitioning and/or clustering.
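One concrete consequence for Cluster: a key, and therefore an entire stream,
maps to a single hash slot and thus to a single shard. A sketch of the
CRC16-based slot mapping described in the Redis Cluster specification
(ignoring the {hash tag} rule for brevity):

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XModem variant): polynomial 0x1021, init 0x0000 --
    the checksum the Redis Cluster spec uses for key hashing."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else crc << 1
            crc &= 0xFFFF
    return crc

def hash_slot(key: bytes) -> int:
    """Map a key to one of Redis Cluster's 16384 hash slots."""
    return crc16_xmodem(key) % 16384
```

Since the whole stream lives behind one key, all its traffic lands on the
shard owning that slot, which is exactly the hot-key concern raised below.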

~~~
eropple
That is my intuition, yeah, but that makes me pretty worried about hammering
hot keys and the like with regards to Cluster.

I also don't mind running Kafka, though, so I may not be the target audience.

------
nicois
From the blog post, it sounds like the RDB format will break. Is there any
means to upgrade from the current unstable RDB format to the v5 one? Obviously
there are no consumer groups in unstable now, so in theory reading from a v4
RDB should not be hard...

------
mdomans
Here's a question from an old Redis hater (the note is important, since my
question is going to be slightly biased - I disagree with a lot of core
decisions behind Redis):

How is this going to be different from Kafka? And I don't mean implementation
details, because those are always a fun read. Kafka has been on the market for
~7 years, during which it has proven to be oh-so-fast and pretty durable.

Oh, and while I'm at it, here's another problem Redis ingeniously added: a
GIL. A GIL is a great idea, but comes with huge tradeoffs. David Beazley spent
years showing how many tricks you can play on yourself with a GIL.

So... now you have Streams and a GIL together. And you already have dicts (you
call them hashmaps). I have a feeling you're trying to implement Python. If
so, it's done. But come on, 3.6 is cool. And we're kinda solving the GIL
problem. With PyPy. Which will blow your mind.

~~~
antirez
So yes, that's it. The Redis hidden agenda was to compete with Python (to be
honest, not a big secret... you can see that Redis works internally as an
interpreter in a pretty obvious way), and now that you've uncovered it, I'm
going to say it aloud: in a few months we are going to come out with a new
package system we've been working on for 5 years at this point, based on the
blockchain (proof of installation), which will kill NPM completely. So with a
Python killer + an NPM killer, we'll see whose mind will be blown.

~~~
mdomans
Chapeau bas. But apart from my rather obnoxious joking: how do Streams and the
GIL interoperate? I mean, this is technically a complex problem.

