An update on Redis Streams development (antirez.com)
220 points by djanowski on Jan 25, 2018 | 46 comments

I believe Redis has a great developer experience. It’s easy to get set up and use. When I see the work on Redis streams, I think it's going to bring a much better getting-started experience for developers who want to start using evented architectures. This might be a turning point where we see more developers and applications utilizing those types of architectures. In the end, this might give Apache Kafka a run for its money, or not, who knows. I've tried using Apache Kafka and it can be a bear to set up, given its dependencies, and to administer.

Put another way, I think Kafka solved a problem for enterprises and was a top-down approach to the problem. Redis streams is a bottom-up approach to implementing evented architectures. Maybe there's room for both products in the market?

Thanks for your interesting POV on the Redis/Kafka intersection on this. What could be interesting is that while Redis Streams are certainly totally inspired from Kafka streams, it's just a conceptual thing, so the two things can act in very different ways in practice.

For instance just imagine that I never touched a Kafka system in my life, never used it, don't even know the API, I only read all the documentation they have on the site about the design, to get the higher level picture and combine this with my own ideas about fixing the fact Redis was lacking a "log" data structure.

Pub/Sub + other data structures were not able to provide time series and streaming, but Redis streams remain an ADT (Abstract Data Type), while Kafka is a tool to solve a very specific business case. So the applications have some intersection, but are also very far apart.

For instance you can create infinite small keys having streams, so Redis is good for many IoT usages where you receive data from many small devices. Redis Streams also stress on range queries, so that you can combine the time series with millisecond-range queries.

However, yes, the fact that I also added consumer groups is a way to turn this "80% streaming" into a more usable streaming system, more similar to Kafka, for the use cases where:

1) The memory limits.

2) The speed.

3) The consistency guarantees of Redis make sense.
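To make the consumer group idea concrete, here is a toy in-memory model in Python. It is only a sketch of the concept as described above (each entry is handed to one consumer in the group and stays pending until acknowledged). The class and method names are made up for illustration and are not the Redis API or its implementation.

```python
from collections import deque


class ToyConsumerGroup:
    """Toy model: each entry goes to exactly one consumer and stays
    pending until that consumer acknowledges it. Names are hypothetical."""

    def __init__(self, entries):
        # (entry_id, payload) pairs, oldest first
        self._undelivered = deque(entries)
        # entry_id -> name of the consumer that currently owns it
        self.pending = {}

    def read(self, consumer, count=1):
        """Hand up to `count` not-yet-delivered entries to `consumer`."""
        batch = []
        while self._undelivered and len(batch) < count:
            entry_id, payload = self._undelivered.popleft()
            self.pending[entry_id] = consumer
            batch.append((entry_id, payload))
        return batch

    def ack(self, consumer, entry_id):
        """Acknowledge an entry; only the consumer that owns it may do so."""
        if self.pending.get(entry_id) == consumer:
            del self.pending[entry_id]
            return True
        return False


group = ToyConsumerGroup([(1, "a"), (2, "b"), (3, "c")])
first = group.read("worker-1", count=2)   # worker-1 gets entries 1 and 2
second = group.read("worker-2", count=2)  # only entry 3 is left for worker-2
group.ack("worker-1", 1)                  # entry 1 is done; 2 and 3 stay pending
```

The pending map is what lets a group notice work that was delivered but never acknowledged, which is the part plain Pub/Sub cannot give you.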

However at the same time, it was a great challenge and pleasure to do what I always try to do, that is to create an API for developers thinking like I'm designing an iPhone, and not some terrible engineering thing which does what it should but is terrible to use (I'm not referring to Kafka that I do not know). So I really hope that what you say "easy to setup and use" will be what developers will feel :-)

I'm a big fan and I hope you see this comment before the edit window passes (unlikely, unfortunately):

You need to get yourself some whitespace in this reply!

Thanks! Done :-)

What level of guarantees will Redis Streams have during power failure? Can I configure if events are close to immediately persistent to disk VS served from memory and occasionally paged to disk?

Redis streams are just another data type and follow the same persistence settings that have been available to Redis: https://redis.io/topics/persistence

You can use RDB (snapshots of the entire dataset at some interval) and/or AOF (logging of all changes, with fsync on every write, every second, or left to the OS).

> that is to create an API for developers thinking like I'm designing an iPhone

Although my request is quite selfish, I really would love to hear/read more about your thoughts on designing APIs. My experience using Redis has been excellent and I'd love to be able to replicate that sense of design in the systems I build as well.

I must admit, for the past couple months I've been digging for status info about Redis Streams. They would fit a use case we have perfectly, but we use cloud providers for Redis so manual compilation with modules isn't possible.

Really pleased to see that I'm not the only one digging for info and that work is ongoing.

Thank you, I believe it's my fault that many potential users remained wondering. Sometimes I forget that the world is not inside Twitter... and I should instead blog more and tweet less, both for myself, because writing a blog post gives me much more sense to accomplish something, and for people interested in Redis that will find information more readily, and also searching via Google and so forth.

Why not both?!! (summarizing tweets occasionally)

Perhaps a resurrection of the Redis Watch newsletter is in order! Are there any existing alternatives?

Your blog posts are wonderfully written.

I wonder if this impacts the plan of releasing Disque as a plugin in Redis 4.2. I always thought that Disque could have a great impact in the field of job queues.

Hello, no change... The original plan was:

4.0 (done) ->

Streams back ported to 4.0 (Work in progress) ->

4.2 (or 5.0) with Disque + Cluster improvements + Modules improvements, ...

It's just a renaming:

4.0 + Streams backported is now called 5.0

What was to be 4.2 is going to be called 6.0

Why am I choosing to go with integer version numbers? Because I believe that things like 4.2 should be for minor improvements, mostly operational, but adding the first new data structure after ages deserves 5.0, and similarly a reshaped Redis Cluster + Disque deserves 6.0. I also get confused as a user of other systems when they advance like 1.4, 2.3, 2.7, ... It's simpler to talk about Redis 4, Redis 5, Redis 6, ...

Hi, since you're here just a quick question not really worth a Github issue - what is the reason for the 1-second resolution in the DELAY parameter in Disque and will that ever get more fine? We currently use Bull as a job queue on Redis and the delay is a key and visible part of our application, so that alone kind of eliminates Disque as a consideration for us.

Hello, thanks, I'll keep this in mind. The delay could be accepted in milliseconds, and it should be simple to honor it with an error of, for example, 50 milliseconds or so, but true 1 ms resolution requires non-trivial changes to the system, which would have to act like a real-time scheduler in some way...
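A rough illustration of that tradeoff, as a toy Python delay queue. This is not how Disque works internally. It only assumes a poller that wakes up on a fixed tick (50 ms here, a number picked to match the example above) and shows that delays given in milliseconds are then honored with an error of up to one tick. All names are hypothetical.

```python
import heapq
import itertools


class ToyDelayQueue:
    """Jobs become visible once their due time has passed."""

    def __init__(self):
        self._heap = []
        # Tie-breaker so jobs with equal due times come out in insertion order
        self._counter = itertools.count()

    def add(self, job, now_ms, delay_ms):
        heapq.heappush(self._heap, (now_ms + delay_ms, next(self._counter), job))

    def pop_due(self, now_ms):
        """Return every job whose due time is at or before now_ms."""
        due = []
        while self._heap and self._heap[0][0] <= now_ms:
            due.append(heapq.heappop(self._heap)[2])
        return due


queue = ToyDelayQueue()
queue.add("a", now_ms=0, delay_ms=120)
queue.add("b", now_ms=0, delay_ms=130)

# Simulated poller that only wakes up every 50 ms: 0, 50, 100, 150, 200
delivered = {}
for tick in range(0, 201, 50):
    for job in queue.pop_due(tick):
        delivered[job] = tick
```

Both jobs are handed out at the 150 ms tick, so they are 30 ms and 20 ms late respectively. Shrinking that error means shrinking the tick, which is where the "real time scheduler" concern comes in.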

Love that you are using sem version properly. Too many folks couple marketing reasons into their versioning schemes as opposed to simply, “did I break backwards compatibility?”

That sounds like a good plan, thank you!

My feeling is that Redis itself is good enough for job queues so that there is not a huge pressure to improve it with Disque.

I actually implemented yet another job framework[0] for fun in Python with Redis and it was a pleasure. Lua, pub/sub and atomic operations really go a long way!

[0] https://github.com/NicolasLM/spinach

What I loved about Disque is that it was the magical distributed systems thing that everybody really wants for message safety, but done in a way totally transparent to the user, and it actually worked well enough that there are people who have kept using the RC1 for ages... That's why I want to resurrect such a project. The fact that it is multi-master, auto-federated and auto-balancing, and that all the auto stuff happens without any user stress, was kind of a good thing. As a Redis module everything will be the same, but without all the code duplication, because otherwise Disque is something like a 5000-line project, if not less.

That's for Salvatore to decide, but the version is just an arbitrary choice. Assuming that Disque is coming as a Redis module, it may be even supported by v4.0. OTOH, I believe that implementing Disque will have implications on the Modules API, so these would have to be backported to a v4.2 (or alike) if it is decided so. In any case, this shouldn't have an impact on the Disque roadmap significantly IMHO.

Are there any examples of good web APIs that offer something like a unified log as an abstraction? I'm not looking for systems, but actual companies that have some kind of "streaming" data feed where you can (re)connect to an endpoint and say "give me everything from [logical] timestamp X". Ideally one where you stay connected and get longpoll SSE/WebSockets/MQTT-style streaming responses.

I kind of want the opposite of webhooks.

There are certainly "client oriented" (as opposed to webhooks) push APIs out there. However, such APIs that let you specify a starting position to read from are rare.

Sometimes APIs will give you tokens to use for resumption (e.g. SSE event IDs, or any long-polling API), but typically these are for a time-limited session rather than a stateless query against any point in a long-lived log.

Years ago, services like Friendfeed, Livefyre, and Convore had stateless long-polling APIs that returned a log of data, I believe. These kinds of APIs seem to have fallen out of fashion, though. There are still stateless long-polling APIs, but most of the ones I'm aware of don't return logs of data. For example, Dropbox and Box will let you query for a change notification against a starting position, but then you have to fetch the actual data separately.

That said, just because streaming APIs that let you set a starting position are rare doesn't mean they're impossible to make. My company (https://fanout.io) has built tools to help with this.

Edit: since you asked for a real example, Superfeedr is one such API: https://documentation.superfeedr.com/subscribers.html#stream...

Great to see. We actually ended up using Redis for our event stream after trying Kafka and the rest. Using the list extension module's multiple list pop functions lets us get to 2 Gbps of throughput on a single Redis node with AOF persistence. Using streams would make things even simpler and faster.

Did you have issues with Kafka? Curious to hear because I’m about to try and start using it for something at work.

It's just much more work to install and maintain Kafka, and it has issues with load balancing and recovery due to the design tying cluster ownership to partition data. With AOF persistence and a replica, Redis is durable enough for us and extremely fast with no maintenance.

If you absolutely need Kafka then it's still a good option, although I'd recommend looking at Apache Pulsar [1] for a better design. It separates storage and compute for better performance and scalability while giving you features like per-message acknowledgements.

1. https://pulsar.apache.org

Could someone ELI5 what a Redis stream is? I thought that pub/sub mechanisms were some kind of a stream already.

Redis Pubsub is fire-and-forget, so if you aren't listening when a message is fired, you'll never receive it. Redis Streams store messages, so you can connect and read all the messages since you last checked. It's a similar model to Kafka.
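If it helps, the difference can be shown with two toy Python classes. This is just a sketch of the two delivery models described above, not of Redis itself. The class names, the integer IDs, and `read_after` are all made up for the example.

```python
class ToyPubSub:
    """Fire-and-forget: a message only reaches whoever is subscribed right now."""

    def __init__(self):
        self.subscribers = []  # list of callbacks

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, message):
        for callback in self.subscribers:
            callback(message)
        return len(self.subscribers)  # number of receivers, possibly zero


class ToyStream:
    """Append-only log: messages are stored, readers ask for what they missed."""

    def __init__(self):
        self.entries = []  # list of (entry_id, message), IDs increase by one

    def add(self, message):
        entry_id = len(self.entries) + 1
        self.entries.append((entry_id, message))
        return entry_id

    def read_after(self, last_seen_id):
        """Return every entry with an ID greater than last_seen_id."""
        return [entry for entry in self.entries if entry[0] > last_seen_id]


# Pub/Sub: "early" is published before anyone listens, so it is lost.
pubsub = ToyPubSub()
received = []
pubsub.publish("early")
pubsub.subscribe(received.append)
pubsub.publish("late")

# Stream: a reader that connects afterwards can still catch up from ID 0.
stream = ToyStream()
stream.add("early")
stream.add("late")
missed = stream.read_after(0)
```

The reader keeps track of the last ID it saw and passes it back on reconnect, which is the "read all the messages since you last checked" part.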

I've not been following Redis a lot, so this one is new to me. I read antirez's original Streams blog post[0] but it feels like it's missing some stuff--is it known out there how this interacts with Redis Sentinel or, separately, Redis Cluster?

[0] - http://antirez.com/news/114

In the Redis sense, streams are just a new data structure (value type of a key), so there shouldn't be any special concerns with regards to Sentinel, partitioning and/or clustering in that sense.

That is my intuition, yeah, but that makes me pretty worried about hammering hot keys and the like with regards to Cluster.

I also don't mind running Kafka, though, so I may not be the target audience.

is it strictly in-mem? for instance if redis is restarted is there an option for unread streams to be saved with redis persistence

The built-in persistence mechanisms should be compatible with Streams.

This article from Brandur does a great job of explaining a use case for Redis Streams [0]. As part of that he begins by explaining what they are, how they work, and a bit about how they differ from Kafka.

[0]: https://brandur.org/redis-streams

Redis streams will allow for direct modeling of time series data. For example,

     redis-cli>XADD AAPL 1516899637000.0 open 221.25 high 222.1 low 220.90 close 221.50 volume 1121223234234
(All data is imaginary.)
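For anyone wondering what the range query side of this looks like, here is a toy Python model. It assumes the entry ID is a millisecond timestamp plus a sequence number, which is what the `.0` suffix in the command above suggests. It keeps IDs sorted so a time range is a binary search. The class is hypothetical and the prices are as imaginary as the ones above. It is not the Redis data structure.

```python
import bisect


class ToyTimeSeries:
    """Entries keyed by (ms, seq) IDs, kept in increasing order."""

    def __init__(self):
        self._ids = []     # sorted list of (ms, seq) tuples
        self._fields = []  # parallel list of field dicts

    def add(self, ms, fields):
        """Append an entry; entries sharing a millisecond get increasing seq."""
        seq = 0
        if self._ids:
            last_ms, last_seq = self._ids[-1]
            if ms < last_ms:
                raise ValueError("timestamp is older than the last entry")
            if ms == last_ms:
                seq = last_seq + 1
        entry_id = (ms, seq)
        self._ids.append(entry_id)
        self._fields.append(dict(fields))
        return entry_id

    def range(self, start_ms, end_ms):
        """Return entries with start_ms <= ms <= end_ms, oldest first."""
        lo = bisect.bisect_left(self._ids, (start_ms, 0))
        hi = bisect.bisect_left(self._ids, (end_ms + 1, 0))
        return list(zip(self._ids[lo:hi], self._fields[lo:hi]))


aapl = ToyTimeSeries()
aapl.add(1516899637000, {"open": 221.25, "close": 221.50})
aapl.add(1516899637000, {"open": 221.50, "close": 221.75})  # same ms, seq 1
aapl.add(1516899638000, {"open": 221.75, "close": 222.00})

# Everything recorded within the first second
hits = aapl.range(1516899637000, 1516899637999)
```

Because the IDs only ever grow, appends are cheap and a range lookup never has to scan the whole series.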

From the blog post, it sounds like the RDB format will break. Is there any means to upgrade from the current unstable RDB format to the v5 one? Obviously there are no consumer groups in unstable now, so in theory reading from a v4 RDB should not be hard..

Here's a question from an old Redis hater (the note is important, since my question is going to be slightly biased - I disagree with a lot of core decisions behind Redis):

How is this going to be different from Kafka? And I don't mean implementation details, because these are always fun read. Kafka is on the market for ~7 years, during which it has proven to be oh-so-fast and pretty durable.

Oh, and while I'm at it. Here's another problem Redis ingeniously added: a GIL. GIL is a great idea, but comes with huge tradeoffs. David Beazley spent years showing how many tricks you can play upon yourself with GIL.

So ... now you have Streams and GIL together. And you already have dicts (you call them hashmaps). I have a feeling you're trying to implement Python. If so, it's done. But come on, 3.6 is cool. And we're kinda solving the GIL problem. With PyPy. Which will blow your mind.

So yes, that's it. The Redis hidden agenda was to compete with Python (to be honest not a big secret... you can see that Redis works internally as an interpreter in a pretty obvious way), and now that you uncovered it, I'm going to say it aloud: we are going to exit in a few months with a new package system we are working at for 5 years at this point, based on the blockchain (proof of installation), which will kill NPM completely, so with a Python killer + an NPM killer we'll see what mind will be blown.

Chapeau bas. But apart from my rather obnoxious joking - how the whole Stream and GIL interoperate. I mean, this is technically a complex problem.

great response


How can you compare a programming language and a memory store? How are they even remotely comparable? Redis needs a syntax to exchange data with clients in a human-readable way.

Honest question: Why do you hate Redis?

It was meant as a joke.

Alas, I have been more than once bitten by poorly implemented GIL and in my personal experience any GIL is problematic in a high-throughput environment. It simplifies the problem of implementation of environment or API for the price of introducing global lock into the system, thus eliminating any possibility of lockless design. And we know it's doable.

I spent years trying to work around GIL in production environments and we have the tools about now to do it reliably. We have on_commit in Django, we have Celery with proper support for mechanisms like chords and a myriad of others.

I do get that Redis has a different approach to many things. But here's the deal - every time I had to rely on Redis as a critical component, there was a problem. Either there was that "one gotcha" in the config, or I didn't understand something, or something else.

A while ago I was a part of discussion with the author of Redis on using Redis as cache. And quite some people, not only me, noted they had bad experience and their benchmarks didn't show Redis is better than Memcached. And, I think that's ok. Unless you pose your product as a competitor to memcached. Would I use memcached-db? No, it's a bad idea.

My problem is that Redis originally was a key value store with datastructures. And that's a great idea and not that many services let you model datastructures in a distributed way. I actually implemented a distributed system of progress control for a very critical piece of architecture at my current workplace. I'm a Redis user since 2014 in regard to commercial products.

My point of view is probably largely skewed, but I don't understand, and I've never seen it explained, why Redis tries to do everything. We have battle-tested Kafka for streams, ElasticSearch and Solr for searching and Memcached for caching. Can't we get a super reliable data store that supports data structures? A data store that is _able_ to utilize the fact we have more than one core?

I know my view is unpopular, but I don't see addition of GIL as a genius idea. I simply don't see a coherent direction for Redis development.

BTW: is there a benchmark comparing latest Redis without the GIL and the latest with GIL?

Had to re-read a few times, but the comparison just seems like a poorly worded attempt at a joke/insult.

Also as a pinch-of-a-salt in the whole sugary environment of HN where every new feature is cool by definition. For example, what are the shortcomings of Kafka compared to Redis Streams?

And an extra note when someone says "it's hard to implement ... without locks and so". Look at Firefox. I dumped its bushy tail around 7 years ago when switching to the apple-scented world for good.

I came crawling back. Why? Because people at Mozilla spent their time working on their product, finally coming to agreement that the fact your product is OS doesn't mean you can cut your standards in half, focusing on features you like instead of the ones that actually make it better.

Here's the link in case someone missed it https://hacks.mozilla.org/2017/11/entering-the-quantum-era-h.... I'm not a Mozilla fanboy, to be frank for a while foxy browser scored lower on my "software like scale" than Redis.

And they won me over. In 5 minutes. And it's been proven time and time again that performance is a feature.

p.s. Now that I write it I realize that for certain reasons I should just try to write the damn thing and instead of writing pointy comments let antirez rip my software apart :)
