
Redis Streams and the Unified Log - waffle_ss
https://brandur.org/redis-streams
======
sandGorgon
> _Redis streams aren’t exciting for their innovativeness, but rather that
> they bring building a unified log architecture within reach of a small
> and/or inexpensive app. Kafka is infamously difficult to configure and get
> running, and is expensive to operate once you do. Pricing for a small Kafka
> cluster on Heroku costs $100 a month and climbs steeply from there. It’s
> tempting to think you can do it more cheaply yourself, but after factoring in
> server and personnel costs along with the time it takes to build working
> expertise in the system, it’ll cost more._

This. Precisely this. Redis Streams, if done right, will be a killer feature.
Redis is already pretty much omnipresent and has a great managed cloud story
(e.g. ElastiCache). And it's cheap.

------
PaulRobinson
The unified log pattern - and event sourcing based on that - is a very
powerful pattern.

We are doing some work at my job right now inspired by this approach (in no
small part, that LinkedIn article). I hope we can open-source some of this
tech, or at least start blogging about the concepts as nobody really talks
about it, and the mechanics are a little tricky to get right.

We currently think the best approach is to bring ideas from DDD (specifically
the idea of a bounded context and a domain), a unified log which we call the
GEL (for Global Event Log), domain-specific event logs, event-sourcing with
CQRS and then some quite lightweight means to produce "projections" (different
structural representations of data).

We therefore have a few pieces that get names and become easy to reason about:
every event goes to the GEL. Domains have systems that listen to the GEL for
events they are interested in and map those events into commands on a command
service. Clients can issue commands directly, as well. Commands alter
projections and produce events back to the GEL. Clients and domains can query
a domain's query service.
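A toy sketch of that loop, with entirely hypothetical names (Ruby, in the spirit of the linked article's examples): every event lands on the GEL, a domain consumer polls for the events it cares about, and handling one both updates a projection and produces a new event back onto the GEL.

```ruby
# Toy model of the flow described above. GEL = global event log,
# append-only; a domain consumer polls it from its last offset.
GEL = []

def emit(event)
  GEL << event
  event
end

# A hypothetical "rides" domain that listens for payment events and
# maintains a projection of which rides are paid for.
class RidesDomain
  attr_reader :projection

  def initialize
    @offset = 0
    @projection = {} # ride_id => :paid
  end

  # Read any GEL events appended since our last poll and handle them.
  def poll
    upto = GEL.length
    GEL[@offset...upto].each { |event| handle(event) }
    @offset = upto
  end

  private

  def handle(event)
    return unless event[:type] == "payment.captured"

    # The resulting command alters the projection and produces an
    # event back to the GEL, exactly as described above.
    @projection[event[:ride_id]] = :paid
    emit(type: "ride.marked_paid", ride_id: event[:ride_id])
  end
end

domain = RidesDomain.new
emit(type: "payment.captured", ride_id: 1)
domain.poll
```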

It's not really rocket science; these ideas are as old as the oldest mainframe
you've ever heard of, but they afford incredible flexibility and unrivalled
performance.

We're using Kinesis for our GEL. It has a read performance issue, but not a
big enough one yet for us to look at alternatives - the zero-admin cost is
attractive to us right now.

For me, now, MVC- and CRUD-style applications look like really odd
anti-patterns. I'm amazed we got as far as we have with them.

~~~
alexatkeplar
Paul, are you based in London? I've been thinking about dusting off my Unified
Log London meetup ([https://www.meetup.com/unified-log-london/](https://www.meetup.com/unified-log-london/))
and it sounds like you would have a ton to talk about there. Email address is
in my profile!

~~~
galeaspablo
I'll buy you both a drink if you make this happen.

------
joelverhagen
We didn't use Redis Streams, but NuGet.org is built on this concept. NuGet.org
is the public repository for NuGet, the .NET package manager.

We have a public unified log (JSON blobs, behind a CDN) containing all events
about package publishes, edits, and deletes. The rest of our endpoints have
background jobs reading this log and updating different "views" of the package
data. Each event's unique ID is a timestamp of when it was added to the log,
much like Redis Streams.

We have found this is a powerful concept which has helped us build a very
reliable infrastructure where almost all public endpoints are static JSON
blobs served by a CDN. The only compute we need for customer API calls is a
search service.
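The consumption side of a log like this is essentially a cursor: each view remembers the timestamp of the last event it applied and only processes newer ones. A minimal in-memory sketch (field names are illustrative, loosely modeled on a timestamped catalog; this is not the actual NuGet client code):

```ruby
require "time"

# Toy version of reading a timestamped, append-only catalog. Each
# consumer keeps a cursor (timestamp of the last event processed) and
# on every pass handles only events newer than it.
CATALOG = [
  { "commitTimeStamp" => "2018-01-01T00:00:00Z", "type" => "publish", "id" => "Newtonsoft.Json" },
  { "commitTimeStamp" => "2018-01-02T00:00:00Z", "type" => "delete",  "id" => "BadPackage" },
]

class View
  attr_reader :cursor, :packages

  def initialize
    @cursor = Time.at(0).utc
    @packages = {}
  end

  # Apply all catalog events newer than our cursor, advancing it as
  # we go; re-running this is a no-op once we're caught up.
  def catch_up(catalog)
    catalog.select { |e| Time.parse(e["commitTimeStamp"]) > @cursor }.each do |e|
      case e["type"]
      when "publish" then @packages[e["id"]] = :listed
      when "delete"  then @packages.delete(e["id"])
      end
      @cursor = Time.parse(e["commitTimeStamp"])
    end
  end
end

view = View.new
view.catch_up(CATALOG)
```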

We hope one day that our unified log (called the "catalog") will be used for
custom client needs and for package replication (e.g. corporations behind a
firewall that can't talk to NuGet.org directly).

API docs: [https://docs.microsoft.com/en-us/nuget/api/catalog-resource](https://docs.microsoft.com/en-us/nuget/api/catalog-resource)
Catalog root: [https://api.nuget.org/v3/catalog0/index.json](https://api.nuget.org/v3/catalog0/index.json)

------
simonw
Having worked with both Kafka and Redis, this certainly rings true for me:

"Redis streams aren’t exciting for their innovativeness, but rather that they
bring building a unified log architecture within reach of a small and/or
inexpensive app. Kafka is infamously difficult to configure and get running,
and is expensive to operate once you do. [...] Redis on the other hand is
probably already in your stack."

~~~
pram
My current job is maintaining Kafka infrastructure, dozens of clusters. I
literally don’t understand how it got so popular. It is by far the least
reliable software I have ever experienced. We hit so many bugs and weird edge
cases that essentially cripple the entire cluster and stop all producing.
Zookeeper in addition has been a constant source of headaches.

------
bruth
Great read. I agree that Kafka is overkill, if it's even necessary, for most
applications. I have been using NATS Streaming [1], which has the same
properties as Redis Streams, for some time and it is wonderful.

[https://github.com/nats-io/nats-streaming-server#nats-streaming-server](https://github.com/nats-io/nats-streaming-server#nats-streaming-server)

------
markbnj
Can anyone who has done it both ways characterize the tradeoffs vs. using AMQP
(say, RabbitMQ) as the bus? I've been excited about this feature in Redis, and
this is a very interesting use case for it, but when I read the part toward
the end regarding consumer checkpointing to prevent premature log truncation I
was thinking message queuing acknowledgement semantics basically deal with
that for you. I'm sure it's more costly, but if you need to know whether
something's been consumed then you need to know.

~~~
simonw
The point of this kind of log architecture is that producers never need to
know if a consumer has consumed something, because the log is designed to
allow as many consumers as you like to reliably work through the log from any
given position.

As a producer, you just need to know that your message has definitely been
delivered to the log. What consumes it is none of your business.

Have you read [https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying](https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying)?
It's one of my favourite essays on software architecture of all time, and it
goes deep into this concept.

~~~
markbnj
> The point of this kind of log architecture is that producers never need to
> know if a consumer has consumed something, because the log is designed to
> allow as many consumers as you like to reliably work through the log from
> any given position.

Right, I think we're saying the same thing. In the article, checkpointing was
introduced to enable safely managing the size of the log, i.e. it appears that
in this use case _somebody_ needs to know whether the consumers are caught up
in order to adhere to at-least-once semantics. It just seemed to me that if
you reach the point where you're rolling your own to solve that problem there
are other possible solutions.

I haven't read the LinkedIn piece yet but I intend to. It was also linked in
the article. Thanks for linking it here.
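The checkpointing being discussed fits in a few lines: each consumer records the last ID it has processed, and the log is only ever trimmed up to the smallest checkpoint, so a slow consumer never has records deleted out from under it. A toy in-memory model (not the article's actual code):

```ruby
# Toy model of consumer checkpoints guarding log truncation.
class CheckpointedLog
  def initialize
    @records = {}      # id => payload
    @next_id = 1
    @checkpoints = {}  # consumer name => last processed id
  end

  def append(payload)
    id = @next_id
    @records[id] = payload
    @next_id += 1
    id
  end

  # A consumer reads everything after its checkpoint.
  def read_from(consumer)
    after = @checkpoints.fetch(consumer, 0)
    @records.select { |id, _| id > after }
  end

  def checkpoint(consumer, id)
    @checkpoints[consumer] = id
  end

  # Only records every registered consumer has acknowledged may be
  # deleted -- this is what prevents premature truncation.
  def trim!
    safe = @checkpoints.values.min || 0
    @records.delete_if { |id, _| id <= safe }
  end

  def size
    @records.size
  end
end

log = CheckpointedLog.new
3.times { |i| log.append("record #{i}") }
log.checkpoint("fast_consumer", 3)
log.checkpoint("slow_consumer", 1)
log.trim! # only record 1 is behind every checkpoint
```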

------
linkmotif
I'm finishing up an application that's built around a Kafka Streams topology.
Kafka Streams is quite nice and I am excited to put it into production to see
how it works.

All mutations run through the Kafka Streams application. A Relay webapp client
communicates with a graphql-java server which resolves GraphQL queries against
a gRPC service layer. This service layer either queries Kafka Streams state
stores, other state stores, or writes mutations to Kafka, wherein they are
processed by the Kafka Streams topology. Another Kafka Streams topology
indexes topics to Elasticsearch. Having separate topologies seems to make it
easier to reset them independently in case I want to replay state. All of this
is glued together with Protocol Buffers, which are a great complement to both
Kafka and Elasticsearch.

It is really nice to see Redis join this scene and to see more people getting
excited about building applications this way. Looking forward to checking this
out.

------
swah
I didn't understand why he couldn't keep staged and checkpoints in Redis as
well...

~~~
hackcrafter
I was thinking that too, then I realized he was wrapping the code in
`DB.transaction` blocks, which roll back on any type of exception, so all
those staged/checkpoint entries only persist if the block executes cleanly.

Pretty neat...
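The trick can be shown with a toy stand-in for `DB.transaction` (this illustrates the rollback semantics only; it is not the article's actual Sequel code, and the table names are made up):

```ruby
# Toy illustration: the domain write and its staged log record go
# through one transaction, so an exception rolls back both and no
# event is ever staged for work that didn't commit.
class ToyDB
  attr_reader :rides, :staged_log_records

  def initialize
    @rides = []
    @staged_log_records = []
  end

  # Snapshot both "tables", run the block, and restore the snapshot
  # on any exception -- the all-or-nothing property a real database
  # transaction provides.
  def transaction
    snapshot = [@rides.dup, @staged_log_records.dup]
    yield
  rescue => e
    @rides, @staged_log_records = snapshot
    raise e
  end
end

db = ToyDB.new

# Happy path: the ride and its staged event commit together.
db.transaction do
  db.rides << { id: 1 }
  db.staged_log_records << { event: "ride.created", ride_id: 1 }
end

# Failure path: the exception rolls back the staged record too.
begin
  db.transaction do
    db.rides << { id: 2 }
    db.staged_log_records << { event: "ride.created", ride_id: 2 }
    raise "charge failed"
  end
rescue RuntimeError
end
```

A separate streamer process can then safely move staged records into the Redis stream, knowing every one of them corresponds to a committed transaction.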

------
kbenson
People are coming out with their own cases where they've implemented this
pattern, and I'm sure there are a lot more that haven't bothered to note their
cases. That's because it's actually the simplest (and possibly most obvious)
solution whenever you have disjoint producers and consumers. I use it myself
through the disk. I have a time based directory structure, and within that
compressed blobs of data that are timestamped and have whatever other uniquely
identifying information is required.[1] In fact, I have no less than seven
distinct sets of data like this, some JSON, some HTML from scraped pages, etc.

One set of processes retrieves and stores the data in the normalized
structure, another set of processes reads and processes them. If there's ever
a problem in the processing, the system can fail until someone looks into the
error and adds an exception or fixes the processing. This is useful for when
you can't be sure the remote side will continue to work as expected (I've
found APIs to be only slightly more stable than random web scraping. Sure,
they may version their APIs, but that doesn't help when they don't communicate
new versions, or when they are deprecating the old ones and you're hit with a
dead API endpoint out of the blue).

Not only is the processing asynchronous, but the development is so
asynchronous, I've added data sources that I knew I would eventually want to
process (as adding a data source is fairly easy), only to get around to
writing the processor six months later, and the nature of the system means
that it's trivial to process the old data. In fact, the _normal_ operation
would likely just note that it's that far out of date and process the old data. The
only consideration that needs to be made is whether you want to optimize the
back-processing by aggressively caching, since that can greatly speed up the
process.

I really think the pattern is the natural conclusion most people would come
to when presented with the right requirements. The real art comes from
recognizing when your current problem can map to a pattern such as this when
it doesn't initially appear to.

1: e.g. "/$base/$logname/%Y-%m-%d/%H/%Y%m%d%H%M%S_${unique}.json.gz"
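For the curious, the layout in [1] takes only a few lines of Ruby to implement (paths and names illustrative):

```ruby
require "fileutils"
require "zlib"

# Write a blob into a time-bucketed directory tree, mirroring the
# "/$base/$logname/%Y-%m-%d/%H/..." layout in [1].
def store_blob(base, logname, unique, data, at: Time.now.utc)
  dir = File.join(base, logname, at.strftime("%Y-%m-%d"), at.strftime("%H"))
  FileUtils.mkdir_p(dir)
  path = File.join(dir, "#{at.strftime('%Y%m%d%H%M%S')}_#{unique}.json.gz")
  Zlib::GzipWriter.open(path) { |gz| gz.write(data) }
  path
end

# A consumer just globs the tree in order; file names sort
# chronologically, so replaying six-month-old data is simply a
# directory walk from further back.
def each_blob(base, logname)
  Dir.glob(File.join(base, logname, "*", "*", "*.json.gz")).sort.each do |path|
    yield path, Zlib::GzipReader.open(path, &:read)
  end
end
```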

------
Gepsens
Anyone who has worked with Redis at scale knows exactly how the persistent
queue approach of Kafka is far superior for any robust system that relies on
the queue for truth and scales out (aka not your tiny CRUD app).

And Kafka is hard to deploy? Please... If you deploy a Redis cluster without
knowing the pitfalls of distributed systems, you'll have the same problems as
with Kafka. Also, soon enough you'll be able to deploy brokers over JBODs
without ZK; what will be the argument then, I wonder...

~~~
simonw
That's covered in the article:

"Pricing for a small Kafka cluster on Heroku costs $100 a month and climbs
steeply from there. It’s tempting to think you can do it more cheaply yourself,
but after factoring in server and personnel costs along with the time it takes
to build working expertise in the system, it’ll cost more."

If your project can afford Kafka, use Kafka. This article is about achieving
the same pattern in any project that already has Redis.

~~~
orclev
I think Gepsens' argument was that you're not achieving the same pattern,
rather you're achieving a subset of the pattern where several important
considerations in dealing with a distributed system have been ignored and/or
are unsupported. Once you do add those features back in, you basically arrive
at Kafka and all the complexity involved with standing it up, so Redis isn't
really an apples-to-apples comparison. While it certainly might work for
certain use cases (and indeed appears to do so), many of the use cases that
Kafka is necessary for would be just as complicated to set up Redis to
support.

I think this is largely a rephrasing of one of the findings of dealing with
NoSQL vs. Relational DBs, which is that for a subset of problems relational
DBs have been used to solve in the past, NoSQLs are perfectly capable of
replacing them, but not every problem a relational DB handles can be handled
by a NoSQL without investing significant time and effort layering more complex
systems on top, at which point you've basically implemented a poorly optimized
ad-hoc relational DB. In this case, Redis, with the new stream type is capable
of handling a subset of problems that have previously required Kafka to solve,
but it isn't itself a replacement for Kafka in all situations since there are
several important features of Kafka that aren't available in Redis.

------
tmd83
The system I work on as of now isn't really fit for a unified log design. But
there are definitely some critical parts of the system that could use a high
performance centralized event log and could be made simpler and better
(faster/more reliable). The ability to decouple producer and consumer, fanning
in/out on both sides, batching - all are very attractive properties. But I
live in a transactional world, and even though in quite a few cases I can make
the consumer idempotent, the publishing of an event becomes tricky. If you are
not using the log as the only store but in addition to a traditional
transactional data store (a database, for example), how can you ensure that an
event is always, and only, published for a successful transaction?

The staged log record idea is interesting, but it makes the database
bottleneck bigger, especially if a lot of different transactions (spread
across many tables) are now bottlenecked on the same staged log table.

Maybe I'm missing something here?

~~~
bjt
I don't see the single stagedlog table being a bottleneck if you're just doing
inserts to it. Postgres is already doing appends to a single WAL internally.

~~~
tmd83
I have seen it happen in such contexts, though (in Oracle). I assume the
difference is that the database has much more freedom in how it writes to the
WAL (batching and such), but in the table itself it has to do some degree of
coordination and locking.

The other big issue is that even if the insert into the shared table is fast
(simpler structure) the transaction hence the commit can be long due to the
complexity of the application transaction.

I will be happy to be corrected though if we are doing something wrong.

~~~
bjt
By my (admittedly limited) understanding of MVCC, one long running transaction
that includes an insert will not block other inserts to the same table. The
worst you'd see in that situation is if you were using a sequence for IDs, an
ID might be grabbed from the sequence at the beginning of transaction A, then
another ID grabbed and inserted into the table by transaction B, and then when
transaction A finally commits, the order of the IDs won't match the
chronological order of the commits.

------
netcraft
Is Kafka expensive because of the hardware requirements, the number of servers
needed or just because its relatively niche? Or something else?

~~~
tomnipotent
Common complaint is the reliance on ZooKeeper, which requires three servers to
get started.

~~~
WaxProlix
Yeah and it's not obvious that there's any way to get Kafka up and running
without Zookeeper, or what their interaction is. And you can get Zookeeper and
Kafka running on one machine, but it's such a pain and there's nothing to help
really in the docs.

~~~
poooogles
>And you can get Zookeeper and Kafka running on one machine, but it's such a
pain and there's nothing to help really in the docs.

What sort of thing are you after? Like a "getting started for production
without creating footguns" style how-to?

~~~
WaxProlix
More like getting started in a dev environment; I want to play with a
technology on a laptop or my home compute machine sometimes before venturing
into production or recommending it to clients.

~~~
miguno
Also, since you talked above about the "lack of documentation" to get up and
running with a small Kafka deployment of only 1 broker and 1 ZK instance: this
is covered in the Kafka quickstart, which is probably what most people will
read when starting out with Kafka. See
[https://kafka.apache.org/documentation/#quickstart](https://kafka.apache.org/documentation/#quickstart).

Perhaps you had a different issue though?

------
rakoo
I feel kinda sad for CouchDB, which has been providing the same feature for
such a long time yet is still perceived as an experimental db...

------
oplav
> It’s possible for a record with a smaller id to be present after one with a
> higher id, but only in the case of a double-send.

Doesn't this constraint only hold true if the XADDs to redis are completed
synchronously?

~~~
simonw
They are synchronous - "XADD returns the ID of the just inserted entry" so it
won't return until the entry has been inserted.
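The ordering guarantee falls out of that serialization. A toy simulation of Redis's `<ms>-<seq>` stream ID scheme (not the real implementation) shows why auto-generated IDs are strictly increasing even if the clock misbehaves:

```ruby
# Simulation of how Redis allocates stream IDs ("<ms>-<seq>"): XADD
# runs serially on the server, so each auto-generated ID is strictly
# greater than the last, and the ID is only returned once the entry
# is stored.
class ToyStream
  Entry = Struct.new(:id, :fields)

  def initialize
    @entries = []
    @last_ms = 0
    @last_seq = 0
  end

  # Rough equivalent of XADD with id "*".
  def xadd(fields, now_ms: (Time.now.to_f * 1000).to_i)
    if now_ms > @last_ms
      @last_ms, @last_seq = now_ms, 0
    else
      @last_seq += 1 # same millisecond or clock went backwards: bump seq
    end
    id = "#{@last_ms}-#{@last_seq}"
    @entries << Entry.new(id, fields)
    id # like Redis, return the ID of the just-inserted entry
  end

  def ids
    @entries.map(&:id)
  end
end
```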

------
ngrilly
This article is excellent, like all @brandur writings.

------
frik
RE example "rocket-rides"

Can someone translate the example "rocket-rides-unified" code so that non-Ruby
coders can read it too? (preferably to a C-like syntax language like Node.js,
PHP, Go or Java)

[https://github.com/brandur/rocket-rides-unified](https://github.com/brandur/rocket-rides-unified)

