
Streams: a new general purpose data structure in Redis - darwhy
http://antirez.com/news/114
======
jrochkind1
One thing I like about this post is the story of how the feature came to be:
Someone who understood redis very well, thinking about the problem over
literally years, eventually resulting in a more targeted and goal-driven
thinking, and even that "specification remained just a specification for
months, at the point that after some time I rewrote it almost from scratch in
order to upgrade it with many hints that I accumulated talking with people
about this upcoming addition to Redis."

I've been thinking about this sort of thing for a while, wanting to maybe call
it "slow code" (like "slow food"). This is how actual quality software that
will stand the test of time gets designed and made, _slowly_, _carefully_,
_intentionally_, with thought and discussion and feedback and reconsideration.
And always based on understanding the domain and the existing software you are
building upon. Not jumping from problem to a PR to a merge. (And _usually_ by
one person, sometimes a couple/several working together, _rarely_ by
committee).

~~~
threeseed
> This is how actual quality software that will stand the test of time gets
> designed and made ..

No not really. It's called waterfall and we had it for decades.

And it was largely abandoned in favour of agile development because despite
architects spending months designing something there was always something that
was left out. Or the scope changed during those months. Or millions of other
things that change whilst you're hidden away from your customers/end users
instead of delivering them new value every week.

For me personally the truth is somewhere between agile and waterfall. Some
upfront design but not the slow code you refer to.

~~~
jrochkind1
I disagree, I think "waterfall" (at least in stereotype) would be more like
coming up with the specifications in advance and then _sticking to them_
regardless of new information from the world or from the experience of
developing the software. That's not the story the OP told.

I also agree with the person who responded that there is a difference between
building products, and building tools for building products -- building shared
libraries/code/tools may require different approaches. One of the things I've
noticed about shared code, is it's _much harder_ to fix rethought decisions or
change paths. "Backwards incompatibility" doesn't even exist as a thing in
your local app, really; you can always at least theoretically global
search and replace when you change APIs.

I don't think we've figured out any magic bullet process for developing
software. It's definitely not the stereotype of "waterfall", but I don't think
the stereotype of "agile" is it either -- at _least_ when it comes to shared
library/tools. And I'm absolutely convinced that those shared tools which do
well were almost always designed carefully and intentionally with
understanding of the domain, not just slapped together with stimulus-response.

------
femto113
I'm confused about the ID structure/format:

    
    
       The ID is composed of two parts: a millisecond time and a
       sequence number.  The number after the dot is the
       sequence number, and is used in order to distinguish
       entries added in the same millisecond. 
    

Does this mean for example that 1506872463535.11 comes after 1506872463535.2
(because 11 > 2)? If so that means treating these as decimals (which will be
easy to do inadvertently) will yield the wrong order (as would sorting them
lexicographically). If so it seems like something other than a decimal point
would be a better separator (colon perhaps).
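The pitfall can be sketched in a few lines of plain Python (the IDs below are the hypothetical ones from the question): only splitting on the separator and comparing the two parts as integers gives the stream order; both float and lexicographic comparison invert it.

```python
# Hypothetical stream IDs sharing the same millisecond part.
ids = ["1506872463535.2", "1506872463535.11"]

# Correct stream order: compare (millisecond, sequence) as integers.
def id_key(stream_id):
    ms, seq = stream_id.split(".")
    return (int(ms), int(seq))

correct = sorted(ids, key=id_key)

# Treating the IDs as decimals collapses ".11" and ".2" into fractions,
# so .11 sorts before .2 even though sequence 11 comes after sequence 2.
as_floats = sorted(ids, key=float)

# Plain lexicographic sort is wrong too: "11" < "2" as strings.
as_strings = sorted(ids)

print(correct)     # ['1506872463535.2', '1506872463535.11']
print(as_floats)   # ['1506872463535.11', '1506872463535.2']
print(as_strings)  # ['1506872463535.11', '1506872463535.2']
```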

~~~
antirez
Yes, actually maybe it's a good idea to replace the point with something else.
Thanks for the hint.

~~~
ryanschneider
Supporting `MAXSIZE` as well as `MAXLEN` on `XADD` would also handle a nice
Kafka feature (the ability to define your log size in either number of
messages or size on disk).

Something like: `XADD MAXSIZE ~ 2147484000 * foo bar` to cap the stream at 2GB
+ 1 node.

~~~
dvirsky
And if ids are timestamps, maybe we can define it as MAXTIME as well.

------
wyc
Projects tend to gain more and more functionality to match the new workloads
they're being used to accomplish, and it must certainly be a difficult
decision for project visionaries. Do I listen to my users and implement
features that will solve their new woes, but in return accept increased
complexity and higher learning barriers? Complexity sucks, but it's even
harder to say no to users in pain.

I wonder if this opens such projects up to disruption, in the original sense
of the word. I've seen the same teams forgo apache for nginx, then forgo
nginx for haproxy when it matches their needs. With additional layers of
complexity, there may be accompanying opportunities for "simple but good"
projects to gain traction.

~~~
barrkel
It's also why software is a pop culture. Sophistication and completeness are
seen as complexity and cruft by each successive generation, who start
something new and simple.

I don't think it's very avoidable. Tech is genuinely getting incrementally
better, but it's usually in a sawtooth pattern.

~~~
freshhawk
I'll agree that sophistication and completeness are often seen as
complexity/cruft, but they _always_ come with actual cruft as well, since
improvement is incremental and breaking APIs is annoying.

My favorite aspect of this cycle is when some features in the complex software
become seen as so useful as to be required and standard, so when the new
simpler version is created they have to figure out a novel way of providing
that useful functionality in a simple and elegant way. And they do it,
sometimes knowing they have made a significant advance and sometimes without
knowing.

~~~
scott_karana
Do you have any examples of that second case?

Sounds too good to be true... but I'd _love_ to be proven wrong :)

------
ralusek
I love redis, and this API looks amazingly simple. I'm sure I can think up a
good use case for this, but the only problem I have with it is that
time-series log data of this nature is increasingly becoming the de facto
source of truth in the various models resembling some version or another of
event-sourcing.

Obviously the general thinking is that event sourced time series data allows
you to treat every other data source as derived state from the log data. If it
gets written to the log, it's safe and that's the only real data that cannot
be considered a redundant read layer. A common structure might look like:

1.) Event Sourced/Time Series Layer: Kafka/Kinesis>S3 takes in and saves log
data

2.) Operational State Layer: RDBMS Constraints / Application Logic determine
how operational state is derived from log data

3.) Indexing Layer: Query optimizations occur with redundant read layers in
what is essentially all just various forms of indexing. This can be RDBMS
index, ElasticSearch, MapReduce, Redis, etc.

Redis' place has historically been at #3, for many reasons. Whether an
application has an event-sourced layer in which their operational state is
derived from log data or is actually considered the primary source is
something that is hugely variable. I would say that most applications do not
make a distinction between #1 and #2, and just write state directly to an
RDBMS. But while the operational state may or may not be considered redundant,
depending on the application, the indexing layer is almost guaranteed to be a
redundant layer. The redundancy of the index layer means that Redis operating
purely in-memory allows the whole thing to be blown away with no consequence.
To move it two steps down the data model to the de facto source of truth is a
monumental shift in responsibility. Using Redis as an in-memory caching layer,
I've only ever had a cursory awareness of its capabilities in terms of saving
to disk, but I would think that a fundamentally different use case like this
will have me taking a serious look at where that functionality stands today.

All of this being said, there are plenty of use cases with kafka/kinesis today
which actually don't even save the log data at all, and just use them as an
intermediary buffer to have multiple consumers on an event stream. There's
also nothing stopping us from just having one of the consumers of this stream
save it to S3/disk ourselves.
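The "derived state" idea in layers 1 and 2 can be sketched as a pure fold over the log. Plain Python with a hypothetical event shape; the point is that operational state is always rebuildable from the append-only log.

```python
from functools import reduce

# Hypothetical event log: the append-only source of truth (layer 1).
log = [
    {"type": "deposit", "amount": 100},
    {"type": "withdraw", "amount": 30},
    {"type": "deposit", "amount": 5},
]

# Layer 2: operational state is a pure fold over the log, so it can
# always be rebuilt from scratch if the derived stores are blown away.
def apply(state, event):
    delta = event["amount"] if event["type"] == "deposit" else -event["amount"]
    return {"balance": state["balance"] + delta}

state = reduce(apply, log, {"balance": 0})
print(state)  # {'balance': 75}
```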

~~~
dbattaglia
I would imagine this is best used for the "store a ton of events with some
capped maximum size" Kafka use case (i.e. realtime analytics, IoT data, etc).
I just can't imagine using Redis as your source of truth for an event sourced
system, especially without the partitioning and log compaction features that
Kafka has. Still, this seems like a pretty amazing feature to have in Redis;
can't wait to start playing with it.

------
Sinjo
It's been a long time since I looked into this: is there now a way to
configure a cluster of Redis instances such that you won't lose messages on
node failure? If not, all the nice at-least-once delivery (or "effectively
once" when you add message dedupe) you get with something like
Kafka/Kinesis/GCP PubSub is gone.

If not, either people's messages don't matter /that/ much (which is fine, just
not great for most of my use cases at the moment) or everyone's in for another
round of "oh shit, where did the data go?"

Edit: Just in case we end up in CP vs AP datastore wars, please go read
[https://martin.kleppmann.com/2015/05/11/please-stop-
calling-...](https://martin.kleppmann.com/2015/05/11/please-stop-calling-
databases-cp-or-ap.html)

At-least-once delivery requires neither CAP consistency (linearisability) nor
CAP availability (any non-failed node must return a response in a non-infinite
time), but is a very useful property!

~~~
antirez
Hello, streams have basically the same characteristics as any other Redis
data structure: from the POV of a local node, you can configure strong
persistence on disk, but on node failures you have different tunable amounts
of _best effort_ consistency, meaning you cannot guarantee that no messages
are lost. So basically this means that you can:

1\. Use the default asynchronous replication, and live with the fact (if the
use case permits it) that on failover the message may not yet have reached the
slave that will be promoted.

2\. Use WAIT to force synchronous replication to N slaves. This still will not
ensure, in mathematical terms, that the failover will pick a slave that
received the message under complex partitions, but it narrows the real-world
failure modes leading to lost data to more "unlikely" cases. You still have
just best-effort consistency, but with better real-world outcomes.

So Redis streams will be a good choice if one of the above is acceptable.

~~~
hueving
What is the consistency model of redis?

It sounds like anything can be lost in redis during normal HA operations even
with WAIT pushing to a majority of slaves. Is that right?

~~~
Posibyte
Eventually consistent unless you're only using a single node. I believe that
Redis itself commits to disk at various checkpoints in time, so if a fail
happens, you're really only guaranteed to fail over into a pool of data that's
consistent up to the last checkpoint of the node you're moving to.

EDIT: And as antirez said above, you can use WAIT to force synchronous
replication to N slaves, so you would be pretty likely to fail over onto a
node that has n-1 messages if one didn't sync in time. That still isn't
guaranteed, however.

------
jonny_eh
The clear writing/documentation, concise API design, and clever implementation
of antirez and the rest of the Redis team continues to amaze me. Redis is
easily my most admired OSS project.

------
jkarneges
Very cool! We've been doing time series with Redis using sorted sets,
referencing items by timestamps and integer offsets, and using the clock shift
workaround described in the article. Having this kind of thing consolidated
down into a few Redis commands would be handy. The API looks clean, too.

------
itaifrenkel
Two comments on effectively once stream processing.

1\. Consider adding an example of a stateful event stream processor client
that saves the last-read stream offset in redis together with its current
state, and continues reading from that offset, as an atomic operation. For
example, a client that sums a stream of numbers would need to persist the sum
and the offset to redis together in order to have effectively-once semantics.

2\. Consider adding a stream-read deduplication example to mitigate clients
that reinserted the same event twice. It is not clear how the client should
behave if it didn't get an ack and it resends an event. What are the correct
resending semantics so the reader can effectively dedup? What is the right
data structure for deduping message ids without consuming too much memory,
etc.?

------
bbrunner
I've been using redis + resque[1] for a few side projects and I have to say
I'm glad that streams are getting first class support in redis. I was always a
little wary of hacking this sort of functionality on top of redis lists. It
worked, but it always seemed a little bit fragile.

[1] [https://github.com/resque/resque](https://github.com/resque/resque)

------
luhn
I'm very excited about this. I've been eying HTTP EventSource for a while now,
but there hasn't been a good solution for the backend broker. Kafka is
overkill and Amazon Kinesis' pricing isn't viable if you have lots of topics.
This fills the need perfectly and Redis is already part of my stack.

~~~
baileymiller182
Check out nchan.io, an nginx module that does everything you need to connect
EventSource to redis-backed pub/sub.

~~~
denkmoon
+1, good to see other developers out there using EventSource and nchan. We've
been happily using both in production for almost a year now.

------
tuna
Amazing! Nit: change COUNT to LIMIT while there's still time. Also, can this
primitive be used to replicate redis data instead of sentinel/cluster?

------
hyades
Under what circumstances would one prefer Redis streams over Kafka and vice
versa?

~~~
agacera
I can think of some circumstances:

1 - You already have a Redis infrastructure and don't want to, or don't have
the resources to, deploy a full Kafka infrastructure (3 kafka brokers + 3
zookeeper nodes)

2 - Kafka clients are not available (or are poorly maintained) for every
programming language. Redis has a simpler protocol, so it has more and better
clients available, and even if you use an exotic language, it is easy to write
a client for it (well... easier than for Kafka)

3 - Kafka AFAIK does not have any internal cache implementation, so every read
is served from disk (+ page cache). This means that Redis Streams will
(probably) perform much better for use cases where the consumers need to fetch
data from old offsets.

edit: added reason number 3.

~~~
chicagobuss
re: client support - I dunno, this seems like a pretty comprehensive list to
me? I mean, there's even a rust client:
[https://cwiki.apache.org/confluence/display/KAFKA/Clients](https://cwiki.apache.org/confluence/display/KAFKA/Clients)

~~~
agacera
Confluent only officially supports the Java client (and now Python, Go and
.NET clients as well, which I didn't know about), and it is really recommended
to use a client with the same version as the broker due to protocol
incompatibilities.

Most Kafka client implementations are open-source projects of their own; this
is also true for most redis client implementations, but again: the Kafka
protocol is much more complicated than the Redis one.

I haven't used Kafka with languages other than Java or Scala, so I can't
really say how mature the other clients are.

But my point about how easy it is to implement a client for Redis if needed
still stands. =)

~~~
kasey_junk
They also support a C reference implementation (which is how they get the
others).

------
ricardobeat
Slightly off-topic, but could the blog be adjusted a little for mobile reading?

    
    
        <meta name="viewport" content="width=device-width, initial-scale=1">
        #content { max-width: 800px; } // replaces width: 800px
    

seems to do the job, and also behaves better in narrow desktop browser
windows.

~~~
antirez
Thanks! I'll change it tonight.

------
erulabs
Fantastic news, congrats Salvatore! Cannot _wait_ to replace some hacky Kafka
uses with tried-and-true Redis4! :)

~~~
ykler
In what sense is Kafka (or your use of it) hacky? I have never used Kafka, but
I have always thought of it as being more solidly engineered than Redis but
also more complicated and perhaps tricky to deploy (based on blog posts I
read).

~~~
sidlls
In any context it's used where the demand (by whatever measure you care to
use: bandwidth, throughput, message durability, etc.) doesn't justify it, or
that isn't a good use case for Kafka, for starters. That happens all the time,
because every data and infrastructure engineer in the Bay Area wants to put
Kafka on his resume.

~~~
hueving
But that doesn't explain why Kafka would have any minimum throughput
requirement. Does it have usability issues?

A good tool should be able to be used at any scale.

~~~
erulabs
Kafka has very poor tooling in my experience (a folder full of fairly buggy
bash scripts...), and due to ZooKeeper requires a lot of operational care. For
example, it's extremely easy to destroy a Kafka cluster by bringing a new,
empty ZK server online with newer but incorrect data in its volume. ZK will
happily trash the entire cluster thinking it has new instructions. So network
isolation is key, which, while obvious, is another source of potential
failure.

Kafka also runs on the JVM, which requires a lot of love to scale in my
experience. I do not want my programmers messing around with GC options when
writing to what should (to them) be exposed just like a regular file handle
(except distributed across many systems). I strongly prefer to avoid Java
applications at all costs - in my experience it takes years and years and
years for Java based infrastructure to become relatively stable & reliable
(see ElasticSearch 5.0, or ask anyone who has been oncall for a Tomcat based
application). This is almost certainly personal bias, but it's my bias
regardless.

Redis also has a _massive_ number of tooling / monitoring / ecosystem
advantages, including hosted options, and can run on a single instance without
configuration changes from the developer's perspective.

I also have personal reasons to prefer Salvatore's work over the work of
Confluent.

~~~
takeda
> For example, it's extremely easy to destroy a Kafka cluster by bringing a
> new, empty ZK server online with newer but incorrect data in its volume. ZK
> will happily trash the entire cluster thinking it has new instructions.

How does that happen? I mean, a new, empty ZK server with newer data than the
rest of the cluster?

Also, please note that ZK is not meant to be a database but a coordination
service; its guarantee is that all nodes are always in a consistent state, and
none of its nodes allows any changes if there's no quorum. So if a new node
somehow has more recent data with a higher serial number, it's expected that
the remaining nodes will sync to it.

~~~
erulabs
Exactly right - in my case the situation was another team accidentally
bringing a new ZK node with "bad" but "new" data online. Had there been
network isolation, no issues. Had there been static cluster identifiers, also
no issues. It was a messy environment, and it should have been prevented by
operational diligence, but my point is that redis is "harder to mess up". As
an on-call engineer, I'll always go with simpler, foolproof tools. Another
quibble is how gnarly the client-side driver for Kafka is...

I don't hate Kafka, I just don't like ZK and find redis has better tooling and
a better track record at my shops :)

~~~
takeda
In order to connect a ZK host to the cluster, its IP needs to be included in
the configuration of all the nodes.

It's hard to accidentally add a node to a cluster. A person who can
"accidentally" add a ZK node has enough permissions to do a lot of more
devastating things accidentally.

~~~
erulabs
Yep. All it takes is service discovery and a jr. sysadmin not totally
familiar with ZK.

This is all in service to my point about simplicity and safety.

------
twic
> However a special ID of “$” means: assume I’ve all the elements that there
> are in the stream right now, so give me just starting from the next element
> arriving.

I can already see lazy users just repeatedly reading $, and then dropping
messages when they arrive faster than they read them.

Might it be safer to instead have a command to ask what the latest ID in the
stream is? You'd start off by using that to work out where the streams are,
then construct an XREAD command to read from there. To construct your next
XREAD, it should be easier to update the IDs from the ones you just read,
rather than fetching the latest IDs again. Maybe.

------
philjackson
I've had a break from Antirez' blog for a while - it's fun to go back and see
how good his English has gotten!

------
Gigablah
Surely any discussion about the concept of logs and mention of Kafka should
have a reference to this excellent article by Jay Kreps on LinkedIn:

[https://engineering.linkedin.com/distributed-systems/log-
wha...](https://engineering.linkedin.com/distributed-systems/log-what-every-
software-engineer-should-know-about-real-time-datas-unifying)

~~~
eternalban
It [OP] was a bit cringeworthy, frankly. Obviously the Kafka papers about
'reconsidering the log' and 'unified view' and all that have been out there
for years now.

The Kreps article is quite excellent and well worth the read.

------
poorman
_> A final important thing to note about XRANGE is that, given that we receive
the IDs in the reply, and the immediately successive ID is trivially obtained
just incrementing the sequence part of the ID, it is possible to use XRANGE to
incrementally iterate the whole stream, receiving for every call the specified
number of elements._

Love it. No need for an expensive SCAN command.
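The iteration pattern reads roughly like this in plain Python; `fetch` below is a local stand-in for `XRANGE mystream <start> + COUNT <count>`, not the real command, and the entries are made up:

```python
# Sketch of cursor-free iteration: the next starting ID is obtained by
# bumping the sequence part of the last ID returned.
entries = [("1506871964177.0", "a"), ("1506871964177.1", "b"),
           ("1506872463535.0", "c")]

def next_id(stream_id):
    ms, seq = stream_id.split(".")
    return f"{ms}.{int(seq) + 1}"

def fetch(start, count):
    """Stand-in for: XRANGE mystream <start> + COUNT <count>."""
    key = lambda i: (int(i.split(".")[0]), int(i.split(".")[1]))
    return [e for e in entries if key(e[0]) >= key(start)][:count]

start, out = "0.0", []
while True:
    batch = fetch(start, 2)
    if not batch:
        break
    out.extend(batch)
    start = next_id(batch[-1][0])  # resume just past the last ID seen

print([v for _, v in out])  # ['a', 'b', 'c']
```

No server-side cursor state is needed: the ID itself is the cursor.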

------
itaifrenkel
The consumer groups proposal breaks the FIFO abstraction of a stream by
allowing multiple clients to process a single stream.

Have you considered adding a semantic layer inside streams that allows each
client to consume a substream? In effect the stream becomes multiplexed
substreams.

If substreams make the design too complex... have you considered server side
stream 403 semantics? When a stream is manually deprecated it enters an
immutable state and provides a redirect response with a link to another
stream. This would allow multiplexing and demultiplexing streams without
changing the client implementations too much.

For completeness, I would state the obvious cases where fifo grouping is
needed:

1\. Scaling stateful event processing by splitting streams and adding more
clients (CPU limit)

2\. Scaling cross-region replication by splitting streams and adding more tcp
connections (network limit)

3\. Handling more throughput by splitting a stream into two redis nodes (disk
I/O limit)

~~~
manigandham
Don't use consumer groups and every client will get a complete copy of the
stream. What is broken with that?

~~~
itaifrenkel
The use case I'm referring to is when the client must be sharded to avoid CPU
or network or disk bottlenecks.

~~~
manigandham
Then that's exactly what consumer groups help with, but it sounds like you
want partitioning - which is exactly what Kafka does, with a little more
automation.

Run multiple Redis instances and use a simple hash on whatever key you want
to route messages, and you'll get the throughput you need. It's probably never
going to be part of the core Redis logic, but it should be possible as a
module to do the routing when used in a Redis cluster.

~~~
itaifrenkel
What I'm referring to is changing the partitioning dynamically by splitting
streams, not just redis nodes. Here is one implementation example
[http://docs.aws.amazon.com/streams/latest/dev/kinesis-
using-...](http://docs.aws.amazon.com/streams/latest/dev/kinesis-using-sdk-
java-resharding-split.html)

Doing it without server support is tricky.

------
philsnow
Is there a reason to choose millis as the granularity instead of micros or
nanos? Is it because there's a stronger expectation of machines in a cluster
agreeing on what milli it is "now" vs the other granularities?

I'm kind of thrown by the idea of putting the timestamp / stream-id in the
XADD command, I would have thought the server would assign that, since one of
the strengths of redis's single threaded nature is consistency: what's in
redis is the truth. If you allow clients to specify timestamp, what happens
when ntpd isn't running on some? I probably misread or misunderstood it.

Could you allow specifying `$` as the timestamp to tell the server you want it
to use whatever it thinks the current time is as the timestamp / stream-id?

~~~
antirez
Hello, the stream implementation does not require the different servers (for
instance a master and its slaves) to agree about the time. Simply, the server
that receives the XADD command will generate the ID (and the time part of the
ID) to attach to the item. All the other participants in the replication will
accept the same ID, because clients use `*` to specify the ID, while the
command is rewritten to slaves with a specific ID. For example, I run this on
the master:

    
    
        127.0.0.1:6379> xadd stream * a 1 b 2
        1506977609865.0
    

But this is replicated as (output of redis-cli --slave):

    
    
        "xadd","stream","1506977609865.0","a","1","b","2"
    

So XADD allows specifying an ID just for replication / AOF purposes, not
because clients should normally specify an ID. However, if clients really want
to do that, they can, but at the risk of getting errors, for instance:

    
    
        127.0.0.1:6379> xadd stream 10.0 a 1 b 2
        (error) ERR The ID specified in XADD is smaller than the target stream top item
    

Redis will anyway not accept any ID which is smaller than the current top-item
ID.

The reason milliseconds were chosen instead of nanoseconds is that, for most
applications, querying for sub-millisecond ranges is likely not useful, so
seeing even larger numbers in the ID would just be unpleasant without being
useful; however, we are still in time to change this if there are good
motivations. But since the time is produced by the local host, after a
failover the IDs are generated by another host. Milliseconds can still more or
less match with good time synchronization, but nanoseconds? So that additional
precision would just be used to store non-valid info.
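A sketch (not the actual Redis source) of how the receiving server can keep `<ms>.<seq>` IDs monotonically increasing even if its wall clock jumps backwards:

```python
import time

# Last generated ID as [milliseconds, sequence].
last = [0, 0]

def next_stream_id(now_ms=None):
    ms = int(time.time() * 1000) if now_ms is None else now_ms
    if ms > last[0]:
        last[0], last[1] = ms, 0   # new millisecond: reset the sequence
    else:
        last[1] += 1               # same ms (or clock went back): bump seq
    return f"{last[0]}.{last[1]}"

print(next_stream_id(1506977609865))  # 1506977609865.0
print(next_stream_id(1506977609865))  # 1506977609865.1
print(next_stream_id(1506977609800))  # clock jumped back: 1506977609865.2
```

The same rule explains the error shown above: an explicitly supplied ID smaller than the current top item would break this monotonicity, so it is rejected.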

~~~
philsnow
> Redis will anyway not accept any ID which is smaller than the current top-
> item ID.

I was reading on mobile earlier and maybe missed this point; excellent. I also
didn't realize that clients would use `*` to specify the ID and that the
receiving server turns that into an actual ID before replicating it / AOFing
it.

> But being the time the one produced by the local host, after a failover the
> IDs are generated by another host. Milliseconds can still more or less match
> with good time synchronization, but nanoseconds? So it's like if this
> additional precision will be just used to store non-valid info.

I think most failovers necessarily take longer than a millisecond so any
resolution smaller than millis would _probably_ be okay, but yeah this is not
a compelling reason to switch to micros/nanos. My suggestion to switch to
micros/nanos was more to try to reduce the number of collisions requiring the
server to de-dup / assign sequential sub-epoch numbers to events arriving
during the same server tick. I guess that's not a big issue though.

Thanks for the reply, Salvatore. Redis is one of my favorite codebases and
projects.

------
vtuulos
Redis Streams could be a nice real-time counterpart to
[http://traildb.io](http://traildb.io): For instance, use Streams in Redis to
record data in real-time and periodically store it in TrailDBs for long-term
archival and analysis.

------
baystep
This may actually solve an issue I was about to tackle, which is high-speed
notification delivery. Currently I was going to do a nasty wrestle of PUB/SUB
with Lists and blocking keys to try and get a cluster of message processing
servers to digest notifications when they get added to the "queue". My purpose
in this is notifications, as in actual push notifications for mobile/web/etc.
This seems like it fits perfectly with what I had planned. Especially since
this ensures message delivery instead of fire-and-forget as you mentioned. As
a question though... is the XACK command working in your current branch?
Since that is key to my usage of ensuring message consumption.

~~~
baystep
Essentially, I was going down this route if anyone's curious....

[http://code.flickr.net/2012/12/12/highly-available-real-
time...](http://code.flickr.net/2012/12/12/highly-available-real-time-
notifications/)

------
ricardobeat
Could the difference between `MAXLEN ~ 1000000` and `MAXLEN 1000000` be
handled internally, by marking the overflowing items as deleted until a whole
block can be removed? It looks like this tombstone functionality is already
planned; it would make the API simpler.

~~~
philsnow
I would suggest making the efficient behavior the default and let people use
`= 1000000` when they know they really want the expensive but exact behavior.

------
plasma
Are there plans for a "unique count" over an XRANGE?

I currently use multiple Sorted Sets (one set every 5 minutes) and union
30-60 of them to produce a "rolling window" of uniques.

I can see an alternative where I just request a unique count of elements
within an XRANGE.
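The current sorted-set approach can be sketched like this in plain Python; the bucket layout, names, and window size are illustrative, with a dict of sets standing in for per-bucket Redis sets:

```python
from collections import defaultdict

BUCKET_MS = 5 * 60 * 1000           # one bucket per 5 minutes
buckets = defaultdict(set)          # stand-in for per-bucket Redis sets

def record(ts_ms, user):
    buckets[ts_ms // BUCKET_MS].add(user)

def uniques(now_ms, window_buckets):
    """Union the most recent buckets to get a rolling-window unique count."""
    current = now_ms // BUCKET_MS
    seen = set().union(*(buckets[b]
                         for b in range(current - window_buckets + 1,
                                        current + 1)))
    return len(seen)

record(0, "alice"); record(1, "bob"); record(BUCKET_MS, "alice")
print(uniques(BUCKET_MS, 2))  # alice and bob across both buckets -> 2
```

A stream-native `XRANGE` unique count would replace the bucket bookkeeping with a single time-range query.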

------
Goopplesoft
What sort of compression do the blocks undergo? E.g. does periodicity of the
timeseries help reduce the space of the 64/128-bit timestamp? Gorilla[1]-style
compression would be great, although it'd likely make sub-block-level range
queries tough.

[1][http://www.vldb.org/pvldb/vol8/p1816-teller.pdf](http://www.vldb.org/pvldb/vol8/p1816-teller.pdf)

~~~
antirez
Hello, yes, IDs are delta-compressed, so they actually use just a few bytes
per entry (often just 2) instead of 16.
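A toy illustration of why delta-encoding IDs is so compact (plain Python, not the actual Redis block encoding): consecutive entries usually share the same millisecond, so each ID can be stored as a tiny delta from its predecessor instead of a full 128-bit value.

```python
# IDs as (milliseconds, sequence) pairs; values are made up.
ids = [(1506977609865, 0), (1506977609865, 1), (1506977609866, 0)]

def delta_encode(ids):
    out, prev = [], (0, 0)
    for ms, seq in ids:
        out.append((ms - prev[0], seq - prev[1]))  # store only the change
        prev = (ms, seq)
    return out

def delta_decode(deltas):
    out, ms, seq = [], 0, 0
    for dms, dseq in deltas:
        ms, seq = ms + dms, seq + dseq             # cumulative sum restores IDs
        out.append((ms, seq))
    return out

encoded = delta_encode(ids)
print(encoded)  # [(1506977609865, 0), (0, 1), (1, -1)]
assert delta_decode(encoded) == ids
```

After the first full ID, every subsequent entry needs only a few bits, which is where the "often just 2 bytes" figure comes from.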

------
erulabs
I have one follow up question - is TTL a planned feature? Being able to set a
TTL on the _stream itself_ and -also- on the messages would be extremely nice.
While MAXLEN prevents a queue from being extremely large, I also want to
remove "stale" data after a configurable time period.

Use case: a log of network latencies, where a user who might currently `XREAD`
with a timestamp 10 minutes in the past would be able to save on memory by
expiring log entries older than 10 minutes, and then `XREAD STREAMS strm 0`,
letting Redis (and therefore the infrastructure, not my code) manage data
retention.

Also, how does this work re: evictions? Say a node is at max memory: will
entire _streams_ be evicted, or (I hope) the oldest messages in the LRUed or
LFUed queues?

------
orixilus
There's a video[1] at the end explaining this new feature.

[1]
[https://www.youtube.com/watch?v=ELDzy9lCFHQ](https://www.youtube.com/watch?v=ELDzy9lCFHQ)

------
yawniek
here is an approach i successfully used before: since timestamps are read from
the clock, the epoch could be a persisted value, and a few nibbles could be
used for the instance id and a sequential number. with that you get 64-bit
numbers for easy use and computation without losing the time information, at
the cost of a simple transformation function. it simplifies the interface a
lot and generally makes things faster.

many clients won't even need that timestamp anyway.
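The packing scheme described could look like this (plain Python; the 42/6/16-bit field widths and the epoch value are illustrative choices, not something from the post):

```python
# Pack (millisecond offset from a persisted epoch, instance id, sequence)
# into one 64-bit integer: 42 bits of time, 6 bits of instance, 16 of seq.
EPOCH_MS = 1_500_000_000_000  # hypothetical persisted epoch

def pack(ms, instance, seq):
    return ((ms - EPOCH_MS) << 22) | (instance << 16) | seq

def unpack(packed):
    return ((packed >> 22) + EPOCH_MS,   # restore absolute milliseconds
            (packed >> 16) & 0x3F,       # 6-bit instance id
            packed & 0xFFFF)             # 16-bit sequence number

pid = pack(1506977609865, instance=3, seq=7)
assert pid < 2**64
assert unpack(pid) == (1506977609865, 3, 7)
```

42 bits of millisecond offset covers roughly 139 years from the chosen epoch, so the transformation is lossless for any realistic deployment lifetime.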

------
vikiomega9
Why not just use sequence ID? I'm confused about why a timestamp is important.
The sequence ID gives us ordering, is always guaranteed to be increasing.

~~~
antirez
Because with the way stream IDs are conceived you also get time-based range
queries for free. With time series this is very important in many use cases.

~~~
vikiomega9
I see, so this composite structure is in lieu of having two distinct fields
exposed in the API?

------
grandalf
This looks very cool. Can't wait to try it out.

------
notamy
After reading this I'm still not sure I understand. Time-series data I can
see, but is there a use-case outside of this?

~~~
noncoml
Poor-man’s Kafka?

------
richardjennings
I am looking forward to trying this out as an easy-access cqrs / event
sourcing entry point.

------
silverwind
The use of `+` and `-` in XRANGE seems inconsistent. Why not use `0` and `-1`
like LRANGE?

~~~
antirez
Because they do not mean a position, but a special ID.

~~~
lsiebert
It seems that "$" is a special ID for the last message, as opposed to the last
possible message.

I would humbly suggest that "^" would be a suitable symbol for the first
message in a stream. ^ and $ are used in regex (and vim) in a similar way.

That way you could write "XREAD BLOCK 5000 STREAMS newstream ^" and get all
the messages in a stream from the beginning, and then block until a new
message comes in all with a single command. You would still be able to add a
count if needed, to prevent client flooding.

~~~
antirez
Exactly: $ is the last message ID, + the greatest, - the smallest. I chose the
dollar exactly because of the regex assonance. However, the corresponding ^ is
kinda useless, because with XREAD we specify the last ID we got, so it would
result in not returning the first element of the stream. It means that it's
more useful to specify just 0 in that case.

------
EGreg
Funny, we also built a lot of our technology around the Streams concept.

[https://github.com/Qbix/architecture/wiki/Internet-2.0](https://github.com/Qbix/architecture/wiki/Internet-2.0)

------
bootcat
Nice to know they have added a new data structure!

