Redis 6.0 GA (antirez.com)
680 points by ingve 71 days ago | 205 comments



Contrarian view here.

I really liked Redis for a long time. Simple, fast data-structures in-memory. That's it. Along the way there have been some nice enhancements like Lua, which solves a lot of the atomicity issues. But somewhere after 4.0 I feel they have lost their way. First they saw all the attention Kafka/event-stuff was receiving, so they baked-in this monstrous, complicated, stateful streams feature. Now we have SSL (do people really expose Redis on the internet??), ACLs, cluster stuff, and most relevant to me a new wire protocol.

To my thinking, Redis fit very well in the "lightweight linux thing" category. It seems they aspire to be enterprise software, and this may be a good move for Redis-the-Business, but it's not good for users like me who just want simple in-memory data-structures and as little state as possible. Forcing a new protocol that adds very little value (in my opinion) also seems like a great way to alienate your users.


I understand the sentiment, but things are a bit different than they may look. About SSL, there is no way out of this. I was opposed to this feature for a long time, but now, simply because of changes in regulations, policies and so forth, a lot of use cases are migrating to SSL internally even if Redis is not exposed. And frankly it is a real mess to handle SSL proxies now that everybody seems to need encryption. So when checking which PRs to merge, I did the best I could: 1) Opt in: not compiled by default, no SSL library requirements. 2) A connection abstraction: there is no mention of SSL inside the core code. Everything is in a separate file.

About the "Kafka" thing, streams were actually something I wanted myself, very strongly; they were not suggested by Redis Labs. Let's start by thinking of Redis as a data structure server, and of streams without the consumer groups part (which is totally optional). It was incredible that we had no serious way to model a "log". Time series were not easy: you had to use hacks, with huge memory usage, because sorted sets are not the solution for this problem. But then why consumer groups? Because for a long time people had this problem of wanting a "persistent Pub/Sub": in most use cases you can't lose messages just because clients disconnect. Btw this Kafka monster is a total of 2775 lines of code, including comments. 1731 lines of code without comments. In other systems this is the prelude in the copyright notice.
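A toy, in-memory Python model can make the "persistent Pub/Sub" point concrete. It is an illustration only, not Redis's implementation: the class and method names are hypothetical, IDs are simplified to integers instead of Redis's `<ms>-<seq>` format, and there is a single consumer group. What it shows is the core idea that messages delivered but not yet acknowledged stay in a pending list (as with XREADGROUP and XACK), so a consumer that disconnects does not lose them.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStream:
    """Hypothetical in-memory stream with ONE consumer group (illustration only)."""
    entries: list = field(default_factory=list)   # append-only log of (id, payload)
    next_id: int = 0                               # simplified, monotonically increasing ID
    last_delivered: int = -1                       # the group's delivery cursor
    pending: dict = field(default_factory=dict)    # id -> consumer, not yet acknowledged

    def add(self, payload):
        """Append a message (compare XADD with an auto-generated ID)."""
        msg_id = self.next_id
        self.next_id += 1
        self.entries.append((msg_id, payload))
        return msg_id

    def read_new(self, consumer, count=10):
        """Hand never-delivered messages to `consumer` (compare XREADGROUP ... >)."""
        batch = [(i, p) for i, p in self.entries if i > self.last_delivered][:count]
        for msg_id, _ in batch:
            self.pending[msg_id] = consumer
            self.last_delivered = msg_id
        return batch

    def ack(self, msg_id):
        """Mark a message as processed (compare XACK)."""
        self.pending.pop(msg_id, None)

    def pending_for(self, consumer):
        """IDs delivered to `consumer` but never acknowledged (compare XPENDING)."""
        return sorted(i for i, c in self.pending.items() if c == consumer)

    def claim(self, msg_id, new_consumer):
        """Move a stuck pending message to another consumer (compare XCLAIM)."""
        if msg_id in self.pending:
            self.pending[msg_id] = new_consumer


s = ToyStream()
for event in ["a", "b", "c"]:
    s.add(event)
print(s.read_new("worker-1", count=2))  # [(0, 'a'), (1, 'b')]
s.ack(0)                                # worker-1 handles 'a', then disconnects
print(s.pending_for("worker-1"))        # [1]  -> 'b' is not lost
s.claim(1, "worker-2")                  # another consumer takes it over
print(s.read_new("worker-2"))           # [(2, 'c')]
print(s.pending_for("worker-2"))        # [1, 2]
```

Plain Pub/Sub has no equivalent of `pending`: a message published while a subscriber is disconnected is simply gone.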

As for ACLs: in order to survive 10 years without ACLs we had to resort to all kinds of tricks, like renaming commands to unguessable strings. And there was still the fear of some library calling FLUSHALL by mistake because the developer was testing it on her/his laptop. Really ACLs have nothing to do with enterprise, but some safety is needed. The ACL monster is 1297 lines of code, and is one of the most user-friendly security systems you'll find in a database.
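For readers who haven't seen Redis 6 ACLs, rules are written as short tokens, e.g. `ACL SETUSER cacheuser on >somepassword ~cache:* +get +set`. The Python below is a deliberately simplified, hypothetical model of how such a rule set could be evaluated. It is not Redis's code: it supports only `on`/`off`, `+cmd`/`-cmd` and `~pattern`, and it uses `fnmatch` glob matching as an approximation of Redis's own pattern matcher. Before ACLs, the usual workaround was a `rename-command FLUSHALL ""` line (or an unguessable new name) in redis.conf.

```python
from fnmatch import fnmatchcase


class ToyAclUser:
    """Hypothetical, simplified model of a Redis 6 ACL user (not Redis's code).

    Supported rules, in the spirit of ACL SETUSER:
      on / off      enable or disable the user
      +cmd / -cmd   allow or deny a single command
      ~pattern      allow keys matching a glob pattern
    """

    def __init__(self, rules):
        self.enabled = False
        self.commands = set()
        self.key_patterns = []
        for rule in rules.split():           # rules are applied left to right
            if rule == "on":
                self.enabled = True
            elif rule == "off":
                self.enabled = False
            elif rule.startswith("+"):
                self.commands.add(rule[1:].lower())
            elif rule.startswith("-"):
                self.commands.discard(rule[1:].lower())
            elif rule.startswith("~"):
                self.key_patterns.append(rule[1:])
            else:
                raise ValueError(f"rule not supported in this sketch: {rule}")

    def can_run(self, command, *keys):
        """True if the user may run `command` against every key in `keys`."""
        if not self.enabled or command.lower() not in self.commands:
            return False
        return all(any(fnmatchcase(k, p) for p in self.key_patterns) for k in keys)


cache_client = ToyAclUser("on +get +set ~cache:*")
print(cache_client.can_run("GET", "cache:user:42"))  # True
print(cache_client.can_run("GET", "session:42"))     # False: key outside ~cache:*
print(cache_client.can_run("FLUSHALL"))              # False: never granted
```

The library-calling-FLUSHALL accident from the comment becomes a permission error instead of an empty database.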

Actually all those features have a great impact on users, a huge impact on day-to-day operations, and are designed to be as simple as possible. And Redis Labs actually only stands to lose from all this, because those were all potential "premium" features; instead, now they are in, and every other Redis provider will automatically have them as standard. So... reality is a bit different, and it's not a conspiracy to gain market share or the like.


My company has no choice: we have to use SSL internally for regulatory purposes. Right now we're using a stunnel solution to have our clients connect to Redis. I am super excited that I'll be able to remove this workaround in the future!


There is a software named Hitch https://github.com/varnish/hitch that is super useful for enabling SSL to different services, like Redis.


Putting the server behind TLS is a minor part of the process.

If you want any kind of HA, you'll have multiple instances of Redis, with changes replicated from the writable node to the others.

That traffic needs to be encrypted too - and redis (pre 6.0) knows nothing about TLS.

So now you need a tunnel to each other Redis node.

Oh but you also want Sentinel to make sure a failure means a new primary node is elected... and sentinel doesn't speak TLS either, and they need to both speak to each other, and the redis nodes... so that's another set of TLS tunnels you need to setup.

I set up Redis on 3 nodes for a customer; if you tried to draw the stunnel setup on paper, it'd look like you were illustrating a plate of spaghetti.


How is stunnel a workaround? Honestly that would seem like an ideal solution to me - "do one thing, do it well". Stunnel can focus on having a rock solid TLS implementation and Redis can focus on being a great DB.

Am I missing something?


Redis streams have been a phenomenal addition to my toolbelt for designing realtime ETL/ELT pipelines. Before, I had to make do with a far more complicated Pub/Sub + job queue (TaskTiger). That all became redundant thanks to Redis streams.

Thank you!

It would be really awesome if there were a built-in way to attach/spill/persist individual streams to external data volumes (older/busy streams could run out of memory), with support for hot swapping.

> Btw this Kafka monster is a total of 2775 lines of code, including comments. 1731 lines of code without comments. In other systems this is the prelude in the copyright notice.

Hilarious, funny and informative :D Upvoted!


Great to hear you enjoying streams - you got my upvote :)


Do you mind sharing more how you use Redis streams for ETL pipelines?


Happy to talk shop anytime, feel free to reach out.

In short - I like to have audit-able dataflows in my pipelines. Streams are inherently timestamped and provide a great way to see how data changed over time. For one, if you had a bug or regression in the pipeline, you can precisely track down the impacted time window - no guessing needed.
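Since auto-generated stream IDs have the form `<millisecondsTime>-<sequenceNumber>`, a time window maps directly onto an ID range; in Redis that query is `XRANGE <key> <start-ms> <end-ms>`. The Python sketch below applies the same idea to an in-memory list of entries. The entry data and function names are made up for illustration.

```python
def ms_part(stream_id):
    """Return the millisecond timestamp from a Redis-style '<ms>-<seq>' ID."""
    return int(stream_id.split("-", 1)[0])


def entries_in_window(entries, start_ms, end_ms):
    """Entries whose ID timestamp lies in [start_ms, end_ms], inclusive.

    Mirrors `XRANGE key start_ms end_ms`: a bare millisecond value covers
    every sequence number within that millisecond.
    """
    return [(sid, fields) for sid, fields in entries
            if start_ms <= ms_part(sid) <= end_ms]


# Hypothetical pipeline audit log; IDs are in the order Redis would assign them.
pipeline_log = [
    ("1588000000000-0", {"stage": "extract", "rows": "120"}),
    ("1588000005000-0", {"stage": "transform", "rows": "118"}),
    ("1588000005000-1", {"stage": "transform", "rows": "0"}),
    ("1588000009000-0", {"stage": "load", "rows": "118"}),
]

# Everything written during the suspect second, e.g. right after a bad deploy.
suspect = entries_in_window(pipeline_log, 1588000005000, 1588000005999)
print([sid for sid, _ in suspect])  # ['1588000005000-0', '1588000005000-1']
```

Because the timestamp is part of the ID itself, no separate "created_at" index is needed to find the impacted window.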


Just a quick thank you for your work. Redis has always been a fantastically easy-to-approach key storage for me and other people I’ve worked with.


> Really ACLs have nothing to do with enterprise, but some safety is needed.

Huzzah!

Let’s stop calling basic security features “enterprise”.

Locking basic security features behind a paywall is a protection racket, pure and simple.

Small companies, and lone developers, need security, too.

If we are making software for consumers who won’t know any better, why not encourage (and make it trivial) for fledglings to do the right thing from the very beginning?

Why does every single company have to make the same security mistakes on their way to Series A/B/C? Why can't we learn from our mistakes and make doing the right thing not just accessible, but easily accessible?

Hat tip to antirez, et.al., on this one.


1000%. Basic security (and that includes an evolving basket of features) is not just for "enterprise," neither from the developer's POV nor from a user's. How many front-page news reports about hacked, unsecured databases (where users didn't even change the default credentials) do people need to see before they finally get that any database running anywhere is at risk, even on-prem with only your own people accessing it? Security is not an "advanced" feature. It is a foundational requirement before you even load data into a cluster.


I find it funny that the SSL cert on antirez.com is throwing PR_END_OF_FILE_ERROR


I don't know what your browser is doing, but it is not behaving correctly. Maybe you are connected to a corporate VPN that is doing weird things to TLS?

   bash-3.2$ telnet antirez.com 443
   Trying 109.74.203.151...
   telnet: connect to address 109.74.203.151: Connection refused
   telnet: Unable to connect to remote host


Security guy here. I'd argue that SSL and ACL are always good things to have, especially for systems that store data.

Modern security practices typically dictate a defense-in-depth approach. The idea is that you will be compromised at some point (no security is perfect), so you should keep any compromise that does happen as contained as possible: you want to prevent attackers who get a foot in the door from rummaging around your network.

Encryption and authentication/authorization are key parts of any defense-in-depth strategy. If you're using Redis to store any kind of sensitive material, you want to make sure that only people on your network with the appropriate auth credentials can access it. This is one of the easiest ways to prevent drive-by data theft.

From here, SSL is a logical step. You need to ensure bad actors can't sniff network traffic and steal credentials.

I can't speak to streams or the other features you feel complicate Redis, but I think SSL+ACL are very important tools for increasing the cost to attackers that target redis instances leveraging those features.


Many systems don’t do TLS in process. TLS proxying is probably more common for systems deployed in the cloud (e.g. running nginx on the same node, or using a cloud load balancer).

AWS and GCP don’t even give you a way to install a cert yourself; you MUST use an ELB or bring your own certificate.


This is highly dependent on your environment. I work in finance and there is legislation saying we must encrypt all traffic on the wire.

Legislation aside, this also goes back to a defense-in-depth strategy; TLS proxying only works if the network behind the proxy will always be secure. You might be able to get away with running TLS on the same host as redis, but in all other cases I can think of you're going back to the 90's-era security policy of having a hard shell and a soft underbelly--anything that gets into the network behind your TLS proxy can sniff whatever traffic it wants.

EDIT: It occurs to me that you seem to be hinting at running redis as a public service. In that scenario it makes perfect sense to use a TLS proxy for versions of redis without SSL. That said, it's still important to encrypt things on your private network to ensure you aren't one breach away from having your whole stack blown open.


Regulated industry SRE here. I've run Redis at scale through stunnel, terminated through a proxy, and once Redis supported it, in-process.

In-process won by a mile, despite my feelings about redis from an operational perspective (read: not good). The added choreo, monitoring, and overall moving parts were strong contributors against external proxying.


Not sure what the argument is here. Many systems _do_ have TLS in process. Also, there are plenty of regulations/certifications that require encryption in transit. Terminating at a load balancer means you have an unencrypted hop.

> AWS and GCP don’t even give you a way to install a cert yourself.

If you mean as a service they provide, well, that’s what ACM[1] is for, no? I assume Google Cloud has something similar.

[1] https://aws.amazon.com/certificate-manager/


ACM doesn’t talk to EC2. AWS Enterprise Support will tell you with a straight face to let them handle TLS termination on the ELB/ALB, and keep things unencrypted in the VPC. Their claim is that the VPC has much more sophisticated security than TLS.


This is probably true. You can't eavesdrop on network traffic in a VPC because you never touch the actual layer 2 network; it's a virtualized network tunneled through the actual physical network, so you will never even see packets that aren't directed to you. I don't think there is a really strong security rationale for requiring SSL between an ELB and one of its target groups, but from a regulatory standpoint it's probably easier to say "encrypt everything in transit." This is also why ELBs don't do certificate validation. It's unnecessary and extremely cumbersome to implement well, so if you need to have SSL between the ELB and a host, you can just toss a self-signed cert on the host and be done with it.


Can you see traffic between hosts in the same VPC, even if they wouldn't otherwise have access via security groups?

The scenario I'm imagining is that an attacker manages to gain access to one box in VPC, and from there is able to snoop on (plaintext) traffic between an ELB that does TLS termination and some other box in the VPC that receives the (unencrypted) traffic.

If you encrypt all inter-box traffic, then this attacker still doesn't get to see anything. If not, then the attacker gets to snoop on that traffic.

I'm not sympathetic to lazy arguments like, "if an attacker has compromised one host in your VPC, it's game over". No, it's not. It's really really bad, but you can limit the amount of damage an attacker can do (and the amount of data they can exfiltrate) via defense-in-depth strategies like encrypting traffic between hosts.


You can't snoop on traffic between hosts in the same VPC. Here is a good video explaining why: https://www.youtube.com/watch?v=3qln2u1Vr2E&t=1592s. The tl;dr is that your guest OS (the EC2 instance) is not connected to the physical layer 2 network. The host OS hypervisor is, and when it receives a packet from the physical NIC, if that packet is not directed to the guest OS then it won't be passed to it. So the NIC on the guest OS (your EC2 instance) will never even see the packets that are not intended for it. Of course this gets slightly more complicated because AWS added some tools for traffic mirroring. So theoretically someone with the right access could set up a mirror to a host they control in the VPC and sniff the traffic that way. But if someone were able to pull that off then you're likely f'ed either way.


You can enable backend authentication on the ELB if you want that (you need to provide the certificate)


Right, they let you do it now but iirc that is a relatively recent feature that was resisted for a long time. And for good reason I think. The purpose of certificate validation is to verify that the remote machine is who they say they are. But those guarantees are already provided by the VPC protocol. In order to impersonate a target instance you would need to MITM the traffic, which isn't possible in a VPC.


FWIW, you can re-encrypt from ALB/ELB to your instances.


This doesn't scale when you're using multiple replicating Redis instances, because every Redis needs to communicate with every other Redis. With TLS in-process, you can just sign keys, distribute them to hosts, and you're done. With a tunnel like ghostunnel[1] (which we at Square built precisely for this type of purpose), you end up having to set up and configure n(n-1) tunnels (which requires twice that number of processes) so that every host has a tunnel to every other host.

[1]: https://github.com/square/ghostunnel
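To make the scaling concern concrete, here is the arithmetic as a small Python sketch. It assumes one directed tunnel per ordered pair of hosts (host A's tunnel to B is separate from B's tunnel to A) and, as the comment says, two tunnel processes per tunnel. The function names are hypothetical.

```python
def full_mesh_tunnels(n_hosts):
    """Directed tunnels so that every host has its own tunnel to every other host."""
    return n_hosts * (n_hosts - 1)


def tunnel_processes(n_hosts):
    """Assumes two processes per tunnel, one at each end."""
    return 2 * full_mesh_tunnels(n_hosts)


for n in (3, 5, 10):
    print(n, full_mesh_tunnels(n), tunnel_processes(n))
# 3 6 12
# 5 20 40
# 10 90 180
```

The tunnel count grows quadratically with the number of hosts, whereas in-process TLS only needs one certificate/key pair per host.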


Latency alone is a sufficient reason to do TLS in process.


> Now we have SSL (do people really expose Redis on the internet??).

This is 2020. The "hard outer shell, soft chewy center" model of security is dead and it's not coming back. Modern datacenters and cloud deployments use mTLS (mutually-authenticated TLS) between every service, everywhere, all the time.

There are some massive benefits to this. For starters, you can limit which services talk to one another entirely through distribution and signing of keys. Yes, this adds a burden of complexity if you go that route. But suddenly you don't have to care as much about (for instance) many network-exploitable vulnerabilities in your services, because someone with a foothold on your network can't even talk to your service in the first place if they don't have the right TLS cert, which is only on a handful of machines and only readable by the specific services that are legitimately allowed to connect to it.

This is a much stronger guarantee than firewalling alone (though you should also use firewalling), because multiple services can be running on a host but only the applications that are allowed to talk to your service will have read access to that key.

On the flip side, you have stronger guarantees that the service you're connecting to really is the service you're expecting it to be. If you're storing sensitive information in Redis, you can know for sure that the port you've connected to is the right Redis and not another, less-sensitive application's.
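As a sketch of what "mutually authenticated" means in configuration terms, here are two TLS contexts built with Python's standard `ssl` module: the server side refuses clients that don't present a certificate signed by a trusted CA, and the client side verifies the server's certificate and hostname while presenting its own. The helper names and any file paths are hypothetical, and this is not Redis's own TLS code; it only shows the verification settings involved.

```python
import ssl


def make_mtls_server_context(cert_file=None, key_file=None, ca_file=None):
    """Server side: present our certificate and REQUIRE a client certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)  # loads no system CAs by itself
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.verify_mode = ssl.CERT_REQUIRED            # no valid client cert, no connection
    if cert_file and key_file:
        ctx.load_cert_chain(cert_file, key_file)   # this service's own identity
    if ca_file:
        ctx.load_verify_locations(ca_file)         # trust only the internal CA
    return ctx


def make_mtls_client_context(ca_file=None, cert_file=None, key_file=None):
    """Client side: verify the server (cert + hostname) and present our own cert."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)  # CERT_REQUIRED + check_hostname by default
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if ca_file:
        ctx.load_verify_locations(ca_file)
    if cert_file and key_file:
        ctx.load_cert_chain(cert_file, key_file)   # proves which service is calling
    return ctx
```

In a real deployment you would pass paths such as a hypothetical `/etc/pki/redis.crt`, `/etc/pki/redis.key` and `/etc/pki/internal-ca.crt`; the point is that authorization comes from who holds a key signed by the internal CA, not from which network the packet arrived on.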


I hear this in marketing presentations and company pitches all the time.

After that I as a consultant get access to the network and apart from some test that a developer stood up nothing matches the glossy talk.

Thank god for WireGuard. It has truly been the savior for deploying encrypted networks.


Bit of a selection bias, don't you think? Places that do it right probably don't need to hire consultants.


I used to think I was special: someone who comes in and discovers these ugly pockets of pus. The kind that burst with a single poke, creating a very ugly problem.

In talking to other people high up on the technical side I realized it is a norm. The only question is if what I call "velocity of awesomeness of the product" makes the warts less important.


> After that I as a consultant get access to the network and apart from some test that a developer stood up nothing matches the glossy talk.

Or in my case recently... someone has generated a root certificate for the internal CA that uses an insecure crypto scheme, and Chrome still throws up a security error requiring users to click past the warnings to access the site.

"Can you generate and roll out a new cert please? This isn't really 'security'?"

"Oh we will get to it, can you just use the one you already have?"


> "Oh we will get to it, can you just use the one you already have?"

Cue 2 years going by. Same situation, except that the certificate has been regenerated with the same insecure crypto scheme.


I agree that many shops don't work this way, but they absolutely should. Anyone not developing a good defense-in-depth strategy, and just assuming that their edge firewalls will take care of them... well, they're one step away from a break-in and a data breach.

Our industry needs to do better, and not brush off good security as "glossy marketing talk".


The first step towards fixing a problem is to admit that most of our "best practices" are marketing talk. We do not actually do it.


I'll just say I work at a large SF unicorn where we do this. We're not at 100% (getting anything to 100% when you're big enough is impossible), but the vast majority of everything is behind TLS 1.2 with unique certificates per server/app pair.

We're hoping to use SPIFFE/SPIRE to bring adoption even higher.


I'm very impressed. I have only seen this in one place. It was a very old-school brokerage, and even they were running it over an IPsec network.


Redis is painful to use in a highly regulated environment where all data must be encrypted in transit, all access logged and audited, etc. Personally, I think the requirements are over the top and focus on the wrong things a lot of the time. But it is what our compliance people say we must do.

We've spent hundreds of hours cobbling together a system that meets our regulatory requirements and still performs well. These features go a long way toward addressing this pain point. I think they've done a decent job making the extra complexity optional, too.


> Now we have SSL (do people really expose Redis on the internet??)

There are no secure networks. Your options are a VPN, third-party SSL, or SSL in the service. Sometimes your datacenter/cloud will guarantee a "secure" network (i.e. manage the VPN for you).

But in many instances having ssl "inside" can be simpler.

Postgres also offers secure transport.


>> do people really expose Redis on the internet??

The same logic could be applied to exposed MongoDB and we know there's been a plethora of leaked data in recent years.


How is this comparable? SSL is secure transport: if you leave your MongoDB wide open, people will just as happily steal your data over SSL. The problem was never MITM attacks on in-flight data.


What about cross datacenter replication.


Is nobody using containers? SSL termination through a local proxy sidecar? That gives you the best of both worlds.


I don't run Redis, or anything else that has data storage, in a containerized environment. Those are dedicated machines to a dedicated purpose and I already have resource slicing and prioritization in place. They're called "virtual machines".


You might be fully aware, but containers don't have to be Docker/Kubernetes. Previously OpenVZ, and now LXC/LXD, are great for replacing full VMs in a lot of scenarios. The isolation is great and it's way less resource intensive than full VMs.


Quite aware, though others probably aren’t so it’s worth mentioning. But in my case, I press a button and I get an AWS or DigitalOcean or GCP instance.

I like containers for my stuff. It’s silly, IMO, to doubly encapsulate my datastores.


Adding SSL to redis is feature creep so you recommend instead running it as a container with a proxy sidecar.


Which will look exactly the same for absolutely any server you put behind it.


Why not simply use wireguard instead of each tool implementing its own ssl support?


It's worth noting that VPN / SSL proxies provide box to box (or process to box) encryption, whereas native SSL support provides process to process encryption. The difference being that if an attacker manages to get access to the box then it becomes easier to capture traffic due to it going unencrypted between the app and the VPN/SSL proxies. Fundamentally, native SSL support provides strictly better protection than just VPNs or SSL proxies.

Now, given the context this may or may not be a distinction that you care about, but there certainly are times where you really do care.

(Besides, if I'm running a tcpdump on a box to try and figure out why the network is going wibbly I'm a lot happier knowing all traffic is encrypted and I'm not going to accidentally capture some PII. I've had to tcpdump within docker containers before too, so putting everything in containers doesn't necessarily solve this.)


I think this could be workable, but it probably depends a lot on context.

One reason off the top of my head would be regulatory/compliance issues around how things are encrypted. wireguard is relatively new, and some certifications required to do business in specific industries (finance, healthcare, etc) mandate protocols with a minimum level of maturity. wireguard may be good, but many regulators would probably not find it acceptable without a longer track record.

On a more concrete note, I'd consider any system that handles authentication to be inherently broken if it had no way to keep those credentials safe out of the box. TLS has long been a cheap-ish way to do this, as it's widely available and well understood by both implementers and regulators.


Why virtualize the entire network layer when all you want is transport layer security?


I'd feel safer as an admin knowing I only have 1 port and 1 app (i.e. WireGuard) exposed publicly, rather than 10, each with its own port and security (i.e. Redis and others).


This isn't about making things public, just resistant to tampering and sniffing. Yes if you want to connect networks together then wireguard is a good choice.

If your devices are already on the same network and instead you close down the firewall and you move everything into wireguard you've just moved your problem.


Exactly, far too much overhead. Also doesn't allow clients to easily connect.


It's much easier to configure WireGuard once than to configure the SSL mechanism in each application. Redis, for example, is normally very easy to use, but adding SSL makes it quite a bit harder to set up and use.


WG is young, it probably didn't exist in a stable form when TLS took hold in most projects.

It's possible it may dethrone TLS in the future


This is the "vpn" option. It's a valid option. I don't think tightly coupling ssl is always a good idea - I just don't think it's a bad idea as an option/feature.

I don't use haproxy to secure my telnet sessions - I use ssh.


WireGuard is amazing, but I don't want one network interface per service. We have TCP/UDP ports for this.


I suppose I shouldn't be surprised at this level of pedantic cherry-picking, but yet... I still find myself bristling.


It's a bit unclear what you're meaning?

Reading over the comment you replied to, I was thinking similar myself. :/


I see Redis as a toolkit that collects a number of solutions to hard distributed system problems in a single tool. It is great for developers that have a number of use cases for these kinds of things but for which there is no need or justification to spool up yet another cluster of containers/vms/servers/load balancers/etc to support it. Redis already has to do these things to be reliable and consistent; directly exposing this ability to clients and modules is a very logical thing to do. Like it or not, Redis is a platform now.

If you really just want fast data structures in memory, use memcached. If you somehow feel that Redis is a better solution for you, perhaps you should carefully consider that you may be placing more weight on its platform features than you realize.


SSL is a pretty important feature for almost all apps that you run in the datacenter. The idea is not to securely send Redis data to an end user on an untrusted network; the idea is to reduce the blast radius of a compromise inside your datacenter. A good example is that Slack postmortem from a couple weeks ago: they had a proxy running inside their datacenter, and it could be convinced to make connections to internal addresses. If the service it was trying to connect to required the client to have a valid TLS certificate, the proxy would likely not provide the right credentials (because who uses client certificates on the Internet), and the connection would simply fail. A big security bug would manifest as a higher error ratio in the service, instead of letting an attacker poke around in their user data. (Network-based policy is also good, but is often too broad a brush. You might want the proxy to be able to talk to a database server in your network to store some results; now you can't simply add a firewall rule that says "no traffic may pass to the internal network".)

Finally, you might remember that internal NSA slideshow with the "SSL added and removed here ;-)" when talking about how they stole user data from Google's internal network. After that leak, rollout of internal mutual authentication/encryption accelerated, because people were actually inside the network observing internal RPC calls. It wasn't theoretical, it was happening.

Ultimately, mTLS is a pretty simple way to get a large increase in security. It has its disadvantages; you now have another "moving part" that can interfere with deployments and debugging (an attacker can't tcpdump, and neither can you, easily), but given how many large companies have exposed user data through unintentional interactions, it is something worth considering. It's a technique where you have to go out of your way to cause surprising behavior, and that is always good for security and reliability.


I'm a dev advocate at Redis Labs so I'm just going to reply about the last point about the protocol from my developer PoV.

Thanks to RESP3 I was able to write this client that, combined with Zig's comptime metaprogramming, can do things that no other Redis client that I've seen can do.

The user gives the desired response type and the client is able to provide appropriate translation based on what the RESP3 reply is. This would still be possible with RESP2, but v3 makes it much more robust and explicit, to the point that the ease becomes transparent without looking magical and/or triggering confusing corner cases.

https://github.com/kristoff-it/zig-okredis
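For context, RESP3 tags every reply with a one-byte type prefix and adds types RESP2 lacked, such as `%` (map), `~` (set), `_` (null), `#` (boolean) and `,` (double). That extra type information is what lets a client map replies onto the caller's requested type without command-specific guesswork; for instance, HGETALL comes back as a map rather than as a flat array. Below is a minimal Python parser for a subset of RESP3. It is an illustrative sketch, not zig-okredis (which is written in Zig), and it omits big numbers, blob errors, verbatim strings, attributes and push messages.

```python
class RespError(Exception):
    """Value of a RESP '-' (simple error) reply."""


def parse(buf, pos=0):
    """Parse one RESP3 value from bytes `buf` at `pos`; return (value, next_pos)."""
    kind = buf[pos:pos + 1]
    end = buf.index(b"\r\n", pos)
    line = buf[pos + 1:end].decode()
    pos = end + 2
    if kind == b"+":                       # simple string
        return line, pos
    if kind == b"-":                       # simple error
        return RespError(line), pos
    if kind == b":":                       # integer
        return int(line), pos
    if kind == b",":                       # double (RESP3)
        return float(line), pos
    if kind == b"#":                       # boolean (RESP3): #t or #f
        return line == "t", pos
    if kind == b"_":                       # null (RESP3)
        return None, pos
    if kind == b"$":                       # blob string: length, payload, CRLF
        n = int(line)
        if n < 0:                          # RESP2-style null bulk string ($-1)
            return None, pos
        return buf[pos:pos + n], pos + n + 2
    if kind in (b"*", b"~"):               # array, or set (RESP3)
        n = int(line)
        if n < 0:                          # RESP2-style null array (*-1)
            return None, pos
        items = []
        for _ in range(n):
            item, pos = parse(buf, pos)
            items.append(item)
        return (set(items) if kind == b"~" else items), pos
    if kind == b"%":                       # map (RESP3): n key/value pairs
        result = {}
        for _ in range(int(line)):
            key, pos = parse(buf, pos)
            value, pos = parse(buf, pos)
            result[key] = value
        return result, pos
    raise ValueError(f"RESP type {kind!r} is not handled in this sketch")


# A RESP3 map reply, the shape HGETALL uses under RESP3.
reply = b"%2\r\n$4\r\nname\r\n$5\r\nredis\r\n$7\r\nversion\r\n$3\r\n6.0\r\n"
value, _ = parse(reply)
print(value)  # {b'name': b'redis', b'version': b'6.0'}
```

A typed client can then check that the reply really is a map before converting it into the structure the caller asked for, instead of pairing up array elements by convention.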


Agree on the Kafka, disagree on the SSL. There are compliance factors in place for certain use cases (PII).


As a comparison, we run (and I think this is pretty common) nginx proxy servers that point to app servers. The proxy servers handle SSL to the outside, whilst the connection to the app servers is simply http. Pretty sure that is an acceptable solution in most cases. So then this would apply to the SSL argument here as well.


All network traffic that leaves a host should be encrypted. You could have an exception for a physically isolated network in a secure cage, if you're adventurous. But most of us are in cloud environments, so encrypted traffic is required. Even with VPCs and Security Groups, you don't want to rely on network ACLs alone to prevent data from being intercepted.

If Redis does not support encryption natively, then you have to run a gateway like stunnel on every redis host. The redis clients mostly all already support connecting to a secure socket, but the server and cli client require manual stunnel configurations. Native support for encryption just removes this extra setup.


Encrypted doesn't necessarily mean TLS, nor does it mean that it has to be at layer 7.

For the use case you are suggesting, wireguard network among your hosts would be simpler and without the need for each application to handle TLS.


And how would we then connect our section of Windows workers, which are needed to run some proprietary software?

As long as WireGuard is Linux-only, it's not a universal solution. TLS is.


>> Pretty sure that is an acceptable solution in most cases

No, it's not. In almost all high-security compliance audits you are required to have the data encrypted anytime it's in-flight.


Some of our systems require in-flight and at rest encryption.


You can stick a proxy in front of apps that don't have features you need like mTLS, tracing, metrics, etc. to get those. Google "service mesh" to explore that space. But to some extent, I think it's all a bit easier if your apps just do the right thing out of the box. Less moving parts. Better integration testing.

Like medicine, every piece of software you use has effects and side-effects. If the advantage of the effects outweigh the disadvantages of the side-effects, then something is a good deal. But if you can avoid the side-effects entirely, that's best.


We did, too, when we were in startup mode. Now, nothing runs unencrypted internally.

Most tooling uses TLS, because when you do this at scale, you automate your CA and it is much easier to securely deal with than, eg, ssh certs. But we do use (LDAP centralized) ssh as well, mostly for humans.


Personally I'm jumping up and down for ACLs. I went so far as to implement a proof of concept Redis proxy that added ACLs a couple years ago, before I heard that they would be in 6. ACLs may be niche, but when you need 'em, you need 'em!


> To my thinking, Redis fit very well in the "lightweight linux thing" category.

It sounds like you don't follow Redis then.

That ship sailed years ago. Redis has at least 10 major features in addition to the caching you're talking about, including search. Redis is a kind of database now.

If you just want a cache, use memcached.

Half of my JIRA tickets at one company were related to enabling SSL for Redis due to compliance reasons (all for internal use). Now those can be closed.


We found streams to be a breakthrough feature for pub-sub type data on IoT devices. That it can both be low-latency pub-sub and a stateful, short-lived cache is quite powerful to improve performance for many queries to the types of data generated by cameras and high-frequency sensor devices.

http://atomdocs.io/ https://github.com/elementary-robotics/atom


Having TLS support in the main client is useful because AWS only supports AUTH if you enable TLS. Running Redis without AUTH can be kind of dangerous because Redis can kind of speak HTTP* (I think you can define custom commands to fix this) so if you have web hooks in your system and don't properly filter internal addresses then you might allow external parties to run Redis commands against your system.

* it's been years since I looked at this so maybe Redis now ships with inbuilt protection against this.


> monstrous, complicated, stateful streams feature

It's two data structures (which were already in Redis for other reasons!), and an automatic sequential identifier. Everything else that's "stateful" about it is client-side state—the server is still just a data-structure server. A Redis stream is basically just a Redis sorted set that's coherent in the face of clients trying to consume it paginated as other clients insert into the middle of it.

Also, the code is in one file (https://github.com/antirez/redis/blob/unstable/src/t_stream.... ); that file is ~3KLOC. It's just another Redis Module, isolated into its own set of functions with no impact on the codebase as a whole. It's just one that's so widely applicable, to so many use-cases that people were already using Redis for (through Sidekiq/Resque/etc) that it makes sense to ship this particular module with Redis itself.

Would you get upset about bloat if Postgres upstreamed a highly-popular extension? It already has nine or ten installed by default, and a few more sitting in contrib/. But, of course, even upstreamed, none of those extensions are enabled by default, adding runtime overhead to your DB; you have to ask for them, just as if you were installing a third-party extension. Same here: if you don't use the Streams module, there's no overhead to its existence in the Redis codebase.

> do people really expose Redis on the internet??

Cloud DBaaS providers expose Redis instances "over the Internet", in the sense that they're in the same AZ but not within your VPC. To the extent that you can wireshark a data-center's virtual SDN, they need to encrypt this traffic.

Even PaaS providers do things this way, since they usually lean on third-party DBaaS providers. E.g. all of the Redis services you can attach to a Heroku app are consumed "over the Internet."

If you're using Redis through an IaaS provider's offering (e.g. AWS ElastiCache, Google Cloud Memorystore) then you get the benefit of them being able to spawn an instance "outside" your project/VPC (i.e. having it be managed by them), but have it nevertheless routed to an IP address inside your VPC. That might be enough security for you, if you don't have any legal requirements saying otherwise. For some people, it's not, and they need TLS on top anyway.

> cluster stuff

Have you looked at how it's done? It's just ease-of-use tooling around the obvious thing to do to scale Redis: partitioning the keyspace onto distinct Redis instances, and then routing requests to partitions based on the key. It's not like Redis has suddenly become a multi-master consensus system like Zookeeper; the router logic isn't even in the server codebase!
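To make the key-based routing concrete, here is a minimal sketch of how Redis Cluster maps a key to one of its 16384 hash slots, as described in the cluster specification: CRC16 (XMODEM variant) of the key modulo 16384, hashing only the part inside `{...}` when a non-empty hash tag is present. The function names are illustrative; this is plain Python, not a client library API.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0, no bit reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_hash_slot(key: bytes, num_slots: int = 16384) -> int:
    """Map a key to a cluster slot, honoring {hash tags}."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}' is hashed.
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key) % num_slots
```

Keys sharing a hash tag, such as `{user1000}.following` and `{user1000}.followers`, land in the same slot, which is what allows multi-key operations on them in a cluster.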


If Redis is already good enough for you, what actually changes for you to start disliking it? You don't have to use any of those features.


> do people really expose Redis on the internet?

Do people really still send database traffic unencrypted over unencrypted internal networks?


SSL/TLS is super important. I'm glad Redis added it. Now if only Varnish would get in the game.


The question would be whether adding those features made using the "basic/original" features more cumbersome, or changed the hardware requirements a lot. My guess is it hasn't changed that much.


Super-simple, good-enough things don't last too long. They die when the ecosystem changes, or when another super-simple player comes along and looks a bit more shiny.

It's just how the world works. You have to conquer to survive.


SSL/encrypted connections is a requirement in some regulatory frameworks. For example, transmitting PHI needs to be done over an encrypted connection.


> do people really expose Redis on the internet??

You've never browsed unprotected IPs and ports huh? So many random redis instances just lying around.


> do people really expose Redis on the internet??

Accidentally on a hobby-server, because docker-compose automatically opens up firewall ports.


On SSL: under some corporate guidelines for moving to cloud providers, you must prove that communication between all hosts is encrypted.


Thanks antirez for all the work you've done on redis. Personally I can say that no one had such an impact on my work as you did. Reading through your source code is educating and inspiring.

You are a hero.


Thanks, to me it means a lot that somebody out there perceives my work as something positive :-). Live long and prosper.


Dude... I do not think you know truly how many people love Redis. For every contrarian armchair quarterback here on HN there are 1000 people who are out using/coding/hacking w/ Redis and not caring about the noise. Myself included. Thank you and all the contributors for the hard work on an absolutely killer tool.


We use Redis in production for millions of customers and never had an issue, super solid and the code looks great even from someone with only a poor knowledge of C. I'd definitely investigate the code further if I had the time as I'm sure it would be very educational. Thanks too for your engagement with the community, I think everyone appreciates that a lot.


I don't use redis, but the code quality is so high and it helped me understand how to write good looking C APIs which are also performant. Congratulations on the release.


I would also like to extend my sincere thank you to @antirez for building such a wonderful piece of software (lean & mean) that has not only helped me personally but has also played an instrumental part in the growth of several large tech entities over the past decade. Hats off to you sir and thank you again!


I'll also take the opportunity to say thank you, for creating Redis. I tried Redis when it first came out, mostly just experimenting with it. I had not used it in production for many years and have recently made aggressive use of Redis 5 as a diverse caching layer, replacing Memcached among other things. It has simplified my service infrastructure and it has been super reliable. It's delightfully easy to use. Probably my favorite software to work with right now.


I have to join the OC. Redis's impact on my work has been huge. Thank you, antirez.


Not only that, but one of my favourite and most comfortable t-shirts is a Redis Labs shirt. Kudos all round


Lots and lots of people - we met at RedisConf and it's still a fond memory! Thanks for all your work!


Your way of writing code is an inspiration to new developers like me. Thanks!


"Redis 6 is the biggest release of Redis ever, so even if it is stable, handle it with care, test it for your workload before putting it in production. We never saw big issues so far, but make sure to be careful. "

Congratulations, team!


I have a bit of a tangent question for more experienced back-end developers: where do you fit Redis (or other caching mechanisms) in the traditional MVC model? I haven't had a use-case for Redis yet, but I'd like to know how should I approach the architecture of an app using Redis.


We use it to crash our servers on larger customers: we cache all our user entities in it, pull all of them out at runtime, filter them in PHP, then stampede ourselves when we clear the cache.


8D


Well, the use for a cache is caching expensive operations. Sorry if this is just stating the obvious, but I'm not sure how else to answer how it fits into traditional MVC operations. It could be a front-end HTTP cache (although you'd probably use a CDN for that instead). It could be caching something expensive to look up or calculate, for which it's fine to not have up-to-the-second-current value.

Many people, at least in Rails, also use Redis to hold a queue for background processing, where it works well, although hypothetically many other things could too.

You can also use redis for all sorts of other stuff that isn't caching or a queue.


You can use it to keep track of session cookies across multiple web workers.


I use it for caching, temporary content, pubsub, and distributed locking.

It's been particularly useful in load balancers/proxies for authorization, managing CSRF, and tracking session activity to auto-log-out users. I do this with OpenResty.

In async job or internal RPC systems, I use pubsub and streams for distributing work, ensuring exactly-once processing, and getting results back to the caller.


Redis is a flexible tool. Some things you could potentially do:

- This DB query is pretty slow. Cache the result in Redis.

- I'm using server-side sessions, but I have multiple servers. Where should I store session data where it'll be super fast, but all my servers can access it? You could use Redis.

- I need to do distributed locking. Use Redis

- Simple job queue? Redis.

- I'm processing a lot of inbound webhooks, and I want to avoid trying to process the same one twice. I'd like to keep a list of all webhooks I've seen in the last week, and quickly check if a new webhook is in the list, but it needs to be really fast. Redis.

Basically, Redis has support for some nifty data structures, and it's very fast. There's a lot of places in most apps where being able share a very fast list, set, map, stream, whatever between servers can be useful. Of course, all the above uses cases can be solved by other more specialised tools too, and in some cases better.

(That being said, it's so useful and generally applicable that you should be careful not to ignore fundamental issues. For example, if you have an unavoidably slow query, then by all means cache it in Redis. But if all your queries are slow because you forgot to add indexes, maybe you should fix that instead of using Redis to cache every query!)
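The webhook case from the list above maps naturally onto a set-if-absent write with an expiry (in Redis, `SET key value NX EX seconds`). The sketch below imitates those semantics with a plain dict and an injectable clock so it runs without a server; the class name is hypothetical, and unlike Redis it never evicts expired entries.

```python
import time
from typing import Callable, Dict


class SeenWebhooks:
    """In-memory stand-in for 'SET webhook:<id> 1 NX EX <ttl>' deduplication."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._expires_at: Dict[str, float] = {}

    def first_time_seen(self, webhook_id: str) -> bool:
        """Return True (and record the id) only if it was not seen within the TTL."""
        now = self.clock()
        expiry = self._expires_at.get(webhook_id)
        if expiry is not None and expiry > now:
            return False  # duplicate delivery: skip processing
        self._expires_at[webhook_id] = now + self.ttl
        return True
```

With `ttl_seconds=7 * 24 * 3600`, a webhook ID is treated as a duplicate for one week after it is first recorded.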


Redis can be a very powerful tool, it can also be a sign that something has gone wrong.

Redis is a high performance key/value store with adjustable consistency. It can be used as a caching layer, and it can also do a solid job of representing a basic message queue. It typically fits in on the data layer to store computed results for fast retrieval, but it can also behave like a read replica (storing data from outside your domain).

That being said, when Redis becomes a critical part of your stack for performance reasons, it usually means something has gone wrong in your data layer. I often see teams use an antipattern of taking data from their database, changing its shape to what they want, and then storing that new model in Redis. If you're the only consumer of your data, then your data model is incorrect and you're using Redis to paper over that fact.


This doesn't sound like the worst way to reduce expensive reads of normalised data. Is the implication that it should have been solved with views, was incorrectly normalised, or that a document/key-value store should have been used other than Redis?

Edit: I suppose a hypothetical can go many ways, it could be a poor data access pattern. What was the root cause in some of your experiences?


If your data model is too slow to work for your use cases, your data model is wrong. If your data model can't be indexed by your data store, your data model or your data store is wrong. If you're storing large amounts of performance-sensitive normalized key-value data in a column-based SQL database, again something has gone terribly wrong.


Agreed, that's bad. There are also places which use Redis as a primary data store. That's actually messed up.


Is this that bad? At our company we are using redis for real time recommendations. And this is our primary database for storing accumulated statistics about items and users. It has been important to us to have a quick way to share the data between the workers and we chose redis.


You're fine. The simplistic interface that Redis provides to sets, and therefore empowering pragmatic recommendation engines, is the perfect tool for this job.


There's nothing a priori wrong with using Redis as a primary data store if your data is temporal and your durability needs are aligned with its model.


For my site, I use Redis in addition to PostgreSQL for these purposes:

- User session storage

- Asynchronous "queues" for background processing using Streams (was able to eliminate RabbitMQ from my stack when I switched to this)

- Rate-limiting with a GCRA (via https://github.com/brandur/redis-cell)

- Bloom filter checks, specifically for making sure people aren't using passwords in data breaches (via https://github.com/RedisBloom/RedisBloom with data from https://haveibeenpwned.com/passwords)
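For readers unfamiliar with GCRA (generic cell rate algorithm), the rate-limiting approach mentioned above: it keeps a single "theoretical arrival time" (TAT) per key and rejects a request if accepting it would push the TAT too far ahead of the current time. This is a minimal in-process sketch of the algorithm, not redis-cell's code; the parameters `rate`, `period`, and `burst` are illustrative names.

```python
import time
from typing import Callable, Dict


class GCRA:
    """Allow `rate` requests per `period` seconds, with up to `burst` back-to-back."""

    def __init__(self, rate: int, period: float, burst: int,
                 clock: Callable[[], float] = time.monotonic):
        self.emission_interval = period / rate           # T: ideal spacing between requests
        self.tolerance = burst * self.emission_interval  # max distance the TAT may run ahead of now
        self.clock = clock
        self._tat: Dict[str, float] = {}                 # theoretical arrival time per key

    def allow(self, key: str) -> bool:
        now = self.clock()
        tat = max(self._tat.get(key, now), now)
        new_tat = tat + self.emission_interval
        if new_tat - self.tolerance > now:
            return False  # too early: accepting would exceed the allowed burst
        self._tat[key] = new_tat
        return True
```

The appeal for a Redis-backed limiter is that the whole state per client is one timestamp, so a single atomic read-modify-write per request is enough.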


We use redis for session storage, jobs with celery/redis-rq, view fragment caching, and certain query caching.

It can really give you a good boost in performance, especially on frequently accessed pages where the content rarely changes. Database queries are often "expensive", so for frequently accessed data that doesn't change frequently, such as product descriptions, it can be a huge help.


STREAMS, best pubsub solution imho. I used it as a backing store for an MQTT frontend once, and also for generally coordinating worker processes to handle background tasks.


Just to add a bit on that, if you dynamically generate channel names according to a validatable naming convention that any consumer can predict (ideally with a client lib for generating them), you can do pretty complex message passing that doesn't blow up code complexity. Combine that with the locking and consumer groups built in, it's pretty much distributed computing "for free" even if stuck with multiprocessing for runtime scaling (e.g. Python/JS without the builtin concurrency or multithreading of VMs/hosted languages).


I made a poor man's RPC using hashsets for persistence and pubsub to notify of new calls. Works great!


We use it for storing user sessions, for caching responses from a third-party API we access, and for imposing per-IP address rate limits on the use of that API. We've also previously used it for lightweight worker queues.


IF you want to use it for caching, THEN you would use it to cache stuff for your controller.

But Redis is much more than caching. It supports all kinds of fun data structures and features like sorted sets, timeouts, sets, pub-sub and more! You can almost think of it like memory that is held by another process. In that way, there are SO many use cases.


I think a good example of this is session storage - just store the session with a ttl and now redis will automatically "expire the session" when the time is over.


Not sure about other DBs, but you can set a record TTL in DynamoDB as well.


cassandra supports TTLs as well


We use it as the main database for low latency (less than 150ms) response time machine learning services. Store a pretty massive amount of data as well - close to 750GB.


That's really dangerous. What will you do if data is gone?


Redis has persistence.


Isn't that just writing to disk every n seconds? Even if n is 1, data will be lost.


Redis has RDB (snapshots) and AOF (append log) with configurable fsync policies. You can also use replication to avoid a single server failure.

https://redis.io/topics/persistence


You can configure Redis for different persistence strategies. With AOF enabled, the default is to fsync every second, but you can fsync on every write.

The chapter on Redis persistence is, like Redis documentation in general, quite readable and informative.

https://redis.io/topics/persistence


yes - persisted and geo distributed


Redis can do far more than just caching.

However for your specific use case, considering a typical MVC web app with RDBMS data storage, you would add a check at the beginning of your Model method to return cached data if it exists, else go through the method execution and write the data to the cache just before returning it back to the controller. This way the cache would be 'warmed' on first call and data will be served directly from the cache (memory) next time till it is cleared, saving expensive disk I/O.
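The model-level check described above is usually called the cache-aside pattern. A minimal sketch, assuming a dict-backed stand-in for Redis `GET` and `SET ... EX` so it runs without a server (the `TTLCache` class and `cached_model_call` function are hypothetical names):

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Dict-backed stand-in for a Redis cache (GET / SET ... EX)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self.clock = clock
        self._data: Dict[str, Tuple[Any, float]] = {}

    def get(self, key: str) -> Any:
        entry = self._data.get(key)
        if entry is None or entry[1] <= self.clock():
            return None  # missing or expired
        return entry[0]

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._data[key] = (value, self.clock() + ttl)


def cached_model_call(cache: TTLCache, key: str, loader: Callable[[], Any], ttl: float) -> Any:
    """Cache-aside: return the cached value if present, else load, store, and return it."""
    value = cache.get(key)
    if value is None:
        value = loader()  # the expensive database query
        cache.set(key, value, ttl)
    return value
```

Because `None` signals a miss here, a loader that legitimately returns `None` would be re-run on every call; a real implementation would need a sentinel for that case.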


You need to be careful with caching inside models, because you want your models to reflect the current, completely up-to-date state of the application. Conceptually, the best place to do it is inside controllers, where you know when you can serve data which isn't completely up to date.

With skinny controllers, very often you have some specific places (eg service objects or similar layers) where the controller logic lives, and that is where you can do your caching.


Our main use case in $JOB for Redis is distributed locking. We usually do not need key-value storage and even if we do, we just go with DynamoDB instead.


Are you using the method[0] suggested by antirez for distributed locking?

0. https://redis.io/topics/distlock
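The single-instance step in that document acquires a lock with `SET resource token NX PX ttl` and releases it only if the stored token still matches, using a Lua script so that the check-and-delete is atomic. The sketch below simulates just that logic in-process; it is not the multi-instance Redlock algorithm, and the class name is hypothetical.

```python
import secrets
import time
from typing import Callable, Dict, Optional, Tuple


class SingleInstanceLock:
    """Simulates acquire via 'SET key token NX PX ttl' and token-checked release."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self.clock = clock
        self._locks: Dict[str, Tuple[str, float]] = {}  # resource -> (token, expires_at)

    def acquire(self, resource: str, ttl_ms: int) -> Optional[str]:
        now = self.clock()
        held = self._locks.get(resource)
        if held is not None and held[1] > now:
            return None  # another client holds an unexpired lock
        token = secrets.token_hex(16)  # random value identifying this holder
        self._locks[resource] = (token, now + ttl_ms / 1000)
        return token

    def release(self, resource: str, token: str) -> bool:
        held = self._locks.get(resource)
        if held is None or held[0] != token or held[1] <= self.clock():
            return False  # expired, or owned by another client: leave it alone
        del self._locks[resource]
        return True
```

The random token is what prevents a client whose lock already expired from deleting a lock that another client has since acquired.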


We use Redisson, yes.


There's lots of functionality that Redis provides beyond a key-value store, for example the data structures it supports. These are very powerful on their own. Also, I understand that after considerable investment, vendor freedom is a bit of an illusion, but if you can choose an open-source technology under the hood, it's like not nailing a door shut, even if you still leave it closed.


I actually use it as the primary database for some parts of the MMORPG I'm developing. Redis has ACID capabilities, so it is very suitable as a primary database for gaming platforms. That being said, the main game server requires MongoDB due to the sheer size of the data.


Well, if you have only a single instance that doesn't support threading, it is trivial to get those properties, but what about durability? Do you realize you store all of the data in RAM?


Redis can fsync its append-only log to disk after every query, and can run multiple commands in a transaction, so it can provide the highest guarantees possible for a database.

The catch is that the data must fit in RAM, you can only use 1 core for query processing, there is no support for long-lived transactions and no built-in SQL support.

Obviously in an ideal world something like Redis would be useless since proper databases would cover all use cases, but unfortunately the state of database software is disastrous.


This is your second comment in this thread about Redis being in-memory. Redis can function in-memory only, but it also has at least 2 persistence stories (RDB snapshots, AOF logs), and out of the box it is configured to use RDB.


Your data still has to fit in memory though.


Redis has a feature called AOF; I use it for every single command to save the command history to disk.

This obviously decreases Redis's write performance quite a bit, but read performance is mostly fine.

It is all about how many writes versus reads you expect per second. My system expects at most 1 write per sec vs 100 reads, so performance is fine and this way it is ACID compliant.


RAM is cheap and lots of realistic data sets never exceed some tens of GBs, even in extreme scenarios. What's wrong with keeping (a copy of) all data in RAM? It's what makes Redis fast yet simple.


Nothing wrong if it is a cache, but I would have a problem using it as a primary data store for important data, as many do.


Here's a terrifying secret for you: if a dataset is small enough and used intensively enough, SQL databases like PostgreSQL will eventually have all of it in RAM. Better ditch those too!


why? what is wrong with the persistence mechanisms it has?


For PHP applications Redis has been a must for storing sessions, especially when distributing load between multiple servers. It is also used to collect data for Prometheus exporters, for example, for languages that don't share memory between requests.


It's definitely not a must when using multiple PHP servers, but it sure is common and useful.


As well as caching, we make really heavy use of Redis for Leaderboards for our games. The sorted sets are perfect for storing score along with the userid. Scanning the leaderboard, finding neighboring scores, etc are all really fast operations. This could probably be done with a number of other storage system but we already used Redis and we've never had a problem.
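A sorted set used this way behaves roughly like the sketch below: scores indexed by user, plus an ordering that supports rank lookups (like `ZREVRANK`) and windows around a rank (like `ZREVRANGE`). This is an in-process illustration only; Redis uses a skip list with O(log N) updates, whereas this sketch's update is O(N), and its tie-breaking by user name is an assumption that may not match Redis's ordering.

```python
import bisect
from typing import Dict, List, Tuple


class Leaderboard:
    """Tiny stand-in for a sorted-set leaderboard (highest score first)."""

    def __init__(self) -> None:
        self._scores: Dict[str, float] = {}
        self._ordered: List[Tuple[float, str]] = []  # (-score, user), kept sorted

    def set_score(self, user: str, score: float) -> None:  # roughly ZADD
        old = self._scores.get(user)
        if old is not None:
            self._ordered.remove((-old, user))  # O(N) here; a skip list avoids this
        self._scores[user] = score
        bisect.insort(self._ordered, (-score, user))

    def rank(self, user: str) -> int:  # roughly ZREVRANK: 0 is the top
        return bisect.bisect_left(self._ordered, (-self._scores[user], user))

    def neighbors(self, user: str, radius: int = 1) -> List[Tuple[str, float]]:
        """Users ranked within `radius` places of `user`, best first."""
        r = self.rank(user)
        window = self._ordered[max(0, r - radius): r + radius + 1]
        return [(name, -neg_score) for neg_score, name in window]
```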


Isn't MVC a pattern for UI layers?

I'm not sure redis, a database, is relevant.


It is many things, but started as more of an architecture for user-data interaction, where the data is in a computer system - and the user wants to interact with it (the user in user-model-view-controller unfortunately fell off at some point).

See:

http://heim.ifi.uio.no/~trygver/themes/mvc/mvc-index.html

As for where something like redis fits - I don't think it'd show up on the design that concerns mvc, no (it could be a cache inside model, inside controller - or even inside view.. Caching ui fragments for example?).


I always use it as my default session store. I build out large Python backends, and I have also been using it as a Celery backend; I prefer it to memcache for those types of tasks too.


1. Use it for caching

2. Use it for quick lookups for user accounts

3. To queue up jobs that need to run whenever the job runner has slots available.

4. Use it to crash your entire web stack when you accidentally clear your redis instance


Use it to cache certain request payloads that are guaranteed not to change for a certain amount of time (e.g. 1-minute stock market aggregates that only change once a minute).


I use it as a general cache, especially for data that I know hitting the database will be expensive. I also use it for pubsub.


We use Redis to store user sessions. Spring Session Data Redis is an awesome project for this.


Just restart your memcache server. Now you will see the point of redis.


mostly as an intermediary for pubsub architectures. ZMQ is what I've used.


It's really strange that, as a back-end developer, you haven't had a use case for Redis yet.

First, you could check out

https://redis.io/topics/data-types-intro

If you look at each data type, you can see how Redis documents the complexity (in Big O notation) of every data structure and operation.

Many devs use it for caching, but in my opinion it's super nice for evil-write applications.


I mostly work on solutions used in-house by the client. The most used app that I had created was used by maybe 50 people at the same time, and it was mostly manipulating spreadsheet-like data, so querying the database directly was fast enough.

I know broadly what Redis can be used for, I was just asking for some practical tips.


> Threaded I/O

Woah, I missed that redis changed their mind about this. Clearly haven't been paying attention.


It didn't, but we identified something that could be done without throwing away our vision. Basically, Redis remains completely single-threaded, but when it is about to re-enter the event loop and has to write replies to all the clients, we use threads just for a moment in that code path, and return to single-threaded execution immediately after. It is not as performant as full threading, but we believe the right thing to do is sharding and share-nothing. Yet this 2x improvement for the single instance was almost free, so... btw it's disabled by default.


>btw it's disabled by default

Why is that?

Thanks for the explanation on Threaded I/O, I was about to ask the difference between Redis 6 and KeyDB.


KeyDB multithreads the entire event loop from end to end. This provides better performance [1], and combined with MVCC will put us in a place to support more expensive commands like searches without blocking other clients.

1. https://docs.keydb.dev/blog/2020/04/15/blog-post/


Makes sense, thanks for clarifying!


Congratulations to the Redis team, it's always nice to see improvements to the cluster story and tooling.


Threaded IO, server-assisted client caching, and Redis proxy are, in my opinion, important milestones. Threaded IO will fully utilize multiple cores, and the improvements are amazing. Server-assisted caching will reduce client round-trips. Redis proxy will settle the debate over which client library to use: Jedis, for example, has not allowed reading from replicas or multiplexing distributed commands, and the proxy is going to solve all those gripes. I can't wait for it to be stable and production-ready.


This is great timing. I was just spinning up a new service using redis this morning. Thanks for the constant work you put into this project.


One caching pattern I find myself doing a lot on Redis is where multiple clients try to access the same cached value, but if it doesn't exist only one is allowed to revalidate it, all other clients wait until it's revalidated and get notified with the new value. Currently for the clients it involves:

1. Subscribe for updates on the key you're trying to retrieve.
2. Retrieve the cached value or set a lock.
3. If the lock was acquired, unsubscribe, then fetch the value.
4. Set the result in a key and publish an event with the value.

I wish there was a command that does this: retrieve the value of a key; if it doesn't exist, lock the key and notify me with an ID; if it's locked, subscribe for the next update on the key. Plus a second command that acknowledges the lock with a new value.

Now I know this can be implemented and I've done so multiple times. It's just that it's tricky to get right and consistent.
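The semantics being asked for (one caller revalidates a missing value while everyone else waits for its result) are often called "single flight". The sketch below shows those semantics inside one process with threads, as an illustration of the desired behavior; it is not a Redis command or client API, and the class name is hypothetical. A waiter re-checks after waking, so if the revalidating caller fails, another waiter takes over.

```python
import threading
from typing import Any, Callable, Dict


class SingleFlightCache:
    """One caller recomputes a missing value; concurrent callers wait for its result."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._values: Dict[str, Any] = {}
        self._inflight: Dict[str, threading.Event] = {}

    def get(self, key: str, loader: Callable[[], Any]) -> Any:
        while True:
            with self._lock:
                if key in self._values:
                    return self._values[key]       # cache hit
                event = self._inflight.get(key)
                if event is None:                  # nobody is revalidating: take the "lock"
                    event = threading.Event()
                    self._inflight[key] = event
                    break
            event.wait()                           # wait for the revalidating caller, then re-check
        try:
            value = loader()                       # only this caller runs the expensive load
            with self._lock:
                self._values[key] = value          # publish the new value
            return value
        finally:
            with self._lock:
                del self._inflight[key]            # release the "lock" even if loader() raised
            event.set()                            # wake all waiting callers

    def invalidate(self, key: str) -> None:
        with self._lock:
            self._values.pop(key, None)
```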


RESP3 looks very interesting. Unfortunately, I made the mistake of writing my own RESP2 client, and now I don't have time to upgrade it to RESP3. I guess I'll move to some third-party C++ client library.


RESP3 is a strict superset of RESP2! You can just add to your implementation and end with RESP3 finally :-) It means also that a RESP3 parser can parse any RESP2 reply without issues, in case you use it against a RESP2 server.
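For context on how small the RESP2 core is: a request is an array of bulk strings, and a reply is one of five types identified by its first byte. A minimal sketch of both directions follows; the function names are illustrative, and a RESP3 parser would add further type prefixes on top of these, which are not shown here.

```python
from typing import Tuple, Union

Reply = Union[str, int, bytes, None, list, Exception]


def encode_command(*args: Union[str, bytes, int]) -> bytes:
    """Encode a command as a RESP array of bulk strings (how clients send requests)."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode()
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def parse_reply(buf: bytes, pos: int = 0) -> Tuple[Reply, int]:
    """Parse one RESP2 reply starting at `pos`; return (value, position after it)."""
    end = buf.index(b"\r\n", pos)
    kind, line = buf[pos:pos + 1], buf[pos + 1:end]
    pos = end + 2
    if kind == b"+":                 # simple string
        return line.decode(), pos
    if kind == b"-":                 # error
        return Exception(line.decode()), pos
    if kind == b":":                 # integer
        return int(line), pos
    if kind == b"$":                 # bulk string; $-1 is a null reply
        n = int(line)
        if n == -1:
            return None, pos
        return buf[pos:pos + n], pos + n + 2
    if kind == b"*":                 # array; *-1 is a null reply
        n = int(line)
        if n == -1:
            return None, pos
        items = []
        for _ in range(n):
            item, pos = parse_reply(buf, pos)
            items.append(item)
        return items, pos
    raise ValueError("unknown RESP2 type byte %r" % kind)
```

Returning the next position lets a client parse several pipelined replies out of one buffer in sequence.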


Oh that is great to know, because there was a very specific reason why I implemented my own RESP2 client that was not possible with other libraries.


out of curiosity, what is that reason?


I need to pipeline around 100000 commands every 30 seconds as a transaction. So if any of the 100000 commands fail due to database failure, then I need to crash the server and restart.

That is the easy part, though. The problem is that these commands come from 12 different cores, possibly increasing to 24 in the future. One core might issue 5000 commands, another might issue 20000; there is no way to predict how many commands will come from a specific thread. It is extremely important that this is done with the highest performance possible, otherwise my system will have a bunch of other issues catching up on other non-DB-related client-side commands.

So I devised a special Redis pipeline structure where each of the cores can issue commands without any kind of mutex during the course of a tick (server is tick based), mutexes for the threads are locked only at the end of the tick to compile different pipeline commands coming from different threads into one giant pipeline command along with their asynchronous callbacks.

So in practice I'm using a giant Redis pipeline fed by 12 threads on the Redis client side, locking only 12 mutexes at the end of the tick. This performs extremely well in terms of letting the server process other non-DB-related client requests while all that data is being written to Redis.

Truth be told, it is over-engineered crap, and I guess it might be possible to implement something similar with other Redis client libraries, but it's something I already built, so I would like to keep it as it is.
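The per-thread batching described above can be sketched as follows: each worker appends only to its own buffer during a tick, and at the tick boundary the buffers are concatenated into one batch whose replies are dispatched back to the queued callbacks. This is a simplified Python illustration, not the commenter's C++ code. It assumes all workers have finished the tick (e.g. joined or stopped at a barrier) before `flush` runs, in place of the per-thread mutexes, and `execute` is a hypothetical transport that would send the batch as one MULTI/EXEC pipeline and return the replies in order.

```python
from typing import Callable, List, Sequence, Tuple

Command = Tuple[bytes, ...]
Callback = Callable[[object], None]


class TickPipeline:
    """Per-worker command buffers, merged into one batch at the end of each tick."""

    def __init__(self, num_workers: int) -> None:
        # One buffer per worker thread; a worker only ever appends to its own buffer.
        self._buffers: List[List[Tuple[Command, Callback]]] = [[] for _ in range(num_workers)]

    def queue(self, worker_id: int, command: Command, callback: Callback) -> None:
        """Called by a worker during the tick; no shared lock is taken."""
        self._buffers[worker_id].append((command, callback))

    def flush(self, execute: Callable[[List[Command]], Sequence[object]]) -> int:
        """Call once all workers have finished the tick; returns the batch size."""
        batch: List[Tuple[Command, Callback]] = []
        for buf in self._buffers:
            batch.extend(buf)
            buf.clear()
        if not batch:
            return 0
        replies = execute([cmd for cmd, _ in batch])
        for (_, callback), reply in zip(batch, replies):
            callback(reply)  # hand each reply to the code that queued the command
        return len(batch)
```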


I know you've probably looked at it and decided it wasn't worth it, but would it be possible to generate a Lua script on the client side and EVAL it once it's completely generated?

You'd need to periodically SCRIPT FLUSH to not let old scripts linger


I think it can be done through Lua, or even just by using the "MULTI" command, but I would still have to handle asynchronous replies on the client side, where it would get hairy regardless.

In any case, I'm fairly certain there are multiple ways to solve such problems. Normally I wouldn't even bother writing an in-house database client library, but RESP is so amazing and simple that it doesn't take much time to build a client library for Redis.


Whoa, that sounds like pretty heavy-duty stuff. Thanks for the writeup!


Congratulations @antirez and the team! Can't wait to play with it.


Having the client cache a subset of data is very much needed for the most frequently accessed data... but it seems the various clients still have to implement this? (It could have been the best feature in the list.)

Previously I had to implement this myself -> https://github.com/Munawwar/redicache


We use it also for queueing messages to be consumed by logstash. Not as pub/sub but regular list and has never failed.


Redis is a workhorse.


Our Redis guys do that as well, but it has run out of memory on big days. Couldn't pop out of it fast enough, I guess.


Great work. I’ve been quite critical of Redis for its lack of TLS before.

This is good to see. Well done!


Why does Redis recommend not using huge pages? I'd like to experiment with using DPDK with it, but that requires huge pages.


Because when Redis forks to persist, every write will copy a 2MB huge page, and that's not good. All the memory will end up being copied on write: latency spikes, big memory usage, ...


What is ACL? Can anybody please explain?



Looks like content is not available


Sorry, link fixed now.


Perfect!


>So what changed between RC1 and today, other than stability?

>...

That's quite the list of changes after a release candidate...


Yep, I use a non-conforming release cycle and even actively backport new features from development branches to RCs or patch-level releases if they are safe and don't interact with other code. This is done to make things available faster. Other times new features are almost like fixes, because the old alternative resulted in user pain.


Congrats antirez and redis labs!!


Noob question - is redis streams completely able to replace the Kafka usecase?


Nope - similar concept, different stories.


It's actually a big release with a lot of changes!


[flagged]


This seems wildly off topic.

Redis has 342 contributors on GitHub. The SourceHut repo, which I assume is one of the most popular repos on SourceHut, only has six contributors with more than one commit in the last year: https://git.sr.ht/~sircmpwn/git.sr.ht/contributors

Github is an order of magnitude more popular than GitLab and Bitbucket, which in turn are an order of magnitude more popular than SourceHut.


> “SourceHut, only has six contributors with more than one commit”

And wasn’t antirez practically the only developer of Redis for years as well?

Why does the number of developers of a piece of software indicate it’s better?

Having a small development teams has worked for numerous successful projects. Take SQLite as another example.

So why the downvote?


Not the guy you asked, but I didn't downmod your comments. However, I don't think moving would make sense right now for big projects. Here's why:

1. Sourcehut is still pretty new; people aren't as familiar with its interface and workflow (which is intentionally different from the github PR-based one). Better or worse aside, this would create an additional barrier to contributions.

2. It takes time and effort to move a project to different hosting, and from my experience, this grows in a greater-than-linear fashion as the size of the project increases. Any project must therefore ask itself if there is a compelling reason to expend that time on that goal, some benefit to developers or to users.

3. Sourcehut is still in alpha as of writing this comment; though it's stable, stuff might change. This is no ding on the site, which is actually really good (I greatly prefer the lightweight UI to github), but projects like redis have different considerations than a personal hobby project.


[flagged]


What do you care about TLS for a blog?


I've had ISPs insert ads/garbage into my pages.


To read an article you have to do all this?


> 6. RDB files are now faster to load. You can expect a 20/30% improvement, depending on the file actual composition (larger or smaller values). INFO is also faster now when there are many clients connected, this was a long time problem that now is finally gone.

Talk about humble bragging.



