I really liked Redis for a long time. Simple, fast data structures in memory. That's it. Along the way there have been some nice enhancements like Lua, which solves a lot of the atomicity issues. But somewhere after 4.0 I feel they have lost their way. First they saw all the attention Kafka/event-stuff was receiving, so they baked in this monstrous, complicated, stateful streams feature. Now we have SSL (do people really expose Redis on the internet??), ACLs, cluster stuff, and, most relevant to me, a new wire protocol.
To my thinking, Redis fit very well in the "lightweight linux thing" category. It seems they aspire to be enterprise software, and this may be a good move for Redis-the-Business, but it's not good for users like me who just want simple in-memory data-structures and as little state as possible. Forcing a new protocol that adds very little value (in my opinion) also seems like a great way to alienate your users.
About the "Kafka" thing: streams were actually something I wanted, very strongly; they were not suggested by Redis Labs. Let's start by thinking of Redis as a data structure server, and of streams without the consumer groups part (which is totally optional). It was incredible that we had no serious way to model a "log". No easy time series: only hacks, with huge memory usage, because sorted sets are not the solution for this problem. So why consumer groups? Because for a long time people had the problem of wanting a "persistent Pub/Sub": in most use cases you can't lose messages just because clients disconnect. Btw, this Kafka monster is a total of 2775 lines of code including comments, 1731 without. In other systems this is the prelude in the copyright notice.
As for ACLs: to survive 10 years without them, we had to resort to all kinds of tricks, like renaming commands to unguessable strings, and still lived with the panic of some library calling FLUSHALL by mistake because a developer was testing on her/his laptop. Really, ACLs have nothing to do with enterprise, but some safety is needed. The ACL monster is 1297 lines of code, and it is one of the most user-friendly security systems you'll find in a database.
Actually, all those features have a great impact on users, a huge impact on day-to-day operations, and are designed to be as simple as possible. And Redis Labs only stands to lose from all this, because those were all potential "premium" features; instead they are now in core, and every other Redis provider will have them automatically as a standard. So... the reality is a bit different, and it's not a conspiracy to gain market share or anything like that.
If you want any kind of HA, you'll have multiple instances of Redis, with changes replicated from the writable node to the others.
That traffic needs to be encrypted too - and redis (pre 6.0) knows nothing about TLS.
So now you need a tunnel to each other Redis node.
Oh, but you also want Sentinel to make sure a failure means a new primary node gets elected... and Sentinel doesn't speak TLS either. The Sentinels need to talk both to each other and to the Redis nodes, so that's another set of TLS tunnels you need to set up.
I set up Redis on 3 nodes for a customer; if you tried to draw the stunnel setup on paper, it would look like you were illustrating a plate of spaghetti.
Am I missing something?
It would really be awesome if there were a built-in way to attach/spill/persist individual streams to external data volumes (older or busy streams could otherwise run out of memory), with support for hot swapping.
> Btw this Kafka monster is a total of 2775 lines of code, including comments. 1731 lines of code without comments. In other systems this is the prelude in the copyright notice.
Hilarious, funny and informative :D Upvoted!
In short - I like to have audit-able dataflows in my pipelines. Streams are inherently timestamped and provide a great way to see how data changed over time. For one, if you had a bug or regression in the pipeline, you can precisely track down the impacted time window - no guessing needed.
Let’s stop calling basic security features “enterprise”.
Locking basic security features behind a paywall is a protection racket, pure and simple.
Small companies, and lone developers, need security, too.
If we are making software for consumers who won’t know any better, why not encourage fledglings (and make it trivial for them) to do the right thing from the very beginning?
Why does every single company have to go through the same security mistakes on their way to Series A/B/C? Why can’t we learn from our mistakes and make doing the right thing not just accessible, but easily accessible?
Hat tip to antirez, et al., on this one.
bash-3.2$ telnet antirez.com 443
telnet: connect to address 22.214.171.124: Connection refused
telnet: Unable to connect to remote host
Modern security practices typically dictate a defense-in-depth approach. The idea is that you will be compromised at some point (no security is perfect), and as such you should make any compromise that does happen as limited as possible--you want to prevent attackers who get a foot in the door from rummaging around your network.
Key parts of any defense-in-depth strategy are encryption and authentication/authorization. If you're using Redis to store any kind of sensitive material, you want to make sure that only people on your network with the appropriate auth credentials can access it. This is one of the easiest ways to prevent drive-by data theft.
From here, SSL is a logical step. You need to ensure bad actors can't sniff network traffic and steal credentials.
I can't speak to streams or the other features you feel complicate Redis, but I think SSL+ACL are very important tools for increasing the cost to attackers that target redis instances leveraging those features.
AWS and GCP don’t even give you a way to install a cert yourself— you MUST use an ELB or bring your own certificate.
Legislation aside, this also goes back to a defense-in-depth strategy; TLS proxying only works if the network behind the proxy will always be secure. You might be able to get away with running TLS on the same host as redis, but in all other cases I can think of you're going back to the 90's-era security policy of having a hard shell and a soft underbelly--anything that gets into the network behind your TLS proxy can sniff whatever traffic it wants.
EDIT: It occurs to me that you seem to be hinting at running redis as a public service. In that scenario it makes perfect sense to use a TLS proxy for versions of redis without SSL. That said, it's still important to encrypt things on your private network to ensure you aren't one breach away from having your whole stack blown open.
In-process won by a mile, despite my feelings about redis from an operational perspective (read: not good). The added choreo, monitoring, and overall moving parts were strong contributors against external proxying.
> AWS and GCP don’t even give you a way to install a cert yourself.
If you mean as a service they provide, well, that’s what ACM is for, no? I assume Google Cloud has something similar.
The scenario I'm imagining is that an attacker manages to gain access to one box in VPC, and from there is able to snoop on (plaintext) traffic between an ELB that does TLS termination and some other box in the VPC that receives the (unencrypted) traffic.
If you encrypt all inter-box traffic, then this attacker still doesn't get to see anything. If not, then the attacker gets to snoop on that traffic.
I'm not sympathetic to lazy arguments like, "if an attacker has compromised one host in your VPC, it's game over". No, it's not. It's really really bad, but you can limit the amount of damage an attacker can do (and the amount of data they can exfiltrate) via defense-in-depth strategies like encrypting traffic between hosts.
This is 2020. The "hard outer shell, soft chewy center" model of security is dead and it's not coming back. Modern datacenters and cloud deployments use mTLS (mutually-authenticated TLS) between every service, everywhere, all the time.
There are some massive benefits to this. For starters, you can limit which services talk to one another entirely through the distribution and signing of keys. Yes, this adds a burden of complexity if you go that route. But suddenly you don't have to care as much about (for instance) many network-exploitable vulnerabilities in your services, because someone with a foothold on your network can't even talk to your service in the first place if they don't have the right TLS cert, which exists only on the handful of machines, and is readable only by the specific services, that are legitimately allowed to connect to it.
This is a much stronger guarantee than firewalling alone (though you should also use firewalling), because multiple services can be running on a host but only the applications that are allowed to talk to your service will have read access to that key.
On the flip side, you have stronger guarantees that the service you're connecting to really is the service you're expecting it to be. If you're storing sensitive information in Redis, you can know for sure that the port you've connected to is the right Redis and not another, less-sensitive application's.
Then I, as a consultant, get access to the network, and apart from some test instance a developer stood up, nothing matches the glossy talk.
Thank god for WireGuard. It has truly been a savior for deploying encrypted networks.
In talking to other people high up on the technical side I realized it is a norm. The only question is if what I call "velocity of awesomeness of the product" makes the warts less important.
Or in my case recently... someone has generated a root certificate for the internal CA that uses an insecure crypto scheme, and Chrome still throws up a security error requiring users to click past the warnings to access the site.
"Can you generate and roll out a new cert please? This isn't really 'security'?"
"Oh we will get to it, can you just use the one you already have?"
Cue 2 years going by. Same situation, except that the certificate has been regenerated with the same insecure crypto scheme.
Our industry needs to do better, and not brush off good security as "glossy marketing talk".
We're hoping to use SPIFFE/SPIRE to bring adoption even higher.
We've spent hundreds of hours cobbling together a system that meets our regulatory requirements and still performs well. These features go a long way toward addressing this pain point. I think they've done a decent job making the extra complexity optional, too.
There are no secure networks. Your options are vpn, third party ssl, or ssl in the service. Sometimes, your datacenter/cloud will guarantee "secure" network (ie: manage vpn for you).
But in many instances having ssl "inside" can be simpler.
Postgres also offers secure transport.
I like containers for my stuff. It’s silly, IMO, to doubly encapsulate my datastores.
Now, given the context this may or may not be a distinction that you care about, but there certainly are times where you really do care.
(Besides, if I'm running a tcpdump on a box to try and figure out why the network is going wibbly I'm a lot happier knowing all traffic is encrypted and I'm not going to accidentally capture some PII. I've had to tcpdump within docker containers before too, so putting everything in containers doesn't necessarily solve this.)
One reason off the top of my head would be regulatory/compliance issues around how things are encrypted. wireguard is relatively new, and some certifications required to do business in specific industries (finance, healthcare, etc) mandate protocols with a minimum level of maturity. wireguard may be good, but many regulators would probably not find it acceptable without a longer track record.
On a more concrete note, I'd consider any system that handles authentication to be inherently broken if it had no way to keep those credentials safe out of the box. TLS has long been a cheap-ish way to do this, as it's widely available and well understood by both implementers and regulators.
If your devices are already on the same network, and instead you close down the firewall and move everything into WireGuard, you've just moved your problem.
It's possible it may dethrone TLS in the future
I don't use haproxy to secure my telnet sessions - I use ssh.
Reading over the comment you replied to, I was thinking similar myself. :/
If you really just want fast data structures in memory, use memcached. If you somehow feel that Redis is a better solution for you, perhaps you should carefully consider that you may be placing more weight on its platform features than you realize.
Finally, you might remember that internal NSA slideshow with the "SSL added and removed here ;-)" when talking about how they stole user data from Google's internal network. After that leak, rollout of internal mutual authentication/encryption accelerated, because people were actually inside the network observing internal RPC calls. It wasn't theoretical, it was happening.
Ultimately, mTLS is a pretty simple way to get a large increase in security. It has its disadvantages; you now have another "moving part" that can interfere with deployments and debugging (an attacker can't tcpdump, and neither can you, easily), but given how many large companies have exposed user data through unintentional interactions, it is something worth considering. It's a technique where you have to go out of your way to cause surprising behavior, and that is always good for security and reliability.
Thanks to RESP3 I was able to write this client that, combined with Zig's comptime metaprogramming, can do things that no other Redis client that I've seen can do.
The user gives the desired response type and the client is able to provide appropriate translation based on what the RESP3 reply is. This would still be possible with RESP2, but v3 makes it much more robust and explicit, to the point that the ease becomes transparent without looking magical and/or triggering confusing corner cases.
If Redis does not support encryption natively, then you have to run a gateway like stunnel on every redis host. The redis clients mostly all already support connecting to a secure socket, but the server and cli client require manual stunnel configurations. Native support for encryption just removes this extra setup.
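To make that concrete, the per-host stunnel setup looks roughly like this (a sketch only; the hostnames, ports, and cert paths are invented):

```ini
; On the Redis host: terminate TLS on 6380 and forward to the local Redis.
[redis-server]
accept  = 0.0.0.0:6380
connect = 127.0.0.1:6379
cert    = /etc/stunnel/redis.pem

; On each client host: expose a local plaintext port, tunnel TLS outward.
[redis-client]
client  = yes
accept  = 127.0.0.1:6379
connect = redis1.internal.example:6380
```

Multiply the client stanza by every replica, Sentinel, and application host, and you quickly get the plate of spaghetti mentioned elsewhere in the thread.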
For the use case you are suggesting, wireguard network among your hosts would be simpler and without the need for each application to handle TLS.
As long as WireGuard is Linux-only, it's not a universal solution. TLS is.
No, it's not. In almost all high-security compliance audits you are required to have the data encrypted anytime it's in-flight.
Like medicine, every piece of software you use has effects and side-effects. If the advantage of the effects outweigh the disadvantages of the side-effects, then something is a good deal. But if you can avoid the side-effects entirely, that's best.
Most tooling uses TLS, because when you do this at scale, you automate your CA and it is much easier to securely deal with than, eg, ssh certs. But we do use (LDAP centralized) ssh as well, mostly for humans.
It sounds like you don't follow Redis then.
That ship sailed years ago. Redis has at least 10 major features in addition to the caching you're talking about, including search. Redis is a kind of database now.
If you just want a cache, use memcached.
Half of my jiras at one company were related to enabling SSL for Redis due to compliance reasons (all for internal use.) Now those can be closed.
* it's been years since I looked at this so maybe Redis now ships with inbuilt protection against this.
It's two data structures (which were already in Redis for other reasons!), and an automatic sequential identifier. Everything else that's "stateful" about it is client-side state—the server is still just a data-structure server. A Redis stream is basically just a Redis sorted set that's coherent in the face of clients trying to consume it paginated as other clients insert into the middle of it.
Also, the code is in one file (https://github.com/antirez/redis/blob/unstable/src/t_stream.... ); that file is ~3KLOC. It's just another Redis Module, isolated into its own set of functions with no impact on the codebase as a whole. It's just one that's so widely applicable, to so many use-cases that people were already using Redis for (through Sidekiq/Resque/etc) that it makes sense to ship this particular module with Redis itself.
Would you get upset about bloat if Postgres upstreamed a highly-popular extension? It already has nine or ten installed by default, and a few more sitting in contrib/. But, of course, even upstreamed, none of those extensions are enabled by default, adding runtime overhead to your DB; you have to ask for them, just as if you were installing a third-party extension. Same here: if you don't use the Streams module, there's no overhead to its existence in the Redis codebase.
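For readers who haven't used them, the stream API itself is small. A minimal sketch using the Python redis-py client (the stream name and field names here are invented), showing the append-only-log behavior described above:

```python
# Minimal sketch: a Redis stream used as an append-only, timestamped log.
# Assumes a redis-py client object `r`; "sensor:temps" is a made-up name.

def append_reading(r, stream, sensor_id, value):
    """XADD appends an entry and returns an auto-generated, monotonically
    increasing ID (milliseconds-sequence), which is what makes it a log."""
    return r.xadd(stream, {"sensor": sensor_id, "temp": str(value)})

def readings_so_far(r, stream):
    """XRANGE walks the log in insertion order; '-'..'+' means everything."""
    return r.xrange(stream, "-", "+")
```

Consumer groups (XREADGROUP and friends) layer on top of this, but as noted above, they are entirely optional.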
> do people really expose Redis on the internet??
Cloud DBaaS providers expose Redis instances "over the Internet", in the sense that they're in the same AZ but not within your VPC. To the extent that you can wireshark a data-center's virtual SDN, they need to encrypt this traffic.
Even PaaS providers do things this way, since they usually lean on third-party DBaaS providers. E.g. all of the Redis services you can attach to a Heroku app are consumed "over the Internet."
If you're using Redis through an IaaS provider's offering (e.g. AWS ElastiCache, Google Cloud Memorystore) then you get the benefit of them being able to spawn an instance "outside" your project/VPC (i.e. having it be managed by them), but have it nevertheless routed to an IP address inside your VPC. That might be enough security for you, if you don't have any legal requirements saying otherwise. For some people, it's not, and they need TLS on top anyway.
> cluster stuff
Have you looked at how it's done? It's just ease-of-use tooling around the obvious thing to do to scale Redis: partitioning the keyspace onto distinct Redis instances, and then routing requests to partitions based on the key. It's not like Redis has suddenly become a multi-master consensus system like Zookeeper; the router logic isn't even in the server codebase!
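As a toy illustration of that routing idea (not the actual Redis Cluster code, which uses CRC16 of the key, or of the "hash tag" inside braces, modulo 16384 slots):

```python
# Toy key-based router: hash the key to a slot, map slots to nodes.
# Node addresses are made up; zlib.crc32 stands in for Redis's CRC16.
import zlib

NODES = ["redis-a:6379", "redis-b:6379", "redis-c:6379"]

def node_for_key(key: str) -> str:
    slot = zlib.crc32(key.encode("utf-8")) % 16384  # pick a hash slot
    return NODES[slot % len(NODES)]                  # map slot to a node
```

The same key always lands on the same node, so a client can route GETs and SETs with no server-side coordination at all.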
Do people really still send database traffic unencrypted over unencrypted internal networks?
It's just how the world works. You have to conquer to survive.
You've never browsed unprotected IPs and ports huh? So many random redis instances just lying around.
Accidentally, on a hobby server, because docker-compose automatically opens up firewall ports.
You are a hero.
Many people at least in Rails also use Redis to hold a queue for background processing, where it ends up working well, although hypothetically many other things could too.
You can also use redis for all sorts of other stuff that isn't caching or a queue.
It's been particularly useful in load balancers / proxies for authorization, managing csrf, and tracking session activity to auto-log out users. I do this with OpenResty.
In async job or internal rpc systems, I use pubsub and streams for distributing work, ensuring exactly once processing, and getting results back to the caller.
- This DB query is pretty slow. Cache the result in Redis.
- I'm using server-side sessions, but I have multiple servers. Where should I store session data where it'll be super fast, but all my servers can access it? You could use Redis.
- I need to do distributed locking. Use Redis.
- Simple job queue? Redis.
- I'm processing a lot of inbound webhooks, and I want to avoid trying to process the same one twice. I'd like to keep a list of all webhooks I've seen in the last week, and quickly check if a new webhook is in the list, but it needs to be really fast. Redis.
Basically, Redis has support for some nifty data structures, and it's very fast. There's a lot of places in most apps where being able share a very fast list, set, map, stream, whatever between servers can be useful. Of course, all the above uses cases can be solved by other more specialised tools too, and in some cases better.
(That being said, it's so useful and generally applicable that you should be careful not to ignore fundamental issues. For example, if you have an unavoidably slow query, then by all means cache it in Redis. But if all your queries are slow because you forgot to add indexes, maybe you should fix that instead of using Redis to cache every query!)
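The webhook-dedup case above, for instance, is a few lines with an atomic SET NX EX. A sketch using the Python redis-py client (the key naming scheme is invented):

```python
# Sketch of webhook deduplication with SET NX EX, assuming redis-py.
# SET NX EX is atomic: only the first caller for a given ID succeeds,
# and the key expires after a week so the set doesn't grow forever.

WEEK_SECONDS = 7 * 24 * 3600

def first_time_seen(r, webhook_id):
    """Returns True exactly once per webhook ID per retention window."""
    return r.set(f"webhook:seen:{webhook_id}", "1",
                 nx=True, ex=WEEK_SECONDS) is not None
```

Because the check and the write happen in one command, two servers receiving the same webhook at the same instant can't both think they saw it first.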
Redis is a high performance key/value store with adjustable consistency. It can be used as a caching layer, and it can also do a solid job of representing a basic message queue. It typically fits in on the data layer to store computed results for fast retrieval, but it can also behave like a read replica (storing data from outside your domain).
That being said, when Redis becomes a critical part of your stack for performance reasons, it usually means something has gone wrong in your data layer. I often see teams use an antipattern of taking data from their database, changing its shape to what they want, and then storing that new model in Redis. If you're the only consumer of your data, then your data model is incorrect and you're using Redis to paper over that fact.
Edit: I suppose a hypothetical can go many ways, it could be a poor data access pattern. What was the root cause in some of your experiences?
- User session storage
- Asynchronous "queues" for background processing using Streams (was able to eliminate RabbitMQ from my stack when I switched to this)
- Rate-limiting with a GCRA (via https://github.com/brandur/redis-cell)
- Bloom filter checks, specifically for making sure people aren't using passwords in data breaches (via https://github.com/RedisBloom/RedisBloom with data from https://haveibeenpwned.com/passwords)
It can really give you a good performance boost, especially on frequently accessed pages where the content rarely changes. Database queries are often "expensive", so for frequently accessed data that doesn't change often, such as product descriptions, a cache can be a huge help.
But Redis is much more than caching. It supports all kinds of fun data structures: sorted sets, lists, plain sets, key timeouts, pub/sub and more! You can almost think of it as memory that is held by another process. In that way, there are SO many use cases.
The chapter on Redis persistence is, like Redis documentation in general, quite readable and informative.
However for your specific use case, considering a typical MVC web app with RDBMS data storage, you would add a check at the beginning of your Model method to return cached data if it exists, else go through the method execution and write the data to the cache just before returning it back to the controller. This way the cache would be 'warmed' on first call and data will be served directly from the cache (memory) next time till it is cleared, saving expensive disk I/O.
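Sketched with the Python redis-py client (the key name, TTL, and `load_from_db` callback are invented for illustration), that model-method pattern looks like:

```python
# Cache-aside sketch for the model-method pattern described above.
# Assumes a redis-py client object `r`; tune CACHE_TTL to your staleness budget.
import json

CACHE_TTL = 300  # seconds

def get_product(r, product_id, load_from_db):
    """Return cached data if present, else load, cache, and return it."""
    key = f"product:{product_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)                  # cache hit: no disk I/O
    data = load_from_db(product_id)                # cache miss: expensive query
    r.setex(key, CACHE_TTL, json.dumps(data))      # 'warm' the cache
    return data
```

The first call pays for the database query; every subsequent call within the TTL is served straight from memory.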
With skinny controllers, very often you have some specific places (eg service objects or similar layers) where the controller logic lives, and that is where you can do your caching.
The catch is that the data must fit in RAM, you can only use 1 core for query processing, there is no support for long-lived transactions and no built-in SQL support.
Obviously in an ideal world something like Redis would be useless since proper databases would cover all use cases, but unfortunately the state of database software is disastrous.
This does obviously decrease Redis's write performance quite a bit, but read performance is mostly fine.
It is all about how many writes versus reads you expect per second. My system expects at most 1 write per sec vs 100 reads, so performance is fine and this way it is ACID compliant.
I'm not sure redis, a database, is relevant.
As for where something like redis fits - I don't think it'd show up on the design that concerns mvc, no (it could be a cache inside model, inside controller - or even inside view.. Caching ui fragments for example?).
2. Use it for quick lookups for user accounts
3. To queue up jobs that need to run whenever the job runner has slots available.
4. Use it to crash your entire web stack when you accidentally clear your redis instance
First you could check at
If you look at a single data type, you can see how Redis documents the complexity (in Big O notation) of each data structure and its operations.
Many devs use it for caching, but in my opinion it is also super nice for write-heavy applications.
I know broadly what Redis can be used for, I was just asking for some practical tips.
Woah, I missed that redis changed their mind about this. Clearly haven't been paying attention.
Why is that?
Thanks for the explanation on Threaded I/O, I was about to ask the difference between Redis 6 and KeyDB.
1. subscribe for updates on the key you're trying to retrieve
2. retrieve cached value or set a lock
3. if lock acquired, unsubscribe, then fetch value
4. set the result in a key and publish event with the value
I wish there was a command that does:
retrieve value of a key, if it doesn't exist lock the key and notify me with an id, if it's locked subscribe for the next update on the key. With a second command that acknowledges the lock with a new value
Now I know this can be implemented and I've done so multiple times. It's just that it's tricky to get right and consistent.
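For completeness, here is one hedged sketch of that recipe with the Python redis-py client (names and timeouts are invented, and it is not production-hardened; e.g. lock expiry racing a slow computation is not handled):

```python
# Anti-stampede cache fill: subscribe first, then check cache / take lock.
# Assumes a redis-py client object `r`.
import json
import time

def get_or_compute(r, key, compute, lock_ttl=30, wait_timeout=30):
    channel = f"ready:{key}"
    pubsub = r.pubsub()
    pubsub.subscribe(channel)                  # 1. subscribe before checking
    try:
        cached = r.get(key)                    # 2. cached value already there?
        if cached is not None:
            return json.loads(cached)
        if r.set(f"lock:{key}", "1", nx=True, ex=lock_ttl):
            value = compute()                  # 3. lock acquired: do the work
            r.set(key, json.dumps(value))
            r.publish(channel, json.dumps(value))  # 4. wake the waiters
            return value
        deadline = time.time() + wait_timeout  # someone else holds the lock:
        while time.time() < deadline:          # wait for their publish
            msg = pubsub.get_message(timeout=1)
            if msg and msg["type"] == "message":
                return json.loads(msg["data"])
        return compute()                       # waiters timed out: fall back
    finally:
        pubsub.close()
```

Subscribing before the cache check closes the window where the value is published between "cache miss" and "start waiting", which is exactly the kind of ordering detail that makes this tricky to get right.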
That is the easy part though; the problem is that these commands come from 12 different cores, and the core count may increase to 24 in the future. One core might issue 5000 commands, another might issue 20000; there is no way to predict how many commands will come from a specific thread. It is extremely important that this is done with the highest performance possible, otherwise my system will have a bunch of other issues catching up on other non-DB-related client-side commands.
So I devised a special Redis pipeline structure where each core can issue commands without any kind of mutex during the course of a tick (the server is tick-based); the threads' mutexes are locked only at the end of the tick, to compile the pipeline commands coming from different threads into one giant pipeline command, along with their asynchronous callbacks.
So practically I'm using one giant Redis pipeline with 12 threads on the client side, locking only 12 mutexes at the end of each tick. This performs extremely well in terms of letting the server process other non-DB-related client requests while all that data is being written to Redis.
Truth be told, it is over-engineered crap, and I guess it might be possible to implement something similar with other Redis client libraries, but this is something I already built, so I would like to keep it as it is.
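For the curious, the scheme described above can be sketched in Python with redis-py (the class and method names are my invention, not the poster's actual implementation):

```python
# Per-thread command buffers merged into one pipeline at tick boundaries.
# Assumes worker threads only submit during the tick and one coordinator
# calls flush() between ticks; the locks only guard against stragglers.
import threading

class TickPipeline:
    def __init__(self, num_threads):
        self.buffers = [[] for _ in range(num_threads)]
        self.locks = [threading.Lock() for _ in range(num_threads)]

    def submit(self, thread_idx, command, args, callback=None):
        """Called freely during the tick by the owning thread only."""
        self.buffers[thread_idx].append((command, args, callback))

    def flush(self, client):
        """Called once per tick: merge all buffers into one pipeline."""
        merged = []
        for lock, buf in zip(self.locks, self.buffers):
            with lock:                        # brief lock at end of tick only
                merged.extend(buf)
                buf.clear()
        pipe = client.pipeline()
        for command, args, _cb in merged:
            getattr(pipe, command)(*args)     # e.g. pipe.set("k", "v")
        results = pipe.execute()              # one round trip for everything
        for (_, _, cb), res in zip(merged, results):
            if cb:
                cb(res)                       # deliver async results
        return results
```

One pipeline round trip per tick is what keeps the per-command overhead negligible regardless of how unevenly the cores produce commands.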
You'd need to periodically SCRIPT FLUSH to not let old scripts linger
In any case I'm fairly certain there are multiple ways to solve such problems. Normally I wouldn't even bother writing an in-house database client library, but RESP is so amazing and simple that it doesn't take much time to build a client library for Redis.
Previously I had to implement this myself -> https://github.com/Munawwar/redicache
This is good to see. Well done!
That's quite the list of changes after a release candidate...
Redis has 342 contributors on GitHub. The SourceHut repo, which I assume is one of the most popular repos on SourceHut, only has six contributors with more than one commit in the last year: https://git.sr.ht/~sircmpwn/git.sr.ht/contributors
Github is an order of magnitude more popular than GitLab and Bitbucket, which in turn are an order of magnitude more popular than SourceHut.
And wasn’t antirez practically the only developer of Redis for years as well?
Why does the number of developers of a piece of software indicate it’s better?
Having a small development team has worked for numerous successful projects. Take SQLite as another example.
So why the downvote?
1. Sourcehut is still pretty new; people aren't as familiar with its interface and workflow (which is intentionally different from the github PR-based one). Better or worse aside, this would create an additional barrier to contributions.
2. Sourcehut is still in alpha as of writing this comment; though it's stable, stuff might change. This is no ding on the site, which is actually really good (I greatly prefer the lightweight UI to github), but projects like redis have different considerations than a personal hobby project.
3. It takes time and effort to move a project to different hosting, and from my experience, this grows in a greater-than-linear fashion as the size of the project increases. Any project must therefore ask itself if there is a compelling reason to expend that time on that goal, some benefit to developers or to users.
Talk about humble bragging.