I posted that link because I've known antirez for a long time, worked with him at Linuxcare, and when he said he was working on something cool - I believed him!
In 2011 I wrote a PHP client for Redis with features like tags, safe locks, and map/reduce. It was heavily tested and users had no issues - and since there were no issues, it was removed from the list of clients after a few years as "abandoned". Nobody even tried to contact me :)
Now I don't use PHP and barely use Redis, so it's all in the past. But that experience taught me a lot of new things.
Salvatore Sanfilippo, you are a great programmer. The Redis API is elegant, safe, and effective. I personally have had zero data losses in all the years I've used Redis. And this DB is always extremely fast.
And I'm not here to promote my code; I'm here to congratulate antirez and thank him for his great work - for being a good example of how to write an API, how to write stable and fast code, and how to communicate with the community. His blog is a source of lessons, wise thoughts, and inspiration.
Edit: This should have been a reply to the comment from EugeneOZ saying it's not an anecdote; I replied to the wrong comment by accident.
What does that teach us about our software development practices?
That’s often unobtainable with modern software development because we rely so much on things that change too often, but it doesn’t have to be that way.
It’s a paradigm shift of course, but I think our business really needs to take maintainability more seriously than it does. This goes for proprietary software as much as Open Source, but with Open Source there is the added layer of governance.
I work in the public sector and we operate quite a few Open Source projects with co-governance shared between different municipalities, and the biggest cost is keeping up with updates to “includes”. It’s so expensive that our strategy has become to actively avoid tech stacks that change too often.
> That’s often unobtainable with modern software development because we rely so much on things that change too often, but it doesn’t have to be that way
The reason we rely on things that change often is that we want to leverage them to get products out faster. There are many layers of that (every tech stack is essentially someone's product), so we have lots of updates to deal with. The flip side of slow-moving projects is that bugs might not get fixed and helpful new features might never arrive, meaning you have to build them yourself.
As a community we know, and have known for decades, how to build mission-critical software - but we often actively decide not to, because it isn't that important compared to other factors.
While the particular Etsy clone or t-shirt of the day, or customized shower curtain site will certainly come and go, it’d be an entirely different problem if visa, PayPal, stripe, swipe, or whatever payment processor packed it up and went home at random.
I've only recently switched after years of scepticism but for the sort of stuff I do it's more than good enough. It has its warts but so does every language that's stuck around.
I'd argue that medical software shouldn't be connected to networks because security is hard, and most people get it so wrong. If that's part of the design, then the goal you're talking about is attainable. But in many cases, software isn't useful for its purpose if it can't access a network, and so the idea of just leaving it alone for decades at a time is an actively bad goal.
I like react by the way, it’s just an example. But we’ve certainly had to spend a lot of dev time on JS frameworks in general.
The new paradigm has to be “plan to evolve with the ecosystem.” There are just too many moving parts to treat software as static.
I know it’s harder to build with security in mind in the modern connected world, but we have a Django app that hasn’t needed anything but security updates that runs perfectly fine as an example of a web-app that doesn’t need much development time post implementation. So it’s not like it’s impossible either.
Don’t get me wrong, we’ve been as guilty of “wow this new tech is cool” as anyone else, which is where the lessons come from.
To bring it back to the OP, redis is notable for being developed _very carefully and slowly and intentionally_, compared to much software. You won't get a feature in as quickly as you might want, but redis is comparatively rock-solid and backwards-compatibility-reliable software. These things are related. It takes time and effort and skill to make software that can change in backwards compat and reliable ways, takes lots of concern for writing it carefully in the first place.
Changing software is _not_ in fact easy. It might be easier than changing a bridge. But of course people just _don't change_ bridges, generally. We understand much better how to write requirements for a bridge that won't need to be changed for decades. Software might be easier to change than a bridge, but dealing with change is nonetheless without a doubt the most expensive and hardest part of producing software that will be used over the long term, and quality software is not cheap. And we haven't learned (and some think it may never be feasible) to make software that can last as long as a bridge without changes.
Sometimes software is just finished, not abandoned!
In a closed environment, sure, it's just finished - but the world is much bigger, and not closed at all. Even good old InstallShield required maintenance from Microsoft for their 16-bit compatibility layer.
Let's say you write a great piece of software that has no need of updating. But Redis keeps evolving - making new releases.
You could, for each release of Redis, run a series of checks and add a "verification" that you support versions 1.0, 1.1, 1.2 of Redis.
But even better is if someone else does this - why? Because it solves a problem I have seen a lot in government circles - the "we cannot use OSS because it is not supported" objection.
But if another company says "we have reviewed, tested and used" version X of php-Redis then you suddenly have a self supporting eco-system.
Webshops that care about winning tenders can say "this software is supported by dozens of providers around the globe", so if its original author goes offline there are still people who provably have skills and experience with it.
(Note - I am in no way saying this is something you should have done or thought about or arranged - writing software is hard enough without these sorts of long-term, unprofitable activities. It just matches observations I have on getting OSS into government. Plug: http://www.oss4gov.org/manifesto)
Companies using it is nowhere near the same as 'supporting' it in the sense of providing assistance if there's a problem.
- we have tested version X of this and its full test suite passes, and it runs against this version of redis server, or it installs cleanly on RHEL 7.1
then it's a positive move
If you also sign off a different certificate saying
- we are a commercial entity that offers "support" (however we define that) for this software
then we are into a much more interesting eco-system
(yes I am looking for way more than I downloaded it and it works on my laptop :-)
There are companies that do that already. What's the issue? People that say "there's no support" either don't care to look, or have custom stuff that is not supportable via 'average' support companies.
Anyway, kudos for writing the client and kudos to Redis. It’s a brilliant bit of software.
Maybe there could be a new curated repo for packages, where users are asked to regularly vouch whether they are still using the package? The problem of course is motivating users to give those answers. Without a critical mass of users regularly vouching, the data isn't much help.
(My job is advocating a cryptocurrency that enables privacy through technology, but I think that cryptocurrencies should be a quite small percentage — if any — of most people's investment portfolios.)
Ezra was one of the good guys
Dropbox has a fixed set of features, most people care about some other set, and there will be some intersection - plus some things that are needed outside of Dropbox's feature set that will need to be implemented by the user anyway (like encryption, automation, etc.). There will also be some things that go against the user's use case and compromise it.
So compared to Dropbox, having two external hard drives and using rsync regularly will get you much faster point-in-time backups, faster access in case of recovery, privacy, transparent encryption (which you won't have to think about when accessing files), no worries about losing access or account takeover, and a one-time fixed payment for a drive that will probably last you more than 5 years, instead of a subscription (where after 1 year you'll have paid more than the drive alone costs), etc.
Having extra backup drives always connected will also give you some other options. For example, if you're a heavy user of PostgreSQL, you can set up cluster replication locally with synchronous replicas on different drives, and you'll have your databases backed up. Better than Dropbox in this use case, too.
OTOH, if your use case is collaboration, Dropbox may be better. But if you include encryption of individual files you want to collaborate on, it may again be more cumbersome. I don't know.
The fact that you think this list of inaccurate claims supports rather than refutes your original post suggests you should spend more time learning about what Dropbox does and calculating the operational overhead of supporting a homegrown solution. In particular, thinking about what ease of access means with an external drive could lead you to insights about correlated failure modes such as what happens when the same thief/power surge/accident takes your laptop and the drive sitting next to it, and you realize that if you’d used Dropbox you wouldn’t have lost more than a few seconds of work. Similarly, your scheme has no versioning, bitrot protection, etc. which people always discount until the first time they lose data.
One with 381 points, a few with 100 points, and it descends quite fast and the last few posts in the page have about 20 points.
Three with more than 400 points, many with 300/200/100 points, and only the last has less than 50 points.
Here is the link from Feb 25, 2009
Note this number = 494,649
Today = 19,247,493
Nineteen (19) million posts later, more or less
Having worked for a couple of years at Redis Labs, I got to work closely with antirez, and that's been a transformative experience as well, which made me a better engineer and open source contributor.
Thank you, Salvatore. Here's to the next 10 years.
I've also been warned by fellow devs that much of the clustering logic is actually done in the client, and there's historically been a lack of mature clients for all languages. Even if you find a mature client, the complexity of the implementation implies that not all clients may behave identically, or may have different bugs. Then there's the issue of lock-in, where you become dependent on a specific client library and its development lifecycle. I don't know if all of this is true, but I also don't hear a lot of people talking about Redis Cluster these days.
I know you can use Redis in master/slave mode via Sentinel + Twemproxy, though even this solution seems to have some issues with data consistency. Running all three also appears a lot more complex than an integrated system.
I see a lot of comments implying that Redis is mainly used in single-node setups, so that might be where it shines?
I believe that for the Redis use case the cluster tradeoffs make sense because:
* Best-effort consistency in practice works quite well: even though it offers no guarantees, it does certain things to avoid losing writes in trivial ways.
* If you want to cluster Redis, you want Redis - not a cluster that automatically becomes a lot slower, more memory hungry, and so forth. So replication should be asynchronous.
However, what I may change in the future - and there are plans for that - is to add a failover strategy that does not just pick the replica that is furthest ahead in terms of received writes, but even stops the failover if a majority of slaves isn't reachable. This improves certain properties and, if well orchestrated, can also provide strong properties if writes are acknowledged only after being transferred to the majority of slaves.
Redis Cluster is used in many organizations right now. The next step is to improve it (in different ways than having strong properties mostly), and provide an official proxy for it.
I don't really understand the nature of the "tradeoffs" you mention. As Aphyr pointed out, Redis (with Sentinel) is not safe to use as a database, a queue or even as a lock service. That really narrows the possible use cases. I can absolutely see Redis being appropriate for many "lossy" applications: Caching, web sessions, rate-limiting counters, precomputed analytics data, intermediate outputs from distributed data processing pipelines, that kind of thing.
But the use cases where I'd reach for Redis seem a lot fewer than with data stores that have high consistency guarantees, such as FoundationDB, TiDB/TiKV, CockroachDB, or Cassandra/ScyllaDB. With the exception of TiDB, these are a bit easier to reason about since there's no Redis/Sentinel/Twemproxy split.
On the other hand, I certainly appreciate the specialized data structures and Lua support that Redis comes with.
We ended up not using it, but I always wanted to revisit the pattern.
BTW, thanks for such a wonderful piece of software!
BTW, without a critical mass of users regularly vouching, the data isn't much help.
If someone is happily using the package in production, that won't tick any download numbers, even though it's arguably the thing anyone considering a package would really want to know.
> "There is more: I believe that political correctness has a puritan root. As such it focuses on formalities, but actually it has a real root of prejudice against others. For instance Mark bullied me because I was not complying with his ideas, showing problems at accepting differences in the way people think. I believe that this environment is making it impossible to have important conversations."
I'm asking you to knock it off and discuss in good faith next time.
There are probably more old white dudes in Europe who suffered directly from slavery than there are Americans with grandparents who were slaves. So please don't judge people by their skin color.
- that's not our use case (then why are you advertising distribution???)
- they didn't use this feature (WELL THEN, SHOW THAT FEATURE PASSING THE TESTS)
- we're not that kind of distributed (oh, you mean the one you can use?)
Jepsen has become the first step in trusting anything that involves data and distribution. Everything fails under his hard stare; how much BS smokescreening the purveyors of the software put up tells you how far to trust it.
It also helped me earn a lot of praise for using redis-cli --pipe to bulk insert data, which brought a data import job down from 4 hours to a few minutes. I eventually built a wrapper around redis-cli --pipe for using it with a cluster.
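For reference, `redis-cli --pipe` expects raw Redis protocol (RESP) on stdin. A tiny generator might look like the sketch below - the key names are invented, and you'd run it as something like `python gen.py | redis-cli --pipe`:

```python
def resp_command(*args):
    """Encode one command in the Redis protocol (RESP), as expected by
    `redis-cli --pipe`: an array header, then one bulk string per argument."""
    out = f"*{len(args)}\r\n"
    for arg in args:
        arg = str(arg)
        out += f"${len(arg.encode())}\r\n{arg}\r\n"
    return out

# Emit many SET commands; redis-cli --pipe streams them to the server
# far faster than invoking redis-cli once per key.
for i in range(3):
    print(resp_command("SET", f"key:{i}", f"value-{i}"), end="")
```

The bulk-string length is the byte length of the argument, which matters for non-ASCII values.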
From memory - there was a setting to turn on/off gzip compression for a list once it went beyond a certain size - do you have this enabled?
If you have a cluster of machines operating on a dataset, you can store that dataset in redis to get high performance reads and writes. In the simple case of a cache, it's a key value store. But other complex cases exist: A priority queue, an atomic transaction log, a lock server, and more.
It supports Lua, so if the data structure and operations you need don't exist, you can generally build them yourself.
I'm not an applications programmer (and don't want to be one), so take what I say with a grain of salt, but I was first introduced to Redis several years ago during an analytical project working with a consultancy, and I asked what it was and why they wanted to bring it in.
"It's a memory-resident key-value store!"...
"So it's... a hash table?"
"It's a memory-resident key-value store!"...
"And don't modern languages already come with those? Why aren't we just using those internal, mature solutions rather than bringing in an arbitrary new external dependency?"
"It's a memory-resident key-value store!"
The answer is: yes, exactly! Only, it's a hash table that can be shared across all of the different processes on the server. So if you have, e.g., two different web requests that both want to update some value, that's how you do it. The other main alternative is a regular database, but that's much heavier and isn't really built in terms of "data structures".
(Redis isn't just a hash table - it's a list, a set, a queue, etc, in other words, all the standard library of a programming language, only in a way that can be shared across all the processes.)
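As a sketch of that "shared across processes" point, here is the shape of Redis's atomic INCR, mimicked with a tiny in-process stand-in (the `SharedCounter` class and the `page:hits` key are made up for illustration; real code would call redis-py's `r.incr("page:hits")` against a live server):

```python
class SharedCounter:
    """Stand-in for a Redis client's INCR/GET. On a real Redis server every
    command runs atomically, so two web workers incrementing the same key
    never lose an update to a read-modify-write race."""
    def __init__(self):
        self._store = {}

    def incr(self, key):
        # Redis INCR: create-if-missing, add one, return the new value - atomically
        self._store[key] = self._store.get(key, 0) + 1
        return self._store[key]

    def get(self, key):
        return self._store.get(key, 0)

r = SharedCounter()
for _ in range(3):          # e.g. three separate web requests
    r.incr("page:hits")
print(r.get("page:hits"))   # 3
```

The point of going through Redis rather than a local dict is that the counter lives in one place all processes (or machines) can see.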
The one place I've run into it is in web development, where it's used for caching - at least in some tutorials I've read.
Redis gives you amazing speed and because it provides a kv interface, the work is very easy.
And the point where the usefulness of a K/V store is beaten by SQL is when you want to run a JOIN or GROUP BY on the data - for example, if you wanted to count the number of keys containing a certain data point, correct?
Yes, the strength of SQL is always the join or group, but with something like sessions the idea is just to dump values into the store so you don't have to reach into the database every time. Different teams will want to put their own keys and data into the session object, which means a DB session store becomes extremely difficult to maintain over time.
On the other hand, the joins and groups over this data can be handled later, rather than in the request-response cycle itself.
I'm going to see if that resource is there by doing `GET external.foo.bar`. If there is nothing there, I perform the slow pull. After the slow pull is done, we store the result in Redis under the same name `external.foo.bar` with a timeout of x seconds.
Next time that resource is requested from our code, it will be there, so `GET external.foo.bar` will return it without having to make a slow call to the external service.
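That pattern is often called cache-aside, and can be sketched like this. The `FakeRedis` stand-in, the key name, and the TTL are hypothetical; with redis-py the real calls would be `r.get(key)` and `r.setex(key, ttl, value)` with the same shape:

```python
import time

class FakeRedis:
    """Tiny in-memory stand-in for a Redis client's GET/SETEX, so the
    sketch runs without a server."""
    def __init__(self):
        self._data = {}  # key -> (value, expires_at)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if time.time() >= expires_at:
            del self._data[key]     # expired, behave as a cache miss
            return None
        return value

    def setex(self, key, ttl, value):
        self._data[key] = (value, time.time() + ttl)

def fetch_with_cache(client, key, slow_pull, ttl=60):
    """Cache-aside: try the cache, fall back to the slow source, cache the result."""
    cached = client.get(key)
    if cached is not None:
        return cached
    value = slow_pull()
    client.setex(key, ttl, value)
    return value

client = FakeRedis()
calls = []
def slow_pull():
    calls.append(1)                 # track how often the slow path runs
    return "resource-body"

first = fetch_with_cache(client, "external.foo.bar", slow_pull, ttl=30)
second = fetch_with_cache(client, "external.foo.bar", slow_pull, ttl=30)
# slow_pull ran only once; the second call was served from the cache
```

The TTL is the knob that trades freshness against load on the external service.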
Also, where does Redis sit (my guess is between request & database)?
That said, caching is just one of the possible uses for Redis. I think of it as an easy way to share arrays, dictionaries, queues, etc between different applications. Then it's easy to see how it can be used for almost anything.
A simple non-caching example, but we use it for distributed locking, more specifically a distributed countdown latch. This is much cleaner and more performant (for us) than doing a similar operation in a traditional RDBMS.
Another very common example that is used in a lot of tutorials is maintaining a leaderboard using a sorted set.
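A rough in-process sketch of those sorted-set semantics - the member names and scores are invented, and against a real server these would be `ZADD`, `ZINCRBY`, and `ZREVRANGE ... WITHSCORES`:

```python
class Leaderboard:
    """Minimal stand-in for a Redis sorted set used as a leaderboard:
    each member has a score, and reads come back ordered by score."""
    def __init__(self):
        self.scores = {}

    def zadd(self, member, score):
        self.scores[member] = score

    def zincrby(self, member, delta):
        self.scores[member] = self.scores.get(member, 0) + delta

    def top(self, n):
        # Like ZREVRANGE 0 n-1 WITHSCORES: highest scores first
        return sorted(self.scores.items(), key=lambda kv: kv[1], reverse=True)[:n]

board = Leaderboard()
board.zadd("alice", 120)
board.zadd("bob", 95)
board.zincrby("bob", 40)    # bob now 135
print(board.top(2))         # [('bob', 135), ('alice', 120)]
```

The appeal of doing this in Redis is that the set stays sorted on every update, so "top N" reads are cheap no matter how many writers there are.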
In most of these scenarios, you are still persisting data to stable storage. So, you would take the performance hit and load the data from the database.
You can use it as a simple cache, and that's fine. But then why not just stick with memcached -- less is more?
There are probably some scenarios where a single point of failure and data loss are fine and the additional data structures Redis provides over memcached are handy (e.g. analytics), but I've never seen it used for that.
Don't jump to use it unless you really have performance/scaling issues.
This only scratches the surface of what is possible, but it's some things redis is used for.
My coworker once called redis "NoSQLite", which I think is a very apt description.
Also see: https://www.dbms2.com/2008/02/18/mike-stonebraker-calls-for-...
It's single-threaded, stores everything in RAM with optional persistence, and has Lua and modules so you can do more than the standard commands.
Great for caching, or anything distributed systems need quick access to.
So, no, they are not interchangeable! Only if you just store keys and strings and do not care about persistence.
The K/V store is just 10% of what it can do. Your value doesn't need to be a plain value; it can be:
- an array or a set
- an associative map
- an ordered list by score
- A bit array
It also offers functionality like streams and pub/sub.
need an atomic lock shared between multiple processes or servers (x)
need a set of unique values sorted by insertion time (x)
want to know the O(x) complexity of using any of the data-structures to design a system for scale (x)
need to notify multiple consumers of a modification ala pub-sub (x)
need to keep track of a stream of events for consumers (x)
need to do geo-spatial lookups (x)
oh and you want this thing to be durable to failure and easy to maintain (sentinel|cluster) (x)
redis is single process and easy to configure and understand; that's my opinion of why it's so amazeballs.
(Unicode checkmarks not supported, so x instead.)
You can store basically any common data structure in Redis and operate on individual elements as if they were local variables in your program.
I don't even want to know how many ElastiCache Redis servers are just sitting unsecured on a public IP, because it's so easy to configure them that way.
I expect it to always work and it always has. I really like it but am worried I trust it too much now. Please tell me I'm fine to trust single instances!
Redis cluster is more about availability than it is about consistency. If you are aware of that, it's a fine solution.
A couple of things I do with it:
- buffer log messages before we send them to elasticsearch via logstash. This is a simple queue. Technically it's a single point of failure but worst case we lose a few minutes of logging. This happens very rarely and typically not because of redis. This node is configured to drop older keys when memory runs out. We did this after a few log spikes killed our node by running out of memory. Since then, we've had zero issues.
- we have a few simple batch jobs that we trigger with an API or via a timed job in our cluster. To prevent these things running twice on different nodes, I implemented a simple lock mechanism via redis. Nodes that are about to run check first if they need to and abort if another node is already doing the same or recently completed doing this. This does not scale but it works great and I don't need extra moving parts for some routine things that we run a couple of times a day.
- some of our business logic ends up looking up or calculating the same things over and over again. We use a mix of on server in memory caching and shared state in redis for this. Keys for this have a ttl; if the key is missing the logic is to simply recalculate the value.
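The lock in the second bullet can be sketched by mimicking Redis's `SET key token NX EX ttl` semantics in-process. The `LockStore` class and the `job:nightly-batch` key are hypothetical; with redis-py you'd call `r.set(key, token, nx=True, ex=ttl)` and release via a small Lua script:

```python
import time
import uuid

class LockStore:
    """In-memory stand-in for the Redis lock pattern 'SET key token NX EX ttl'."""
    def __init__(self):
        self._data = {}  # key -> (owner_token, expires_at)

    def set_nx_ex(self, key, token, ttl):
        """Acquire only if the lock is free or expired (NX + EX semantics)."""
        now = time.time()
        existing = self._data.get(key)
        if existing is not None and existing[1] > now:
            return False  # another node holds the lock
        self._data[key] = (token, now + ttl)
        return True

    def release(self, key, token):
        """Release only if we still own the lock. On a real server this
        compare-and-delete must be a Lua script to stay atomic."""
        existing = self._data.get(key)
        if existing is not None and existing[0] == token:
            del self._data[key]
            return True
        return False

store = LockStore()
token = str(uuid.uuid4())  # unique per node, so we never release someone else's lock
acquired = store.set_nx_ex("job:nightly-batch", token, ttl=300)        # first node wins
blocked = store.set_nx_ex("job:nightly-batch", "other-node", ttl=300)  # second node aborts
released = store.release("job:nightly-batch", token)
```

The TTL is the safety net: if the node holding the lock dies, the lock expires on its own instead of wedging the job forever.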
Once you have redis around, finding more uses for it is a bit of an antipattern. It does queuing but you probably should use a proper queue if you need one. It can store data but you probably want a proper database if you are going to store a lot of data, etc. It's great for prototyping though. Use it in moderation.
In my case they overwrote ~/.ssh/authorized_keys, /etc/group and /etc/passwd as well.
Most people run Redis in-memory only (in my experience, at least). Those that don't usually sync to disk only periodically, whether they intend to or not.
The only problems that crop up from that pattern are that many users (especially new ones, or people who haven't worked with Redis before) forget that it's fundamentally an ephemeral cache. Eventually maintenance or failure drops the in-memory dataset, and then a wide variety of disasters occur because it was being treated as a source of truth, or as a datastore with durability.
In situations where the ephemerality of in-memory data was consistently known (or when disk persistence was configured with some thought), I have had the same experience as most others here: Redis was one of the most reliable, least surprising pieces of infrastructure present.
...except for TTL handling with read-only replicas, I guess. That behavior (TTLs can get ignored on replicas) was really rough and surprising, but is fortunately now fixed. Shame on me for running an old enough version to keep getting bitten by it.
I've tried to keep it decoupled from TXR internals, and to abstract it from the OS a bit, so there is a considerable "struct lino_os" interface now.
There is a dependency on a "config.h" which provides some HAVE_* constants.
There is a user guide to the REPL in the TXR Lisp manual; that provides context for some of the new interfaces, like what the lino_set_atom_cb function is for.
Super stoked the developer finally released the "output as odds" feature (though my favorite output setting is definitely still "matter-of-fact tone with a hint of strained patience").
I've never had to use the software myself, but I always read Antirez's posts and his interactions on here. I think it's a great example of technical leadership.
There is a Mark Twain quote that reminds me of this:
“In the beginning of a change the patriot is a scarce man, and brave, and hated and scorned. When his cause succeeds, the timid join him, for then it costs nothing to be a patriot.”
(edited with the full quote)
I guess some of those hosts are doing something that goes haywire and results in this problem.
Would be cool if we could downrank pages like this somehow. Maybe tell the submitter "The page loads scripts (and therefore sends data to) 117 different servers. Please be aware that we will not show it on the front page until it has at least 117 upvotes".
There is lots of information on optimizing images, and whenever I speak to designers they always say "don't worry, we optimize the images", which is great of course, but they completely overlook all the JS. Many websites not only load more JS than CSS and images in terms of raw file size; parsing all that JS then takes more time than loading all of the CSS and images combined. And for what? Usually some annoying pop-ups, image sliders, or a chat-box. In fact, those chat-boxes themselves load complete web pages in the iframes they are created in. I also see websites loading multiple versions of jQuery.
Anyway, 117 external requests for anything, let alone a blog, is insane.
Yeah, let's code like it's 1970's and multi-core CPUs are Sci-Fi ;)
Also, Redis source code is so much cleaner than memcached!
It's almost like less sometimes really is more, and that prioritizing simplicity of implementation does more than just make code maintainable--holding that value as a developer can "leak" all the way up to the reliability of your software, in the best possible way.