First, a key limitation that every architect should pay attention to. Redis reaches the limits of what you can do in well-written single-threaded C. One of those limits is that you really, really, *really* don't want to go outside of RAM. Think about what is stored, and be sure not to waste space. (It is surprisingly easy to leak memory.)
Second, another use case. Replication in Redis is cheap. If your data is small and latency is a concern (eg happened to me with an adserver), then you can locate read-only Redis replicas everywhere. The speed of querying off of your local machine is not to be underestimated.
And third, it is worth spending time mastering Redis data structures. For example suppose you have a dynamic leaderboard for an active game. A Redis sorted set will happily let you instantly display any page of that leaderboard, live, with 10 million players and tens of thousands of updates per second. There are a lot of features like that which will be just perfect for the right scenario.
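To make the leaderboard example concrete, here is a toy model of the sorted-set semantics (ZADD updating a member's score in place, ZREVRANGE paging from the top). This is only a behavioral sketch: Redis implements sorted sets as a skiplist plus hash with O(log n) updates, while this list-based version is O(n).

```python
import bisect

class MiniSortedSet:
    """Toy model of a Redis sorted set: member -> score,
    with rank queries like ZREVRANGE for leaderboard pages."""
    def __init__(self):
        self.scores = {}   # member -> score
        self.sorted = []   # list of (-score, member), kept sorted

    def zadd(self, member, score):
        # Like ZADD: replace the member's old score if present
        if member in self.scores:
            old = (-self.scores[member], member)
            self.sorted.pop(bisect.bisect_left(self.sorted, old))
        self.scores[member] = score
        bisect.insort(self.sorted, (-score, member))

    def zrevrange(self, start, stop):
        # Highest scores first; inclusive stop index, like Redis
        return [(m, -s) for s, m in self.sorted[start:stop + 1]]

lb = MiniSortedSet()
lb.zadd("alice", 120)
lb.zadd("bob", 300)
lb.zadd("carol", 200)
lb.zadd("alice", 450)        # updates in place, like ZADD
print(lb.zrevrange(0, 2))    # top-3 page of the leaderboard
```

Any page of the board is just a slice at an offset, which is why pagination over millions of members stays cheap.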
> One of those limits is that you really, really, really don't want to go outside of RAM. Think about what is stored, and be sure not to waste space. (It is surprisingly easy to leak memory.)
You can have massive amounts of RAM these days. You’re sooner to hit big-O limits from bad architectural decisions than run out of memory. If you do get to that point you likely have enough value in your usage to justify scaling out further and sharding.
> And third, it is worth spending time mastering Redis data structures.
Bingo. The true secret to properly using Redis: understanding the big-O complexity of each operation (…and ensuring that none of your interactions are more than logarithmic).
> You can have massive amounts of RAM these days. You’re sooner to hit big-O limits from bad architectural decisions than run out of memory. If you do get to that point you likely have enough value in your usage to justify scaling out further and sharding.
Absolute disagreement.
It is very easy to accidentally leak a few hundred MB per week in a busy Redis system. The code will look and work fine...at first. It is correspondingly hard to track down and clean up the leak a few months later. (Particularly if there are multiple such leaks to track down.) Yes, you can go for years just buying larger and larger EC2 instances. But that will also come with a shocking price tag.
I know of a number of organizations that this happened to. And pretty much every bad Redis story I hear about had this as a root cause. That is why I brought it up as an important consideration.
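A minimal sketch of how this kind of leak happens, with a plain dict standing in for Redis (all names are made up): keys written without a TTL live forever, while their TTL'd siblings get reclaimed.

```python
import time

store = {}   # key -> (value, expiry timestamp or None), stands in for Redis

def set_key(key, value, ttl=None):
    # With a real client this is SET vs SETEX / SET ... EX
    expiry = time.time() + ttl if ttl else None
    store[key] = (value, expiry)

def sweep(now):
    # What Redis expiry does for you, but only for keys that HAVE a TTL
    for k in [k for k, (_, exp) in store.items() if exp and exp <= now]:
        del store[k]

for request_id in range(1000):
    set_key(f"seen:{request_id}", 1)           # leak: no TTL, never reclaimed
    set_key(f"rate:{request_id}", 1, ttl=60)   # bounded: expires in a minute

sweep(time.time() + 3600)
print(len(store))   # 1000: only the TTL-less keys remain
```

Each individual write looks harmless; the accumulation only shows up weeks later in the memory graph.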
Redis excels as a memcached alternative with some useful extra operations. Where people get into trouble with redis is treating it as a persistent data store: despite its ability to replicate and persist, redis has some constraints you need to work within. At best, think of redis as something that can hold a materialized view, but one that can become corrupted at any random time, so you'll need the ability to rematerialize it from something else. And second, you absolutely have to be conscious of how close you are to RAM limits.
Redis is production-ready and it has a lot of features to help you track down problems with either memory or CPU usage. For example: `redis-cli --bigkeys` will help you find the very large keys. For smaller keys that occur too often, sampling a few hundred keys should be sufficient to help you find what type of keys are taking more space than necessary.
Once you get the Redis database designed well, there are a lot of things you can do before hitting the limit where you can't install any more RAM into a new machine. For example, there are no more than a billion .com domains out there. Say a single record takes 100 bytes on average, consisting of the domain name and a glue record pointing to the IP of its authoritative DNS server. Then it takes just 100GB of memory to store enough information to handle all queries to .com domains in the world. It's not so hard to obtain a machine with 768GB memory these days, and 2TB machines are not uncommon.
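Taking those figures at face value, the back-of-the-envelope arithmetic checks out:

```python
# Upper-bound estimate from the comment above: ~1e9 .com domains,
# ~100 bytes per record (domain name + glue record for its NS IP).
domains = 1_000_000_000
bytes_per_record = 100

total_bytes = domains * bytes_per_record
total_gb = total_bytes / 10**9

print(f"{total_gb:.0f} GB")   # 100 GB, well under a 768 GB machine
```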
Redis starts to have issues at high scale, even on sophisticated hardware, that can be quite difficult to debug without a lot of additional effort and storage. It’s not just memory, but odd behavior (e.g. randomly dropped connections) with a lot of connected clients, or hot keys/nodes in a cluster configuration, etc.
These issues can exist in any system, but in my experience it’s especially tough (relatively) to identify and diagnose them with Redis. Once you add lua script usage it can get even worse.
A traditional RDBMS or filesystem is designed for high throughput and concurrency, even if some tasks are blocked on data. Additionally, both have options to partition steadily growing data, with old partitions being moved to tape backup while the server continues running, if needed.
Redis is a single threaded program acting against RAM whose philosophy is that it does things fast then moves to the next job. If it needs to access memory that got paged to disk, the whole server stops and waits to get it. Nobody can do anything.
Because Redis doesn't have to deal with locking and concurrency, it can run much faster on the same resources. But when concurrency is required, it is stuck because it doesn't have it.
There are workloads that will saturate a redis instance's CPU. Using it as an LRU cache, for example: eventually you will hit the configured memory limits, and adding new keys will require finding old keys to delete. Eventually it may also require redis to do memory defragmentation, which can be fairly intensive.
Nothing but lots of small (~100b) pipelined SETs and a small number of GETs here and there. Only 10MB/s but at 100k SETs/sec redis’s CPU core sits at 60-70%. Active defrag can easily send it into a death spiral.
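The eviction cost described above can be modeled in a few lines. This is only a sketch: real Redis approximates LRU by sampling a handful of keys per eviction rather than tracking exact recency, but the point stands that every insert past the memory limit also pays for an eviction.

```python
from collections import OrderedDict

class TinyLRU:
    """Sketch of the extra work a cache does at its memory limit:
    every insert past capacity must also find and evict an old entry."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()   # insertion/recency order
        self.evictions = 0

    def get(self, key):
        if key in self.data:
            self.data.move_to_end(key)   # mark as recently used
            return self.data[key]
        return None

    def set(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        while len(self.data) > self.capacity:
            self.data.popitem(last=False)   # evict least recently used
            self.evictions += 1

cache = TinyLRU(capacity=3)
for i in range(5):
    cache.set(f"k{i}", i)
print(cache.evictions, list(cache.data))   # 2 evictions; k2..k4 remain
```

Once the cache is full, every SET carries this hidden second operation, which is part of why a "write-only" workload can still peg a core.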
> Replication in Redis is cheap. If your data is small and latency is a concern (eg happened to me with an adserver), then you can locate read-only Redis replicas everywhere. The speed of querying off of your local machine is not to be underestimated.
Do you face any consistency issues with doing this?
No. Replication time was measured in hundredths of a second, and Redis operations are atomic. So all queries got a consistent view of the data, and the lag to update was very reasonable.
It depends on the definition of consistency, but it is not strongly consistent in theoretical terms[0]. But the ordering of updates is guaranteed to be the same, so if the master is guaranteed to be internally consistent, so is the replica. And that property is enough for almost all use cases, except maybe transactions.
During partitions you may as well throw the playbook away. You could have minority writes on both sides of the cluster and a big nada to reconcile the two when they're mended. Redis is a great system for what it's built for and for the trade-offs that it makes to keep itself fast and lean. Redis is not CP and it will probably never care to support that. If data resiliency and correctness are important to you, Redis alone isn't sufficient. Several years ago, we tried sentinels, mostly to avoid large costly rebuilds when an instance went down, and though it usually worked just fine, we certainly had single network disruptions large enough to throw off the cluster to the point of requiring a manual rebuild.
For that application, there really wasn't. The results of the read were not used for writes, and the latency from when information was published to available was on par with the time a request to the master would have taken. The time from data published to available was faster than the time to switch tabs in a browser and manually check.
But your requirements will depend on the application. Financial transactions need explicit locking logic and atomic operations, such as SQL provides with SELECT ... FOR UPDATE. So another application could have more demanding requirements. Which is why, in addition to answering whether I encountered problems, I gave the actual performance characteristics, so that anyone planning their application can know whether this is a good enough solution for them.
At desktop resolution, the floating table of contents menu blocks out two of the (excellent) illustrations (second and second-last). Deleting aside.toc was very helpful.
Yes, I would suggest increasing z-index on images so they pass above the toc. Adding a large dropshadow to the images the same color as the background would make it look like it fades out as it passes by. That's what I did for our blog that has a similar floating TOC + images that escape the text width.
I haven't read it, but the presentation is really beautiful! Is it possible to make it printer friendly by any chance? Default browser PDF is a mess. Long form content like this is much easier to read printed.
This is said to have a 10μs latency in the chart. But I'm fairly sure that is a bandwidth calculation based on 1KB / 1GBps.
10μs is about 3 km at the speed of light, so at most a 1.5 km round trip.
For a chart labelled latency, I'm surprised to see bandwidth calculations included. Any network hop would actually have far greater latency, if nothing else because communication typically involves more than a single round-trip for acknowledgement, etc.
It might be worth making it clear some of the numbers are about bandwidth not latency.
The simplest scenario in the article is a single Redis instance residing on the same machine as the application. What's the benefit to this versus just storing data directly within the application?
Storing it in a different process at the very least lets you restart/deploy changes to/crash your application without losing that data. Or building your own replication/in-place upgrader/well you're just screwed if it crashes.
Sometimes your application has multiple instances even on the same machine. Scripting languages like Node or Python don‘t share memory. With Redis they can share state in a high-performance manner.
Your application and runtime are probably tuned to act as servers, with short-lived requests and little or no persistent state, and they may not play well with keeping a bunch of persistent data around forever.
I personally first reached for Redis when I needed to asynchronously process a bunch of JSON uploaded by clients via POST. I initially just stuck them in a ConcurrentQueue in memory, but no matter how much I fiddled with HostedServices and BackgroundWorkers and whatever the MS documentation recommended, the ASP.NET Core app would occasionally 'lose' that queue before it could be consumed (or the consuming loop would get stuck, with the same result).
You are also probably running your app in a pretty high-level language, with bytecode and reflection and all that nice stuff, if not an outright interpreted language, while Redis is raw C code and will outperform your homebrew doubly linked list or hash set.
Storing the data directly inside the application still means you need to store it somewhere, likely a SQL database (such as PostgreSQL). These databases are insanely well engineered and very very fast, but compared to a key-value store such as Redis or Memcached they are comparatively slow and resource hungry (because they are optimized for different things).
So if you can fetch some cached data from a Redis key, even if on the same machine, it will cost you significantly less than querying a relational database.
Not all applications can store data out of the box. For instance, some PHP setups have embedded caches; others have no cache by default, and you would need to install cache software (for instance APCu).
Also, redis has many different data types. For instance, coding something similar to its "hash" data type is not trivial.
Redis persists to disk (well, it's optional), so if you restart your server I'd assume that it'd be able to restore the disk data into memory, versus your application's memory, which would just be lost.
I'm not a Redis user, but that's based on what I've read
Really love this style of writing. Pairing the diagrams/illustrations with the easy to grok copy is really helpful for folks like myself who have been mainly focused on the front-end.
What tool do you use for your diagramming, is it all hand-drawn?
This is what I do for my presentations (on a wacom of course). I have gotten grief over it, but I work faster and get less distracted by the eccentricities of powerpoint and figma. My handwriting is abysmal so I will do that in a "handwriting font" to sort of look hand drawn. Even if I have to convert them later for some bigwig, at least I have my "rough draft". Plus it all feels a little more human.
It's a new article so it's relatively easy to explain:
HN automatically combines submissions so that subsequent submissions count as upvotes for the first submission.
If a popular source posts a new article, users will "rush" to post it to HN to reap that sweet karma and the winner will "catch" the upvotes of the others.
and a handful of posts by dang, sama, pg, etc. over the course of the years. most of the rest is what long-time users have just figured out through observation. There's a Git repo[1] out there that aggregates a lot of that stuff, but keep in mind that it's technically unofficial. That said, I think most of what's there is widely considered to be correct.
Thanks, that's a good summary of what I've seen referenced throughout my years here.
I can't find any reference to something like "combine the scores of new submissions of the same URL to the first submission's score" though. I guess that's either new information or incorrect.
> I can't find any reference to something like "combine the scores of new submissions of the same URL to the first submission's score" though. I guess that's either new information or incorrect.
I think that falls into the "noticed through observation" bucket. I'm relatively sure that it is correct, as I've noticed that behavior myself. But, I have no official standing here and I could be totally wrong. But that sure seems to be what happens in my experience.
That's absolutely possible. This particular pattern has seemed pretty consistent over the years, but unless somebody from the HN admin crew chimes in, I guess we'll never be 100% sure.
You can try this yourself. Go to the ‘new’ page and submit an existing URL. You’ll be redirected to the existing post which will now have at least one more vote.
The saltiness isn't a good look here. Especially seeing as he's not the poster.
It's the HN algorithm which is probably due to the fact that other posts from his domain have done relatively well, plus the actual poster here has quite a bit of karma.
We use both MemoryStore and normal instances.
The latter for a use case where the data is shardable and so we run a redis process on each core and the client picks the right one. It saves a lot of money over using MemoryStore.
It also saves you from Google performing maintenance on the machine and deleting all your Lua scripts.
KeyDB is becoming increasingly popular though.
The biggest problem with Redis, at least in C++ land, is the client libraries. hiredis doesn’t support Redis Cluster, and other 3rd party clients that do are of unknown quality.
I've been using UpStash's serverless Redis offering and it's worked super well for my needs. Scales to zero/free which was nice for getting started, and using their http SDK didn't need to worry about concurrent connection limits when calling from simultaneous cloud functions. & not a second of downtime in the few months I've used it so far.
Want to move more of my app's datastore to Redis now that I've learned more about sorted sets etc.
To be clear, I am not affiliated with that web service other than as a now happy paying user. I only replied with my experience getting started running Redis on it since that was GP's question and I found it useful while first learning Redis and now in production.
I'm not too familiar with redis and this may well help, so thank you.
I see some data-types on the right. It surprises me that redis doesn't have a numeric data type. I understand that at its heart it is just a key-value store and doesn't ever need to do range-based lookup but it still surprises me.
One consequence of "everything is a string" I've run into (although probably a sign I'm "doing it wrong"), is serialisation overhead in the client.
If redis is expecting strings then it's left to the client to choose an appropriate serialisation which can have either performance or other pitfalls.
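A trivial illustration of that overhead: since Redis only holds strings, the client serializes on every write and parses on every read (JSON here, but the choice of format is up to you, and each has its own pitfalls).

```python
import json

store = {}   # stands in for a Redis keyspace: values are always strings

def put(key, value):
    store[key] = json.dumps(value)    # serialize on the way in

def get(key):
    return json.loads(store[key])     # parse on the way out

put("count", 42)
print(type(store["count"]).__name__)  # str: what Redis actually holds
print(get("count"))                   # 42: reconstructed by the client
```

At high request rates that per-operation encode/decode cost in the client can rival the Redis call itself, and subtler issues (float precision, type loss) come along for free.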
As someone who doesn't code for a living but teaches it to mostly novices, this helps (because before this I had no clue what it was except that it had something to do with databases.) Typically for my courses we just use some flavor of SQL and call it a day (and that kind of spoils us because of how declarative it tends to be) -- roughly, what's the "explain like I'm 10" use case for Redis over something else? From what I'm seeing, it's mostly an "efficiency" thing?
* you're building a web-ish application and need to store session data
* you don't want to go through the overhead of building a strongly typed relational table
* you know minimal operations stuff
* just use redis, its easy to deploy, easy to code for, and available on all major cloud platforms as a managed service
---
The problem is there are tradeoffs and session storage becomes a fundamental architectural decision once your application matures. So something you added as a once-off so you can get back to feature development is now a foundational pillar.
I have worked at places where every page load hits the database, and we've scaled ok, mainly because it was b2b stuff.
However a simple redis instance in front of the database serving as a readable cache changes the rules of the game significantly - depending on the complexity of your calculation and your end result subsequent "page loads" or whatever you are doing can be tens of thousands (or more) times as efficient, and if you decide to use an expensive database or a cloud database this can help you a lot.
Eventually the hard part is you might have bugs in synchronizing the state of redis and your database, look to existing implementations for your stack instead of reinventing the wheel.
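The read-through caching described above is, at its core, just this. A hedged sketch: a dict with expiry timestamps stands in for Redis GET/SETEX, and a stub stands in for the expensive database query; all names are made up.

```python
import time

cache = {}                  # stands in for Redis: key -> {"value", "expires"}
db_calls = {"count": 0}     # instrumentation to show the savings

def query_db(user_id):
    db_calls["count"] += 1  # pretend this is a slow, expensive SQL query
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id, ttl=60):
    key = f"user:{user_id}"
    hit = cache.get(key)
    if hit is not None and hit["expires"] > time.time():
        return hit["value"]                  # served from cache
    value = query_db(user_id)                # cache miss: hit the database
    cache[key] = {"value": value, "expires": time.time() + ttl}
    return value

get_user(42); get_user(42); get_user(42)
print(db_calls["count"])   # 1: only the first load touched the database
```

The hard part the parent mentions lives in what this sketch omits: invalidating or updating `cache` whenever the underlying row changes.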
Relational databases are optimized for typical operations over data structured in tables, so joins and records. However, sometimes you want something simpler, like a LIFO queue, and wouldn't mind having it faster. Redis allows this: the variety of data structures it has is much bigger than with relational databases. They (Redis and RDBs) both have their uses, of course. Ideally you would structure your system to use each of them where appropriate according to data requests.
This is great, the visual explanations work really well
One thing that threw me off is that it says for an SSD a random read is 150μs, but 1MB sequential read is 1ms? Shouldn't sequential reads be faster, or are two different read sizes being compared or something? If so, the ambiguity may confuse some people to think random reads are faster
The way the Flash Translation Layer works is complicated, but long story short, there's still an advantage to sequential reads and writes on SSDs. The difference in latency and throughput isn't as dramatic as with spinning disks, but is still there. Random vs sequential writes have big implications for the long term health and performance of the SSD.
I'm building a new website and am using sidekiq for background job processing which relies on redis behind the scenes to store all the job data. I configured a high availability redis instance with `maxmemory-policy noeviction` to ensure no data is lost.
The website is still in its infancy so not thinking about scale for the next little while but curious if you have any tips or gotchas to keep an eye out for. Thanks!
I would ensure that the data size is managed: due to your policy, if you hit the limits Redis will stop accepting writes to ensure the data it has isn't lost. I would also turn on some sort of persistence for data recovery in case of catastrophic failure. Early on this is totally fine. I would also set up monitors on Redis data size relative to memory and try to keep 20% overhead; weird things start to happen when systems are memory constrained.
I am thinking of using Redis as a lightweight queuing mechanism. An event source will MULTI a small amount of metadata as a hash and append to a list. Event sinks will BLPOP the list and retrieve and delete the metadata key. One requirement is the events survive power loss.
Is there anything inherently wrong with this? Gotchas? A mockup I've done works great so far.
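For concreteness, here is roughly the flow I mean, sketched against a tiny stand-in client so it runs without a server. With real redis-py the enqueue would go through a pipeline (MULTI/EXEC) and the consumer would use BLPOP with a timeout; the key names are made up.

```python
class FakeRedis:
    """Minimal stand-in implementing just the commands the sketch uses."""
    def __init__(self):
        self.hashes, self.lists = {}, {}
    def hset(self, key, mapping):
        self.hashes.setdefault(key, {}).update(mapping)
    def rpush(self, key, value):
        self.lists.setdefault(key, []).append(value)
    def lpop(self, key):
        q = self.lists.get(key)
        return q.pop(0) if q else None
    def hgetall(self, key):
        return self.hashes.get(key, {})
    def delete(self, key):
        self.hashes.pop(key, None)

def enqueue(r, event_id, meta):
    # With real Redis, do both steps atomically in one MULTI/EXEC
    r.hset(f"event:{event_id}", mapping=meta)
    r.rpush("events", event_id)

def consume(r):
    event_id = r.lpop("events")      # real code: BLPOP with a timeout
    if event_id is None:
        return None
    meta = r.hgetall(f"event:{event_id}")
    r.delete(f"event:{event_id}")    # if the sink crashes around here, the event is lost
    return meta

r = FakeRedis()
enqueue(r, "1", {"type": "click", "ts": "1700000000"})
print(consume(r))
```

The crash window flagged in the comment above is exactly the gap replies here point at.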
In case the event sink crashes or the connection to Redis is lost, you could lose events. Redis Streams are better designed for use cases where more reliable delivery is needed and have a ton more features, though it comes with more complexity.
RabbitMQ. It's so cheap and easy to start up a super performant queuing broker with docker these days. And the libraries are all there, async ready and with established patterns. Closest to zero code you can get for this. You'll likely end up reimplementing all those patterns and support around them using redis.
If you want something quick and easy and dirty, go with Redis. But switch to Rabbit when you start having to write a lot of handling and other code.
I've been looking into tech stacks to make a collaborative editor and Redis CRDTs come up a lot. IIUC this requires a Redis db running on each user's machine, and they connect P2P with each other. Do I understand right? Anyone have good resources for this? I've also seen Riak come up as an alternative. Do they work similarly?
Having read the first hundred or so lines of comments, the only person who stands out with a concrete use case is the one running an ad server.
It reminds me of the California highway system when I saw it: the billboards are what stand out, while people are remarkably tolerant of the jams and the potholes.
Not to make this an ad, but you can actually do better with Redis Enterprise using Redis on flash (part of the flexible and annual plans). It stores hot data in RAM and "warm" data in flash. Here is a good 68s video on the subject: https://www.youtube.com/watch?v=hFQnhPstqLM
It's not clear to me why it makes sense to use both RDB Files and AOF on the same Redis instance. Seems like AOF would always be the more accurate source of truth here. What am I missing?
The white cube in the traditional usage example - what does that represent? App code? Or so that cache miss to db implemented in some standardized way?