Redis Anti-Patterns (redis.com)
186 points by nosqlseek 3 months ago | 75 comments



Am I the only one that misses the honesty and heart of Antirez's blog posts from back in the day?

This is basically a low-quality ad for Redis Enterprise, focusing on the shortcomings of their main competitor: open source Redis. Except the folks trying to benefit from Redis' shortcomings are also in control of that project... what could go wrong?


Until I read your comment, I had no idea that antirez left the Redis project last year.[0]

> I write code in order to express myself, and I consider what I code an artifact, rather than just something useful to get things done. I would say that what I write is useful just as a side effect, but my first goal is to make something that is, in some way, beautiful. In essence, I would rather be remembered as a bad artist than a good programmer. Now I’m asked more and more, by the circumstances created by a project that became so important, to express myself less and to maintain the project more. And this is indeed exactly what Redis needs right now. But this is not what I want to do, and I stretched myself enough during the past years.

I imagine that this is a common sentiment on HN that many of us have felt from time to time. Good for him that he was able to act on it. I'm looking forward to seeing what else he produces in the future; Salvatore is a true genius.

[0] http://antirez.com/news/133


The open source project is technically maintained by a separate group, https://redis.io/topics/governance, which just happens to include a lot of people from Redis ltd. Redis ltd. should definitely do a better job at documenting features without it coming across as an ad for their paid offering.


I think it is a good quality ad as they don't present Redis Enterprise as the only alternative.


Yeah, I agree with you.


Yep. Fully agree with you.


I don't miss the weird misogynistic elements of his writing at all.

It's a shame that technical blogging in general has faded away so much though.


Redis is fast and great and I use it and love it a lot. That said, here are my top two:

---

Redis is best for simple key -> value use case

Redis is a key value store, typically id -> entry, great for caching stuff etc.

But if you're not ABSOLUTELY FING CERTAIN you will NEVER need anything more than that, as fast and great as Redis is, it's probably not the right tool for the job.

Take it from someone who begrudgingly implemented a userId -> user store using Redis, after pointing out that a regular db would be a better option as it's all but certain there will be a need for email -> user, username -> user etc etc indices

and then had to manually implement those indices into the same Redis when, lo and behold, they became needed

gaaah
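
For anyone who hasn't had the pleasure, here is a rough sketch of what "manually implementing those indices" tends to look like. A plain dict stands in for the Redis keyspace so it runs without a server, and the `UserStore` class and key names (`user:<id>`, `idx:email:<email>`) are made up for illustration, not taken from the actual project.

```python
import json


class UserStore:
    """Toy userId -> user store with hand-maintained secondary indices.

    `self.kv` is a plain dict standing in for Redis string keys
    (SET / GET / DEL). All key names are hypothetical.
    """

    def __init__(self):
        self.kv = {}

    def save(self, user):
        old = self.get_by_id(user["id"])
        if old is not None:
            # Every update has to drop stale index entries by hand.
            self.kv.pop(f"idx:email:{old['email']}", None)
            self.kv.pop(f"idx:username:{old['username']}", None)
        self.kv[f"user:{user['id']}"] = json.dumps(user)
        self.kv[f"idx:email:{user['email']}"] = user["id"]
        self.kv[f"idx:username:{user['username']}"] = user["id"]

    def get_by_id(self, user_id):
        raw = self.kv.get(f"user:{user_id}")
        return json.loads(raw) if raw is not None else None

    def get_by_email(self, email):
        user_id = self.kv.get(f"idx:email:{email}")
        return self.get_by_id(user_id) if user_id is not None else None


store = UserStore()
store.save({"id": "42", "email": "a@example.com", "username": "alice"})
# Changing the email means rewriting the index entry too.
store.save({"id": "42", "email": "b@example.com", "username": "alice"})
```

Against a real Redis the writes in `save` would also need MULTI/EXEC or a Lua script to stay atomic, which is the kind of thing a relational db hands you for free with a unique index.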

---

Redis often isn't meaningfully faster than MySQL, even for key -> value lookup

An AWS MySQL RDS that is sized to contain the entire db in memory and has the correct settings will also be de facto fully in memory and just as fast as Redis but now you're getting a whole relational db with all the added benefit that entails

it will also be vastly easier to grow as required (unless anything has changed since I last did this, if you reach the limit of your Redis instance, you have to create a new one with more capacity and manually migrate all the entries (including real time changes) into the new one which is a huge pain)


As someone who recently migrated a project off of redis-as-a-main-store, please use a real database. I found that Postgres could do the same requests in about 2x-7x the time depending on what the requests were and this was without heavy optimization (if you limit the object sizes you could actually get away with covering indices which would probably get even closer).

Also, KeyDB[0] looks like quite the project. I used it when testing backups and some other stuff and it was quite easy to use, scaled to the compute well, is able to easily use flash (SSD/NVMe), pretty nice.

[0]: keydb.dev


  An AWS MySQL RDS that is sized to contain the entire db 
  in memory and has the correct settings will also be 
  de facto fully in memory and just as fast as Redis but now 
  you're getting a whole relational db with all the added 
  benefit that entails

Are you making this assertion based on practical experience, or is it a theoretical point? If it is the latter, it might not be correct. This is what the book Designing Data-Intensive Applications by Martin Kleppmann has to say:

> Counterintuitively, the performance advantage of in-memory databases is not due to the fact that they don't need to read from disk. Even a disk-based storage engine may never need to read from disk if you have enough memory, because the operating system caches recently used disk blocks in memory anyway. Rather, they can be faster because they can avoid the overheads of encoding in-memory data structures in a form that can be written to disk. (Chapter 3, Storage and Retrieval)

So it seems that even in the setup you described, the DB lookups won't be faster than a Redis lookup. I haven't fully understood the reasoning the author provides (any pointers to more explanation are welcome). The book cites the paper https://dl.acm.org/doi/10.1145/1376616.1376713 ("OLTP Through the Looking Glass, and What We Found There") as the basis for the above assertion.


Practical experience!

Traditional db def won't be faster than Redis (at least not in my experience).

But what I'm saying is traditional db also often won't be meaningfully slower than Redis either, even for key -> value lookup where Redis shines.

Overwhelmingly, when I need to store and retrieve anything, I reach for traditional dbs first by default, unless there's good reason to use some other technology. If and when load starts to become a problem, tweaking the settings, adding more read replicas etc is often enough. But if it's not, it's always perfectly possible and simple enough to add Redis on top. I feel like that gives you the most bang for your buck and the best of both worlds. But YMMV!


> Even a disk-based storage engine may never need to read from disk if you have enough memory, because the operating system caches recently used disk blocks in memory anyway.

OT, but this is the premise behind Kafka's log storage as well. Which is why it's great for sequential reads from the front of the log, and not as a general-purpose DB, which too many people want it to be.


The blog reiterates that using Redis as your primary database is totally possible, and a good idea even. I think that's the biggest Redis anti-pattern out there. I would never put data in Redis that I wasn't comfortable losing at some point. There are way too many other perfectly good, OSS options designed for durable storage (and the access patterns that come along with "primary databases").


People always repeat this but I have never understood why. Is there any reason to believe that a well configured and maintained instance of Redis is more prone to data loss than a similar setup of any traditional database software?


> Is there any reason to believe

Yes, by the very nature of how redis implements persistence (RDF snapshot format + AOF) compared to traditional btree-backed storage with WALs.


Could you explain why that has a bearing on the risk of data loss? Note that AOF is a WAL.

(Also, I think you mean RDB – unless you're talking about something else entirely of which I'm unaware?)


> Note that AOF is a WAL

The redis AOF is written to after the command is executed and in-memory data is modified. It is not a WAL.


Ah, fair enough, that's true. It's written to the AOF before processing the next query (https://github.com/redis/redis/blob/unstable/src/server.c#L4...), so I believe you get read-your-writes consistency per node (in terms of the AOF), but I suppose it is technically written afterwards rather than before.


> well configured and maintained

Well there's the rub -- table stakes for traditional database software (the "boring technology") is not losing your data. Redis has AOF which can be tuned and this SO post[0] is a good intro to how they work and the different settings involved. Pieces of software like Postgres have settings that get you to dont-lose-my-data on day 1.

Also people generally rant and rave about the performance of Redis when AOF-write-on-every-write (aka WAL) is not turned on. It also kind of annoys me how AOF is called AOF when there is a very good well known term for this already -- WAL.

[0]: https://stackoverflow.com/questions/40939756/difference-betw...


The redis AOF is a post-write log, so it's not a WAL.


Ah thanks for the correction, it's not a Write Ahead Log at all, and they made the right choice in not overloading it to mean Write After or anything silly like that. That irk was over nothing.

Given that, there is no way to run Redis without the spectre of data loss, period -- it's just a question of how much you are OK with possibly losing (which to be fair is probably fairly small with most healthy systems).


Redis as a main store is acceptable if you are ready to lose all data between two writes to disk, and you don't care about when data is actually written. It might be ok if all you're storing is likes and votes, but probably not if you're storing posts and comments.


Not a db expert. How would such a failure occur? How would other DBs handle it? It's not like other DBs don't get corrupted: you can search `database-name corrupted` and find people trying to get help, and usually the suggestion is to use a backup instead of trying to fix it, so what does it matter how often it fsyncs? Would PostgreSQL help me in an OVH data center fire?

What does it matter if it writes before or after? The op was either a success or a failure, and then the server crashed with whatever state was in RAM (which is moot by now, since the server is down).


It's all about minimizing the risk of data loss. The risk is always there, but some DBs are much safer than others.

For example, a server crash is much rarer than a process restart. In Redis, if the process supervisor (systemd, docker, k8s, etc.) decides to restart your Redis process for whatever reason, you risk losing ACKed data, because Redis ACKed before the data was written to disk.

In a safer database system, the data is not ACKed until it can safely survive a process restart. Furthermore, many database systems can be configured so that the data is safely persisted on multiple servers before ACKing, and can therefore survive even the loss of a server.
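
To make the difference concrete, here is a toy model of "ACK, then persist later" versus "persist, then ACK". It models the general pattern only and is not a description of Redis internals; `ToyLog` and its methods are invented for this sketch, and `flush` merely stands in for a write plus fsync.

```python
class ToyLog:
    """Toy append-only log: `buffer` lives in memory, `disk` survives a restart."""

    def __init__(self, sync_before_ack):
        self.sync_before_ack = sync_before_ack
        self.buffer = []
        self.disk = []

    def write(self, entry):
        self.buffer.append(entry)
        if self.sync_before_ack:
            self.flush()
        return "OK"  # the ACK the client sees

    def flush(self):
        # Stand-in for write + fsync.
        self.disk.extend(self.buffer)
        self.buffer.clear()

    def restart(self):
        # The process dies: anything not yet flushed is gone.
        self.buffer.clear()
        return list(self.disk)


def run(sync_before_ack):
    log = ToyLog(sync_before_ack)
    acked = []
    for i in range(5):
        entry = f"SET k{i}"
        if log.write(entry) == "OK":
            acked.append(entry)
        if i == 2:
            log.flush()  # the periodic background flush fires here
    survived = log.restart()
    # ACKed writes that did not survive the restart.
    return [entry for entry in acked if entry not in survived]


lost_periodic = run(sync_before_ack=False)  # ['SET k3', 'SET k4']
lost_synced = run(sync_before_ack=True)     # []
```

In the periodic case the client was told "OK" for two writes that no longer exist after the restart, which is exactly the window being argued about in this thread.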


> 11. Storing JSON blobs in a string

Oh boy. Turns out 95% of the usage of Redis is an anti-pattern.


Eh, "anti-patterns" are like "best practices". It really all depends on the actual situation (and hard data). I like to think that programmers get paid the big bucks to actually think, not just do what a blog post tells us to do (and this one seems to mostly be telling us to pay for Redis Enterprise).


> I like to think that programmers get paid the big bucks to actually think, not just do what a blog post tells us to do

I feel this is a very cynical take on the reality of the software development world.

How much "thinking" anyone does or can do is completely irrelevant if they have no info to start with. "Thinking" only works if either a developer is omniscient, or managed to gather enough reliable info to be able to make decent decisions. As omniscience is not a thing, the next best thing is to gather reliable info from reputable sources.

Now, pray tell, who is expected to have more insight into how to use Redis the right way than the Redis guys?

I understand that there is a tendency for software developers to think everyone else is an idiot and they are the only ones whose own decisions are good and well justified, but that's just your narcissism getting the best of you. Everyone is operating on a time budget to onboard onto things and deliver results, and if you don't have the benefit of being omniscient or drawing from the prior experience of someone who happened to do things in an acceptable manner then the best thing you can do is read "blog posts" from the project showcasing best practices, isn't it?


This comment is quite ironic: a condescending comment against a perceived condescending comment.

To answer your question.

In marketing there is a rule, the more your audience knows the less effective marketing is.

Trying not to come across as condescending: if you study CS data structures and distributed computing, then the first comment is obvious.


I agree. I much prefer the term "convention" or in specific cases "style guide".

It is more honest to call it that and more _useful_, because it is much clearer that you're following a convention, what the positives are and specifically what the grounds for discussion around it are.


HASH sounds pretty cool though. Also is "marshalling" JSON really a term that people use instead of deserializing?


Marshaling is serializing.


Based on a Google search of the term, it seems that Go uses that terminology, but others might not...


> In computer science, marshalling or marshaling (US spelling) is the process of transforming the memory representation of an object into a data format suitable for storage or transmission

https://en.m.wikipedia.org/wiki/Marshalling_(computer_scienc...


Immediately followed by

> Marshalling can be somewhat similar or synonymous to serialization. Marshalling is describing an intent or process to transfer some object from a client to server, intent is to have the same object that is present in one running program, to be present in another running program, i.e. object on a client to be transferred to and present on the server. Serialization does not necessarily have this intent since it is only concerned about transforming data into a, for example, stream of bytes. One could say that marshalling might be done in some other way from serialization, but some form of serialization is usually used.[1]

I'm most familiar with the term "marshaling" from COM, where it is quite distinct from serialization.


In Java-land, marshalling is used in xml libraries, and serializing in json libraries. Wonder why they ended up differently.


Python calls it "pickling"[0] which is also pretty cool.

While others have already given examples re "marshalling" I'll go ahead and add one to the pile: Ruby also uses this term[1]. However, in my ~10 years of working with Ruby daily, I have never Marshalled anything, nor have I known anyone who has. On the contrary, the few pythonists I talk to on the regular seem to be pickling all the time.

...but I seem to be maybe getting off track here.

[0] https://docs.python.org/3/library/pickle.html

[1] https://ruby-doc.org/core-2.6.3/Marshal.html


Pickling is just one specific type of marshalling, so the terms are not equivalent. I'm not familiar with Ruby, but from your link it's clear that the Marshal library is also for one specific marshaling format.

By the way, pickling is great for transmitting data at runtime between processes that you own, because it can marshal almost any Python object and it's very fast. But it's bad at storing data (on disk or in a database) because it can execute arbitrary code and the format of the objects can change between package releases. That's quite different from marshaling as JSON, which is a very stable format supporting a deliberately restricted set of types.
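
A small sketch of the "can execute arbitrary code" part, using only the standard library. The `Payload` class is made up for the demo, and `eval("6 * 7")` is a harmless stand-in for whatever a hostile blob would really run.

```python
import json
import pickle


class Payload:
    def __reduce__(self):
        # Tells pickle: "to rebuild me, call eval('6 * 7')".
        # The callable and its arguments are chosen by whoever wrote the blob.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())
loaded = pickle.loads(blob)  # runs eval("6 * 7") while unpickling

# JSON only knows objects, arrays, strings, numbers, booleans and null,
# so loading it cannot trigger a call like the one above.
doc = json.loads(json.dumps({"count": 1, "tags": ["a", "b"]}))

try:
    json.dumps(Payload())
    json_accepts_arbitrary_objects = True
except TypeError:
    json_accepts_arbitrary_objects = False
```

Here `loaded` is the result of the `eval` call rather than a `Payload` instance, which is why unpickling data you did not produce yourself is risky.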


You're right—my bad.


Typically, one wouldn't use the term 'pickling' for JSON, however; it's usually used for a more general serialization process. Most of the time I just hear people talk about 'loading' and 'dumping' JSON, also from the Python JSON library.


C++/COM used the term marshalling a lot if I recall.


The term is quite commonly used in Java world for serialization, mainly DTOs (e.g. JAXB).


> Hence, it is recommended to use HASH data structure and also, RedisJSON module.

There's a catch. HASH data structure saves everything as STRING. Your {"count": 1} would turn into {"count": "1"}

> Hence, it is recommended to use a SORTED SET

SORTED SET will eventually lead to "Large databases running on a single shard"


Eh, I use hincrby and hincrbyfloat to increment an integer in a hash..


I think the point was that you lose the type information provided by JSON when you store the data in a hash. There is no way to tell the difference between a string "1" and an integer 1 when stored as values in a Redis Hash, since the integer will be converted to a string.
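
A quick way to see it without a server, assuming hash field values come back as plain strings (real clients may hand you bytes instead). `as_hash_fields` is a made-up helper that only mimics the flattening; it is not a client API.

```python
import json


def as_hash_fields(obj):
    """Mimic storing a flat object in a Redis hash: every value becomes a string."""
    return {field: str(value) for field, value in obj.items()}


original = {"count": 1, "label": "1"}

from_hash = as_hash_fields(original)          # roughly what HGETALL would return
from_json = json.loads(json.dumps(original))  # what GET + json.loads would return

# from_hash == {"count": "1", "label": "1"}  -> the int and the string look identical
# from_json == {"count": 1, "label": "1"}    -> the types survive the round trip
```

So HINCRBY works fine on the hash, but a reader of the hash has to know out of band that `count` is meant to be a number.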


Please and for the love of all that is good, use a proper database to store sensitive and critical data.


The fixed title and search box at the top take up an awful lot of prime screen area on mobile. Made it really hard to read. Redis doesn't need to advertise to shoulder surfers on the subway.

Edit: to comment on the content, the last two points seem to say don't store JSON as strings, and then DO store JSON as strings. Is there some more clarification out there on when to use either approach?


There were a number of sentences that just didn’t make sense to me. Not sure if it’s cuz I’m tired or if it was written by an ESL person or what.


Yeah, I assume English was not the author's first language, but tried to read through it charitably with that in mind.


Still can't believe Redis doesn't support TTL on items inside a list. A queue of expiring items. It's the only use case I've ever had for the thing.


We ended up implementing that in KeyDB. The Redis team always thought it would overcomplicate the code, but it’s been a long standing request.


> Running Ephemeral Redis as a primary database

s/Ephemeral//

Redis is a fine cache, but using it as the primary data store is the way of pain.


+1 in general.

However, we used Redis as a session store and we were OK with users going through the pain of re-login if Redis lost the data.

So, in a way I guess the use of the word "Ephemeral" is correct IMO.


I hate that they keep repeating the "Most Loved Database" awards. I wonder how many people love Redis as a Cache, but not as a main database.


Missing dark pattern: Ad hoc schemas.

The handful of Redis-using projects I've joined all had values referencing other keys. And logic that depended on those references. So much needless heartache.

Redis is key/value, aka NoSQL. Meaning no schema, no references.

Initially, I wanted to hate Redis. Despite its widespread misuse, I came to love it. Like with many things, looking past the fans to see a thing's true nature takes some work.


Can't believe Redis doesn't support TTL for items in a list. It's literally the only thing I've needed from Redis.


Curious to know what your use-case is for this feature


Work queue with per item deadline?
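
For what it's worth, a common workaround is a sorted set scored by deadline instead of a list. Below is a rough in-memory sketch of that idea; `DeadlineQueue` is invented for illustration, with `push` playing the role of ZADD, `purge_expired` of ZREMRANGEBYSCORE and `pop` of ZPOPMIN.

```python
import bisect


class DeadlineQueue:
    """In-memory stand-in for a Redis sorted set scored by deadline."""

    def __init__(self):
        self._items = []  # kept sorted as (deadline, item)

    def push(self, item, deadline):
        bisect.insort(self._items, (deadline, item))

    def purge_expired(self, now):
        # Drop every item whose deadline has passed and report what was dropped.
        expired = [item for deadline, item in self._items if deadline <= now]
        self._items = [(d, i) for d, i in self._items if d > now]
        return expired

    def pop(self, now):
        # Hand out the live item with the nearest deadline, or None if empty.
        self.purge_expired(now)
        if not self._items:
            return None
        _, item = self._items.pop(0)
        return item


q = DeadlineQueue()
q.push("resize-image", deadline=10)
q.push("send-email", deadline=5)
q.push("rebuild-index", deadline=20)

first = q.pop(now=7)               # "send-email" expired at 5 and is skipped
dropped = q.purge_expired(now=25)  # ["rebuild-index"]
empty = q.pop(now=30)              # None
```

The catch is that items come out in deadline order rather than insertion order, so it is not a drop-in replacement for a list with per-item TTL.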


The comment section seems to hate on Redis Enterprise, but FYI ReJSON and RediSearch are open source and are good addons / plugins.

Why implement a JSON parser in your code when Redis can take care of it? And with search added you don't have to keep track of multiple ordered sets or secondary indices. You can insert a JSON document and do all sorts of operations on it (basic NoSQL operations).

I think it improves dev-ex a lot. Can't wait for Redis + JSON + Search to be production ready


> ReJSON and RediSearch are open source

They're not. I just checked.



As I said, I checked them. They're Source Available. Not Open Source. Read the licenses in the repos.


Ah, my bad. Looks like they are not open source. But you can use them if you are not building a database product (as mentioned in their license). Here the downside is we can't have a managed service. (Or maybe we can if we can load the module to a managed service, haven't tried that)


We ended up implementing signaling of hotkeys in our fork of Redis. Is the CLI thing really that useful for anything beyond confirming a key is hot?


I can't come up with a concrete use case for using multiple keys to access the same data (#8).

I'm trying to wrap my head around an example of a single value that would be accessed ~1-million times a second that would be cached and served by Redis, and not cached closer to the consumer (such as a web server).


One type of data that can beneficially be exposed under different keys is information pertaining to financial instruments (e.g. instrument reference data or market quotes). Some users may wish to look them up using an ID assigned by Bloomberg (a FIGI). Some other users may instead be familiar with an ID assigned by Reuters (a RIC). Or an ISIN, a CUSIP, or a SEDOL, and so on. In some cases it's simpler to store the data under multiple keys than to build "joins" into the client (given we don't have the luxury of RDBMS SQL logic).


Financial data is a great example. Occasionally companies change their ticker symbol. Multi-key access to the same data means that everything already built on top of it (sharing, generating graphs, etc.) keeps working against the same data the very next morning when FOO changes to BAR.


They don't mean have multiple keys storing the same data, but rather create your data schema in such a way that keys are accessed more uniformly.


Can you explain? I'm not seeing how you get that from the anti-pattern.


Who says it’s an external consumer asking for the data? Lots of backend services have asynchronous processes or batch jobs that are themselves essentially the consumer.


Right.

And it's not uncommon for backend services to access internal data via REST endpoints.


But do I want to store data in my microservices? Esp if I've scaled them horizontally. Maybe I have 4 or 8 or 20 instances, I don't want 20 copies of my cache. I want one copy in Redis. I might also want shared evictions/TTLs or other semantics on the data.


> But do I want to store data in my microservices?

That is a trade-off, and gets to the meat of my question.

If I understand the anti-pattern recommendation: if you have a single data item that needs to be accessed at a very high throughput rate and you choose to use Redis, it's best to have n copies indexed by different keys, so that the data is distributed across multiple shards and can reach that throughput.

The tradeoff would be between having one copy of the data (a single key in Redis) plus the complexity of caching it locally, OR having it cached in Redis plus the complexity of managing multiple copies of the same data in Redis with equally distributed keys.

My original question: what would be a use case for choosing the latter?
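
For reference, the "n copies under different keys" approach might look roughly like this. The dict stands in for a cluster, and the key naming (`hot:item:<i>`) and helper functions are assumptions made for the sketch; the slot function follows the Redis Cluster rule of CRC16 (XMODEM) mod 16384, ignoring hash tags.

```python
import binascii
import random

SLOTS = 16384  # number of hash slots in Redis Cluster


def hash_slot(key):
    # CRC16 (XMODEM variant) of the key, modulo the slot count.
    # Hash tags ({...}) are ignored here for brevity.
    return binascii.crc_hqx(key.encode(), 0) % SLOTS


def replica_keys(base, copies):
    return [f"{base}:{i}" for i in range(copies)]


def write_all(store, base, value, copies):
    # Every write has to touch all copies.
    for key in replica_keys(base, copies):
        store[key] = value


def read_one(store, base, copies, rng=random):
    # Each reader picks a copy at random to spread the load.
    return store[f"{base}:{rng.randrange(copies)}"]


store = {}  # dict standing in for the cluster
write_all(store, "hot:item", "payload", copies=8)
slots = {key: hash_slot(key) for key in store}
value = read_one(store, "hot:item", copies=8)
```

The copies land in different hash slots, but whether they end up on different shards depends on how slots are assigned in the cluster, and every update now costs n writes.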


Oh ya good point. I understand what you’re getting at now.


Besides the KEYS command, there are other commands that can take a long time to run and should be avoided, such as storing tens of millions of items in a set and then deleting the whole set with a single delete command.


This is an excellent advertorial for Redis Enterprise.



