Hacker News
Looking for a technical debate: Why NOT use Redis as a primary DB?
5 points by node-bayarea on July 9, 2021 | 14 comments
I'm looking for a technical debate, and I know what MySQL/PG and Redis are capable of. What I'm looking for is a debate where you *disagree* with the following. I want to start with an analogy.

Gasoline cars vs. electric cars

We all know that you can use an electric battery to run a car.

But the thing is, although a gasoline car does literally have a battery, the battery isn't used to move the car. It's used to start the engine (generating an electric spark to ignite the gas) and to power the A/C, audio system, lights, sensors, locks, and so on. To actually move, the car relies on an internal combustion engine (ICE).

It turns out that ICE cars are highly inefficient: only 16% to 25% of the energy generated actually makes it to the wheels. Electric vehicles, on the other hand, deliver about 90% to the wheels! EVs also have additional major advantages when it comes to the environment, repair costs, and so on.

Looking at it from First Principles: even though most cars built even today are gasoline cars, the fundamental truth is that they use an inefficient system.

Now if you look at an electric car, it takes advantage of this inefficiency to build a new type of car. In this case, it simply gets rid of the complex and highly inefficient engine and replaces it with a large battery and a motor to directly spin the wheels.

------------- Now, on to databases...

In the traditional architecture, you have a primary database (Postgres, Mongo, etc.) and a secondary database, a.k.a. a cache (e.g. Redis or Memcache). The primary DB is used to store all the data and support CRUD operations. The cache is used for caching, session storage, rate limiting, IP whitelisting, Pub/Sub, queuing, and many other things.

And if you think about it, when there is a cache-hit, we are practically using the secondary DB for part of the CRUD operations, but still not fully utilizing it as a primary database.
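The read path described above (usually called cache-aside) can be sketched as follows. This is a minimal sketch under stated assumptions: both stores are plain Python dicts standing in for a primary DB and a cache, the class and method names are hypothetical, and no real Postgres or Redis client is involved. On a hit, the cache alone answers the read; on a miss, the primary answers and the cache is populated.

```python
"""Toy model of the cache-aside read path.

Both stores are plain dicts standing in for a primary DB (e.g. Postgres)
and a cache (e.g. Redis); names are hypothetical, no real DB is used.
"""
from typing import Dict, Optional


class CacheAsideReader:
    def __init__(self, primary: Dict[str, str]) -> None:
        self.primary = primary            # source of truth (hypothetical)
        self.cache: Dict[str, str] = {}   # secondary store, starts empty
        self.hits = 0
        self.misses = 0

    def read(self, key: str) -> Optional[str]:
        # Cache hit: the secondary store serves the "R" of CRUD by itself.
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        # Cache miss: fall back to the primary DB, then populate the cache.
        self.misses += 1
        value = self.primary.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def write(self, key: str, value: str) -> None:
        # Writes go to the primary; the cached copy is invalidated, not updated.
        self.primary[key] = value
        self.cache.pop(key, None)


if __name__ == "__main__":
    reader = CacheAsideReader({"user:1": "alice"})
    print(reader.read("user:1"))  # miss: served by the primary
    print(reader.read("user:1"))  # hit: served by the cache
    print(reader.hits, reader.misses)
```

Invalidating on write (rather than updating the cache in place) is one common choice; it keeps the primary as the single source of truth, which is exactly the role the post proposes to remove.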

Does this remind you of the issue with gasoline cars? Just as they carry a battery that powers numerous things except moving the car, traditional architectures use things like Redis for everything except as the main DB.

Do you see the similarities?

What if we use the First Principles thinking to do what the electric car did? Similar to how EVs got rid of the engine, what if we get rid of the slow and inefficient primary database and simply use the cache DB as the main database?

https://redislabs.com/blog/dbless-architecture-and-why-its-the-future/



One word: rollbacks

That's not really much of a "debate", but then again neither is "you totally don't need that feature we don't implement" from the marketing department :P

https://redislabs.com/blog/you-dont-need-transaction-rollbac...


I left Redis Labs a while ago and have been openly speaking my mind when it comes to the good and bad of my experience at the company.

https://kristoff.it/blog/addio-redis/

I stand by my words, and in fact a good chunk of the company was not happy with my reasoning in that post (according to some schools of thought, it's "better" to be coy than upfront about these things). That said: you don't need rollbacks in Redis. Whoever argues the opposite has never spent enough time learning how to use it.
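One way to picture working without rollbacks is Redis-style optimistic locking: WATCH the keys, read and validate, queue the writes, and EXEC. If a watched key changed in the meantime, EXEC discards the whole transaction and the client retries, so no partial write ever needs undoing. Note that real Redis also does not roll back when a queued command fails at runtime inside EXEC; the other commands still run, which is why validation happens before queuing. The sketch below is a hypothetical, single-process toy (`ToyRedis` and `transfer` are made-up names) that models these semantics; it is not the redis-py API.

```python
"""Toy, single-process model of Redis-style WATCH/MULTI/EXEC.

NOT redis-py: it only mimics the semantics. A transaction is discarded
(never partially applied) if a watched key changed after WATCH.
"""
from typing import Callable, Dict, List, Optional, Tuple


class ToyRedis:
    def __init__(self) -> None:
        self.data: Dict[str, int] = {}
        self.versions: Dict[str, int] = {}  # bumped on every write

    def get(self, key: str) -> int:
        return self.data.get(key, 0)

    def set(self, key: str, value: int) -> None:
        self.data[key] = value
        self.versions[key] = self.versions.get(key, 0) + 1

    def watch(self, *keys: str) -> Dict[str, int]:
        # Snapshot the keys' versions, like WATCH remembering their state.
        return {k: self.versions.get(k, 0) for k in keys}

    def execute(self, watched: Dict[str, int],
                queued: List[Tuple[str, int]]) -> bool:
        # EXEC: if any watched key changed, nothing at all is applied.
        if any(self.versions.get(k, 0) != v for k, v in watched.items()):
            return False
        for key, value in queued:  # otherwise apply every queued SET
            self.set(key, value)
        return True


def transfer(db: ToyRedis, src: str, dst: str, amount: int,
             interleave: Optional[Callable[[], None]] = None) -> int:
    """Move `amount` between hypothetical balance keys; return attempts used."""
    attempts = 0
    while True:
        attempts += 1
        watched = db.watch(src, dst)
        if db.get(src) < amount:
            # Validated before anything is queued, so there is nothing to undo.
            raise ValueError("insufficient funds")
        queued = [(src, db.get(src) - amount), (dst, db.get(dst) + amount)]
        if interleave is not None and attempts == 1:
            interleave()  # simulate another client writing between WATCH and EXEC
        if db.execute(watched, queued):
            return attempts
```

Usage: after `db.set("acct:a", 100)`, calling `transfer(db, "acct:a", "acct:b", 30)` succeeds on the first attempt; if another write to `acct:a` lands between WATCH and EXEC, the first attempt is discarded and the retry recomputes from fresh values.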


I don't think that's the spirit of the blog post you pointed out. Redis is quite different from traditional databases, so things work differently: some features that are important in an RDBMS might not be that useful in Redis.

I think the blog covers that in two sections, "First reason to use rollbacks: concurrency" and "Second reason to use rollbacks: leveraging index constraints", which explain why things are different in Redis.
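For the index-constraint case, a common Redis pattern is to claim the unique value first with an atomic `SET key value NX` (only set if the key does not exist) and write the record only if the claim succeeds; a duplicate is then rejected before anything else is written. The sketch below is a hypothetical single-process toy (a dict plays the keyspace, and `ToyKeyspace`, `register_user`, and the key names are made up); it is not necessarily what the linked post proposes.

```python
"""Toy model of enforcing uniqueness without a UNIQUE index or rollback.

In Redis the claim step would be one atomic `SET key value NX`; here a
dict plays that role in a single process (hypothetical key names).
"""
from typing import Dict


class ToyKeyspace:
    def __init__(self) -> None:
        self.data: Dict[str, str] = {}

    def set_nx(self, key: str, value: str) -> bool:
        # Mimics SET ... NX: writes only if the key does not exist yet.
        if key in self.data:
            return False
        self.data[key] = value
        return True


def register_user(ks: ToyKeyspace, user_id: str, email: str) -> bool:
    # 1. Claim the unique value first: this acts as the "index constraint".
    if not ks.set_nx(f"email:{email.lower()}", user_id):
        return False  # duplicate: nothing was written, so nothing to roll back
    # 2. Only then write the record itself.
    ks.data[f"user:{user_id}:email"] = email
    return True


if __name__ == "__main__":
    ks = ToyKeyspace()
    print(register_user(ks, "1", "a@example.com"))  # True
    print(register_user(ks, "2", "A@example.com"))  # False: already claimed
```

If the second write could fail independently (say, on a different node), the claim key would need cleaning up; this sketch ignores that case.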

Going back to the car analogy: if you are building an electric vehicle, some things are completely obsolete compared with a gasoline vehicle.


Generally the idea of caches is that they contain a small subset of all your data in a type of storage that's more expensive, but faster. The extra complexity of primary databases is largely down to the fact that they have to use disk storage.

BTW, you're way off the mark with the numbers in your car analogy! I know the precise numbers are not the point, but it's really not 90% vs 25% efficiency. The oft-quoted figure of 25% efficiency refers to thermal efficiency. It's a measure not of how much power is wasted by the car, but how much of the energy in the fuel is turned into useful work. To compare electric cars, you have to consider how the electricity is generated. It's mostly generated by burning fossil fuels. It's more efficient than petrol cars because it's done at such a large scale, but it's not 90%. It's more like 40-50% afaik.
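The commenter's point is that efficiencies along the chain multiply. As an illustration only, take the 90% in-vehicle figure from the original post and assume a generation efficiency of about 50% (an assumed value; real grids vary, and transmission and charging losses are ignored here):

```latex
\eta_{\text{wheels}} = \eta_{\text{generation}} \times \eta_{\text{vehicle}} \approx 0.50 \times 0.90 = 0.45
```

Under those assumptions the overall figure lands in the 40-50% range mentioned above, rather than 90%.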


Hi there, I agree that traditionally (and even now) that's how caches work. But systems like Redis that provide that service have gotten much more powerful over the years, so you can use them as a traditional DB.

The car efficiency numbers came from Inside EVs as the article points out. I think they did a pretty thorough job. https://insideevs.com/features/392202/ice-vs-ev-inefficient-...


What you're describing sounds a lot like why NoSQL databases came into being. A cache is usually very fast at a very narrow data problem. Caches are almost always flat bits of data with no structure. They're also eventually consistent. They are purpose-built data structures for a very small piece of the problem, done over and over.

An RDBMS is a centralized, consistent source of truth, with more normalized data, that can answer most questions with reasonable performance.

When you remove the RDBMS, what happens?

- You lose the ability to answer ad-hoc relational questions (what stuff does the user own? And of those things, which are located in Chicago?). Building a new feature then means building a new "cache", aka a brand new data structure _just for this one use case_, which might be a one-off.

- You lose a centralized PoV on your data consistency. One view of the cache says the user's item is in Chicago. Another says it's en route to LA from Chicago. How do you resolve these conflicts? Are you going to build your own consistency systems? Based on what exactly? Now you've essentially built a new kind of distributed database.

- Much of the time we don't need caches. If you always have to bust your cache, the cost of constantly rebuilding a cache, just to throw it away, can greatly exceed any value from that cache. Most people can just read from the RDBMS and get what they need for 90% of the use cases in most apps.
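To make the first point concrete: in an RDBMS, the example question above is a single ad-hoc query over normalized tables, with no new data structure required. A minimal sketch using Python's built-in `sqlite3` module with an in-memory database; the two-table schema and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical schema: users own items, and each item has a location.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE items (
        id INTEGER PRIMARY KEY,
        owner_id INTEGER NOT NULL REFERENCES users(id),
        name TEXT NOT NULL,
        city TEXT NOT NULL
    );
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO items VALUES
        (10, 1, 'bike', 'Chicago'),
        (11, 1, 'sofa', 'Los Angeles'),
        (12, 2, 'desk', 'Chicago');
    """
)


def items_owned_in_city(user_name: str, city: str) -> list:
    # "What does the user own, and which of it is in Chicago?" is answered
    # with a JOIN plus a filter; nothing had to be planned for this query.
    rows = conn.execute(
        """
        SELECT i.name FROM items AS i
        JOIN users AS u ON u.id = i.owner_id
        WHERE u.name = ? AND i.city = ?
        ORDER BY i.name
        """,
        (user_name, city),
    ).fetchall()
    return [name for (name,) in rows]


print(items_owned_in_city("alice", "Chicago"))
```

A new question (say, items by city across all users) is just another query against the same tables.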

BTW, I used Redis as a primary DB for a few years for Quepid. You can read the full story here:

https://www.slideshare.net/AllThingsOpen/stop-worrying-love-...

Long story short, it was fine, but Redis didn't allow for much structure. There was a lot of "logical" relational data modeled very awkwardly. It made it hard to extend beyond the original data model.
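For contrast, here is a rough sketch of how the same "user's items in a city" question tends to look in a key-value model: each access path is a separately maintained set, and every write must update all of them by hand. A `defaultdict` of Python sets stands in for Redis sets (roughly `SADD`, `SREM`, `SINTER`); the key names and functions are hypothetical, and this is not how Quepid's data was actually modeled.

```python
from collections import defaultdict
from typing import DefaultDict, Set

# Dict-of-sets stand-in for Redis sets; key names are hypothetical.
sets: DefaultDict[str, Set[str]] = defaultdict(set)


def add_item(item_id: str, owner: str, city: str) -> None:
    # Every query path needs its own index, written by hand (like SADD).
    sets[f"user:{owner}:items"].add(item_id)
    sets[f"city:{city}:items"].add(item_id)


def move_item(item_id: str, old_city: str, new_city: str) -> None:
    # Forgetting either line leaves the indexes inconsistent (SREM + SADD).
    sets[f"city:{old_city}:items"].discard(item_id)
    sets[f"city:{new_city}:items"].add(item_id)


def owned_in_city(owner: str, city: str) -> Set[str]:
    # Works only because both indexes were planned in advance (like SINTER).
    return sets[f"user:{owner}:items"] & sets[f"city:{city}:items"]


add_item("bike", "alice", "Chicago")
add_item("sofa", "alice", "Los Angeles")
add_item("desk", "bob", "Chicago")
move_item("bike", "Chicago", "Los Angeles")
print(sorted(owned_in_city("alice", "Los Angeles")))
```

Any question the indexes were not designed for (items by price, say) needs yet another hand-maintained structure, which is the extensibility problem described above.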



Redis, at least in the early versions (I haven't upgraded since 2018), doesn't deal with bloat very well. If you keep adding and deleting items, the memory it uses could easily be twice what the same data would take if you just dumped everything to a file.


Do you mean there is some kind of memory leak? Maybe you should try upgrading to the latest Redis. It's a lot more feature rich and powerful these days.


can you fit all data in memory?


Yes. If you look at very large customers of ElastiCache or Redis, or just companies like Twitter, they all use very large Redis clusters to store terabytes of data. You can also use "Redis on Flash" and save a lot of money compared to DynamoDB and others for a similar size of data. https://redislabs.com/redis-enterprise/technology/redis-on-f...
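Storing terabytes means spreading keys across many nodes. As a sketch of how Redis Cluster decides where a key lives: per the Redis Cluster specification, the key (or only its non-empty `{hash tag}` section) is hashed with CRC16 (XMODEM variant) and taken modulo 16384 to pick a hash slot, and slots are assigned to nodes. This is an illustrative reimplementation, not a client library, and it says nothing about how Redis on Flash or ElastiCache work internally.

```python
"""Sketch of how Redis Cluster maps a key to one of 16384 hash slots.

Follows the Redis Cluster specification: CRC16 (XMODEM) of the key, or
of its non-empty {hash tag}, modulo 16384.
"""


def crc16_xmodem(data: bytes) -> int:
    # Polynomial 0x1021, initial value 0, no reflection, no final XOR.
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_hash_slot(key: str) -> int:
    # If the key has a non-empty {...} section, only that part is hashed,
    # so related keys can be forced onto the same slot (and node).
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384


print(key_hash_slot("{user1000}.following") == key_hash_slot("{user1000}.followers"))
```

This is also why data modeling matters at that scale: multi-key commands in Redis Cluster require all keys to hash to the same slot, so related keys have to share a hash tag.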


How do you handle node failure? What about data persistence?

Data integrity without ACID?

Complex queries?


Don’t use Redis for those requirements.

For node failure, use Sentinel or Cluster; it works great.


You probably want ClickHouse.



