> JunoDB is unmatched when it comes to meeting PayPal’s extreme scale, security, and availability needs.
It would be nice to see some benchmarks, or just a mention of any kind of number. TiKV is a CNCF-donated project with roughly the same architecture and has been deployed in clusters larger than 200 nodes.
I've only ever used SQL and relational databases. What are the use cases of key-value stores? What's the canonical example of where they are clearly the right solution?
In a nutshell, you trade away the various guarantees relational databases provide, like transactions, relational integrity, and high levels of queryability, in exchange for higher performance on what a key/value store can do, simplicity of API, and some things that can be easier for the system to provide precisely because it is offering fewer guarantees. Particularly distribution... SQL is hard to distribute precisely because of the guarantees it offers.
When to use it? You must be sure you have a case where the additional guarantees relational databases provide are not necessary, and you need either simplicity of usage or simplicity of deployment. Or, in rare cases, speed; but whereas most people seem to act as if this is the main reason, I consider it a relatively poor one. A relational DB with its guarantees turned off (i.e., no relations, no transactions, tables with just a key and a value) performs fairly close to a key/value store for a wide range of scaling needs. There is a top end where this matters, but fewer programmers need this than think they need this. Still, it is a valid concern, and if you don't need relational guarantees it may let you scale down the instance size.
Most of the cases I see where key/value stores are a good choice relate more to the simplicity than performance. I love me my Postgres, but nothing compares to the simplicity of just tossing up a Memcache somewhere and solving my problem, if that's what my problem calls for. No schema, no migrations, an API so simple my local programming language may well simply integrate it with my native associative array syntax... if you are careful to use it only where you aren't going to need relational functionality the bang for the buck can be very nice.
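To make that concrete, here's roughly what that simplicity looks like in Python with a plain memcached client (a minimal sketch using pymemcache; the key name and TTL are made up):

```python
# A minimal sketch of the "no schema, no migrations" appeal.
# Assumes a memcached instance on localhost:11211 and pymemcache installed.
from pymemcache.client.base import Client

cache = Client(("localhost", 11211))

# Store and fetch a value as if it were a native associative array.
cache.set("user:42:display_name", "Ada Lovelace", expire=3600)
name = cache.get("user:42:display_name")  # b"Ada Lovelace", or None on a miss
```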
I'm a big fan of relational DBs. But I do still have a Redis instance running in my environment. I know that my relational DB can easily do what Redis is doing but there are two big reasons that I will disclose below for why I still use Redis.
1) The system that is reading the data from Redis is a front-end system. I have concerns with security. As a front-end system, it can be accessed from the Internet and potentially hacked. As such, I make sure this system does not have access to the DB.
2) Preserve DB connections, size (which affects the cost of storing backups and the time to hydrate), CPU, and memory. I would really like to keep my DB as trim and spry as possible. I am using a traditional relational DB, not a distributed one, so in an effort to keep my DB-management complexity down, I offload some work to Redis. I could have created a second relational DB with low security requirements and no relational object mapping, but I didn't think of that till now (I will explore this further in the next few days). In my case the data stored in Redis is accessed a lot and consists of data from multiple tables, so it was a good pick for moving to Redis. Performance-wise, I suspect the relational DB could perform just as fast, but again I want to offload traffic from the DB so I don't have to grow it or go distributed. If I were a better DB admin, I could probably have created views or used other relational DB features. But I felt it was easier to just let my API backend code (which has access to data that is not in the DB or might need to be formatted or calculated) construct the final data object and store a copy in Redis whenever the data is updated.
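That pattern looks roughly like this in Python with redis-py (a hedged sketch; the key scheme, fields, and function names are all hypothetical):

```python
# Hypothetical sketch: the API backend assembles data from multiple tables
# (plus computed fields), then writes the finished object to Redis so the
# front-end system never touches the relational DB directly.
# Assumes redis-py and a Redis instance on localhost.
import json
import redis

r = redis.Redis(host="localhost", port=6379)

def publish_profile(user_id: int, rows_from_db: dict) -> None:
    # Combine relational data with values computed in application code,
    # then store the finished object under a single key.
    profile = {**rows_from_db, "badge_count": len(rows_from_db.get("badges", []))}
    r.set(f"profile:{user_id}", json.dumps(profile))

def read_profile(user_id: int) -> dict | None:
    raw = r.get(f"profile:{user_id}")
    return json.loads(raw) if raw else None
```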
Great writeup. I use a KV store for work, but only because the alternative for our data size (>1PB, >1 trillion keys), sharded SQL, is totally awful. Distributed KV stores like FoundationDB and DynamoDB actually do offer transactions, which is a huge win and, I think, their main selling point.
I guess I'm like you - I really don't understand why someone who had another choice would opt in to a KV store (barring something like memcache, or something really high performance using an embedded KV store).
It'd be pretty silly to use a relational database for something that trivially shards across servers, often doesn't have any consistency requirements, and only needs 3 columns (request, response, expiry time).
I don't think it's even possible to use postgres as a traditional cache. E.g., this trigger looks extremely slow, and I would be unsurprised if a human holding ctrl+shift+r could single-handedly DoS a cache like this: https://stackoverflow.com/questions/26046816/is-there-a-way-...
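For contrast, the native-expiry version of that three-column cache is a one-liner in most KV stores; a hedged redis-py sketch (the key scheme and 300-second TTL are invented):

```python
# Sketch of a request/response cache with native expiry in Redis.
# Assumes redis-py; key naming and TTL are illustrative only.
import redis

r = redis.Redis()
r.set("cache:GET:/api/items", b"<serialized response>", ex=300)  # store handles expiry
cached = r.get("cache:GET:/api/items")  # None once the TTL lapses
```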
> I don't think it's even possible to use postgres as a traditional cache.
The link I posted shows that you can use Postgres as a traditional cache. Specifically, the SO link showing that Postgres cannot do caching (expiring old data) is explicitly addressed in the link I posted.
One would have to go out of their way to miss the point. And miss it entirely.
Theoretically possible is a long way from practical. That's my point. It would take quite a bit of work to do even a subset of what most other solutions offer out of the box, and at most likely much lower performance (the reason to use a cache in the first place).
You can put fucking text files on disk as a cache and run `find -atime X -delete` to clear them; that doesn't mean you should.
That link just says "Use Postgres for caching instead of Redis with UNLOGGED tables". That's not helpful and going one link deeper, it says "reports range from 10% to more than 90% in speed gain" which is not that fast.
> the SO link showing that Postgres cannot do caching (expiring old data) is explicitly addressed
Nope, UNLOGGED tables don't expire old data. They just fill up forever unless you use a trigger like the one in the answer I linked.
I also don't see anyone claiming they can saturate their network interface with postgres the same way they can with any KV store.
I'm sure there's a rate of concurrent writes where that wouldn't be true anymore, but if so, you're likely not going to be benefiting that much from the approach to caching.
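For concreteness, here is roughly what the minimum Postgres-as-cache setup involves; a sketch in Python with psycopg2, where the table name and the periodic-DELETE sweep (a stand-in for the trigger approach linked above) are assumptions:

```python
# Sketch of Postgres-as-cache: an UNLOGGED table skips WAL writes for speed,
# but nothing expires automatically; expiry must be done by hand (here, a
# periodic DELETE). Assumes psycopg2 and a reachable Postgres instance.
import psycopg2

conn = psycopg2.connect("dbname=app user=app")
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE UNLOGGED TABLE IF NOT EXISTS kv_cache (
            key        text PRIMARY KEY,
            value      bytea,
            expires_at timestamptz
        )
    """)

def sweep_expired() -> None:
    # The part Redis gives you for free: something has to run this regularly.
    with conn, conn.cursor() as cur:
        cur.execute("DELETE FROM kv_cache WHERE expires_at < now()")
```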
The ideal use-case is when there is one, and only one, property used for retrieval which is guaranteed unique by the system. Less ideal, but often very performant, is when retrieval always uses one property which may not be unique.
Once retrieval requires anything other than a single predefined property, querying key-value persistent stores degrades into linear searches.
Relational DBs are mostly keyval stores inside. An index is a keyval-store projection of another keyval store (keyed, say, on another subset of the value). B-tree indices and hashmaps are ways to represent keyval stores for quick look-up (b-trees are convenient as they allow range look-ups and are automatically sorted, while hashmaps aren't, but have lower overhead for lookup and storage).
In essence everything is keyval. A sparse array is an ordered keyval store with integer keys (also technically everything is ordered, too, but some orders are stable, and useful, while others aren't). A dense array is an adjacency-optimized version of a sparse array where the key is implicit based on computable offset within a larger dense array, your address space. RAM address space is also variations on that theme. Raw disk storage. And file systems. Everything is. Maybe I spend too much time messing with storage, but I can't see it any other way at this point.
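A toy Python illustration of that framing: a secondary index is just another key-value map whose keys come from a subset of the values (all names here are invented):

```python
# Toy sketch: an "index" is a second keyval store projecting the first.
# The primary store maps id -> record; the index maps email -> id.
users = {
    1: {"email": "a@example.com", "name": "Ann"},
    2: {"email": "b@example.com", "name": "Bob"},
}
email_index = {rec["email"]: uid for uid, rec in users.items()}

# Point lookup via the index instead of a linear scan over all values.
uid = email_index["b@example.com"]   # 2
record = users[uid]                  # {"email": "b@example.com", "name": "Bob"}
```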
A concrete example would be a user's shopping cart as they build it. You don't need the niceties of a fully ACID-compliant DB; you need write performance and extremely high availability.
That was at least a chief use case spotlighted in the original Dynamo paper by Amazon, which was the precursor to AWS's DynamoDB.
Not to say that couldn’t be done with Postgres but of course they were dealing with insane scale on Amazon Day.
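For a sense of the access pattern, a hedged boto3 sketch of a cart keyed only by its cart ID (the table name, key, and attributes are all made up):

```python
# Sketch of a cart keyed solely by cart_id, the single predefined lookup key.
# Assumes boto3, AWS credentials, and a DynamoDB table named "carts" with
# partition key "cart_id" (all names here are illustrative).
import boto3

carts = boto3.resource("dynamodb").Table("carts")

def save_cart(cart_id: str, items: dict) -> None:
    # Overwrite the whole cart object under its single key.
    carts.put_item(Item={"cart_id": cart_id, "items": items})

def get_cart(cart_id: str) -> dict:
    # The only retrieval path: a point lookup by cart_id.
    return carts.get_item(Key={"cart_id": cart_id}).get("Item", {})
```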
It's a common misconception that modern non-relational stores (such as DynamoDB) aren't ACID compliant. DynamoDB has offered ACID transactions, even across tables, for several years now.
Not that you're saying they don't, but some people might interpret your comment that way.
> DynamoDB offers ACID transactions, even across tables, as of several years ago.
It depends on how DynamoDB is used[0]:
> Transactional operations provide atomicity, consistency, isolation, and durability (ACID) guarantees only within the region where the write is made originally. Transactions are not supported across regions in global tables.
Granted, this likely handles most use-cases and the restriction enforced makes complete sense.
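For illustration, a hedged boto3 sketch of such a single-region, cross-table transaction (table names, keys, and attributes are invented):

```python
# Sketch of an ACID transaction spanning two tables within one region.
# Assumes boto3 and AWS credentials; all table/attribute names are illustrative.
import boto3

ddb = boto3.client("dynamodb")
ddb.transact_write_items(
    TransactItems=[
        {"Put": {
            "TableName": "orders",
            "Item": {"order_id": {"S": "o-123"}, "status": {"S": "placed"}},
        }},
        {"Update": {
            "TableName": "inventory",
            "Key": {"sku": {"S": "sku-9"}},
            "UpdateExpression": "SET stock = stock - :n",
            "ConditionExpression": "stock >= :n",
            "ExpressionAttributeValues": {":n": {"N": "1"}},
        }},
    ]
)  # both writes commit atomically, or neither does
```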
> A concrete example would be a user's shopping cart as they build it. You don't need the niceties of a fully ACID-compliant DB; you need write performance and extremely high availability.
Using a key-value store for shopping carts can work for a while, especially for the use case you describe, but fails when system functionality grows beyond retrieving only by a cart ID.
And when using a persistent store which does not provide ACID capabilities, the system will ultimately have to enforce at least atomicity and consistency via server logic.
> when system functionality grows beyond retrieving only by a cart ID.
Either you manage your system so it never ever uses any key other than a cart ID (services have been running for decades keeping the same unique ID; that's not some unreasonable thing).
Or you migrate your data to match the completely wild new requirement, and taking costly steps to deal with a fundamental business change would be seen as reasonable in most orgs.
At my old job we used DynamoDB in a microservice architecture and I always thought it was a perfect fit. Not sure about JunoDB, but DynamoDB is notoriously difficult to index for various querying patterns. This seemed like less of an issue for microservices because the models are generally very compact and easy to query.
I think FoundationDB could meet the "extreme scale, security, and availability needs" of PayPal. I'd bet Apple's needs are more extreme, and they've shown ~500-core clusters doing well into the millions of ops/s.
This is based on RocksDB, which is a "sorted key-value" store like LevelDB (HBase, Hypertable, etc.) that keeps sorting and merging keys while flushing to disk at certain points.
Redis, in comparison, is a different thing, if I'm not wrong.
It's a valid point - RocksDB can be thought of as the internal backend implementation of the DB. It's the front-end API that matters as to whether Redis could fulfill the same purpose.
Juno is a disk-based store - a closer comparison would be to Mongo. And back when Juno development started, Mongo would not have seemed like a very good option (if it even is today.)
MongoDB is a document database not a key-value store.
That distinction is massively important when you're talking about distributed systems, as key-value has far fewer edge cases to consider. Also, its architecture is quite different, as it doesn't have the concept of proxies.
Basically in the realm of databases the two are nothing alike.
Right and don’t you understand that Redis has the availability guarantees, scaling constraints, and memory architecture for all use cases?! Why would you ever need a different KV store when you have Redis!
If you're smarter than the hundreds of engineers designing and building the distributed systems powering the world's largest applications then well done.
Personally, whenever I see a system like this I try to look for why they didn't go with an existing solution. And in 99% of cases either an existing solution never existed or they had a unique requirement that necessitated building something from scratch.
Honestly, I work at a FAANG and, sadly, the answer is often a combination of people paying absolutely zero attention to tech that's NIH (e.g. SWEs working on DBs who have never used Postgres) and the fact that no one gets promoted for implementing a solution using 3rd-party software. The system is set up such that you need to show you've developed something of sufficient complexity, and using OSS just doesn't look good in that context.
I'd expect an engineer designing a new database, even for internal use, to be familiar with the competition's strengths/weaknesses even if they're going in a different direction.
>If you're smarter than the hundreds of engineers designing and building the distributed systems powering the world's largest applications then well done.
There are plenty of absolute code abominations powering the "world's largest applications".
Engineer competence is also only vaguely related to the quality of the infrastructure: yes, you need smart people to make big complex things, but you also need smart people to manage ungodly legacy enterprise spaghetti.
You're assuming a level of engineering/scientific "purity" that doesn't really exist within orgs to the idealistic extent that we would like to imagine. I'm not saying that I'm smarter than every FAANG engineer, but having met quite a few and worked in large orgs myself, it's easy to say some of these tools come from pride/arrogance/a need to have your name on something/a fundamental misunderstanding of existing tooling. Not necessarily because they're unraveling parts of the universe yet unexplored and need a bespoke weapon to tackle new issues.
Syria is a nation under heavy sanctions; "Alep" is the old name for Aleppo, apparently still in use. Governments around the world demand financial institutions prevent fraud, funding of terrorist organisations, money laundering, etc., and can levy heavy fines if they don't do enough, so you end up with this kind of stuff, for better or worse.
I had a friend that worked for ISIS: Innovative Solutions In Space. They had a lot of problems with all sorts of financial institutions once that other ISIS started becoming better known. They've since rebranded to ISISPACE for that reason.
Seems to be based on RocksDB. But I wonder if its persistence is like Redis's persistence (where persistence is just snapshot/txn-log style).
> JunoDB storage server instances accept operation requests from proxy and store data in memory or persistent storage using RocksDB. Each storage server instance is responsible for a set of shards, ensuring smooth and efficient data storage and management.
Redis essentially can’t store more data than what fits in RAM.
While it has persistence options, they’re for durability and backup, not to increase the storage available.
JunoDB appears to store data primarily on disk, limits storage by disk size, and then perhaps caches in memory as necessary. Quite different in behaviour and trade-offs from Redis.
Kansas is a state in the United States of America. While DBs and caching systems do operate in Kansas, the state of Kansas itself cannot easily be utilized as a key value store. Mainly because Kansas is not software but a piece of land with people and government. While it is reasonable to assume that the people and government of Kansas can create a key value store, it would require a law change to consider said software to be officially recognized as being part of Kansas.