I'm curious about this as well. I often see people talk about CockroachDB in production, but I don't think I've ever heard of anyone running Yugabyte. But it is definitely under active development.
I found two threads discussing it from the past year:

https://news.ycombinator.com/item?id=39430411

https://news.ycombinator.com/item?id=38914764
Yugabyte (like CockroachDB and TiDB) is built by mapping relations onto an LSM-tree-based KV store, where ranges of keys are assigned to different nodes, each range managed by its own Raft group. That kind of structure has very different performance characteristics from Postgres' page-based MVCC. In particular, LSM trees are not a free lunch.
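As a rough illustration of that storage model (a hypothetical encoding and routing scheme, not YB's actual DocDB format): each row becomes a key in an ordered KV store, and contiguous key ranges are owned by different nodes.

```python
import bisect

# Hypothetical sketch: encode a relational row as an order-preserving
# key, then route it to the node owning its key range. In a real system
# each range would be replicated via its own Raft group.

def encode_key(table_id: int, pk: int) -> bytes:
    # Big-endian encoding preserves sort order under byte comparison.
    return table_id.to_bytes(4, "big") + pk.to_bytes(8, "big")

# Range i covers keys below split_points[i]; owners are invented names.
split_points = [encode_key(1, 1000), encode_key(1, 2000), b"\xff" * 12]
range_owners = ["node-a", "node-b", "node-c"]

def owner_for(key: bytes) -> str:
    return range_owners[bisect.bisect_right(split_points, key)]

print(owner_for(encode_key(1, 500)))   # -> node-a
print(owner_for(encode_key(1, 1500)))  # -> node-b
```

Because the encoding is order-preserving, range scans over the primary key stay local to a small number of nodes.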
Query execution is also very different when a table's data is spread over multiple nodes. For example, joins are done on the query executor side by executing remote scans against each participating storage node and then merging the results. That's always going to be slower than a system that already has all the data locally.
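A toy sketch of that executor-side join pattern (node names, tables, and the scan helper are all invented for illustration):

```python
# Hypothetical sketch of an executor-side hash join over remote scans:
# the query layer pulls rows from each storage node, then joins locally.
# In a real system each scan is a network round trip per node, which is
# the latency cost a single-node database never pays.

orders_by_node = {
    "node-a": [(1, "widget"), (2, "gadget")],
    "node-b": [(3, "gizmo")],
}
customers_by_node = {
    "node-a": [(1, "alice")],
    "node-b": [(2, "bob"), (3, "carol")],
}

def remote_scan(shards):
    # Stand-in for a per-node RPC returning that node's rows.
    for rows in shards.values():
        yield from rows

def hash_join(left, right):
    # Build a hash table on the left input, probe with the right.
    table = {k: v for k, v in left}
    return [(k, table[k], name) for k, name in right if k in table]

result = hash_join(remote_scan(orders_by_node), remote_scan(customers_by_node))
print(result)  # [(1, 'widget', 'alice'), (2, 'gadget', 'bob'), (3, 'gizmo', 'carol')]
```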
YB also lacks some index optimizations. There is work underway to make bitmap index scans work in YB, which will give a huge performance boost to many queries, but it's incomplete. YB does have some optimizations (like loose index scans) that Postgres does not have. So it's fair to say that YB is probably a lot slower than PG at some things and a little faster at others.
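To show what a loose index scan buys (a minimal sketch of the general technique, not YB's implementation): for a query like `SELECT DISTINCT a`, instead of reading every index entry, the scan jumps from each distinct value past its run of duplicates.

```python
import bisect

# Hypothetical sketch of a loose index scan ("skip scan") answering
# SELECT DISTINCT a over a sorted index on column a. Rather than reading
# all N entries, we binary-search past each run of duplicates, so the
# cost is O(k log N) for k distinct values.

index = [1] * 1000 + [2] * 1000 + [5] * 1000  # sorted index entries

def loose_index_scan(idx):
    distinct, pos, reads = [], 0, 0
    while pos < len(idx):
        distinct.append(idx[pos])
        reads += 1
        # Jump past the run of duplicates of the current value.
        pos = bisect.bisect_right(idx, idx[pos], lo=pos)
    return distinct, reads

values, reads = loose_index_scan(index)
print(values, reads)  # [1, 2, 5] 3 -- only 3 value reads instead of 3000
```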
I think it's fundamentally not a bad architecture, just different from Postgres. So even though they took the higher layers from Postgres, there's a whole bunch of rearchitecting needed in order to make the higher layers work with the lower ones. You do get some Postgres stuff for free, but I wonder if the amount of work here is worth it in the end. So much in Postgres makes the assumption of a local page heap.
What we see in cases where someone takes Postgres and replaces the guts (Greenplum, Cloudberry, and of course YB) is that it becomes a huge effort to keep up with new Postgres versions. YB is on Postgres 12, which came out in 2019, and is slowly upgrading to 15, which came out in 2022. By the time they've upgraded to 15, it will probably be 2-3 versions behind, and the work continues.
Worth noting: Yugabyte was tested by Kyle Kingsbury back in 2019, which uncovered some deficiencies. Not sure what the state is today. The YB team also runs their own Jepsen tests now as part of CI, which is a good sign.
Regarding the last point:

> Yugabyte was tested by Kyle Kingsbury back in 2019, which uncovered some deficiencies. Not sure what the state is today. The YB team also runs their own Jepsen tests now as part of CI, which is a good sign.
> What we see in cases where someone takes Postgres and replaces the guts (Greenplum, Cloudberry, and of course YB) is that it becomes a huge effort to keep up with new Postgres versions.
The first upgrade is the hardest, but after that we will have the framework in place to perform subsequent upgrades much faster.
When the pg11-to-pg15 upgrade becomes available, it will be an in-place, online upgrade that doesn't block DML; no other Postgres fork offers this capability today.
I was referring to the effort by the developers to keep the forked codebase itself up to date with mainline. Isn't that the main hurdle?
My understanding is that you are patching a lot of core Postgres code rather than providing the functionality through any kind of plugin interface, so every time there is a major Postgres release, "rebasing" on top of it is a large effort.
That, to my knowledge, is why Greenplum fell behind so much. It took them four years to get from 9.6 to 12, and I believe that's where they are today.
Citus is an extension. That's the best you can get by staying outside the core database.
If you want a truly distributed architecture, you need to change the query optimizer, DDL, transaction handling, and even the query statistics components. At that point it ends up being a fork.
Yes, the merges are hard. But pg12 changed a lot of fundamental things, which made that jump very challenging. pg15 to pg17 should be much simpler.