I love headers, but I wish you could split them in two so that private functions and variables can live in the C file. This would help reduce a lot of header bloat as well.
It is perfectly valid to use more than one header file: some of them can be public (meant to be seen by users of your library), others can be private or internal (only used by your own sources).
Also, it's pretty rare to have things internal to one C file that need explicit prototypes. It's easier to just put things in the right order so that the function definition is before its use.
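A minimal sketch of that layout, shown as one listing with comments marking where each piece would live. The file names (`shape.h`, `shape_internal.h`, `shape.c`) and all function names are made up for illustration:

```c
/* --- shape.h: public header, the only file users of the library include --- */
int rect_area(int width, int height);

/* --- shape_internal.h: private header, shared only by the library's own .c files --- */
int clamp_nonneg(int v);

/* --- shape.c --- */

/* File-local helper: it is `static` and defined before its first use,
 * so it needs no prototype in any header. */
static int mul(int a, int b) {
    return a * b;
}

/* Internal helper, declared in the private header so that other
 * source files of the library can call it too. */
int clamp_nonneg(int v) {
    return v < 0 ? 0 : v;
}

/* Public entry point, declared in the public header. */
int rect_area(int width, int height) {
    return mul(clamp_nonneg(width), clamp_nonneg(height));
}
```

Only `rect_area` is visible to users; `mul` never leaves `shape.c`, and `clamp_nonneg` is visible only to sources that include the private header.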
> What we see in cases where someone takes Postgres and replaces the guts (Greenplum, Cloudberry, and of course YDB) is that it becomes a huge effort to keep up with new Postgres versions.
The first upgrade is the hardest, but after that we will have the framework in place to perform consecutive upgrades much sooner.
When the pg11 to pg15 upgrade becomes available, it will be in-place and online, without affecting DML. No other pg fork offers this capability today.
I was referring to the effort by the developers to keep the forked codebase itself up to date with mainline. Isn't that the main hurdle?
My understanding is that you are patching a lot of core Postgres code rather than providing the functionality through any kind of plugin interface, so every time there is a major Postgres release, "rebasing" on top of it is a large effort.
That, to my knowledge, is why Greenplum fell behind so much. It took them four years to get from 9.6 to 12, and I believe that's where they are today.
Citus is an extension. That's the best you can get by being outside the core db.
If you want a truly distributed architecture, then you need to change the query optimizer, DDL, transaction, and even query stat components. At which point it ends up being a fork.
Yes, the merges are hard. But pg12 changed lots of fundamental things making it very challenging. Pg15 to pg17 should be much simpler.
The difference between "pets" and "cattle" is that pets have state and need to be taken care of; you can't trivially recreate them from scratch. Cattle are stateless and can be created and destroyed easily.
The whole point of a database is to contain the state - as a pet - so the rest of your application can be stateless - as cattle.
To really get cattle database systems, you need a self-managing cluster architecture that puts things on autopilot, like Neon, where you've got >=2 copies of each row and can tolerate losing any single box without unavailability.
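To make the ">=2 copies" point concrete, here is a toy sketch of replica placement. It is not how any particular system does it; the names `replica_node` and `row_available` and the "next node over" placement rule are assumptions for illustration only:

```c
#include <stdbool.h>

#define REPLICAS 2  /* copies kept of each row (assumed value) */

/* Node holding copy `r` of a row. Copies go on consecutive nodes,
 * so they land on distinct boxes whenever num_nodes >= REPLICAS. */
unsigned replica_node(unsigned row_hash, unsigned r, unsigned num_nodes) {
    return (row_hash % num_nodes + r) % num_nodes;
}

/* True if at least one copy of the row sits on a node other than
 * the single failed one. */
bool row_available(unsigned row_hash, unsigned num_nodes, unsigned failed_node) {
    for (unsigned r = 0; r < REPLICAS; r++) {
        if (replica_node(row_hash, r, num_nodes) != failed_node) {
            return true;
        }
    }
    return false;
}
```

With five nodes, a row hashing to 7 lives on nodes 2 and 3, so losing either one still leaves a copy. With a single node both copies share a box and the guarantee is gone.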
The article should be titled "Why Figma HAD TO reinvent the wheel with PostgreSQL".
When you have a legacy system and not enough time or will to move off of it, the only option is to get inventive and build with what you have.
There is always a price. In this case the database team did something quick, cheap, and easy. But the application teams now have to deal with handling all the nuances of the system. Maybe Figma has more people in these app teams with time on their hands to handle it.
Sharding is the easy part. Eventually you need to implement distributed transactions, consistent backups across shards, PITR, resharding, load balancing, and the list goes on... That takes exponentially more people and time, and above all, risk.
It works for Figma (for now), but making it work as a solution for other companies with different hardware, data schemas, and access patterns will add even more complexity to the mix.
It's an excellent solution, but I don't think it will be good enough in the long run.
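To show how small the "easy part" is, here is a minimal sketch of hash-based shard routing. It assumes an FNV-1a hash and a fixed shard count; the function names are hypothetical and this is not Figma's actual scheme:

```c
#include <stdint.h>

/* 64-bit FNV-1a hash of a NUL-terminated key. */
static uint64_t fnv1a64(const char *key) {
    uint64_t h = 0xcbf29ce484222325ULL;   /* FNV offset basis */
    for (; *key != '\0'; key++) {
        h ^= (unsigned char)*key;
        h *= 0x100000001b3ULL;            /* FNV prime */
    }
    return h;
}

/* Map a key to one of `num_shards` shards (num_shards must be > 0).
 * The same key always lands on the same shard. */
unsigned shard_for_key(const char *key, unsigned num_shards) {
    return (unsigned)(fnv1a64(key) % num_shards);
}
```

Note what this does not cover: changing `num_shards` remaps most keys, which is exactly where resharding, consistent backups, and cross-shard transactions start to hurt.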
No one needs 10,000 machines.
They likely have scale issues and cannot scale their database quickly enough, so they planned for the worst possible spike, which likely lasts for an hour on Cyber Monday.