Hacker News | harisund1990's comments

I love headers, but I wish you could split them in two so that private functions and variables can live in the C file. This would help reduce a lot of header bloat as well.

It is perfectly valid to use more than one header file: some of them can be public (meant to be seen by users of your library), others can be private or internal (only used by your own sources).

Also, it's pretty rare to have things internal to one C file that need explicit prototypes. It's easier to just put things in the right order so the function definition comes before its use.
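A minimal sketch of both ideas (file names are illustrative): the public header declares only the exported API, while internal helpers stay `static` in the .c file and are defined before first use, so they need no prototype and never appear in any header. Both files are shown in a single listing here:

```c
/* --- mylib.h: the public header shipped to users of the library --- */
#ifndef MYLIB_H
#define MYLIB_H
int mylib_sum_squares(const int *xs, int n);
#endif

/* --- mylib.c: the implementation (would normally #include "mylib.h",
       shown inline here so the listing is one translation unit) --- */

/* Internal helper: static, and defined before its first use,
   so no prototype and no header entry are needed. */
static int square(int x) { return x * x; }

int mylib_sum_squares(const int *xs, int n) {
    int total = 0;
    for (int i = 0; i < n; i++)
        total += square(xs[i]);
    return total;
}
```

`static` gives `square` internal linkage, so it is invisible to other translation units; only `mylib_sum_squares` is part of the library's surface.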

> What we see in cases where someone takes Postgres and replaces the guts (Greenplum, Cloudberry, and of course YDB) is that it becomes a huge effort to keep up with new Postgres versions.

The first upgrade is the hardest, but after that we will have the framework in place to perform subsequent upgrades much faster. When the pg11-to-pg15 upgrade becomes available, it will be in-place and online without affecting DMLs; no other pg fork offers this capability today.


I was referring to the effort by the developers to keep the forked codebase itself up to date with mainline. Isn't that the main hurdle?

My understanding is that you are patching a lot of core Postgres code rather than providing the functionality through any kind of plugin interface, so every time there is a major Postgres release, "rebasing" on top of it is a large effort.

That, to my knowledge, is why Greenplum fell behind so much. It took them four years to get from 9.6 to 12, and I believe that's where they are today.


Citus is an extension. That's the best you can get while staying outside the core DB. If you want a truly distributed architecture, you need to change the query optimizer, DDL, transactions, even the query stats components. At that point it ends up being a fork.

Yes, the merges are hard. But pg12 changed a lot of fundamental things, making it very challenging. Pg15 to pg17 should be much simpler.


Yugabyte does automatic sharding


It's easier to manage 1 database instead of 1000s


It's more expensive to screw up one all-important database than one of a thousand.

The same logic applies to compute boxes; see "pets vs cattle" from 15-20 years ago.


The difference between "pets" and "cattle" is that pets have state and need to be taken care of; you can't recreate them from scratch trivially. Cattle are stateless and can be created and destroyed easily.

The whole point of a database is to contain the state - as a pet - so the rest of your application can be stateless - as cattle.

To really get cattle database systems, you need a self-managing cluster architecture that puts things on autopilot like Neon where you've got >=2 copies of each row and can tolerate losing any single box without unavailability.


This is fair.

But restoring a small DB from a fresh backup, if things go really wrong, is faster, and does not affect other customers.

I completely agree wrt having a hot spare / cluster with transparent failover and management.


SQLite requires near zero management.


The article should be titled "Why Figma HAD TO reinvent the wheel with PostgreSQL". When you have a legacy system and not enough time, or will, to move off of it, the only option is to get inventive and build with what you have.

There is always a price. In this case the database team did something quick, cheap, and easy, but the application teams now have to deal with all the nuances of the system. Maybe Figma has more people on those app teams with time on their hands to handle it.


Sharding is the easy part. Eventually you need to implement distributed transactions, consistent backups across shards, PITR, resharding, load balancing, and the list goes on... That takes far more people and time, and above all more risk.

It works for Figma (for now), but making it work as a solution for other companies, with different hardware, data schemas, and access patterns, will add even more complexity to the mix.

It's an excellent solution, but I don't think it will be good enough in the long run.
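To give a feel for the distributed-transactions piece, here is a toy two-phase-commit coordinator in C. All names are hypothetical; real systems (e.g. Postgres's `PREPARE TRANSACTION` machinery) add write-ahead logging, crash recovery, and timeouts, which is where most of the hard work actually lives:

```c
#include <stdbool.h>

/* Toy shard: prepare() votes yes/no; commit/abort apply the outcome. */
typedef struct {
    bool can_prepare;   /* would this shard's local transaction succeed? */
    bool committed;     /* final state after phase 2 */
} shard_t;

static bool shard_prepare(shard_t *s) { return s->can_prepare; }
static void shard_commit(shard_t *s)  { s->committed = true; }
static void shard_abort(shard_t *s)   { s->committed = false; }

/* Phase 1: collect votes from every shard.
   Phase 2: commit everywhere only if every vote was yes,
   otherwise abort everywhere. Returns the global outcome. */
bool two_phase_commit(shard_t *shards, int n) {
    bool all_yes = true;
    for (int i = 0; i < n; i++) {
        if (!shard_prepare(&shards[i])) {
            all_yes = false;
            break;
        }
    }
    for (int i = 0; i < n; i++) {
        if (all_yes)
            shard_commit(&shards[i]);
        else
            shard_abort(&shards[i]);
    }
    return all_yes;
}
```

Even in this toy version, one "no" vote forces every shard to abort; the production-grade problems (a coordinator crashing between the two phases, a shard that voted yes but never hears the decision) are exactly the parts this sketch leaves out.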


1% of Visa maybe?


No one needs 10,000 machines. They likely have scaling issues and cannot scale their database quickly enough, so they planned for the worst spike possible, which is likely an hour on Cyber Monday.

