Hacking on PostgreSQL Is Hard

mritchie712 · 2024-05-05T11:22:01

The first commenter on the blog makes a very good point:

> The thing I like to remind people is that this project has been around for decades, during which over a hundred people (or whatever COUNT(*) FROM contributors is) have picked through the code trying to find low hanging fruit to contribute. Everything left to do is astonishingly hard; if it weren't it would have been done already.

There are no easy wins left to contribute.

I've had ideas for a few pg extensions. I wonder if those are as difficult to contribute.

fforflo · 2024-05-05T11:45:34

Development-wise they're completely different. Extensions are much easier and more isolated. Developing a pg extension is mostly about compiling some C code into a .so object with a templated Makefile and having the running Postgres process load it dynamically.

If you want some boilerplate to get you started I've been using this cookie-cutter template to bootstrap lots of extensions https://github.com/Florents-Tselai/cookiecutter-postgres-ext...

Demo walkthrough: https://www.youtube.com/watch?v=zVxY3ZmE5bU

Contributing to core Postgres is an entirely different story.

eatonphil · 2024-05-05T13:15:37

> There are no easy wins left to contribute.

I don't think that's true. Watch the pgsql-docs mailing list and you will see a steady stream of people pointing out issues in the docs.

Moreover, there is a steady stream of new features, and new features need polish and docs. I've got a patch on the mailing list (maybe, hopefully, knock on wood) that adds the first code samples demonstrating Table Access Methods.

For that matter, I don't think a second Table Access Method (say, a /dev/null or in-memory one) has been committed yet. I'm aware of one person's effort to add a /dev/null Table Access Method, but I don't think it has been merged.

This is unrelated to what Robert is talking about, though. I don't love the contribution process. But there is definitely a lot to contribute to if you spend time and keep an eye out for where improvements might be useful and accepted.

levkk · 2024-05-05T13:18:25

I'd say hacking isn't the same as building new production-quality features.

Say I want to change something about the database just for fun or for educational purposes. It won't be easy, because the internals aren't that well documented beyond code comments, and C makes it hard to find what's what (no Rust-style docs or code map).

Given months of digging around, I'm sure I'd be able to do something small, but "hacking" is a side gig, not a full-time job.

avi_vallarapu · 2024-05-05T11:38:42

I posted my thoughts on another thread too: https://news.ycombinator.com/item?id=40231332

- Postgres documentation is some of the most well-maintained database documentation around. That also means developers and committers make sure every relevant patch comes with the corresponding documentation changes.

- Consider bugs in Postgres compared to MySQL, Oracle, or other databases. Bugs are comparatively fewer, even rare, even if you support Postgres as a vendor with lots of customers. The reason is the effort of a strong team of developers who don't accept anything and everything: there are strict best practices, reviews, discussions, tests, and a lot more, which makes it difficult for a patch or feature to make it into a release.

- Ultimately, the easier it is to get a patch accepted, the more bugs there will be.

I love Postgres the way it is today, and it is still the DBMS of the year and developers' most loved database.

I wish we had more contributors, committers, and developers, and also more users and companies supporting Postgres, so that getting a feature in became faster and reasonably easier.

steve_rambo · 2024-05-05T10:37:32

Another discussion a couple of days ago:

https://news.ycombinator.com/item?id=40231332

Ozzie_osman · 2024-05-05T13:48:01

I remember reading the entire thread/debate on whether to move to 64-bit XIDs. https://www.postgresql.org/message-id/flat/CACG=ezZe1NQSCnfH...

Over 150 replies, 2.5 years, and still no resolution. Which I guess could be good or bad, depending on how you look at it.

tkiolp4 · 2024-05-05T10:56:54

Do you think we are going to reach a point in which it’s very hard (perhaps impossible) to maintain big codebases (like postgres) due to the lack of maintainers? Would that mean the end of such projects? Or perhaps big corporations would take over at a high price (forking the project and publishing it under a price tag)?

__s · 2024-05-05T11:28:14

Doesn't really make sense. Big tech companies employ many contributors, which lets projects like Postgres offer a portable solution.

Microsoft sells SQL Server with its proprietary features while also offering managed Postgres.

Postgres development being hard because it's mature also means it will stay good indefinitely, precisely because it's mature. A core team slowly improving a solid codebase keeps it stable.

fforflo · 2024-05-05T11:49:56

That's an ongoing discussion in the Postgres community [0], primarily because Postgres is 100% written in C.

As for the future, I don't know. Not many companies can maintain full forks. I've noticed, however, from my professional experience that businesses are willing to hire freelancers to code specific things (e.g., I've had a few gigs writing custom Postgres extensions, 50% C and 50% SQL/PL/pgSQL). But yeah, I guess lots of C projects will become like COBOL ones.

0: https://redmonk.com/jgovernor/2023/10/10/postgres-the-next-g...

ozim · 2024-05-05T18:07:52

I see a different option. At some point someone will start a new db project that will take over.

ghelmer · 2024-05-06T01:50:58

The barrier to entry for programmers on projects like PostgreSQL and FreeBSD is high, with good and bad results. It seems you have to be very committed to the project (which may involve support from one’s employer) to join and contribute. It requires deep understanding of the codebase and preparedness to deal with fallout when changes inevitably cause problems. That’s good in that the developers are highly invested in building a quality product. But the high barrier to entry makes it difficult to attract new developers, and it is very difficult for those with a passing interest to get fixes and improvements into the codebase.

RoboTeddy · 2024-05-05T12:53:17

How’s PostgreSQL’s code quality? If projects have tons of technical debt or poor abstractions it can often be hard to make significant changes. Is that the case here, or no?

convolvatron · 2024-05-05T14:12:44

eh... you know, if you're in the right parts it's actually pretty pleasant. there is a lot of good design in Postgres.

otoh, there is some awful legacy stuff, and some really awful cross-cutting stuff around physical logging (I just looked at the locking around running queries on a replica, and that's clearly never going to be correct)

despite the fact that it's in C, given a couple of major refactors that will never happen, it could be really nice

anarazel · 2024-05-05T14:33:52

> I just looked at the locking around running queries on a replica, and that's clearly never going to be correct

Uh, huh. Details please?

RedShift1 · 2024-05-05T11:03:52

Hot take: this is good? It's a database; you want it to be predictable and reliable, as it's literally the foundation of many applications and processes. It's not another toy project. Mistakes can have disastrous consequences, so you really want those extra layers of scrutiny.

CodesInChaos · 2024-05-05T11:17:09

The main problem this post complains about is that it's difficult to implement changes correctly, even as an expert. That's definitely not a good thing.

pylua · 2024-05-05T12:55:46

I read the article and I can’t understand at a simple level why it’s hard to contribute. Is it easy to break things? Is it hard to determine if something is broken? If so, maybe it just wasn’t designed with those concerns in mind, or maybe there just isn’t adequate testing.

Adding on a new feature that is hard to understand is maybe a signal that the design does not support it.

RedShift1 · 2024-05-05T12:50:59

Yes but it's a database server, I expect lots of things to be non-trivial here?

layer8 · 2024-05-05T14:03:49

Sure, but the design should make it reasonably efficient and reliable to reason about the behavior. The assumptions, requirements, and guarantees should be clear (documented) at all relevant code locations.

There is a lot of mentioning of testing in this thread and in TFA. However, testing is not the main tool to achieve correctness. Testing is merely a sanity check. Correctness is achieved by being able to adequately reason about the logic of the code, making sure that all assumptions and guarantees are met.

TFA sounds like that reasoning is hard in the PostgreSQL code base. But it could also be that the author was approaching things in a too cavalier manner.

Kinrany · 2024-05-05T13:04:13

Expected doesn't mean good though.

amlib · 2024-05-05T13:11:22

Sure, but is there any other mature and complex (or even more complex) database software that's easier to hack on than Postgres? I imagine something like Microsoft SQL Server is just as hard, and Oracle is known to be a complete mess, much worse than Postgres in this regard.