anytime a huge multi-decades-old FOSS project lands a milestone, I can't help but equate it to something like a moon landing.
So much (unpaid) work and thought goes into stewarding open software. Kudos to the whole team. Software infra is just as important as bridges and roads -- here's hoping we can fund it at least as well, for humanity's sake. [1]
PostgreSQL is also one of the most impressive projects out there in terms of being community maintained, as opposed to many large FOSS projects which have some kind of corporate backing employing the majority of the core team.
Three of the seven core team members work at EDB, and the core team doesn't drive the project roadmap. And EDB hackers fail to get patches in all the time, just like everyone else :)
Yes, and Andres (the "main hat" he wears at Microsoft is literally work on PostgreSQL), and Tom, etc. I don't understand why pointing out what should be obvious makes people here lose their minds and is such a controversy. PostgreSQL, like Linux, is a massively commercially sponsored project; that there are some volunteers doesn't change that.
I used to be paid full time to mostly work on a Linux kernel subsystem. This discussion is silly.
Agreed - just think about the billions, possibly trillions, of dollars in economic value (jobs, shareholder value, utility to society, etc.) that a project like Postgres or Ruby on Rails has created.
They just changed the versioning scheme; it used to be that e.g. 9.3 -> 9.4 was a major version upgrade (i.e. one that can't be done in-place). Starting with PG 10, major versions are 10 -> 11, etc. I don't believe the major release cadence itself changed that much.
It seems like they fell more in line with Semantic Versioning when that format came into vogue. Semantic Versioning is what most devs expect now; it makes sense to communicate the version in a format that has a broadly understood meaning for devs.
I don't think it's semantic versioning at all anymore: they just release one new major version every year, regardless of how large or backward-incompatible the changes are.
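For what it's worth, the scheme change is visible in server_version_num, which packs the version into one integer; a quick way to see which side of the change a server is on (output values are illustrative):

    SHOW server_version_num;
    -- old scheme: 9.6.24 -> 90624  (the first TWO parts formed the major version)
    -- new scheme: 16.0   -> 160000 (only the first part is the major version)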
schools either never worked or stopped working. after all, almost all the people who went to school don't give a fuck about underperforming, underfunded school systems (or healthcare, or ...). or if they do care, they are ignorant and clueless about what to do about the problem, and easily fall prey to the political dogma of some group.
Teachers are underfunded in the US. In many states, if you're lucky, teachers' wages top out at the median income. And they often have little to no funding for classroom decorations or much beyond very basic materials.
If we don't pay our teachers decently, how can we expect them to put in the effort for the very difficult task of raising the next generation? (And yes, teachers raise their students just as much as parents do.)
well, the real value of teachers' salaries is plummeting. of course the same thing happens in other service sectors where demand for quality is hard to scale (like healthcare, but also construction, where standards have simply increased and carry a lot of associated labor, which now costs a lot more due to multiple factors: less immigration, a shrinking active population as the positive effects of the baby-boom demographic boost end, and of course the inescapable Baumol effect).
nah, there's a broad consensus among the US population (and alas most other countries' populations too) against paying for "externalities"; they want cheap gas and cheap labor, cultural homogeneity, and so on.
if a few thousand private-school darlings can bamboozle hundreds of millions of people for decades, then the problem is not just with them.
of course it's cultural and it's due to the very strong biases of the last who-knows-how-many centuries.
> and it's not relevant for a topic about databases
it's very relevant for the whole concept of FOSS "moon landings" and the commons
This is such a misleading understatement. There is a ton of money. Many of the core Postgres contributors have fully remunerated employment in which much of their time is dedicated to PostgreSQL development. This is fully deserved and appropriate, but Tom Lane et al. are not working on this for free (whether they still would if they had to is another matter); these are all highly compensated contributors. The total payroll spend on PostgreSQL is distributed and decentralized, but it is clearly in the multimillions.
People should not be misled about true costs, regardless of how they're meted out.
    SELECT js,
           js IS JSON OBJECT "object?",
           js IS JSON ARRAY "array?",
           js IS JSON ARRAY WITH UNIQUE KEYS "array w. UK?",
           js IS JSON ARRAY WITHOUT UNIQUE KEYS "array w/o UK?"
    FROM (VALUES ('[{"a":"1"},
                    {"b":"2","b":"3"}]')) foo(js);
Curious what your use case is for wanting direct IO? With every DBA I've ever worked with, the first thing they want to do when setting up a new database is enable direct IO. My worst experience was with IBM DB2 mounted over NFS talking to a NetApp. Performance complaints would come from customers and land on the CEO's desk. He'd go to the software team and tell them to fix it. They'd say the DB is slow. Then he'd go to the DBAs and tell them to tune the DB. They'd say there's nothing more to do, we need faster disks. So he'd end up in front of me on the sysops team asking if we had any faster disks lying around (we didn't, and buying more wasn't in the budget).
Since it was NFS, you could just use tcpdump and watch what DB2 was doing on the wire. It was happily poking away, sending and receiving packets all 1K in size (the configured DB block size at the time), with peak read and write speeds of about 11MB/s. Since the DBAs didn't want to change settings on a production DB, I set up a testing environment and begged them to play with the direct IO and block-size settings on this new instance to figure out the best performance. When I checked back days later, it was set up exactly the same: "we follow best practices, use 1K block size and force direct IO".
I ended up creating a VM under the guise of "we need a data warehouse", with a quarter of the CPUs and RAM of the DB2 machines, and installed PostgreSQL 9.2. Did a minimal amount of tuning, mostly just turning off fsync for WAL writes, then spent a week filling it up with 5TB of data and 15 billion rows from the production DB. Ran one of our analytic queries that had grown to take 30 hours on DB2; it ran in 6 hours. The packets over NFS were 32-64KB in size, with peak speeds of 180-220MB/s on the wire.
The 1K packets you saw probably correspond to the default block size being used for the DB, which is a vestige of spinning disks. That you were using NFS, or any kind of networked filesystem, is what I'd call a performance-hostile environment. Did no one think of just not using NFS?
This was before 2012; AWS wasn't really an option yet. The company had to find rack space in a data center, which we couldn't. One of the funding customers "loaned" us a couple of slots in their on-premises data center, which fit only a BladeCenter and a single NetApp. NFS had advantages: you could dynamically resize live mount points, etc. Plus, as I mentioned, using PostgreSQL did away with our performance issues.
> Did a minimum amount of tuning, mostly just turning off fsync for WAL writes
That is not something I would suggest to people on production systems, as that would give you a good chance of data loss when the system halts.
So, out of interest, were there particular circumstances that made turning off WAL fsync a good choice in your situation?
I probably meant to say "synchronous_commit", which controls whether a commit waits for its WAL records to be flushed to disk. If you want full data guarantees with regular hard drives, you'd be looking at fewer than 200 transactions per second. Set synchronous_commit to off, and suddenly you can do 10k transactions per second. You can tune when the WAL gets flushed to disk based on time and/or size, so you can set the amount of recent data loss you are comfortable with.
From the docs:
"setting this parameter to off does not create any risk of database inconsistency: an operating system or database crash might result in some recent allegedly-committed transactions being lost, but the database state will be just the same as if those transactions had been aborted cleanly. So, turning synchronous_commit off can be a useful alternative when performance is more important than exact certainty"
> If you want full data guarantees, with regular hard drives, you'd be looking at less than 200 transactions per second.
That sounds like an anecdote from the time before SSDs.
> "setting this parameter to off does not create any risk of database inconsistency: an operating system or database crash might result in some recent allegedly-committed transactions being lost, but the database state will be just the same as if those transactions had been aborted cleanly. "
If your application is told that transactions were committed, but the DB actually hasn't committed them, you've got a problem, methinks.
Synchronous replication exists to mitigate more of the "committed but then lost" transactions problem.
Thing is, DB work is all about tradeoffs, so you'll never be free of them.
As the parent explained, PostgreSQL actually lets you make a very good trade: you get fully cached-level performance but lose at most a couple of recent transactions, instead of suffering data corruption. If that's not OK for you, the leash can be tightened so you lose nothing, but then your hardware had better keep up with your workload.
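For concreteness, a sketch of what the tightened leash looks like with synchronous replication (the standby name is hypothetical):

    -- Require a named standby to confirm each commit before COMMIT returns:
    ALTER SYSTEM SET synchronous_standby_names = 'FIRST 1 (standby_a)';
    -- 'on' waits for the standby to flush the WAL; 'remote_apply' additionally
    -- waits until the transaction is visible to queries on the standby:
    ALTER SYSTEM SET synchronous_commit = 'remote_apply';
    SELECT pg_reload_conf();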
Re-read their post. It was a secondary system set up just to run these analytics, which were loaded from the production system. No issues if the whole machine had to be rebuilt.
I'm thinking lower resource usage (no double caching of data), a shorter path to the data (so I'd expect fewer things to go wrong during a commit), and better performance in terms of transactions per second. Otherwise, I can't explain it any better than one of the lead developers himself:
https://www.postgresql.org/message-id/20210223100344.llw5an2...
The new debug_io_direct flag only triggers direct IO in very limited cases, and is only tangentially related to the AIO patchset discussed in that thread.
Note that the documentation on the config flag explicitly warns about not using it in production:
> Currently this feature reduces performance, and is intended for developer testing only.
Also note that very few things actually do IO during commit; the only IO I can think of is 1) the WAL-logging of the commit record (often small, a few hundred bytes at most), and 2) the reply to the COMMIT command (tens of bytes at most). It is quite unlikely that this will see much performance benefit from direct IO without further infrastructure inside PostgreSQL around io_uring and other async kernel IO APIs.
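If anyone wants to poke at it anyway on a throwaway PG 16 instance, a sketch (developer testing only, per the warning quoted above):

    -- Accepts a comma-separated subset of: data, wal, wal_init
    ALTER SYSTEM SET debug_io_direct = 'data,wal';
    -- The setting can only take effect at server start, so restart
    -- afterwards (e.g. pg_ctl restart).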
Yes, when direct IO is brought up, it is usually followed by the "double buffering" argument. That argument is valid, but only if your disk speeds are well above 1,000MB/s; outside of hardware like that, you are always going to be waiting on disk.
From Linus himself [0]:
"The thing that has always disturbed me about O_DIRECT is that the whole interface is just stupid, and was probably designed by a deranged monkey on some serious mind-controlling substances"
Have you read the entire thread you linked? People explained to Linus why direct IO is important to them. Besides, it's 20 years old, and there weren't even SSDs back then. The lack of direct IO in PG was one of (one of!) the reasons why Uber moved to MySQL (https://www.uber.com/en-PL/blog/postgres-to-mysql-migration/, "The Buffer Pool" section).
With buffered IO you will likely store a lot of the same data in memory twice: once in the DB's own buffers and again in the OS page cache. With direct IO you can give the memory used by the page cache directly to the DB, because it knows better what it needs and when.
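Concretely, that's the usual sizing tension: with buffered IO, shared_buffers is often kept to roughly a quarter of RAM so the OS page cache can hold the rest; with direct IO that page-cache memory could go to the DB instead. A sketch with illustrative numbers for a 64GB box:

    -- buffered IO: leave plenty of headroom for the OS page cache
    ALTER SYSTEM SET shared_buffers = '16GB';
    -- with direct IO there is no page cache to feed, so (hypothetically)
    -- the DB could take most of that memory itself:
    -- ALTER SYSTEM SET shared_buffers = '48GB';
    -- either way, shared_buffers only changes with a server restart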
The thread is large; maybe I read the whole thing at the time, but the point of it is literally someone asking why using direct IO is slower. That is exactly my point: you can't just enable direct IO (which bypasses all the kernel's logic for speeding up reads and writes) and expect increased performance without a lot of extra up-front work.
But back to my first question: I am still curious what your workload is that you feel will benefit from direct IO :)
As others have written, this work entails implementing asynchronous IO.
One system I worked with that was more heavily loaded than the usual intranet CRUD was an SMS lottery run by a radio station. They had "bonus rounds" that lasted a few minutes; when the host said "send now!", we would receive a huge amount of data all at once. And yet it worked for years on Postgres 7.4 and regular hard drives :)
Can anyone point to the COPY FROM improvements mentioned that can result in up to 300% performance improvements? Is it the line in the release notes about "ASCII string detection"?
To keep my database fast, I often have to do a VACUUM FULL on some key tables, which basically freezes all access to the table and copies it byte by byte to a new physical file. So as your data size doubles, the VACUUM FULL time doubles. I have a table that is so big I basically can't VACUUM FULL it anymore (in an acceptable amount of downtime).
I’m not familiar with the VACUUM changes but the situation you’re describing suggests something is wrong with the table definition or database configuration.
The Postgres docs are quite clear: table space is not reclaimed without a VACUUM FULL. So you deleted a column in a big table? You are storing that data forever.
Not quite true. Regular VACUUM marks dead tuples as available for reuse, so the system can overwrite them with fresh tuples afterwards. VACUUM FULL completely rewrites and repacks the table.
Postgres has a free space map, which allows it to reuse the space of deleted tuples for new or updated tuples created in the same table. If no new/updated tuples are created in the table after the DELETE, you have fragmentation, which means the file remains large so the space can't be used for other tables (or other unrelated data residing in the same filesystem).
The nature of fragmentation means that you need to move a lot of data around to actually make that file smaller. In Postgres, that's typically done with VACUUM FULL. The problem of fragmentation is not unique to Postgres, it's a fundamental issue; but perhaps other systems are able to move the data around in a less disruptive way.
If you just delete a column, that creates a different type of fragmentation within the tuples themselves (e.g. you drop the middle column, and the tuple itself doesn't shrink; it just ignores that middle column). You are right that this can be a problem. Postgres could be improved to rewrite tuples to eliminate the wasted space from dropped columns, which would probably be (computationally) worth doing if it's already performing cleanup on the page.
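A sketch of how one might inspect and then reclaim that space (the table name is hypothetical; pgstattuple is a contrib extension):

    CREATE EXTENSION IF NOT EXISTS pgstattuple;
    SELECT pg_size_pretty(pg_relation_size('big_table'));  -- on-disk file size
    SELECT dead_tuple_percent, free_percent
    FROM pgstattuple('big_table');                         -- how much is reusable
    VACUUM big_table;       -- marks dead tuples reusable; the file stays large
    VACUUM FULL big_table;  -- rewrites the table and shrinks the file, but holds
                            -- an ACCESS EXCLUSIVE lock for the duration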
...okay? What's the use case where data shrinks? The space that is "not removed" will just be reused by new data. The only real use case is "we loaded way too much data, removed it, and want to recover the space because we will never need it again", and that is not common enough to matter: a database usually has its own filesystem, and most filesystems can't be shrunk online anyway, so any shrinking needs downtime.
If rewriting the table is part of a regular workflow, then database configuration can play a significant role in its performance. However, I still contend that regularly rewriting tables is an anti-pattern.
Are you able to partition any of your tables so that you can VACUUM FULL the partitions individually?
Yeah, partitioning is my eventual end goal. It's getting less and less awkward with each PG version, though it's still a little awkward in Django. But heck, I haven't tried it in a year; maybe it's worth another shot.
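For anyone in the same spot, a minimal sketch of the declarative-partitioning route (all names hypothetical), which bounds each rewrite to one partition:

    CREATE TABLE events (
        id         bigserial,
        created_at timestamptz NOT NULL,
        payload    jsonb
    ) PARTITION BY RANGE (created_at);

    CREATE TABLE events_2023 PARTITION OF events
        FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');

    -- each partition is its own physical relation, so the rewrite is bounded:
    VACUUM FULL events_2023;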
git diff REL_16_RC1..REL_16_0 doesn't show any changes that would require a pg_upgrade (at least from my read), so you should be able to upgrade without it.
But I just installed the latest Debian with Postgres 15, haha. I don't even think I'm using any features past 11 (websearch_to_tsquery), so I'll need to research anything new that might be useful to me.
[1]: https://www.fordfoundation.org/work/learning/research-report...