Worth reading up on Postgres' MVCC model for concurrency.[0] Short version is th...

williamdclt · 2024-10-14T15:58:23.000000Z

As much as I have gripes with the autovac, I’m surprised at the idea of getting to such a broken state. 100M rows is not small but not huge, how frequent is “frequent updates”? How long ago was that (there’s been a lot of changes in autovac since v9)?

“Stops working completely” should not be a thing, it could be vacuuming slower than the update frequency (although that’d be surprising) but I don’t know of any reason it’d just stop?

That being said I’ve also had issues with autovac (on aurora to be fair, couldn’t say if it was aurora-specific) like it running constantly without vacuuming anything, like there was an old transaction idling (there wasn’t)

sgarland · 2024-10-14T16:50:19.000000Z

On decently-sized tables (100,000,000 is, as you say, not small but not huge), if you haven’t tuned cost limiting and/or various parameters for controlling autovacuum workers, it’s entirely possible for it to effectively do nothing, especially if you’re in the cloud with backing disks that have limited IOPS / throughput.

It continues to baffle me why AWS picks some truly terrible defaults for parameter groups. I understand most of them come from the RDBMS defaults, but AWS has the luxury of knowing precisely how many CPUs and RAM any given instance has. On any decently-sized instance, it should allocate far more memory for maintenance_work_mem, for example.

mjr00 · 2024-10-14T16:49:22.000000Z

It's been a while, but IIRC it was on pg12. "Stopped working completely" I'm basing on the vacuum statistics saying the last auto-vacuum started weeks ago for these tables and never actually finished. Frequent updates means regularly rewriting 10 million rows (at various places) throughout the table. I also should mention that there were 100+ materialized views built off this table which I'm sure had an impact.

In any case, this got resolved but caused a huge operational headache, and isn't something that would have been a problem with MySQL. I feel like that's the main reason VACUUM gets hated on; all of the problems with it are solvable, but you only find those problems by running into them, and when you run into them on your production database it ends up somewhere between "pain in the ass" and "total nightmare" to resolve.

InsideOutSanta · 2024-10-14T15:11:29.000000Z

Thanks for that, that's valuable information.