One massive one that we’ve been struggling with for months.
Some PostgreSQL query errors are not handled correctly. They get treated as fatal, which restarts the entire server. A full reload takes up to 30 seconds, and in the meantime users are getting 504s.
The error is benign and totally retriable in userland: it’s a query timeout. There is no need to treat it as fatal.
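To make "retriable in userland" concrete, here is a minimal sketch of what that could look like. It is not our actual server code. `QueryError` and `run_with_retry` are hypothetical names, and the sketch assumes the driver exposes the PostgreSQL SQLSTATE on the exception. The one hard fact it relies on is that PostgreSQL reports a statement cancelled by `statement_timeout` with SQLSTATE `57014` (`query_canceled`).

```python
import time

# SQLSTATE 57014 (query_canceled) is what PostgreSQL reports when a
# statement is cancelled, including cancellation by statement_timeout.
QUERY_CANCELED = "57014"


class QueryError(Exception):
    """Hypothetical stand-in for a driver error that carries a SQLSTATE."""

    def __init__(self, message, sqlstate):
        super().__init__(message)
        self.sqlstate = sqlstate


def run_with_retry(run_query, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call run_query(), retrying only when it fails with a query timeout."""
    for attempt in range(1, attempts + 1):
        try:
            return run_query()
        except QueryError as exc:
            # Any other error, or a timeout on the last attempt,
            # propagates to the caller unchanged.
            if exc.sqlstate != QUERY_CANCELED or attempt == attempts:
                raise
            # Exponential backoff: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

The point of the design is that the timeout stays an ordinary exception at the call site, so the caller decides whether to retry or give up, and nothing has to restart. Note that `57014` also covers user-requested cancellation, so a real implementation may want to tell the two apart.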