The lazy migration tradeoff would keep me up at night. Thousands of per-actor databases sitting at different schema versions means your app code has to handle every version in the chain at once.
Debugging "works for user A, crashes for user B" becomes "check which schema version their actor's database is on." That's a rough on-call experience.
Schema-per-tenant in a single Postgres instance gives you the same isolation without version skew. Migrations are atomic — done or not done, no in-between states.
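A minimal sketch of why the atomicity holds (Postgres runs DDL inside transactions; `migration_script` and its shape here are my invention, not anything from the post):

```python
def migration_script(schema: str, ddl_statements: list[str]) -> str:
    """Wrap one tenant schema's migration in a single transaction.

    Postgres DDL is transactional, so if any statement fails the whole
    thing rolls back: the schema is fully migrated or untouched, never
    half-way.
    """
    lines = [
        "BEGIN;",
        # SET LOCAL scopes the search_path to this transaction only,
        # so concurrent sessions on other schemas are unaffected.
        f"SET LOCAL search_path TO {schema};",
        *ddl_statements,
        "COMMIT;",
    ]
    return "\n".join(lines)
```

Run one of these per schema in a loop and every tenant is either on the old version or the new one, with nothing in between.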
The read-after-write problem on replicas is what actually hurts in production. Write goes to primary, next request hits a replica that's a few hundred milliseconds behind, and the data just isn't there. Annoying to debug because it's intermittent and disappears under low load.
Routing reads back to primary for a few seconds after a write is the obvious fix, but PgBouncer can't do it. Good to see it handled at this layer.
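The sticky-read window can be sketched in a few lines, assuming a routing layer that sees both writes and reads per tenant (all names here are made up, not whatever the post's layer actually calls things):

```python
import time


class ReadRouter:
    """Route a tenant's reads to the primary for a short window after
    any write, so read-after-write sees its own data despite replica lag."""

    def __init__(self, stickiness_seconds: float = 5.0, clock=time.monotonic):
        self.stickiness = stickiness_seconds
        self.clock = clock  # injectable so tests can fake time
        self.last_write: dict[str, float] = {}

    def note_write(self, tenant: str) -> None:
        # Record when this tenant last wrote.
        self.last_write[tenant] = self.clock()

    def target_for_read(self, tenant: str) -> str:
        # Inside the window the replica may still be behind, so pin
        # this tenant's reads to the primary; otherwise spread load.
        wrote_at = self.last_write.get(tenant)
        if wrote_at is not None and self.clock() - wrote_at < self.stickiness:
            return "primary"
        return "replica"
```

A wall-clock window is the crude version; a tighter approach is to remember the write's WAL position and send reads to any replica that has replayed past it, but the time window is easy to reason about and usually good enough.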
Postgres as a search backend is one of those decisions that looks wrong on paper but works really well in practice. tsvector handles full-text, pg_trgm does fuzzy matching, pgvector covers semantic — and you don't need to babysit an Elasticsearch cluster or worry about sync lag.
The part that's easy to overlook: your search index is transactionally consistent with everything else. No stale results because some background sync job fell over at 3am.
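For a feel of what that looks like in practice, full-text and fuzzy matching can live in a single query (the `documents` table and its columns are invented for illustration; the `%%` is psycopg-style escaping of pg_trgm's `%` similarity operator):

```python
# Hypothetical documents(id, title, search_vec tsvector) table with a
# GIN index on search_vec and a trigram index on title.
SEARCH_SQL = """
SELECT id, title,
       ts_rank(search_vec, query) AS rank
FROM documents,
     websearch_to_tsquery('english', %(q)s) AS query
WHERE search_vec @@ query
   OR title %% %(q)s          -- pg_trgm similarity catches typos
ORDER BY rank DESC
LIMIT 20;
"""
```

Because the `tsvector` column is updated in the same transaction as the row itself, the result set can never be ahead of or behind the data, which is exactly the consistency point above.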
With 3000+ schemas I'd keep an eye on GIN index bloat. The per-index overhead across that many schemas adds up, and autovacuum can have trouble keeping pace.
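One way to watch for it is a catalog query that ranks GIN indexes by on-disk size across every schema, sketched here as a plain SQL string (the `LIMIT` is an arbitrary choice, and for true dead-space numbers you'd follow up with `pgstattuple`):

```python
# Ranks GIN indexes by size across all schemas; a sudden growth trend on
# one tenant's index is the early warning sign worth alerting on.
GIN_SIZE_SQL = """
SELECT schemaname, indexname,
       pg_size_pretty(pg_relation_size(
           format('%I.%I', schemaname, indexname)::regclass)) AS size
FROM pg_indexes
WHERE indexdef ILIKE '%USING gin%'
ORDER BY pg_relation_size(
           format('%I.%I', schemaname, indexname)::regclass) DESC
LIMIT 50;
"""
```

GIN's pending-list cleanup is one of the things autovacuum does for you, so tuning `gin_pending_list_limit` (or more aggressive autovacuum settings) is the usual knob when it falls behind.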