> How do you deal with horizontal scaling or failovers in general?
We prefer vertical scaling over horizontal. The entire company runs on one server, which works for us since most of our customers are geographically close to it.
In the future if we needed geographically distributed read replicas and failover protection I would probably go with LiteFS [1].
> if only one process can write to the db, how do you architect your system so that writes are performed by just one of the instances (the writer)
For multiple machines, LiteFS has ways of dealing with this. But the general idea applies to our system as well. All of our SQL is in either a read.sql or write.sql file. On startup, we open a connection pool for all the read queries and open a single connection for all the write queries. This keeps things clean and separated.
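The split described above can be sketched in a few lines. This is a hypothetical illustration, not their actual code: the file name, pool size, and helper names (`read_one`, `write`) are all assumptions. The key idea is a pool of read-only connections plus exactly one connection through which every write is funneled, which matches SQLite's single-writer model.

```python
import sqlite3
from queue import Queue

# Assumed names/values for illustration only.
DB_PATH = "app.db"
READ_POOL_SIZE = 4

def make_read_conn(path):
    conn = sqlite3.connect(path, check_same_thread=False)
    conn.execute("PRAGMA query_only = ON")  # this handle can never write
    return conn

def make_write_conn(path):
    conn = sqlite3.connect(path, check_same_thread=False)
    conn.execute("PRAGMA journal_mode = WAL")  # readers don't block the writer
    return conn

# Pool of connections for everything in read.sql.
read_pool = Queue()
for _ in range(READ_POOL_SIZE):
    read_pool.put(make_read_conn(DB_PATH))

# Single connection for everything in write.sql.
write_conn = make_write_conn(DB_PATH)

def read_one(sql, params=()):
    conn = read_pool.get()
    try:
        return conn.execute(sql, params).fetchone()
    finally:
        read_pool.put(conn)

def write(sql, params=()):
    # One connection means writes are naturally serialized; the
    # context manager commits on success and rolls back on error.
    with write_conn:
        write_conn.execute(sql, params)
```

Because the writer is a single connection, application code never has to think about SQLite's `SQLITE_BUSY` contention between its own writers.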
> how do you gracefully upgrade your app without any downtime?
We use a systemd socket and service. When the service is restarted with a new binary, systemd keeps the listening socket open and queues incoming connections until the new process is accepting again. It's really simple (thanks to our monolith setup) and entails zero downtime except a 1-2 second delay on unlucky requests. But as far as I can tell, no requests are lost.
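A minimal sketch of that setup, assuming an app at `/usr/local/bin/app` that accepts its listening socket from systemd (via `sd_listen_fds` / the `LISTEN_FDS` convention) — unit names, port, and paths are made up:

```ini
# app.socket — systemd owns the listening socket, so it survives
# service restarts; new connections queue in the kernel backlog
# while the binary is being swapped.
[Socket]
ListenStream=8080

[Install]
WantedBy=sockets.target
```

```ini
# app.service — started on demand by app.socket; the process
# inherits the already-bound socket as a file descriptor.
[Service]
ExecStart=/usr/local/bin/app
```

Deploying is then just replacing the binary and running `systemctl restart app.service`; the socket unit never goes down.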
[1]: https://fly.io/docs/litefs/