
Backups are fundamentally limited to eventual consistency; there is no need for databases to be synchronously replicated for backups. I mean, splitting the database has no effect on backup consistency, although a cleaner way of dealing with it is not splitting the database at all, but simply running an async replica and taking backups from it.

Are you two talking past each other...? I suspect the grandparent post is concerned about the loss of ACID properties for updating data and metadata if they are split across two databases. This is a concern for regular application access too. Meanwhile, you assume that this consistency problem is already solved in the application, in which case the backup and restoration problem can of course be handled as well.

To the grandparent: you typically solve this application consistency issue by using something like immutable object store semantics with versioned references. The object store is capable of answering requests for multiple versions of the asset, and the application metadata store tracks individual versions. You can sequence the order in which you commit to these stores, so the asset is always available before it is published in the metadata store. Alternatively, you can make consumers aware of temporary unavailability, so they can wait for the asset to become available even if they stumble on the new version reference before the content is committed.
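A minimal sketch of that write-then-publish ordering, using hypothetical in-memory dicts to stand in for the object store and the metadata store (in practice these would be something like S3 and a SQL table):

```python
# Hypothetical in-memory stand-ins; object versions are immutable once written.
object_store = {}   # (asset_id, version) -> bytes
metadata = {}       # asset_id -> currently published version

def publish(asset_id: str, version: int, content: bytes) -> None:
    # Step 1: commit the content under a new, immutable version key.
    object_store[(asset_id, version)] = content
    # Step 2: only after the content is durable, publish the reference.
    metadata[asset_id] = version

def read(asset_id: str) -> bytes:
    # Readers resolve the published version, then fetch exactly that version.
    version = metadata[asset_id]
    return object_store[(asset_id, version)]

publish("logo", 1, b"v1 bytes")
publish("logo", 2, b"v2 bytes")
# Because the reference is published last, readers never resolve a
# version whose content does not already exist.
assert read("logo") == b"v2 bytes"
```

The key property is the ordering in `publish`: a crash between the two steps leaves an unreferenced object (garbage to collect later), never a dangling reference.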

You can also find hybrids where the metadata store is used to track the asset lifecycle, exposing previous and next asset version references in a sort of two-phase commit protocol at the application level. This can allow consuming applications to choose whether they wait for the latest or move forward with older data. It also makes it easier to handle failure/recovery procedures such as canceling an update that is taking too long and reclaiming resources.
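A toy sketch of that hybrid pattern (all names hypothetical): the metadata record carries both the current and the pending version, so consumers can choose between the older data and the in-flight update, and a stuck update can be cancelled and its storage reclaimed.

```python
# Hypothetical in-memory stand-ins for the two stores.
object_store = {}
records = {}  # asset_id -> {"current": version | None, "pending": version | None}

def begin_update(asset_id, version, content):
    # Phase 1: stage the content and announce the upcoming version.
    object_store[(asset_id, version)] = content
    rec = records.setdefault(asset_id, {"current": None, "pending": None})
    rec["pending"] = version

def commit_update(asset_id):
    # Phase 2: promote the pending version to current.
    rec = records[asset_id]
    rec["current"], rec["pending"] = rec["pending"], None

def cancel_update(asset_id):
    # Recovery path: drop the half-published version, reclaim its storage.
    rec = records[asset_id]
    object_store.pop((asset_id, rec["pending"]), None)
    rec["pending"] = None

def read(asset_id, wait_for_latest=False):
    rec = records[asset_id]
    use_pending = wait_for_latest and rec["pending"] is not None
    version = rec["pending"] if use_pending else rec["current"]
    return object_store[(asset_id, version)]

begin_update("logo", 1, b"v1")
commit_update("logo")
begin_update("logo", 2, b"v2")
# A consumer can keep serving v1 or opt into the pending v2.
assert read("logo") == b"v1"
assert read("logo", wait_for_latest=True) == b"v2"
cancel_update("logo")          # e.g. the update took too long
assert read("logo") == b"v1"   # back to a single published version
```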

Backups on databases like PostgreSQL are strongly consistent, regardless of where you take them from (master or replica). Postgres replication strictly preserves transaction commit order.

As such, splitting the database may introduce significant consistency issues that a backup doesn't have.

I believe this splitting technique is not a good one except for potentially narrow use cases.

Then strong consistency doesn't mean what you think it means. You can only do stale reads from backups.

I thought ahachete meant that a database backup is atomic, i.e., it will only contain fully committed transactions. The problem with data split across databases, then, is that transactions don't span multiple databases, so you can't get an atomic snapshot of data that spans them.
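A toy illustration of that gap (plain dicts standing in for the two databases, not real database code): because the two stores are snapshotted at different moments, a write that lands in between shows up in one backup but not the other.

```python
orders = {}    # "database" A
payments = {}  # "database" B

def place_order(order_id):
    # One logical operation that touches both "databases".
    orders[order_id] = "placed"
    payments[order_id] = "paid"

place_order(1)
snapshot_a = dict(orders)      # backup of A taken now...
place_order(2)                 # ...a write lands in between...
snapshot_b = dict(payments)    # ...backup of B taken later

# Restoring both backups yields a payment whose order is missing:
assert 2 not in snapshot_a and 2 in snapshot_b
```

Within a single database this cannot happen, because the backup is taken from one transactional snapshot.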

Are you saying that even a single-database backup is not atomic?

No, it's atomic. But that doesn't help when the separate databases are not atomic with respect to each other: when you recover from backups, you will still have data loss and an inconsistent state, i.e. rows that should be in the database are missing, and new rows end up reusing the missing IDs, etc. Backups cannot be strongly consistent, so you have to take the exact same approaches to deal with this whether you store everything in a single database or in two separate ones.
