> "My second point was that this — streaming a "non-lossy" database as a change log into one or more "lossy" ones — is such a common operation that it should be a solved problem. It certainly requires something more than a queue."
This is almost exactly the cross-DC replication problem, which is a subject of active research.
A changelog on the source side is only sort of helpful. It's useful for flagging which rows may have changed, but given that you don't trust the target database, you also need to do repair.
Correct repair is impossible without full syncs (or at least a partial sync since the last known-perfectly-synced snapshot), unless your data model is idempotent and commutative. On-the-fly repair requires out-of-order re-application of previous operations.
The easiest way to reason about commutativity is to just make everything in your database immutable. So this is a solved problem, but it requires compromises in the data model that people are mostly unwilling to live with.
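To make that concrete, here's a minimal sketch (Python, with sqlite3 standing in for the target database; the schema and names are made up) of an append-only event log, where replaying the changelog is idempotent and order-insensitive by construction:

```python
# Sketch: an append-only (immutable) event log. Re-applying the changelog
# is idempotent (INSERT OR IGNORE) and commutative (rows are never
# updated, so apply order doesn't matter). All names are hypothetical.
import sqlite3

target = sqlite3.connect(":memory:")
target.execute("""
    CREATE TABLE events (
        event_id   TEXT PRIMARY KEY,   -- globally unique, assigned at the source
        entity_id  TEXT NOT NULL,
        payload    TEXT NOT NULL,
        created_at INTEGER NOT NULL    -- immutable: never updated in place
    )
""")

def apply_changelog(conn, batch):
    # Duplicate deliveries and out-of-order deliveries are both harmless.
    conn.executemany("INSERT OR IGNORE INTO events VALUES (?, ?, ?, ?)", batch)
    conn.commit()

batch = [("e1", "user:1", '{"name": "a"}', 100),
         ("e2", "user:1", '{"name": "b"}', 101)]
apply_changelog(target, batch)
apply_changelog(target, list(reversed(batch)))  # replayed, reordered: same state
```

The compromise is that "current state" becomes a derived view (e.g. the latest event per entity) rather than a mutable row, which is exactly the part people tend to balk at.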
You can do pretty well if your target database supports idempotent operations.
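Concretely, "idempotent operations" can be as little as a last-write-wins upsert keyed on the source's updated_at: re-delivery is a no-op, and a late-arriving older change can't clobber a newer one. A sketch, assuming SQLite >= 3.24 upsert syntax and made-up names:

```python
# Sketch: idempotent, last-write-wins apply at the target.
# Requires SQLite >= 3.24 for ON CONFLICT ... DO UPDATE.
import sqlite3

target = sqlite3.connect(":memory:")
target.execute("""
    CREATE TABLE users (
        id         TEXT PRIMARY KEY,
        name       TEXT,
        updated_at INTEGER NOT NULL
    )
""")

def apply_change(conn, row_id, name, updated_at):
    conn.execute(
        """
        INSERT INTO users (id, name, updated_at) VALUES (?, ?, ?)
        ON CONFLICT (id) DO UPDATE
            SET name = excluded.name, updated_at = excluded.updated_at
            WHERE excluded.updated_at > users.updated_at
        """,
        (row_id, name, updated_at),
    )
    conn.commit()

apply_change(target, "user:1", "new name", 200)
apply_change(target, "user:1", "old name", 100)  # late arrival: ignored
apply_change(target, "user:1", "new name", 200)  # re-delivery: no-op
```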
If you're trying to do pretty well, then you can do a Merkle tree comparison of both sides at some time (T0 = now() - epsilon) to efficiently narrow down which records have been lost or misplaced. Then you re-sync them. Here, for efficiency, your Merkle tree implementation will ideally span field(s) related to the updated_at time, so that only a small subset of the tree is changing all the time. This is a tricky thing to tune.
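A minimal sketch of the idea, with leaves bucketed by updated_at day so that old buckets stay stable and only the "hot" end of the tree churns (the bucketing scheme and all names here are my assumptions):

```python
# Sketch: compare two sides via a hash tree whose leaves are buckets of
# updated_at (one per day here). Rows are (pk, payload, updated_at)
# tuples as of T0; how you snapshot both sides at T0 is up to you.
import hashlib
from collections import defaultdict

def bucket_hashes(rows):
    # Leaf level: one digest per updated_at day bucket.
    buckets = defaultdict(list)
    for pk, payload, updated_at in rows:
        buckets[updated_at // 86400].append(f"{pk}:{payload}")
    return {day: hashlib.sha256("\n".join(sorted(items)).encode()).hexdigest()
            for day, items in buckets.items()}

def diverged_buckets(source_rows, target_rows):
    src, dst = bucket_hashes(source_rows), bucket_hashes(target_rows)

    def root(h):
        # Root comparison: one digest over all leaf digests.
        return hashlib.sha256(repr(sorted(h.items())).encode()).digest()

    if root(src) == root(dst):
        return []  # nothing to repair
    # Otherwise descend: any bucket whose hash differs (or is missing on
    # one side) needs a row-level re-sync.
    return [d for d in set(src) | set(dst) if src.get(d) != dst.get(d)]

source = [("u1", "a", 100), ("u2", "b", 90000)]
target = [("u1", "a", 100)]              # target lost the second row
print(diverged_buckets(source, target))  # -> [1]: the day bucket to re-sync
```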
You'll still be "open to odd inconsistencies when updates are applied out of order" if you haven't made your data model immutable, but I think this is mostly in line with your hopes.
You'd implement the Merkle trees yourself in the application layer. Alternatively, you could use hash lists; it'd be somewhat similar to how you'd implement geohashing. Let's say you just take a SHA-X of each row, stored as a hex varchar, then do something like "SELECT SUBSTR(`sha_column`, 1, n), COUNT(*) FROM `your_table` GROUP BY 1". If there's a count mismatch, drill down into it by widening the prefix: the first two chars, the first three chars, etc. Materialize some of these views as needed. It's ugly and tricky to tune.
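Roughly what that drill-down could look like when driven from the application layer; sqlite3 stands in for both sides, the table and column names are hypothetical, and for the demo the hashes are truncated to four hex chars:

```python
# Sketch: prefix drill-down over per-row SHA hashes stored as hex.
# Compare GROUP BY counts on widening prefixes and recurse only into
# prefixes whose counts disagree. Count equality is a cheap check, not
# airtight; a real version would also compare an aggregate hash per prefix.
import sqlite3

def prefix_counts(conn, prefix, n):
    rows = conn.execute(
        "SELECT SUBSTR(sha_column, 1, ?) AS p, COUNT(*) "
        "FROM rows WHERE sha_column LIKE ? || '%' GROUP BY p",
        (n, prefix),
    )
    return dict(rows.fetchall())

def find_bad_prefixes(src, dst, prefix="", n=1, max_depth=8):
    bad = []
    s, d = prefix_counts(src, prefix, n), prefix_counts(dst, prefix, n)
    for p in set(s) | set(d):
        if s.get(p, 0) != d.get(p, 0):
            if n >= max_depth:
                bad.append(p)  # re-sync every row under this prefix
            else:
                bad += find_bad_prefixes(src, dst, p, n + 1, max_depth)
    return bad

def make_db(hashes):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE rows (sha_column TEXT)")
    conn.executemany("INSERT INTO rows VALUES (?)", [(h,) for h in hashes])
    return conn

src = make_db(["ab12", "ab34", "cd56"])
dst = make_db(["ab12", "cd56"])  # target lost a row
print(find_bad_prefixes(src, dst, max_depth=4))  # -> ['ab34']
```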
Merkle trees aren't interesting in the no-repair case, as the changelog is more direct and has no downside.