
Ask HN: How would you keep 2 APIs in sync? - pbowyer
Since MuleSoft's acquisition I've been thinking about how I would build something similar. A couple of projects I'm involved in have mentioned "We need bi-directional sync between Salesforce and X, Y and Z", and that has always been followed by "but MuleSoft and PieSync cost serious $$$$$".

How have you built this, or how would you? The resources I've found are about syncing files between two endpoints, or syncing databases between mobile devices and a server (which is similar, but usually encapsulated in the storage layer).

I already have a page of edge cases where data would be incorrectly overwritten. E.g.:

  09:00:00 Data polled from systems A and B (these are 'enterprise' systems without webhooks, notifications or change feeds)
  09:00:01 A record (which is in the polled batch) is changed in system A
  09:00:07 The changes are written back to each system, OVERWRITING THE CHANGE IN SYSTEM A MADE AT 09:00:01
Without the ability to lock system A while the sync runs, I don't see how this can be prevented. Can it?

References appreciated; I'm on holiday soon and up for a lot of reading. StackExchange has been very quiet; are 'ESB' and 'bi-directional sync' not the right keywords?

Finally, are there any open-source implementations I should look at?
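The timeline above can be reproduced with a toy model. This is a sketch under assumed names: `system_a`, `system_b`, `poll` and `merge` are hypothetical, the two systems are plain in-memory dicts, and the sync is assumed to poll both sides, merge with B's values winning, then blindly write the merged snapshot back to both.

```python
# Toy model of the race in the timeline above. system_a / system_b are
# hypothetical in-memory stand-ins (record id -> fields), not real APIs.
system_a = {"acct-1": {"phone": "111", "email": "old@example.com"}}
system_b = {"acct-1": {"phone": "111", "email": "new@example.com"}}


def poll(system):
    """09:00:00: take a full snapshot (no webhooks or change feeds)."""
    return {rid: dict(fields) for rid, fields in system.items()}


def merge(snap_a, snap_b):
    """Naive merge: start from A's snapshot and let B's values win."""
    merged = {rid: dict(fields) for rid, fields in snap_a.items()}
    for rid, fields in snap_b.items():
        merged.setdefault(rid, {}).update(fields)
    return merged


snap_a, snap_b = poll(system_a), poll(system_b)

# 09:00:01: a user edits the record in system A after it was polled.
system_a["acct-1"]["phone"] = "222"

# 09:00:07: blind write-back of the merged snapshot to both systems.
merged = merge(snap_a, snap_b)
for system in (system_a, system_b):
    for rid, fields in merged.items():
        system[rid] = dict(fields)

print(system_a["acct-1"]["phone"])  # 111: the 09:00:01 edit was overwritten
```

Nothing in the write-back step looks at system A's current state, which is why the 09:00:01 edit disappears.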
======
indogooner
> Without the ability to lock system A

It can be done using watermarking. Let us take the database example. Assume we have a timestamp column in database A. That column can be used to sync records to another storage system.

Typical implementations run a query (select records where timestamp >= 'initial_val' and timestamp <= 'max_val/desired_val') and then fetch the data. The next run typically starts from the max_val of the previous run.
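A minimal sketch of the watermark query described above, using Python's built-in `sqlite3` module. The `records` table, its `updated_at` column and the `incremental_pull` helper are hypothetical; here the upper bound (max_val) is assumed to be the largest timestamp present when the run starts.

```python
import sqlite3


def incremental_pull(conn, last_watermark):
    """One run of a watermark-based incremental pull (a sketch, not Sqoop/Gobblin code).

    Returns (rows, new_watermark). The upper bound is read once at the start of
    the run; rows stamped later are left for the next run.
    """
    (high,) = conn.execute("SELECT MAX(updated_at) FROM records").fetchone()
    if high is None:  # empty table: nothing to pull yet
        return [], last_watermark
    rows = conn.execute(
        "SELECT id, name, updated_at FROM records "
        "WHERE updated_at >= ? AND updated_at <= ? "
        "ORDER BY updated_at, id",
        (last_watermark, high),
    ).fetchall()
    # The next run starts from this run's max_val.
    return rows, high


# Hypothetical table: ISO-8601 text timestamps sort correctly as strings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [(1, "alpha", "2024-01-01T09:00:00"), (2, "beta", "2024-01-01T09:00:05")],
)
conn.commit()

first_rows, wm1 = incremental_pull(conn, "1970-01-01T00:00:00")  # both rows

conn.execute(
    "UPDATE records SET name = 'alpha2', updated_at = '2024-01-01T09:01:00' WHERE id = 1"
)
conn.commit()

# Row 2 comes back again because the lower bound is inclusive (>=),
# so the consumer should upsert by id rather than blindly insert.
second_rows, wm2 = incremental_pull(conn, wm1)
```

This covers the read side only: it finds what changed since the last run, but by itself it does not stop a later write-back from overwriting a concurrent edit.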

For open source, you can look at Sqoop [1] or Gobblin [2], which do this.

[1] https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html
[2] https://gobblin.readthedocs.io/

~~~
pbowyer
Thanks for the info re watermarking and the links.

Re watermarking, would it prevent the overwrite in my example? If you could do conditional updates via the API, then yes, because you could say "WHERE the record timestamp = the timestamp we have". But if not, the change in system A will be overwritten, because the sync has no way of knowing it has changed.
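A sketch of the conditional update described above, again with `sqlite3` and a hypothetical `records` table; `conditional_update` and `ConflictError` are illustrative names, not part of any sync product's API. The timestamps replay the 09:00:00 / 09:00:01 / 09:00:07 timeline from the question.

```python
import sqlite3


class ConflictError(Exception):
    """Raised when the record changed after it was polled (hypothetical name)."""


def conditional_update(conn, record_id, expected_updated_at, new_name, new_updated_at):
    """Write only if the record still has the timestamp we polled.

    The check and the write are one UPDATE statement, so the database applies
    them atomically: "WHERE the record timestamp = the timestamp we have".
    """
    cur = conn.execute(
        "UPDATE records SET name = ?, updated_at = ? WHERE id = ? AND updated_at = ?",
        (new_name, new_updated_at, record_id, expected_updated_at),
    )
    conn.commit()
    if cur.rowcount == 0:
        raise ConflictError(f"record {record_id} changed since it was polled")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)")
conn.execute("INSERT INTO records VALUES (1, 'alpha', '2024-01-01T09:00:00')")
conn.commit()

polled_at = "2024-01-01T09:00:00"  # 09:00:00: timestamp seen when polling

# 09:00:01: a user edits the record in system A.
conn.execute(
    "UPDATE records SET name = 'edited', updated_at = '2024-01-01T09:00:01' WHERE id = 1"
)
conn.commit()

# 09:00:07: the sync writes back using the stale timestamp and is refused.
try:
    conditional_update(conn, 1, polled_at, "from-sync", "2024-01-01T09:00:07")
    outcome = "written"
except ConflictError:
    outcome = "conflict"  # re-poll and re-merge instead of overwriting
```

Over HTTP, the usual equivalent is an ETag sent back in an If-Match header, which the server rejects with 412 Precondition Failed if the resource has changed; whether a particular enterprise API supports this has to be checked system by system. Without any such primitive, re-reading the record just before writing shrinks the window but cannot close it, because the check and the write are separate calls.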

