
Kleppmann is no idiot, and I learned a lot from his book. Highly recommended :)

This seems very useful, but this part destroys some of the magic:

> The only case Automerge cannot handle automatically, because there is no well-defined resolution, is when users concurrently update the same property in the same object (or, similarly, the same index in the same list). In this case, Automerge arbitrarily picks one of the concurrently written values as the "winner"

I guess that's a pragmatic choice, but isn't avoiding this the whole point of conflict free types?




It goes on:

> Although only one of the concurrently written values shows up in the object, the other values are not lost. They are merely relegated to a conflicts object.

If you have a procedure for resolving conflicts on scalars (for example, a field is a maximum, and you can take the maximum of the conflicting values), you can detect the conflicts and resolve them manually.

On a brief inspection, it appears that the conflict objects hold the conflicting values, but not the previous value, so there's no way to do a three-way merge here. That makes it impossible to resolve conflicts on a field which is a counter, for example.
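Such a manual resolution might look like the sketch below. The shape of the conflicts object here is illustrative, not Automerge's exact API: assume the losing values are exposed as a map keyed by some actor/op identifier, alongside the arbitrarily chosen winner.

```javascript
// Resolve a concurrent-write conflict on a "maximum" field by taking the
// max over all candidate values (the arbitrary winner plus the losers).
// Note this only works because max needs no base value; a counter would
// need the previous value for a three-way merge, which isn't available.
function resolveMax(winner, conflicts) {
  // `conflicts` maps actor/op ids to the losing values, e.g. { "actorA@3": 15 }
  const candidates = [winner, ...Object.values(conflicts ?? {})];
  return Math.max(...candidates);
}

console.log(resolveMax(10, { "actorA@3": 15, "actorB@3": 7 })); // 15
```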


Although it's not implemented, you could create a Counter CRDT using an array of numbers, appending each increment or decrement, the 'value' being the sum of the array. This can be compressed: each client may rewrite its own most recent entry if a sync has not yet been sent. There are probably other optimisations too, such as tracking which client made which change and updating that client's single running total.
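A minimal sketch of that idea (not Automerge's implementation): each increment is tagged with a unique id, the value is the sum, and merging is just a union of the tagged increments, which makes the merge idempotent and order-independent.

```javascript
// A counter modeled as a grow-only list of signed increments; the value
// is the sum of all deltas ever recorded.
function counterValue(deltas) {
  return deltas.reduce((sum, n) => sum + n, 0);
}

// Merge two replicas by taking the union of their id-tagged increments.
// The unique ids keep the union idempotent: re-merging changes nothing.
function mergeCounters(a, b) {
  const seen = new Map();
  for (const inc of [...a, ...b]) seen.set(inc.id, inc.delta);
  return [...seen.entries()].map(([id, delta]) => ({ id, delta }));
}

const replicaA = [{ id: "a1", delta: +1 }, { id: "a2", delta: +1 }];
const replicaB = [{ id: "a1", delta: +1 }, { id: "b1", delta: -1 }];
const merged = mergeCounters(replicaA, replicaB);
console.log(counterValue(merged.map(i => i.delta))); // 1
```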


Well, two concurrent changes to the same property is basically a race condition and that can't be cleanly resolved in an automated way. You either pick a winner or you do manual merging like Git, but I guess in applications of such data structures you don't want to do anything manually.


Within a string, most CRDTs will keep both changes, effectively concatenating them.

If you have something like an Int value and need to handle it more gracefully in your application than the arbitrary selection by the CRDT toolkit, you can keep it within an array/list and handle the merging yourself. It starts off as [10]; each user, when changing it, deletes the old value and appends a new one. That way, if two users simultaneously change it, one to [15] and the other to [5], the resulting synced value is [15, 5], and you can then combine the values yourself (addition, average, etc.) in a deterministic way.

I haven't looked into it, but some CRDT toolkits may store Ints by keeping track of changes to them; the above lets you do it yourself.
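The combining step has to be order-independent, since replicas may see the surviving values in different orders. A sketch with the average as the combiner (the function name is mine, not from any toolkit):

```javascript
// Deterministically combine whatever values survive a concurrent edit.
// Both replicas started from [10]; one wrote [15], the other [5]; the
// synced list holds both (in either order), so the combiner must not
// depend on order -- the average qualifies.
function resolveAverage(values) {
  if (values.length === 0) return 0;
  const sum = values.reduce((a, b) => a + b, 0);
  return sum / values.length;
}

console.log(resolveAverage([15, 5])); // 10
```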


No, this is not the way. The proper way to handle it is that each client trying to update the property tries to get a write lock on it; that write lock has to wait for any other write lock to complete, and if it can't get a lock it needs to let the client know it failed to update. At a bare minimum this is what has to happen to maintain any kind of data integrity.

But by all means, if you think it's basically a race condition any time you have ten clients trying to update the same row in a database and you don't care which one wins, you should be able to write a very compelling and unusual poker website.


You've posted a lot of negative comments but seem to not understand or have any experience with CRDTs.

The whole point is that state modifications on the data structure by separate parties can converge to a final representation without any coordination. The math behind it is proven; however, turning mathematical set theory into a usable JSON interface is where the problems occur.

A free-form JSON doc can't be completely supported, but these kinds of edge cases around property updates are easily handled by using an array to hold modifications instead. CRDTs even have state-based and operation-based usage models to handle different scenarios, and then there's an entirely separate but parallel tech called operational transforms for other situations.
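The "converges without coordination" property is easiest to see in the simplest state-based CRDT, a grow-only set: merge is set union, which is commutative, associative, and idempotent, so every replica that has seen the same updates ends in the same state no matter the delivery order.

```javascript
// State-based (convergent) CRDT sketch: a grow-only set.
const merge = (a, b) => new Set([...a, ...b]);

const replicaA = new Set(["x", "y"]);
const replicaB = new Set(["y", "z"]);

// Merging in either order yields the same final state.
const ab = merge(replicaA, replicaB);
const ba = merge(replicaB, replicaA);
console.log([...ab].sort().join(",")); // "x,y,z"
console.log([...ba].sort().join(",")); // "x,y,z"
```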

All of this is not only well-studied but widely used in many collaborative applications (e.g. Google Docs).


Nitpick: Google Docs uses the server-mediated Operational Transformation (OT), not peer-to-peer CRDTs.


The whole point of CRDTs is that they work offline with no central authority and therefore no locks.


Oh. This is just a local storage thing? Well what's the point of worrying about concurrency at all then?

Just FWIW: I'm hostile to this notion because it breaks decades of best practice about data integrity and claims to solve issues that aren't actually issues in real databases; plus the people boosting it seem to not understand anything about concurrency. If it's just a replacement for indexedDB or sqlite or something, well, who cares...


It's not a replacement for indexedDB or sqlite, and it's not about concurrency. It's about merging conflicting changes.

This is a good article explaining the concept and implementation of a CRDT: https://www.inkandswitch.com/peritext/


Let's say you are developing a design application (think figma) which works in the browser. You don't want to lock the whole project just because someone needs to change a label somewhere. With CRDTs n people can open and edit the project at the same time and then push their changes and have them merged.

Very similar to git. When working with git, you don't lock the file or line you are working on.


That's a very narrow use case of a conflict-free data type. Sure, if you are willing to wait however long to get a lock before you can edit something.

The main goal of CRDTs is handling long-lived split branches, such as can happen in federated systems or with offline editing (think git).


Bad design. Would you ever put something into production that couldn't tell if two people were trying to write the same object at the same time and roll one back?


You seem to only conceive of the web as "html+JS frontend communicating in real time to some server"

We have decades of distributed systems without a central server, such as git, BitTorrent, Mastodon, Matrix, and the whole web3 mess. It's for these use cases that CRDTs help solve real problems.


Why would you ever put a hyper focused distributed system in a centralized architecture?

You're choosing a sledge hammer for a screw.

This is a database that can work when you're offline. Your central server has nothing to do with this and is entirely incapable of even working.


A distributed lock. Amazing.


CRDT ensures a final state for all users.


CRDTs are mathematical data structures being applied here to create a usable JSON API, but they don't perfectly handle every possible mutation to a document.

The most common approach here is to not rewrite the same property but to use an array with modifications continually appended. This has the benefit of also supporting undo/redo behavior, while the state can be compressed into snapshots for performance.
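A sketch of that append-only pattern (operation names are invented for illustration): the current value is a fold over the log, undo appends the inverse of the last operation, and a snapshot collapses a log prefix into a single starting state.

```javascript
// The current value is a fold of the operation log over a snapshot.
function applyOps(snapshot, ops) {
  return ops.reduce((state, op) => {
    switch (op.type) {
      case "set": return op.value;
      case "add": return state + op.value;
      default: return state;
    }
  }, snapshot);
}

const log = [
  { type: "set", value: 10 },
  { type: "add", value: 5 },
  { type: "add", value: -5 }, // "undo" of the previous op: append its inverse
];
console.log(applyOps(0, log)); // 10

// Snapshot: collapse the whole log into one state for performance.
const snapshot = applyOps(0, log);
console.log(applyOps(snapshot, [])); // 10
```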


Randomly choosing a value means that references can be broken.

I'm guessing that Automerge would not be able to handle a JSON document that needs to adhere to a JSON Schema. A system that only applies changes when the resulting JSON meets the Schema would be a challenge to develop, and very valuable.

If there is no data model being followed, the JSON structure must be assumed to be just a bag of values.


>> Randomly choosing a value means that references can be broken

It also means no data is reliable; hence the entire thing is worse than useless.


It's only stochastically useless. I'm sure it's very useful most of the time.

And keep in mind, if software didn't randomly break for mysterious reasons we might be out of a job, eh?


I think CouchDB/PouchDB do this similarly, though the winner is deterministic?


CouchDB and PouchDB have no idea how to merge a conflict: if two copies are edited prior to syncing, one version is marked as the 'winner' and the other as a conflict version. As the developer you can then either choose to dispose of conflicts or merge them your own way.
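The winner selection is deterministic: CouchDB picks the leaf revision on the longest edit path, breaking ties by comparing revision ids, so every replica picks the same winner with no coordination. A rough sketch of that rule, assuming `"N-hash"` revision strings:

```javascript
// CouchDB-style deterministic winner among conflicting revisions
// ("N-hash" strings): higher revision number wins, ties broken by the
// lexicographically greater hash. Every replica applies the same rule,
// so all converge on the same winner.
function pickWinner(revs) {
  return revs.slice().sort((a, b) => {
    const [na, ha] = a.split("-");
    const [nb, hb] = b.split("-");
    const d = Number(nb) - Number(na);       // higher rev number first
    return d !== 0 ? d : hb.localeCompare(ha); // then higher hash first
  })[0];
}

console.log(pickWinner(["2-abc", "2-def", "1-zzz"])); // "2-def"
```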

This is actually a perfect use case for CRDTs such as Automerge and Yjs. I actually built a proof of concept combining Yjs with PouchDB to handle the conflicting edits:

https://discuss.yjs.dev/t/distributed-offline-editing-with-c...


No, the point is avoiding conflicts where possible. Cases like "users concurrently update the same property in the same object" are rare in real life.


No, this is what row-level locks are for.

And concurrent updates happen all the time in real life. This is why every DB in (serious) production has transactions and rollbacks. It happens all the time. If you're not aware of concurrency issues on your DB or your mid-level backend code isn't equipped to handle them, you're writing bad code indeed.


Please reread the "same property in the same object" part. You need row+column level locks for that.

In cases where users update the same row but different [independent] columns, you can avoid unnecessary conflicts, either by redesigning the relational model or by using CRDTs.
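The independent-columns case can be sketched by keeping a timestamp per field rather than per row (a hypothetical structure, not any particular library's): two users editing different fields of the same record both survive the merge, and last-writer-wins only kicks in when the same field was written twice.

```javascript
// Per-field merge: a write is { v: value, t: timestamp }. Concurrent
// edits to *different* fields of the same row both survive; only a
// write-write conflict on the *same* field falls back to last-writer-wins.
function mergeRow(a, b) {
  const merged = {};
  for (const field of new Set([...Object.keys(a), ...Object.keys(b)])) {
    const fa = a[field], fb = b[field];
    if (!fa) { merged[field] = fb; continue; }
    if (!fb) { merged[field] = fa; continue; }
    merged[field] = fb.t > fa.t ? fb : fa; // same-field conflict: LWW
  }
  return merged;
}

const userA = { name: { v: "Alice", t: 2 } };              // edited name only
const userB = { email: { v: "alice@example.com", t: 3 } }; // edited email only
const row = mergeRow(userA, userB);
console.log(row.name.v, row.email.v); // Alice alice@example.com
```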



