I'm part of the team that makes Zoho Writer (a Google Docs alternative) - https://writer.zoho.com
We went with OT for our real-time syncing of edits in 2010 and a decade later, we are still sticking with OT for reasons I already stated sometime back - https://news.ycombinator.com/item?id=24186883
However, in the spirit of "There are no solutions, only trade-offs", CRDTs are absolutely necessary for certain types of syncing - like syncing a set of database nodes.
But for systems which already mandate a central server (SaaS/Cloud), and especially for a complex problem like rich-text editing (i.e. semantic trees), I still think OT provides better trade-offs than CRDT.
I respect Joseph's conviction on CRDTs being the future, so I guess we'll figure this out sometime soon.
The Zoho ecosystem is this weird place where you can find almost everything, virtually for free. If you’ve never looked before, check it out - it’s expansive.
Frustratingly though, there are so many features heaped in that there is no cohesion. Things are frequently buggy, unreliable and disjointed. I’d almost be able to forgive it but unfortunately the support is really terrible too.
I assessed a lot of CRM software, and in each one I kept finding things it didn't have that Zoho had, but for the reasons above we ultimately chose something else. Which is a shame, because I would pay them a lot more than they ask for them to just be a little better.
> Things are frequently buggy, unreliable and disjointed. I’d almost be able to forgive it but unfortunately the support is really terrible too.
This is my experience. After years of paying for the service, for multiple users, using tools that seemed to have been forgotten by the dev team, my credit card failed to process one month.
Support was not interested in helping me. Not at all. Ended up having to move my team's workflow off Zoho. I had been begging them to bill my card, but they just kept sending me generic emails asking me to try again. Phone support was even less helpful.
We went with OT five years ago in CKEditor 5 and we have the same experience. While it would be tempting to try CRDT, it seems to come with too significant tradeoffs in our scenario.
The other aspect of CRDT that's somewhat scary is what @mdpye explains in https://news.ycombinator.com/item?id=24621113. With OT, if we get the corner cases right, we can tune and extend the implementation. With CRDT, that might be game over.
I haven't yet completely watched Martin's talk on CRDTs, so I might come back and stand corrected. For now, these are some well-known trade-offs:
A central server: Most OT algorithms depend on a central system for intention preservation. CRDTs are truly distributed and need no central server at all.
Memory: Traditionally CRDTs consume more memory because deletions are preserved. OT lets you garbage collect some operations since a central system is already recording those ops and sequencing them as well.
Analysing and cancelling ops: OT lets you easily analyse incoming ops and modify/dummy-ify/cancel them without breaking the consistency. This convenience is not necessary for most cases, but really important for rich-text editing. For example, when someone merges a couple of table cells while another user is deleting a column, we need to analyse these operations and modify them so as not to end up with an invalid table structure.
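In code, that kind of server-side vetting might look roughly like this. The op shapes and the `vet` helper are made up for illustration, not any real editor's model:

```typescript
// Illustrative only: a central OT server inspecting incoming ops and
// cancelling those invalidated by concurrent structural changes.
type Op =
  | { kind: "mergeCells"; tableId: string; cellIds: string[] }
  | { kind: "deleteColumn"; tableId: string; column: number }
  | { kind: "noOp" }; // a "dummy-ified" op that preserves sequencing

function vet(incoming: Op, concurrentlyApplied: Op[]): Op {
  if (incoming.kind === "mergeCells") {
    const structureChanged = concurrentlyApplied.some(
      (op) => op.kind === "deleteColumn" && op.tableId === incoming.tableId
    );
    // Cancel the merge rather than produce an invalid table structure;
    // the no-op still occupies a slot in the central op sequence.
    if (structureChanged) return { kind: "noOp" };
  }
  return incoming;
}
```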
Absolutely. And as a next step, you distribute your OT per document so that no one of them has too much load. This is exactly how Google docs work and for the most part it works well.
The downside is that if a single document gets a spike of traffic because it goes viral, then that document won't scale.
Right, but could one of the clients be elected to the OT leader role without having the smarts at a backend? I guess it boils down to trusting the integrity of the client.
Seems like another one (based on the article) is ease of use as well. I'm not familiar with either algorithm, but it sounds like OT is less complex and easier to understand, which IMO is a decent trade-off worth considering.
Having worked a little with both, my impression is that OT can get very complex in implementation edge cases. CRDTs are incredibly difficult to design, but if you successfully design one which can model your features, implementation is pretty straightforward.
A real world implication is that if you want to add a new operation to a system (like, table column merge, or moving a range of text), with OT, you can probably find a way to extend what you have to get it in there, with a painfully non-linear cost as you add more new operations. With CRDTs, you may find yourself entirely back at the drawing board. But the stuff you do support, you will support pretty well and reliably...
Personally, I prefer CRDTs for their elegance, but it can be difficult in a world of evolving requirements
> A real world implication is that if you want to add a new operation to a system (...) with OT, you can probably find a way to extend what you have to get it in there, with a painfully non-linear cost as you add more new operations
Exactly the experience that we have in CKEditor 5. OT is awfully complex. However, we can fine-tune it for new operations and different scenarios of usage (e.g. undoing changes is one of them). With CRDT, at least to my understanding, we might have been limited to what we'd taken into consideration on day one. Seeing how many things we learnt very late in the process (5 years so far, some documented in https://ckeditor.com/blog/Lessons-learned-from-creating-a-ri...) makes me think that we'd have sunk the company twice already.
I worked on a prototype for a couple of months which would have been a partial competitor to CK. We went with CRDTs because it was quicker to get something simple and reliable off the ground to demonstrate, but we were pretty sure we'd have to switch to support something for the longer term.
It wasn't chosen to go forward as a focus of the business, but it was one of the most fun periods of commercial programming I've done :)
I agree complexity is worth considering, though part of me wonders how important that is in this case. The reason for this intuition is that this is one of the core parts of what they're selling.
If you’re going to invest your complexity budget somewhere, it seems like this is a good place for companies dealing with these structures.
Dealing with text is still an active area of research for CRDTs. While the problem has been theoretically solved, the solutions require much more memory/bandwidth than OT does.[1] Conversely, CRDTs are significantly better at replicating graphs.
yjs[2] is one CRDT that handles text reasonably well, but it can still run into performance edge cases (as they plainly/honestly admit in their README).
Have you watched the presentation referenced in the article? CRDTs have come a long way. You can get down to about a 150% increase in size over a plain text representation. Since most rich text formats store the edit history anyway, this is closer to feasible than the numbers might suggest.
The transform operation is simpler if you know the order of things. For example, in OT: nr2) delete "H" at index 0; nr1) insert "Hello" at index 0. You know that nr1 should come before nr2 because of a central counter. But with CRDT it's: a) delete the character with id 0; b) insert "Hello" at the character with id 0.
I think intention preservation requires that you know what the clients know about. So you still need something like a vector clock to model what the dependency relationship between nr2 and nr1 are.
Yes, in OT the clients can send what they think is the global counter. Then each client or the server can transform the edit operation against the edit history, e.g. move the index left/right. In CRDT each index is instead an ID, and deleted characters have to be saved as "tombstones" so you know where they were.
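A minimal sketch of that index shifting, with illustrative op shapes (a real OT transform also needs tie-breaking rules for inserts at the same index):

```typescript
// Adjust `op` so it still applies after `earlier` (the op the central
// counter ordered first) has already been applied.
type TextOp =
  | { kind: "insert"; index: number; text: string }
  | { kind: "delete"; index: number };

function transform(op: TextOp, earlier: TextOp): TextOp {
  if (earlier.kind === "insert" && op.index >= earlier.index) {
    return { ...op, index: op.index + earlier.text.length }; // shift right
  }
  if (earlier.kind === "delete" && op.index > earlier.index) {
    return { ...op, index: op.index - 1 }; // shift left
  }
  return op;
}

// A delete aimed at the pre-insert character at index 0 lands at index 5
// once "Hello" has been sequenced ahead of it.
const insertHello: TextOp = { kind: "insert", index: 0, text: "Hello" };
const del: TextOp = { kind: "delete", index: 0 };
console.log(transform(del, insertHello)); // { kind: "delete", index: 5 }
```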
"Operation Transformation" = "a system that supports collaboration functionalities by separating the high-level transformation (or integration) control from the low-level transformation functions"
Source: OT's Wikipedia article
But I felt the same. Never heard of "Operational Transformation" before, and both OT and its alias were equally opaque to me.
I'm going to come clean, I am a shitty software engineer.
I'd like to have things like CRDTs under my radar to pull from when architecting technical solutions. But I don't.
I have been coddled by the bullshit of "just-good-enough" web development for nearly a decade, and I feel like I will be haunted by it forever at this point.
I WANT to employ mathematically proven solutions, but not only has it been 10+ years since studying computer science (and with it, the relevant mathematics) but I never even graduated.
So here I am, another under-educated over-paid GED holding Rails engineer putting together shit solutions I know will eventually leak.
So, having said all of that, is there hope for adding things like this to my toolbelt? And where do I even start? I feel like going back to school full-time is out of the question, because I have a life now. Maybe online classes? I'm curious to hear thoughts from people who maybe came from a similar pit of despair, or otherwise understand what I am talking about.
It's just... sad to know enough that I produce less than optimal work but don't know enough to confidently prevent it from happening over and over and over again. Is this just the profession I have locked myself into?
For every million bricklayers, there may be one or two material scientists devising new clay formulations to make stronger, cheaper bricks.
The irony is that the bricklayers are often paid better, and they can be certain that their profession will remain in demand for the foreseeable future.
The material scientist may hit a dead end, or their company may axe their entire department to "cut costs".
One of the smartest human beings I've ever met ended up at Google, working on something very similar to CRDTs for the Google Wave project. The outcome was a commercial failure and his team was disbanded.
I did the same classes as him at University, and I use almost none of that knowledge in my rather pedestrian job. I make nearly twice what he does and I have nearly perfect job security. So there's that.
This is exactly why I've focused on boring, mainstream tech skills. I'd rather be doing audio dsp or machine learning or fancy AR apps but I figure vanilla js/web chops are going to be in demand for the foreseeable future and pay well enough to support a decent lifestyle.
Look at it this way: Other people tell us what they think we should do with our lives. What we should find interesting, or fulfilling, or is good for our career.
Meanwhile, I feel like I'm an adult and can make my own decisions. I don't care if what I like is not "cool" or "hip" or some "up and coming technology". I don't have to be in research or the cutting edge of something to make interesting personal discoveries.
In my career I've accidentally stumbled upon multidimensional OLAP cubes (in the form of SQL Server Analysis Services) and discovered that they're quite fun to work with, actually. The reputation of business analytics as something "dry and boring" is undeserved. It's a wonderful mix of human problems like "manufacturing calendars" and quirky sales bonus calculations on top of fancy optimisation tricks to make calculations over terabytes of data nearly instant.
I've also had the incredibly gratifying experience of making a bog-standard web app with only a few mildly tricky technical bits for a department of education. It helped about a million kids read more than they would have otherwise. That's something. In fact, I'm certain that my one library app got more kids reading books they otherwise wouldn't have than a dozen English teachers could.
My parents were educators, and I outperformed them a hundredfold without ever setting foot in a classroom as a teacher.
I actually did take a couple of years off and try to make a living on my own selling music software with a lot of handwritten DSP code in it. I learned a lot and it was a lot of fun but I also realized that to a certain extent programming is programming no matter what the domain is. Things that seem exotic and sexy on the surface actually involve a lot of the same considerations and problem solving that building a website does.
So yeah there's no shame in working in more mundane & utilitarian areas. And, like you said, you might reach more people with your work that way.
I had the same experience doing game programming. It's not at all like gaming, and almost exactly like all other programming, except with more linear algebra.
Google Wave used OT, though, not CRDTs. Are you saying that Wave also tried to use CRDTs, but failed?
Wave was disbanded, but the technology was apparently integrated into other things such as Google Docs, which uses OT for collaboration quite successfully (despite the criticisms in OP's article).
> So, having said all of that, is there hope for adding things like this to my toolbelt?
Yes; but not yet. Right now if you want you can use yjs or sharedb. But the APIs aren't spectacular. Eventually I want these things integrated into svelte / react / swiftui and postgres / sqlite / whatever so it just sort of all works together and we get nice tutorials taking you through what you need to know.
We aren't there yet. Automerge is slow (but this is being worked on). Yjs is hard to use well with databases, and it's missing some features. We'll get there. The point of all the mathematical formalisms is that if we do it right, it should just work and you shouldn't have to understand the internals to use it.
I'm sharing the vision with you that things should just work. It literally takes you two lines of code to make any of the supported editors collaborative. The Yjs types are a minimal abstraction over the crdt model. I hope that other people build more abstractions around Yjs (for example a react store that simply syncs).
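For example, with the published y-websocket and y-quill bindings, those "two lines" look roughly like this (the server URL and room name below are placeholders):

```typescript
import * as Y from "yjs";
import { WebsocketProvider } from "y-websocket";
import { QuillBinding } from "y-quill";
import Quill from "quill";

const ydoc = new Y.Doc();
// Placeholder server URL and room name; any y-* provider works here.
const provider = new WebsocketProvider("wss://demos.yjs.dev", "my-room", ydoc);

const quill = new Quill("#editor"); // an ordinary Quill editor instance
// The binding keeps the editor and the shared Y.Text in sync both ways.
new QuillBinding(ydoc.getText("quill"), quill, provider.awareness);
```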
The thing is.. all of what you are talking about is already possible. A wasm port will eventually bring some performance improvements. But Yjs is already very close to optimal performance and it works in all browsers.
But I really am interested in more feature requests. I'm mainly interested in building collaborative apps with CRDTs. CRDTs as a data model are an entirely new concept, and we need to get experience on how to build apps with them. We also need to find out what is still missing.
So head over to discuss.yjs.dev and share your experience.
> - The gorgeous automerge API frontend ported across
I want all of that too, although I would add Automerge's API on top of Yjs as an additional layer. It really does play nicely with web frameworks like React. For building editor bindings, though, I would choose the Yjs types because they allow faster access to the data. Yjs only provides a minimal abstraction around the CRDT. I hope that other people will build nice stores/abstractions around Yjs that make interacting with shared data easier.
> - Presence information and markers (this is sort of supported now?)
Presence is handled by a different CRDT (the Awareness object in y-protocols). All providers implement the Awareness protocol. In many cases you want one awareness instance for several documents, so it makes sense to separate them.
Markers are probably what I refer to as relative positions: references to specific positions in the document. They even play nicely with undo-redo.
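Roughly, usage looks like this (the index values are illustrative):

```typescript
// Relative positions with the Yjs API: the marker tracks a character,
// not an index, so concurrent edits don't invalidate it.
import * as Y from "yjs";

const doc = new Y.Doc();
const text = doc.getText("t");
text.insert(0, "hello world");

// Anchor a marker at index 6, i.e. just before "world".
const marker = Y.createRelativePositionFromTypeIndex(text, 6);

// A concurrent insert at the front shifts every plain index by 3...
text.insert(0, ">> ");

// ...but the marker still resolves to the same spot in the text.
const abs = Y.createAbsolutePositionFromRelativePosition(marker, doc);
console.log(abs?.index); // 9
```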
> - Better support for long lived documents. (I want to be able to pull historical deleted stuff out of a document so it doesn't grow unbounded).
Yjs does this reasonably well - to the point that you can even build 3D applications like relm.us with it. But you are right that more work could be invested in researching even better solutions.
> - Support for moving objects (as well as inserting and deleting)
Moving of elements can already be modeled with Yjs [1], although the approaches described are not very convenient to use. Automerge implements this feature natively, but I'm a bit hesitant to implement the next-best approach because I see many downsides that conflict with features I plan to implement. There is a discussion thread where I outline a few of the tradeoffs [1].
Loved your blog - but I think the parent's question was how to gain the skills of working/designing OT/CRDTs etc on their own. (I am clarifying the question - since I'm interested in the answer as well)
I honestly think a good skill to have as a programmer is to know when something is a complex, niche problem with inherently "fragile" implementations (as in, it is really easy to make mistakes) that is therefore best solved by a few domain experts working on an open source library, and just use that library.
Based on everything I've seen about them, OT/CRDTs sound like they are that.
(Author here) I agree with that conclusion. I think they're really fun to implement and explore. I've been working in the area for a decade. And I've implemented over a dozen variants of OT systems by now. And despite all that I still consistently make mistakes that are only found with a fuzzer. Even thorough unit testing is never quite enough.
I recommend implementing them for the fun of it. I love it. But don't put anything you write into production until you can leave a randomized test suite running overnight without errors.
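To give a feel for it, the harness can be tiny. Here `Replica` is a hypothetical interface standing in for whatever OT/CRDT implementation is under test:

```typescript
// A minimal convergence fuzzer: two replicas make random concurrent
// edits, exchange ops, and must end up with identical documents.
interface Replica {
  localInsert(index: number, ch: string): unknown; // returns an op to broadcast
  applyRemote(op: unknown): void;
  snapshot(): string;
}

function fuzz(makeReplica: () => Replica, rounds = 100_000): void {
  for (let round = 0; round < rounds; round++) {
    const a = makeReplica();
    const b = makeReplica();
    const fromA: unknown[] = [];
    const fromB: unknown[] = [];
    // Each replica makes random local edits without seeing the other's.
    for (let i = 0; i < 20; i++) {
      const ch = String.fromCharCode(97 + Math.floor(Math.random() * 26));
      fromA.push(a.localInsert(Math.floor(Math.random() * (a.snapshot().length + 1)), ch));
      fromB.push(b.localInsert(Math.floor(Math.random() * (b.snapshot().length + 1)), ch));
    }
    // Exchange ops; after delivery both replicas must agree exactly.
    for (const op of fromB) a.applyRemote(op);
    for (const op of fromA) b.applyRemote(op);
    if (a.snapshot() !== b.snapshot()) {
      throw new Error(`diverged on round ${round}: "${a.snapshot()}" vs "${b.snapshot()}"`);
    }
  }
}
```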
This guy just asked how to get into them because he is interested in them.
You just replied don't. Have you considered that he may want to do this regardless of applying it purely commercially, and wants to perform research, or use it in a novel industrial capacity?
I wasn't telling them not to learn this, or at least that wasn't the intent. Learning for the sake of learning should never be discouraged, and I apologize if I came across that way! I was responding to this last sentence:
> It's just... sad to know enough that I produce less than optimal work but don't know enough to confidently prevent it from happening over and over and over again.
Basically, the notion that not knowing how to implement this means the resulting product is sub-optimal. My counter-argument is that delegating work that needs expert domain knowledge and skills to the experts, and using the fruits of their labor, will result in a better product.
Also worth checking the new Cloudflare 10th cakeday announcement for their serverless approach to stateful objects. As I understand it, they can accomplish some of the basic CRDT use cases without the technical overhead.
You are not a shitty software engineer who's been "coddled." What you likely are is dissatisfied with where your career is going (and who amongst us is not in some way) and nervous about not having a name brand pedigree so you are turning it inward and trying to understand it in ways that imply the cause and effect are localized to yourself. That's understandable but I wonder if it will get you anywhere. Stop beating yourself up and start exploring what you really want.
Do you want a brand name? Go work at a FAANG. Trust me, they're hiring. You'll get there and like me you'll realize they too have their problems and maybe the "over-paid Rails shit solutions" you "know will eventually leak" maybe weren't so bad after all.
Do you want to level up as an engineer? Go work at a hyper-growth startup (probably series B) that is falling over from its scale problems. You will probably end up solving some oddly challenging and novel engineering problem along the way.
Do you actually not mind this stuff but hate feeling like you have to always keep up with the Joneses, and it gives you anxiety? Go find a therapist that you have good chemistry with and see if you can't work out why you feel this way and how to fix it. To be honest, the corporate rat race thrives at making people who are otherwise doing quite well with their career progression feel like they're not, because that makes them easier to exploit. Life is not an exam to be min-maxed. You don't take that fancy career or lifetime earnings with you when you die, so unless you're doing it for the intrinsic joy of doing it, there's an argument to be made that becoming better at engineering has no guarantee of making your life better.
See this blog post for just what it is -- a breathless discovery of a hammer by an author who now wants to use it for everything, but who may not be at the point where they realize it might not be a great tool for everything. You the person are probably a lot more competent and capable than you're letting on here. And if that's the case and you're unhappy for reasons in your control, home in on why that is and see if you can't make life moves to change it. There's never been a better time to make a career move.
> To be honest, the corporate rat race thrives at making people who are otherwise doing quite well with their career progression feel like they're not because that makes them easier to exploit. Life is not an exam to be min-maxed. You don't take that fancy career or lifetime earnings with you when you die, so unless you're doing it for the intrinsic joy of doing it, there's an argument to be made that becoming better at engineering has no guarantee of making your life better.
This. Especially that last line. I feel most of the time I'm just wanting to get more and more, both financially and challenge-wise. But to what end? Your words here really calmed me down. It got me thinking. I felt like a dog running after a car.
You and me both, my friend. Perhaps now more than ever before, we are not so much dissatisfied with our careers (with a dissatisfaction that has always been present), but more able to think deeply about why that is the case. In my opinion, the answer isn't pleasant and yet, it's strangely comforting (even freeing) in a sense.
Now that I have accepted that my lasting happiness will not be found in a prestigious job or fame as an engineer, where shall I find it? Perhaps the same place other humans have for the last ten thousand years. In the center of a quiet meadow in the woods. Next to an animal that considers me a friend. In the arms of someone I love. Plucking strings on an instrument, singing my stories.
Is this response a coping mechanism, or an active choice to minimally engage with the parts of human existence which entail naked power brokering and empire building? Maybe it's a little of both. May we both find our own peace somehow, friend.
When this guy discusses OT vs CRDT he's discussing algorithms for some very specific use cases. Namely, concurrent document editing a la Google Docs. Realistically, there are maybe 50 people in the world who are versed in these kinds of algorithms at a really deep level. 99% of us are users. If I was pressed into it, I might have the chops to implement a reference implementation for something like this, but I've never come within a country mile of an organization that has the resources to invest in something like that. You can be a great programmer with a very successful career as someone who only knows how to read an API and use it. And unless you are tasked with creating a concurrent document editor, you will probably never need to understand the API for a CRDT.
Generally agree, but I bet there are more like 500-5,000 people who understand this kind of stuff. Most modern web tools support at least some basic functionality for synchronously working on the same document.
Concurrency is a really broad problem space with a million applications, but real-time editing of an indeterminate data object isn't most of it. Concurrent modification of a row in a database is a perfect example of a deeply complex problem that 99% of us don't really need to understand beyond how to use the API. If you know how to commit a transaction, you don't need to read the thousands of dissertations on how to process them optimally.
I wrote an article for folks like you (because I also felt like this was my life at one point).
In this article I explain the basics of the simplest CRDTs in a way a web developer without deep CS knowledge can relate to: https://statagroup.com/articles/editing-shared-resources-crd... . I don't present the concept 100% fully, but I give a starting point for someone who "only knows Rails". I found most online tutorials dive directly into the theory.
I included all the resources I used to learn the concept on the page too.
I feel ya, I did not graduate either, and didn't go for CS. After having Linux as a hobby, I made my way into a web dev job. The "good enough" work is killing me. I feel like I am pushing and pulling my team to the light, even simple things like DRY.
I think one thing I'm learning is where to spend my effort. Its definitely a balance, but I'm putting more time/passion into my personal projects/learning, whereas before I saw more opportunities for overlap with my passion and work. When your work doesn't recognize that effort or actively fights against those efforts, it's draining and frustrating.
I've also been thinking about dedicating some serious time to learning. I keep thinking about the idea of taking a month or two off from work. Just treat myself to my own mini-semester of homeschool. I'm somewhat confident I could learn enough to land a better job, but more importantly, one I enjoy more.
You guys are overestimating the degree to which concepts from CS curricula are relevant to the average engineer's work. "Pushing and pulling your team to the light" is normal, can't be solved by getting a CS degree, and is more about your team dynamics than anything.
No, you have not locked yourself into anything, you can always grow. Comfort can breed complacency, so my initial advice is to quit your job and try to get a new one that's more demanding. It forces you to kick your own butt into learning new stuff.
I am an under-educated over-paid GED holding Glorified Sysadmin, and while a lot of the shit solutions I work on will eventually leak, I do spend a non-trivial amount of time looking for better solutions. I also try to keep my ears open for what other people are working on and read up on concepts I'm unfamiliar with, whether they crop up on HN or through work.
I learned about CRDTs several years ago at a former employer, where a teammate was talking about how they were the future. However, we didn't end up using them. It turns out the more advanced your system gets, the more work goes into it, and the more of a pain in the ass it becomes to run [until it reaches a certain level of operational stability]. So I wouldn't say that using the most bleeding-edge research is a good idea even if you know about it. Most people are still fucking up the basic stuff we were doing 10 years ago, and the neckbeards here probably have loads of stories about how much simpler and more stable distributed systems used to be.
> So, having said all of that, is there hope for adding things like this to my toolbelt? And where do I even start?
HN is a decent place to start. Now that you've read this article, you're aware of the general landscape of concurrent document editing algorithms.
Ultimately, all you really need is the language to a) describe your problem and b) compare potential solutions. I have a CS degree, and it's not like we spent semesters learning every algorithm under the sun. But what we did learn was the language to describe and compare the performance of algorithms (computational complexity theory). Beyond that, it's just about knowing enough terminology to be able to describe your problem, which you could learn just by skimming through an algorithms textbook (or random Wikipedia articles, honestly).
For example: a friend of mine was recently trying to devise an algorithm for ordering buff applications for a collectible card game. Once I was able to find the right language to describe the problem (topological sorting), finding an algorithm to use was easy.
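For reference, the standard approach (Kahn's algorithm) is only a few lines; the buff names below are made up:

```typescript
// Topological sort via Kahn's algorithm: edges mean "must apply before".
function topoSort(nodes: string[], edges: [string, string][]): string[] {
  const indegree = new Map<string, number>();
  const successors = new Map<string, string[]>();
  for (const n of nodes) { indegree.set(n, 0); successors.set(n, []); }
  for (const [before, after] of edges) {
    successors.get(before)!.push(after);
    indegree.set(after, indegree.get(after)! + 1);
  }
  const ready = nodes.filter((n) => indegree.get(n) === 0);
  const order: string[] = [];
  while (ready.length > 0) {
    const n = ready.shift()!;
    order.push(n);
    for (const m of successors.get(n)!) {
      indegree.set(m, indegree.get(m)! - 1);
      if (indegree.get(m) === 0) ready.push(m);
    }
  }
  if (order.length !== nodes.length) throw new Error("cycle: no valid order");
  return order;
}

// Hypothetical buffs: both modifiers depend on the base stat existing.
console.log(topoSort(
  ["base attack", "double attack", "+5 attack"],
  [["base attack", "double attack"], ["base attack", "+5 attack"]]
));
```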
The article mentions Martin Kleppmann. Go buy a copy of his "Designing Data-Intensive Applications". It might as well have been titled "Practical Distributed Systems for the Everyday Developer". It is an absolutely fantastic book with a perfect ratio of theory to practice.
Extra points for buying a dead tree copy and reading it without a thousand alerts and internet temptations vying for your attention :)
Sounds like you've got an awareness of CRDTs now. That's a win and a differentiator from many other engineers.
I already knew about them, but there will be a stack of other stuff I don't know about. Just get Googling when you hit different problem areas.
I found out about CRDTs because I was working in a niche where a security device was used that breaks TCP between two hosting regions for an application. So I searched for issues to do with that and refreshed myself on the two general's problem, Byzantine fault tolerance, 2 and 3 phase commits and at some point OTs and CRDTs popped up. You'll find this stuff if you keep looking at a problem for long enough.
That was a while ago though and despite still working in that area I've still not implemented anything with them because I'm a contractor and I'm under the direction of people who don't have CRDTs in their toolbelt and don't want ideas from others! I guess my cheerful message is keep at it but know you can only do so much and there's plenty of other pits of despair to fall into even when you're out of the current one. Yay!
I never graduated, but that was back in the 80s. The important part is not getting "official" education, but learning how to self-learn. You don't need to necessarily understand the underlying maths, if you can understand the general theory.
And you're not a "shitty software engineer" if you are self-aware enough to know what you don't know. Read, start with Wikipedia, look up the references, play with stuff, explore.
If you need to understand certain things at a deeper level, then explore the classes and other guided tuition.
I've been doing this stuff for 30 years and I still learn every day. You will produce less than optimal work for ever. As a master craftsman (which is what you are, not an "engineer") you are always learning and improving your craft.
You can be satisfied when completing some work if you did your best to produce it and it works within the constraints you had. There will always be things you would do different or better, so do that next time.
CRDTs are just applied partial order theory. Partial orders are just mazes where you can drag your finger in a certain vague direction, taking arbitrary turns in that vague direction, and still find the exit.
Get your intuition down really well and 10+ years is not required.
I can give a philosophical answer rather than a technical one. Our lives are filled with learning various things. And we all realize there is no end to it. But truly what makes you feel better about what you do is not this endless learning. It is when you can drop what you learnt and think freely about a problem that you can solve. Until you reach there, you are in a boat tossed by the waves trying to grab whatever you can to keep yourself afloat. Please disregard this answer if it is not of any help.
I love the candor in your post but you're being too hard on yourself.
Learn functional programming, gradually. You'll get there. I like to think I reverted a decade of brain damage inflicted by enterprise java development in three years.
I've worked for multiple Rails shops with high quality engineers. The trick is to interview them too before accepting any offer. The best teams I've been on have been small. Past a team/project size, maintaining high quality code becomes much more difficult.
Normally I would downvote a comment that amounts to "read the article", except that this one is entirely on point.
The classic problems were size and performance; however, both have now largely disappeared. Operations are `O(log(n))` and the size overhead is a factor of 1.5-2.0. That's quite workable.
This is my daily struggle. I cannot get rid of the idea that advancing my career only accumulates how much I earn, instead of making me a better programmer.
The three most recent HN discussions on CRDTs are all worth perusing.
[1] is an excellent tutorial that assumes no initial familiarity with CRDTs or the math that underpins them. It walks you through both the formalisms and the implementation, which is pretty key to understanding why making real-world CRDTs flexible enough to handle things like rich text editing is hard.
[2] is a talk that goes more in-depth on the hard parts
[3] goes deeper on OT vs. CRDT
It's worth noting that many of the CRDT discussions focus on collaborative text editing. That's a really hard problem. CRDTs are (and have been for some time) a useful primitive for building distributed systems.
I use CRDTs in production at Jackbox for audience functionality and honestly I don't know why the only thing that people talk about when it comes to CRDTs is collaborative text editing. Like, sure, cool, but that's literally a single problem domain and like, Google Docs already exists and works well for the majority of users so how many developers actually need to create a collaborative text editor? CRDTs are an incredibly abstract and versatile concept; collaborative text editing is a tiny problem domain. I would really like to see more writing about CRDTs that is NOT about collaborative text editing.
Author of the blog post here. I’ll let you in on a dirty secret: I agree with you.
I see text editing as the beachhead for this tech. Text editing is hard enough that other systems don’t work well, so you’re kind of forced to use OT or CRDTs to make it work. And text documents are lists of characters - so once you’ve made it work there you have an implementation that can also sync arbitrary lists. At that point, editing maps (js objects) and adding support for arbitrary moves and reparenting will allow us to add real-time editing to a lot more software.
I think there’s a lot of interesting software architectures that can be enabled by making everything in a system a CRDT document. But text is still the beachhead. And a good litmus test for performance. And an important application in its own right.
On the one hand, the generality of the text editing solutions is really powerful, and I see what you mean in terms of that solution generalizing to other domains. But on the other hand, I always think back to how popular Memcache or Redis were even early on, when they had very very few features. Just having a fast, in-memory cache of strings empowered a lot of interesting new product development. I really wonder how much the average developer on a random project could get out of an appliance that lets you create and mutate an arbitrarily large number of values of the well-known, simple CRDT types like g-counters, pn-counters, 2P-sets, etc.

Most of the literature is focused on "how do we merge the most complex data type", and not questions like "how do we manage the entire lifecycle of CRDT stores", "how does a CRDT store fit into most people's stack", "should CRDT types be added to existing stores or should developers expect dedicated CRDT-only stores", or "do people generally need to define their own CRDT types or do most people just want a box full of common ones".

I hand-rolled my own CRDT setup just to get the simplest CRDT types because I didn't see anything out there that makes directly consuming CRDT types by application developers accessible. E.g., you make a g-counter and literally all a client can do with it is increment it or read its value. That's it. We have that, and it's totally useful! We also do entirely state-based replication, because expressing the operations on the data would be so much larger than the data itself. But our situation is just so off-base for many people, because clients are only ever interested in the current state (and never care about the past state), and we can safely just ignore the problem of when to delete the data; we just keep it around until you're finished playing the game, and then delete it when the game is over.
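For readers who haven't met them, a state-based g-counter really is this small (the replica IDs and shapes below are illustrative):

```typescript
// Minimal state-based G-Counter: a grow-only map of per-replica counts.
type GCounter = Record<string, number>; // replicaId -> local count

function increment(c: GCounter, replicaId: string, by = 1): GCounter {
  return { ...c, [replicaId]: (c[replicaId] ?? 0) + by };
}

// Merge is an entry-wise max: commutative, associative, idempotent,
// so replicas can exchange full states in any order and converge.
function merge(a: GCounter, b: GCounter): GCounter {
  const out: GCounter = { ...a };
  for (const [id, n] of Object.entries(b)) out[id] = Math.max(out[id] ?? 0, n);
  return out;
}

function value(c: GCounter): number {
  return Object.values(c).reduce((sum, n) => sum + n, 0);
}
```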
Etcd does some of this stuff, but in general it sounds like you have a useful opensource tool inside you that wants to be shared with the world. I can't wait to see it made.
How could CRDTs be used to collaborate in a project made with a visual programming language that consists of interconnected operators? Is it necessary to serialize this graph?
However, in my experience most games' shared world state cannot be reconciled if one player's action is inserted in time prior to another player's action, or if future actions reveal hidden information.
Even turn based games have turn timers. Including Jackbox games!
Among public databases there may only be two correct CRDT implementations in the world - Cassandra's and etcd's. Compare to how many database products make CRDT or CRDT-like promises, and how few probably actually work.
Lots of software promises CRDTs-like-over-actors, e.g. Erlang, the new Cloudflare product. Not really qualified to talk about them.
Most applications normal people use boil down to chat conversations - chat apps, Gmail, Facebook, collaborative text like you said, all sorts of log-like things - or serial, recommendation-driven consumption - YouTube, Netflix, which use incremental/batched cells in a matrix to update their recs. This stuff is insensitive to order and time. Then there's e-commerce, which is inherently linear, centralized, single-actor, all-or-nothing transactions; CRDTs provide no advantages there.
It's tricky. On the one hand you'd get this interesting discussion of CRDT applications. On the other it may wither under intense scrutiny.
a lot of other use cases are interested in using CRDTs for complex mutations with a relatively small number of users. Most collaborative editing sessions will be a small handful of users. Our use cases are things more like "three thousand people are presented with a multiple-choice question simultaneously and answer it simultaneously and we have to show the results on an xbox and the entire thing has to happen in under 15 seconds". We mostly just use the standard g-counter and pn-counter types. We have a thing that's like ... a high score list for a trivia game that doesn't have a well known type (it's kinda like ... having a zset where only the top N items in the zset are replicated) and a CRDT implementation of a ring buffer of strings for audience comments in ... Quiplash 2, iirc. I'm a bit tied up since we have a new pack coming out October 15th, but I might have time to write some stuff up after that.
We want something like that for distributed collaboration in Ardour, a cross-platform DAW. The relevant state is serialized as an XML file, and used natively as a complex object tree. Users want to be able to edit (in the DAW) locally, then share (and merge) their results with collaborators.
I've plugged this collaboration project a few times recently, and have no relationship to it other than discovering it (via YJS' "who is using" list[1]) and finding it fascinating:
What I find most interesting about it is that it has reduced the state of multiple 'smart' user-facing widgets/apps into a common, lowest-common-denominator format (a text document) that lends itself more easily and intuitively to collaborative editing and CRDT operations.
I don't know for sure whether this is the path forward for CRDT-based applications in general, but I think there are valuable ideas there. It does raise the possibility of the widgets/applications occasionally being in 'invalid' states; but rarely in a way that the human participants wouldn't notice or be able to fix themselves.
Whether that scales to the complexity of the state management for a multi-track audio editing session, I don't know; but it could be instructional to compare.
in real-time? Well, I have thoughts, but I'm not super familiar with Ardour itself, so I'm not sure if you're trying to merge during a live performance or if you're talking more of a distributed studio recording session type situation. I have working knowledge of Reason and Logic and ChucK (which I use with JACK and do some networked OSC stuff with, although I haven't touched it in a few years).
The approach we use at Jackbox for making the state of an Xbox game mutable by thousands of live viewers on Twitch is to have lots of little CRDT values, mostly just counters and sets of strings, and to merge the little values independently of one another. That is very different from the situation of editing a text document, which is typically structured as one big value.

I wonder if, for a DAW, you could merge at the track or control level instead of the workspace level. E.g., communicate the state of an individual fader as an independent value, and communicate either states or operations on that fader and have each client merge them. In this example, the fader's state would be encoded as a PN-counter with a range condition, and you'd replicate increment and decrement operations, as if it were a networked rotary encoder. So every mutable thing in the DAW would be a value with operations that can be merged, instead of a single big value representing the entire state of the DAW.

My use-case is also funky because I have potentially thousands of writers writing to the same key but only a single reader, and the reader doesn't need an update at every change, so I use state-based CRDTs; I think most other people using CRDTs use operation-based CRDTs. Also not sure how you would mutate two separate values transactionally, or if that's a thing you even need.
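A state-based sketch of that "networked rotary encoder" idea, assuming a PN-counter clamped at read time (all names here are illustrative):

```typescript
// A fader as a PN-Counter: two grow-only maps of per-replica counts.
// Merge is entry-wise max; the legal range is enforced on read, not write.
type Counts = Record<string, number>; // replicaId -> count

const total = (c: Counts) => Object.values(c).reduce((s, n) => s + n, 0);

const mergeCounts = (a: Counts, b: Counts): Counts => {
  const out: Counts = { ...a };
  for (const [id, n] of Object.entries(b)) out[id] = Math.max(out[id] ?? 0, n);
  return out;
};

interface Fader { up: Counts; down: Counts } // increments and decrements

const mergeFader = (a: Fader, b: Fader): Fader => ({
  up: mergeCounts(a.up, b.up),
  down: mergeCounts(a.down, b.down),
});

// Clamp to a MIDI-ish range on read; concurrent turns just accumulate.
const readFader = (f: Fader, min = 0, max = 127): number =>
  Math.min(max, Math.max(min, total(f.up) - total(f.down)));
```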
Not realtime. Users would sync periodically during their working process.
There are lots of mutable things in a DAW that are not numeric parameters.
The state of a playlist (just think "some time ordered list of objects") is not treatable in the same way as the value of a fader.
If you had a context-aware XML parser and access to timestamps for every XML node, you could do the human-aided merge by considering each node and just using the latest version of the same node, falling back to the human when there's a deletion conflict (for example). But this doesn't actually merge the attributes of a node or deal with conflicts between otherwise sequential edits.
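That merge might look something like this; the node shape and the askHuman callback are hypothetical:

```typescript
// Per-node last-writer-wins merge, with deletion conflicts escalated to
// a human. Assumes every node carries an ID and a modification timestamp.
interface XmlNode { id: string; modified: number; deleted: boolean; xml: string }

function mergeNodes(
  ours: Map<string, XmlNode>,
  theirs: Map<string, XmlNode>,
  askHuman: (a: XmlNode, b: XmlNode) => XmlNode
): Map<string, XmlNode> {
  const merged = new Map<string, XmlNode>();
  for (const id of new Set([...ours.keys(), ...theirs.keys()])) {
    const a = ours.get(id);
    const b = theirs.get(id);
    if (!a || !b) { merged.set(id, (a ?? b)!); continue; } // only one side has it
    if (a.deleted !== b.deleted) { merged.set(id, askHuman(a, b)); continue; }
    merged.set(id, a.modified >= b.modified ? a : b); // latest version wins
  }
  return merged;
}
```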
We use CRDTs for a scalable, availability-first service discovery implementation. I'm sure there are more uses out there. We use Akka Distributed Data, and there are many users of that.
Just wanted to say, I really like Jackbox's games and it's really fun to see you here! I've been exposed to your games via YouTube vids and it feels weird (but cool!) to have that intersect with HN. Would love to read more about how you use CRDTs to make it all work! (I'm pretty clueless re: multiplayer games, so even a high-level explanation would be interesting.)
Thanks for your work on Jackbox! Never had an issue in the games and they are perfect for parties. I always thought it would be fun to make a Jackbox-type game system.
If you're a young technical entrepreneur looking for a 10-100M startup opportunity with a very interesting technical challenge behind it: create a collaborative replacement for Jupyter Notebooks. There's already some effort in a JupyterLab fork if you're interested [0], but with no significant advancements.
So yes, I agree that CRDTs are indeed a promising endeavor.
CoCalc is a collaborative replacement of Jupyter notebooks. It's a top-to-bottom re-implementation of the entire Jupyter stack designed specifically for realtime collaboration. You can use it via our hosted offering (https://cocalc.com), or install it on prem via https://github.com/sagemathinc/cocalc-docker.
We released our collaborative Jupyter notebook in 2014 as a plugin to Jupyter classic. We then iterated on what we learned over the years, completely rewriting everything multiple times, including the entire realtime collaboration stack. CoCalc's Jupyter support is pretty mature and battle-tested at this point, and also includes a TimeTravel slider that lets you view all past versions of a Jupyter notebook, plus integrated chat.
I was a college professor (at the Univ. of Washington) and started a company around this in 2015, so CoCalc has so far been mainly aimed at serving the needs of academics teaching courses. It's been increasingly popular lately; e.g., in the last month over half a million distinct Jupyter notebooks were edited on https://cocalc.com. Of course, many of these notebooks are homework problems. Anyway, our company is doing very well, and we hope it will eventually be a "10M startup opportunity". :-)
Domino Data Lab has been around for a while and closed another $43M in funding earlier this year. They have a boatload of tools around collaborative notebooks. They go even further and have data science manager-level dashboards to track the notebooks, their resources, and who is working on what. There are others, but I'm calling this company out specifically because they've shown great traction and I've spent a little time with the cofounders when they were still at a shared incubator space.
Interesting decision process. I kept wondering if other people had implemented the Figma approach and it looks like you did a nice job with it. I also appreciate you putting those cool explainers up front
There's no need to gatekeep building something on already having knowledge.
If someone has time and energy and desire, not knowing anything about document editing or CRDTs is not a blocker. Those things can be learned in a week to a month by someone who dedicates time to it.
Very few parts of software are inaccessible to someone with basic CS knowledge. It's a great idea for people to try something, regardless of their background, and if they fail but learn something, that's still a fine outcome.
A question that's been in my mind for a while is why Version Control and Collaborative Editing work at such cross purposes with each other when they are essentially solving the same problem? The biggest difference is that one works interactively and the other favors a CLI. Beyond that, how much of the distinction is artificial?
In particular I've been wondering about the space between CRDTs and the 'theory of patches' such as we discussed with Pijul the other day.
I have a collaborative editing project that's been sitting in my in-box for a long time now because I don't want to write my own edit history code and existing tools don't have enough ability to reason about the contents as structured data. The target audience is technology-averse, so no 'dancing bears' are going to interest them. It's not enough for it to work, it has to work very well.
As it stands today, version control and collaborative editing do not solve the same problem. Version control deals with large chunks of changes at a time. I don't even particularly want a version control system that stored every single keystroke made in source code. [1] Collaborative editing deals with keystroke-by-keystroke updates. By the standard of collaborative editing, even a single line source control commit is a big change.
The problem spaces are quite different. Problems that emerge on a minute-by-minute basis in collaborative editing emerge on a week-by-week basis in source control, and when the problems emerge in the latter, they tend to be much larger (because you can build up a much bigger merge conflict on a routine basis with the big chunks you're making).
Yes, it's true that if you squint hard, it looks like version control is a subset of collaborative editing, but I'd be really hesitant to, say, try to start a start-up based on that observation, because even if we take for the sake of argument that it's a good idea to use the same underlying data structures, the UI affordances you're going to need to navigate the problem space are going to be very different, and some of the obvious ways of trying to "fix" that would be awful, e.g., yes, you could give me a "collaborative space" where I see what everybody's doing in their code in real time... but it's a feature, not a bug, that when I'm working on a feature I'm isolated from what everyone else is doing at that exact moment. When I run the compiler and it errors out, it's really, really nice to have a good idea that it's my change that produced that result.
(I'm aware that collaborative editing also has the "I was offline for a week and here's a bunch of conflicts", but I'm thinking in terms of UI paradigms. That's not the common case for most/all collaborative editing systems.)
[1]: Not saying the only solution is the one we have now. A magic genie that watched over the code and made commits for you at exactly the right level of granularity would be great, so you'd never lose any useful context. But key-by-key isn't that level of granularity.
Version control is collaborative editing. Synchronizing on every keystroke is real-time collaborative editing. That's nice if you're working on overlapping data at the same time. In code this does not happen so often, because code repositories tend to be large.
Git does not work well for text because we have not figured out a nice format for text yet that developers and other people both enjoy. Developers want to stick to plain text as their format because we have so far failed to create nice tools and formats for structured data. Perhaps these affordances can appear thanks to a popularization of real-time collaborative editing.
Nobody is using "collaborative editing" to mean the sort of thing we've been doing with source control for decades, even if the plain English meaning of the component words seems like it might match that. We wouldn't have needed a new category term if "collaborative editing" didn't have the new element of real-time sharing.
The most common way to do collaborative editing in word processing is with 'track changes'. That has quite a bit in common with version control and pull requests. There are different ways to do collaborative editing, real-time or asynchronous.
I count five or six people, including the author, in your 'nobody'.
If you watch any of the videos on CRDTs, live (real-time) editing is every bit as much a corner case as the 'play by mail' scenario that code repositories handle.
Asynchronous communications complicate Last Writer Wins algorithms, because the network is partitioned and time stamps don't have the same predictive value. Git is DVCS, where the "D" really means partition-resilience.
I'm not sure the distinction is as clear as you are making out.
Eclipse used to (?) have a feature where each local save was stored separately, so it was trivial to view local history.
I've done remote code reviews with shared documents.
I can certainly imagine a system that combines the two of these things into a single system where real time version control was integrated into a multi-user system.
I've been on systems where multiple developers were trying to develop on the same system at once. I've also seen teams trying to do it systematically. It scales basically to two developers, sitting across from each other. Three, again, physically colocated, on a good day. Even if they're working on completely separate tasks, you hit "compile" and it's a complete mystery what's going to happen. It's not even stable if you do nothing and just hit "compile" again.
Beyond that it's insane. You do not want that in your version control system, as something built in, working all the time, across your entire team. It would be a massive anti-feature that would nuke your product.
Again, for anyone thinking this sounds like a totally awesome idea, I strongly encourage you to try out the simple version, available right now, of just "five or six people editing the same source code checkout" before betting a startup on it. I guarantee a complete lack of desire to productize the result if you try it for a week or two.
A middle ground could be nice: An IDE extension that notifies you when something you're writing will conflict in the future, should you and your coworker both commit and push what you've typed out. It would allow you to sort that out immediately, or at least plan ahead, rather than being surprised by a large merge conflict n days down the road.
Agreed; that sounds like it could be potentially useful. If you've got something with a central server already in the model, up to and including something like github or something, it doesn't even sound that hard to check that on a per-push basis. A bit of trickiness around what constitutes a "live" branch, but that seems amenable to heuristics.
There's no reason BitBucket, GitLab, and GitHub couldn't scan all of the unmerged commits on pushed branches to look for impending merge conflicts. BitBucket at least does once you start a PR. They could be more aggressive about that without changing any underlying algorithms.
The main difference would be that the PR check only compares with master, and you'd want to compare all branches. But a low-constant-factor test for collisions would make that cheap enough (conflicts can only occur for commits that contain the same file names, which is an O(m log m) set intersection test of short text strings).
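A sketch of that cheap pre-filter (a hash set gives the same answer as the sorted O(m log m) walk):

```typescript
// Two branches can only produce a merge conflict if their unmerged
// commits touch at least one common file; anything else can be skipped.
function mayConflict(filesTouchedA: string[], filesTouchedB: string[]): boolean {
  const a = new Set(filesTouchedA); // hashing stand-in for the sorted walk
  return filesTouchedB.some((f) => a.has(f));
}

// Only branch pairs passing this cheap test need a real trial merge.
console.log(mayConflict(["src/app.ts"], ["src/app.ts", "README.md"])); // true
```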
Around 6-7 years ago we started a collaborative editing project for prezi.com. The problem basically boiled down to concurrent editing of a big DOM-like data-structure. We looked at the little literature that was available at the time including OT and CRDTs, but quickly realized that none of the existing approaches were mature enough for our needs. All of them were stuck at "text editing", but we needed to edit these big object DAGs.
So we ended up essentially implementing what you laid out, an in-memory revision control system, although using a bit more formal methods to reason about divergence/convergence of clients.
The most basic operation was the "diamond merge": given operations x: A -> B and y: A -> C, construct x': C -> D and y': B -> D such that x' . y == y' . x.
It also had to satisfy certain other algebraic laws, notably diamond composition, which allowed us to compose these merging operations whenever we wanted, guaranteeing that the clients will eventually converge to the same data state. It was quite neat! Shame that it's all proprietary.
Good old days. I remember the most pesky operation was implementing a good undo-redo algorithm; it's quite tricky, even once you add inverses.
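The diamond law is easy to state in code, even if implementing it is the hard part. Everything below is a hypothetical shape, not the proprietary system:

```typescript
// The diamond merge law: applying the other side's transformed op must
// commute. `apply` and `diamond` are hypothetical stand-ins.
interface Op { /* an edit to the DOM-like document */ }

declare function apply(state: unknown, op: Op): unknown;
declare function diamond(x: Op, y: Op): { xPrime: Op; yPrime: Op };

// Given x: A -> B and y: A -> C, both orders must reach the same D.
function diamondHolds(stateA: unknown, x: Op, y: Op): boolean {
  const { xPrime, yPrime } = diamond(x, y);
  const viaB = apply(apply(stateA, x), yPrime); // A -> B -> D
  const viaC = apply(apply(stateA, y), xPrime); // A -> C -> D
  return JSON.stringify(viaB) === JSON.stringify(viaC);
}
```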
There's a next level of VCS forming on the horizon, in some combination of CRDTs, patch theory, and grammar-aware diffing.
Which should also learn from fossil, and consider metadata such as issues and surrounding discussions to be a part of the repo.
A really robust solution would also be aware of dependencies and build systems, and even deployment: I see these as all fundamentally related, and connected to versioning in a way that should be reflected and tracked through software.
If you look into Bazel (the build system), you start getting to the point where everything, including dependencies, build system, and deployments, can be defined as "source" code and ideally should be treated as first-class software.
Cloud-based code environments are starting to merge these. GitHub Codespaces, for one, is heading in this direction. I don't know if they use Operational Transformation (OT) or Conflict-free Replicated Data Types (CRDTs), but they are repo-backed. I assume it just uses GitHub's diffing tools on the repos, and maybe OT/CRDT in live sessions over WebRTC or similar.
Much of real-time collaboration goes back to the networking, and real-time networking, used in distributed multi-user systems like games, where simulations need to sync on a server. In games, though, Dead Reckoning [2] is used, as well as interpolation and extrapolation for prediction. Much of it can be slightly different, for instance with physics/effects, but messages that are important to everyone, like scores or game start/end, are reliably synced and determined on the server.
I wonder if there is a way to describe change sets as a mathematical curve and achieve something like the rewind-ability within Planetary Annihilation (https://www.forrestthewoods.com/blog/tech_of_planetary_annih...), which seems to be a smoother alternative to dead reckoning that bakes the history in a bit better.
Author of the blog post here. I totally agree with you.
People think of OT/CRDT as algorithms for realtime collaborative editing because they're always programmed and used that way. But the conflict resolution approach doesn't have to merge everything as-is. You could build a CRDT or OT system that generated VCS-style conflicts when concurrent edits happen on the same line of code. To make it a valid OT/CRDT algorithm, the main constraint is just that every peer needs to resolve conflicts the same way (so if I merge your changes or you merge my changes, we end up with identical document states). It would be easier to implement using OT, because you only have to consider the interaction between two peers, but I think it's definitely doable in a CRDT as well.
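As a hedged sketch of that "every peer resolves conflicts the same way" constraint: embed both sides of a same-line conflict, ordered by a stable key, so all replicas render identical text (field names are illustrative):

```typescript
// Deterministic conflict rendering: both versions are kept, ordered by
// replica ID, so every peer that merges these edits produces the same bytes.
interface Edit { replica: string; text: string }

function renderConflict(a: Edit, b: Edit): string {
  const [first, second] = a.replica < b.replica ? [a, b] : [b, a];
  return [
    `<<<<<<< ${first.replica}`,
    first.text,
    "=======",
    second.text,
    `>>>>>>> ${second.replica}`,
  ].join("\n");
}
```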
I think having something that seamlessly worked in both pair programming setups and with git style feature branches & merging would be fantastic.
I have a lot of thoughts about this and would be happy to talk more about it with folks in this space.
I've approached this problem from a different angle. I thought one could embrace the possibility of several truths. In my solution a truth lives in what I call a layer. Different layers can then be projected together in different stacks. Instead of using feature branches/toggles one can change the stack by adding/removing/reordering layers. One can also handle localized content this way, which was the original use case before the idea mutated.
I also thought one could distinguish between two different kinds of conflicts. I call them simultaneities and disagreements. Simultaneities happen when concurrent changes occur, and could be handled by CRDTs, for example. Disagreements are conflicts between layers.
The idea is then that you can choose to work in the same layer as someone else, if you are working close. You can also "shield" yourself from changes in other layers by working in a different layer. If you want to detect a disagreement between your layer and a layer someone else is working on, you project your layer on top of that layer.
Even though I believe in these ideas I don't know how to get other people interested in them. It might be that they are flawed in an obvious way not apparent to me.
What would it take to make someone of your caliber curious?
Sounds like a cool idea! There are some UX flows I'm not entirely clear on, but there are some novel ideas buried in here that I haven't seen anyone talk about before, which CRDTs would enable.
> What would it take to make someone of your caliber curious?
Talk about it more! Write a blog post describing the use cases and how it would work. Make UI mockups. Talk about what problems exist today that your idea would solve. Hop on to the automerge slack channel (linked from the automerge github repo) and show us.
And if you want to, build it. You don't need anyone's permission to make useful software.
My understanding may be flawed, but as far as I know you can think of an OT log and a git log as being similar. Each party generates deltas to the data structure that are recorded in the log, and when these parallel histories meet they must be merged. OT merges without involving the user, which sometimes leads it to discard changes. Git merges like that if it can, but when something must be discarded it asks the user. It is the interactive merging and deep ability to navigate and edit the log of changes that makes git so command-liney.
Not intending to nit-pick, but Git doesn't store the content as deltas. Each commit is the snapshot of the entirety of the codebase at that point in time.
Packfiles are nothing but a compression/storage scheme - which is a couple of levels under git-objects conceptually. Saying that "git stores content in deltas because it uses packfiles" is like saying "git stores content in deltas because it uses compression". Of course it does. You'd be hard-pressed to find any tool similar in functionality that doesn't compress content before persisting it or sending it over the network.
Part of Linus' original insight that made git successful was exactly this: you don't need to do the delta scheme at the VCS-object level; you can just use it at the storage/compression level. So conceptually CRDTs are more similar to the VCSes of the past (e.g. SVN) than they are to git.
Line-oriented data formats vs. everything else. Why? Because of "patching theory". If you don't understand that the data describes objects and doesn't have line-by-line semantics, it is hard to get merges correct.
Version control works wonders with line-oriented stuff, which covers more or less every programming language in existence.
It doesn't do so well with non-line-oriented structured formats such as XML (not sure how JSON or TOML fits in here).
Given that collaborative editing typically works with non-line-oriented data formats, you can see the issue, I think.
That's what I refer to as "grammar-aware diffing" in the sibling comment, and it's one of the low-hanging fruits here.
Even git allows for pluggable diffing, and doesn't force line orientation. What's missing is the concept of moving something, as distinct from deleting lines/chunks and then inserting lines/chunks which just happen to be the same.
This is not a problem which CRDTs have, to put it mildly. I believe pijul understands it as well. A lot of this stuff is right out on the cutting edge, and as it matures it will become practical to connect the edges, such as a CRDT which collaborates with a parser to produce grammar-aware patches which are automagically fed to pijul or something like it.
This comes with a host of problems, mostly that we're not used to dealing with a history which has this level of granularity, most of which we don't want to see, most of the time. But they would be nice problems to have.
Some of "We" depend on sub-line diff highlighting during code reviews in order to reason about refactors and adding/removing arguments from function signatures.
That this is generally a feature of the diff tool and not the version control is a bit disappointing.
If you're interested in building collaborative apps but not the architectural overhead of implementing CRDTs I'd recommend checking out roomservice.dev [1]. They've begun to power some other collaborative apps such as Tella.tv [2] - realtime browser-based video editing.
I have been consistently at odds with myself comparing CRDTs vs. OT. On the one hand, CRDTs have a nicer core formalism. On the other hand, OT works, and is closer to the actual driving events of text editing.
I question the core argument of this article: that CRDTs now work, and that distributed is better than centralized. I certainly want more distribution than "everything is run on a Google server", but do I really foresee a need for distributing a single document? One server with an optimal OT implementation can probably handle close to a million active connections.
In practice, that's plenty. Each piece of data having one owner is quite reasonable. There are lots of pieces of data.
I remain on the fence for collaborative text editing. Though it's great to see all the work pushing CRDTs forward!
Blog author here. I've been having this conversation with a lot of folks over the last few weeks and I hear you.
Does it make sense for us as an opensource community to invest our time and energy making one really good CRDT (with implementations in a few languages). Or does it make sense for us to distribute that energy between a bunch of CRDT and OT implementations, with different performance tradeoffs?
My take is that it's been hugely beneficial to us all that JSON is a standard, because I can use it from every language and have confidence that the implementations are fast and good quality. I think we have an opportunity to make that for a good CRDT too. Even if OT would work fine in your architecture, if we have a great, capable, fast CRDT kicking around, it could be a reasonable default for most people. And my claim is that the performance difference between CRDTs and OT is smaller than the difference between high and low quality implementations. (I expect a well written CRDT in wasm will outperform my OT code in javascript.)
>>> Philosophically, if I modify a google doc my computer is asking Google for permission to edit the file. (You can tell because if google’s servers say no, I lose my changes.) In comparison, if I git push to github, I’m only notifying github about the change to my code. My repository is mine. I own all the bits, and all the hardware that houses them. This is how I want all my software to work.
I also long for this future. I want all software to work in this manner.
Interesting to know how CRDTs are enabling this, or how the limitations of OTs have restricted development of these tools (although git and GitHub exist regardless)
If you don't use CRDTs, you may be doomed to re-invent them. Reading about them just now I realized that I spent the last year developing a CRDT with LWW and OR characteristics.
edit: updated 'you are doomed' to 'you may be doomed'.
When dealing with this type of discussion I always try to remember that making design decisions is a tradeoff, an arbitrage highly dependent on your knowledge of the field, but also context and taste.
Believing there is a silver bullet is a fool's errand.
From what I've read about CRDTs, it seems difficult to escape the overengineering trap when dealing with them.
I tend to agree. Each team, project, and organization has different needs, preferences, and cultures. One-size-fits-all is a really tall order.
I believe it's better to focus on kits of parts--API's and/or self-contained functions--that can be combined or ignored as needed, along with a variety of reference application samples.
Having lots of ways to easily filter and sort content is also very useful. For example, filtering and/or sorting annotations by person, group, date, content (sub-strings) is very useful. A query-by-example kind of interface is nice for this.
I hate that I am skeptical of this. I suspect Wave just left that bad a taste behind. So much hubris in what was claimed to be possible.
The ideas do look nice. And I suspect it has gotten farther than I give credit. However, sequencing the edits of independent actors is likely not something you will solve with a data structure.
Take the example of a doc getting overwhelmed. Let's say you can make it so that you don't have a server to coordinate. Is it realistic to think hundreds of people can edit a document in real time at the same time and come up with something coherent?
Best I can currently imagine is it works if they are editing hundreds of pages. But, that is back to the basic wiki structure working fine.
So, help me fix my imagination. Why is this the future?
So yes, hundreds of people can edit a string and produce a coherent result at the end. Contiguous runs of characters will stick together and interleave with concurrent edits.
CRDTs don't guarantee coherence, but instead guarantee consistency.
The result may often be coherent at the sentence level if the edits are normal human edits, but often will not be at the whole-document level.
For a simplistic example, if one person changes a frequently-used term throughout the document, and another person uses the old term in a bunch of places when writing new content, the document will be semantically inconsistent, even though all users made semantically consistent changes and are now seeing the same eventually-consistent document.
For a contrived example of local inconsistency, consider the phrase "James had a bass on his wall." Alice rewrites this to "James had a bass on his wall, a trophy from his fishing trip last summer," and Brianna separately chooses "James, being musically inclined, had hung his favorite bass on his wall."
The CRDT dutifully applies both edits, and resolves this as: "James, being musically inclined, had hung his favorite bass on his wall, a trophy from his fishing trip last summer."
In nearly any system, semantic data is not completely represented by any available data model. Any automatic conflict-resolution model, no matter how smart, can lead to semantically-nonsensical merges.
CRDTs are very very cool. Too often, though, people think that they can substitute for manual review and conflict resolution.
Right. The problem CRDTs solve is the problem of the three-way merge conflict in git: the problem of the "correct" merge being underspecified by the formalism, and so implementation dependent.
If two different git clients each implemented some automated form of merge-conflict resolution; and then each of them tried to resolve the same conflicting merge; then each client might resolve the conflict in a different, implementation-dependent way, resulting in differing commits. (This is already what happens even without automation—the "implementation" being depended upon is the set of manual case-by-case choices made by each human.)
CRDTs are data structures that explicitly specify, in the definition of what a conforming implementation would look like, how "merge conflicts" for the data should be resolved. (Really, they specify their way around the data ever coming into conflict — thus "conflict-free" — but it's easier to talk about them resolving conflicts.)
In the git analogy, you could think of a CRDT as a pair of "data-format aware" algorithms: a merge algorithm, and a pre-commit validation algorithm. The git client would, upon commit, run the pre-commit validation algorithm specific to the file's type, and only actually accept the commit if the modified file remained "mergeable." The client would then, upon merge, hand two of these files to a file-type-specific merge algorithm, which would be guaranteed to succeed assuming both inputs are "mergeable." Which they are, because we only let "mergeable" files into commits.
Such a framework, by itself, doesn't guarantee that anything good or useful will come out the other end of the process. Garbage In, Garbage Out. What it does guarantee, is that clients doing the same merge, will deterministically generate the same resulting commit. It's up to the designer of each CRDT data-structure to specify a useful merge algorithm for it; and it's up to the developer to define their data in terms of a CRDT data-structure that has the right semantics.
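As a sketch of the shape such a pair of algorithms might take (nothing here is real git machinery, just the idea pinned down):

    // A "data-format aware" driver: a pre-commit gate plus a merge that is
    // total and deterministic on anything the gate let through.
    interface MergeableFormat<Doc> {
      // A commit is accepted only if the file remains "mergeable".
      validate(doc: Doc): boolean;
      // Must succeed on any two validated inputs and give every client
      // the same result, so identical merges yield identical commits.
      merge(base: Doc, ours: Doc, theirs: Doc): Doc;
    }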
For a codebase, unit tests could be the pre-commit validation algorithm. Then, as authors continue to edit the piece, they both add unit tests, and merge the code. In the face of a merge, the tests could be the deciding factor between what emerges.
Of course, unless you have conflicts in the tests themselves.
So the CRDTs could be applied to a document and an edit/change log to guarantee the consistency of the log and its entries, not necessarily the document itself?
What if the document starts empty and syncing doesn't happen until everyone presses submit? Will the CRDTs produce a valid document? Yes. Will it make any sense? Who knows. I think that's what OP is getting at.
I read it as a question regarding OT vs. CRDTs, which I believe would produce similar results even under heavy concurrency. In terms of larger edits or refactors, you’d probably need to do something else, e.g. lock the document or section, unshare the document, use some sort of higher-level CRDT that ships your changes atomically and forces a manual merge on concurrent edits, etc. None of these necessarily require a central server, though they may require an active session between participants.
I should also note that even if you use regular merge, and the end state of a text document is a complete mess after a refactor + concurrent edits, there’s enough data in the tree to simply pull out any concurrent contributions. They could then be reapplied manually if needed. Perhaps the app could even notice this automatically and provide an optional UI for this process. Similarly, it would be possible for the concurrent editors to remove the refactor edits and thus “fork” their document.
My question was not meant to be OT versus CRDT. Rather, I am questioning expectations at that shared editing use case.
Comparing to git (as others have done) is interesting. The expectation is that any merge is manually tested by the user, such that it is not just the git actions at play, but all the supporting activity. That is, the user flow assumes all intermediate states are touched and verified by a user. Where this is skipped, things run a higher risk of being broken. (This is why git bisect often fails on projects that don't build every commit.)
Same for games. Some machine gets to set the record straight as to what actually happened. Pretty much always. The faster the path to the authority for every edit, the higher chance of coherence.
With hundreds of authorities, machine or not, this feels intractable.
I think CRDTs could compose nicely with a system that features a central authority or stronger semantics. For example, you could imagine a sequence of vector clocks, each a superset of the last, stored in a separate data structure (and maybe hosted in a central location) that serves as the "main branch". The CRDT part would work as before, but any attempt to "commit" a vector clock that's concurrent with the top of "main" would be rejected. As in your example, every commit would be vetted by a human, and you'd get the best of both worlds.
(But I think this would be unnecessary for most instances of real-time collaboration, since people tend to own and edit small portions of a document instead of jumping around and making drastic revisions all over the place. In fact, it should be very rare for two changes to actually be concurrent, unless an offline version of a document is being merged in. I would agree that "100 people editing the same document offline and merging periodically" is a less than ideal use of CRDTs, but I think they could offer many benefits even in this scenario, especially if paired with a central ledger as described above.)
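For concreteness, the commit gate described above could look something like this sketch, assuming vector clocks; all names here are illustrative:

    type VectorClock = Record<string, number>;

    // a has seen everything b has (a >= b componentwise).
    function dominates(a: VectorClock, b: VectorClock): boolean {
      return Object.keys(b).every((id) => (a[id] ?? 0) >= b[id]);
    }

    // Accept onto "main" only if the proposed clock strictly extends the
    // head; anything concurrent with head is rejected for human review.
    function tryAdvanceMain(head: VectorClock, proposed: VectorClock): boolean {
      return dominates(proposed, head) && !dominates(head, proposed);
    }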
Ultimately I think the answer is "it depends", but the issue is that there is usually document structure which is not visible in the data structure itself. For example, imagine getting 100 people to fill out a row on a spreadsheet about their preferences for some things or their availability on certain dates. If each person simultaneously tries to fill in the third row of the spreadsheet (after the headings and the author), then a spreadsheet CRDT would probably suck at merging the edits. But if you had a CRDT for the underlying structure of this specific document, you could probably merge the changes (e.g. sort the set of rows alphabetically by name and do something else if multiple documents have rows keyed by the same name).
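A sketch of that keyed-row merge, assuming rows are keyed by a name column and leaving the same-key policy open:

    type Row = { name: string; [column: string]: string };

    // Union two concurrent row sets by key, sort alphabetically, and
    // surface same-key disagreements instead of silently picking one.
    function mergeRowSets(a: Row[], b: Row[]) {
      const byName = new Map<string, Row>();
      const collisions: string[] = [];
      for (const row of [...a, ...b]) {
        const seen = byName.get(row.name);
        if (seen && JSON.stringify(seen) !== JSON.stringify(row)) {
          collisions.push(row.name); // the "do something else" case goes here
        }
        byName.set(row.name, row);
      }
      const rows = [...byName.values()].sort((x, y) =>
        x.name.localeCompare(y.name)
      );
      return { rows, collisions };
    }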
It depends how big the document is, i.e. what is the density of users per page. If it's a 100 page document and the 100 users are all working on different sections, then it could easily be possible.
I just don't remotely see a use case for this. Real-time human collaboration in general fails at a scale much smaller than this, and not because of the tools available.
JoeDocs[1] could be a useful project to track related to this - the Coronavirus Tech Handbook[2] amongst other collaborative documents is now hosted by their service.
They utilize the same YJS[3] library mentioned in the article this thread discusses, and their GitHub repos include some useful working demonstration application code.
A question I always have is: if CRDTs solve some problem with collaborative editing, then can git's merge algorithm be rewritten to use CRDTs and benefit from it somehow?
Somehow I think the answer is no. There is a reason we still have to manually drop down to a diff editor to resolve certain kinds of conflicts after many decades.
I think a better question is “what if merges were more well behaved,” where “well behaved” means they have nice properties like associativity and having the minimal amount of conflict without auto-resolving any cases that should actually be a conflict.
The problem with using a CRDT is the CR part: there are generally merge conflicts in version control for a reason. If your data type isn’t “state of the repo with no conflicts” or “history of the repo and current state with no conflicts” but something like “history of the repo and current state including conflicts from unresolved merges” then maybe that would work but it feels pretty complicated to explain and not very different from regular git. Also note that you need history to correctly merge (if you do a 3-way merge of a history of a file of “add line foo; delete line foo” with a history of “add line foo; delete line foo; add line foo” and common ancestor “add line foo”, you should end with a history equal to the second one I described. But if you only look at the files you will probably end up deleting foo)
It's perfectly CR to encode a merge conflict as another type of data in the CRDT lattice.
For documents, you might represent this as a merged document, with merge conflict markup inside the merged document - similar to using version control and getting a merge conflict.
Also similar to version control, the merge conflict can be a form of data where a valid CRDT operation is for a user to resolve the conflict. When that resolution is merged with other users, unless there's a conflict in the conflict-resolution, everyone sees the merge conflict go away.
Another valid CRDT operation is when a user modifies their version of the document in the region of the merge conflict prior to seeing the conflict, and broadcasts their edit. Then the merge conflict itself would update to absorb the valid change. In some cases, it might even disappear.
In principle, you can build a whole workflow stack on top of this concept, with sign-offs and reviews, just as with version control. I have no idea how well it would behave in practice.
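A minimal sketch of conflict-as-data, in the spirit of a multi-value register (it ignores causality; a real version would also track which candidates a resolution supersedes):

    type Register<T> =
      | { kind: "resolved"; value: T }
      | { kind: "conflict"; candidates: T[] };

    // Concurrent writes are kept side by side as ordinary data; a later
    // user edit that picks one value is itself just another operation.
    function mergeRegister<T>(a: Register<T>, b: Register<T>): Register<T> {
      const all = [
        ...(a.kind === "resolved" ? [a.value] : a.candidates),
        ...(b.kind === "resolved" ? [b.value] : b.candidates),
      ];
      const unique = [...new Set(all)];
      return unique.length === 1
        ? { kind: "resolved", value: unique[0] }
        : { kind: "conflict", candidates: unique };
    }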
The answer is no, but unlike git, crdts make a choice for you, and all nodes get convergent consistency. The problem heretofore with crdts is that those choices have not been sane. I think there are a recent crop of crdts that are "95% sane" and honestly that's probably good enough. There is an argument that optimal human choices will never be reconciliable with commutativity, which I totally buy, but I think there is also an argument for "let not the perfect be the enemy of the awesome". And having made a choice, even if it's not optimal, is a much firmer ground to build upon than blocking on leaving a merge conflict undecided.
Git mostly treats merging as a line oriented diff problem. Even though you can specify language aware diffing in theory it doesn't seem to buy you much in practice (based on my experience with the C# language-aware diff).
It wouldn't make much sense to me to just plug a text CRDT in place of a standard text diff. CRDTs like automerge are capable of representing more complex tree structures however and if you squint you can sort of imagine a world where merging source code edits was done at something more like the AST level rather than as lines of text.
I've had some ugly merge conflicts that were a mix of actual code changes and formatting changes which git diffs tend not to be much help with. A system that really understood the semantic structure of the code should in theory be able to handle those a lot better.
IDEs have powerful refactoring support these days like renaming class members but source control is ignorant of those things. One can imagine a more integrated system that could understand a rename as a distinct operation and have no trouble merging a rename with an actual code change that touched some code that referenced the renamed thing in many situations. Manual review would probably still be necessary but the automated merge could get it right a much higher percentage of the time.
For specific use cases where the data format is not plain text and/or is formatted as JSON (e.g. Jupyter notebooks, rich text editing like ProseMirror), I can see CRDTs being used to automatically merge files and retain consistency. Two things though:
1. this doesn't require modifying git's merge algorithm itself; just having custom merge drivers built on top.
2. Is using more space (edit history of CRDTs vs a regular json document) worth it for automatic merging?
Depends on what kind of document we're talking about, i.e. how the grammar captures the domain model. E.g. a shared ledger in the case of digital currencies, or the Linux source code being worked on remotely by many people, are exactly examples of such documents.
Maybe if your "document" is the Encyclopedia Britannica? Wikipedia has hundreds of editors working at once, but that only really works because it's broken up into millions of smaller parts that don't interact much.
I meant this to be my takeaway. The data structure is nice. And I suspect it is a perfect fit for some use cases. I question the use case of shared editing. Not just the solution, but the use case.
Your key insight, which is spot-on, is that nothing can prevent human-level editing conflicts.
If I was going to take an attempt at justifying the importance of CRDTs, I would say:
CRDTs are the future because they solve digital document-level conflict.
They don't bypass the problem the way that diff/patch/git conflict resolution does, by requiring human intervention.
Instead they truly and utterly obliterate the digital conflict resolution problem: a group of people editing a document can separately lose network connectivity, use different network transports, reconvene as a subgroup of the original editors... and their collective edits will always be resolved automatically by software into a deterministic document that fits within the original schema.
If viable, this has far-reaching implications, particularly related to cloud-based document and sharing systems.
But how do they obliterate it? They just move the authority, no?
That is, say you get a hundred machines editing a document. They split into partitions for a time and eventually reunite into a single one. What sort of coherent and usable data will they make, without basically electing a leader to reject branches of the edits and send them back to the machines that made them?
There's no leader node necessarily required; each participant application in the session may have their own local copy of the document, and they apply edits to that using CRDT operations.
It's no doubt possible to construct applications that don't behave correctly for certain combinations of edits, but the data structures themselves should be robust under any re-combination of the peer group's operations.
Edit / addendum: to phrase this another way and perhaps answer you more clearly: it's a responsibility of the application designer to come up with a document format for their application (and corresponding in-app edit operations) that will tend to result in 'sensible' recombinations under collaborative editing.
My sense so far is that this is the tradeoff; the complexity moves into the document format and edit operations. But that's a (largely) one-off up-front cost, and the infrastructure savings and offline/limited-connectivity collaboration support it affords continue to accrue over the lifetime of the software.
My understanding is that these ensure merges are well formed. Not that they are semantically coherent. There has to be some other semantic check on top of the document to tell that. Anything with a human involved typically gets kicked to the human. Anything else would still need some authority on not just what is semantically accurate, but which edits should be discarded to get to which semantically valid document. Right?
That is, this only works if you can somehow force well formed to be the same as semantically valid. Those are usually not equal.
Sure; let's take the example of a code file, where semantic validity could make the difference between a correct program and one that doesn't compile.
Validity checks could include a syntax parser as a first-pass, followed by unit test as a second pass. If any of the verification steps fail, then the application can display that there is a problem with the file (and hopefully some context).
The authorities in that example are the software that checks the program; it can run locally for each participant in the session, or there could be a central instance that runs them, depending on the preferred architecture for the system.
None of the above necessarily requires discarding edits; but in some cases participants might choose to undo edits that caused problems, in order to get back to a valid document state.
> sequencing the edits of independent actors is likely not something you will solve with a data structure.
Any multiplayer game does this. Git does this as well.
So of course you can do this, it's a matter of how you reconcile conflicts. Real-time interactive games will generally choose a FIFO ordering based on what came into the server's NIC first. Git makes the person pushing the merge reconcile first.
For docs, live editing seems to work the same as in games. Reconciliation for the decentralized workflow will be interesting, but it's just going to be minimizing the hit to a user when their version loses the argument.
But git doesn't do this. It punts to the user pretty quickly. (Even when it performs a merge, it is expected that the user confirms the merge with a build. That is, git just claims the states combined without stomping on the same lines of the same files. The merge has to be verified by a user, from its perspective.)
Games, similarly, typically have a master state server. Yes, they optimistically show some state locally, but the entire process checkpoints with a central state constantly. (Else you get jerky behaviors pretty quickly as the states diverge more and more in ways that can't be reconciled without committing to a branch over another.)
Edit: that is, I would think the point is to force more, smaller arguments. Anything else puts more at risk as you lose one. Right?
You're agreeing with me on both points wrt git/games...?
The best way to prevent damage from arguments is to avoid them. Like how Docs shows cursors so you know to avoid areas that are actively edited. Combined with the tiny incremental changes (often 1 char), users assume that any conflicts are due to each other instead of any distributed inconsistencies.
Apologies, my statement was meant to ask how these help. I get that OT needs a central server to be the arbiter of what happened. I don't get how these data structures mitigate that.
Optimistically, I can see how they help. But, pessimistically, it looks like they just make the bad case worse.
You use them when you don't assume to have a central server to do all your reconciliation. When your system is peer-to-peer, needs to be fault tolerant of any node failing, or can have long periods of disconnection. Does that help?
"Twitch plays Google Docs" is always going to be incoherent, for social reasons. CRDTs can make it possible, they can't make it a good idea.
But for a contrived example, a game with hundreds of players, backed by an enormous JSON document, where the game engine is in charge of making sure each move makes sense: A CRDT could enable that, and each player could save a snapshot of the game state as a simple text file, or save the entire history as the whole CRDT.
Or as a less contrived example, instead of a game, it's a chat client, and it provides rich text a la Matrix, but there's no server, it's all resolved with CRDTs and all data is kept client-local for each client.
There are a lot of cool things you can build with a performant CRDT.
No, peer2peer lockstep is the future. No central server, no speed penalty. No storage penalty.
It has been used in RTS games to synchronize thousands of units across low-bandwidth connections.
Input may be delayed by latency, which can be mitigated with client-side prediction. Cosmic bit-flips and indeterminism can be a challenge in longer sessions, but peers can sync with each other when there is an OOS (out-of-sync).
Usually in games you have some sort of mechanism for determining what is the 'truth' in terms of game state. I agree that if everyone is online while editing or only briefly offline then what you suggest would probably be much better. If someone was offline for long periods of time and made extensive edits they would essentially have to be discarded.
I think in practice what you would do (if your use case allowed it) is use CRDTs, but periodically checkpoint and trim them when you know everyone has synced. That gives you very similar properties to the video game world and still has the features of not losing peoples edits when they make them offline.
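The trimming step itself is simple once you know a floor version that every peer has acknowledged (a sketch, with versions as plain integers):

    type Op = { version: number; payload: unknown };

    // Anything at or below the minimum acknowledged version can never be
    // needed for a future merge, so it can be folded into a checkpoint.
    function trimHistory(ops: Op[], ackedVersions: number[]): Op[] {
      const floor = Math.min(...ackedVersions); // everyone has seen this far
      return ops.filter((op) => op.version > floor);
    }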
The title sounds like it could be fanboy clickbait but it’s actually a thoughtful look at how far CRDTs have come from the viewpoint of an expert and skeptic.
It is wonderful to see so much enthusiasm about this technology. I have been working on CRDTs since 2012 and it has been quite a ride.
For those looking for more information, have a look at the information collected at http://crdt.tech/ (Disclaimer: I am involved, though Martin did the bulk of the work.)
If you are into CRDTs for collaborative gaming, we are looking for partners and investors: https://concordant.io (Disclaimer: I am technical advisor in its team.)
Yes, as does Riak. There are plenty of simple CRDTs, and the theory, while recent, has all of its fundamentals fleshed out. We know what property makes data structures CRDTs, how to compose them, and how to prove they are CRDTs.
Currently we are in the "discovery of new CRDTs" and "engineering and implementing older CRDTs reliably" phase, and in some cases "discovering when not to use CRDTs".
The crux of this issue is that CRDTs that play nicely with human expectations around collaborative document editing are not known, possibly excepting automerge (yjs). As it's a 'softer' concept with no good axioms, there is no solid theory on how to combine the theoretical requirements of CRDTs with human expectations.
CRDTs are hip and cool. But right now I'm trying to find an implementation for desktop software, not some web framework in Electron. And I could not find a concise and correct codebase.
All the implementations are: 1. javascript or 2. dependent on their chosen method of synchronisation or 3. incorrect.
The result of a two-week-long search is that I'm reimplementing the stuff myself...
I'm one of the authors of this. Right now the code is very unstable as we're tracking the performance branch of the JS implementation. Once the JS version hits 1.0, I'll be putting a bunch of effort into making the API cleaner and more rusty, and into documenting things.
It does work and can actually be used as a backend for the JS implementation if you use the wasm backend we've built. In fact, this is how we have tested it, by compiling to WASM and running the JS test script against it.
I'm working on a project with some offline data synchronization needs, but haven't started implementation yet. I've been following CRDTs with interest. I also saw many of the same downsides mentioned in the OP, e.g. bloat (which apparently is being addressed remarkably well). Beyond OT, another approach I've run across that looks very promising is Differential Synchronization [1] by Neil Fraser. While it also relies on a centralized server, it allows for servers to be chained in such a way that seems to address many of the downsides of OT. I wonder why I rarely ever see Differential Synchronization mentioned here on HN? Is it due to lack of awareness, or because of use-case fit issues, or some fatal flaw I haven't seen? Or something else?
Has anybody seen any work where CRDTs get insight into conflict resolution using the underlying grammar of whatever text is being written (aka English, Javascript, regex, etc.)? Seems like the conflict resolution could do a better job if it knew the EBNF of the text that was being edited.
Also, any prior art on CRDTs over relational data? I suppose each single field would potentially be an "editable space", but on a long document (say a Javascript code block), updating the whole thing in the db by overwriting it with each edit would not be very efficient. Seems like there could be a datatype that was a little smarter than "text", that could handle more granular updates, that implemented a CRDT index under the hood? I'm working on a VCS for relational data [1] which is more on the not-so-realtime-collaborative-editing end of the spectrum, but would really like to figure out how to handle real-time collaborative editing in this space.
Maybe over WebRTC? I found a little bit of prior art w.r.t. CRDT over WebRTC. [2]
Ahhhh Google Wave. I was an early adopter and shed a tear when it went away. The closest I've felt to that product is Slack but find Slack too noisy. With Wave I felt like I was IN my work not in a "sidebar" application that was pulling my attention from my work. I suppose there were so many ways to use Wave and so many ways to use Slack that your experience could be completely different than mine. But RIP Google Wave.
Nobody uses email anymore! It's a last resort. If properly nurtured, Google Wave easily could have become Slack and more. It was pointing in that direction.
Yes, nobody uses email anymore. That's why 99% of websites have an email signup blocker for discounts, or a newsletter, or just for the heck of it. But they do want that email address.
CRDTs seem very promising, but we still have a long way to go. The most exciting work in this area is being done by Ink&Switch [0]. They have a number of interesting real-world app prototypes based on CRDTs.
- An interesting case where CRDTs failed is Xi-editor, where they tried to use CRDTs as the basis for a plugin system [1,2].
- One of the biggest problems with CRDTs is the overhead needed to keep track of the full document history. The automerge [3] project has been working on efficient compression of CRDTs for JSON datatypes.
- The idea of monotonic updates is really appealing at first, but I was disappointed when I realized there's no good solution to handle deletions. Tombstones, to me, seem like kind of a hack, albeit a necessary one. Practically, CRDTs aren't the silver bullet they might seem like at first.
- Another lesson learned is that when ten people are editing the same paragraph, there's not really a right answer. I think the key to implementing CRDTs is doing it at the correct level of granularity.
- ProseMirror intentionally chose NOT to use CRDTs [4].
The nice thing about CRDTs is that each individual message can be end-to-end encrypted (like WhatsApp messages), and then re-merged by all the clients locally.
A local-first database with such an encrypted sync property would be amazing for building lots of apps with the ability to sync data between users or between your devices seamlessly. The challenge I ran into in my initial experiments is that CRDTs need to be compacted/merged in various ways to stay efficient, but encryption gets in the way of that a little when considering server backups / high availability.
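For illustration, per-op encryption can be as small as this Node sketch (AES-GCM with a shared group key; key distribution, compaction, and backups are the hard parts and are assumed away here):

    import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

    // The relay server only ever sees [iv | authTag | ciphertext];
    // clients decrypt each op locally and feed it to the CRDT merge.
    function encryptOp(key: Buffer, op: object): Buffer {
      const iv = randomBytes(12);
      const cipher = createCipheriv("aes-256-gcm", key, iv);
      const body = Buffer.concat([
        cipher.update(JSON.stringify(op), "utf8"),
        cipher.final(),
      ]);
      return Buffer.concat([iv, cipher.getAuthTag(), body]);
    }

    function decryptOp(key: Buffer, blob: Buffer): object {
      const decipher = createDecipheriv("aes-256-gcm", key, blob.subarray(0, 12));
      decipher.setAuthTag(blob.subarray(12, 28));
      const body = Buffer.concat([
        decipher.update(blob.subarray(28)),
        decipher.final(),
      ]);
      return JSON.parse(body.toString("utf8"));
    }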
Following the golden rule, I always post a link to a series of papers comparing the theoretical properties of CRDTs and OT – here's the latest one:
Real Differences between OT and CRDT under a General Transformation Framework for Consistency Maintenance in Co-Editors – https://arxiv.org/abs/1905.01518
Proceedings of the ACM on Human-Computer Interaction 2020
Chengzheng Sun, David Sun, Agustina Ng, Weiwei Cai, Bryden Cho
I wonder why OT is restricted to a central server. In 2016/2017 I wrote a Progressive Web App (PWA) for myself which uses an algorithm which probably fits the category of OT. It uses a WebDAV server for synchronization between devices. Yes, this is a centralized server, but when some super slow & dumb WebDAV server can serve this purpose, it should probably be possible to build it on top of S3, a blockchain or something federated.
My biggest issues at the time were around CORS, as with a PWA you can't simply use every server the user enters; the same-origin policy keeps getting in your way.
Any time you see mention of "a server", you can in fact replace it with "a synchronised collection of servers acting as one".
OT involves logic on the server, not just storage, so it's not really OT as generally meant, if using S3 and a collection of peer clients only running client logic. S3 doesn't have enough interesting logic for that.
However, if you try the thought experiment of stretching "a synchronised collection of servers" to be all of the peers, no S3 even required, and then do OT with that, you can!
The result behaves exactly like OT in terms of things like document editing, conflict resolution and history garbage collection, rather than behaving like a CRDT.
It has different timing and reliability characteristics from single-server OT though. Those characteristics depend on how the virtual server is implemented on top of the synchronised peers, and especially how it implements peers coming and going.
If that sounds like OT-on-p2p has similarities to CRDT-on-p2p - they do, and they are not the same. Roughly speaking, CRDT-on-p2p has lower latency relaying updates between interested peers, because it doesn't need to globally synchronise. However with some fancy techniques you can make OT-on-p2p fast most of the time as well, and retain some of the OT benefits.
Those two behave differently but there are some common characteristics. Once you have the cluster idea, it's not out of the question to mix and match bits of OT, CRDT and other kinds of Distributed Transaction on a per-operation basis for different ops on the same data, depending on the characteristics you want them to have.
There are many trade-offs in the characteristics.
If you squint a lot, that's sort of, kind of, what realtime network games do implicitly, without a clear theory underlying it. They also add predictions and interpolations.
1. Every time the user changes something it writes the change to a journal and
2. executes the change on a local cache (to update the UI).
3. Then it starts a background sync by fetching the latest version from the server
4. executes the change on the fresh data from the server
5. uploads the transformed data with an etag to avoid overwriting parallel changes from other clients and
6. removes the change from the journal (and updates the local cache) if everything worked just fine.
So you could argue that using the etag is some kind of logic, but I think that is not what you mean by 'involves logic on the server'.
This implementation certainly doesn't work for all use-cases (e.g. high throughput/low latency), but given that it enables even offline-scenarios, I think it isn't that bad either.
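A minimal sketch of steps 3-6 above; `Change` and `applyChange` are stand-ins for the app-specific parts, while the conditional PUT via If-Match is plain HTTP:

    type Change = { path: string; value: unknown };              // app-specific shape
    declare function applyChange(doc: unknown, c: Change): unknown; // app-specific

    async function syncOnce(url: string, change: Change): Promise<boolean> {
      const res = await fetch(url);                    // 3. fetch latest version
      const etag = res.headers.get("ETag");
      const doc = applyChange(await res.json(), change); // 4. re-apply the change
      const put = await fetch(url, {                   // 5. conditional upload
        method: "PUT",
        headers: {
          "Content-Type": "application/json",
          ...(etag ? { "If-Match": etag } : {}),       // lost a race => 412, retry
        },
        body: JSON.stringify(doc),
      });
      return put.ok; // 6. on success the caller drops the change from the journal
    }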
A problem with e.g. CRDT data sync in web apps is data security. HTTP resources impose control points where you know "why" the client is asking for, say, this chunk of the social graph: it's /profile/friendlist, so the UI can ask for a very controlled and tightly specified data projection for that particular UI, consumed by tightly controlled JavaScript. Data sync is NOT for scraper bots, arbitrary read patterns, or any notion of general access.
To some extent, I see the value in "don't listen to predictions from people with a record of inaccurate predictions".
On the other hand, people who make a lot of inaccurate predictions are, at least, taking a chance by saying out loud what they are thinking. They know their predictions might be wrong, and hopefully they learn and update their methods along the way. It is a nice way to grow.
Back in my Austin road cycling days, there was an amazingly talented junior rider who had a reputation for crashing hard and often. Some of us wondered how many pieces his frame could be split into. He survived the crashes, best I can tell. I'm pretty sure his confidence, resilience, and bike handling skills got better with every crash. Pushing the limits and being wrong in these kinds of situations may not be everyone's style, but it can work.
Operational Transformation and Conflict-Free Replicated Datatypes are very different from each other.
As the author explains, OT relies on some ordering of system events, and CRDTs don't. That means CRDTs need to be commutative (and probably associative), and OT doesn't.
So, OT is less scalable but more powerful, and CRDTs are more scalable but less powerful (in theory).
It's sort of like comparing Paxos/Raft to Bittorrent.
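The commutativity requirement is easiest to see in a tiny CRDT such as a last-writer-wins register; this sketch is illustrative, and real implementations use Lamport or hybrid timestamps rather than wall clocks:

    type LWW<T> = { value: T; time: number; replica: string };

    // merge(a, b) === merge(b, a), and merge is associative, so replicas
    // can apply updates in any order, any number of times, and converge.
    function merge<T>(a: LWW<T>, b: LWW<T>): LWW<T> {
      if (a.time !== b.time) return a.time > b.time ? a : b;
      return a.replica > b.replica ? a : b; // deterministic tie-break
    }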
Great summary. CRDTs are a better fit for generalized data. Having previously worked on an OT system, the central server stickiness and merge complexity simply did not scale. There are trade-offs with CRDTs, especially metadata, but as the post mentions compression techniques are far more solvable in real-world scenarios than a fundamental performance bottleneck at the core.
Collaboration is the killer app of the next gen of software.
What we need is a community set of peer relay nodes on top of which data structures can be synced. Infra companies are well set up to provide this Firebase-like store layer, but for any generic data structures (lists, dicts, arrays, etc.).
With this, any saas application is at a disadvantage, because data is no longer tied to the application!
I think CRDTs will play a bigger role in the future. We built a multi-model (KV, docs, graphs, search) realtime database service leveraging CRDTs underneath. If anyone is interested, they can read more about the tech here: https://www.macrometa.co/technology
So CRDTs are the future, but what about today for real, production products? I'm just about to really dive into collaborative editing features for our product, and OT still seems to me to be a much safer bet unless you're dealing with a more obscure environment.
Yes, starting with OT looks easy. You can make 99% work in almost no time. But the last 1% will bite you in the rear really hard...
Actually, CRDT is not a single data structure or even algorithm. It is a term for several families of data structures and different algorithms on them. If your task is not editing text, you may find a simple and already implemented CRDT for your case.
"It was a general purpose medium (like paper). Unlike a lot of other tools, it doesn’t force you into its own workflow. You could use it to do anything from plan holidays, make a wiki, play D&D with your friends, schedule a meeting, etc."
The trouble with comments like this is that they make discussions shallower and more generic, which makes for worse threads [1]. Actually it's not so much the comment as the upvotes, but shallow-generic-indignant comments routinely attract upvotes, so alas it amounts to the same thing.
The most recent guideline we added says: "Please don't complain about website formatting, back-button breakage, and similar annoyances. They're too common to be interesting." I suppose that complaints about writing style fall under the same umbrella.
Not that these things don't matter (when helping people with their pieces for HN I always tell them to define jargon at point of introduction), but they matter much less than the overall specific topic and much less than the attention they end up getting. So they're basically like weeds that grow and choke out the flowers.
(This is not a personal criticism—of course you didn't mean to have this effect.)
I don't understand why you think that writing on a very technical subject needs to build you a ladder to climb on as a prerequisite. There is a link to a very high quality talk right at the top of the article for folks who wanted to dive deeper that specifically makes that effort.
I found the article quite good, and if you had genuinely been motivated to engage with the content you could have highlighted the acronym and searched for it.
There is a wealth of good info for "CRDTs" that comes up on the first page of Google, Bing or DDG.
Does the acronym actually illuminate what they are or how they function? I submit to you that it probably doesn't.
There are practices for informative writing that have been developed over decades and decades that recommend, among other things, defining initialisms on first use.
I read hundreds of pages of such writing a week and I can assure you that if this is a practice, it is not well observed by academics or engineers in the field.
As I tirelessly mention whenever this comes up on HN, which is often: we have a specific technology that is designed precisely for this situation.
It's called the link. All an author has to do is link the first instance of an acronym or piece of jargon to some authoritative description, and you get the best of both worlds: readers familiar with e.g. CRDTs[0] can just keep reading, and the rest can click the link and find out.
Understood. It's a bit like the Google Assistant saying, 'Stand aside, I don't need the instructions. I'm being helpful.' And then doing the wrong thing. It should just RTFM.
Relevant to the parent here, I prefer to use a link because it is more commonly expected and it gives me the opportunity to refer to a quality source for an in-depth explanation of what I'm referencing.
It is especially useful when writing a technical document that utilizes multiple products/stacks/terms. Creating links to quality sources for those items gives someone new to the content a good source to go deeper into those pieces while allowing me to focus the article on the specific aspect I'm writing about.
A lot of people seem to be questioning why you'd need this when you could just provide a full link. Personally I read a lot of technical documentation and having acronyms written out in full would almost always be enough. Otherwise the whole document is likely over my head, or it's just a bad acronym.
We also have select-contextmenu-search on both desktop and mobile, for any word or acronym. Links are nice for disambiguation or to point to a recommended resource, but they're hardly essential, nor are in-line expansions or definitions.
That's still lazy writing. Every blog should be written with the assumption it will be encountered by a non-specialist. Expanding abbreviations on first use and offering a brief explanation of jargon is enough to let these readers know if the article is something they are interested in.
Every blog? That's silly. People are allowed to have conversations about niche topics that you are not familiar with. You aren't the audience of every blog.
People are allowed and encouraged to speak freely about anything on the internet, but people seem to forget that this is the world wide web, and writers can't control who in the world shows up to their blog or site. With a little help, someone who might not be in the core audience, might actually enjoy or learn something. If everything is written with jargon and abbreviations with no context, it's really just lazy inconsiderate writing.
It never ceases to amaze me how many websites for restaurants or whatever neglect to mention basic things like what state (and country) they're in. Even newspaper web sites assume that we know that the "Chronicle" or the "Ledger" or whatever generic name is the local paper for East Bumblefuck.
That's true, but most people underestimate how opaque their writing can be even to other experts. It doesn't mean you have to explain every piece of jargon, but you can often greatly improve the clarity of your writing, including for expert readers, by targeting at least a few levels of expertise below where you think your audience is. We all have gaps in our knowledge that will seem basic or obvious to others, no matter how expert we are in a topic.
Yep. It's the paradox that the more you understand something, the harder it is to teach, because it's more work to empathise with people who don't know the concept.
Anyway, blog author here - sorry I didn't explain CRDTs earlier in the piece. It didn't occur to me that people would be confused.
This is what introductory paragraphs/sections/chapters are for. Someone already well acquainted with the subject matter can quickly skim through them, while others less familiar with it get a quick catch-up.
Just like libraries, sometimes it is and sometimes it isn't the best approach.
For example, in a "How to do $BASIC_THING in python" article, putting an intro of "This is what a variable is" may not be a bad idea. Meanwhile, in a "Writing an operating system from scratch in an esolang I wrote" article, maybe you'd be better off linking to previous blog posts or other resources.
Obviously these are both extreme examples, but I think it's still a valid view.
While I agree that reading the title was confusing (as I am not familiar with CRDT), I think the writing style was actually very good.
I read the title, wondered what CRDT was, and started reading. In the back of my mind I was wondering what CRDT was, but reading the article felt like I was going on a journey. Every term that needed to be defined was defined. Finally, when CRDT was mentioned in the article, it was immediately defined.
I generally agree that throwing acronyms around without defining them is not fair to the reader, but I don't think this article did that at all.
Yup, strong agree. The article did a great job of capturing the "story" of the competing approaches really well, I didn't even mind that the acronym wasn't explained until later.
This is called "burying the lede", where the newsworthy portion is buried somewhere later instead of being mentioned upfront. It's best not to do this, since not all readers will read two thirds of a story in order to determine the subject.
I don't think this is a good example of burying the lede. If I wanted to bury the lede on this post, I'd do this:
> I've spent the last decade working on OT, and have always thought it was the right way to implement a collaborative editor. Then something amazing happened.
Instead, we get this:
> I saw Martin Kleppmann’s talk a few weeks ago about CRDTs, and I felt a deep sense of despair. Maybe all the work I’ve been doing for the past decade won’t be part of the future after all, because Martin’s work on CRDTs will supersede it. Its really good.
That seems like the opposite of burying the lede. The main point of the story is _not_ that CRDT stands for Conflict-free Replicated Data Type, it's that the author now favors CRDTs over OT for collaborative editors.
Whether the undefined term CRDT is part of the lede or is the lede itself is a quibble, since people who do not know the meaning of the acronym need to read a significant part of the story to be told the definition.
That can be seen by glancing at the comments on this page.
I disagree that it is a quibble. Not having context is different than being strung along to increase engagement metrics. I didn't know what CRDT was either, but I spent 10-15 seconds searching it and then read the rest of the post, which I thought was otherwise extremely well done. I actually subscribed to Joseph's RSS feed because I was so pleased with the quality of the writing on the blog.
I do agree that engaging in this sort of disagreement on HN is low-yield, however.
I've seen this writing tactic become more and more common over the years. It shows disrespect for your audience, and tends to play well only when "preaching to the choir".
Whenever I see this writing style, such that I cannot find a thesis in the first two paragraphs, I almost universally discard the writing as a waste of time.
Seriously. It needs to be explained the FIRST TIME it appears, and it shouldn't be abbreviated in the title. I read for a minute thinking he was talking about Chrome Remote Desktop (which is what CRDT means to me).
Mods, can we expand the acronym in the title of this submission please?
If I didn't use Chrome Remote desktop every day then this definitely would've been my initial assumption as well. Acronym for an obsolete technology but with an added "D"? We've gone digital now, baby!
I thought the author did an amazing job of discussing a highly technical topic in a very approachable way. Every blog on HN should aspire to write like this! It was so good it even got me reading other posts.
Yes, it would have been nice for us non- domain experts if the author had done the classic "Conflict-free replicated data type (CRDT)" thing, but you can easily just say that, ya know? "Hey, it would be helpful if you expanded CRDT early on."
Agreed, 29 mentions of the acronym 'CRDT' and I had no idea what it was until I had to break my reading flow and google it, it sounded like buzzword soup to me.
Engineers, when talking about technical concepts with acronyms, always expand them for the first time to your readers!
I’m sure a casual, non-technical reader of Hacker News would be unaware of most of the headlines here. Google is your friend, and CRDTs are part of the language of distributed systems. To some degree, one has to help themselves.
Given that the author defines CRDT (conflict-free replicated data type) a few paragraphs in, it might have been accidental. The author might have re-ordered a few of the paragraphs during editing.
I strongly disagree; that forces the author to spend extra time explaining everything. That's why it's often so hard for me to find quality in-depth, advanced blogs on various technologies and fields: they all tend to be really introductory. So there are either papers or tutorials, but nothing in between, e.g. a different-angle explanation of the same thing, or a comparison with another technology the reader came from.
In contrast, I much prefer a different approach to explaining (I mostly see it on Cyrillic forums): instead of guiding you by hand, they just give you clues about where to look. That way, knowledge givers are far more approachable, because it costs them very little to chat back something like "look for CRDT", versus going into an in-depth explanation. In the end there's way more information, and from top experts in the fields.
Thanks. I expanded the acronym when it became relevant to the story. But judging by the comments here, lots of folks were distracted and frustrated that they didn't know what the acronym meant earlier.
Anyway, I've updated the introductory paragraph to make it more clear.
Is it really that hard to google? If you're trying to learn about a subject it can get annoying to repeatedly have to jump to the meat of the article or fast forward if you're watching a video.
Obviously googling isn't hard, but having to google what could be easily explained in the text breaks one's concentration, something that is critical for most readers.
A single quick search and you're good to go. Besides, if you need to look it up then you're probably better off reading a quick summary anyways.
Spelling out "Conflict-free replicated data type" doesn't really help beginners all that much and non-beginners will just use "CRDT" anyways.
We don't need every article about the web to spell out HTTP right? I don't get why the author is getting beat up just because his free content isn't convenient enough.
If the article is titled "HTTP is the future" yes, I think unpacking the acronym is appropriate. Also, he's not getting "beat up", it's just a mild criticism regarding how the article was written, it's not that big of a deal.