Hacker News new | more | comments | ask | show | jobs | submit login
The Xi Text Engine CRDT (github.com)
235 points by jxub 6 months ago | hide | past | web | favorite | 39 comments

If anyone’s interested, I wrote a long article a few months ago on (what I believe to be) the ultimate string CRDT, among other things: http://archagon.net/blog/2018/03/24/data-laced-with-history/

I really, really love the work that Levien and Hume have put into the Xi CRDT, and how open they’ve been with their research. IMHO, though, it’s a bit too “academic” and clever for any sort of general purpose use, though it may well be ideally suited for its stated purpose. There even appears to be a limit to the amount of rewinding you can do, which points to O(n^2) complexity for conflict resolution: a huge problem for offline-first development, where concurrent timelines don’t really have any limit on the length of their diverging histories. (I might be mistaken about this—please correct me if I’m wrong.)

Still a very satisfying algorithm, though! You can see more of Raph Levien’s thinking here: https://medium.com/@raphlinus/towards-a-unified-theory-of-op...

The current implementation definitely has some parts with suboptimal complexity, but at the time of design we had plans to address all of them. I just didn't have sufficient time (or productivity?) in my internship to implement them all. For example a lot of the things that are currently lists can be replaced with ropes of the same type Xi uses for text.

This is pretty impressive work for an undergrad internship.


> offline-first development, where concurrent timelines don’t really have any limit on the length of their diverging histories

Is there really great utility in automatically merging wildly divergent documents? Being able to send your queued changes and agree on a new state of the world is lovely, but for most of the circumstances I can imagine the ideal new "state of the world" is "please fix this merge conflict", not "here's my best guess at a reasonable state based on the semantics of some data structure."

Maybe it's domain specific. Maybe in code you want merge conflicts and in some kinds of rich text docs you just want a best-guess auto-merge. Even then, though, I think it would depend a lot on what the document is for. Maybe blog posts and company culture docs should get auto-merge and legal contracts should get merge conflicts.

Yes, there is great utility in this approach! For one, most concurrent text editing, rich or not, is actually quite easy to merge in 99% of cases. But even that aside, when you have data structures or document formats that don't ever have to resolve conflicts—or, more accurately, when conflict resolution becomes something that's naturally done in app-space, on the edges, and not as part of the network layer—then you open up the possibility of offline-capable, collaborative apps that require zero coordination between nodes. A node with changes only has to deliver their content to interested servers and peers, not engage in lengthy conversations about who has the definitive version, whether somebody has to revert, etc. You could build a Google Docs clone that synced entirely over Dropbox!

And in any case, the "merge" part would still be up to the app. The data structures will always be able to merge, sure, but the app could still throw up a dialog box on wildly divergent changes saying, "Hey, some crazy stuff happened here—want to look it over and commit manually?" (Where "commit" means generating new changes to overwrite any undesired changes, since it's not possible to literally undo.) You could argue that we've simply implemented ad-hoc coordination protocols in the app layer at that point, but IMO, that's exactly where they belong in cases where there's no authoritative store for your data. (Which is really the case with mobile, offline-first development.)

I agree with this. The correct way of resolving conflicts is different depending on both the semantics of the document and the context in which the conflict arose.

For example, I think the CRDT approach is probably applicable to most examples of interactive editing, where incorrect conflict resolutions will happen near a point of focus for one of the editors and can be quickly and easily resolved.

If you have wildly divergent documents however (again depending on the document type itself), this is not really the same case, and it's much more likely that a human will need to create a resolved document by consulting the two parent documents.

Lots of people collaborate by emailing files to each other, and do version control by saving under a different filename every time. Using CRDTs as an on-disk file format can make these workflows less annoying, by working with changes instead of differences: the same advantage that Git's 3-way merge algorithm has over Subversion's 2-way merge.

> IMHO, though, it’s a bit too “academic” and clever for any sort of general purpose use, though it may well be ideally suited for its stated purpose

Exactly my thoughts. Looks like all the improvements in the OT/CRDT academics - is only useful to make plain string editors (like code editors and plain text editors). I maybe wrong or in a completely different page even.

I'd like to see these improved algorithms applied to full featured rich text editors, with styles, tables, TOC sections, etc. They have a completely different set of challenges and I'd like to see how these improvements cater to those.

For example the non-code editors like Google Docs & Zoho Writer still use a decade old OT algorithms and seem to do just fine.

Not a skepticist here. Just curious how CRDT and the latest collaboration techniques will improve making rich text editors (I'm in the business of making one)

Apple’s Notes.app uses CvRDTs (I know this from figuring out the data format). For attributed text, they’re using something homegrown that they’re calling “topotext”. It’s somewhat similar to RGA/Causal tree, but they’re maintaining a digraph (unnecessarily?) and I suspect the conflict resolution isn’t quite right (doesn’t seem to do newest first). But for the sake of discussion, it’s a list of characters with lamport timestamps and a tombstone flag.

On top of this each character has a separate lamport time stamp for the attributes. This appears to act as a LWW register, per character. (So a conflicting attribute mutation would pick one users state and an insert conflicting with a embolding a range would not pick up the bold, but close enough?)

(On disk, runs of increasing clock are stored as one node, with a length, and the resulting text and attribute runs are stored separately from the CRDT data.)

A table is treated as an attachment in the main text (attributes on a placeholder character point to it). It is also encoded as a CRDT. (They compose CRDTs for maps, registers, etc here.) It is modeled as an ordered set of row ids, an ordered set of column ids, and a map of column id -> row id -> topotext. This is needed to preserve the semantics of adding/removing/ordering columns and rows in the face of conflicts.

Drawings are also stored as a CRDT.

(Sorry if this is terse or confusing, I’m typing on mobile.)

Fascinating! I've always wondered about this. Out of curiosity, how did you work this out (and why)?

Initially I wanted to figure out how to export my notes. I don't like to use software if I can't export my data. Then I saw strings in the table data mentioning CRDT and was curious. (I like to take things apart and see how they work.)

The data is protobuf encoded, gzipped, and stored in a sqlite database. I wrote some python code to help me work out the protobuf schema. Then I observed how the data changed as I tweaked the notes to assign names to the fields. (After I figured it out, I learned that the protobuf schema was being sent to the client in the web app, but it was a good exercise anyway.)

To fill in a couple of details, I took a look at some structures via classdump and the disassembled code in Hopper. (e.g. the point representation for the drawings.)

That's very interesting. Thanks!

Hey I just wanted to say thanks for this article. I'm interested in CRDTs but the papers are a little over my head, so it's nice to be able to step through your examples in detail.

Glad you found it useful!

Another explanation on CRDTs with a broader scope, can be found here: https://github.com/ljwagerfield/crdt. There's also this conference talk by Arnout Engelen: https://www.youtube.com/watch?v=9xFfOhasiOE

Previous discussions about the Xi editor and rope science:

https://news.ycombinator.com/item?id=17109930 (3 months ago, 47 comments)

https://news.ycombinator.com/item?id=16267202 (6 months ago, 295 comments)

https://news.ycombinator.com/item?id=14129543 (a year ago, 60 comments)

https://news.ycombinator.com/item?id=11576527 (2 years ago, 177 comments)

Also in this space, some Atom devs are working on a new editor/engine in Rust and have recently shifted focus to CRDT as a way to get collaborative editing and advanced SCM-like scenarios.

The editor is called 'xray' and the CRDT tech is Eon. They have some info here and you can find more in the repo/branches. https://github.com/atom/xray/blob/master/docs/updates/2018_0...

If you're into the client/server model of Xi, xray is targetting the same, including an in-browser experience connecting to a remote backed. Similarly, there is Theia-IDE which actually seems the most advanced in terms of a functional in-browser editor with a client/server model.

I think these tools are going to enable entire new generations of programmers on super low end hardware where their editor services and toolchains are running in a remote DC.

There are others in this space with similar tech, but most seem focused on very specific niches and use cases. If there are other softwares that hit the collaborative editing, CRDT, and in-browser experience points, I'd love to hear about them.

Hmm, what does low-end hardware have to do with it? I can’t think of any challenge in text editing that requires beefy hardware.

Coordinating multiple editors is tricky, yes, but it doesn’t need fast hardware, just good software and ideally a reliable network.

Editing text on a phone is hard, but that’s a UI problem -- it’s the small screen and lack of a keyboard. Most phones these days have very capable CPUs and plenty of memory.

(I agree that this technology is very cool, though! I’m just curious why you pick out that low-end use case.)

The point is that the frontend is a "thin" JavaScript UI rendered in the browser while the entire real dev environment is remote -- LSP (language sever protocol plugins aka autocomplete, semanic highlighting, nav to reference/definition, etc), other plugins, the project's toolchain, etc, are running in a container/pod on a beefy machine.

Theia (and GitPod.io) will give you this today and it is compelling. GitPod gives you a single button on PRs/Issues that drops you in a dev environment, ready to build and test at the click of a button. No cloning, no installing a toolchain, etc.

If Rust and Rust Language Server are running in a container with Theia, this means I can use a Chromebook-style device for serious, real development without having to enable dev mode or even Linux apps. Every machine in the world becomes a real potential development environment.

Theia even has (or is about to have) debug protocol support too. A real, full IDE running on a remote DC, accessible from your browser. ( If you follow what the Theia and Che devs are doing, they're trying to support the full VS Code API... Which is SUPER exciting!)

(Note, with the level I'm speaking about here, the CRDT is a bit of an implementation detail, but it's useful for collaborative editing and in xrays case, syncing state b/w the browser client and the backend and the underlying SCM system.)

I agree with this, but one thing they might have been going for is editing code that you then compile and run on a beefier remote server. Although you don't need a full CRDT for that, it can do that.

> I can’t think of any challenge in text editing that requires beefy hardware.

The expectations of advanced text editors have expanded to create performance concerns which do not apply to the most basic text editor. It's basically what happens when somebody tries to elevate a vulgar art by doing it big: like making the world's largest macaroni sculpture, anything can be done big enough to meet limitations. In addition to this, Atom also has the problems of being an Electron app.

> I think these tools are going to enable entire new generations of programmers on super low end hardware where their editor services and toolchains are running in a remote DC.

This sounds awfully dystopic to me.

Why is that? Did I say everyone had to adopt it? You don't think there's value in someone in a low-income situation being able to experiment with and use a full development experience without having to upfront invest in computing power?

Have you tried any of this tooling? It's hard to notice that I'm not in VS Code at times. I'd be curious what you find so dystopic.

> Why is that? Did I say everyone had to adopt it?

Given the overall trend towards SaaS I expect tooling vendors to try to enforce this whenever they think they can get away with it. By working on or using these projects you're giving them more ammunition, even if your intentions are to just add another option.

> You don't think there's value in someone in a low-income situation being able to experiment with and use a full development experience without having to upfront invest in computing power?

And it's better that they get stuck paying rent to some cloud vendor just to access their editor?

I'm all for making stuff more accessible, but this is the same nonsense as FB's "Free Basics".

I don't get it. Theia runs on my home machine, my home cluster, or runs standalone just like any other editor. Everything I've mentioned are fully OSS projects (edit: GitPod has a bit of secret sauce for the workspace functionality, but Che has that, too). If anything, my goal in all of my software choices is user empowerment and I refuse to build or use anything proprietary or tied to a single cloud. My only exceptions are Plex (I'm working on replacing it) and Windows/games on a single machine.

There's no reason that a browser based editor has to mean lock-in, and the current landscape doesn't support such gloom, in my opinion.

Does anyone know if there has been any work to apply CRDTs for collaborative editing of source code AST, especially for something more complex than sexprs? I imagine that could be neat, but also have some pitfalls.

Yes definitely. I'm currently doing research in this field. And there are others doing research in this too (http://www.expressionsofchange.org)

I enjoyed the post, but there's something that bothers me about using the name CRDT.

A CRDT is just what mathematicians (and functional programmers) call a Semilattice[1], right? In general, I find it frustrating when people make up new names for existing mathematical concepts because it deprives others from learning and seeing the big picture. Does this resonate with anyone here?

CRDTs and semilattices are definitely connected, but they're not quite the same thing. A CRDT is a data structure that can be merged across the network in a way that's commutative, associative, and idempotent. That much is exactly the same as a semilattice.

But when people talk about "string CRDTs", the underlying semilattice, the data structure that gets merged, isn't a string; it's usually something rather more complicated. Then there's a _function_ which interprets that more complex data structure as the string that the user or application really cares about.

So a CRDT is a semilattice equipped with an interpretation function.

But it gets even more complicated, because there are many ways to do CRDTs in practice. Rather than gossiping your _entire_ complex data structure across the network ("state-based CRDTs"), usually CRDTs try to only send what's necessary. This leads to optimisations like delta-based and operation-based CRDTs. These optimisations are crucial for real-world use of CRDTs, but their connection to semilattice theory is not immediately clear to me. (That doesn't mean there isn't one, though!)

In any case, the story is a little more complicated than "CRDTS are just a semilattice". I do wish more people knew about the connection, though.

While JSON isn't perfect, I hope they base this around the JSON CRDT[0] so it can handle nested structures well.

Also they might not need tombstones[1].

[0]: https://news.ycombinator.com/item?id=12303100

[1]: https://news.ycombinator.com/item?id=12303467

This is an old link, but I could find https://github.com/google/xi-editor/blob/master/docs/docs/cr... on master.

I wonder when the Xi editor will be ready to use. I can't wait.

Why's it under google's namespace in github? Was it there always?

Google allows it's employees to put their 20% projects on the google github, provided they a disclaimer in the README. It doesn't mean much but perhaps it's good for publicity. Xi has always been in there.

How does Xi editor compares to Vi and Emacs?

Vi and Emacs are older projects with a larger community, and thus there's a ton of plugins out there even for niche applications. Xi is much newer, and attempts to solve many of the problems that Vi and Emacs face (examples: in Vi's case the difficulty of writing extensions, and in Emacs' the difficulty in maintaining and extending the core of the editor written in C). Xi also focuses in areas that weren't considered that important when Emacs/Vim were first developed, such as asynchronous operations and collaborative editing.

Xi's frontends are all too immature at this stage for day-to-day work.

This editor is probably gonna be banned in China if it got noticed by the government. Hell they banned Winnie the Pooh

Applications are open for YC Summer 2019

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact