Hacker News new | past | comments | ask | show | jobs | submit login

I don't have much experience in this area, but I'd be interested in an overview of how different pieces of sofware handle the concurrent / multiplayer editing problem, like:

- Etherpad

- Google docs

- Apache / Google Wave (open sourced: http://incubator.apache.org/projects/wave.html)

- repl.it https://repl.it/site/blog/multi

- figma https://www.figma.com/blog/multiplayer-editing-in-figma/ (image editing rather than text editing)

So is the gist of it that OT relies on central servers and they all use OT rather than CRDT? That was not entirely clear to me.

Looking at the xi analysis, it sounds like "IME" is specific to a desktop application using X (?), so it doesn't apply to any of this web software.

The rest of them seem like they do apply?

Is the problem that you have to pay a "CRDT tax" for every piece of state in the application? I thought the same was true of OT. Don't you have to express every piece of state within those constraints too?

repl.it doesn't seem to have a problem with syntax highlighting or brace matching (try it, it's pretty slick). So is it just that they paid that tax with a lot of code or is there something else fundamentally different about xi vs. repl.it ? Or maybe xi is going for a lot lower latency than web-based editors?

recent thread about OT vs. CRDT that might be interesting: https://news.ycombinator.com/item?id=18191867

TL;DR CRDT is completely irrelevant to any of the highlighting/etc stuff

Most highlighters are lexers. Advanced highlighters/folders are parsers.

The lexing/parsing that is required for highlighting is easy to make incremental for all sane programming languages.

for LL(star) grammars, adding incrementality is completely trivial (i sent patches to ANTLR4 to do this)

for LR(k) grammars, it's more annoying but possible (tree-sitter does this)

For lexing, doing it optimally is annoying, doing it near optimally is very easy.

optimal incremental lexing requires tracking, on a per token basis, how far ahead in the character stream the recognizer looked (easy), and computing the affected sets (annoying)

Most real programming languages have a lookahead of 1 or 2. Near-optimally requires tracking only the max lookahead used, and assuming all tokens need that max-lookahead. In a world where min token length is 1, that means you only need to re-lex an additional (max lookahead) tokens before the changed range. In a world where min token length is 0, it's all zero length tokens + (max lookahead) tokens before the changed range. This does not require any explicit per-token computation.

Again for basically all programming languages, this ends up re-lexing 1 or 2 more tokens total than strictly necessary.

Tree-sitter does context-aware on-demand lexing. i have patches on the way for ANTLR to do the same.

The only thing CRDT helps with in this equation at all is knowing what changed and producing the sequence of tree-edits for the lexer/parser.

The lexer only cares about knowing what character ranges changed, which does not require CRDT. The typical model for this kind of edit is what vscode's document text changes provide (IE For a given text edit, old start , old end, new start , new end)

The parser only cares about what token ranges changed, which does not require CRDT.

Oh wow, cool!

I made a simple proof-of-concept realtime PEG parser a couple years ago, which ingests text OT/CRDT operations ("insert at position X", etc) and invalidates & recalculates the compiler output by invalidating all overlapping ranges and recalculating from the root. My implementation is way slower than I expected it to be - though I'm sure you could use a lot of tricks from well optimized parsers to speed it up. I agree - CRDT / OT / whatever is sort of orthogonal, although its really nice being able to feed the parser with the same operation format, and have it update its output.

https://home.seph.codes/public/miniohm/ / https://github.com/josephg/miniohm if you're curious.

I'd love to see this sort of thing applied at a larger scale in a compiler. For example, I could imagine editing a C program with a compiler running live. Instead of batch compiling artifacts to disk like its 1970, instead as I type each character the compiler recompiles just the parts of my program that could be affected by my change. The compiler passes code changes to the linker (maybe with a function level granularity). And the linker could then (live) manage writes to an executable, allocating new byte ranges for changed functions and updating references on the fly.

Even on very large C++ project like Chrome, there's no reason why incremental updates should take more than 1-2ms. If all you're doing is changing a single function in the binary file, why do our linkers rewrite the whole thing?

> Instead of batch compiling artifacts to disk like its 1970 [...]

I am delighted to see people discussing this. Batch orientation made sense in a very resource-constrained era. But at this point we have more RAM and CPU than we know what to do with. It seems so obvious to me that the correct solution is to prioritize developer experience and keep everything hot.

My single biggest barrier to developing faster is the latency between making a change and seeing what that does to the test results. I would like that to be well under a second. Given the cost of programmer time, I could happily throw money at RAM and CPU. But standard, batch-oriented tools won't take advantage of it 99% of the time.

I think there's a revolution in developer experience out there, and I really want to see it happen.

The perfect place to start would be support in LLVM backends, maybe they are already working on this.

I don’t know that that’s true. I doubt you could do this sort of redesign simply in an LLVM backend because a “hot” compiler would work differently at so many levels of abstraction. I think an easier place to start would be a self contained compiler with a little less scope than llvm. There we could figure out the patterns, find a clean set of internal APIs and make some sweet demos. And then with that approach the same task in LLVM.

I have done exactly this in the Markdown editor I developed: Stylo: https://www.textually.net if you're curious.

I used what I call "partial compilation" for each edit made in the Markdown editor that "re-compiles" only affected parts of the text. A function is responsible for computing the ranges to be recompiled based on the edits made in the text. In Stylo it was absolutely essential since CSS is used for syntax highlighting and a complete DOM is kept in memory for the complete Markdown text. The necessity to keep the complete DOM in memory could have been avoided without "following sibling" selectors handling but I really wanted "complete" CSS support. Since Markdown can be used for texts ranging from 1 word to 2000 pages, complete recompilation was unthinkable, so partial one with some exceptions was implemented.

The main difficulty with this kind of approach is that it's possible that the compilation, for whatever range determined, does not stop. Meaning that the effect of the edit goes far beyond the determined range, which function can not know this without compiling... It can happen for example if you edit in C and type "{" then all the scopes below are affected by this opened (and not closed yet) scope. In Markdown it can happen when you open a fence code block (using the ``` opening fence). So you need to have a proper way to handle these edge cases. The other problem is to handle the state associated with these regions and handle the non-stopping compilation case talked before.

All this, as I have discovered, make this kind of compilation and it's associated syntax highlighting really difficult to implement properly and fast enough.

In order to be fast, this process needed to be as asynchronous as possible, because there was some tiny but noticeable lags on the main thread. Even partial compilation was more around 8ms to 12ms on my computer (an old macbook pro 2012). But there is also some cases where asynchronous is not suitable, in which cases synchronous processing is used, but, only when absolutely necessary. So, a mechanic which allows to switch between asynchronous and synchronous processing was needed.

All this to say, it's possible but really difficult to do properly. Every bit of the CPU power have to be extracted to make it comfortable to use. I can just imagine with the complexity of C++ what a nightmare it would be to manage, but it should be possible with a proper compiler architecture that modularise partial compilation and encapsulates the inherent complexity of this approach.

Thanks for writing this - its really interesting to read and I'm glad other people are trying this approach.

Is your markdown editor opensource? I'd love to play with it.

"Even on very large C++ project like Chrome, there's no reason why incremental updates should take more than 1-2ms. If all you're doing is changing a single function in the binary file, why do our linkers rewrite the whole thing"

Because it requires N^3+ dataflow problems to track what is possibly changed with significant accuracy.

Does it?

Let’s say I edit a function, and my editor (attached at the hip to my compiler) knows which bytes have been changed. So it knows my change was isolated to a function in a .cpp file. The compiler has cached the compilation environment from the last build, and with that in mind it could recompile just this one function. Then we need to binary patch the function - which essentially requires allocating disk space inside the binary executable. If the previous compiler pass left some free space, we could allocate out of that and mark the old function memory as reusable. Then we need to update CALL pointers - either store them in a list or have every function do indirect calls through a table, and then you just need to update the table entry. If you want to get fancy you could do better than function level granularity, but I think that alone would give us sub-ms builds.

I wouldn’t want to change the process for release builds - but for incremental debug builds I don’t see where you’re getting an N^3 from.

Yes, it does.

"The compiler has cached the compilation environment from the last build, and with that in mind it could recompile just this one function."

First, this is actually the most trivial possible case, and even there, as you'll see, this is not correct.

Outside of a function it does not handle changing types, changing access permissions in classes, changing between virtual and non, etc. It does not handle changing globals, things with attributes of various sorts, etc.

All of those require understanding the uses, or overestimating them.

As for whether function-local changes can have non-local effects, the answer is flatly "yes". You don't even have to be that creative, since you can do things like force template function output and other things that affect global state from a function in C++.

The assumption that because i changed text in one function, it means the possible changes are limited to that function, is not correct in C++ :)

There are languages in which it is, mind you, just C++ is 100% not one of them.

Honestly, I would strongly suggest you start to read papers on this, this is an incredibly well studied area, and the costs are well known.

If it was easy and cheap, we'd still be doing it ;)

It used to be a pretty standard feature of editing environments for things like LISP, etc.

Do you have any recommendations for places to start reading? I didn't realise it was a well studied area - naively it seems like it would be possible in many languages. It would certainly be trivial for 90% of the transpiled / polyfilled stuff that we do in modern typescript / javascript.

I completely agree that doing this to live-update an optimized build would extremely expensive and difficult. And I'll happily acknowledge that (for example) adding or moving fields in a struct would be hard to implement.

But adding a compiled specialization of a template function / class should be "easy" to handle. Its non-local, but I can imagine a dependancy tree of functions needed for the compilation. On update, we can figure out a diff of that tree, free all the now-unused function bodies in the binary and then add (compiling if necessary) all the added / updated functions.

C++ might be one of the worst languages to implement this for - .NET, JVM, typescript or Go might all be much easier. But yeah, if this has been extensively looked in to I'd love to learn why we don't do it for modern development. It seems like an obvious win, and if we don't do it I hope its for good reasons.

But, this is a hard thing to do, the compiler code will have to be organized in a different way, probably the language should be designed specifically towards this end. And the compiler code will be more brittle for sure and will probably be slower for batch compiles.

Most languages do already support ok-fine grained recompiles (at the module level) and it's just many linkers that are slow (I keep hearing from people that this need not be). And actually some of them already do something that is called "incremental linking".

Even C compilers regularly compile 100K+ LOC/second, so I think there's not much to be gained here. Yes, complicated C++ can be slow to compile. Maybe we shouldn't write compliated C++ code to begin with? In the end, batch compile performance also matters.

Very interesting and informative. But I suppose what the user really wants is the next step: not just incremental parsing, but also incremental compilation (or at least the error-checking part of the compilation process).

The basic error checking comes with incremental parsing. A lot of semantic error checking is also not that bad to make incremental (though some languages are just really really bad here).

Incremental compilation depends on the constraints on environment (IE are you trying to incrementally produce normal object files or what)

The more you relax the environmental constraints the easier it gets.

It also depends on just how non-local an effect the programming language allows. For most programming language, general incremental compilation requires either strongly overestimating possibly-modified sets, or global interprocedural dataflow + pointer analysis.

Most choose the former, or don't do general incremental, but instead something like "edits to functions are incremental, edits to global data/types/etc are not"

(This is even without compiler optimization on).

The Skip language (which I mentioned in another comment) is a language based entirely on incremental computation. Every piece of code you write in it becomes incremental, as far as I know.

AND the compiler is self-hosted!!! So that means they have a fully incremental compiler.

Although there is an LLVM back end, so maybe only the front end of the compiler is incremental? That's still important. It comes from the same people as the Hack language, which I believe has a fast "online" typechecker -- which I think is incremental (?)

I believe you could write a C compiler in Skip the same way?

If you peek at the source code, it has a 6000 line recursive descent parser, and 6000 line type checker, much in the style of OCaml (but in Skip).

I think this deserves a blog post because it's not clear to me that the message of Skip got through (7 months ago on HN). Their website has some examples, but not too many.

Of course, I haven't tried it yet, and it's a super ambitious project, so there could be some big problem with it.

But if the goal is to make an entire suite of compiler and devtools that are incremental, then Skip seems like it's ahead of the state of the art and something to borrow from.

Salsa was also mentioned, and I believe it's based on some of the same academic work like Adapton. But I don't know if they have an entire self-hosted compiler as a demo like Skip does. It seems to be "query-oriented" rather than supporting arbitrary code, which is what I would expect for most incremental computing frameworks. Skip seems to be more ambitious.


I believe that the Skip language does this, because it's compiler is self-hosted. It's http://skiplang.com/blog/2017/01/04/how-memoization-works.ht...


Skip – A programming language to skip the things you have already computed https://news.ycombinator.com/item?id=18077612


This is a great comment!

Yeah I think I get what you're saying -- basically that if you have incremental algorithms then you don't need to take a dependency on CRDT or OT to make the UI updates fast?

In other words, you can do things synchronously BUT still quickly, since you have an incremental algorithm. In any case, I basically asked that same question here:


I've seen Treesitter and am impressed by it, which I wrote a little about in that comment.


Since you speak of "sane" languages, I wrote a parser for an "insane" language by hand, with a principled regex-based lexer:

How to Parse Shell Like a Programming Language http://www.oilshell.org/blog/2019/02/07.html

In fact the first thing I tried was to use ANTLR to translate the POSIX shell grammar (which only covers about 1/4 of the shell language). This drove home the limitations of grammar-based approaches. I found its model of lexing to be very limited (along with insisting on materializing a parser tree rather than providing semantic actions in ANTLR v4).

So it's good to hear that it's being updated for "context-aware lexing", and incremental lexing. Are you using ANTLR for IDE-like tools? I wasn't aware that it was good or commonly used for that kind of use case.


I briefly talked with the author of Treesitter on Github about his bash syntax highlighter with Treesitter, which is of course a parser and not just a lexer, and can be made incremental.

I wasn't sure if the context-aware lexing mechanisms here could really handle the shell language:


I think his response was basically that it doesn't have to be 100% correct, which I agree with for an editor. Trading off some corner cases for speed is probably the right thing.

But I have thought about what it would take make a parser for a "crazy" language incremental and 100% correct (which you would want for further semantic analysis).

I wonder if you can write recursive descent parser in the Skip language and then get incrementality for free? The language is designed so that every algorithm can be made incremental, and it's not limited to pure functional constructs like other systems. Supposedly it goes to some lengths to provide you with a more imperative / mutable state modelk, so it should be able to express an "arbitrary" parser.


In other words, instead of having different algorithms for making LL(star) and GLR parsing incremental, Skip might give you a more expressive model for the parser. A bunch of my blog posts are about the limitations of grammar-based approaches.


FWIW the style of context-aware lexing I use is explained here:

When Are Lexer Modes Useful? http://www.oilshell.org/blog/2017/12/17.html

And the other post lists the 14 different modes or contexts that the shell front end has:


It's a very simple and powerful mechanism, but my parser only cares about batch mode for now. It's not clear to me how it differs in expressiveness from what Treesitter offers (or ANTLR). The current lexer mode basically depends on the control flow of the parser.

Just to answer the IME question, it refers to "Input Method Editor" and is an important problem to solve for all platforms, desktop, mobile, and Web. The API's for IME are often crufty and it's easy to get edge cases wrong. These days, lots of people care because emoji, but formerly it was something that English language speakers tended to ignore.

An anecdote that goes against the conventional wisdom about IMEs vs. ”English” and emoji: Right now in Firefox and Chrome, the events that get fired are more correct if you enter emoji using the Windows 10 Pinyin IME’s emoji palette than if you enter emoji using the Windows 10 on-screen English keyboard’s emoji palette.

Edited to elaborate: Keyboard APIs aren't well exercised for astral characters, but IME APIs deal with strings anyway.

OT doesn’t have to depend on central servers, but it is much simpler and less resource-intensive to do it that way. This is what Etherpad, Google Wave and Google Docs do.

As you say, both OT and CRDT come with a “tax” in that you must structure your application’ edits in a way that they can interpret. However, this is easier with OT for the text editing case, as OT uses position based addressing, whereas CRDT is identifier based (I’m assuming this is where the canonical IME issue mentioned comes from - somebody please correct me if I’m wrong).

xi-editor pioneers a hybrid where "identifiers" are actually the tuple of revision and position. The motivation is to buy back some of the performance edge that OT has because of its use of position. I think if you are going to do CRDT it's pretty compelling, but that's also somewhat orthogonal to the complexity concerns raised in my original post.

Interesting - that’s definitely compelling. In a nutshell what exactly is the issue with IME?

Robert Lord's comments on the issue are probably the best explanation, written with the benefit of hindsight.

[0]: https://github.com/xi-editor/xi-editor/issues/1187#issuecomm...

The interesting thing about OT is that almost every non-trivial editor uses that as its model, because it makes undo/redo very simple.

Figma has a blog post about it, they use Fractional Indexing.


Stern Brocot is a better way to do fractional indexing (pathological cases are not aligned with common uses cases like putting something to the front).


I'm pretty certain that these are OT: Etherpad, Docs, Wave. I've done quite a bit of investigation into these too and I'm convinced that OT is the mathematically inferior, but practically superior model (it is far easier to reason about, which is important because not everyone on your team might read compsci books).

You can abstract OT/CRDT behind a step-locked view model, which should be fine for in-process. IME can be solved in this view model, too, by halting updates to the CRDT/OT between IME entries.

The problem is that, with xi, this would necessitate implementing that view model in each front-end, so there's certainly the "CRDT tax" there. xi is a bit of an oddball.

Another OT/CRDT thread from just a few days ago: https://news.ycombinator.com/item?id=19845776

I don’t think Google docs is distributed at all. I think all edits happen in a single process on a single machine. That’s why the limit of concurrent editors is so low.

It’s distributed but it’s not decentralized. Concurrent edits are possible. This is how Google Docs is able to work offline.

It's distributed because there's communication with the clients going on, which requires some kind of coordination to achieve agreement on the result of two simultaneous edits.

I think the assertion is that the edits are not distributed. Instead, an intent to edit is sent to a central server which forces them to be serialized or rejects them. It is not merging distributed edits.

I'm not sure if this is a) the assertion or b) correct/relevant.

This is incorrect. Edits are made locally and sent to the server. The server merges the edit with any edits it received in the meantime from other clients. So the server serializes the rebased edits.

I'm not sure this is disagreeing with me. If a single central server/service is serializing and rebasing the edits, that is what I took the assertion to be.

Decentralized world be the clients communicating their edits to each other.

It can be confusing, as github is a centralized service for git, a decentralized protocol. So, much of the terminologies get blurred.

> So is the gist of it that OT relies on central servers and they all use OT rather than CRDT?

You are right. I'd take shot at blowing out the why part. Practically, real-time collaboration apps can be categorized into two:

#1 The document editors, which stores information on the cloud. I'll refer these as document editors henceforth.

#2 The ones that doesn't need a thick server, except for broadcasting edits. Like plain text editors and code editors. I'll refer these as code editors for simplicity sake.

Both have a slightly different set of problems.

Document editors, usually have a slightly richer document representation schema (since most of them are rich text documents anyway). But they have the luxury of a central server. A central server means, now they have the power for good versioning, rollbacks to a consistent state, picking winners and ensuring ordering of edits when syncing large edits.

Code editors, although their doc representation is usually arrays of characters (it could be strings, ropes, or whatever but essentially just a set of characters), these don't have the luxury of a central server that could decide a winner when concurrent edits take place. This means the clients have to be very intelligent and converge to the same representation no matter how out-of-order the edits arrive.

Now coming back to the two algorithms camps - OT & CRDT:

The general outlook is that, OT - while being simple, they have this classic TP2 problem that would make it harder to arrive at a consistent state eventually - without a help from a central server. The alternative is to have a complicated system, almost as complicated as CRDTs. (More about this can be read through the Internet. If anybody's interested I'll post links to stuff I've read through)

CRDT - CRDT have (or maybe had?) a strong reputation of being efficient at arriving at a consistent state with out of order edits. They can do it, because basically no characters in a CRDT document is deleted anyway (they are just marked deleted and are termed tombstones). So no information loss, which means a clever algorithm can write a function to easily arrive at a consistent state at each site. This means, a central server is now only optional.

If I didn't lose you until now, you'd have intuitively guessed why document editors, tended to always pick OT and code editors prefer CRDT over OT.

This is complete oversimplifications and certainly there's a ton of things I left out for simplicity sake.

Source: I'm in the business of building a powerful Google Docs alternative, a word processor that runs on the browser: https://writer.zoho.com (Zoho Writer)

> The general outlook is that, OT - while being simple, they have this classic TP2 problem that would make it harder to arrive at a consistent state eventually - without a help from a central server. The alternative is to have a complicated system, almost as complicated as CRDTs.

Well this characterization isn't quite right. With regard to TP2, this is a solved problem, with or without using a server. Most of the conversations around TP2 seems to miss the crucial point that TP2 is not needed for convergence. CRDT academic literature has largely been responsible for propagating the notion that TP2 is a must for every OT system - without TP2, all is lost -- this is a myth. I wrote about it extensively here [1].

This article [2] gives an analysis of 7 coeditors based on OT, including the likes of Google Docs, showing that they avoid having to design transformation functions that require TP2. Some of the systems are fully de-centralized [3], others require a server.

It is also worth making a finer distinction between systems that do need a central server -- there are "transformation servers" and "messaging severs". Google Docs, for example, demands that its servers perform some portion of the transformation logic. Many other solutions rely on using the server only for message transmission/serialization, thus the server side logic is significantly simpler.

[1] https://arxiv.org/pdf/1905.01302.pdf

[2] Achieving convergence in operational transformation: conditions, mechanisms, and systems.ACM CSCW (2014)

[3] A generic operation transformation scheme for consistency maintenance in real-time cooperative editing systems ACM GROUP (1997)

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact