> At any rate, I don't buy the argument that because Rust is fast and WebAssembl...

alephnan · on April 20, 2022

> I'm glad pure javascript is fast enough for your needs.

In this case, the bottleneck at 9 million LoC is not CPU cycles but memory usage. That's where I am considering pushing down into WebAssembly, but whatever function-inlining optimization V8 is doing, it is probably still faster than the overhead of JS<->WebAssembly function call. I have no doubt WebAssembly will edge out again.

> Merging concurrent changes. (Like, merging a long lived branch). This currently involves getting fancy with a B-tree. This is fast enough in rust. It would be orders of magnitude slower in javascript.

I guess my point is why do you need balanced trees? Is this a CRDT specific thing? Can you implement CRDT with just an array of lines / gap buffer?

> Undo support. When you hit undo, you need to ignore any more recent typing from other users and just undo the local user's last change. This is subtle.

From my understanding, the whole point of CDRTs or any command-based design is express changes to the state as the delta. In which case, you only have to stack / remember the set of commands and not have to store the state on every change. I'm not sure if this overlaps with the data structure choice, other than implementation details.

> And, sure, Javascript's Array is pretty fast. But if you ever need something more complex than Array, your performance will suffer massively. Personally I feel much better writing this code in rust + wasm.

I was just looking into TypedArrays. From my understanding, the JS runtime will use a HashMap or DoublyLinkedList if the array elements are homogenous. In the case where the JS runtimes know it is homogenous, they will fallback to those TypedArray / fixed length buffers that are "continuous" in memory. So on the JS side, Arrays can be as fast as TypedArrays thanks to the compiler. Consequently, in some benchmarks they will effectively be the same.

The question is whether the speed of a native array is faster than the speed of a TypedArray such that it pays for the WASM<->JS overhead. And I guess this depends on the application and the access patterns of that interop.

josephg · on April 21, 2022

> In this case, the bottleneck at 9 million LoC is not CPU cycles but memory usage. That's where I am considering pushing down into WebAssembly

How often does this come up in practice? I can't think of many files I've opened which were 9 million lines long. And you say "LoC" (lines of code). Are you doing syntax highlighting on 9 million lines of source code in javascript? Thats impressive!

> I guess my point is why do you need balanced trees? Is this a CRDT specific thing? Can you implement CRDT with just an array of lines / gap buffer?

Of course! Its just going to be slower. I made a simple reference implementation of Yjs, Automerge and Sync9's list types in javascript here[1]. This code is not optimized, and it takes 30 seconds to process an editing trace that diamond types (in native rust) takes 0.01 seconds to process. We could speed that up - yjs does the same thing in 1 second. But I don't think javascript will ever run as fast as optimized rust code.

The b-tree in diamond types is used for merging. If you're merging 2 branches, we need to map insert locations from the incoming branch into positions in the target (merged) branch. As items are inserted, the mapping changes dynamically. The benchmark I've been using for this is how long it takes to replay (and re-merge) all the changes in the most edited file in the nodejs git repository. That file has just shy of 1M single character insert / delete operations. If you're curious, the causal graph of changes looks like this[2].

Currently it takes 250ms to re-merge the entire causal graph. This is much slower than I'd like, but we can cache the merged positions in about 4kb on disk or something so we only need to do it once. I also want to replace the b-tree with a skip list. I think that'll make the code faster and smaller.

A gap buffer in javascript might work ok... if you're keen, I'd love to see that benchmark. The code to port is here: [3]

> Undo support -> In which case, you only have to stack / remember the set of commands and not have to store the state on every change. I'm not sure if this overlaps with the data structure choice, other than implementation details.

Yeah, I basically never store a snapshot of the state. Not on every change. Not really at all. Everything involves sending around patches. But you can't just roll back the changes when you undo.

Eg: I type "aaa" at position 0 (the start of the document). You type "bbb" at the start of the document. The document is now "bbbaaa". I hit undo. What should happen? Surely, we delete the "aaa" - now at position 3.

Translating from position 0 to position 3 is essentially the same algorithm we need to run in order to merge.

> I was just looking into TypedArrays.

I tried optimizing a physics library a few years ago by putting everything in typedarrays and it was weirdly slower than using raw javascript arrays. I have no idea why - but maybe thats fixed now.

TypedArrays are useful, but they're no panacea. You could probably write a custom b-tree on top of a typedarray in javascript if you really want to - assuming your data also fits into typedarrays. But at that point you may as well just use wasm. It'll be way faster and more ergonomic.

[1] https://github.com/josephg/reference-crdts

[2] https://home.seph.codes/public/node_graph.svg

[3] https://github.com/josephg/diamond-types/tree/master/src/lis...