> At any rate, I don't buy the argument that because Rust is fast and WebAssembly is comparably fast, then it makes sense to push your data structure, especially a text buffer, down into WebAssembly.
To store a buffer in a plain text editor, with no syntax highlighting? I'm not going to fight you on that. I'm glad pure javascript is fast enough for your needs.
But doing collaborative text editing with a CRDT adds some extra requirements that your text editor's buffer might not have. For example:
- Hydrating the document state on load. To cut down on file size, diamond types files currently just store the entire editing history. When you open them up, we replay the whole editing history into a buffer. And using Wasm + jumprope[1], this is <1ms even for very large documents. The same would not be true in javascript, and I might need to use bigger files.
- Merging concurrent changes. (Like, merging a long lived branch). This currently involves getting fancy with a B-tree. This is fast enough in rust. It would be orders of magnitude slower in javascript.
- Undo support. When you hit undo, you need to ignore any more recent typing from other users and just undo the local user's last change. This is subtle.
- When edits happen, we need to convert between your (line, JS column) tuple and whatever the CRDT uses internally. Diamond types specifies edit positions by counting unicode characters from the start of the document. So we need to convert between (line,col) and unicode offsets whenever local or remote changes happen to broadcast or apply the changes.
Rust + Wasm let me make all of these operations essentially instant. I can do that because I can use an appropriate data structure & algorithm for each operation.
And, sure, Javascript's Array is pretty fast. But if you ever need something more complex than Array, your performance will suffer massively. Personally I feel much better writing this code in rust + wasm.
I'd integrate diamond types with your text editor by just passing edits back and forth across the module boundary. I agree that it probably doesn't make sense to share a text buffer between javascript and wasm.
> I'm glad pure javascript is fast enough for your needs.
In this case, the bottleneck at 9 million LoC is not CPU cycles but memory usage. That's where I am considering pushing down into WebAssembly, but whatever function-inlining optimization V8 is doing, it is probably still faster than the overhead of JS<->WebAssembly function call. I have no doubt WebAssembly will edge out again.
> Merging concurrent changes. (Like, merging a long lived branch). This currently involves getting fancy with a B-tree. This is fast enough in rust. It would be orders of magnitude slower in javascript.
I guess my point is why do you need balanced trees? Is this a CRDT specific thing? Can you implement CRDT with just an array of lines / gap buffer?
> Undo support. When you hit undo, you need to ignore any more recent typing from other users and just undo the local user's last change. This is subtle.
From my understanding, the whole point of CDRTs or any command-based design is express changes to the state as the delta. In which case, you only have to stack / remember the set of commands and not have to store the state on every change.
I'm not sure if this overlaps with the data structure choice, other than implementation details.
> And, sure, Javascript's Array is pretty fast. But if you ever need something more complex than Array, your performance will suffer massively. Personally I feel much better writing this code in rust + wasm.
I was just looking into TypedArrays. From my understanding, the JS runtime will use a HashMap or DoublyLinkedList if the array elements are homogenous. In the case where the JS runtimes know it is homogenous, they will fallback to those TypedArray / fixed length buffers that are "continuous" in memory. So on the JS side, Arrays can be as fast as TypedArrays thanks to the compiler. Consequently, in some benchmarks they will effectively be the same.
The question is whether the speed of a native array is faster than the speed of a TypedArray such that it pays for the WASM<->JS overhead. And I guess this depends on the application and the access patterns of that interop.
> In this case, the bottleneck at 9 million LoC is not CPU cycles but memory usage. That's where I am considering pushing down into WebAssembly
How often does this come up in practice? I can't think of many files I've opened which were 9 million lines long. And you say "LoC" (lines of code). Are you doing syntax highlighting on 9 million lines of source code in javascript? Thats impressive!
> I guess my point is why do you need balanced trees? Is this a CRDT specific thing? Can you implement CRDT with just an array of lines / gap buffer?
Of course! Its just going to be slower. I made a simple reference implementation of Yjs, Automerge and Sync9's list types in javascript here[1]. This code is not optimized, and it takes 30 seconds to process an editing trace that diamond types (in native rust) takes 0.01 seconds to process. We could speed that up - yjs does the same thing in 1 second. But I don't think javascript will ever run as fast as optimized rust code.
The b-tree in diamond types is used for merging. If you're merging 2 branches, we need to map insert locations from the incoming branch into positions in the target (merged) branch. As items are inserted, the mapping changes dynamically. The benchmark I've been using for this is how long it takes to replay (and re-merge) all the changes in the most edited file in the nodejs git repository. That file has just shy of 1M single character insert / delete operations. If you're curious, the causal graph of changes looks like this[2].
Currently it takes 250ms to re-merge the entire causal graph. This is much slower than I'd like, but we can cache the merged positions in about 4kb on disk or something so we only need to do it once. I also want to replace the b-tree with a skip list. I think that'll make the code faster and smaller.
A gap buffer in javascript might work ok... if you're keen, I'd love to see that benchmark. The code to port is here: [3]
> Undo support -> In which case, you only have to stack / remember the set of commands and not have to store the state on every change. I'm not sure if this overlaps with the data structure choice, other than implementation details.
Yeah, I basically never store a snapshot of the state. Not on every change. Not really at all. Everything involves sending around patches. But you can't just roll back the changes when you undo.
Eg: I type "aaa" at position 0 (the start of the document). You type "bbb" at the start of the document. The document is now "bbbaaa". I hit undo. What should happen? Surely, we delete the "aaa" - now at position 3.
Translating from position 0 to position 3 is essentially the same algorithm we need to run in order to merge.
> I was just looking into TypedArrays.
I tried optimizing a physics library a few years ago by putting everything in typedarrays and it was weirdly slower than using raw javascript arrays. I have no idea why - but maybe thats fixed now.
TypedArrays are useful, but they're no panacea. You could probably write a custom b-tree on top of a typedarray in javascript if you really want to - assuming your data also fits into typedarrays. But at that point you may as well just use wasm. It'll be way faster and more ergonomic.
To store a buffer in a plain text editor, with no syntax highlighting? I'm not going to fight you on that. I'm glad pure javascript is fast enough for your needs.
But doing collaborative text editing with a CRDT adds some extra requirements that your text editor's buffer might not have. For example:
- Hydrating the document state on load. To cut down on file size, diamond types files currently just store the entire editing history. When you open them up, we replay the whole editing history into a buffer. And using Wasm + jumprope[1], this is <1ms even for very large documents. The same would not be true in javascript, and I might need to use bigger files.
- Merging concurrent changes. (Like, merging a long lived branch). This currently involves getting fancy with a B-tree. This is fast enough in rust. It would be orders of magnitude slower in javascript.
- Undo support. When you hit undo, you need to ignore any more recent typing from other users and just undo the local user's last change. This is subtle.
- When edits happen, we need to convert between your (line, JS column) tuple and whatever the CRDT uses internally. Diamond types specifies edit positions by counting unicode characters from the start of the document. So we need to convert between (line,col) and unicode offsets whenever local or remote changes happen to broadcast or apply the changes.
Rust + Wasm let me make all of these operations essentially instant. I can do that because I can use an appropriate data structure & algorithm for each operation.
And, sure, Javascript's Array is pretty fast. But if you ever need something more complex than Array, your performance will suffer massively. Personally I feel much better writing this code in rust + wasm.
I'd integrate diamond types with your text editor by just passing edits back and forth across the module boundary. I agree that it probably doesn't make sense to share a text buffer between javascript and wasm.
[1] https://crates.io/crates/jumprope