Hacker News new | past | comments | ask | show | jobs | submit login

Hi! Joseph Gentle here, author of ShareJS. You quoted me in your article.

You're right about OT - it gets crazy complicated if you implement it in a distributed fashion. But implementing it in a centralized fashion is actually not so bad. Its the perfect choice for google docs.

Here is my implementation of OT for plain text: https://github.com/ottypes/text Note that its only 400 lines of javascript, with liberal comments. To actually use OT code like that, you need to do a little bookkeeping. Its nowhere near as bad as you suggest.

In this tiny source-only demo I do all the bookkeeping, and implement a working network protocol on websockets. The result? I can sync an object between the server & client all in about 150 lines of code total: https://github.com/josephg/appstate

This code has no external dependancies except for the ot type itself (ottypes/json0 in this case - although it could be changed to text or rich text or something with barely any code change).

Nice work making a cool demo - but I think you're using the wrong algorithm ;)




Yes, but that implementation deals only with plain text. The complexity seems to ramp up pretty quickly as you support more types of operations, and since extendability is an important concern for my project, I decided to avoid OT.


If you have insert, update, and delete then you can build any other operation from those primitives as long as you have the ability to batch a composite operation. So there's no need to extend the OT algorithm to support other kinds of primitive operation. For nested structures, addressing via ids or linear addresses has to be taken into account but that doesn't affect the OT transforms, it's one layer above. So it's possible (though not easy) to have an extensible OT system without exponential complexity.

Still, avoiding OT has led to some interesting new ideas and I look forward to seeing how things progress.


Does Sharejs currently support distributed synchronization?

My appreciation of OT (having tried to implement it myself) is that proving that the code preserves the transform constraint is very hard (and the proof one makes is easily wrong).

I believe that the OT proof is equally hard for centralized and distributed versions of OT (I was only ever focused on centralized).

Proving that operations commute is way easier. Defining a global total order for operations is way easier.

Working with an OT base seems more difficult as a result; adding offline or new operations requires more proofs and gets harder.

I too turned towards simpler synchronization algorithms: https://github.com/espadrine/canop/blob/master/doc/paper.md


ProseMirror doesn't support distributed synchronisation so why do you ask?

Implementing OT consists of copying algorithms from academic papers, there's no need to prove anything yourself. Commutativity and global ordering are prerequisites to any OT proof.

Offline support falls naturally out of OT, it's effectively free. New operations can be defined in terms of the primitives of insert, update, and delete, given the ability create a batch operations.

Distributed OT, where there is no central server is seriously tricky and algorithms with proofs of correctness can be hard to come by. But centralised OT is readily implementable, albeit with considerable background reading of the necessary papers.

Personally I think it's good to see approaches other than OT being explored. There's space for more than one solution.


> ProseMirror doesn't support distributed synchronisation so why do you ask?

I wondered whether his quote from the article related to the implementation of distributed OT in ShareJS, or just centralized OT. I always assumed he had said that of centralized OT.

> Implementing OT consists of copying algorithms from academic papers

If you extend them, you need to extend the proof as well.

> New operations can be defined in terms of the primitives of insert, update, and delete

Similar to vector spaces, adding a vector to the base that is orthogonal to all others can give you a much larger space. While in OT, every new primitive requires an exponentially growing number of proofs, you can avoid that with a simpler algorithm.

Sure, OT has text figured out (although it took some time to fix the details). However, I am personally interested in more complex data structures such as matrices (bitmap) and trees (vector graphics, rich text), where you want to maintain the meaning of, say, the rotation of a layer. Even with sound, which can be kind of seen like a list, using the same primitives as with text would yield strange results.

It is harder to tweak an OT algorithm to go closer to the intention of the user (which I do heavily in Canop for text) as it is completely independent from the proof that is needed to maintain convergence.

To me, adding collaboration to an operation-based editor should be as simple as adding an undo/redo system.

I absolutely think that OT was a phenomenal achievement, but in a couple of years, I hope it won't be the go-to choice.


> in OT, every new primitive requires an exponentially growing number of proofs, you can avoid that with a simpler algorithm.

Its not actually necessary to add any new primitives though. Most of the time, composite operations can represent higher-level operations. I'll expand on this below.

> I am personally interested in more complex data structures such as matrices and trees

Trees can be modelled in OT using only the primitives of insert, update, and delete, providing that you support linear addresses (i.e. paths). We know from Lisp that everything else can be modelled from trees, so you effectively get a general-purpose system from just three primitives. CoPowerPoint was implemented using this approach and allows arbitrary graphical editing.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: