

Build your own Google Docs with GoInstant's new OT API - jhchen
https://goinstant.com/blog/build-your-own-google-docs-with-goinstants-new-ot-api

======
adityab
This is nice!

WebODF[1] developer here. Commenting because the project is doing this sort of
thing and I have decent experience writing OT. I'll give a short explanation
of how Operation Transformation works.

"Build your own Google Docs" is a slightly brave claim - "build your own
Etherpad" would be more accurate.

This is because their operation set is very simple; for rich text documents
the operations simply do not contain enough information to transform them
against other ops. At the moment, their API includes 3 operations:

    
    
        * Insert  
        * Format  
        * Delete
    

What I don't see:

Something like MoveCursor, for a start. From the source code of the markdown
editor, it appears that cursor position synchronization is handled outside of
the OT realm, likely by some other signalling. This is bad because the cursor
positions do not get transformed at all - you can see this if you try to
collaboratively edit on the same line in the demo - your cursor jumps around
unwieldily.

Why this happens:

Imagine two users A and B concurrently editing inside a paragraph (a | denotes
a cursor, belonging to B):

    
    
      <p>abc|def</p>
    

Now suppose A inserts "123" at position 3 (after c), and B tries to move
his/her cursor to the right by 1 step. Assume that A's operation reaches the
server first. Then server's and A's state is:

    
    
      <p>abc123|def</p>
    

because B's cursor is pushed to the right by the preceding text.

Then B's operation "move cursor to position 4 (after the d) reaches the
server. But A's operation has not reached B yet. The state on B becomes:

    
    
      <p>abcd|ef</p>
    

And the state on the server becomes:

    
    
      <p>abc1|23def</p>
    

Now B's op is relayed to A, and A's to B, and A's state becomes:

    
    
      <p>abc1|23def</p>,
    

and B's state becomes:

    
    
      <p>abc123d|</p>.
    

because there is no OT for "MoveCursor" against "InsertText" \-- in fact
"MoveCursor" is not even an op here. This is an inconsistent state, which is
basically a broken editing session (should never happen). Fixing this up by
passing around diffs is not allowed (in a strict sense); diffing is not
controllable OT.

WebODF uses stateless client-side OT, so any server is just mostly a dumb
operation relayer. See what the OT logic for these two operations does:
[https://github.com/kogmbh/WebODF/blob/master/webodf/lib/ops/...](https://github.com/kogmbh/WebODF/blob/master/webodf/lib/ops/OperationTransformMatrix.js#L537)

If the "InsertText" operation's position was lesser than the concurrent
"MoveCursor" position, "MoveCursor" is _transformed_ to increment it's
position by the "InsertText" operation's text length. This ensures that the
cursor, while moving, will end up at the _intended_ position within the word
you were looking at. (you'll notice that "MoveCursor" has a 'length' parameter
in WebODF because a cursor also represents a user selection, yes it has
colored collaborative selections).

OT'd process:

    
    
      <p>abc<cursorB/>def</p>
    

A inserts "123", A's state is:

    
    
      <p>abc123<cursorB/>def</p>
    

B moves cursor to position 4, B's state is:

    
    
      <p>abcd<cursorB/>def</p>
    

A gets B's op, _transforms_ it to increment position by "123".length = 3,
state becomes:

    
    
      <p>abc123d<cursorB/>ef</p>
    

B gets A's op, applies it because moving a cursor doesn't affect text
insertion, state becomes:

    
    
      <p>abc123d<cursorB/>ef</p>
    

Hooray, Eventual Consistency!

Any OT implementation is full of such _opinionated_ enforcements and
tradeoffs. It should make some assumptions about the editing UX and make sure
that this is preserved when doing realtime concurrent editing; ignoring that
and _only_ going for eventual consistency will result in situations where the
user cannot look at where they are typing.

WebODF has many such operations (~23) for each kind of edit: "SplitParagraph",
"InsertText", "ApplyHyperlink", "ApplyDirectStyling", etc at last count. A
large number of them are supported in collaborative mode, i.e. they have OT
written. And this is no small task, because _each operation needs to have a
transform written against every other operation in the set of all operations,
including itself_. It does help that not all operations interfere with others,
so the OT matrix is somewhat sparse. :-)

Fact: Any complex rich text editing framework will need to have it's own
customized implementation of OT. We're dealing with the Open Document Format,
we have Operations that are significantly tailored to it.

I feel that GoInstant's API is a nice thing! But it needs: 1\. More
operations. 2\. Purely client-side OT (not strictly required, but very very
useful).

Shameless plug: WebODF is a FOSS project with ~3 fulltime developers
(including me) working on it. We are also working with the ODF change-tracking
subcommittee to standardise things. We need more contributors and/or
donations. :)

    
    
      [1] http://webodf.org/
    

EDIT 1: Sorry, this post became longer than I had wished. I hope it helps
anyone understand how OT works. Also, 'gratulations to GoInstant for launching
this. :)

EDIT 2: Reformatting post for readability.

~~~
josephg
(Ex-wave, sharejs.org author)

You should be aware that on the google wave team we eventually dumped the idea
of 'SplitParagraph' / 'JoinParagraph'. We spent a long time trying to make
them work, but eventually proved that it was impossible to make our OT system
work correctly on those operations. I suspect that you'll have the same
problem.

I would advise you to map your 23 operations down to a much smaller set if you
want to get OT working correctly. With wave, we ended up implementing
'SplitParagraph' by inserting an empty <newline /> tag. We could remove that
tag collaboratively (correctly), and all the OT semantics worked much better.

------
habosa
An Operational Transform API has always been my "maybe when I have more time
and knowledge" idea. I'm glad to see someone else doing it and I hope I have a
chance to use it. OT is the magic behind Google Docs, and it's one of those
really elegant protocols that most CS people instantly appreciate when it's
explained to them.

[http://en.wikipedia.org/wiki/Operational_transformation](http://en.wikipedia.org/wiki/Operational_transformation)

~~~
gritzko
Repeatedly rewriting operations in-flight is probably the worst trick one can
use to deal with partial order.

~~~
josephg
Care to justify that assertion? I think OT is a fantastic way to deal with
partial order.

OT can be used to guarantee eventual consistency while preserving intent. And
unlike transactions, it lets you edit your local model syncronously in the
client without waiting for a server round-trip. When you have a stack that can
do it, its a really great system.

~~~
gritzko
First of all, I know what I am talking about; I published on the subject, I
also created a working system (CRDT-based).

Second, in my understanding, transactions don't deal with partial order; they
do linearization. That indeed requires a round-trip.

Third, I also sketched some OT-based solutions, so I saw both sides of the
coin. According to my experience, OT is terribly complex and fragile. Basic
principles are simple, but various concurrency issues make it a nightmare
further down the road. Operations are rewritten in several places, all those
rewrites are interdependent, partial failures create really tricky situations,
etc.

Fourth, Google has a working OT stack, but they also have a working Paxos.
They simply have too many brainy folks and too much of equipment/money,
probably.

Fifth, all the problems OT solves by entangled op rewrite trickery can be
trivially solved by brute force (not that much of force actually, see CRDT).

Sixth, the very notion of "intent" in OT was a really really big misnomer.
"Intent" is OK for a legal term, but it is terrible for CS, as it has no
intuitive formal meaning (as opposed to "convergence", for example).

Seventh, OT has a strong "magic factor". It is too smart kind of, so it
strikes a spark of excitement in some folks. That is often a bad thing (see eg
"digital fountains").

~~~
gritzko
P.S. "implementing OT sucks" \-- Joseph Gentle

------
amix
I can recommend this article that explains OT really well:
[http://www.codecommit.com/blog/java/understanding-and-
applyi...](http://www.codecommit.com/blog/java/understanding-and-applying-
operational-transformation)

------
bsoft16384
Google Drive Realtime API developer here.
[https://developers.google.com/drive/realtime/](https://developers.google.com/drive/realtime/)

If you're trying to build your own Google Docs, you might want to consider
using the Drive Realtime API, which is based on the same technology as Google
Docs.

The Drive Realtime API has a wide range of mutations built in for common data
structures: \- Maps (last writer wins) \- Lists (insert, delete, overwrite) \-
Strings (insert, delete, overwrite)

We also support arbitrary data structures including trees and graphs.

For modeling cursors, we have "Index References" which are effectively
pointers to a specific element in a string or list. You can move them around
on the client side and we have all of the transformation logic worked out so
that concurrent edits don't make them point at the wrong position.

Because the data is stored in Google Drive, you can create, share and organize
files using the standard functions in the Google Drive API. There's also an
easy-to-use REST API for importing and exporting data as JSON.

------
amix
Are there any great open-source implementations of OT? Last time I looked
there were some, but the quality and popularity of them was questionable.

~~~
jimktrains2
[https://github.com/share/ShareJS](https://github.com/share/ShareJS)

That guy also has a C library too. IIRC he used to work on the wave team.

~~~
josephg
(Author here)

ShareJS also supports collaboratively editing arbitrary JSON documents.
Support for seeing everyone's cursors is being added at the moment - there's a
sandbox kicking around here if you want to play with the prototype version:
[http://sharejs.org:7007/](http://sharejs.org:7007/)

