Lessons learned from creating a real-time collaborative rich-text editor (ckeditor.com)
426 points by Reinmar on Oct 16, 2018 | 88 comments

> We were quite seriously scared of repeating the infamous history of Netscape which was a well-known example of failing to successfully release a newer version of a popular software after deciding to rewrite it from scratch. Fortunately it did not happen in our case.

Big kudos for pulling it off!

> Two main candidates are Operational Transformation (OT) and Conflict-Free Replicated Data Type (CRDT). We chose OT and perhaps one day we will write down our thoughts on the ongoing OT vs. CRDT battle.

Please do, I would love to hear your thoughts on OT vs CRDT!

Oh, that's really tempting :) Believe us! There are people in our team who would love to dive deeper into CRDT. We confirm that OT definitely has its problems. It's elegant at first but gets really dirty the deeper you dive into this... hole. The undo feature especially adds a ton of complexity to it.

So, some of my colleagues see CRDT as a potential saviour. There was even a paper on CRDT for tree structures that we might use (can't find a link now). However, we'd rather go with a linear model for CRDT. That'd make it a bit easier on the CRDT level but it would require some really smart decisions about: types of changes, the structure of the linear model and how it is transformed to the tree model (that we need after all). We know, though, that without an actual implementation, it's just a theoretical discussion. Therefore, we started thinking about whether we could add CRDT "below" our current tree model. That might allow validating the concept in real life. Can't wait to do that :)

BTW, in the meantime, we recommend this paper: https://arxiv.org/abs/1810.02137

Here's the link to the paper discussing tree-based CRDT: https://arxiv.org/abs/1201.1784. I personally find those papers very interesting!

For anyone interested in CRDT, I can recommend the deep-dive writeup to Raph Levien's Xi editor:


Ditto. Would love to hear more about the OT vs CRDT decision!

You both may enjoy http://archagon.net/blog/2018/03/24/data-laced-with-history/ if you haven't seen it already, it's an absolute gem, and touches on exactly this.

Thank you for the suggestion, that was a very enjoyable read indeed :)

Thanks to the author as well, for distilling so much information into the very readable document that this was.

One thing I am wondering about is in the conclusion of the document, the author says:

> You’d be hard-pressed to use CRDTs in data-heavy scenarios such as screen sharing or video editing.

With regards to video editing, could it not be rather doable after all?

Consider a non-linear video editing system (NLVE) [1] like Adobe Premiere Pro [2].

While the size of the video data source material you are editing might range into probably several hundreds of gigabytes for a 4K feature length film, the project file itself remains very small, because the video data is kept separate from the timeline data.

Wikipedia explains:

> Non-destructive editing is a form of audio, video, or image editing in which the original content is not modified in the course of editing; instead the edits are specified and modified by specialized software. A pointer-based playlist, effectively an edit decision list (EDL), for video or a directed acyclic graph for still images is used to keep track of edits.

Are these not in fact the sorts of data structures that would lend themselves the most naturally to being implemented as CRDTs? :D

As for the video source material that is used in the project, those files can be distributed by users outside of the editor software using for example BitTorrent, or they could be automatically distributed by the editor software via for example public IPFS or a private Dropbox folder, or if the users that are working together are on a shared network then the source files could be located on a shared NAS.

[1]: https://en.wikipedia.org/wiki/Non-linear_editing_system

[2]: https://en.wikipedia.org/wiki/Adobe_Premiere_Pro
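To make the edit-decision-list idea a bit more concrete, here is a minimal sketch of a timeline modelled as a last-writer-wins map, one of the simplest CRDTs. Everything in it (`Clip`, `Timeline`, the field names, the `(clock, replica)` stamp) is a hypothetical illustration, not how Premiere or any real editor stores projects. Only the small timeline metadata is replicated; the media files are referenced by name.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Stamp = Tuple[int, str]  # (logical clock, replica id), compared lexicographically


@dataclass(frozen=True)
class Clip:
    source: str       # reference to a media file; the media itself is not replicated
    in_point: float   # seconds into the source where the clip starts
    out_point: float  # seconds into the source where the clip ends
    position: float   # where the clip sits on the timeline, in seconds


class Timeline:
    """Last-writer-wins map from clip id to Clip (None marks a deleted clip)."""

    def __init__(self, replica: str) -> None:
        self.replica = replica
        self.clock = 0
        self.entries: Dict[str, Tuple[Stamp, Optional[Clip]]] = {}

    def put(self, clip_id: str, clip: Optional[Clip]) -> None:
        self.clock += 1
        self.entries[clip_id] = ((self.clock, self.replica), clip)

    def remove(self, clip_id: str) -> None:
        self.put(clip_id, None)  # keep a tombstone so the delete can win merges

    def merge(self, other: "Timeline") -> None:
        # Keep the entry with the larger stamp; this is commutative and idempotent.
        for clip_id, (stamp, clip) in other.entries.items():
            mine = self.entries.get(clip_id)
            if mine is None or stamp > mine[0]:
                self.entries[clip_id] = (stamp, clip)
        self.clock = max(self.clock, other.clock)

    def clips(self) -> Dict[str, Clip]:
        return {k: c for k, (_, c) in self.entries.items() if c is not None}


alice, bob = Timeline("alice"), Timeline("bob")
alice.put("intro", Clip("intro.mov", 0.0, 5.0, 0.0))
bob.merge(alice)
alice.put("intro", Clip("intro.mov", 1.0, 5.0, 0.0))  # Alice trims the intro
bob.put("outro", Clip("outro.mov", 0.0, 3.0, 5.0))    # Bob adds an outro concurrently
alice.merge(bob)
bob.merge(alice)
```

Whole-clip last-writer-wins is a blunt instrument (two people trimming the same clip means one edit is dropped), so a real system would probably want finer-grained registers per field, but the metadata stays tiny either way.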

I ditto your ditto. (Is this now a tritto?)

  «Ditto is a library for using CRDTs.» [0]
[0] https://github.com/alex-shapiro/ditto

As a gentle introduction to OT, one of my colleagues gave a RailsConf 2018 presentation on implementing collaborative editing. His blog post is here: https://blog.aha.io/text-editor/ and it includes a simple implementation in Ruby.

> The number of tests: 12500

> Code coverage: 100%

> Development team: 25+

> Estimated number of man-days: 42 man-years (until September 2018), including time spent on writing tools to support the project like mgit and Umberto (the documentation generator used to build the project documentation)

Software is expensive

I showed the demo to a passing PM and asked him how long he would expect a team of five developers to spend on it and he said "a couple of weeks... no wait, maybe two months." :)

Show these stats the next time a junior dev says that she/he will develop their own WYSIWYG editor in like 1-2 weeks for the project's needs ;)

`import editor from CKEditor`

or something like that ;-)

That's an interesting opinion that differs from mine. My impression of all software costs, not just this one, is, "Wow, software is just so damn cheap!"

Compared to any physical project or product, software is so cheap!

For anyone interested in this issue I would suggest taking a look at Neil Fraser's Differential Synchronization.[0] It took me 2 weeks as a solo developer to add real-time collaborative editing to our product using that method.

[0]: https://neil.fraser.name/writing/sync/

Differential synchronisation works very well indeed for plain text, but as Neil himself has pointed out, it doesn't necessarily work well for tree-structured data (like XML or JSON) because there's no guarantee that the combination of separate syntactic-validity-preserving edits will necessarily be syntactically valid.
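A small sketch of that failure mode, assuming a deliberately naive character-level merge (this is not Neil Fraser's algorithm, and the helper names are made up for the example). Each user's edit on its own leaves the XML well-formed, but applying both sets of text insertions to the same base does not:

```python
import xml.etree.ElementTree as ET


def apply_inserts(text, inserts):
    """Apply (offset, string) insertions, rightmost first so offsets stay valid."""
    for offset, s in sorted(inserts, key=lambda item: item[0], reverse=True):
        text = text[:offset] + s + text[offset:]
    return text


def is_well_formed(xml):
    try:
        ET.fromstring(xml)
        return True
    except ET.ParseError:
        return False


base = "<doc><p>hello world</p></doc>"
start = base.index("hello")
middle = start + len("hello")
end = base.index("</p>")

bold_all = [(start, "<b>"), (end, "</b>")]  # user A: make the whole paragraph bold
split_para = [(middle, "</p><p>")]          # user B: split the paragraph in two

only_a = apply_inserts(base, bold_all)
only_b = apply_inserts(base, split_para)
merged = apply_inserts(base, bold_all + split_para)

print(is_well_formed(only_a), is_well_formed(only_b), is_well_formed(merged))
```

The merged string ends up with a `<b>` that opens in one paragraph and closes in the next, which is the kind of result a tree-aware model has to rule out.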

Using diffs was our very first approach. It was quite a naive (not to say dumb) solution, and we also came to the conclusion that it would be problematic for tree-structured data. There were also performance issues (for example when you bolded the whole document).

So instead of trying to fix those issues, we tried a different approach.

The solution to this problem is to add a translation layer from the tree structure to plain text. If you can do that you can work on any arbitrary data structure easily. I'll leave it up to the reader how to figure out how to do it ;)

We've been there (although in the case of OT). When implementing CKEditor 5, we started with a linear model and just a couple of types of operations (the usual set). At that time, we weren't able to transform that linear model to a tree model because conflict resolution based on that simple set of operations was too dumb. As I mentioned in the other comment about CRDT vs OT, we'd still consider having a linear + tree model duet, but so far it's a purely theoretical discussion whether it has a chance to work.

I have proof it works, but there’s no room?

Doesn't implementing diff and fuzzy patch operations that work on your data structure (not plain text) solve that? It worked fine in my specific case.

Nice! It's my understanding that the default method used by Neil requires a fixed polling interval, and thus updates are limited to every 100ms or so. Did your implementation get around that?

yes, we got around it

Nice article. This is a great example of how Domain-Driven Design pays off. By really paying attention to how users think of the problem and expressing that in their design, they ended up with a richer domain model which in turn let them get better code and a better user experience.

Thanks! A good UX was indeed one of our main goals and we weren't sure whether we were overdoing this (again, the Netscape history ;|). But we kept in mind that simpler solutions already exist and it doesn't make sense to just copy them. That'd make the job easier, but we'd end up with the same issues that we identified in the existing platforms (annoying conflict resolution, stability issues, platform limitations to linear content – no tables, no nested blocks). So we risked taking the long path. The UX served as a compass and allowed making some reasonable shortcuts in some non-critical places. If we had tried to solve every possible issue that we hit, we'd still be at the starting point :D

It says on the pricing page that CKEditor 5 has "GPL 2+ copyleft Open Source" license. Can someone ELI5 that?


I guess you have to make your whole frontend open source if you use their editor under that license? I'm not sure how GPL interacts with scripting languages. Do you have to run GPL scripts on a GPL'd interpreter for it to be permissible?

Regardless, the pricing for their commercial offering seems really intense. $29/mo for 25 MAU? I feel like they're missing a few zeroes there. Maybe they meant 2,500 MAU? 25,000 MAU? The demo I played with makes it clear that the editor component is rather nice, but I would rather look at alternatives at that price. I would imagine the collaborative nature is what influences the price, but somehow Google Docs and MS Office do fine with collaboration as a core product feature without charging >$1/MAU just for that feature.

I suspect the difference is that Google Docs and MS Office don't integrate seamlessly into your front-end product and provide your users with a collaborative editing experience. They would have to know they're using Docs or Office (and, if necessary, acquire their own account and maintain their own license in good standing, whatever that entails). Either way, the users' data is ultimately, to whatever extent, locked up in that solution, and not in your product.

It makes sense to me, that you can use their flagship editor product and advertise for CKeditor, as long as you're publishing your work under the same copyleft license so they don't need to feel like you've taken advantage of them for a commercial endeavor.

Under a GPL license, nothing is preventing you from charging your users, but also it is probably generally expected that nothing is preventing the users from borrowing your source code, then standing up their own instance of the tool and maintaining it in-house, either.

In this way, it makes sense. You can either pay CKeditor for their work when you integrate it ... or, for no money, you can commit to not use the editor to hold your users and their data hostage.

(I'm over-dramatizing a bit, but also not. People regularly talk about freeing themselves from their Google Mail hostage situations, especially on HN.)

Maybe $29/mo is a lot for 25 monthly active users when you're operating a business at web scale, but it's really not a lot of money for a company with 25 employees who has maybe built their own bespoke in-house collaborative solution, and managed to do it using CKeditor instead of rolling their own editor or cobbling one together from other free components. I don't know if that's the target market, but that's certainly the base case, if your start-up company is "dogfooding" its own solutions.

If your users are not worth $1/mo + your other overhead costs, then maybe they don't need to get real-time collaborative editing? (Or maybe it doesn't need to come from your solution...)

Yes, that's what I was thinking too. It doesn't make much sense to couple your product at those prices.

CKEditor 5 is cool and all, but $1 USD per month per user is in our case way more expensive than our cloud infrastructure.

First of all, thank you so much for the feedback. The 25 end user plan is a specific offering for small projects (no zeros missing :)). Please don’t forget that the fee also covers the support allowance, enterprise warranties, indemnifications, etc.

In our flexible plan, we always try to find a working licensing scenario with our customers, with volume discounts, especially for bigger projects. Also, note that MAU count is not the only licensing metric that’s offered, and we are happy to simply sit down with you, discuss your project and reach an agreement.

> I'm not sure how GPL interacts with scripting languages. Do you have to run GPL scripts on a GPL'd interpreter for it to be permissible?

Somewhat related (Common Lisp is typically compiled) I found the "Lisp LGPL" / LLGPL license to be a clear way of resolving the inherent C-bias of the GPL license suite. It's essentially just the LGPL but with this preamble that defines valid/invalid forms of proprietary software "linking" with an LLGPL library: http://opensource.franz.com/preamble.html Of note is a fun instance of the open/closed principle -- your subclasses of a library's class are yours but modifications to the base class trigger the LGPL licensing hook for your application.

I would assume that if you're serving Ckeditor in your webapp, the GPL would apply and your whole webapp needs to be GPL'd.

Given they had other license options for version 4 (https://ckeditor.com/legal/ckeditor-oss-license/), I suspect they might just use the GPL only option for 5 to try and drive sales. Good luck to them, lots of companies are using Quill (BSD) instead.

> the pricing for their commercial offering seems really intense. $29/mo for 25 MAU?

The page doesn't seem to imply that the price for 25+ users would scale linearly; for that they have a "Contact Us" call to action for negotiating pricing.

25 MAU seems reasonable to me for the following use-case: Company A sells a custom CMS to Company B, with a CKEditor-powered admin area to be used by <25 of Company B's content writing staff.

>The page doesn't seem to imply that the price for 25+ users would scale linearly; for that they have a "Contact Us" call to action for negotiating pricing.

I confirm that! It is just a specific offering for a small project where a team of 25 people uses our editor. It's a fair price (covers granting the commercial license, the right to use / change / hide our code, enterprise support, maintenance, warranties, indemnifications, etc). For bigger projects we always try to implement a licensing scenario that works well for a given customer. Contact us and we will do our best to craft you an optimal plan.

It might be worthwhile putting some bracketing or aspirational pricing up there with it too, then, as you’re potentially scaring away commercial clients who look at that price and can’t see how they could justify its inclusion at a price they would have to guess at using that as a base.

Thanks, I can Google too, but I was asking for a simple explanation of the license.

Do you mean you aren't sure of GPL2 (the plus just means they reserve the right to change any contributed code to GPL3 or above without getting a new license agreement from the contributor) or that you aren't sure how that interacts with a web app?

In this case, essentially all your front end code would need to be GPL2+ as well (GNU considers dynamic linking to count for GPL purposes, and I have a hard time imagining you could sufficiently separate the CKE code from the rest of your code/site to satisfy them). If CKE requires any server side code, your back end code technically would need to be GPLed as well, but you'd never have to share it (back end code is generally considered to not be distributed, hence the AGPL), so it would have no real effect and no one would know if you didn't.

GPL 2+ just means you can redistribute the code under any version of the GPL from 2 or following. Your choice. At the moment there are only GPL 2 and GPL 3 but if there's ever a GPL 4 etc. those would be acceptable too.

Practically speaking, it also means that if you want to contribute code back to this code base, you would have to license your code in the same way, allowing redistribution under all GPL versions 2 and up.

Nice! I too, would love to read more on the OT/CRDT trade-off.

Companion reading (the approach taken in ProseMirror, which was implemented in significantly fewer person-years): http://marijnhaverbeke.nl/blog/collaborative-editing.html

Do you ever garbage collect the graveyard? I wrote a tree based OT (slightly different domain) and never found a great solution for that.

Interesting question! Actually, at the moment we don’t :).

We did some testing to see how big this issue is and it turned out that it's not dramatic. The main difficulty with graveyard purging is that you need to keep the undo in mind. Elements in the graveyard are used for undo, so garbage collecting will have to be strictly connected with limiting undo steps.

Add to that collaborative changes and it becomes a tougher problem. Therefore, we'd probably start from what seems to be the safest option – trimming the history stack and implementing a garbage collector that goes through all the possible reference points.

Yup. Ran into the same thing.

Another Q for you if you're still around: did you have to make any changes to your algorithm for when it's running through the undo queue, or is it basically the same thing but in reverse? I found some operations could better preserve intention with some adjustments to the undo transform

Oh, don't get me started on modifications we needed to do to make undo look reasonable ... :) At first, we thought that undo is the same thing as collaboration but... nope.

The crucial difference is that with undo you have a context (old document state). With collaboration, you don't. So if both users put a character at the same position, in collaboration the order of output doesn't matter (and can't be solved really). In undo it does, because a user remembers what was the order of the letters before they made a change.
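For readers who haven't seen it, the "both users put a character at the same position" case looks roughly like this in a textbook insert-vs-insert transformation. This is a minimal sketch on a flat string with made-up names (`Insert`, `transform`, a `site` id used as tie-breaker); it is not CKEditor 5's algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str
    site: str  # id of the client that created the operation


def transform(op: Insert, against: Insert) -> Insert:
    """Rewrite `op` so it can be applied after the concurrent `against`."""
    shift = against.pos < op.pos or (
        against.pos == op.pos and against.site < op.site  # arbitrary tie-break
    )
    if shift:
        return Insert(op.pos + len(against.text), op.text, op.site)
    return op


def apply(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


doc = "ac"
a = Insert(1, "X", "alice")
b = Insert(1, "Y", "bob")

# Each client applies its own operation first, then the transformed remote one.
at_alice = apply(apply(doc, a), transform(b, a))
at_bob = apply(apply(doc, b), transform(a, b))
```

Both clients converge, but only because the tie-break picks some order; "XY" is no more correct than "YX". With undo there is an old document state that says which order the user actually saw, so an arbitrary tie-break is no longer good enough.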

We should compare notes some time :) Sounds like we had to solve a lot of the same issues and the dearth of info on this topic is surprising!

Is there a sharedb compatible OT type definition? Like how https://github.com/ottypes/rich-text exists for Quill? https://quilljs.com/docs/delta/

Also is it possible for third party plugins to add new operations (defining their own conflict resolution against the existing types)?

Our first idea was to have four basic operations (insert, move, attribute, remove) and then have “deltas” (wrap delta, unwrap delta, merge delta, split delta) which were built using those operations and for which we described additional transformation cases. In this approach operation transformations were set in stone, but you could add your own deltas and provide your own transformations for them.

In the end, we concluded that users would see defining deltas and transformations as too complicated and wouldn't use it. Later, we also decided to drop the deltas idea altogether and we rewrote deltas into operations.

As an alternative solution, we provide the post-fixing mechanism. Your plugin can listen to all the changes done on the model and apply fixes if something has gone wrong. We used it for the famous "insert table row / insert table column" problem, to add the missing cell. This alternative is not as clean as explicitly providing a transformation algorithm but it is much easier to implement.
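A rough illustration of the post-fixing idea for the table case, using a toy table model (a list of rows with ids) rather than the real CKEditor 5 model or API. All function names here are hypothetical. The column insertion only knows about the rows that existed when it was created, and the row insertion only knows the old width, so whichever order they arrive in, the new row ends up one cell short and the fixer pads it.

```python
def insert_row(table, at, row_id, width):
    """Insert a new row with as many cells as the author saw at the time."""
    table.insert(at, {"id": row_id, "cells": [""] * width})


def insert_column(table, at, known_row_ids):
    """Insert a cell into every row the author knew about."""
    for row in table:
        if row["id"] in known_row_ids:
            row["cells"].insert(at, "")


def post_fix_table(table):
    """Pad short rows so the table is rectangular; return True if anything changed."""
    width = max((len(row["cells"]) for row in table), default=0)
    changed = False
    for row in table:
        missing = width - len(row["cells"])
        if missing > 0:
            row["cells"].extend([""] * missing)  # simplification: pad at the end
            changed = True
    return changed


def new_table():
    return [{"id": "r1", "cells": ["a", "b"]}, {"id": "r2", "cells": ["c", "d"]}]


# Site A applies its own row insertion first, site B its own column insertion first.
site_a, site_b = new_table(), new_table()
insert_row(site_a, 1, "r3", 2)
insert_column(site_a, 1, {"r1", "r2"})
insert_column(site_b, 1, {"r1", "r2"})
insert_row(site_b, 1, "r3", 2)

fixed_a = post_fix_table(site_a)
fixed_b = post_fix_table(site_b)
```

The sketch pads at the end of the row, which is cruder than putting the cell in the right column, but it shows the trade-off described above: no dedicated transformation for "row vs column", just a cleanup pass that every client runs after the changes land.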

The problem with defining your own operations is that you need to write transformations against all the other operations, which is a huuuuge amount of work. It took us a few years and we still see some room for improvement.

Amazingly good illustrations. Whoever did that should illustrate all the books.

While I generally agree they are nice, one nit is that some of the fonts are unnecessarily small and/or faded. It looks to me they could be made more visible without sacrificing the visual utility of the diagrams. Roughly 1/3 of the population would have age-related vision difficulties with those kinds of fonts. Every graphic designer should take courses in accessibility.

Been searching for a bit and it appears that the real-time collaboration part isn't open source and requires you to use their cloud service to make it work. The pricing for that isn't available on their website and instead requires you to contact them.

I hope I'm wrong since I have some ideas I'd like to try it out with. Sending my customer's data to their cloud is probably a non-starter.

Recently we realised that our public communication about collaboration is quite confusing ;( We mention "Cloud Services" everywhere, but the reality is that you can have an on-premise installation of our collaboration backend that you can host/control by yourself. The SaaS offer came first, which is why the current website looks the way it does. The on-premise offer is currently in a closed private-beta state, so if there is anyone willing to give it a try, we're open. We'll make it fully public in a month or two, after we polish the documentation and so on.

Is there any difference in editing a structured document which is intended to ultimately be text for presentation and reading, versus a structured document which is describing requirements for code or a storyboard for a design? Do these different end-results change the collaborative environment?

I would think yes. Structured editing is tough because of the interaction possibilities. Think of it in simplistic terms "add words to paragraph" at the same time as "remove paragraph." What do you do?

With text presentation and reading, you are ultimately limiting the number of things that can be seen that way. Though, you can of course expand your entity list to include more and more things. Headers, lists, sections, chapters, paragraphs, etc.

That is to say, I think this is already difficult for structured text. Adding more structure, in the forms of "requirements section" and such just compounds on that complication. I'm not sure it adds more.

What is the main use-case for this? I'm looking at it from a coder's perspective. When I'm writing some code I wouldn't want somebody else inserting characters into my code while I am editing it.

But I can see that if the app is divided into a multiple code-files then different coders can be working on different files at the same time. Still I prefer some form of code ownership or at least locking of multiple files just for me while I'm working on them.

I don't quite see how it would be much different for other kinds of text. While I'm writing this post right now I wouldn't want somebody else modifying its beginning while I'm writing its end.

Not saying there aren't any good use-cases, just can't think of any right now. So that's my question, what are they?

The typical way I've seen this used is almost like pair programming. One user will write a draft of something, or add an image or chart while on a conference call with the other editor(s). Honestly, I see it a lot more with spreadsheets than with rich text, but the use case is there. It was one of the features that put Google docs on the map back in the day.

I can imagine real-time collaboration is an immensely interesting problem to work on.

Also wonderful blog post, actually read it in full, thanks!

It truly is the most interesting subject I've been working on yet. The fascinating thing about it is that it isn't a "solved" issue with one, correct solution, so you can feel like a 19th-century scientist at times :).

If I were Google I'd buy up CKEditor to help compete with Microsoft Office.

I do not think they are interested. Google Docs/Sheets gets features at a very slow pace -- if Google wanted to compete with Office they could literally pour hundreds of millions and a legion of developers into it. As the only monetization I am aware of is Google Suite perhaps that's not enough to justify it. Unless you want to see ads in Docs / Sheets... also, Google Wave has been cancelled.

Re: Unless you want to see ads in Docs / Sheets

The more you use a product, the more you visit related sites and materials on the web; so yes, ads.


Recently saw a fantastic intermediate talk on this subject called CRDTs and the Quest for Distributed Consistency[0]. One product of the research was automerge[1], a library for handling collaborative editing of JSON-like data models

[0]: https://www.youtube.com/watch?v=B5NULPSiOGw

[1]: https://github.com/automerge/automerge

This is a great blog post.

It was interesting to see how they approached moving from a linear data model to a tree structured data model. Is anyone aware of similar work with graph structured data, as you find in electronic design automation? Think collaborative schematic editing.

Nice writeup! A tiny nitpick, shareJS (DerbyJS) had/have an object based OT implementation (ie. tree) since before this was started I think (or around the same time).

As far as I remember shareJS was available when we started to work on tree-based OT. We researched it but it turned out to be too simple for what we wanted to achieve and there was no special handling for some edge cases that happen during collaborative editing.

We achieved something similar to shareJS functionality quite early, to be honest. What proved to be extremely difficult was all the extras to smooth out the user experience. We would have had to (heavily) extend shareJS anyway, so it made more sense to go with our own solution, crafted for our needs.

> We started building our next generation rich-text editor with the assumption that real-time collaborative editing must be the core feature that lies at its very foundation

Am I the only one who doesn't like other people typing through my work? Why not simply let the user choose when to merge their work with others?

Honestly, sometimes I feel collaborative text editors have been created "just because we can" or "just because we want to see if we can".


Certainly, there are different use cases and different expectations. What we call "offline collaboration" can be seen as a special case of the real-time collaboration. In "offline collaboration" you also need to merge changes so you also need the conflict resolution. However, the longer you postpone merging, the better the conflict resolution algorithms need to be because the more conflicts you have. That's one of the reasons why we introduced new, semantic operations.

The other option is so-called change tracking or suggestion mode. That's on our roadmap as well and thanks to the platform we created it's not a big deal now. In fact, if we started with implementing suggestion mode (or "offline collaboration"), we might have ended up with an engine which is not ready for real-time collaboration. But once we have real-time collaboration we can now be quite certain about solving other issues.

Sometimes you cannot avoid postponing merging, e.g. when the user loses connectivity. It'd be interesting to read more on how you handle longer periods of being offline, and how conflict resolution is done in that case.

if jira and confluence allowed this my company would benefit from probably a double digit efficiency boost.

we have to split our "collaborating" to google docs, and our "carving in stone" in JIRA and Confluence.

I'll say though that the "suggested edit" and "comment on a range" features, and the distinct edit modes (commenting, suggesting and approving edits, and free-for-all battle royale), are all valid modes for us all the time. (300 person SF+intl. software company)

Doesn't Confluence have collaborative editing? https://confluence.atlassian.com/doc/collaborative-editing-8...

Jira and Confluence would benefit from a better editor plugin even if it didn't have real-time collaboration...

Aka such a shame Google Wave was cancelled.

How do you keep track of the nodes in the "graveyard"? Does that mean that each node has to have some sort of UUID across clients?

Graveyard is just another root, like the main one. So, the same way that all elements have the same position in the main root for all clients, elements in graveyard also have the same position for all clients. Thus, their path is kind of UUID for them.
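A toy version of that idea, assuming a flat list of nodes per root instead of a real tree (the `Move` and `Document` names are invented for the example). Removing a node is just a move into the graveyard root, so the node stays addressable by (root, index) and undo is the inverse move:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Move:
    src_root: str
    src_index: int
    dst_root: str
    dst_index: int

    def inverse(self) -> "Move":
        return Move(self.dst_root, self.dst_index, self.src_root, self.src_index)


class Document:
    def __init__(self, nodes):
        # Two roots: the visible content and the graveyard for removed nodes.
        self.roots = {"main": list(nodes), "graveyard": []}

    def apply(self, move: Move) -> None:
        node = self.roots[move.src_root].pop(move.src_index)
        self.roots[move.dst_root].insert(move.dst_index, node)


def remove(index: int) -> Move:
    """A removal is a move from the main root to the start of the graveyard."""
    return Move("main", index, "graveyard", 0)


doc = Document(["a", "b", "c"])
op = remove(1)
doc.apply(op)
after_remove = {name: list(nodes) for name, nodes in doc.roots.items()}
doc.apply(op.inverse())  # undo: move the node back from the graveyard
after_undo = {name: list(nodes) for name, nodes in doc.roots.items()}
```

As long as every client applies the same moves in the same (transformed) order, the graveyard contents line up everywhere, which is why a path can stand in for an id.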

The link at the bottom which said that you could check out a demo was confusing because there was no demo there.

Demo can be found at https://ckeditor.com/docs/ckeditor5/latest/features/collabor...

It seems that the link is fine. Scroll a little down and you can test the collaboration in Letters (built using CKE5) or with CKE5 document build (switch the tab).

I should note that I am on mobile. On mobile I don’t see a demo in the link from the OP whereas the demo in the link I posted above is available on mobile.

Yes, that's correct. The demo is intentionally hidden on mobile because both examples on ckeditor.com are currently desktop-like. We'll soon be working on a solution that provides a much better UX on mobile, and once we have it, we'll show a dedicated demo for mobile devices.

Nice write-up. I've seen[0] the blood, sweat and tears that went into this project and I have to say it has come a long way.

BTW any timeline on sunsetting the venerable CKEditor 4?

[0] I was part of the CKEditor 4 team back when the CKE5 guys were laying down some of the first iterations of what is described in this blog post.

Hey Tade0! As much as we love CKEditor 5 that we created, we also love our users who are still widely using CKEditor 4. Some of them invested many, many months to create custom plugins, adapt CKEditor to their systems and so on. So, being a responsible company, we will maintain this product still for several years. As a company that creates components which are embedded into other systems, we simply have to deliver stuff that others, including big enterprise customers, can rely on. I know it's a bit unusual in the JavaScript world to maintain software for 8(!) or even more years, but this is what we do. Yes, we're a bit crazy.

This is a disguised ad for their saas offer. Seems interesting but no public pricing. Meaning it's probably very expensive and targeting big customers.

It is clear that Google Docs is using OT. Curious to know what Office365 is using for its real-time collaboration.

This is pretty interesting, would like to apply it to other kinds of editor.

These abstractions should be in the OS really.

I don't understand why this is so hard. Take a document and enter each character into a database as a separate entry, then lock each character only while it's being changed and unlock it immediately afterwards. The likelihood that all individuals will be modifying the same character at the same time is low, and while it's locked, the person trying to change it will see it's not changing, so they'll try again in a second or two, by which time it will be unlocked.

Among the many other problems with your proposal, you are assuming an editor that can only insert and delete characters, so no selection, copy, paste, search and replace, and any other features that involve modifying more than just one character in a document at a time, since you're critically depending not just on the fact that all edits can theoretically be represented by single character changes but that the rate of such changes must necessarily be very, very low, bounded by human typing speed.

"OK, so to fix that, I'll just..."

That'll break in some other case. If you then work on fixing that up, in another 20 or 30 steps, you end up at these algorithms. Or something far worse, which is actually pretty likely if you try to bodge something together one hyper-local problem at a time.

No, op is assuming that an editor can only _replace_ characters.

Ok, so what happens when you insert a character?

That sounds like a horrible UX
