Word processors have extremely specific requirements for layout, rendering, and incremental updates. I'll name just two examples. First, to highlight a text selection in mixed left-to-right / right-to-left text, it's necessary to obtain extremely specific information regarding text layout; information that the DOM may not be set up to provide. Second, to smoothly update as the user is typing text, it's often desirable to "cheat" the reflow process and focus on updating just the line of text containing the insertion point. (Obviously browser engines support text selections, but they probably don't expose the underlying primitives the way a word processor would need. Similarly, they support incremental layout + rendering, but probably not specifically optimized in the precise way a word processor would need.)
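To make the second example concrete, here's a toy sketch (my illustration, not any real engine's code; `measure` stands in for whatever text-measurement primitive is available) of re-laying-out only the line containing the caret, and only shifting the lines below it when that line's height actually changes:

```javascript
// Toy sketch of the "cheat" described above: after an edit, re-measure
// only the line holding the insertion point. If its height is unchanged,
// nothing else on the page moves and only that line needs repainting.
function incrementalReflow(lines, dirtyIndex, measure) {
  const line = lines[dirtyIndex];
  const newHeight = measure(line.text);
  if (newHeight === line.height) {
    return [dirtyIndex]; // fast path: repaint just this line
  }
  line.height = newHeight;
  // Height changed: shift every following line down. A real engine
  // would also re-check wrapping, pagination, floats, etc.
  let y = dirtyIndex === 0
    ? 0
    : lines[dirtyIndex - 1].y + lines[dirtyIndex - 1].height;
  const dirty = [];
  for (let i = dirtyIndex; i < lines.length; i++) {
    lines[i].y = y;
    y += lines[i].height;
    dirty.push(i);
  }
  return dirty;
}
```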
Modern browser engines are amazing feats of engineering, but the feature set they provide, while enormous, is unlikely to exactly match the exacting requirements of a WYSIWYG word processor. As soon as your requirements differ even slightly from the feature set provided, you start tipping over into complex workarounds which impact performance and are hell on developer productivity and application stability / compatibility.
This is loosely analogous to CISC vs. RISC: browsers are amazing "CISCy" engines but if your use case doesn't precisely fit the expectations of the instruction set designer then you're better off with something lower-level, like Canvas and WASM. (I don't know whether Docs uses WASM but it would seem like a good fit for this Canvas project.)
Frameworks in general suffer from this problem. If you've ever had to fight with an app framework, or orchestration framework, or whatever sort of framework to accomplish something 5% outside of what the framework is set up to support, then you understand the concept.
Also, as noted in many comments here, browser engines have to solve a much more general problem than Docs, and thus have extra overhead.
The thing that stands out to me the most was the giant sparse array (a regular js-native array) being used to store layout information, presumably. It really messed with our internals because spidermonkey didn't expect those to be used in fastpaths, and it was really lazy about trying to optimize for them.
Anecdotes aside... I wanted to endorse your entire comment :) I remember thinking to myself how terrible it was to have to piggyback a document layout engine on top of HTML layout and these awful JS abstractions, and how much better and more performant it would be to do a proper layout engine - either in JS or compile-to-wasm - and have it run its own rendering logic.
In particular for large documents where you were making changes to early parts of the document, a single keystroke could invoke this _cascade_ of sparse array fetches and mutations and DOM rearrangements and all sorts of fireworks.
However, I can't claim credit (or blame, but I would argue mostly credit) for that code. There have been three generations of the Docs editor that I know of:
1. The original, which I was involved in, was an unholy mess perched shakily atop contenteditable. As such, it contained no layout or rendering code (but did all sorts of horrid things under the hood to massage the HTML created by the various browser contenteditable engines and thus work around various problems, notably compatibility issues when users on different browsers are editing the same document). Originally launched in 2005.
2. In the early 2010s, an offshoot of the Google Sheets team launched a complete rewrite of the Docs engine which did its own editing, layout, and rendering using low-level DOM manipulation. This was more robust, supported layout features not available in contenteditable (e.g. pagination), and generally was a much better platform. My primary contribution to this effort was to incorrectly suggest that it was unlikely to pan out. (I was worried that the primitives available via the DOM would be insufficient; for instance, to deal with mixed-directional text.)
3. This canvas-based engine, which I learned about a few hours ago when this post popped up on HN.
I don't know whether #3 is an evolution of #2 or a complete rewrite; for all I know there was another generation in between. But I imagine you were looking at some iteration of #2.
And yes, I'd say credit as well for the layout code, not blame. I wasn't knocking the code - for that era sparse arrays + DOM stuff were pretty common approaches and there didn't exist better web tooling than that.
It's only been in the last few years, I'd say, that the optimization quality (on the engine side) and API support have been good enough to justify this sort of approach of just plumbing your own graphics pipeline on top of the web.
That was a spidermonkey issue. I treat that experience more as a lesson in how obscure corner cases left as perf cliffs never stay obscure corner cases, and always get exercised, and you can't afford to ignore them for too long.
For the majority of use cases, do you think contenteditable + view layer which precisely updates the HTML is still viable?
I understand if you have really long documents or spreadsheets (I imagine the latter is more frequent), you could maybe solve rendering performance problems with virtualization, which canvas gives more flexibility for?
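A minimal sketch of the virtualization idea being asked about (my own illustration; it assumes a fixed row height, which real documents don't have):

```javascript
// Toy sketch of virtualization: from the scroll offset and viewport
// height, compute the only rows worth rendering; everything off-screen
// is never materialized at all.
function visibleRange(scrollTop, viewportHeight, rowHeight, rowCount) {
  const first = Math.max(0, Math.floor(scrollTop / rowHeight));
  const last = Math.min(
    rowCount - 1,
    Math.ceil((scrollTop + viewportHeight) / rowHeight) - 1
  );
  return { first, last };
}
// A 100px viewport over 20px rows only ever renders ~5-6 rows,
// no matter how many rows the document has.
```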
Correct. In fact, contenteditable went out the window a decade ago when the "#2" engine (low-level DOM manipulation) was launched.
My experience with contenteditable is ~12 years stale at this point, so the only thing I'll try to say is that I expect it would work well up to a certain level of ambition, and no further. As I say above regarding frameworks: they're great so long as your requirements fit within the expectations of the framework, but you quickly hit a wall if you need to stray outside of that. For Docs, the desire for a paginated view/edit mode was an example; there was simply no sane way of squeezing pagination into a contenteditable-based engine.
A canvas-based document editor with any sort of international ambitions has a fairly high bar to clear for reimplementing basic features. The browsers really do handle a lot of useful things for you in contenteditable, like the upthread-mentioned RTL issues, and complex IME input methods.
If you have a lot of HTML-rendering inherently required, strong internationalization requirements, and no need for something like page-based layout... contenteditable has advantages, particularly when comparing the up-front work required.
protobufs can be stored in array format. In that format, each field number is basically its index in the array. Extension fields in protobufs typically grab high-numbered slots. So if you have a protobuf with one field (id = 1) and one extension field (e.g. id = 10000000), you now have an array like [undefined, stuff, ... 999999 ..., stuff], and various array operators seem to reify this into a real array in older versions.
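A tiny illustration of the shape of the problem (field numbers invented for the example):

```javascript
// In the array format described above, field N lives at index N, so one
// regular field plus one high-numbered extension field yields a very
// sparse array: two real entries, millions of holes.
const msg = [];
msg[1] = 'stuff';       // regular field, id = 1
msg[10000000] = 'ext';  // extension field grabbing a high-numbered slot
console.log(msg.length);              // 10000001
console.log(Object.keys(msg).length); // 2 -- only the holes' neighbors exist
// ...until an operation reifies the holes, e.g. spreading:
// const dense = [...msg]; // materializes 10000001 elements -- the cliff
```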
I remember those being fairly rampant.
I wonder if a technical blog post about the issue would have silenced some of the conspiracy theories.
Regardless, there's a lesson in there somewhere. Never attribute to malice that which is adequately explained by degenerate performance of a browser pushed to its limits?
Add on top of that that Inbox was developed using a shared codebase for 3 platforms (Web, Android, iOS): the non-UI code was written in Java, while the UI code was written in JS, Java, and Objective-C respectively.
All was good until, IIRC, a utility function was introduced that did Object.keys(some protobuf array). This returns a sparse array on V8, but a reified real dense array on SpiderMonkey at that time, and so if you were unlucky enough to have a high extension field in your protobuf, you'd end up creating an array with a billion entries in it.
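For what it's worth, on today's engines the spec'd result of Object.keys is just the own enumerable keys that actually exist; as I understand it, the blowup back then was in how the engine of the era computed that result internally (reifying the holes), not in the returned value. A quick sketch, field numbers invented:

```javascript
// Object.keys on a sparse array: the *result* is two strings, however
// high the indices go -- the billion-entry cost described above came
// from the engine internally materializing the holes along the way.
const pb = [];
pb[1] = 'id field';
pb[1000000000] = 'extension field';
const keys = Object.keys(pb);
console.log(keys);      // ['1', '1000000000']
console.log(pb.length); // 1000000001
```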
It was hard to foresee this because Inbox was built out of so many interacting systems. Ideally, the GWT Protobuf Compiler runtime would have had integration tests for Firefox that exercised iteration over sparse arrays with high extension field numbers, but it didn't, which meant the problem languished until discovered in Inbox. GWT Protobuf was probably someone's 20% project at the time, implementing the minimal features they needed.
Also, debugging it was a nightmare, because as soon as Object.keys(big sparse array) was encountered, the Firefox debugger would essentially freeze/die, and we couldn't get information out. Single-stepping through a ginormous bit of code after bisecting was how I tracked it down, because when I tried to console.log(Object.keys(big sparse array)) it would die.
I'm not blaming Firefox; I'm not sure the JS specification even says what the right thing to do is with things like Object.keys(sparse array). Maybe it was unspecified/vague behavior? I'm just pointing out that there was absolutely no malice, and no desire to block Inbox from running on FF, or IE10 or WebKit for that matter. It's always basically a matter of launch schedules, late-discovered bugs, and triage.
Spidermonkey's dictionary object representation leaves a lot of room for improvement. The issue you cite here isn't specifically related (it sounds like it could have been fixed with a one- or two-line change), but I can describe one of my (still-standing) pet peeves about the implementation of objects in spidermonkey:
Dictionary objects are what we call objects that have fallen off the happy path of tracked property-names, and become degenerate associative maps from keys to values. They use a representation where the key-mapping for the object's bound names is kept in a linked entry hashtable (a hashtable where the entries form a doubly linked list) structure that hangs off of the hidden type of the object. Every lookup for a property (including array indexes) involved first pulling this hashtable out, then looking up the property on the hashtable, to obtain a shape, which gives the _offset of the property on the original object_, and then using that offset to look up the value on the original object.
All said and done, there were about half a dozen to a dozen distinct heap accesses, and pollution of about 6-7 cache lines, just to retrieve a single property on an object that had gone into dictionary mode (which is what sparse arrays would become).
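The indirection chain described above, modeled as a toy (my simplification, not the real structure; the real key table is a linked-entry hashtable hanging off the hidden type, with far more bookkeeping):

```javascript
// Toy model of dictionary-mode lookup: key -> table on the hidden type
// -> shape -> slot offset -> value stored back on the object itself.
// Each arrow below is at least one dependent heap access in the engine.
class Shape {
  constructor(offset) { this.offset = offset; }
}
class HiddenType {
  constructor() { this.table = new Map(); } // linked-entry hashtable in the real thing
}
class DictObject {
  constructor() {
    this.type = new HiddenType();
    this.slots = [];
  }
  set(key, value) {
    let shape = this.type.table.get(key);
    if (shape === undefined) {
      shape = new Shape(this.slots.length);
      this.type.table.set(key, shape);
    }
    this.slots[shape.offset] = value;
  }
  get(key) {
    const table = this.type.table;   // pull the table off the hidden type
    const shape = table.get(key);    // hash lookup -> shape
    if (shape === undefined) return undefined;
    return this.slots[shape.offset]; // finally: offset back into the object
  }
}
```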
Fixing the object representation was on my long-term todo-list for a while. It is a very time-consuming task because all the JITs and other optimization layers were aware of it, so any changes to it would involve adjusting a ton of other code.
> I'm not blaming Firefox; I'm not sure the JS specification even says what the right thing to do is with things like Object.keys(sparse array). Maybe it was unspecified/vague behavior? I'm just pointing out that there was absolutely no malice, and no desire to block Inbox from running on FF, or IE10 or WebKit for that matter. It's always basically a matter of launch schedules, late-discovered bugs, and triage.
One thing you learn working on any sort of a public facing project a lot of people use is that people, especially the most emotionally invested people, will assign motivations to you personally that have no external reference points except their interpretation of events.
I've encountered that working at Mozilla, but thankfully largely been sheltered from direct consequences. You've arguably worked on even more public projects.
There's no need to pollute your commentary with defences that aren't owed.
Does this change mean that I can look forward to being able to write hundreds or thousands of pages in a Google Doc without it getting periodically non-performant?
I have no doubt whatsoever that a Canvas-based editor can be faster and easier to maintain. I don't know how well it'll handle accessibility issues, though. I expect they'll have to do a lot of tedious work to get screen readers and the like to be happy.
It was super handy before I had a laptop for regular use. I used it at public libraries for projects in my last year of high school. It helped me develop a habit of having a third-space workplace that was away from home and school.
The "floating workspace" aspect has always driven at least as much usage as the "collaboration" aspect. That came as a complete surprise to us, but it turned out to be very important to adoption. At some point I think we determined that the average document had something like 1.1 collaborators.
The problem that I ran into was text-rendering when there was a lot of text on the screen. The application would consume a lot of memory and the page would slow down to a crawl when scrolling. I couldn't really find a way to speed up the performance and stopped working on the application after some time. That's when I realized the incredible amount of work that went into Google Docs and other web-based spreadsheets. :)
I think Monaco from vscode is probably an interesting read, but I've never looked at such a big open-source codebase before.
Is there something you can recommend to understand better how it works architecturally?
Of course there was that time that I messed with the save/load code and destroyed the text files of one of my customers. Not so happy with that! Saved it by writing a fix system, and that actually led to being hired at that guy's company for my first "real" job. ;)
I called it DEdit, because every programmer wants to grab a single letter title.
But I personally never worked on this kind of problem, I just remember reading these over the years
This is just about the worst possible use-case for the DOM: you get almost none of the benefits, and still get most of the costs.
... it usually means it’s the wrong tool for the job.
I have to ask, why not a native app? Once you start bypassing every browser feature anyway, what’s the point of using a browser?
1. Mobile-first, mobile-only apps.
2. Minecraft or really any game.
3. The proliferation of Electron apps that are basically downloadable versions of the website.
4. Apple's own suite of apps. Keynote is pretty darn popular.
In the case of a user who is really, really unmotivated to comment on a doc, sure. Then every click, every second counts because the user doesn't really have a fixed need to complete the task to begin with. For most other things, users are willing to download apps and may even prefer it.
It's also worth considering that Writely/Docs never really supplanted Word and is still rather feature poor even after a decade of continuous development, perhaps because they keep having to rewrite the rendering engine. If Docs was a downloadable app with a simple web-side static renderer + commenting engine, it might have obtained features that could offset any loss of casual users due to needing a download to collaborate. Especially if the download was fast, tight and transparent.
I am not sure what has held gSuite back all these years, but the pandemic seems to have brought them out of their slumber.
While Docs hits 95% of my needs there's still that 5% and I suspect most of those are held back by the current implementation architecture. Hopefully moving to a Canvas based system will enable them to more easily add complex features.
"I have 200 million entries in a table I need to compress. I'm gonna write a flume job! I estimate it will take 5 minutes to start the job and a half hour to run! Then I'll spend a few days figuring out how to shard it so it actually finishes."
"Sounds good. But I also have this bit of Java code here that does the same compression on my desktop in about 30 seconds. Would you like that instead? You could convert it to C++ if that would make you feel better."
yellowbrick.com did some neat stuff pushing the query algebra down into the flash storage firmware.