
Speaking as one of the original three authors of Google Docs (Writely), but zero involvement in this project (I left Google in 2010): I'm seeing a lot of comments asking how JavaScript-on-Canvas could possibly outperform the highly optimized native code built into the browser engines. It's been a long time since I've really been involved in browser coding, but having written both Writely and, farther back, several native-app word processing engines, here are some thoughts.

Word processors have extremely specific requirements for layout, rendering, and incremental updates. I'll name just two examples. First, to highlight a text selection in mixed left-to-right / right-to-left text, it's necessary to obtain extremely specific information regarding text layout; information that the DOM may not be set up to provide. Second, to smoothly update as the user is typing text, it's often desirable to "cheat" the reflow process and focus on updating just the line of text containing the insertion point. (Obviously browser engines support text selections, but they probably don't expose the underlying primitives the way a word processor would need. Similarly, they support incremental layout + rendering, but probably not specifically optimized in the precise way a word processor would need.)
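The single-line "cheat" can be sketched as a dirty-line set. This is a toy model under my own assumptions, not Docs' actual code; all names here are invented:

```javascript
// Hedged sketch: track which lines an edit touched and re-lay-out only
// those on the next frame, instead of reflowing the whole document.
class IncrementalLayout {
  constructor(lines) {
    this.lines = lines;        // array of line/paragraph strings
    this.dirty = new Set();    // line indices needing relayout
  }
  edit(lineNo, newText) {
    this.lines[lineNo] = newText;
    this.dirty.add(lineNo);    // only the edited line is marked for reflow
  }
  flush(relayoutLine) {
    for (const i of this.dirty) relayoutLine(i, this.lines[i]);
    const count = this.dirty.size;
    this.dirty.clear();
    return count;              // how many lines were actually re-laid-out
  }
}
```

A real engine also has to handle edits that spill across line boundaries (word wrap pushing text to the next line), which is where the full reflow machinery comes back in.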

Modern browser engines are amazing feats of engineering, but the feature set they provide, while enormous, is unlikely to exactly match the exacting requirements of a WYSIWYG word processor. As soon as your requirements differ even slightly from the feature set provided, you start tipping over into complex workarounds which impact performance and are hell on developer productivity and application stability / compatibility.

This is loosely analogous to CISC vs. RISC: browsers are amazing "CISCy" engines but if your use case doesn't precisely fit the expectations of the instruction set designer then you're better off with something lower-level, like Canvas and WASM. (I don't know whether Docs uses WASM but it would seem like a good fit for this Canvas project.)

Frameworks in general suffer from this problem. If you've ever had to fight with an app framework, or orchestration framework, or whatever sort of framework to accomplish something 5% outside of what the framework is set up to support, then you understand the concept.

Also, as noted in many comments here, browser engines have to solve a much more general problem than Docs, and thus have extra overhead.

I'd like to chime in here as someone who has worked on optimizing the execution of your code :) Google Docs specifically was one of the subjects of a particular performance push when I was working on SpiderMonkey within Firefox, and I got to see how it behaves under the hood pretty well.

The thing that stood out to me the most was the giant sparse array (a regular js-native array) being used to store layout information, presumably. It really messed with our internals because SpiderMonkey didn't expect those to be used in fastpaths, and it was really lazy about trying to optimize for them.
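For illustration (the offsets and payloads here are invented), the shape of the problem is a plain array with huge index gaps:

```javascript
// A sparse "layout" array of the kind described: most indices are holes.
const layoutByOffset = [];
layoutByOffset[120] = { line: 3, x: 47 };
layoutByOffset[98304] = { line: 2048, x: 12 };

// Only two own properties exist, but length covers the whole range;
// engines may deoptimize such arrays into slower dictionary-mode storage.
console.log(layoutByOffset.length);       // 98305
console.log(Object.keys(layoutByOffset)); // ["120", "98304"]
```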

Anecdotes aside... I wanted to endorse your entire comment :) I remember thinking to myself how terrible it was to have to piggyback a document layout engine on top of HTML layout and these awful JS abstractions, and how much better and more performant it would be to do a proper layout engine - either in JS or compile-to-wasm - and have it run its own rendering logic.

In particular for large documents where you were making changes to early parts of the document, a single keystroke could invoke this _cascade_ of sparse array fetches and mutations and DOM rearrangements and all sorts of fireworks.

This is why I love HN. :-)

However, I can't claim credit (or blame, but I would argue mostly credit) for that code. There have been three generations of the Docs editor that I know of:

1. The original, which I was involved in, was an unholy mess perched shakily atop contenteditable. As such, it contained no layout or rendering code (but did all sorts of horrid things under the hood to massage the HTML created by the various browser contenteditable engines and thus work around various problems, notably compatibility issues when users on different browsers are editing the same document). Originally launched in 2005.

2. In the early 2010s, an offshoot of the Google Sheets team launched a complete rewrite of the Docs engine which did its own editing, layout, and rendering using low-level DOM manipulation. This was more robust, supported layout features not available in contenteditable (e.g. pagination), and generally was a much better platform. My primary contribution to this effort was to incorrectly suggest that it was unlikely to pan out. (I was worried that the primitives available via the DOM would be insufficient; for instance, to deal with mixed-directional text.)

3. This canvas-based engine, which I learned about a few hours ago when this post popped up on HN.

I don't know whether #3 is an evolution of #2 or a complete rewrite; for all I know there was another generation in between. But I imagine you were looking at some iteration of #2.

You're right. This was a few years ago, so well after 2010.

And yes, I'd say credit as well for the layout code, not blame. I wasn't knocking the code - for that era, sparse arrays + DOM manipulation were pretty common approaches, and better web tooling simply didn't exist.

It's only been in the last few years, I'd say, that the optimization quality (on the engine side) and API support have been good enough to justify this sort of approach of just plumbing your own graphics pipeline on top of the web.

That was a spidermonkey issue. I treat that experience more as a lesson in how obscure corner cases left as perf cliffs never stay obscure corner cases, and always get exercised, and you can't afford to ignore them for too long.

With a canvas-based engine, the editor is no longer relying on the contenteditable spec, right?

For the majority of use cases, do you think contenteditable + view layer which precisely updates the HTML is still viable? More specifically, what do you think about open-source libraries like ProseMirror (https://prosemirror.net/) or Slate.js (https://github.com/ianstormtaylor/slate) which do that (ProseMirror uses its own view library on vanilla javascript, Slate uses React)?

I understand that if you have really long documents or spreadsheets (I imagine the latter is more frequent), you could maybe solve rendering performance problems with virtualization, which canvas gives you more flexibility for?
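The core of virtualization is small enough to sketch. This assumes a fixed line height for simplicity (real documents have variable heights), and all names are invented:

```javascript
// Only the lines intersecting the viewport get drawn to the canvas each
// frame; everything else is skipped entirely.
function visibleRange(scrollTop, viewportHeight, lineHeight, totalLines) {
  const first = Math.max(0, Math.floor(scrollTop / lineHeight));
  const last = Math.min(
    totalLines - 1,
    Math.floor((scrollTop + viewportHeight - 1) / lineHeight)
  );
  return [first, last]; // inclusive range of line indices to render
}

// e.g. a 100px viewport over 20px lines shows at most 5-6 lines,
// no matter how long the document is.
```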

> With a canvas-based engine, the editor is no longer relying on the contenteditable spec, right?

Correct. In fact, contenteditable went out the window a decade ago when the "#2" engine (low-level DOM manipulation) was launched.

My experience with contenteditable is ~12 years stale at this point, so the only thing I'll try to say is that I expect it would work well up to a certain level of ambition, and no further. As I say above regarding frameworks: they're great so long as your requirements fit within the expectations of the framework, but you quickly hit a wall if you need to stray outside of that. For Docs, the desire for a paginated view/edit mode was an example; there was simply no sane way of squeezing pagination into a contenteditable-based engine.

My experience with modern contenteditable suggests that it does work pretty well, overall, though I've not been using it for something as layout-heavy as Docs -- I've worked on the VisualEditor for MediaWiki, which has different requirements.

A canvas-based document editor with any sort of international ambitions has a fairly high bar to clear for reimplementing basic features. The browsers really do handle a lot of useful things for you in contenteditable, like the upthread-mentioned RTL issues, and complex IME input methods.

If you have a lot of HTML-rendering inherently required, strong internationalization requirements, and no need for something like page-based layout... contenteditable has advantages, particularly when comparing the up-front work required.

The sparse array is likely to be a protobuf. I ran into this issue with Firefox when working on Google Inbox; degenerate performance with sparse arrays was one of the reasons the Firefox version was delayed. (I'll note that various conspiracy theories on HN held it was a deliberate attempt to hamper FF, when in reality it was an unintended consequence of an old protobuf format that never caused a problem until protobufs with huge extension fields were used in a specific way in the codebase, so the problem was discovered late.)

Protobufs can be stored in array format. In that format, each field number is basically its index in the array. Extension fields in protobufs typically grab high-numbered slots. So if you have a proto with one field (id = 1) and one extension field (e.g. id = 10000000), you now have an array like [undefined, stuff, ...999999 holes..., stuff], and various array operators seemed to reify this into a real array in older versions.
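That layout can be sketched directly (field numbers chosen to match the description above; the values are invented):

```javascript
// A protobuf message in array format: field number == array index, so a
// high-numbered extension field turns the message into a giant sparse array.
const message = [];
message[1] = "some field value";        // regular field, id = 1
message[10000000] = "extension value";  // extension field, id = 10000000

// Only two slots are actually populated, but length spans the whole range.
console.log(message.length);              // 10000001
console.log(Object.keys(message).length); // 2 -- until something reifies it
```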

> various conspiracy theories on HN thought it was a deliberate attempt to hamper FF

I remember those being fairly rampant.

I wonder if a technical blog post about the issue would have silenced some of the conspiracy theories.

Regardless, there's a lesson in there somewhere. Never attribute to malice that which is adequately explained by degenerate performance of a browser pushed to its limits?

Yeah, it's primarily the fault of overzealous pushing of limits. At the time we were using WebWorkers/SharedWorkers, bleeding-edge CSS "compositor" behavior to achieve smooth 30-60fps animations, and lots of other stuff I don't remember. It was very easy to get off the 'golden path' of performance. Small changes in CSS or DOM structure, for example, would destroy layout/paint performance and require days of debugging.

Add on top of that that Inbox was developed using a shared codebase for three platforms (Web, Android, iOS): the non-UI code was written in Java, while the UI code was written in JS, Java, and Objective-C respectively.

The shared "business logic" layer was cross compiled, and it was the protobuf runtime for GWT inside that was causing trouble. We "optimized" it by making it run on top of pure arrays instead of OO objects. This was a feature of GWT called 'JSO's (JavaScriptObjects) that let you pretend that raw JS objects had methods on them that they didn't, like Extension Methods in other languages.

All was good until, IIRC, a utility function was introduced that did Object.keys(some protobuf array). This returns a sparse array on V8, but a reified real dense array on SpiderMonkey at that time, and so if you were unlucky enough to have a high extension field in your protobuf, you'd end up creating an array with a billion entries in it.
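The failure mode looks roughly like this (numbers shrunk so it runs quickly; the original was far larger):

```javascript
// A protobuf-as-array with a high extension field number is a sparse array.
const pb = [];
pb[1] = "id";
pb[999999] = "extension payload";

// Object.keys over a sparse array only walks the populated slots (on
// modern engines), so it stays cheap:
console.log(Object.keys(pb)); // ["1", "999999"]

// But any operation that reifies the array densely materializes every
// hole, which is the blowup described above:
const dense = Array.from(pb);
console.log(dense.length);    // 1000000 -- mostly `undefined` entries
```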

It was hard to foresee this because Inbox was built out of so many interacting systems. Ideally, the GWT protobuf compiler runtime would have had integration tests for Firefox that exercised iteration over sparse arrays with high extension field numbers, but it didn't, which meant the problem languished until it was discovered in Inbox. GWT Protobuf was probably someone's 20% project at the time, implementing the minimal features they needed.

Also, debugging it was a nightmare, because as soon as Object.keys(big sparse array) was encountered, the Firefox debugger would essentially freeze/die, and we couldn't get information out. Single-stepping through a ginormous bit of code after bisecting was how I tracked it down, because when I tried to console.log(Object.keys(big sparse array)) it would die.

I'm not blaming Firefox; I'm not sure the JS specification even says what the right thing to do is with something like Object.keys(sparse array) - maybe it was unspecified/vague behavior? I'm just pointing out that there was absolutely no malice, and no desire to block Inbox from running on FF, or IE10 or WebKit for that matter. It's always basically a matter of launch schedules, late-discovered bugs, and triage.

That's fun to know :)

SpiderMonkey's dictionary object representation leaves a lot of room for improvement. The issue you cite here isn't specifically related (it sounds like it could have been fixed with a one- or two-line change), but I can describe one of my (still standing) pet peeves about the implementation of objects in SpiderMonkey:

Dictionary objects are what we call objects that have fallen off the happy path of tracked property names and become degenerate associative maps from keys to values. They use a representation where the key mapping for the object's bound names is kept in a linked-entry hashtable (a hashtable whose entries form a doubly linked list) that hangs off of the hidden type of the object. Every lookup for a property (including array indexes) involves first pulling this hashtable out, then looking up the property in the hashtable to obtain a shape, which gives the _offset of the property on the original object_, and then using that offset to look up the value on the original object.

All said and done, there were about half a dozen to a dozen distinct heap accesses, and pollution of about 6-7 cache lines, just to retrieve a single property on an object that had gone into dictionary mode (which is what sparse arrays would become).

Fixing the object representation was on my long-term todo-list for a while. It is a very time-consuming task because all the JITs and other optimization layers were aware of it, so any changes to it would involve adjusting a ton of other code.

> I'm not blaming Firefox; I'm not sure the JS specification even says what the right thing to do is with something like Object.keys(sparse array) - maybe it was unspecified/vague behavior? I'm just pointing out that there was absolutely no malice, and no desire to block Inbox from running on FF, or IE10 or WebKit for that matter. It's always basically a matter of launch schedules, late-discovered bugs, and triage.

One thing you learn working on any sort of a public facing project a lot of people use is that people, especially the most emotionally invested people, will assign motivations to you personally that have no external reference points except their interpretation of events.

I've encountered that working at Mozilla, but thankfully largely been sheltered from direct consequences. You've arguably worked on even more public projects.

There's no need to pollute your commentary with defences that aren't owed.

So I've experienced Docs getting, hmm, sad once the doc you're editing gets beyond something like 30-50 pages.

Does this change mean that I can look forward to being able to write hundreds or thousands of pages in a Google Doc without it getting periodically non-performant?

I do a lot of game design documents that use images heavily, and it chugs hard around ~20 pages with ~30 images total on them. Macbook 2019, i9

My M1 does not have performance issues like that with docs 3x the size. I breeze right through even with 20+ other tabs open.

Yes, but most of us aren't living in the future yet. We're stuck with Intel CPUs for the time being.

A majority of users don't have an M1 processor though.

I sat by Steve when Writely joined Google. [Hi Steve!] I sat by the Google Page Creator team when they were a thing. Regardless of performance, it's a miracle that a WYSIWYG editor can be written on top of the DOM at all, let alone a performant one. They had to work multiple miracles a day just to get bulleted lists to work somewhat reliably.

I have no doubt whatsoever that a Canvas-based editor can be faster and easier to maintain. I don't know how well it'll handle accessibility issues, though. I expect they'll have to do a lot of tedious work to get screen readers and the like to be happy.

I have nothing to add to the discussion, other than I’ve been using Writely since ~2005-2006 and wanted to say thanks for all the fish!

It was super handy before I had a laptop for regular use. I used it at public libraries for projects in my last year of high school. It helped me develop a habit of having a third-space workplace that was away from home and school.


The "floating workspace" aspect has always driven at least as much usage as the "collaboration" aspect. That came as a complete surprise to us, but it turned out to be very important to adoption. At some point I think we determined that the average document had something like 1.1 collaborators.

The name Writely reminds me of a side project I worked on around 2010. I was not satisfied with the performance of Google Docs and its competitors at the time like EditGrid and thought (naively, as it turned out) I'd be able to develop a faster alternative.

I had no idea what I was doing, and thinking that using JavaScript to manipulate the DOM was going to be slow, I chose ActionScript and Flash as the language and runtime to develop the project in. I wrote a client-side expression parser and formula engine, and managed to develop a functioning spreadsheet UI with resizable rows and columns, copy and paste with Excel-like animations, cell references, etc.

The problem I ran into was text rendering when there was a lot of text on the screen. The application would consume a lot of memory, and the page would slow to a crawl when scrolling. I couldn't really find a way to speed up performance and stopped working on the application after some time. That's when I realized the incredible amount of work that went into Google Docs and other web-based spreadsheets. :)

Do you mean Google Sheets instead of Google Docs?

You are correct. I meant Google Sheets instead of Google Docs. I thought I'd tackle the word processor part once I got the spreadsheet to a usable state. I do think "Google Docs" is used as an umbrella term for all the Google collaboration tools like Docs, Sheets, Forms, etc.

How would you recommend someone get started learning about the architecture of text editors / word processors?

I think Monaco from VS Code is probably an interesting read, but I've never looked at such a big open-source code base before.

Is there something you can recommend to understand better how it works architecturally?

There is no substitute for building one yourself, but The Craft of Text Editing book has a lot of accumulated wisdom. It is Emacs-centric, but the basics are the same.


One of my first programming projects as a teenager back in terminal-type days was to write my own text editor for the Atari ST. I was super happy with it, and sold three (3) copies of it! That made me very happy at the time.

Of course there was that time that I messed with the save/load code and destroyed the text files of one of my customers. Not so happy with that! Saved it by writing a fix system, and that actually led to being hired at that guy's company for my first "real" job. ;)

Former ST user here: out of curiosity, what was that text editor and company?

Ahahaha! You said company. ;) It was just me, a teenage kid, selling to people who came through the store I worked at selling computers (the store sold the Atari line).

I called it DEdit, because every programmer wants to grab a single letter title.

I'd love to have a good answer for you, but I learned the basics all the way back in the '80s. Seems like I've seen references posted occasionally on HN, hopefully someone has a good link.

There were a few really interesting blog posts about the architecture choices for the xi editor a few years ago, ending with this


But I personally never worked on this kind of problem, I just remember reading these over the years

Here’s a blog post detailing some vscode internals: https://code.visualstudio.com/blogs/2018/03/23/text-buffer-r...

I recommend reading about the internals of CodeMirror and ProseMirror: e.g. https://codemirror.net/1/story.html.

I don't think there's any way better than building a text editor from scratch; you have to understand the exact problem before reading other people's solutions.

Might this be helpful? I recall seeing this on HN a while back.

A good open-source example of this type of problem is CodeMirror (a code-editing widget for the web). To achieve syntax highlighting and everything else, it basically fakes every single aspect of text editing in a browser context - even the cursor and text highlighting - replacing the native versions with pixel-placed DOM elements driven by JS. It receives raw events from an invisible textarea and does everything else itself.
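The hidden-textarea pattern can be sketched like this. This is a toy model of the general technique, not CodeMirror's actual code, and every name in it is invented:

```javascript
// The editor keeps its own document model; a hidden textarea exists only
// to receive raw keyboard/IME input, which is drained into the model.
class MiniEditor {
  constructor() {
    this.doc = "";
    this.cursor = 0;
  }
  // Called from the hidden textarea's "input" event with the typed text.
  insert(text) {
    this.doc = this.doc.slice(0, this.cursor) + text + this.doc.slice(this.cursor);
    this.cursor += text.length;
  }
}

// In the browser, the wiring would look roughly like:
//   const ta = document.createElement("textarea");
//   ta.style.position = "absolute";
//   ta.style.opacity = "0";          // invisible but focusable
//   ta.addEventListener("input", () => {
//     editor.insert(ta.value);       // drain raw input into the model
//     ta.value = "";
//     render(editor);                // editor draws cursor, selection, text
//   });
```

Everything the user sees (cursor, selection, highlighting) is then drawn by the editor itself, which is exactly the "fake every single aspect" situation described above.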

This is just about the worst possible use-case for the DOM: you get almost none of the benefits, and still get most of the costs.

Edit: To be clear, I'm not saying this to rag on CodeMirror. It's a marvel that they got it working as well as it does, and it's been around for a long time- possibly longer than the canvas API has been widely available. It's just that doing things this way requires a pile of hacks and I can see a strong argument for just cutting out the middle-man and doing 100% custom rendering.

VS Code and the like do the same, right? I'm pretty sure this will all soon be optimized, if it isn't already.

> If you've ever had to fight with an app framework

... it usually means it’s the wrong tool for the job.

I have to ask, why not a native app? Once you start bypassing every browser feature anyway, what’s the point of using a browser?

Distribution. It is so, so, so much harder to get most users to install an app, especially for casual purposes ("please add your notes to this doc") which is often how people first come to Docs.

That's a common belief but there are a lot of exceptions that mean we should question how true this really is.

1. Mobile-first, mobile-only apps.

2. Minecraft or really any game.

3. The proliferation of Electron apps that are basically downloadable versions of the website.

4. Apple's own suite of apps. Keynote is pretty darn popular.

In the case of a user who is really, really unmotivated to comment on a doc, sure. Then every click, every second counts because the user doesn't really have a fixed need to complete the task to begin with. For most other things, users are willing to download apps and may even prefer it.

It's also worth considering that Writely/Docs never really supplanted Word and is still rather feature-poor even after a decade of continuous development, perhaps because they keep having to rewrite the rendering engine. If Docs were a downloadable app with a simple web-side static renderer + commenting engine, it might have gained features that could offset any loss of casual users due to needing a download to collaborate. Especially if the download was fast, tight, and transparent.

On the other hand, allowing for a native desktop app could have caused it to end up in the same state as Microsoft's apps on the web (e.g. the web version of PowerPoint is pretty awful), which would have undermined a key differentiator between Docs and Microsoft's suite.

I am not sure what has held G Suite back all these years, but the pandemic seems to have brought them out of their slumber.

I think the major advantages for me are easy bookmark-ability and sharing via URLs with people who may not have the app. You can bodge workarounds for those things into an offline app, but then the bookmarks aren't in my browser, or I have to copy them to it, or the people I share the doc with are just looking at a browser-rendered viewer or something.

While Docs hits 95% of my needs there's still that 5% and I suspect most of those are held back by the current implementation architecture. Hopefully moving to a Canvas based system will enable them to more easily add complex features.

What I don't understand is why Google doesn't just do what they do all the time and add 20 new APIs for it, and strongarm everyone else into having to implement them too.

Out of curiosity, what have you moved on to now that you don't do browser coding?

After leaving Google I founded scalyr.com. To get an idea of the type of work we're doing, this old blog post holds up: https://www.scalyr.com/blog/searching-1tb-sec-systems-engine....

Kids these days usually don't understand exactly how fast hardware actually is. ;)

"I have 200 million entries in a table I need to compress. I'm gonna write a flume job! I estimate it will take 5 minutes to start the job and a half hour to run! Then I'll spend a few days figuring out how to shard it so it actually finishes."

"Sounds good. But I also have this bit of Java code here that does the same compression on my desktop in about 30 seconds. Would you like that instead? You could convert it to C++ if that would make you feel better."

yellowbrick.com did some neat stuff pushing the query algebra down into the flash storage firmware.

Does the challenge of word processors extend to these Chromium-based IDEs like VS Code? I wonder if those could also be optimized by going to a non-DOM-based approach?

They are already doing it for some features, https://code.visualstudio.com/blogs/2017/10/03/terminal-rend...

I assume they used CSS to apply styles to each text span, and now they don't?
