
Modernizing the DOM Tree in Microsoft Edge - nikbackm
https://blogs.windows.com/msedgedev/2017/04/19/modernizing-dom-tree-microsoft-edge/
======
c-smile
So the new IE dom::node essentially looks like this:

    
    
        struct node {
          element* parent;
          node *first_child, *next, *previous;
        };
    

And that was also my initial implementation of it in Sciter Engine
([https://sciter.com](https://sciter.com)), but after some testing I found
that these two structures work better:

    
    
        struct node {
          element* parent;
          uint node_index; // its index in parent->children
        };
        struct element: node {
          vector<node*> children;
        };
    

a) it better suits DOM manipulation/traversal needs, b) it is more compact and
faster, c) it is more CSS-friendly (for things like :nth-child(2)), and d) the
structure ensures that there are no loops in the tree.
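A minimal sketch of that parent-plus-index layout might look as follows; the helper names and the `append` method are illustrative only, not Sciter's actual API:

```cpp
#include <vector>

struct element;

struct node {
    element* parent = nullptr;
    unsigned node_index = 0;   // position in parent->children
};

struct element : node {
    std::vector<node*> children;

    void append(node* n) {
        n->parent = this;
        n->node_index = static_cast<unsigned>(children.size());
        children.push_back(n);
    }
};

// Sibling traversal falls out of the index: look one slot over
// in the parent's children array.
inline node* next_sibling(const node* n) {
    const auto& kids = n->parent->children;
    return n->node_index + 1 < kids.size() ? kids[n->node_index + 1] : nullptr;
}

// :nth-child(k) (1-based, as in CSS) becomes a direct index check
// instead of a walk over sibling links.
inline bool is_nth_child(const node* n, unsigned k) {
    return n->node_index + 1 == k;
}
```

Since a child can only appear at one index of one parent's array, a cycle would require a node to be its own ancestor's child slot, which the single `parent` pointer rules out.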

~~~
userbinator
_more compact and faster_

This is more of a general comment and may not necessarily be true in your
implementation, but I see this argument brought up a lot: "vectors are arrays,
therefore smaller (because only one pointer is needed per element) and faster
than linked lists". However, one must remember that vectors are not statically
sized arrays --- they dynamically size to hold their contents, and for the
resizing to be (amortised) constant-time, there will on average always be some
amount of unused space in the dynamic array, and you still need the (one)
pointer to this dynamic array in the node itself, plus additional overhead to
keep track of size/used. That's the "smaller" argument debunked. As for
faster, that's not always true either, since simple forward/backward scans
will require _two_ indirections, one to get the dynamic array's address and
another to get the pointer to the node itself. That's two memory accesses to
areas which might not be close together at all. Compare this to one
indirection (follow the next pointer) for a linked list, that is also made to
an address in the vicinity of the rest of the node.

In summary: the linked-list will be faster for forward/backward scans and
inserts/deletes and slower at indexing, while the vector will be faster for
indexing and slower for forward/backward scans. The linked list will be
smaller too.
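The per-node bookkeeping being compared can be sketched as two toy structs; the field sets are illustrative only (real DOM nodes carry far more state):

```cpp
#include <vector>

// Linked-list layout: four link pointers per node, no slack space.
struct list_node {
    list_node* parent;
    list_node* first_child;
    list_node* next;
    list_node* prev;
};

// Vector layout: parent pointer + index, plus a std::vector header
// (data pointer, size, capacity) embedded in the node, plus whatever
// unused capacity the growth policy leaves in the heap-allocated
// children array.
struct vec_node {
    vec_node* parent;
    unsigned  index;
    std::vector<vec_node*> children;
};
```

On a typical 64-bit ABI, `sizeof(list_node)` is 32 bytes of links, while `sizeof(vec_node)` is around 40 bytes before counting the children array itself and its slack capacity, which is the overhead described above.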

~~~
yorwba
If no insertions or deletions happen between accesses, a vector scan can also
be done with only a single indirection, by simply incrementing the pointer to
the node. This also means better locality than a linked list, where the next
_pointer_ may be close to the current node, while the next _node_ can be
anywhere. In a vector, they are simply next to each other.
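The single-indirection scan can be sketched like this, assuming children stored by value in a `std::vector<node>` (the `node` type here is a stand-in):

```cpp
#include <vector>

struct node { int value; };

int sum_children(const std::vector<node>& children) {
    int total = 0;
    // children.data() .. data() + size() are adjacent in memory, so the
    // loop is a plain pointer increment and touches no allocation other
    // than the children array itself.
    for (const node* p = children.data();
         p != children.data() + children.size(); ++p)
        total += p->value;
    return total;
}
```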

~~~
jdmichal
That's true when you have a `vector<node>`, but the structure was described as
a `vector<node*>`. So what's adjacent to your current position is the pointer
to the next node, not the node itself.

Which one is optimal depends a lot on the size of a `node`, as moving elements
around in the DOM will require memory copies in the first case.
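The trade-off can be sketched with two overloads; the types are illustrative, not the engine's real ones:

```cpp
#include <algorithm>
#include <vector>

struct node { int payload[16]; };   // pretend nodes are non-trivially sized

// vector<node>: children are adjacent in memory (fast scans), but
// reordering copies sizeof(node) bytes per element and invalidates
// any outstanding pointers to the moved nodes.
void move_first_to_back(std::vector<node>& kids) {
    std::rotate(kids.begin(), kids.begin() + 1, kids.end());
}

// vector<node*>: a scan dereferences one pointer per child, but
// reordering shuffles only pointer-sized slots and every node keeps
// its address.
void move_first_to_back(std::vector<node*>& kids) {
    std::rotate(kids.begin(), kids.begin() + 1, kids.end());
}
```

Stable node addresses also matter for a DOM, where script can hold references to nodes across mutations, which is one reason the pointer layout is plausible despite the extra indirection.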

------
yuhong
[https://news.ycombinator.com/item?id=3233935](https://news.ycombinator.com/item?id=3233935)

~~~
timtadh
In case people didn't click through: it is an awesome comment by the original
author of the IE5 DOM tree explaining how it was implemented.

------
EdSharkey
I can see now why DOM manipulation was so catastrophic for performance on
IE5-8, all that bookkeeping of character positions must have been a killer
once the DOM tree was of any significant size.

Makes me wonder: if edits to DOM nodes on legacy IE were focused on attribute
values only, and the string lengths of those edited attributes never changed,
whether one could bypass the bookkeeping and get good performance gains.

------
aconz2
I see something like this and think about how much an ensemble approach to the
data structure would help. This keeps popping up in places where you have a
very general abstraction (like the DOM) and want to support a broad set of
operations and use cases (read-only, insert heavy, traversal heavy, etc.) and
so you often choose the structure which supports most of these cases pretty
well. But you sacrifice a lot of perf by doing so.

What I'm wondering is how well you could do perf-wise by having the DOM be
composed of a heterogeneous set of structures, each of which is chosen based
on the history of operations performed on it (e.g. we're appending a lot
here, so let's use a vector of children). This is all similar in spirit and
goals to:

\- JIT compiling, but for the data structures, and not code

\- This work on composing allocators in D
[https://www.youtube.com/watch?v=LIb3L4vKZ7U](https://www.youtube.com/watch?v=LIb3L4vKZ7U)

\- ML ensemble methods
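A toy sketch of that idea: child storage that migrates from a linked list (cheap middle insertion) to a vector (cheap indexing and appends) once the observed workload looks append-heavy. The threshold, types, and migration policy here are all made up for illustration:

```cpp
#include <cstddef>
#include <list>
#include <variant>
#include <vector>

struct node { int id; };

class adaptive_children {
    std::variant<std::list<node>, std::vector<node>> store_;
    int appends_ = 0;

public:
    void append(node n) {
        if (auto* lst = std::get_if<std::list<node>>(&store_)) {
            lst->push_back(n);
            if (++appends_ > 8) {                     // arbitrary heuristic
                // Workload looks append-heavy: migrate to a vector.
                std::vector<node> v(lst->begin(), lst->end());
                store_ = std::move(v);
            }
        } else {
            std::get<std::vector<node>>(store_).push_back(n);
        }
    }
    bool using_vector() const {
        return std::holds_alternative<std::vector<node>>(store_);
    }
    std::size_t size() const {
        return std::visit([](const auto& c) { return c.size(); }, store_);
    }
};
```

The hard part in practice is the migration cost itself: like JIT tiering, the switch only pays off if the workload keeps behaving the way the history suggests.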

------
0x0
This is interesting and something I've suspected for a long while. I remember
struggling big-time with older IE versions giving some really strange results
when trying to perform DOM node manipulations on improperly nested HTML (which
in turn often came from copy-pasting from Word): erroneous results that
definitively hinted towards DOM tree operations actually happening on flat
substrings.

------
tambourine_man
Gotta love this new Microsoft.

A blog post openly geeking out on browser data structures. So good.

------
adrianratnapala
What I like about this article, even more than the technical specifics, is
that it is an account of a successful, deep, _in-flight_ refactoring effort.

That is, they avoided the two most common errors: rewriting from scratch, and
piling technical debt ever higher.

------
bsimpson
Almost makes me wish it worked on non-MS systems…

Kudos to the MS Edge team for pushing the Web forward!

~~~
angry-hacker
Who knows, maybe one day it will in order to fight Google's monopoly?

------
jstimpfle
Heh, just today I made this
[https://gist.github.com/jstimpfle/a4f2661f8d042d9862b9fecdd8...](https://gist.github.com/jstimpfle/a4f2661f8d042d9862b9fecdd85a7c93)
as a poor-man's PHP. It looks for made-up tags and makes calls with the tag
attributes as arguments to do text substitutions (no, not a very innovative
idea).

I used a cheap regex to look for these tags instead of parsing the DOM
properly to avoid unnecessary allocations and parsing overhead. So after
reading the article, I guess I'm still in the 90s!

> TreeWriter::AppendChild(parent, child);

That's soo the right approach. OO must die. And the argument why method calls
are (typically) wrong already on a _syntactic_ level can be found in the
article.

------
JustSomeNobody
Interesting. However, can we please retire the word "modern" already? It's as
bad as "game changer".

------
ko27
Not to take anything away from Microsoft's accomplishments, but the latest
chrome canary has a speedometer score of 140 (70% faster than Edge).

~~~
najajomo
> Not to take anything away from Microsoft's accomplishments, but the latest
> chrome canary has a speedometer score of 140 (70% faster than Edge).

I don't agree with that statement and can't refute it, but I'm going to mod it
down into oblivion anyway ;)

~~~
wtetzner
I suspect it was down-voted because it didn't add anything to the
conversation.

