We at ClinchPad, which also uses a card-based layout similar to Trello's, faced the same issues. Every 100 cards added another 1-2 seconds to the load time, meaning that at around 700 cards the load time was close to 10 seconds.
Here's how I fixed it:
1. I added pagination - load only 300 cards, paginate to load more. We found that while some people had 300+ cards in their default view, most never actually used that view, usually applying a filter to bring the number of cards down to a more manageable level.
That still left about 3-6 seconds of page load time for 300 cards. Unacceptable. So these were the further steps I took to fix it.
2. The main culprit, after some rudimentary profiling, seemed to be jQuery's .html('') method. Apparently it does some cleanup which takes a while on a huge DOM block with lots of attached events. I replaced it by looping through each child node in the DOM block and removing it with removeChild. That achieved an 8x speedup, bringing loading time down to 1-2 seconds with 300 cards.
3. The second culprit was jQuery UI's droppable and draggable initialisation, which was taking a while. I hacked around this by putting that initialisation inside a setTimeout that fires after 100ms. Of course, this doesn't affect the actual rendering time, but the perceived rendering time is now under 1 second, because the pipeline loads instantly and it is usually 1-2 seconds before the user does an action on the UI.
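A minimal sketch of both fixes, assuming a container element and jQuery UI on the page (the element id, selectors, and options are invented for illustration):

```javascript
// Fix 2: clear children manually instead of via jQuery's .html('').
// Caveat: this skips jQuery's event/data cleanup, so it's only safe if
// the cards' handlers are delegated (or you clean them up separately).
function fastClear(node) {
  while (node.firstChild) {
    node.removeChild(node.firstChild);
  }
}

// Fix 3: defer the expensive jQuery UI wiring so the first paint isn't blocked.
function deferInit(initFn, delayMs) {
  return setTimeout(initFn, delayMs);
}

// Browser-only usage (guarded so the sketch also loads outside a page):
if (typeof document !== 'undefined' && typeof jQuery !== 'undefined') {
  fastClear(document.getElementById('pipeline'));
  deferInit(function () {
    jQuery('.card').draggable({ revert: 'invalid' });
    jQuery('.column').droppable({ accept: '.card' });
  }, 100);
}
```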
All in all, fewer than 20 lines of code changed, but it took me two full days to figure out. At one point I was looking through jQuery UI's droppable code, futilely trying to see how I could optimize it. :)
> The main culprit, after some rudimentary profiling, seemed to be jQuery's .html('') method. Apparently it does some cleanup which takes a while on a huge DOM block with lots of attached events. I replaced it by looping through each child node in the DOM block and removing it with removeChild.
It's important to let jQuery clean up its event handlers, otherwise you'll end up with memory leaks.
If you have a list of thousands of items, attaching event handlers to each of those items separately is relatively expensive, although it makes the view code nicer. If you are as serious about performance as these people, you'll want to add delegated events at the parent level. There is a writeup about doing this in Backbone, although I can't vouch for it.
In fairness, getting really good DOM performance out of Backbone is hard because of the lack of built-in delegated-event support and the non-synchronized redraws. If you have big apps that need to be super fast, either read up on DOM performance practices and adopt some helper libraries along with Backbone, or try a framework that has its own draw cycle and delegated event support.
Well, Backbone handles delegation for elements within the same view. But it doesn't handle the case where you have lots of instances of the same view class in a list and want each handler registered only once for the whole list. You can do this manually by inspecting the event target and maintaining a mapping from DOM nodes to views, which you ought to do if you have lists of hundreds or thousands of repeated subviews. But Ember and various Backbone plugins can do this for you automatically.
> it doesn't handle the case where you have lots of instances of the same view
Wait I'm getting confused. Didn't the title post and your linked article both say not to do this if performance is the goal? (See "Tuesday" in OP). Isn't that what we're talking about here? Sorry if I misunderstood.
Yeah, I think this is just a terminology problem. Let's just lay out the complete answer: if you have a list of 1000 things to render in the same way, you can make it faster in various ways:
1. Don't render all of them at once, use pagination or infinite scroll.
2. Don't attach dom events to each item individually, instead attaching one event listener to the list container and having a way to go from the event's target to the appropriate view to act on.
3. Don't even create a view object for each item, just render them with the same template and have a composite view that knows how to act on each item's model for the delegated events.
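A minimal sketch of options 2 and 3 with plain DOM APIs; the class name and handler body are invented for illustration:

```javascript
// One click listener on the list container; walk up from the event
// target to find the matching item instead of binding per item.
function delegate(container, itemClass, handler) {
  container.addEventListener('click', function (ev) {
    var node = ev.target;
    while (node && node !== container) {
      if (node.classList && node.classList.contains(itemClass)) {
        return handler(node, ev); // act on the item's model here
      }
      node = node.parentNode;
    }
  });
}
```

With 1000 items this registers one listener instead of 1000, and newly added items need no extra wiring.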
That article advocates instantiating a View per model object in direct opposition to the delegated approach. It acknowledges that there may be performance implications on large collections, just as the title post found. Are you sure that is the article you intended to reference?
Yup, it's the DOM stuff and not the js initialization that is the bottleneck. This is why Ember and React can be faster: you still have the same number or more js objects, but the automatic event delegation and managed draw cycle minimize dom changes.
This was going to be my comment as well: better to detach the elements and put them into a queue to be removed completely in a timer loop, so cleanup happens out of band.
I worked on a rather large ExtJS app, and IE was particularly bad about cleaning up DOM objects with events attached, leading to memory leaks so bad that IE <= 8 would need to be reloaded a couple of times a day. Firefox at the time was much better, and Chrome was really new (as was IE8 at the time).
jQuery is good about keeping track of events and properly cleaning up, but it is expensive... though it's easy enough to .find('> *') and detach the elements, pushing them into a cleanup queue.
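A sketch of that pattern, assuming jQuery; the batch size and 50ms interval are arbitrary choices for illustration:

```javascript
var cleanupQueue = [];

// Detach now: cheap, and it keeps jQuery's data/events intact so a
// later .remove() can release them properly.
function detachForCleanup($container) {
  cleanupQueue.push.apply(cleanupQueue, $container.find('> *').detach().toArray());
}

// Drain the queue in small batches, out of band, so the expensive
// handler/data cleanup never blocks the UI.
function drainCleanupQueue(batchSize) {
  cleanupQueue.splice(0, batchSize).forEach(function (node) {
    jQuery(node).remove();
  });
  if (cleanupQueue.length) {
    setTimeout(function () { drainCleanupQueue(batchSize); }, 50);
  }
}
```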
For #2, events bound on the child nodes may still be attached, which can lead to more memory usage than needed. Of course, if you don't have any bound events on the children, it'll work. A workaround for any child events would be to delegate them on the parent container and just remove that one handler when you clear everything.
+1 where it makes sense. However, this doesn't work well with a modular application, and could lead to a lot of higher-level handlers/listeners.
I think the more expensive cost is cleanup, or the lack of proper cleanup leading to memory leaks. I'll usually detach nodes to clean up and put them into a queue that's run in a setTimeout loop, so they don't slow down the UI.
Chrome Dev Tools told him where the time was being spent, but the hunch still had to be there. He still had to know about a previous optimization that had worked, and the change that ended up working began with the words "I wondered…"
Also note that he says "Perceived rendering went down to 960ms". So this is largely done to alter perceptions anyway, not necessarily total throughput or however you'd like to phrase it.
The "translateZ(0)" description is a bit misleading -- I wish he'd provided numbers for the improvement. In general, using composited layers is more expensive (since the CPU still does the rendering of the image, must upload it to a texture, etc.).
It might be a win if the thing you apply it to:
1. Never changes, but the content around it changes often.
2. Is hard to render (lots of shadows, etc).
The layout and paint thrashing fix is a really good optimization, though. You should be able to insert as many things into the DOM as you like without triggering a layout, so long as you don't read back (like consulting offsetLeft). I think the Chrome inspector will mark read-backs with a little exclamation point in the timeline, with a tooltip "synchronous layout forced" and a backtrace to your JS.
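A sketch of the difference, with invented element names; the point is that writes alone leave layout dirty, while each interleaved read forces it:

```javascript
// Good: write, write, write ... then read once. Layout runs at most
// once, when offsetLeft is finally consulted.
function appendCardsFast(container, labels) {
  labels.forEach(function (label) {
    var div = document.createElement('div');
    div.textContent = label;
    container.appendChild(div); // write only: no forced layout
  });
  return container.lastChild.offsetLeft; // single read-back at the end
}

// Bad: reading offsetLeft inside the loop would instead force a
// synchronous layout per iteration -- the "!" markers in Chrome's timeline.
```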
No, translateZ just makes it a composited layer. Hardware comes much later in the pipeline and possibly in another process.
The content of the layer isn't hardware rendered. It's rendered by the CPU and uploaded to a texture. In WebKit and probably Blink there's a fast path for images, canvas and video so that they can be directly uploaded or (on some platforms like Mac) bound to a texture avoiding an upload copy.
Microsoft and (maybe) Mozilla have a "hardware rendering" path via Direct2D, but Chrome and WebKit don't, they have compositors which can use the graphics hardware to perform compositing, but not rendering.
I presume you mean the "drawsAsynchronously" property. I'm extremely curious, does it really push the rasterization on the GPU? I mean, do you have shaders written, that do all the stuff that CPU normally does? Bezier paths, clipping, stroking, filling?
The translateZ trick does not work in general. It works right now in Chrome (and probably Safari). It does not work in Firefox, and it may not work in Chrome in the future. (Because you are trying to trick the browser by gaming its heuristics, and those heuristics might change.)
That hardware rendering is smoother is also not true in general, just in some cases, which the browser will try to guess for you.
Angular doesn't do this (batch DOM updates) - but because Angular's $digest cycle is a batch, it's fairly close (or at least, less bad than Backbone's default behaviour).
Dirty checking and watch execution happens in batches, but the resulting DOM updates are executed ad-hoc within each batch iteration - without any regard for requestAnimationFrame or forced synchronous layouts for instance.
We've had good success with the "queued rendering with interrupts" strategy as well. The 5.9s to 960ms drop is _slightly_ misleading, since a lot of the rendering has yet to be done, but as long as one remembers they're measuring "perceived" rendering time I'm in full agreement.
Other than allowing the browser to paint in the middle, I'd say it's equally (if not more) important that the _.defer calls allow user events to interleave rendering, so you get a bit of scrolling, clicking, hovering, etc. Not doing so is akin to running an intensive operation in the UI thread (for those coming from Swing or Android), and you get a frozen browser page instead.
The one caveat we've seen, though, is your code gets more complicated due to the async rendering. For us the async render was just a subcall in a larger render method, and some later calls relied on the async rendering being complete for some measurement purposes. We had to move those calls to a callback after the queued rendering was done, but ideally only wanted SOME of it to be deferred (some click handlers, etc, we wanted set up earlier so the user could interact with the page), but in a larger codebase you get into a refactoring nightmare, etc etc.
All being said, though, it was probably worth it. :)
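A minimal sketch of that queued rendering with Underscore's _.defer; the chunk size of 30 and the onDone callback are invented for illustration:

```javascript
function renderQueued(items, renderOne, onDone) {
  var i = 0;
  function step() {
    var end = Math.min(i + 30, items.length);
    for (; i < end; i++) {
      renderOne(items[i]); // render one chunk synchronously
    }
    if (i < items.length) {
      _.defer(step); // yield so paints and user events can interleave
    } else if (onDone) {
      onDone(); // anything that measures the rendered DOM belongs here
    }
  }
  step();
}
```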
Isn't the way React handles the DOM perfect for preventing layout thrashing?
As far as I understand, the DOM in React is nearly write-only, so layout thrashing should never occur, but I don't know the exact implementation details.
Please correct me if this is wrong; I couldn't find any hard information on this.
This is correct. Unless you manually add code to touch the DOM in your component methods (which is rarely necessary), React only touches the DOM when it needs to mutate it and more or less does not read from it. For example, when mutating several parts of the DOM, React batches the element creation into a single innerHTML call because that's faster than converting many individual HTML strings into DOM nodes.
> As far as I understand, the DOM in React is nearly write-only
You can still read it if you need. Theoretically, major layout invalidation should only be done in bulk during reconciliation, but I don't know when the rendering phase applies, and React also has component-local state.
I know Om only does rendering on requestAnimationFrame (so at once and synchronised with RAF) but that seems to be done by Om itself and I can't find any clear documentation on that part for React.
Although it may not be up to date if you're between a state change and a render.
React is designed so that you should never need to read from the DOM except things like an input field's value. React specifically tries to never touch the DOM except when doing necessary mutations.
If you're doing complicated layout code, then you will need to read from the DOM. Currently React doesn't have great support for managing layout from JS in a clean, efficient way but it's one of the things we're looking to add in the future.
And yet there is no real reason why render time should scale at all with content hidden below the fold, and I suspect that one second of lag is considerably higher on ARM. To go somewhat off topic, on native platforms there is much more fine-grained control of when things get rendered, without hacks, and taking advantage of parallelism is easier. When will we have a stack (with a WebGL backend or something) that replaces the browser's rendering from the ground up and achieves better efficiency?
> there is no real reason why render time should scale at all with content hidden below the fold
The reason is that CSS layout is exceedingly complex and all of the elements within or even across container boundaries can influence the position and appearance of other elements.
This is a major reason that I'm a believer in the React.js or other "Immediate Mode UI" models. This is how games have worked forever: You simply query your world state for "stuff currently in view" and draw that as fast as you can, caching anything that change infrequently and is expensive to recompute.
> When will we have a stack (with a WebGL backend or something) that replaces the browser's rendering from the ground up and achieves better efficiency?
Actually, if I understand it correctly, famo.us does this, or tries to. For the past couple of years, actually; it could be a hoax by now. IIRC, they want to make things public sometime this year. It's just a tech demo so far, though, which has already been recreated with three.js.
I'm a little surprised they don't do something like iOS does for rendering list views quickly... have a small subset of item views rendered and basically reuse them. You aren't displaying all 7,000 items so while maybe it makes sense to load the data in one shot, does it make sense to load all the DOM elements? Probably not.
Also, I understand why things get slow, but I will never understand why performance benchmarking doesn't seem to exist in many places as part of the QA process. Writing tests and making things work right is usually there, but making sure things are performant and the user has an outstanding experience seems to get left until it's a "problem".
That could be seen as premature optimization. If you don't know things like the average number of cards, or the average for the power users then you don't know if you will have that problem. That would be like saying they should spend time on supporting 30,000 (made up number) cards right now.
Blocked rendering works, but reuse of existing DOM components does not. It's much cheaper to break it into list items and render blocks around your display area, in larger blocks than you would use natively. Detaching from the DOM before changing it is cheaper than item reuse, and it's cheaper still to render multiple fragments and convert and add them in one go.
Slightly off-topic, but I was so glad when as an individual I could pay for Trello. It makes me feel a bit safer that they'll stick around rather than do the ol' shutdown or bought out and shutdown dance.
<<harsh criticism - i feel bad for, yet I also feel the standards are too low in general>>
Pre-rendering the board on the server would have solved his perceived problem immediately.
Then, in more detail:
Layout thrashing is only now a consideration? (Advice: use a mock DOM and see what your operations do in your tests if you decide to handle DOM manipulation yourself.)
As a developer, you only started using the profiler when?
Too many HTTP requests; these can be optimised a lot (yes, I realise there's a CDN, but the TTL there can be managed nicely even for a single delivery).
CSS not renamed and compressed.
Their own JS is badly minified.
Using jQuery, FFS!
Anyway, by the metric "perceived rendering" used here, this would have been solved: simply rendering on the server and giving a 500ms TTL on the CDN would have been faster and would not overburden their servers.
I don't know their stack, so perhaps the next statement is useless: is the API for that big taskboard open, i.e. can I have a stab at it and try to explain and prove what I am talking about?
I wonder how the trello approach of building more complete DOM elements in backbone prior to insertion, to avoid layout thrashing, would compare to using requestAnimationFrame batching? It seems like RAF might allow the browser to see all those DOM "thrashes" as a single render and not try to render them separately, thus speeding them up.
I'm just getting started with using RAF for some JS animations I want to be very high performance, but haven't seen what impact it would have on something as large as a huge trello board.
RAF performance may also be more variable between browsers than simply reducing layout thrashing. At this point, though, I'm speculating. It would be good if someone more knowledgeable did a comparison.
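For comparison, a minimal sketch of RAF batching (the queue API here is invented): all writes queued during a tick are flushed in one requestAnimationFrame callback, so the browser can treat them as a single frame's worth of changes:

```javascript
var writeQueue = [];
var rafScheduled = false;

// Queue a DOM-writing function instead of running it immediately.
function queueWrite(fn) {
  writeQueue.push(fn);
  if (!rafScheduled) {
    rafScheduled = true;
    requestAnimationFrame(flushWrites);
  }
}

// Run every queued write in one animation frame.
function flushWrites() {
  var queued = writeQueue;
  writeQueue = [];
  rafScheduled = false;
  queued.forEach(function (fn) { fn(); }); // all DOM writes land in one frame
}
```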
The reason layout thrashing happens is due to cached layout metrics being invalidated, causing information to be re-computed over and over.
But the layout cache isn't global for the page. Browsers do their best to not invalidate cached layout metrics unnecessarily.
For example, an element's height often depends on its width (due to wrapping of text and other inlines). That height can be expensive to compute because it requires layout and word wrapping of all the element's children. But if you move the element to a different container, and the element's width and cascaded/inherited styles stay the same, some browsers will not invalidate the element's height. (Check out how fast the "Reparent" test is on http://jsperf.com/are-reflows-created-equal in Safari and Chrome.)
So if you find yourself in a situation where layout thrashing is hitting you hard, try to find ways to give the browser more explicit information about your layout, so that layout cache invalidations don't propagate as far. For example, giving parent elements an absolute width and/or height can help a lot.
This way, you can often eke out the performance you need, while avoiding heavy-handed refactoring necessary to always batch DOM changes. (Unfortunately, you'll need to verify the improved performance in all major browsers -- not all will have the same optimizations. It would be great if browser vendors documented their behavior more!)
I didn't compare with simply batching updates, but I've had big wins from using document fragments. In order to allow both batch and incremental updates with Backbone, I simply create document fragments before a batch update. Then I check when adding elements if a fragment exists, or if elements should be added to the DOM directly.
I don't like the big icons. It takes longer for me to read the same amount of information. Don't take my word for it, but there you go.
Is there anything like a standalone version we can run in a large multinational? External cloud-services are no-go for legal reasons. It's really annoying me that my wife and evening-work colleagues are super-sophisticated with kanban and then I come in to work with a shitty to-do spreadsheet.
I've found that the use of documentFragment elements can provide an easy way to avoid reflows. They're easy to use, you can build your stuff on the side and then insert the fragment somewhere in the DOM as needed.
I found this technique made major improvements for a similar task. This only makes sense once rendering is a big bottleneck, of course; you don't want to over-parallelize.
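A minimal sketch of the documentFragment approach (names invented): build the subtree off-DOM, then insert it once, so at most one reflow is triggered:

```javascript
function renderList(container, labels) {
  var frag = document.createDocumentFragment(); // lives off-DOM: no reflows
  labels.forEach(function (label) {
    var li = document.createElement('li');
    li.textContent = label;
    frag.appendChild(li);
  });
  container.appendChild(frag); // one insertion, one layout invalidation
}
```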
Great post! Thanks for sharing. I've made my way with Backbone in a similar way to optimize rendering, and I learned some new tricks from this post. We use Trello on a daily basis to review our app and system development.
DOM size mostly ends up mattering due to crawling your descendant selectors during a style recalculation. That, and not having solid layout boundaries.
Reflow (aka layout thrashing) hurts for sure, yeah.
The article isn't clear on it, but there have been studies (http://baymard.com/blog/making-a-slow-site-appear-fast) that show the perception of fast loading is actually more important than the real thing. Showing the user a "Loading..." graphic is the most common manifestation of this, but there are others. (Unfortunately, the original Forrester-Akamai study seems to be unavailable.)