
We spent a week making Trello boards load fast - mwsherman
http://blog.fogcreek.com/we-spent-a-week-making-trello-boards-load-extremely-fast-heres-how-we-did-it/
======
cmadan
We at ClinchPad, which also uses a card-based layout similar to Trello's, faced
the same issues. Every 100 cards added another 1-2 seconds to the load time,
meaning that at about 700 cards you were looking at close to 10 seconds of
load time.

Here's how I fixed it:

1\. I added pagination - load only 300 cards, and paginate to load more. We
found that while some people had 300+ cards in their default view, most never
actually used it that way, usually applying a filter to bring the number of
cards down to a more manageable level.

That still left about 3-6 seconds of page load time for 300 cards.
Unacceptable. So these were the further steps I took to fix it.

2\. After some rudimentary profiling, the main culprit seemed to be jQuery's
.html('') call. Apparently it does some cleanup which takes a while on a huge
DOM block with lots of attached events. I replaced it by looping through each
child node of the DOM block and removing it with removeChild. That achieved an
8x speedup, bringing loading time down to 1-2 seconds with 300 cards.
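
Roughly this substitution (a sketch; '#pipeline' is a hypothetical container id):

    // Before: $('#pipeline').html('');  -- jQuery walks the whole
    // subtree to clean up its data/events, which is slow on
    // thousands of nodes.
    // After: raw removal, skipping jQuery's per-node cleanup.
    var container = document.getElementById('pipeline');
    while (container.firstChild) {
      container.removeChild(container.firstChild);
    }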

3\. The second culprit was jQuery UI's droppable and draggable initialization,
which was taking a while. I hacked around this by putting that initialization
inside a setTimeout that fires after 100ms. Of course, this doesn't affect the
actual rendering time, but the perceived rendering time is now <1 sec because
the pipeline renders instantly and it is usually 1-2 seconds before the user
does an action on the UI.
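
Something along these lines (illustrative; the selectors and options are made up):

    // Defer the expensive drag-and-drop setup until just after first
    // paint; the user rarely interacts within the first 100ms anyway.
    setTimeout(function () {
      $('.card').draggable({ revert: 'invalid' });
      $('.column').droppable({ accept: '.card' });
    }, 100);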

All in all, fewer than 20 lines of code changed, but it took me two full days
to figure out. At one point, I was looking through jQuery UI's droppable code,
futilely trying to see how I could optimize it. :)

~~~
spicyj
> After some rudimentary profiling, the main culprit seemed to be jQuery's
> .html('') call. Apparently it does some cleanup which takes a while on a huge
> DOM block with lots of attached events. I replaced it by looping through each
> child node of the DOM block and removing it with removeChild.

It's important to let jQuery clean up its event handlers, otherwise you'll end
up with memory leaks.
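
If you do go the raw-removal route, a compromise (a hypothetical sketch, not ClinchPad's code) is to have jQuery unbind first, then remove nodes the fast way:

    // Let jQuery unbind handlers on all descendants so its internal
    // event store doesn't keep references to the removed nodes...
    $(container).find('*').off();
    // ...then do the cheap raw removal.
    while (container.firstChild) {
      container.removeChild(container.firstChild);
    }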

~~~
asolove
This is important to remember but hopefully not relevant. If you have event
handlers on each card instead of delegating them to the container, that's a
performance problem all by itself.

~~~
vonseel
Can you clarify what you mean here? If I were writing a Backbone app, you mean
I should bind to data attributes or similar instead of creating a new view for
each item in a list?

~~~
asolove
If you have a list of thousands of items, attaching event handlers to each of
those items separately is relatively expensive, although it makes the view
code nicer. If you are as serious about performance as these people, you would
want to add delegated events on the parent level. There is a writeup about
doing this in Backbone [1] although I can't vouch for it.

In fairness, getting really good dom performance out of Backbone is hard
because of the lack of built-in delegated event support and the non-
synchronized redraws. If you have big apps that need to be super-fast, either
read up on dom performance practices and adopt some helper libraries along
with Backbone, or try a framework that has its own draw cycle and delegated
event support.

[1] [http://lostechies.com/derickbailey/2011/10/11/backbone-js-
ge...](http://lostechies.com/derickbailey/2011/10/11/backbone-js-getting-the-
model-for-a-clicked-element/)

~~~
crescentfresh
> getting really good dom performance out of Backbone is hard because of the
> lack of built-in delegated event support

It actually is builtin:
[https://github.com/jashkenas/backbone/blob/1.1.0/backbone.js...](https://github.com/jashkenas/backbone/blob/1.1.0/backbone.js#L1072)

In fact it takes a bit of effort to _not_ use event delegation when setting up
your events dict.

~~~
asolove
Well, Backbone handles delegation for elements within the same view. But it
doesn't handle the case where you have lots of instances of the same view
class in a list and want each handler registered only once for the whole list.
You can do this manually by checking out the event target and maintaining a
mapping from dom nodes to views, which you ought to do if you have lists of
hundreds or thousands of repeated subviews. But Ember and various Backbone
plugins can do this for you automatically.

~~~
crescentfresh
> it doesn't handle the case where you have lots of instances of the same view

Wait I'm getting confused. Didn't the title post and your linked article both
say not to do this if performance is the goal? (See "Tuesday" in OP). Isn't
that what we're talking about here? Sorry if I misunderstood.

~~~
asolove
Yeah, I think this is just a terminology problem. Let's just lay out the
complete answer: if you have a list of 1000 things to render in the same way,
you can make it faster in various ways:

1\. Don't render all of them at once, use pagination or infinite scroll.

2\. Don't attach dom events to each item individually, instead attaching one
event listener to the list container and having a way to go from the event's
target to the appropriate view to act on.

3\. Don't even create a view object for each item, just render them with the
same template and have a composite view that knows how to act on each item's
model for the delegated events.

It's important to understand the relative performance benefits of these
changes. #1 will make things faster in proportion to how much smaller the page
is than the full list. #2 will reduce the number of DOM calls by a factor of
the number of items in the list. #3 will reduce the number of JavaScript
objects created. #2 gets you a 10x-100x bigger speedup than #3. JavaScript
fast, DOM slow.
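
A minimal sketch of #2 in plain jQuery terms (cardsById and handleClick are hypothetical names):

    // One delegated handler on the list container instead of one
    // handler per card. "cardsById" is a made-up map from a card's
    // DOM id to its view object.
    $('#list').on('click', '.card', function (event) {
      var view = cardsById[this.id];  // event target -> view
      if (view) {
        view.handleClick(event);
      }
    });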

------
JackFr
The biggest takeaway from this should be that the difference between
Wednesday (0%) and Thursday (90+%) is the difference between optimizing on a
hunch and optimizing with a profiler.

~~~
badman_ting
Chrome Dev Tools told him where the time was being spent, but the hunch still
had to be there. He still had to know about a previous optimization that had
worked, and the change that ended up working began with the words "I
wondered…"

Also note that he says "Perceived rendering went down to 960ms". So this is
largely done to alter perceptions anyway, not necessarily total throughput or
however you'd like to phrase it.

------
randallu
The "translateZ: 0" description is a bit misleading -- I wish he'd provided
numbers for the improvement. In general using composited layers is more
expensive (since the CPU still does rendering of the image, must upload it to
texture, etc).

It might be a win if the thing you apply it to:

1\. Never changes, but the content around it changes often.

2\. Is hard to render (lots of shadows, etc).

The layout and paint thrashing fix is a really good optimization, though. You
should be able to insert as many things into the DOM as you like without
triggering a layout, so long as you don't read back (like consulting
offsetLeft). I think the Chrome inspector will mark read-backs with a little
exclamation point in the timeline, with a "synchronous layout forced" tooltip
and a backtrace to your JS...
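
To illustrate the read-back trap (illustrative code; render is a hypothetical card-to-element function):

    // Anti-pattern: interleaving writes and reads forces a
    // synchronous layout on every iteration ("layout thrashing").
    cards.forEach(function (card) {
      list.appendChild(render(card));     // write: invalidates layout
      var x = list.lastChild.offsetLeft;  // read: forces layout NOW
    });

    // Better: do all the writes, then any reads, so the browser
    // computes layout once at the end.
    cards.forEach(function (card) {
      list.appendChild(render(card));     // writes only
    });
    var positions = Array.prototype.map.call(list.children, function (el) {
      return el.offsetLeft;               // single layout pass here
    });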

~~~
SDGT
The translateZ deal just throws the browser into hardware rendering, which
will run much smoother on any GFX hardware that supports it.

The same thing works with all of the other 3D transforms: putting in a BS
value for Z will cause the element to use hardware acceleration.

~~~
randallu
No, translateZ just makes it a composited layer. Hardware comes much later in
the pipeline and possibly in another process.

The content of the layer isn't hardware rendered. It's rendered by the CPU and
uploaded to a texture. In WebKit and probably Blink there's a fast path for
images, canvas and video so that they can be directly uploaded or (on some
platforms like Mac) bound to a texture avoiding an upload copy.

Microsoft and (maybe) Mozilla have a "hardware rendering" path via Direct2D,
but Chrome and WebKit don't, they have compositors which can use the graphics
hardware to perform compositing, but not rendering.

~~~
bdash
For what it's worth, WebKit on OS X uses hardware acceleration for both
rendering and compositing by way of Core Animation.

~~~
Ciechanowski
Which technologies actually benefit from GPU rendering? Aren't Quartz calls
rasterized on the CPU?

~~~
bdash
Core Animation layers have a mode in which Core Graphics calls targeting them
are both processed asynchronously by another thread and rasterized via
OpenGL.

~~~
Ciechanowski
I presume you mean the "drawsAsynchronously" property. I'm extremely curious:
does it really push the _rasterization_ onto the GPU? I mean, do you have
shaders written that do all the stuff the CPU normally does? Bezier paths,
clipping, stroking, filling?

------
sync
To prevent layout thrashing yourself, you can use this library:
[https://github.com/wilsonpage/fastdom](https://github.com/wilsonpage/fastdom)

Ember.JS (and possibly Angular?) does this for you automatically.
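
What fastdom does, in spirit, is queue your reads and writes separately and flush them in read-then-write order once per frame. A minimal sketch of that pattern (not fastdom's actual source; the measure/mutate names are just chosen to echo its API):

    var reads = [], writes = [], scheduled = false;

    function schedule() {
      if (scheduled) return;
      scheduled = true;
      requestAnimationFrame(function () {
        scheduled = false;
        var r = reads;  reads = [];
        var w = writes; writes = [];
        r.forEach(function (fn) { fn(); });  // all measurements first
        w.forEach(function (fn) { fn(); });  // then all mutations
      });
    }

    function measure(fn) { reads.push(fn); schedule(); }
    function mutate(fn)  { writes.push(fn); schedule(); }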

~~~
zcrar70
Angular doesn't do this (batch DOM updates) - but because Angular's $digest
cycle is a batch, it's fairly close (or at least, less bad than Backbone's
default behaviour).

Dirty checking and watch execution happens in batches, but the resulting DOM
updates are executed ad-hoc within each batch iteration - without any regard
for requestAnimationFrame or forced synchronous layouts for instance.

~~~
Cthulhu_
Would it be possible / beneficial / desirable to add something like this to
Angular?

------
joshma
We've had good success with the "queued rendering with interrupts" strategy as
well. The 5.9s to 960ms drop is _slightly_ misleading, since a lot of the
rendering has yet to be done, but as long as one remembers they're measuring
"perceived" rendering time I'm in full agreement.

Other than allowing the browser to paint in the middle, I'd say it's equally
(if not more) important that the _.defer calls allow user events to interleave
with rendering, so you get a bit of scrolling, clicking, hovering, etc. Not
doing so is akin to running an intensive operation on the UI thread (for those
coming from Swing or Android), and you get a frozen browser page instead.
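
The core of the strategy looks roughly like this (a sketch assuming Underscore's _.defer and a hypothetical renderCard; not Trello's actual code):

    // Render the list in chunks, yielding between chunks so the
    // browser can paint and handle user input (scroll, click, hover).
    function renderQueued(cards, i) {
      i = i || 0;
      var CHUNK = 30;
      cards.slice(i, i + CHUNK).forEach(renderCard);
      if (i + CHUNK < cards.length) {
        // _.defer schedules the next chunk after the current call
        // stack clears, letting events and paints interleave.
        _.defer(function () { renderQueued(cards, i + CHUNK); });
      }
    }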

The one caveat we've seen, though, is that your code gets more complicated due
to the async rendering. For us, the async render was just a subcall in a
larger render method, and some later calls relied on the async rendering being
complete for measurement purposes. We had to move those calls to a callback
run after the queued rendering was done, but ideally we wanted only SOME of it
deferred (some click handlers, etc., we wanted set up earlier so the user
could interact with the page). In a larger codebase that turns into a
refactoring nightmare, etc. etc.

All being said, though, it was probably worth it. :)

~~~
bobbygrace
Yes, this was in perceived rendering time, and yes, the trade-off was worth
it. We kept the async rendering stuff localized so it doesn't complicate the
app much.

------
TN1ck
Isn't the way React handles the DOM perfect for preventing layout thrashing?
As far as I understand, the DOM in React is nearly write-only, so layout
thrashing should never occur, but I don't know the exact implementation
details. Please correct me if this is not correct; I couldn't find any hard
information on this.

~~~
masklinn
> As far as I understand, the DOM in React is nearly write-only

You can still read it if you need[0]. Theoretically, major layout invalidation
should only be done in bulk during reconciliation, but I don't know when the
rendering phase applies, and React also has component-local state.

I know Om[1] only does rendering on requestAnimationFrame (so at once and
synchronised with RAF) but that seems to be done by Om itself[2] and I can't
find any clear documentation on that part for React.

[0] although it may not be up to date if you're between a state change and a
rendering

[1] [https://github.com/swannodette/om](https://github.com/swannodette/om)

[2]
[https://github.com/swannodette/om/blob/master/src/om/core.cl...](https://github.com/swannodette/om/blob/master/src/om/core.cljs#L452)

~~~
spicyj
React is designed so that you should never need to read from the DOM except
things like an input field's value. React specifically tries to never touch
the DOM except when doing necessary mutations.

If you're doing complicated layout code, then you will need to read from the
DOM. Currently React doesn't have great support for managing layout from JS in
a clean, efficient way but it's one of the things we're looking to add in the
future.

~~~
masklinn
> React is designed so that you should never need to read from the DOM

Oh absolutely, sorry if that was not clear.

But sometimes you need to (e.g. to place an overlay or tooltip thing over an
element), and AFAIK that remains possible.

~~~
spicyj
Correct.

------
comex
And yet there is no real reason why render time should scale at all with
content hidden below the fold, and I suspect that one second of lag is
considerably higher on ARM. To go somewhat off topic, on native platforms
there is much more fine-grained control of when things get rendered, without
hacks, and taking advantage of parallelism is easier. When will we have a
stack (with a WebGL backend or something) that replaces the browser's
rendering from the ground up and achieves better efficiency?

~~~
levosmetalo
You might not be that far away from that time. With all the recent asm.js and
WebGL advancements, it might be just a matter of time before some of the
existing desktop GUI frameworks get ported to the web.

Just imagine Qt markup rendered by your browser and glued together by
JavaScript.

~~~
mintplant
That sounds like an accessibility and searchability nightmare. A huge part of
the web's usefulness comes from its structured and standardized nature.

------
benjaminwootton
This post makes me want to move from back end coding to front end or full
stack.

It seems like those guys have much more fun now that more of the cross-browser
pain has been abstracted away.

~~~
joevandyk
There's still _tons_ of pain. _tons_

~~~
dmix
Indeed. Try building a Javascript MVC app. I've been doing it for ~3 years,
since Backbone.js was a tiny unknown project, and it's still painful.

One positive is the lack of old IE version support these days. But then there
is also mobile.

~~~
diggan
If building a JavaScript app (MVC or not) is still a pain for you, you should
check out a more feature-packed framework like AngularJS or EmberJS; you'll
see that most of the pain has been medicated.

~~~
rimantas
Most of the pain will just move elsewhere, with some additional pain related
to said move.

------
programminggeek
I'm a little surprised they don't do something like iOS does for rendering
list views quickly... have a small subset of item views rendered and basically
reuse them. You aren't displaying all 7,000 items so while maybe it makes
sense to load the data in one shot, does it make sense to load all the DOM
elements? Probably not.

Also, I understand why things get slow, but I will never understand why
performance benchmarking doesn't seem to exist in many places as part of the
QA process. Writing tests and making things work right is usually there, but
making sure things are performant and the user has an outstanding experience
seems to get left until it's a "problem".
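
A bare-bones sketch of that UITableView-style recycling (illustrative only; renderInto, rowHeight, and the container setup are assumptions):

    // Keep only enough row nodes to fill the viewport and re-bind
    // them as the user scrolls. Assumes the container is
    // position:relative with a fixed height, holding a tall spacer
    // so scrollHeight covers all data.length rows.
    var rowHeight = 40;
    var pool = [];  // recycled row elements

    function update(container, data) {
      var first = Math.floor(container.scrollTop / rowHeight);
      var count = Math.ceil(container.clientHeight / rowHeight) + 1;
      for (var i = 0; i < count; i++) {
        var row = pool[i] ||
          (pool[i] = container.appendChild(document.createElement('div')));
        row.style.position = 'absolute';
        row.style.top = ((first + i) * rowHeight) + 'px';
        renderInto(row, data[first + i]);  // re-bind, don't re-create
      }
    }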

~~~
danabramov
Airbnb made a JS lib for this: “∞ is a UITableView for the web”.

[http://airbnb.github.io/infinity/](http://airbnb.github.io/infinity/)

~~~
vonseel
Nice find!

------
ryan-allen
Slightly off-topic, but I was so glad when as an individual I could pay for
Trello. It makes me feel a bit safer that they'll stick around rather than do
the ol' shutdown or bought out and shutdown dance.

------
Kluny
Just checked my company's Trello board. Can confirm, it's faster.

~~~
manmal
Yes. Minor nitpick: Vertical scrolling lock-in seems to be disabled, or
scrolling is more sensitive now. I'm constantly scrolling to the side now when
what I want to do is scroll vertically.

------
lennel
<<harsh criticism - I feel bad for it, yet I also feel the standards are too
low in general>>

Pre-rendering the board on the server would have solved his perceived problem
immediately.

Then in more detail:

Layout thrashing only now a consideration? (Advice: use a mock DOM and see
what your operations do in testing, if you decide to handle DOM manipulation
yourself.)

As a developer, you only started using the profiler when?

Too many HTTP requests; this can be much optimised (yes, I realise there's a
CDN, but TTL there can be managed nicely even for a single delivery).

CSS not renamed and compressed.

Own JS badly minified.

Using jQuery, ffs!

>> Anyway, by the "perceived rendering" metric being measured here, simply
rendering on the server and giving a 500ms TTL on the CDN would have been
faster and would not have overburdened their servers. I don't know their
stack, so perhaps this next statement is useless: is the API with the big
taskboard open, i.e. can I have a stab at it and try to explain and prove what
I am talking about?

------
ddorian43
How has MongoDB worked out for you in the long run? Have you looked at TokuMX?

------
examancer
I wonder how the Trello approach of building more complete DOM elements in
Backbone prior to insertion, to avoid layout thrashing, would compare to using
requestAnimationFrame batching. It seems like RAF might allow the browser to
see all those DOM "thrashes" as a single render rather than trying to render
them separately, thus speeding them up.

I'm just getting started with using RAF for some JS animations I want to be
very high-performance, but I haven't seen what impact it would have on
something as large as a huge Trello board.

RAF performance may also be more variable between browsers than simply
reducing layout thrashing. At this point, though, I'm speculating. It would be
good if someone more knowledgeable did a comparison.

------
dbloom
While we're on the topic of layout thrashing...

Layout thrashing happens because cached layout metrics get invalidated,
causing information to be re-computed over and over.

But the layout cache isn't global for the page. Browsers do their best to not
invalidate cached layout metrics unnecessarily.

For example, an element's height often depends on its width (due to wrapping
of text and other inlines). That height can be expensive to compute because it
requires layout and word wrapping of all the element's children. But if you
move the element to a different container, but the element's width and
cascaded/inherited styles stay the same, some browsers will not invalidate the
element's height. (Check out how fast the "Reparent" test is on
[http://jsperf.com/are-reflows-created-equal](http://jsperf.com/are-reflows-
created-equal) in Safari and Chrome )

So if you find yourself in a situation where layout thrashing is hitting you
hard, try to find ways to give the browser more explicit information about
your layout, so that layout cache invalidations don't propagate as far. For
example, giving parent elements an absolute width and/or height can help a
lot.
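
As a toy illustration of that last point (hypothetical code; newColumn and the width value are made up):

    // Pin the element's width before reparenting so its cached height
    // (which depends on text wrapping at that width) can stay valid.
    var card = document.querySelector('.card');
    card.style.width = '270px';    // explicit, container-independent
    newColumn.appendChild(card);   // some browsers skip re-layout here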

This way, you can often eke out the performance you need while avoiding the
heavy-handed refactoring necessary to always batch DOM changes.
(Unfortunately, you'll need to verify the improved performance in all major
browsers -- not all will have the same optimizations. It would be great if
browser vendors documented their behavior more!)

------
axemclion
Do you think adding something like this -
[http://github.com/axemclion/browser-perf](http://github.com/axemclion/browser-perf)
\- into the continuous integration process would help over time?

The project is a NodeJS implementation of the Chromium telemetry smoothness
and loading benchmarks, and the data from it could be used to catch perf
regressions.

I could help with the integration if needed.

------
michaelmior
I didn't compare with simply batching updates, but I've had big wins from
using document fragments. To allow both batch and incremental updates with
Backbone, I simply create a document fragment before a batch update. Then,
when adding elements, I check whether a fragment exists or whether elements
should be added to the DOM directly.
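
Roughly like this (a hypothetical sketch of that pattern; "list" is an assumed target element):

    var fragment = null;

    function beginBatch() {
      fragment = document.createDocumentFragment();
    }

    function addCard(el) {
      // During a batch, append to the off-DOM fragment (cheap);
      // otherwise insert directly for incremental updates.
      (fragment || list).appendChild(el);
    }

    function endBatch() {
      list.appendChild(fragment);  // one reflow for the whole batch
      fragment = null;
    }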

------
hessenwolf
I have a net promoter score of 10 for your product. (It means I am telling
people about your product -
[http://en.wikipedia.org/wiki/Net_Promoter](http://en.wikipedia.org/wiki/Net_Promoter)
\- ignore the criticism, I have inside information.)

I don't like the big icons. It takes longer for me to read the same amount of
information. Don't take my word for it, but there you go.

Is there anything like a standalone version we can run in a large
multinational? External cloud-services are no-go for legal reasons. It's
really annoying me that my wife and evening-work colleagues are super-
sophisticated with kanban and then I come in to work with a shitty to-do
spreadsheet.

------
Jakob
For me it still takes one second in scripting alone on each page load - not
because of heavy scripts, but because of forced layouts[0].

In your code it's mostly adding DOM children (invalidating the layout) and
then getting an offset (thus forcing the layout), about 25 times. The page
could be much more responsive.

[0] [https://developers.google.com/chrome-developer-
tools/docs/de...](https://developers.google.com/chrome-developer-
tools/docs/demos/too-much-layout/)

------
Veejay
I've found that the use of documentFragment elements can provide an easy way
to avoid reflows. They're easy to use: you can build your stuff on the side
and then insert the fragment somewhere in the DOM as needed.

[https://developer.mozilla.org/en/docs/Web/API/DocumentFragme...](https://developer.mozilla.org/en/docs/Web/API/DocumentFragment)

------
gboudrias
I have to say that is very impressive. Progressive rendering is somewhat
obvious, but I didn't know layout thrashing could be this important.

~~~
Too
Progressive rendering is obvious if you have things below the fold. But it
makes sense even if everything is supposed to be visible from the start. The
_total_ download+render time can actually become faster, not just because some
things are not visible. This is because xmlhttprequest is truly asyncronous so
you can parallelize the download and the rendering. This also prevents the
"page seems to be frozen" warning most browsers pop up after being stuck in
javascript for > 3sec.

I found this technique to make _major_ improvements for a similar task. This
only makes sence once rendering is a big bottleneck of course, you don't want
to over-parallelize.

------
xmlninja
Great post! Thanks for sharing. I've made my way with Backbone in a similar
fashion to optimize rendering, and I learned some new tricks from this post.
We use Trello on a daily basis to review our app and system development.

------
elwell
After spending countless hours optimizing HTML5 mobile apps, I find that DOM
size and DOM reflow are usually the main issues.

~~~
paulirish
DOM size mostly ends up mattering due to crawling your descendant selectors
during a Style Recalculation. That and not having secure layout boundaries.
Reflow (aka layout thrashing) hurts for sure, yeah.

------
cheeaun
I'm curious how you measure the performance. Does 7.2 seconds mean load time,
domready time, or something else?

~~~
thedufer
I believe it is the time until the visible area is rendered. Neither domready
nor load time is very meaningful, since all of the interesting rendering
happens in JavaScript, triggered by the onready event.

~~~
chriskottom
The article isn't clear on it, but there have been studies
([http://baymard.com/blog/making-a-slow-site-appear-
fast](http://baymard.com/blog/making-a-slow-site-appear-fast)) that show the
perception of fast loading is actually more important than the real thing.
Showing the user a "Loading..." graphic is the most common manifestation of
this, but there are others. (Unfortunately, the original Forrester-Akamai
study seems to be unavailable.)

