We spent a week making Trello boards load fast (fogcreek.com)
445 points by mwsherman 919 days ago | 94 comments

We at ClinchPad, which also uses a card-based layout similar to Trello, faced the same issues. Every 100 cards added another 1-2 seconds to the load time, meaning that by about 700 cards you were close to 10 seconds of load time.

Here's how I fixed it:

1. I added pagination - load only 300 cards, paginate to load more. We found that while some people had 300+ cards in their default view, most never actually utilised the full view, usually using a filter to bring the number of cards down to a more manageable level.

That still left about 3-6 seconds of page load time for 300 cards. Unacceptable. So these were the further steps I took to fix it.

2. The main culprit after some rudimentary profiling seemed to be jQuery's .html('') call. Apparently it does some cleanup which takes a while on a huge DOM block with lots of attached events. I replaced it by looping through each child node in the DOM block and removing them with removeChild. Achieved an 8x speedup; loading time down to 1-2 seconds with 300 cards.
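The replacement can be sketched like this (a minimal sketch, not ClinchPad's actual code):

```javascript
// Instead of $container.html(''), which walks every descendant to
// clean up jQuery data and event handlers, drop the children directly:
function emptyNode(node) {
  while (node.firstChild) {
    node.removeChild(node.firstChild);
  }
}
```

Note the caveat raised further down the thread: this skips jQuery's cleanup, so if handlers or data are attached to the children via jQuery, they can leak.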

3. The second culprit was jQuery UI's droppable and draggable initialisation, which was taking a while. Hacked around this by putting the jQuery UI droppable and draggable initialisation inside a setTimeout that fires in 100ms. Of course, this doesn't affect the actual rendering time, but the perceived rendering time is now <1 sec because the pipeline loads instantly and it is usually 1-2 seconds before the user does an action on the UI.
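A minimal sketch of the idea (the helper name and selectors are illustrative, not the actual code):

```javascript
// Defer expensive widget setup so the first paint isn't blocked by it;
// the board renders immediately and drag-and-drop arrives ~100ms later.
function deferInit(init, delayMs) {
  setTimeout(init, delayMs || 100);
}

// Usage (class names are made up for illustration):
// deferInit(function () {
//   $('.card').draggable();
//   $('.column').droppable();
// });
```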

All in all, less than 20 lines of code changed, but it took me two full days to figure out. At one point, I was looking through jQuery UI's droppable code, futilely trying to see how I could optimize it. :)

My tips:

Avoid jQuery and use the native ECMAScript 5 APIs, if it's a reasonable target (no IE8 < 9).

Check out http://jsperf.com/popular and use fast (and sane) methods.

What does "no IE8 < 9" mean? No versions of IE less than 8, less than 9, or something else?

No version of IE8 that has a version number lower than IE9.

I am having difficulties understanding the explanation - how can IE8 have a number equal or higher than IE9?

Because IE7 8 9.

it was a typo, I meant IE older than version 9 (IE < 9)

btw, jQuery 2 dropped support for those older IE versions too

> The main culprit after some rudimentary profiling seemed to be jQuery's .html('') call. Apparently it does some cleanup which takes a while on a huge DOM block with lots of attached events. I replaced it by looping through each child node in the DOM block and removing them with removeChild.

It's important to let jQuery clean up its event handlers, otherwise you'll end up with memory leaks.

This is important to remember but hopefully not relevant. If you have event handlers on each card instead of delegating them to the container, that's a performance problem all by itself.

Can you clarify what you mean here? If I were writing a Backbone app, you mean I should bind to data attributes or similar instead of creating a new view for each item in a list?

If you have a list of thousands of items, attaching event handlers to each of those items separately is relatively expensive, although it makes the view code nicer. If you are as serious about performance as these people, you would want to add delegated events on the parent level. There is a writeup about doing this in Backbone [1] although I can't vouch for it.

In fairness, getting really good dom performance out of Backbone is hard because of the lack of built-in delegated event support and the non-synchronized redraws. If you have big apps that need to be super-fast, either read up on dom performance practices and adopt some helper libraries along with Backbone, or try a framework that has its own draw cycle and delegated event support.

[1] http://lostechies.com/derickbailey/2011/10/11/backbone-js-ge...

> getting really good dom performance out of Backbone is hard because of the lack of built-in delegated event support

It actually is built in: https://github.com/jashkenas/backbone/blob/1.1.0/backbone.js...

In fact it takes a bit of effort to not use event delegation when setting up your events dict.

Well, Backbone handles delegation for elements within the same view. But it doesn't handle the case where you have lots of instances of the same view class in a list and want each handler registered only once for the whole list. You can do this manually by checking out the event target and maintaining a mapping from dom nodes to views, which you ought to do if you have lists of hundreds or thousands of repeated subviews. But Ember and various Backbone plugins can do this for you automatically.

> it doesn't handle the case where you have lots of instances of the same view

Wait I'm getting confused. Didn't the title post and your linked article both say not to do this if performance is the goal? (See "Tuesday" in OP). Isn't that what we're talking about here? Sorry if I misunderstood.

Yeah, I think this is just a terminology problem. Let's just lay out the complete answer: if you have a list of 1000 things to render in the same way, you can make it faster in various ways:

1. Don't render all of them at once, use pagination or infinite scroll.

2. Don't attach dom events to each item individually, instead attaching one event listener to the list container and having a way to go from the event's target to the appropriate view to act on.

3. Don't even create a view object for each item, just render them with the same template and have a composite view that knows how to act on each item's model for the delegated events.

It's important to understand the relative performance benefits of these changes. #1 will make things faster in proportion to how much smaller the page size is than the full list size. #2 will reduce the number of DOM calls by a factor of the number of items in the list. #3 will reduce the number of JavaScript objects created. #2 gets you a 10x-100x bigger speedup than #3. JavaScript fast, DOM slow.
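A sketch of what #2 looks like in plain JavaScript (all names are illustrative; the `cardId` property stands in for reading a data attribute off a real DOM element):

```javascript
// One delegated click handler for the whole list: walk up from the
// event target to the enclosing card, then dispatch to that card's view.
function makeListClickHandler(views) {
  return function (event) {
    var el = event.target;
    while (el && !el.cardId) el = el.parentNode; // find enclosing card
    if (el && views[el.cardId]) views[el.cardId].onClick();
  };
}
```

One listener serves the whole list regardless of how many cards it contains, which is where the factor-of-N saving in DOM calls comes from.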

That article advocates instantiating a View per model object in direct opposition to the delegated approach. It acknowledges that there may be performance implications on large collections, just as the title post found. Are you sure that is the article you intended to reference?

The thing is, when you profile a typical Backbone app with a list of views, most of the time (IIRC 60-70% for ~20 events) is spent registering DOM event handlers. Instantiating the view itself is nothing in comparison.

So registering all the events on the container and delegating (in the OOP sense, not the DOM sense) gives you very good performance without sacrificing too much code complexity.

Yup, it's the DOM stuff and not the JS initialization that is the bottleneck. This is why Ember and React can be faster: you still have the same number of JS objects or more, but the automatic event delegation and managed draw cycle minimize DOM changes.

This was going to be my comment as well... better to detach the elements and put them into a queue to be removed completely in a timer loop... this way cleanup happens out of band.

I worked on a rather large ExtJS app, and IE was particularly bad about cleaning up DOM objects with events attached, leading to memory leaks so bad that IE <= 8 would need to be reloaded a couple of times a day. Firefox at the time was much better, and Chrome was really new (so was IE8 at the time).

jQuery is good about keeping track of events and properly cleaning up, but it is expensive... though it's easy enough to .find('> *') and detach them, pushing those items into a cleanup queue.
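That detach-and-destroy-later pattern might look like this (a sketch; `destroyNode` stands in for `$(node).remove()` and `schedule` for a `setTimeout(fn, 0)` loop):

```javascript
// Detach now, destroy later: nodes leave the page immediately, but the
// expensive per-node cleanup is spread out over timer ticks.
function makeCleanupQueue(destroyNode, schedule) {
  var queue = [];
  function drain() {
    if (queue.length) {
      destroyNode(queue.shift()); // release one node per tick
      schedule(drain);            // keep going out of band
    }
  }
  return {
    enqueue: function (nodes) {
      var wasEmpty = queue.length === 0;
      queue.push.apply(queue, nodes);
      if (wasEmpty) schedule(drain);
    }
  };
}
```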

Cool product. Looks similar to Pipedrive[1]. Incidentally, they came before Trello.

[1]: https://www.pipedrive.com/

Thanks! :)

For #2, events attached to the child nodes may still be bound, which may lead to more memory usage than needed. Of course, if you don't have any bound events on the children, it'll work. A workaround for child events would be to delegate them on the parent container and just remove that one handler when you clear everything.

About events: a nice trick is to add event listeners to the window only and check the event.target.

This method is used in game dev a lot.

+1 where it makes sense... however this doesn't work well with a modular application, and could lead to a lot of higher-level handlers/listeners.

I think the more expensive cost is cleanup, or the lack of proper cleanup leading to memory leaks... I'll usually detach nodes to clean up, and put them into a queue that's run in a setTimeout loop... that way they don't slow down the UI.

The biggest takeaway from this should be that the difference between Wednesday (0%) and Thursday (90%+) is the difference between optimizing on a hunch and optimizing with a profiler.

Chrome Dev Tools told him where the time was being spent, but the hunch still had to be there. He still had to know about a previous optimization that had worked, and the change that ended up working began with the words "I wondered…"

Also note that he says "Perceived rendering went down to 960ms". So this is largely done to alter perceptions anyway, not necessarily total throughput or however you'd like to phrase it.

The "translateZ: 0" description is a bit misleading -- I wish he'd provided numbers for the improvement. In general using composited layers is more expensive (since the CPU still does rendering of the image, must upload it to texture, etc).

It might be a win if the thing you apply it to:

1. Never changes, but the content around it changes often.

2. Is hard to render (lots of shadows, etc).

The layout and paint thrashing fix is a really good optimization though. You should be able to insert as many things into the DOM as you like without triggering a layout, so long as you don't read back (like consulting offsetLeft). I think the Chrome inspector will mark read-backs with a little exclamation point in the timeline, with a tooltip "synchronous layout forced" and a backtrace into your JS...
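A toy model (not real DOM code) of why the read-backs matter: a read after a write forces a synchronous layout, so interleaving them costs one layout per read, while batching the writes costs a single layout at the end.

```javascript
// Simulate the browser's layout cache: writes dirty it, and a read
// while dirty forces a layout before it can return a value.
function countForcedLayouts(ops) {
  var dirty = false, layouts = 0;
  ops.forEach(function (op) {
    if (op === 'write') dirty = true;
    else if (op === 'read' && dirty) { layouts++; dirty = false; }
  });
  return layouts;
}
```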

The translateZ deal just throws the browser into hardware rendering, which will run much smoother on any GFX hardware that supports it.

The same thing works with all of the other 3d transforms: Putting in a BS value for Z will cause the element to use hardware acceleration.

No, translateZ just makes it a composited layer. Hardware comes much later in the pipeline and possibly in another process.

The content of the layer isn't hardware rendered. It's rendered by the CPU and uploaded to a texture. In WebKit and probably Blink there's a fast path for images, canvas and video so that they can be directly uploaded or (on some platforms like Mac) bound to a texture avoiding an upload copy.

Microsoft and (maybe) Mozilla have a "hardware rendering" path via Direct2D, but Chrome and WebKit don't, they have compositors which can use the graphics hardware to perform compositing, but not rendering.

For what it's worth, WebKit on OS X uses hardware acceleration for both rendering and compositing by way of Core Animation.

Which technologies do benefit from GPU rendering? Aren't Quartz calls rasterized on CPU?

Core Animation layers have a mode in which Core Graphics calls targeting them are both processed asynchronously by another thread and rasterized via OpenGL.

I presume you mean the "drawsAsynchronously" property. I'm extremely curious, does it really push the rasterization to the GPU? I mean, do you have shaders written that do all the stuff that the CPU normally does? Bezier paths, clipping, stroking, filling?

Oh, nice! For some reason I thought that was only for canvas.

It was only used for canvas in the initial release before being deployed more widely.

The translateZ trick does not work in general. It works right now in Chrome (and probably Safari). It does not work in Firefox, and it may not work in Chrome in the future. (Because you are trying to trick the browser by gaming its heuristics, and those heuristics might change.)

That hardware rendering is smoother is also not true in general, just in some cases, which the browser will try to guess for you.

You should be careful when adding translateZ. If you go beyond the GPU memory it's going to be extremely slow and has a high chance of crashing the app.

How easy is that to do accidentally? 128MB is enough for 16 screenfuls at 1080p. Can you really trigger the creation of that many hardware-composited layers without intending to?

To prevent layout thrashing yourself, you can use this library: https://github.com/wilsonpage/fastdom
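The core idea behind fastdom (a simplified sketch, not its actual API) is to queue reads and writes separately and run all reads before all writes on each flush, so layout is forced at most once per batch:

```javascript
// Read/write batching: measure everything first, then mutate, so the
// layout cache is invalidated once per flush instead of per operation.
// In the real library the flush is scheduled on requestAnimationFrame.
function makeBatcher(flushScheduler) {
  var reads = [], writes = [], scheduled = false;
  function flush() {
    scheduled = false;
    var r = reads.splice(0), w = writes.splice(0);
    r.forEach(function (fn) { fn(); }); // all measurements first
    w.forEach(function (fn) { fn(); }); // then all mutations
  }
  function schedule() {
    if (!scheduled) { scheduled = true; flushScheduler(flush); }
  }
  return {
    read:  function (fn) { reads.push(fn);  schedule(); },
    write: function (fn) { writes.push(fn); schedule(); }
  };
}
```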

Ember.JS (and possibly Angular?) does this for you automatically.

Angular doesn't do this (batch DOM updates) - but because Angular's $digest cycle is a batch, it's fairly close (or at least, less bad than Backbone's default behaviour).

Dirty checking and watch execution happens in batches, but the resulting DOM updates are executed ad-hoc within each batch iteration - without any regard for requestAnimationFrame or forced synchronous layouts for instance.

Would it be possible / beneficial / desirable to add something like this to Angular?

(React does too.)

Meteor UI does it as well.

Didn't know about fastdom, thanks for sharing!

We've had good success with the "queued rendering with interrupts" strategy as well. The 5.9s to 960ms drop is _slightly_ misleading, since a lot of the rendering has yet to be done, but as long as one remembers they're measuring "perceived" rendering time I'm in full agreement.

Other than allowing the browser to paint in the middle, I'd say it's equally (if not more) important that the _.defer calls allow user events to interleave with rendering, so you get a bit of scrolling, clicking, hovering, etc. Not doing so is akin to running an intensive operation on the UI thread (for those coming from Swing or Android), and you get a frozen browser page instead.
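The chunked rendering itself can be sketched as below (not the actual Trello code; `defer` is injected here and would typically be `_.defer` or `setTimeout(fn, 0)`):

```javascript
// Render a long list in chunks, yielding to the event loop between
// chunks so paints and user events can interleave with the work.
function renderQueued(items, renderChunk, chunkSize, defer, done) {
  var i = 0;
  function step() {
    renderChunk(items.slice(i, i + chunkSize));
    i += chunkSize;
    if (i < items.length) defer(step); // yield, then continue
    else if (done) done();
  }
  step();
}
```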

The one caveat we've seen, though, is that your code gets more complicated due to the async rendering. For us the async render was just a subcall in a larger render method, and some later calls relied on the async rendering being complete for measurement purposes. We had to move those calls to a callback after the queued rendering was done, but ideally we only wanted SOME of it deferred (some click handlers, etc., we wanted set up earlier so the user could interact with the page). In a larger codebase you get into a refactoring nightmare, etc. etc.

All being said, though, it was probably worth it. :)

Yes, this was in perceived rendering time, and yes, the trade-off was worth it. We kept the async rendering stuff localized so it doesn't complicate the app much.

Isn't the way React handles the DOM perfect for preventing layout thrashing? As far as I understand, the DOM in React is nearly write-only, so layout thrashing should never occur, but I don't know the exact implementation details. Please correct me if this is not correct; I couldn't find any hard information on this.

This is correct. Unless you manually add code to touch the DOM in your component methods (which is rarely necessary), React only touches the DOM when it needs to mutate it and more or less does not read from it. For example, when mutating several parts of the DOM, React batches the element creation into a single innerHTML call because that's faster than converting many individual HTML strings into DOM nodes.

> As far as I understand, the DOM in React is nearly write-only

You can still read it if you need[0]. Theoretically, major layout invalidation should only be done in bulk during reconciliation, but I don't know when the rendering phase applies, and React also has component-local state.

I know Om[1] only does rendering on requestAnimationFrame (so at once and synchronised with RAF) but that seems to be done by Om itself[2] and I can't find any clear documentation on that part for React.

[0] although it may not be up to date if you're between a state change and a rendering

[1] https://github.com/swannodette/om

[2] https://github.com/swannodette/om/blob/master/src/om/core.cl...

React is designed so that you should never need to read from the DOM except things like an input field's value. React specifically tries to never touch the DOM except when doing necessary mutations.

If you're doing complicated layout code, then you will need to read from the DOM. Currently React doesn't have great support for managing layout from JS in a clean, efficient way but it's one of the things we're looking to add in the future.

> React is designed so that you should never need to read from the DOM

Oh absolutely, sorry if that was not clear.

But sometimes you need to (e.g. to place an overlay or tooltip thing over an element), and AFAIK that remains possible.


There is this React plugin/extension/whatever for using requestAnimationFrame with "plain React" https://npmjs.org/package/react-raf-batching

But as you can see, the documentation is a bit sparse ;)

And yet there is no real reason why render time should scale at all with content hidden below the fold, and I suspect that one second of lag is considerably higher on ARM. To go somewhat off topic, on native platforms there is much more fine-grained control of when things get rendered, without hacks, and taking advantage of parallelism is easier. When will we have a stack (with a WebGL backend or something) that replaces the browser's rendering from the ground up and achieves better efficiency?

> there is no real reason why render time should scale at all with content hidden below the fold

The reason is that CSS layout is exceedingly complex and all of the elements within or even across container boundaries can influence the position and appearance of other elements.

If they did their own custom layout using absolute positioning + JavaScript, then they could easily "virtualize" the items in the container, only rendering those that are currently visible.
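The windowing arithmetic is simple if rows have a fixed height (a sketch; real virtualized lists usually also render a small overscan buffer above and below):

```javascript
// Which rows intersect the viewport? Only these get DOM nodes; the rest
// of the list is represented by empty space via absolute positioning.
function visibleRange(scrollTop, viewportHeight, rowHeight, totalRows) {
  var first = Math.floor(scrollTop / rowHeight);
  var count = Math.ceil(viewportHeight / rowHeight) + 1; // +1: partial rows
  return { first: first, last: Math.min(totalRows - 1, first + count - 1) };
}
```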

This is a major reason that I'm a believer in React.js and other "Immediate Mode UI" models. This is how games have worked forever: you simply query your world state for "stuff currently in view" and draw that as fast as you can, caching anything that changes infrequently and is expensive to recompute.

Yeah. I think with the move towards heavy client-side apps, things like React's model will become the way to go. Possibly even implemented on Canvas... but that's likely a while off.

You might not be that far away from that time. With all the recent asm.js and WebGL advancements, it might be just a matter of time before some of the existing desktop GUI frameworks get ported to the web.

Just imagine Qt markup rendered by your browser and glued together by JavaScript.

That sounds like an accessibility and searchability nightmare. A huge part of the web's usefulness comes from its structured and standardized nature.

This has already happened and has been available for some time - both Qt and GTK have HTML5 backends, which will render to HTML5.

> When will we have a stack (with a WebGL backend or something) that replaces the browser's rendering from the ground up and achieves better efficiency?

Actually, if I understand it correctly, famo.us does this / tries to do it. For the past couple of years, actually; it could be a hoax by now. IIRC, they want to make things public sometime this year. It's just a tech demo so far though, which has already been recreated with three.js.

This post makes me want to move from back end coding to front end or full stack.

It seems like those guys have much more fun now that more of the cross-browser pain has been abstracted away.

There's still tons of pain. tons

Indeed. Try building a Javascript MVC app. I've been doing it for ~3 years, since Backbone.js was a tiny unknown project, and it's still painful.

One positive is the lack of old IE version support these days. But then there is also mobile.

If building a JavaScript (MVC or not) app is still a pain for you, you should check out a more feature-packed framework like AngularJS or EmberJS and you'll see that most of the pain has been medicated.

Most of the pain will just move elsewhere, with some additional pain related to said move.

Due to mobile we've more or less moved back 10 years. Getting things to work consistently across all kinds of mobile browsers is on the same level as struggling with IE6, but with much more randomness.

It's ...interesting. The grass is not as green as it would appear.

No, you should stay in your beautiful world of powerful machines and pure data :)

I'm a little surprised they don't do something like iOS does for rendering list views quickly... have a small subset of item views rendered and basically reuse them. You aren't displaying all 7,000 items so while maybe it makes sense to load the data in one shot, does it make sense to load all the DOM elements? Probably not.

Also, I understand why things get slow, but I will never understand why performance benchmarking doesn't seem to exist in many places as part of the QA process. Writing tests and making things work right is usually there, but making sure things are performant and the user has an outstanding experience seems to get left until it's a "problem".

Airbnb made a JS lib for this: “∞ is a UITableView for the web”.


Nice find!

That could be seen as premature optimization. If you don't know things like the average number of cards, or the average for the power users then you don't know if you will have that problem. That would be like saying they should spend time on supporting 30,000 (made up number) cards right now.

Blocked rendering works, but reuse of existing DOM components does not. It's much cheaper to break it into list items and render larger blocks around your display area than you would if you were native. Detaching from the DOM before changing it is cheaper than item reuse, and cheaper still to render multiple fragments and convert and add them all at once.

Slightly off-topic, but I was so glad when as an individual I could pay for Trello. It makes me feel a bit safer that they'll stick around rather than do the ol' shutdown or bought out and shutdown dance.

Just checked my company's Trello board. Can confirm, it's faster.

Yes. Minor nitpick: Vertical scrolling lock-in seems to be disabled, or scrolling is more sensitive now. I'm constantly scrolling to the side now when what I want to do is scroll vertically.

Same here. Loving the new speed.

<<harsh criticism - I feel bad for it, yet I also feel the standards are too low in general>>

Pre-rendering the board on the server would have solved his perceived problem immediately.

Then, in more detail:

Layout thrashing is only now a consideration? (Advice: use a mock DOM and see what your operations do in your testing if you decide to handle DOM manipulation yourself.)

As a developer, you only started using the profiler when?

Too many HTTP requests; this can be optimised a lot (yes, I realise there's a CDN, but the TTL there can be managed nicely even for a single delivery).

CSS not renamed and compressed.

Their own JS is badly minified.

Using jQuery, ffs!

>> Anyway, server rendering would have solved the perceived-rendering problem by the metrics measured here: simply rendering on the server and giving a 500ms TTL on the CDN would have been faster and would not overburden their servers. I don't know their stack, so perhaps the next statement is useless: is the API with the big taskboard open, i.e. can I have a stab at it and try to explain and prove what I am talking about?

How has MongoDB worked out for you in the long run? Have you looked at TokuMX?

I wonder how the Trello approach of building more complete DOM elements in Backbone prior to insertion, to avoid layout thrashing, would compare to using requestAnimationFrame batching? It seems like RAF might allow the browser to see all those DOM "thrashes" as a single render and not try to render them separately, thus speeding them up.

I'm just getting started with using RAF for some JS animations I want to be very high-performance, but I haven't seen what impact it would have on something as large as a huge Trello board.

RAF performance may also be more variable between browsers than simply reducing layout thrashing. At this point, though, I'm speculating. It would be good if someone more knowledgeable would do a comparison.

While we're on the topic of layout thrashing...

The reason layout thrashing happens is due to cached layout metrics being invalidated, causing information to be re-computed over and over.

But the layout cache isn't global for the page. Browsers do their best to not invalidate cached layout metrics unnecessarily.

For example, an element's height often depends on its width (due to wrapping of text and other inlines). That height can be expensive to compute because it requires layout and word wrapping of all the element's children. But if you move the element to a different container, but the element's width and cascaded/inherited styles stay the same, some browsers will not invalidate the element's height. (Check out how fast the "Reparent" test is on http://jsperf.com/are-reflows-created-equal in Safari and Chrome )

So if you find yourself in a situation where layout thrashing is hitting you hard, try to find ways to give the browser more explicit information about your layout, so that layout cache invalidations don't propagate as far. For example, giving parent elements an absolute width and/or height can help a lot.

This way, you can often eke out the performance you need, while avoiding heavy-handed refactoring necessary to always batch DOM changes. (Unfortunately, you'll need to verify the improved performance in all major browsers -- not all will have the same optimizations. It would be great if browser vendors documented their behavior more!)

Do you think adding something like this - http://github.com/axemclion/browser-perf - into the continuous integration process would help over time?

The project is a NodeJS implementation of the Chromium telemetry smoothness and loading benchmarks and the data from it could check perf regressions.

I could help with the integration if needed.

I didn't compare with simply batching updates, but I've had big wins from using document fragments. In order to allow both batch and incremental updates with Backbone, I simply create document fragments before a batch update. Then I check when adding elements if a fragment exists, or if elements should be added to the DOM directly.

I have a net promoter score of 10 for your product. (it means I am telling people about your product - http://en.wikipedia.org/wiki/Net_Promoter - ignore the criticism, I have inside information)

I don't like the big icons. It takes longer for me to read the same amount of information. Don't take my word for it, but there you go.

Is there anything like a standalone version we can run in a large multinational? External cloud-services are no-go for legal reasons. It's really annoying me that my wife and evening-work colleagues are super-sophisticated with kanban and then I come in to work with a shitty to-do spreadsheet.

For me it still takes one second in scripting alone, not because of heavy scripts but because of forced layouts [0] on each page load.

In your code it's mostly adding DOM children (invalidating the layout) and then getting an offset later on (thus forcing the layout), around 25 times. The page could be much more responsive.

[0] https://developers.google.com/chrome-developer-tools/docs/de...

I've found that the use of documentFragment elements can provide an easy way to avoid reflows. They're easy to use, you can build your stuff on the side and then insert the fragment somewhere in the DOM as needed.
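A sketch of the pattern (the `doc` parameter is only there to make it testable; in a page you'd just pass the global `document`, and the names are illustrative):

```javascript
// Build rows off-DOM in a documentFragment, then insert once: a single
// insertion means at most one reflow instead of one per row.
function appendRows(doc, list, names) {
  var frag = doc.createDocumentFragment();
  names.forEach(function (name) {
    var li = doc.createElement('li');
    li.textContent = name;
    frag.appendChild(li);
  });
  list.appendChild(frag); // the fragment's children move into the list
}
```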


I have to say that is very impressive. Progressive rendering is somewhat obvious, but I didn't know layout thrashing could be this important.

Progressive rendering is obvious if you have things below the fold. But it makes sense even if everything is supposed to be visible from the start. The total download+render time can actually become faster, not just because some things are not visible. This is because XMLHttpRequest is truly asynchronous, so you can parallelize the download and the rendering. This also prevents the "page seems to be frozen" warning most browsers pop up after being stuck in JavaScript for > 3 sec.

I found this technique made major improvements for a similar task. This only makes sense once rendering is a big bottleneck, of course; you don't want to over-parallelize.

Great post! Thanks for sharing. I've made my way with Backbone in a similar way to optimize rendering, and I learned some new tricks from this post. We use Trello on a daily basis to review our app and system development.

After spending countless hours optimizing HTML5 mobile apps, I find that DOM size and DOM reflow are usually the main issues.

DOM size mostly ends up mattering due to crawling your descendant selectors during a style recalculation. That, and not having secured layout boundaries. Reflow (aka layout thrashing) hurts for sure, yeah.

I'm curious: how do you measure the performance? Does 7.2 seconds mean the load time, the domready time, or something else?

I believe it is the time until the visible area is rendered. Neither domready nor load time is very meaningful, since all of the interesting rendering happens in JavaScript, triggered by the onready event.

The article isn't clear on it, but there have been studies (http://baymard.com/blog/making-a-slow-site-appear-fast) that show the perception of fast loading is actually more important than the real thing. Showing the user a "Loading..." graphic is the most common manifestation of this, but there are others. (Unfortunately, the original Forrester-Akamai study seems to be unavailable.)
