Hacker News
Hacking Twitter's Javascript (jazzychad.net)
114 points by jazzychad on Mar 27, 2011 | 32 comments

"Basically, there is a timer that fires every 2 seconds to update timestamps, but each time it fires, it only updates timestamps that end in a certain number, starting at 0 and going up to 9 then repeating. So every 20 seconds all the timestamps update themselves. This is the weirdest thing I have ever seen"

Sounds like a performance optimisation to me. Having all of the timestamps update in one go could block the JavaScript thread for a non-trivial amount of time, making Twitter feel less responsive. This is a neat way of spreading that CPU intensive operation over 20 seconds, leaving gaps for other tasks to run. I imagine they use similar tricks in other places as well.

That was my second thought as well. But if you look at the selector

    $find("._timestamp[data-time$=" + E + "000]")
Surely running this regex-find on DOM elements is not much of a performance saving?

It also has the weird side-effect that some tweets can look older than the tweets below them, making the ordering look incorrect. Meh, not really a big deal, but I was glad to have finally figured out what was really going on there.

While I don't have any information on the selector performance, here is a snippet of an email @bcherry, the author of the self-updating timestamps, sent on some performance findings:

"Unfortunately, as you'd guess, [updating timestamps] is slow, and degrades as more tweets get added to the page. With the last 12 hours of a [test account] timeline up (1,150 tweets), it would take 115ms to regenerate all the timestamps. I'd already built in optimizations to mark >24hoursago timestamps as not needing to be checked again, and to not change the text if the text didn't change, but it was still slow. Anything more than 50ms is too slow for a recurring process.


The unix timestamps appear to be an even distribution in the last digit, so I just do a query on an even slice of them based on the last digit (i.e. $("._timestamp[data-time$=0]")), and update those, changing digits every 2s. This means it takes a 20s cycle to update them all.

On the same [test account] home timeline with 1,150 tweets, each batch of ~110 timestamps took ~30ms, which is totally reasonable."
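
For anyone following along, here is a minimal sketch of the batching the email describes. It is not Twitter's code: plain objects stand in for the DOM elements, and makeEntries, relativeLabel, updateSlice and makeTicker are hypothetical names made up for illustration. The only parts taken from the email are the slice-by-last-digit idea and the "don't change the text if it didn't change" check.

```javascript
// Hypothetical stand-ins for the page's timestamp elements: each entry keeps
// the tweet's Unix time in seconds and the text currently displayed.
function makeEntries(unixSeconds) {
  return unixSeconds.map(function (t) {
    return { time: t, text: "" };
  });
}

// Illustrative formatter, not Twitter's: relative age in whole minutes.
function relativeLabel(nowSeconds, thenSeconds) {
  var minutes = Math.floor((nowSeconds - thenSeconds) / 60);
  return minutes < 1 ? "just now" : minutes + "m ago";
}

// Update only the entries whose timestamp ends in `digit`; this plays the
// role of the [data-time$=digit] attribute selector in the email.
function updateSlice(entries, digit, nowSeconds) {
  var touched = 0;
  entries.forEach(function (entry) {
    if (entry.time % 10 !== digit) return;
    var label = relativeLabel(nowSeconds, entry.time);
    if (label !== entry.text) entry.text = label; // skip unchanged text
    touched++;
  });
  return touched;
}

// Rotate through digits 0..9, handling one slice per call.
function makeTicker(entries) {
  var digit = 0;
  return function tick(nowSeconds) {
    var touched = updateSlice(entries, digit, nowSeconds);
    digit = (digit + 1) % 10;
    return touched;
  };
}
```

Hooked up to a timer it would look something like setInterval(function () { tick(Math.floor(Date.now() / 1000)); }, 2000), which gives the 20s full cycle mentioned above.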

Interesting. So the total cumulative time to update the timestamps like this is 300ms (10 * 30ms), whereas the time to update them all in one shot is 115ms. This way it blocks for 30ms every 2 seconds (or 300ms spread over 20 seconds) instead of 115ms every 20 seconds. I'm not sure which is better, but it does sort of confirm that doing the extra regex matching introduces some amount of extra overhead.

Good point. I'm very curious now about the selector overhead. We do a number of operations across varying intervals. Surely one of the goals behind distributing the timestamp updates is to minimize the time we block this pipeline. That being said, it would be interesting to benchmark a variety of update interval and batch size combinations.

I've got extensive notes from last summer of more things to try. I totally believe we can get them counting second-by-second.

It's not a "regex find" as the string is not a regular expression. The string is a CSS selector. jQuery can find DOM elements by CSS selector very quickly by deferring to document.querySelectorAll when available.

If I were facing this performance issue, I'd check to see if there was a fast-performing way to restrict the query to just DOM elements in the viewport. I would guess that check would take longer than the date-updating code, though.

"$=" is a regex-find operator to find the ._timestamp elements with data-time property that ends in E + "000"

"$" is the same operator in a regular expression to denote that the match must be at the end of the string, and is obviously where some of the attribute selector syntax came from, but this has nothing to do with regular expressions and is likely not implemented using them: http://www.w3.org/TR/css3-selectors/

well, ok fine. it's a regex-like operator, which still has to do full suffix matching, which isn't cheap, performance-wise... that's the point.
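
On the regex-versus-selector point, a small sketch of the difference in semantics (not of how any browser or jQuery implements it). As I read the selectors spec, [attr$=value] is a literal suffix comparison, so characters like "." have no special meaning. attrEndsWith and regexEndsWith are hypothetical helpers written for this comparison only.

```javascript
// What [attr$=value] asks for: a literal, character-by-character suffix test.
function attrEndsWith(attrValue, suffix) {
  if (suffix === "") return false; // per the spec, an empty value matches nothing
  return attrValue.slice(-suffix.length) === suffix;
}

// The regex spelling of "ends with", where "." acts as a wildcard.
function regexEndsWith(attrValue, pattern) {
  return new RegExp(pattern + "$").test(attrValue);
}
```

With the value "1301271000", the suffix ".000" fails the literal test but passes the regex one, because the regex "." matches the "1". Either way it is a comparison against the end of one string per candidate element.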

The DOM performance hit is probably smaller than the network hit (both on the client and at the server side) for doing all at once (if I'm understanding it correctly).

It's all done client-side, so in this case there is no network hit for doing the timestamp updates anyway.

Guess I wasn't understanding it correctly then :)

my guess is the first thing the selector does is query by class, and then filters from that set. likely not as intensive as you think.

yes, I'm guessing it filters by class first as well. but it would be interesting to find out if just updating all the timestamps would be as fast (or faster) than doing the secondary regex filter anyway? i'm not sure, i haven't run any benchmarks on it.

Or, another alternative, add a secondary class so that the overall markup is class="_timestamp _timestamp_group0", etc. Perhaps they tested this and found it to be a negligible improvement, but then you're only selecting on the one class, rather than selecting on class and filtering.
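
A rough sketch of how that secondary class could be derived, assuming data-time holds Unix milliseconds (which the E + "000" in the selector suggests). timestampClasses and groupSelector are hypothetical names, and the _timestamp_groupN class is the one proposed in this comment, not something known to exist on the site.

```javascript
// Hypothetical helper: derive the extra class from the tweet's Unix time in
// milliseconds, using the last digit of the seconds value as the group.
function timestampClasses(timeMs) {
  var group = Math.floor(timeMs / 1000) % 10;
  return "_timestamp _timestamp_group" + group;
}

// Class-only selector for the batch due on a given tick (0 through 9).
function groupSelector(tick) {
  return "._timestamp_group" + (tick % 10);
}
```

The class would be written once when the tweet is rendered, so each 2s tick only needs a class lookup rather than a class lookup plus an attribute filter.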


Looks like the class only helps slightly. It's definitely doing some optimization though.

It is indeed a performance optimization.

cool, thanks for clarifying, dsa :)

Interesting method of getting the current timestamp:

  var I = +new Date;
I've always used:

  var I = (new Date).getTime();

They're the same, +new Date doing an implicit cast to number. I prefer getTime as well for its clarity.

I benchmarked all the variations on Chrome; var foo = +new Date() was the fastest, though not by a huge margin over the rest, and I remember there was one approach that was especially slow.

I did this while optimizing performance of a fast-paced game in Javascript (LineRage). Profiling showed that getting timestamps was responsible for a significant portion of the processing time. I also ended up switching to a global clock approach.

Unfortunately I no longer have the original benchmark code/results. Would be happy to have someone replicate it.

And another form is:

  var I = Number(new Date);

There's also the equivalent Date.now() (as of ES5). If you'd like to write test code that overrides the current Date, this is the easiest version to override. Monkey-patching Date's constructor is a disaster. Plus, backwards compatibility can easily be added (see https://developer.mozilla.org/en/JavaScript/Reference/Global...) to non-ES5-compliant browsers.
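
To make that concrete, a minimal sketch of both halves: the usual fallback for engines without Date.now, and one way to pin the clock in a test. withFixedNow is a hypothetical helper name, not part of any library.

```javascript
// Fallback for engines without ES5's Date.now (the common pattern).
if (!Date.now) {
  Date.now = function () {
    return new Date().getTime();
  };
}

// Hypothetical test helper: run fn with Date.now pinned to fixedMs, then
// restore the original even if fn throws.
function withFixedNow(fixedMs, fn) {
  var original = Date.now;
  Date.now = function () { return fixedMs; };
  try {
    return fn();
  } finally {
    Date.now = original;
  }
}
```

Code that reads the clock through Date.now() picks up the pinned value inside the callback; code that uses +new Date or getTime() does not, which is the argument for preferring Date.now() in code you want to test.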

Regarding the timestamp performance optimization, it seems like the more robust solution is using setTimeout to avoid blocking the main thread with timestamp updating. Basically you perform updates on a set number of nodes and then defer the updates for the rest using setTimeout. This scales to any number of DOM nodes, but the solution outlined in this post works for the vast majority of use cases.

setTimeout does not run in a background thread that can be parallelized. There is no "main thread" vs. "background threads." If you write an endless loop in a setTimeout callback, none of your other JS will ever run. You can easily test this:

  setInterval(function(){ console.log("hi"); }, 500);
  setTimeout(function(){ while(true) { console.log("no more hi"); } }, 100);

Yeah "main thread" was a misnomer/mistake. I should have just said thread. But that is exactly my point: running a long loop will block JS execution, and their solution only avoids the problem in most cases. I think the better optimization is to go through a subset of the timestamps and keep using setTimeout calls to do the remainder of the work until all of the timestamps are updated, e.g. update 80 timestamps in 4 batches of 20. This is not an original idea; it's considered a best practice in client-side programming.

Why wouldn't you use a setTimeout per timestamp set to go off a bit after the next time it should be updated? Seems like that would have a spread distribution and would also minimize the amount of work done.

Not per timestamp. Per x timestamps. For example, update 80 timestamps in 4 batches of 20. This should perform well in all cases because we are doing a predictable amount of work.
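
A minimal sketch of that batching pattern, assuming a positive batch size. processInBatches and its parameters are hypothetical names; the schedule argument is an addition of mine so the flow can be stepped by hand in a test, and it defaults to setTimeout.

```javascript
// Process `items` in batches of `batchSize`, yielding to the event loop
// between batches. `schedule` defaults to setTimeout and is injectable so
// the flow can be driven manually.
function processInBatches(items, batchSize, handle, done, schedule) {
  schedule = schedule || function (fn) { setTimeout(fn, 0); };
  var index = 0;
  function runBatch() {
    var end = Math.min(index + batchSize, items.length);
    for (; index < end; index++) {
      handle(items[index]);
    }
    if (index < items.length) {
      schedule(runBatch); // the rest waits for a later turn
    } else if (done) {
      done();
    }
  }
  runBatch();
}
```

With 80 items and a batch size of 20, the first batch runs immediately and three more are deferred. Each batch still blocks while it runs, as pointed out above; the gain is that other queued work gets a turn between batches and the per-batch cost stays bounded.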

Great work, but why not just contact Twitter directly and ask them to integrate your bugfixes?

If only 'twere so easy.

Have you ever actually tried that? Unless you know people on the inside, finding the right person within a company to contact is virtually impossible. And if you leave the info in the generic contact form, you can be sure it gets lost.
