Drowning in JavaScript (dangoldin.com)
98 points by dangoldin on Dec 2, 2013 | 50 comments



> Even if these libraries get cached in the browser it’s still quite a lot of JavaScript that’s executed every time a site is loaded.

Is it? Thirty-nine libraries that are 35 lines of code each, or that just define a bunch of functions your code can optionally call, may well not take that long to parse. How many gems does the typical Rails application use? Libraries are the web's equivalent, except nobody resolves the dependencies for you; you have to include them manually. Is it that surprising that people use a variety of helper libraries for ads, analytics, DOM manipulation, etc.? That's fairly standard for development.

I think saying we're “drowning in JavaScript” because we've gotten a lot better at modularizing our code instead of copy-pasting it from everywhere is lamenting entirely the wrong part of the problem. I would rather every site include Google Analytics, DoubleClick, jQuery, and 10 jQuery UI plugins than be subjected to the cascade of security and functionality bugs folks would run into if everyone tried to roll their own solution for everything. There are already enough of those even with the use of common code.

Using libraries for third party code is good, and the fact that we're doing it is a step forward, not a step back. Now, there may be room for folks to choose the third party code more wisely, but that's a perennial problem when you're willing to use code that's already been written.


Sure, but it's a leaky abstraction. Downloading 30 .js files (and thus waiting for 30/x HTTP round trips, where x is the number of allowed concurrent connections) before any JavaScript on the page can run is problematic unless you're adding extra complexity by using JS minification etc.


> and thus wait for 30/x HTTP roundtrips, where x is the number of allowed concurrent connections

Which is 1.

This is javascript here, not CSS or images. The default behaviour is to stop everything, download the script file (synchronously) then execute it (also synchronously) then resume. This means the default is to download then execute one JS file at a time (which is why you should always put your CSS files before your JS files).

Concurrent download (let alone out-of-order execution) has to be opted into via specific attributes:

@async will queue the script download (asynchronously) and execute it whenever it can once it's downloaded. Multiple @async scripts may be executed in any order, depending on how fast they arrive and when the browser finally decides to exec them. It is supported in WebKit-ish browsers, Firefox (>= 3.6) and MSIE >= 10.

@defer will also queue the script download, but it guarantees the scripts will only be executed 1. after parsing ends and before DOMContentLoaded fires, 2. in order. It is supported in WebKit-ish browsers, Firefox (>= 3.5) and MSIE >= 10 (MSIE has had it since ~IE5, but as usual its behavior tends to be ill-defined and buggy).

(note: this is for <script src> tags in the downloaded source, not when they're dynamically inserted)
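
A minimal sketch of the three modes, with placeholder filenames:

    <!-- Default: parsing stops, the script downloads and runs, then parsing resumes -->
    <script src="analytics.js"></script>

    <!-- @async: downloads in parallel, runs whenever it's ready; order not guaranteed -->
    <script async src="ads.js"></script>

    <!-- @defer: downloads in parallel, runs in document order after parsing, before DOMContentLoaded -->
    <script defer src="app.js"></script>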


This isn't the case; I just verified it. A test.html page with <script src="dummy32k-a.js"> and another for dummy32k-b.js shows them downloading concurrently in the Chrome devtools network timeline.

It makes sense since the HTTP fetch order doesn't affect semantics as long as you obey the execution order from the page. Unless you're doing something very contrived, such as serving them as no-cache and generating different js dynamically from server side if a previous script has been requested...
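
i.e. something as simple as this reproduces it (script names as above, contents arbitrary):

    <!-- test.html: both requests overlap in the network timeline,
         but execution order is still a.js before b.js -->
    <script src="dummy32k-a.js"></script>
    <script src="dummy32k-b.js"></script>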


Last I checked, browsers (chrome? firefox?) only allow 6 concurrent downloads from a web server. Do all resources load concurrently if you have seven large javascripts in a page?


I'm not sure it's a leaky abstraction per se. But there are definitely issues with it, issues that things like the Google Closure Compiler try to deal with. One of them is this: eventually, Rails realized you should probably pin your application to a particular version of a gem (see Bundler), just like the Java world realized web apps probably shouldn't be resolving dependencies at runtime (see WAR files).

Right now, JS apps are often just serving the latest version of libraries for things like DoubleClick and Google Analytics. Google can update what they're serving at any time without letting you know. If you use a build system to bundle the library into your code, you lose the benefits the user could get from already having the library cached. If you don't, you're exposed to someone else's whims on library updates, and you add HTTP requests to boot.

The same goes for bundling something like jQuery into your app. You can bundle it in, but that data has to be sent down the wire to the client along with the rest of your JS, even if it is just one HTTP request. Or you can include it via a CDN and a lot of users won't have to download it again, despite it being a separate HTTP request. So it's all about choosing the right tradeoffs.
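
One common middle ground, for what it's worth, is a CDN reference with a local fallback (the version and paths here are just illustrative):

    <!-- Try the shared CDN copy first; many visitors will already have it cached -->
    <script src="//ajax.googleapis.com/ajax/libs/jquery/1.10.2/jquery.min.js"></script>
    <!-- If the CDN failed or was blocked, fall back to a copy served with the app -->
    <script>window.jQuery || document.write('<script src="/js/jquery-1.10.2.min.js"><\/script>')</script>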

Also, JavaScript runs as it downloads, in page order. So you don't have to wait for all 30/X round trips, though you may have to wait for X round trips for the last file (in page order) to run. Then it becomes a matter of prioritizing what needs to load first. Naturally this is stuff we wish we didn't have to worry about, but worrying about bandwidth and latency and how and when and in what order things download has always been necessary for network applications, and I don't see that need going away anytime soon. I'd love to be proven wrong though :)


Most properly built sites are already combining the JS and minifying it. It's a largely solved problem that is automated with the right tools. Not much added complexity in it.
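
For instance, a typical Grunt setup from this era (paths and task names are just illustrative; assumes grunt-contrib-concat and grunt-contrib-uglify):

    // Gruntfile.js - concatenate all scripts, then minify the result
    module.exports = function (grunt) {
      grunt.initConfig({
        concat: {
          dist: { src: ['js/vendor/*.js', 'js/app/*.js'], dest: 'build/app.js' }
        },
        uglify: {
          dist: { src: 'build/app.js', dest: 'build/app.min.js' }
        }
      });
      grunt.loadNpmTasks('grunt-contrib-concat');
      grunt.loadNpmTasks('grunt-contrib-uglify');
      grunt.registerTask('default', ['concat', 'uglify']);
    };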


But most sites are NOT "properly built": the devs just cut and paste vast libraries (sometimes more than once) into every site!

Trust me on this: I spent the last 3 years doing analysis on a large number of sites for one of the world's major publishers.


Correct. Nowadays most web application frameworks have a built-in concept of an "asset pipeline". You drop your JavaScripts and stylesheets in a specific folder, and when your application starts in a production environment, everything is concatenated, minified and optimized, so your users have basically exactly one JavaScript file and one stylesheet to download.
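
In Rails, for example, that's just a Sprockets manifest (a rough sketch; the require targets depend on what the app actually uses):

    // app/assets/javascripts/application.js
    // In production these directives are resolved, concatenated, and minified
    // into a single fingerprinted file served to the browser.
    //= require jquery
    //= require underscore
    //= require_tree ./app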


Wasn't HTTP pipelining supposed to fix the round trip problem circa 1999? If everyone has given up on that, maybe HTML5 needs to bring back .jar files for JS or something.


No, every JS element is a separate download, and these JS files are typically 35 KB, not 35 lines.


The worst offender on this list is Adobe Analytics. If it's not loaded, it breaks most websites, so I can only imagine that its calls are ... (shudder) ... synchronous.


I agree with some of the conclusions, namely that many people use Ad Block to avoid the slowness that downloading and executing oodles of JavaScript can create.

The methodology here suffers from an apparent lack of insight into what a 'library' is. Note that tech savvy companies such as Facebook and Twitter are low on the list with only a few 'libraries', while media sites have many listed. All this proves is possibly that Facebook and Twitter have less external JavaScript, and are savvy enough to combine files to reduce HTTP requests.

Listing the total KB of JavaScript executed could help clarify the matter. But then, it would also give an advantage to those who minify over those who don't. Fair enough, as download time is part of the issue - though the article does not mention that file size plays a role as well as the number of HTTP requests. To accurately portray who is using the most JavaScript, you'd have to count opcodes, and combine that with HTTP request count and total download size.


Definitely performance is a nice benefit. I don't use AdBlock; I have a few thousand lines in my /etc/hosts that map various bad actors (vast chunks of Javascript that don't do anything useful, 1x1 .gifs, ads, annoying widgets, etc.) to 255.255.255.255 (NXDOMAIN), and use dnsmasq on that machine to get that benefit across to the rest of the local network, regardless of browser. The thing that usually triggers another bout of "block these hosts" is noticing that my ostensibly idle browser is burning CPU. I don't want to wait several seconds for a page to render, and I certainly don't want to waste laptop battery time on refreshing an iframe with "what people are saying about this on Twitter" or feeding someone's analytics server. Half the time a site runs really slowly, it's doing some nonsense that I'd be happier without, or my browser is waiting to download something that I don't need to see.
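
The setup looks roughly like this (host names are made up; dnsmasq reads /etc/hosts by default, so the same entries answer for the whole LAN):

    # /etc/hosts on the machine running dnsmasq
    255.255.255.255  tracker.example.com
    255.255.255.255  ads.example.net
    255.255.255.255  social-widgets.example.org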

The total amount of Javascript in kB is one issue, but you don't need to go down to the level of opcodes, I think. Counting the nodes in the parse tree would be useful, as would measuring the amount of memory the Javascript allocates and determining whether it runs once, continuously, or on some trigger (e.g., scrolling, hovering, mouse movement). If you ran a profiler and measured the amount of time spent in the Javascript bits of your runtime (e.g., how long to parse the JS, how long to eval it, how often it runs, throw in some memory profiling) you could probably get some interesting numbers.


Yea - this was more of a quick write-up, and it's a good excuse to jump into the PhantomJS world to see what's actually happening. Once I get to that I'll do a follow-up post.
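
Roughly what I have in mind, as a first sketch (counting only script responses; the URL is whatever site you point it at):

    // count-js.js: run with `phantomjs count-js.js http://example.com/`
    var page = require('webpage').create();
    var url = require('system').args[1];
    var count = 0, bytes = 0;

    page.onResourceReceived = function (response) {
      // Each response fires at 'start' and 'end'; only count it once.
      // Matching on content type is rough - some servers mislabel their JS.
      if (response.stage === 'end' && /javascript/.test(response.contentType || '')) {
        count += 1;
        bytes += response.bodySize || 0;
      }
    };

    page.open(url, function () {
      console.log(count + ' JS responses, ~' + bytes + ' bytes');
      phantom.exit();
    });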


Ghostery publishes a lot of this information too :)

http://www.knowyourelements.com/


Ah thanks for that. I failed at Googling for it. Wish there was an easy way to get at the raw data though - a list of these libraries by site.


No worries, your information is super straightforward. Really enjoyed your post.


Thanks. One of these days I'll take a stab at using PhantomJS and trying to pull this data for more sites.


You're definitely using the wrong metric to assess how much JavaScript is used. Look at total weight, not total requests. Just because a site combines and minifies its libraries doesn't mean it's not using them.

As for why publishers use more JavaScript than tech companies, it comes down to the amount of technical talent. Maybe 10% of a digital publisher's employees work on the tech team, as opposed to ~50% for a tech company. Given these resource constraints, JavaScript requests proliferate due to two reasons:

1. With less time, engineers turn to external libraries more often to get things done even when they could address the problem with less code if they wrote it themselves.

2. We can't necessarily spend the time to tune everything and combine/minify libraries for peak performance.

That being said, I'm sorry performance sucks. I'm working on it. Can anyone recommend good tools for analyzing performance bottlenecks on the web?


In actual practice, the total number of requests may have as much or more impact on observed load time than total byte size, depending on the speed and latency of the network. Requests can be expensive.


This Business Insider article has over 100 scripts: https://github.com/gorhill/httpswitchboard/wiki/Quick-tour-%...

For a barely six-paragraph article...


I only use Disconnect because I like interactive sites but don't like being tracked. This blog post reports 11 blocked scripts. So even if you think sprinkling some social stuff onto your blog doesn't put you in this category, it does.


Just so you're aware, Disconnect actively breaks some sites, even when they aren't trying to be forcefully invasive.

I have a site that logs AJAX requests to Google Analytics (since those are essentially the only "page views" on that particular site). I was doing a test for if (_gaq) before trying to push into the array, because I was more concerned with giving the users a good experience than with tracking page views, and that did work if Google Analytics was blocked from loading at all. Unfortunately, I eventually learned that Disconnect was causing that test to throw a JavaScript exception instead of just blocking GA from loading and leaving _gaq undefined. As you can imagine, narrowing down that it was Disconnect causing the problem from a few random users' "the site isn't working" complaints was a lot of fun...

It's not a huge site, but I do spend a couple thousand dollars a year keeping it online. I'll be damned if I'm going to spend that to keep a free site online and then feel bad about running some advertising and analytics to make sure it doesn't end up in the red.


John Resig talks about a similar problem and a potential solution - http://ejohn.org/blog/fixing-google-analytics-for-ghostery/


That's essentially what I was doing previously, but the if (_gaq) test was throwing an unusual reference error when Disconnect was installed. I ended up finding that doing an explicit if (typeof _gaq === 'object') was safe (and then wrapped that in a try/catch for good measure).
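
i.e., something like this (the tracked path is just an example):

    // Safe even when a blocker makes plain references to _gaq throw
    try {
      if (typeof _gaq === 'object') {
        _gaq.push(['_trackPageview', '/ajax/example-view']);
      }
    } catch (e) {
      // Analytics blocked or broken; keep the app working regardless
    }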


+1 for Disconnect over Ghostery


What's the major difference, may I ask?


Mostly just that Disconnect is open-source, and isn't backed by a media & analytics company (Evidon Inc.).

Some find it to be more transparent, and in many ways it is, but really you won't see much of a difference between the function of the two. I personally use Disconnect, and have yet to find it break any sites. I find setting Flash to ask before being enabled on web pages tends to break more sites than anything else.


The major difference for me is the privacy concerns around Ghostery. http://lifehacker.com/ad-blocking-extension-ghostery-actuall...


I was surprised at the number of trackers on the net after I installed Ghostery. News publishers are the worst in my experience - up to 20 ad/tracking plugs on one page. Insane. Sadly, some sites do not work properly if you (via Ghostery) disable them.

Unlike the author, it's not the amount of JS that bothers me, but the number of trackers involved. Companies you've never heard of, collecting loads of data on your online activity.

I was looking for shoes on one site, then days later visited a totally unrelated resource. They showed me an ad with the exact boots I had viewed. Thankfully, with Ghostery this doesn't happen.


Not surprised to see forbes.com at the top of the list of shame. I wonder if they end up losing readers by trying to monetize too much.


How do you define "Javascript libraries"?

I just tried Forbes' front page, and I got 108 scripts being pulled + [all inline JS of top page combined] = 109 scripts.

Just for ad.doubleclick.net, it's 30 external javascript files pulled.

And I got this result without the 19 iframes that were blocked, which would certainly have raised the script count further.


I was relying solely on the Ghostery numbers. In theory I should use a headless browser that executes the JS to track everything it's doing, but that's still on my to-do list.


It seems the author of this article is confusing the delivery of physical JavaScript libraries (as separate files) with number of libraries used.

"Five of the 13 publishers I looked at included at least 20 JavaScript libraries while the most libraries included by a social network was 4, which was Pinterest"

For example, he claims 4 for Pinterest, but I took a quick look and one of those files, bundle.e3e1df0f.js, has many libraries compressed into it (975 KB worth, without gzipping): jQuery, Underscore, Backbone, RequireJS, Google Closure, etc.

Just because a site is packaging up 30 JavaScript libraries into one file, doesn't mean it's not using all these libraries.

I'll also add that I think a lot of the 3rd party tracking libraries don't really work if you bundle them up and deliver them with your own code, which is why they're usually referenced separately.


These sorts of realizations always make me a little bearish on the future. Sure, we've got a thousand times as much bandwidth as we did fifteen years ago, but performance sure hasn't improved a thousand-fold.

Sure, despite the best efforts of the industry, performance improvements will sneak in here or there. Things probably will trend upwards. But so, so much more slowly than they could.


Paul Allen (of Microsoft and Ticketmaster fame) refers to this as a "complexity brake" in his critique of Kurzweil's Singularity concept.

Bandwidth is well and good, but latencies are non-fungible, as are other overheads.



It's not surprising your top 10 sites are mostly news sites. News websites have tons of third-party ads/tracking, and many of them use third-party video/login/comment/sharing systems. These kinds of websites are just not your typical single-page web app.


Seems like an inverse correlation between the ability to make money online and the number of libraries.


I wonder if they're serving the files from different domains? Browsers can only download a few files from one domain at a time. So if there are four sub-domains set up for the 35 files, this would make the JavaScript load quicker.
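
Something along these lines, with hypothetical hostnames pointing at the same static content:

    <!-- The per-host connection limit applies to each hostname separately -->
    <script src="http://static1.example.com/js/lib-a.js"></script>
    <script src="http://static2.example.com/js/lib-b.js"></script>
    <script src="http://static3.example.com/js/lib-c.js"></script>
    <script src="http://static4.example.com/js/lib-d.js"></script>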

Am I right?


Would be interesting to see if the pages are still usable (i.e. you can get to and read the content) if they just load the CSS and HTML.


Including the total size of these libraries and response time would be useful too.


whitehouse.gov minifies, obfuscates, and shards their scripts, giving them anonymizing "filenames." Yes, obviously "files are myths."

I'd rather sites explicitly set scripts so that I can cherry-pick what to block.


Well, it was that way at one point...

[Previous post made from my phone.]


"I suspect most people use AdBlock not because of ads but because of the degraded performance"

Oh come on. Practically speaking, "most people" don't care about the "degradation" of a site due to Javascript. Most people wouldn't even know what that meant.


They don't need to know what it means. They just need to notice "with ads" = slow, "without ads" = fast, and even if the visual presence of ads doesn't bug them, they still see a win with AdBlock.


I think people blame the ads for the performance and experience issues, and that's what causes them to search for ad blockers, rather than getting rid of ads for the sake of getting rid of ads. I might of course be wrong.


> I think that people blame the ads for the performance and experience issues

And they're right. Ads slow things down, especially because many ad networks are still using archaic JavaScript. (Lots of document.write...)
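
The classic blocking pattern looks something like this (the ad server URL is made up); document.write injects a script mid-parse, so the parser has to stop and fetch it synchronously:

    <script>
      // Old-style ad tag: nothing after this point renders until show_ad.js arrives and runs
      document.write('<script src="http://ads.example.com/show_ad.js"><\/script>');
    </script>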


Most explanations of AdBlock are made in terms of HTTP requests, so most people are concerned with performance. They may not know the full story, as the typical web dev likely would, but your characterization is neither off-base nor unjustified.

In any event, MrPleb's dismissive response is unwarranted.





