
System loads web pages 34 percent faster by fetching files more effectively - qwename
http://news.mit.edu/2016/system-loads-web-pages-34-percent-faster-0309
======
tyingq
Interesting. The paper was released before HTTP/2 was in widespread use. They
do show that their approach has significant improvements over SPDY alone...I
wonder how the comparison to HTTP/2 alone would fare.

~~~
breischl
Aren't they orthogonal? No matter how fast HTTP/2 is or how much it decreases
connection setup times, requesting resources in the "right" order will always
be faster than doing it in one of the "wrong" orders.

More efficient protocols might reduce the disparity, but there should always
be one. Right?

~~~
tyingq
Yes, it would always be faster, assuming relatively heavy pages. But the
reduced number of requests, hpack compression, and server push features in
HTTP/2 might reduce the performance improvement to something less impressive.

Given that Polaris has some downsides, it might reduce it to the point that
you wouldn't consider it. In its current state, Polaris requires a lot of
pre-work...including dependency analysis via a real browser. It also serves up
pages that wouldn't work with javascript disabled, and might not be terribly
search engine friendly.

------
qwename
Relevant paper: "Polaris: Faster Page Loads Using Fine-grained Dependency
Tracking",
[http://web.mit.edu/ravinet/www/polaris_nsdi16.pdf](http://web.mit.edu/ravinet/www/polaris_nsdi16.pdf)

~~~
naasking
Argh, don't people Google names before using them? Polaris was HP's virus-safe
Windows project:

[http://www.hpl.hp.com/techreports/2004/HPL-2004-221.html](http://www.hpl.hp.com/techreports/2004/HPL-2004-221.html)

~~~
manarth
The quintessentially hard problem in IT: naming things.

Polaris is also a missile. And a star. And a PowerPC port of Solaris.

We're always going to have name duplication. At least it's fairly easy to
disambiguate with a Google search.

------
leeoniya
but we can already make web pages load 500% faster by not shoveling a ton of
shit, not loading scripts from 60 third-party domains (yes stop using CDNs for
jQuery/js libs, those https connections aren't free - they're _much_ more
expensive than just serving the same script from your existing connection),
reducing total requests to < 10, not serving 700kb hero images, _1.22MB_
embedded youtube players [1], 500kb of other js bloat, 200kb of webfonts,
150kb of bootstrap css :/

the internet is faster than ever, browsers/javascript is faster than ever,
cross-browser compat is better than ever, computers & servers are faster than
ever, yet websites are slower than ever. i literally cannot consume the
internet without uMatrix & uBlock Origin. and even with these i often have to
give up my privacy by selectively allowing a bunch of required shit from
third-party CDNs.

no website/SPA should take > 2s on a fast connection (or > 4s on 3g) to be
fully loaded. it's downright embarrassing. we _can_ and _must_ do better. we
have everything we need _today_.

[1] [https://s.ytimg.com/yts/jsbin/player-en_US-vfljAVcXG/base.js](https://s.ytimg.com/yts/jsbin/player-en_US-vfljAVcXG/base.js)

~~~
chrisfosterelli
> stop using CDNs for jQuery/js libs, those https connections aren't free -
> they're much more expensive than just serving the same script from your
> existing connection

Do you have a source for this? My understanding is that, in real usage, it is
cheaper to load common libraries from a public CDN because, for something like
jQuery, the library is likely to already be cached from a visit to another
website, and the browser may even already have an open SSL connection to the
CDN.

Obviously 60 separate CDNs is excessive, but I don't know if the practice
altogether is a bad idea.

~~~
bartread
Using a CDN for _common_ libraries _may_ help, but doesn't always, and is
something you should measure rather than just assume. The situation where a
CDN actually hurts performance is one I've seen periodically at different
clients.

When people talk about serving jQuery, or J. Random JavaScript library, from a
CDN, they mean the _specific_ _version_ of jQuery (or whatever) that they're
using. There's literally no guarantee that the specific version you need will
be in any given user's browser cache, and this is exacerbated if you're loading
multiple libraries from a CDN, or from different CDNs. If your CDNs serve
files with low latency then it may not be a big problem, but not all CDNs do.
Slow responding CDNs will slow your page loads down, not the reverse.

Moreover, if you're serving over HTTP2/SPDY there's even less likely to be a
benefit to using a CDN. Again, it's something you need to measure.

One area where a CDN (e.g., Cloudflare) can benefit you is by serving _all_
your static content to offer users a low-latency experience regardless of
where they are in the world, but that's rather a different matter from serving
half a dozen libraries from half a dozen different CDNs.

~~~
richmarr
> ... something you should measure rather than just assume

How do people do this?

My obstacle with this point is that each CDN would have a different impact in
each locale due to the locations of its points of presence, and CDN-ing each
resource would have a different impact based on the sites that particular
individual had visited.

Measuring it in any useful way in advance of a change would be really hard
unless I'm missing a trick, and measuring it in hindsight would require the
kind of performance logging that PaaS-level apps don't have access to.

~~~
leeoniya
my recommendation is to first serve everything yourself over TLS (minified,
bundled, gzipped). if you can, run HTTP2. then measure.

if this does not deliver the perf you expect (chances are that it will), then
look into CDNs.

if you need to serve media that needs a lot of bandwidth (expensive) then look
into CDNs.

if you need some form of real-time comm like WebSockets or WebRTC that
actually requires low latency, look into distributed systems (amazon, google,
azure, cloudflare?, etc..)
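
for illustration, a minimal sketch of the "serve it yourself over TLS + HTTP2"
setup, using node's built-in http2 module (cert and file paths are
placeholders):

    // serve one pre-gzipped bundle over HTTP/2; browsers require TLS for h2
    const http2 = require("http2");
    const fs = require("fs");

    const server = http2.createSecureServer({
      key: fs.readFileSync("key.pem"),
      cert: fs.readFileSync("cert.pem")
    });

    server.on("stream", function (stream, headers) {
      // illustration only - a real server would route on headers[":path"]
      stream.respond({
        ":status": 200,
        "content-type": "application/javascript",
        "content-encoding": "gzip"   // the asset was gzipped at build time
      });
      stream.end(fs.readFileSync("dist/bundle.js.gz"));
    });

    server.listen(8443);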

~~~
richmarr
Sure, my question was _how_ people measure.

~~~
leeoniya
your best bet is [https://developer.mozilla.org/en-US/docs/Web/API/Performance](https://developer.mozilla.org/en-US/docs/Web/API/Performance)

combined with either IP2Location or geolocation APIs.
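
for example, something like this collects per-resource timings via the
Resource Timing API (the "/perf" endpoint is made up - substitute your own
collector):

    // gather per-resource network timings once the page has loaded
    window.addEventListener("load", function () {
      var timings = performance.getEntriesByType("resource").map(function (e) {
        return {
          url:   e.name,                                  // e.g. a CDN-hosted script
          dns:   e.domainLookupEnd - e.domainLookupStart, // DNS lookup
          tcp:   e.connectEnd - e.connectStart,           // TCP + TLS setup
          ttfb:  e.responseStart - e.startTime,           // time to first byte
          total: e.responseEnd - e.startTime              // full fetch duration
        };
      });
      // cross-origin entries report zeros unless the server sends a
      // Timing-Allow-Origin header - worth knowing when timing CDNs
      navigator.sendBeacon("/perf", JSON.stringify(timings));
    });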

------
tedunangst
So, uh, what does it do? I mean, I can't even tell if it's a server or client
side change.

~~~
tyingq
The research paper[1] describes Polaris. Basically, you have to make large,
sweeping changes to your html, server side. Instead of your original page + js
references, you serve a bunch of javascript that then dynamically recreates
your page on the client side in the most performant way that it can:

• The scheduler itself is just inline JavaScript code.

• The Scout dependency graph for the page is represented as a JavaScript
variable inside the scheduler.

• DNS prefetch hints indicate to the browser that the scheduler will be
contacting certain hostnames in the near future.

• Finally, the stub contains the page’s original HTML, which is broken into
chunks as determined by Scout’s fine-grained dependency resolution.
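
A rough sketch of what such a stub might look like (every name below is
invented; the real scheduler is far more sophisticated, and the embedded HTML
chunks are omitted):

    <!-- illustrative Polaris-style stub; all names are made up -->
    <html>
    <head>
      <!-- DNS prefetch hints for hostnames the scheduler will contact -->
      <link rel="dns-prefetch" href="//cdn.example.com">
    </head>
    <body>
    <script>
    // the Scout dependency graph, embedded as a variable: each object
    // lists the objects that must finish loading before it may start
    var deps = {
      "style.css": [],
      "app.js":    ["style.css"],
      "data.json": ["app.js"]
    };
    // a toy scheduler: request each object as soon as its dependencies
    // complete, instead of waiting for the parser to discover it
    var completed = {}, requested = {};
    function tryFetch(name) {
      if (requested[name]) return;
      if (!deps[name].every(function (d) { return completed[d]; })) return;
      requested[name] = true;
      fetch(name).then(function () {
        completed[name] = true;
        Object.keys(deps).forEach(tryFetch); // dependents may now be unblocked
      });
    }
    Object.keys(deps).forEach(tryFetch);
    </script>
    </body>
    </html>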

[1] [http://web.mit.edu/ravinet/www/polaris_nsdi16.pdf](http://web.mit.edu/ravinet/www/polaris_nsdi16.pdf)

~~~
manmal
That sounds like a fun idea for a precompiler.

------
GrumpyNl
Talk to some people in the porn industry. They will tell you how important
fast pages are. You will also be surprised what they have done to achieve
this.

~~~
sanxiyn
This sounds potentially interesting. Care to elaborate?

------
mikeytown2
In my experience, preconnect is a big improvement for connecting to third-
party domains. I'll also mention that once you have JS deferred, CSS on a
third-party domain (Google Fonts) can cause some major slowdowns in start-
render metrics when using HTTP/2 on a slow connection: all the bandwidth goes
to the primary domain's connection rather than to blocking resources, with
the end result that images get downloaded before the external blocking CSS.
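
For example, the typical preconnect hints for third-party fonts (Google Fonts
hosts shown for illustration):

    <!-- warm up DNS + TCP + TLS to the third-party origins before the
         render-blocking stylesheet request is discovered -->
    <link rel="preconnect" href="https://fonts.googleapis.com">
    <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
    <link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Roboto">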

------
jakeogh
The web is way better without JS. Rendering engines could in principle do the
same (improved) dependency tracking.

------
kvz
I feel Webpack deserves a mention, as it resolves dependencies at build time
and compiles one (or a few chunked/entry-based) assets, hence also solving the
problem of too many round trips.
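
For example, a minimal config along those lines (paths and names are
illustrative):

    // webpack.config.js - a minimal sketch
    var path = require("path");

    module.exports = {
      entry: "./src/index.js",   // webpack walks import/require() from here
      output: {
        path: path.join(__dirname, "dist"),
        filename: "bundle.js"    // one asset instead of many round trips
      }
    };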

------
WhiteSource1
What are you looking to learn about a CDN?

There are many ways to accelerate page speed and, like everything else, it's a
question of costs and benefits. For most things, some level of technical debt
is OK, and CDNs even for jQuery are good. Of course, good design and setting
things up right are always best - and the other question is where your site
traffic comes from.

------
uaaa
Is there a comparison with Google AMP?

~~~
andrewguenther
Paper was released way before AMP. Also not really related to AMP, so I don't
think it's necessary to draw a comparison.

------
Thiez
> What Polaris does is automatically track all of the interactions between
> objects, which can number in the thousands for a single page. For example,
> it notes when one object reads the data in another object, or updates a
> value in another object. It then uses its detailed log of these interactions
> to create a “dependency graph” for the page.

> Mickens offers the analogy of a travelling businessperson. When you visit
> one city, you sometimes discover more cities you have to visit before going
> home. If someone gave you the entire list of cities ahead of time, you could
> plan the fastest possible route. Without the list, though, you have to
> discover new cities as you go, which results in unnecessary zig-zagging
> between far-away cities.

What a terrible analogy. Finding a topological sorting is O(|V|+|E|), while
the traveling salesman problem is NP-complete.

~~~
to3m
That's amusing, and I wonder if this particular analogy was chosen
deliberately. But I don't think there's anything wrong with it - it's designed
to make intuitive sense to non-programming readers, not to be some rigorous
description that can be automatically translated into optimal code.

~~~
Thiez
Perhaps a different analogy would be better, e.g. ordering materials when
building a house. If you can only order one material at a time, you would
probably want to order the concrete for laying the foundation before ordering
the roof tiles. Loading website resources is a lot like that (at least
compared to the travelling salesman).

~~~
saurik
This still smells NP-hard. I mean, in practice for simple dependencies it is
probably quite tractable, but this is a combinatorial optimization problem
that seems pretty similar to an online modification to Job Shop Scheduling,
where the material requirements map loosely to machine-task pairings that
would be unblocked by orders, acting to make the problem more complex, not
easier.

~~~
Thiez
Your sense of smell is off. Assuming a directed acyclic graph (the usual shape
of a dependency tree), say we write the result order to a list L. Walk over
all vertices once, create a mapping M of each vertex to its number of incoming
edges, and add all vertices without incoming edges to a list R. This is O(|V|
+ |E|). Now, remove the first item of R and append it to L. For each outgoing
edge of the item we chose, decrement the associated count of its receiving
vertex in M by 1. For each vertex whose count becomes 0, append it to R. Keep
running until R is empty, and you're done. This second part of the process is
again O(|V| + |E|).

We've found a topological order in O(|V| + |E|).
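
In code, with the graph given as an adjacency map (a sketch, not tuned):

    // Kahn's algorithm: topological sort in O(|V| + |E|)
    function topologicalSort(graph) {
      var inDegree = new Map(); // M: vertex -> number of incoming edges
      Object.keys(graph).forEach(function (v) {
        if (!inDegree.has(v)) inDegree.set(v, 0);
        graph[v].forEach(function (t) {
          inDegree.set(t, (inDegree.get(t) || 0) + 1);
        });
      });
      var ready = [];           // R: vertices with no incoming edges
      inDegree.forEach(function (count, v) {
        if (count === 0) ready.push(v);
      });
      var order = [];           // L: the resulting order
      while (ready.length > 0) {
        var v = ready.shift();  // remove the first item of R
        order.push(v);
        (graph[v] || []).forEach(function (t) { // decrement each successor
          inDegree.set(t, inDegree.get(t) - 1);
          if (inDegree.get(t) === 0) ready.push(t);
        });
      }
      if (order.length !== inDegree.size) throw new Error("cycle detected");
      return order;
    }

    // Example: the HTML unblocks the CSS and JS; the JS unblocks a data fetch.
    topologicalSort({
      "index.html": ["style.css", "app.js"],
      "style.css":  [],
      "app.js":     ["data.json"],
      "data.json":  []
    });
    // -> ["index.html", "style.css", "app.js", "data.json"]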

~~~
saurik
Yes, I know that topological sort is O(|V| + |E|). What I am claiming is that
the problem of buying building materials in an optimal order isn't topological
sort: there are numerous topological orders of a graph, but some will be much
much slower than others to build at your construction site. To determine which
of the many possible orders is the fastest one, you have to take into account
how long various tasks will take and how many workers you have available for
those tasks. When you get some materials, that unlocks certain parts of the
Gantt chart of what sounds like Job Shop Scheduling. To me, this is the same
form of complaint as pointing out that Traveling Salesman isn't Topological
Sort.

