Polaris: Faster Page Loads Using Fine-grained Dependency Tracking [pdf] (seas.harvard.edu)
99 points by Libertatea on Mar 9, 2016 | 32 comments



This press release is rather light on details. The actual paper can be found here:

http://mickens.seas.harvard.edu/files/mickens/files/polaris....

I haven't completely digested it, but it looks like it works by pre-calculating a dependency graph for each page, which it then tweaks during the real page load depending on current network conditions.



Based on the related paper[1], this appears to be more of a proof of concept than something you might introduce into production.

For example:

"To evaluate a JavaScript file, the scheduler uses the built-in eval() function that is provided by the JavaScript engine. To evaluate HTML, CSS, and images, Polaris leverages DOM interfaces like document.innerHTML to dynamically update the page’s state"

So it relies on eval(), which will slow down JavaScript execution significantly, and on innerHTML updates, which will slow down rendering. It's trading asset download improvements for a rendering slowdown.
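Roughly, a scheduler stub along those lines might look something like this (a minimal sketch, not the actual Polaris code; the graph format and the fetch-everything-up-front strategy are assumptions):

    // Hypothetical scheduler stub: `graph` is a precomputed, topologically
    // sorted list of { url, type } nodes produced offline.
    async function runScheduler(graph) {
      // Kick off every fetch immediately so the network stays busy...
      const bodies = graph.map(node => fetch(node.url).then(r => r.text()));

      // ...but apply assets strictly in dependency order.
      for (let i = 0; i < graph.length; i++) {
        const body = await bodies[i];
        if (graph[i].type === 'js') {
          eval(body);                       // evaluate JavaScript (slow path)
        } else {
          document.body.innerHTML += body;  // splice in HTML, forcing a re-parse
        }
      }
    }

Every one of those innerHTML += calls re-parses the accumulated markup, which is where the rendering cost comes from.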

The high level advice of streamlining the dependency chain of page assets is laudable, but this doesn't sound like the right approach.

[1] http://web.mit.edu/ravinet/www/polaris_nsdi16.pdf


I'm not 100% certain, but it looks like that parsing is meant to be done ahead of time, rather than on the fly?

> "To use Polaris with a specific page, a web developer runs Scout on that page to generate a dependency graph and a Polaris scheduler stub. The developer then configures her web server to respond to requests for that page with the scheduler stub’s HTML instead of the page's regular HTML."

(But even if I've misunderstood that and it is realtime, the network is still a tighter bottleneck than the page renderer in most cases, so it seems like a generally reasonable tradeoff to me.)

I'm curious how well this compares to e.g. a plain old webpack solution though -- ideally you'd have organized your assets well enough in the first place that something like Polaris would be unnecessary. "Ideal" is often a long way from reality, of course.


The calls to eval() and the manipulation of innerHTML happen at runtime: they describe them as what the "scheduler stub" does, which is the runtime code that parses the "processed page" and recreates it.

eval() is a significant concern for performance.


This seems very similar to the system I came up with to reduce load times for JavaScript and CSS assets in Gate One a few years ago:

https://github.com/liftoff/GateOne/blob/master/gateone/core/...

Instead of letting the browser figure out which assets to load--using multiple GET requests--it delivers assets over the existing WebSocket connection and also caches those assets at the client in IndexedDB. So the client only ever has to download JavaScript/CSS assets once unless they change on the server (in which case they'll get re-synchronized automatically).

The mechanism loads much faster than letting the client browser perform all those (latency-unfriendly) GET requests. It is also superior to file concatenation, since you can change any individual asset without forcing every client to re-download one huge concatenated file.
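The client side of that scheme looks roughly like this (a sketch, not Gate One's actual protocol; the message shapes and the idbGet/idbPut IndexedDB helpers are made up for illustration):

    const ws = new WebSocket('wss://example.com/ws');   // placeholder endpoint

    ws.onmessage = async (event) => {
      const msg = JSON.parse(event.data);

      if (msg.type === 'asset_version') {
        // Server announces the current version of an asset it wants loaded.
        const cached = await idbGet(msg.name);
        if (cached && cached.version === msg.version) {
          applyAsset(cached);               // cache hit: nothing to download
        } else {
          ws.send(JSON.stringify({ type: 'get_asset', name: msg.name }));
        }
      } else if (msg.type === 'asset') {
        // Server delivers (or re-synchronizes) the asset source.
        await idbPut(msg.name, msg);
        applyAsset(msg);
      }
    };

    function applyAsset(asset) {
      const el = document.createElement(asset.name.endsWith('.css') ? 'style' : 'script');
      el.textContent = asset.source;
      document.head.appendChild(el);
    }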


Doesn't HTTP/2 solve this by using stream dependencies? It seems a lot cleaner than using JS eval().


Yes, it does. By keeping the connection persistent while assets are loaded you completely do away with the latency introduced by using multiple separate GET requests.

HTTP2 is basically like loading all your assets over an existing WebSocket.


I'm curious to know how much this improves page loads when resource hints are used properly. I may be failing to understand the paper, but it seems to me that strong usage of resource hints plus HTTP/2 dependency-based prioritization makes a lot of this technique obsolete.


There's quite a gap, and a lot of head scratching, between the theory of H2 dependency-based prioritization and the reality of implementation - for example, Chrome hasn't even implemented it yet.

Even with resource hints, H2, etc., the browser is still discovering the dependencies at run time and relying on the developer to hint them correctly to get any speedup.
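In practice, "hinting correctly" mostly means declaring late-discovered dependencies up front with preload hints. You can do that with <link rel="preload"> tags in the HTML, or inject the equivalent from an early inline script; a sketch (the URLs are placeholders):

    // Preload hints for assets the browser would otherwise discover late.
    ['/css/app.css', '/js/app.js', '/fonts/brand.woff2'].forEach((url) => {
      const link = document.createElement('link');
      link.rel = 'preload';
      link.href = url;
      link.as = url.endsWith('.css') ? 'style'
              : url.endsWith('.js')  ? 'script'
              : 'font';
      if (link.as === 'font') link.crossOrigin = 'anonymous';  // font preloads need CORS-mode fetches
      document.head.appendChild(link);
    });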


Except with resources where it's trivial for the server to determine the dependency graph, like ES2015 modules and HTML imports. In these cases the server can push dependencies before the client even receives the initial document.
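For instance, with Node's built-in http2 module, pushing a known module dependency looks roughly like this (a sketch; the file names and paths are placeholders, and this isn't code from the paper):

    const http2 = require('http2');
    const fs = require('fs');

    const server = http2.createSecureServer({
      key: fs.readFileSync('server.key'),
      cert: fs.readFileSync('server.crt'),
    });

    server.on('stream', (stream, headers) => {
      if (headers[':path'] === '/') {
        // Push the module before the client has even parsed the HTML.
        stream.pushStream({ ':path': '/app.mjs' }, (err, pushStream) => {
          if (err) return;
          pushStream.respondWithFile('app.mjs', { 'content-type': 'application/javascript' });
        });
        stream.respondWithFile('index.html', { 'content-type': 'text/html' });
      }
    });

    server.listen(8443);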


I've seen significant script delays from loading files on the server, before anything ever reached the web browser. Apparently the file server was on the network, but not on the same machine as the web server. I was using Smarty as a templating engine, and I had a template for a table row that was loaded many times during the rendering of a particularly long table. It was literally loading the file once for every table row that had to be generated, and that came with far more overhead than I would ever have expected.

I switched the include directive to inline for the table row, but I was using template directives that were not supported inline by the template compiler. In the end I had to add some code to the compiler to get reasonable performance (thank you, open source).


How does it compare with HTTP/2 or even SPDY?


Or websites could just optimize themselves for fast “first render”. It’s not rocket science. Start by making sure your non-critical assets don’t block rendering. (E.g. many large news sites still have FOIT-inducing web fonts, which should be considered an amateur’s mistake at this point.)
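One concrete fix for the font problem, sketched with the CSS Font Loading API (font name and URL are placeholders; the font-display: swap descriptor is the simpler declarative route in browsers that support it):

    // Render text in a fallback font immediately, then swap in the web font
    // once it has loaded, so text is never invisible.
    const brandFont = new FontFace('BrandFont', 'url(/fonts/brand.woff2)');

    brandFont.load().then((loaded) => {
      document.fonts.add(loaded);
      // Stylesheets apply 'BrandFont' only under this class.
      document.documentElement.classList.add('fonts-loaded');
    }).catch(() => {
      // Keep the fallback font if the web font fails to load.
    });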


It's not an either/or. If this helps people who want content from slow websites without requiring the slow website to do very much, and it has no impact on existing websites, then what's the problem?


Couldn't you get close to optimal by just observing the load order in a real page load? Why go to the trouble of constructing the dependency graph?


What I've found helps, if you really need to squeeze that performance out of a server, is setting up a ramdisk for your /var/ directory, or one small ramdisk for every small site you are running.

That, running with lighttpd, will probably be the fastest you can easily get out of a web server.

Also, I try never to edit the ramdisk directly. I commit changes to a git repo and use a small script to pull from the repo and write straight to the ramdisk.


Is the software available to download?


I don't see any place to get it. That's what I've always found funny about academic CS work.


tl;dr anyone? To me, it seems like it tries to automagically optimize poorly written webpages.


> tries to automagically optimize poorly written webpages.

Doesn't this describe all web browser development of the past 15 years?


This is one of the most underrated comments I've seen in a while. I had an audible laugh.


Most web pages have suboptimal designs and dependency downloads. This system speeds things up by reordering the dependencies so that pages can be rendered more quickly.

If you do it manually, you can generally achieve more significant speedups, but that requires time, specialized knowledge, and attention to a not-very-glamorous area of web development.

An example might be an image on a hidden tab where the HTML doesn't give the browser enough information to know it isn't required for the initial render. If the request for that image can be deferred until after the initial render, the end user has a better experience.
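A minimal sketch of that kind of deferral (the data-src convention is hypothetical; the markup carries the real URL in data-src instead of src so the browser doesn't fetch it eagerly):

    // Markup (hypothetical): <img data-src="/img/chart.png" alt="...">
    window.addEventListener('load', () => {
      // Initial render and critical requests are done; start deferred loads.
      document.querySelectorAll('img[data-src]').forEach((img) => {
        img.src = img.dataset.src;
        img.removeAttribute('data-src');
      });
    });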

When the web was limited to 28K modem speeds for a significant share of the user base, people paid a lot more attention to these kinds of things. Now that a significant share of end users have broadband, web developers are more concerned with other things, because steps as simple as leveraging a CDN and sending appropriate cache-control headers are enough optimization to provide adequate performance.

It's not clear on my initial read whether this is a new rendering engine, a client-side script, or a server-side parser that reshapes output, but the intention is obvious - download things in the right order to get the fastest page load times. I've done this kind of optimization before, and it can make an enormous difference - especially for bandwidth-challenged end users.


It changes the page fairly dramatically. Before rolling out a page, you run it through their parser, which creates skeletal HTML and a JavaScript stub that you serve in place of the original HTML.

Then, when this substituted page is loaded, their JavaScript rebuilds the original web page with calls to eval() and whatever.innerHTML, in a way that supposedly streamlines the dependencies.


Ugh, just what we need - more pages that don't work without JavaScript :(. You know what really speeds up the web? Turning off JavaScript...


InstartLogic has been delivering a service that does something similar for a few years - not sure how good it is, though.



I never imagined there could be an essay that funny on the failure of Dennard scaling (https://en.wikipedia.org/wiki/Dennard_scaling) and the period immediately preceding it when hardware designers had more transistors than they knew what to do with.

Yeah, thinking about it:

386: we can put a 32 bit CPU on a die

486: we can improve that, add a serious cache, and FPU on die

Pentium: more than one instruction at a time

Pentium Pro: and out of order

Pentium 4: No one listened to those warning that “THE MAGMA PEOPLE ARE WAITING FOR OUR MISTAKES.” (Intel's high-level engineering management frequently screws up big; see also the multiple memory-architecture errors, two of which resulted in recalls of a million parts each.)


The Mickens of fame and fortune.

I aspire to someday be the kind of guy who gets the definite article prepended to his name


> I aspire to someday be the kind of guy who gets the definite article prepended to his name

Some comedian (Demetri Martin, maybe?) called this "the American version of royalty."


At least it's not hereditary.


That's not necessarily something to aspire to. For example, many people refer to Mr. Trump as The Donald.



