This might work better as an RFC 3229[1] (Delta encoding in HTTP) implementation -- putting the cache information in the query string is strange when HTTP has a bunch of headers dedicated to it.
Cloudflare has a similar solution called Railgun[2] for updating dynamically generated pages. "reddit.com changes by about 2.15% over five minutes and 3.16% over an hour. The New York Times home page changes by about 0.6% over five minutes and 3% over an hour. BBC News changes by about 0.4% over five minutes and 2% over an hour."
Out of curiosity, I went ahead and calculated the delta between the minified versions of jQuery 2.2.3 and 2.2.4.
It generated a 512-byte delta string; compared with downloading the new 83 KB version in full, 512 bytes seems like a pretty significant optimisation. The 2.2.0 -> 2.2.4 delta came to 2,300 bytes.
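For anyone who wants to reproduce that kind of measurement, here's a rough sketch using Google's diff-match-patch library in Node -- I don't know which diff algorithm sw-delta actually uses, so treat this as an illustration of the general idea rather than its implementation (the file names are assumed):

    // npm install diff-match-patch
    // Assumes the two minified builds have been saved locally first.
    const fs = require('fs');
    const DiffMatchPatch = require('diff-match-patch');

    const oldJs = fs.readFileSync('jquery-2.2.3.min.js', 'utf8');
    const newJs = fs.readFileSync('jquery-2.2.4.min.js', 'utf8');

    const dmp = new DiffMatchPatch();
    // Build a patch (delta) that turns the old file into the new one.
    const patches = dmp.patch_make(oldJs, newJs);
    const patchText = dmp.patch_toText(patches);

    console.log('full file:', Buffer.byteLength(newJs), 'bytes');
    console.log('delta:    ', Buffer.byteLength(patchText), 'bytes');

    // On the client, the same library can reapply the patch:
    // const [updated] = dmp.patch_apply(dmp.patch_fromText(patchText), oldJs);

The exact delta size depends on the diff algorithm used, so don't expect to hit 512 bytes exactly.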
I haven't seen a lot of great service worker uses so far but this seems plausible. Good job.
I looked into bsdiff, and apparently it internally uses bzip2 for compression of several components independently. As a quick hack, I modified it to output all those pieces uncompressed (producing a large file of mostly 0s), and then tried compressing the result with various compressors, both to test other compressors, and to compress the entire file as one unit rather than as separate components.
The result ("ubsdiff" is bsdiff without compression):
So, 376 bytes for unmodified bsdiff, 309 bytes by compressing the whole uncompressed bsdiff file with bzip2, 264 bytes by compressing the whole uncompressed bsdiff file with brotli, and (strangely) 252 bytes with quality 9 brotli (the default is 11).
In this case, bzip2 of the whole file did noticeably worse than bsdiff's compression of three separate components. brotli of the whole file still won, though. Which made me wonder if brotli of the individual components would do better than brotli of the whole file. Turns out it does: 8006 bytes.
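For anyone who wants to repeat the whole-file recompression step, something like the following works in Node. It assumes you've already produced an uncompressed patch file (the hypothetical ubsdiff output described above), and it uses Node's built-in brotli plus gzip as a stand-in comparison, since bzip2 isn't in the standard library:

    // Assumes 'jquery.ubsdiff' is an uncompressed bsdiff-style patch file
    // produced separately (e.g. by a bsdiff build with its bzip2 step removed).
    const fs = require('fs');
    const zlib = require('zlib');

    const raw = fs.readFileSync('jquery.ubsdiff');

    const brotliAt = (quality) =>
      zlib.brotliCompressSync(raw, {
        params: { [zlib.constants.BROTLI_PARAM_QUALITY]: quality },
      }).length;

    console.log('uncompressed:', raw.length, 'bytes');
    console.log('gzip -9 (not bzip2, but built in):', zlib.gzipSync(raw, { level: 9 }).length, 'bytes');
    console.log('brotli -q 11:', brotliAt(11), 'bytes');
    console.log('brotli -q 9: ', brotliAt(9), 'bytes');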
I've heard that Google's Inbox (and likely other websites) uses this technique, although the implementation AFAIU is not open source.
...actually, I think theirs is different, in that it doesn't depend on service workers; AFAIU the approach is:
if you move from js-lib-v1.js to js-lib-v2.js, they'll go ahead and source js-lib-v1.js in the browser, and then also load js-lib-v1-to-v2.js, which is a server-side generated JS file that redeclares/redefines only the JS functions/modules/whatever that have changed from v1 to v2.
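As a purely made-up illustration of that idea (none of this is Google's actual code or file layout):

    // js-lib-v1.js (already loaded in the browser)
    var lib = {
      formatDate: function (d) { return d.toISOString(); },
      sum: function (a, b) { return a + b; },
    };

    // js-lib-v1-to-v2.js (generated on the server from a semantic diff of v1 and v2)
    // Only formatDate changed between v1 and v2, so only it gets redefined;
    // sum is left alone and never re-sent over the wire.
    lib.formatDate = function (d) { return d.toLocaleDateString(); };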
So I believe their approach is much more intricate, because it diffs the JS at a semantic level to generate the "patch the already-loaded JS with another JS load" file, vs. AFAICT your approach of just doing a textual diff.
Assuming I've understood both approaches correctly, I definitely prefer yours in terms of simplicity, although the Google approach is (or was?) necessary to benefit most users.
I'm reminded of https://github.com/google/module-server/blob/master/README.m... from a (linked) 2012 presentation. I believe I found it in 2013; there may be more recent talks. If I recall correctly, the unique feature of the proposed Google loader was that it dynamically took into account a dependency graph of the next JS to load and served up just enough to show whatever page you needed, and if the next page in the graph had different dependencies, it would load only the ones you didn't already have. I can't remember the details right now; it's been a few years. It made tremendous sense at the time, but the implementation might need updating given HTTP/2 and more modern JS loaders, including the WHATWG loader spec.
I often see packages or solutions to browser/server problems on Hacker News. Why don't the people who make browsers or servers implement these things as a standard? For example, so many people seem to use jQuery/React -- why not just implement some of their functionality natively in the browser?
There seems to be a trend of making unopinionated software that acts more as a sandbox for downstream developers. I think that's great, because we are now seeing great solutions come out of that sandbox. But at what point do we reach a general consensus and implement some of these solutions natively, to remove the resource overhead?
A number of jQuery's core innovations have been implemented in the browser; document.querySelector is the clearest example, I think.
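For instance, selecting elements, which once required jQuery's selector engine, is now built in:

    // What once needed jQuery's selector engine...
    var items = $('#nav .item');

    // ...is now part of the platform (a NodeList rather than a jQuery object):
    var nativeItems = document.querySelectorAll('#nav .item');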
The big push in browser standards is the Extensible Web Manifesto[1]. The idea being that we should first give web developers powerful and general purpose primitives to explore and build solutions with, because there are many more web devs than there are browser implementors, and the standardization process is slow. We can then explore standardizing the most successful and useful results.
This project uses Service Worker, one of the canonical extensible web APIs, as it gives sites tremendous control over their network use.
One of the reasons the web has succeeded so much more broadly than other UI toolkits, like Windows.Forms or Cocoa, is that the web is very underspecified. The lack of a comprehensive framework has led to a proliferation of frameworks, each with different strengths and weaknesses. Competition between them has created thousands upon thousands of tools optimized for very specific use cases.
Because there are strong standards on a platform like iOS, you see almost no competition in many parts of the stack, and so that platform is limited by Apple's developer resources and constrained by the necessity of designing the architecture for the lowest common denominator.
The general philosophy in web standards groups has been to start by providing the simplest possible API that gives developers access to the functionality they need, let developers build frameworks on top of that, and then, as common use cases emerge, back-port the most widely used features into the API.
Do you really have to make them standard? There aren't that many versions of most of these libraries, so caching every common version of jQuery is pretty trivial. The problem is that many people either host it themselves or use one of many CDNs, which exposes users to the tracking those CDNs can do. Why don't browsers just cache based on the integrity hash of CSS and JS links, when one is present, instead of the URL? Then everybody could host their own copies for users who don't already have the file, and cache hits would be much more common.
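Browsers don't key their caches that way today, but a service worker can approximate the idea, since a request's integrity metadata is exposed on the request object. A rough sketch (the cache name is made up, and I haven't verified that every browser populates request.integrity inside fetch events):

    // sw.js -- serve any subresource that carries an integrity attribute from a
    // cache keyed by the hash itself, regardless of which URL or CDN served it.
    self.addEventListener('fetch', (event) => {
      const integrity = event.request.integrity; // e.g. "sha384-..."
      if (!integrity) return; // no integrity metadata: let the browser handle it

      event.respondWith((async () => {
        const cache = await caches.open('sri-keyed-v1'); // hypothetical cache name
        const key = '/__sri/' + encodeURIComponent(integrity); // synthetic cache key
        const hit = await cache.match(key);
        if (hit) return hit;

        const response = await fetch(event.request);
        if (response.ok) await cache.put(key, response.clone());
        return response;
      })());
    });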
Where jQuery was once a necessity for building complex, dynamic, cross-browser web pages, on today's browsers the popular parts of jQuery mostly just make the syntax slightly shorter and fix tiny inconsistencies.
jQuery's functionality has for the most part been integrated into modern browsers; jQuery lives on because of momentum, its ecosystem, and the desire to support older browsers.
React meanwhile is popular in some places, but hasn't reached nearly the age and ubiquity that would warrant a native browser implementation.
Shadow DOM is about hiding the internals of a particular element, and thus a tool for web componentization and encapsulation. IIRC, while React encourages building reusable, encapsulated components, it doesn't hide their internal state from view [1].
Perhaps you're thinking of the Virtual DOM, a technique for applying only the diffed state changes to the real DOM? That isn't actually standardized (yet), although implementations outside of React exist.
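To make the distinction concrete, Shadow DOM is a real, standardized API for hiding an element's internals; for example:

    const host = document.createElement('div');
    document.body.appendChild(host);

    // With mode: 'closed', host.shadowRoot is null and outside code
    // (and outside CSS selectors) can't reach into the widget's internals.
    const shadow = host.attachShadow({ mode: 'closed' });
    shadow.innerHTML = '<style>p { color: teal; }</style><p>internal markup</p>';

    console.log(host.shadowRoot);              // null
    console.log(document.querySelector('p'));  // does not match the shadow <p>

The Virtual DOM, by contrast, is a library-level diffing technique with no dedicated browser API.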
Excellent use of service workers -- are there any sites actively using service workers in production, or have we yet to see them applied in the mainstream?
I used Service Workers in production for a client to do offline mode and also to get truly instant page reload times for their SPA, which was important for kind of obscure reasons.
The Service Worker saved a complete HTML rendering of the current state of the React app, which was then served on reload, so that the correct view showed up even before any JavaScript was loaded.
Then in the next "requestIdleCallback", the React app was initialized with store data that was also cached.
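A minimal sketch of that snapshot pattern, with made-up names (the real setup obviously had more to it):

    // In the page: after each render, hand the SW a snapshot of the current HTML.
    if (navigator.serviceWorker.controller) {
      navigator.serviceWorker.controller.postMessage({
        type: 'snapshot',
        html: document.documentElement.outerHTML,
      });
    }

    // In sw.js: store the snapshot and serve it for navigations.
    self.addEventListener('message', (event) => {
      if (event.data && event.data.type === 'snapshot') {
        event.waitUntil(
          caches.open('html-snapshot').then((cache) =>
            cache.put('/app-shell', new Response(event.data.html, {
              headers: { 'Content-Type': 'text/html' },
            }))
          )
        );
      }
    });

    self.addEventListener('fetch', (event) => {
      if (event.request.mode === 'navigate') {
        event.respondWith(
          caches.match('/app-shell').then((hit) => hit || fetch(event.request))
        );
      }
    });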
It only works on Firefox and Chrome, so it's great for performance improvements or if you control the client's browser setup.
Other than that, try looking at your browser's debug pane for service workers and you'll see if any site has installed them. In my experience there are quite a few.
ProductHunt uses them to send notifications when you're off the site, which is a bit annoying, but maybe I said yes to it at some point...
How does this interact with the browser's cache? For each potentially-cacheable request, the browser goes to its cache and looks up the entry by method and URI. When no hit is found, it forwards the request to the server, then depending on properties of the response, it may cache that response. At what point in the flow does the sw-delta-client code intercept the request? (Before the browser cache, or after the browser cache but before the web request?)
The sw-delta-client rewrites the URL, and the altered request is sent to the server. The server responds -- let's assume with a cacheable response -- and the browser's cache gets updated.
Next time, we request the same URI, but it's already cached and not stale, so it can be served right away from cache without going to the server. Does sw-delta-client intercept such a served-from-cache request? What if the cached entry is stale and needs to be revalidated with a conditional GET to the server? Does it intercept revalidation requests?
I haven't looked into the code yet, but the service worker has almost complete control over network requests and caching. It can calculate the correct response for a url (e.g. based on this delta encoding system), cache it, and use it in place of or alongside network requests in order to respond to a browser request.
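I haven't read sw-delta's code either, and its actual URL scheme and diff format are its own, but the general flow such a delta-fetching service worker can implement looks roughly like this (the cache name, query parameter, and applyDelta are all placeholders):

    // sw.js -- generic sketch of a delta-fetch flow, not sw-delta's actual API.

    // Stand-in for a real patch function (sw-delta has its own diff format);
    // here the "delta" is simply treated as the full new body.
    const applyDelta = (oldBody, delta) => delta;

    self.addEventListener('fetch', (event) => {
      const url = new URL(event.request.url);
      if (!url.pathname.endsWith('.js')) return; // only handle scripts in this sketch

      event.respondWith((async () => {
        const cache = await caches.open('delta-cache');
        const cached = await cache.match(url.pathname); // the version we already have

        if (!cached) {
          // Nothing to diff against: fetch the full file and remember it.
          const full = await fetch(event.request);
          await cache.put(url.pathname, full.clone());
          return full;
        }

        // Tell the server which version we already have, so it can answer
        // with a small delta instead of the full file.
        url.searchParams.set('cached_version', cached.headers.get('etag') || '');
        const delta = await (await fetch(url.toString())).text();

        // Rebuild the new file locally, cache it, and hand it to the page.
        const newBody = applyDelta(await cached.text(), delta);
        const rebuilt = new Response(newBody, {
          headers: { 'Content-Type': 'application/javascript' },
        });
        await cache.put(url.pathname, rebuilt.clone());
        return rebuilt;
      })());
    });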
[1]: https://tools.ietf.org/html/rfc3229
[2]: https://blog.cloudflare.com/efficiently-compressing-dynamica...