
SW-delta: an incremental cache for the web - instakill
https://github.com/gmetais/sw-delta
======
Scaevolus
This might be better as an RFC 3229[1] (Delta encoding in HTTP)
implementation -- putting the cache information in the query string is strange
when HTTP has a bunch of headers dedicated to it.

Cloudflare has a similar solution called Railgun[2] for updating dynamically
generated pages. "reddit.com changes by about 2.15% over five minutes and
3.16% over an hour. The New York Times home page changes by about 0.6% over
five minutes and 3% over an hour. BBC News changes by about 0.4% over five
minutes and 2% over an hour."

[1]: [https://tools.ietf.org/html/rfc3229](https://tools.ietf.org/html/rfc3229)
[2]: [https://blog.cloudflare.com/efficiently-compressing-dynamically-generated-53805/](https://blog.cloudflare.com/efficiently-compressing-dynamically-generated-53805/)

~~~
niftich
This was actually recently discussed on the github issues:

[https://github.com/gmetais/sw-delta/issues/1](https://github.com/gmetais/sw-delta/issues/1)

------
eknkc
Out of curiosity, I went ahead and calculated deltas between jQuery 2.2.3 ->
2.2.4 minified versions.

It generated a 512-byte delta string. Compared to downloading the new version
at 83KB, 512 bytes seems like a pretty significant optimisation. The delta for
2.2.0 -> 2.2.4 is 2300 bytes.

I haven't seen a lot of great service worker uses so far but this seems
plausible. Good job.

Delta if you wonder what it looks like:
[https://gist.github.com/eknkc/fb27cfaee871a007c3cabfda5df03a...](https://gist.github.com/eknkc/fb27cfaee871a007c3cabfda5df03ab0)
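For anyone curious how a textual delta like that can be produced at all, here's a crude sketch. This is not sw-delta's actual algorithm, just the simplest possible approach: trim the common prefix and suffix, and ship only the changed middle.

```javascript
// Minimal text delta: common prefix/suffix trimming (illustrative only;
// sw-delta's real format may differ).
function makeDelta(oldStr, newStr) {
  let p = 0;
  const max = Math.min(oldStr.length, newStr.length);
  // Longest common prefix.
  while (p < max && oldStr[p] === newStr[p]) p++;
  // Longest common suffix that doesn't overlap the prefix.
  let s = 0;
  while (
    s < max - p &&
    oldStr[oldStr.length - 1 - s] === newStr[newStr.length - 1 - s]
  ) s++;
  return { prefix: p, suffix: s, middle: newStr.slice(p, newStr.length - s) };
}

function applyDelta(oldStr, delta) {
  return (
    oldStr.slice(0, delta.prefix) +
    delta.middle +
    oldStr.slice(oldStr.length - delta.suffix)
  );
}
```

For two adjacent jQuery patch releases, most of the file is identical, which is why the delta collapses to a few hundred bytes.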

~~~
JoshTriplett
Nice!

I wonder how the delta size compares to something like rsync or bsdiff?

~~~
deno
376 bytes for bsdiff(utf8, utf8), which is of course worse than
brotli/gzip(utf8(text delta)) (267/305 respectively).

~~~
JoshTriplett
Interesting!

I looked into bsdiff, and apparently it internally uses bzip2 for compression
of several components independently. As a quick hack, I modified it to output
all those pieces uncompressed (producing a large file of mostly 0s), and then
tried compressing the result with various compressors, both to test other
compressors, and to compress the entire file as one unit rather than as
separate components.

The result ("ubsdiff" is bsdiff without compression):

    
    
        85659  jquery-2.2.3.min.js
        85578  jquery-2.2.4.min.js
          376  jquery.bsdiff
        85994  jquery.ubsdiff
          264  jquery.ubsdiff.brotli
          252  jquery.ubsdiff.brotli9
          309  jquery.ubsdiff.bz2
          403  jquery.ubsdiff.gz
          360  jquery.ubsdiff.xz
    

So, 376 bytes for unmodified bsdiff, 309 bytes by compressing the whole
uncompressed bsdiff file with bzip2, 264 bytes by compressing the whole
uncompressed bsdiff file with brotli, and (strangely) 252 bytes with quality 9
brotli (the default is 11).

~~~
JoshTriplett
Trying the same thing with a larger delta (jquery-2.2.4.min.js to
jquery-3.1.0.min.js) produced different compressor rankings, though:

    
    
         85578  jquery-2.2.4.min.js
         86351  jquery-3.1.0.min.js
          8663  jquery.bsdiff
        101359  jquery.ubsdiff
          8380  jquery.ubsdiff.brotli
          9209  jquery.ubsdiff.brotli9
          8853  jquery.ubsdiff.bz2
          9550  jquery.ubsdiff.gz
          8360  jquery.ubsdiff.xz
    

In this case, bzip2 of the whole file did noticeably worse than bsdiff's
compression of three separate components. brotli of the whole file still won,
though. Which made me wonder if brotli of the individual components would do
better than brotli of the whole file. Turns out it does: 8006 bytes.

------
stephen
Nice!

I've heard that Google's Inbox (and likely other websites) uses this
technique, although the implementation AFAIU is not open source.

...actually, I think theirs is different, in that it doesn't depend on
service workers; AFAIU the approach is:

if you move from js-lib-v1.js to js-lib-v2.js, they'll go ahead and source js-
lib-v1.js in the browser, and then also load js-lib-v1-to-v2.js, which is a
server-side generated JS file that redeclares/redefines only the JS
functions/modules/whatever that have changed from v1 to v2.

So, I believe their approach is much more intricate, because I believe it
diffs the JS at a semantic level to generate the "patch the already-loaded JS
by doing another JS load", vs. AFAICT your approach of just doing a textual
diff.

Assuming my assumptions about both approaches are right, I definitely prefer
yours in terms of simplicity; albeit the Google approach is (or was?)
necessary to benefit most users.
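If that reading of the Inbox approach is right, it can be caricatured in a few lines (every name here is invented; the real semantic-diff patch generation is surely far more involved):

```javascript
// js-lib-v1.js (hypothetical): the version already loaded in the browser.
const lib = {
  render: (items) => items.join(', '),
  version: () => '1.0.0',
};

// js-lib-v1-to-v2.js (hypothetical): a server-generated patch script that
// redeclares only the members that changed between v1 and v2.
lib.version = () => '2.0.0';
// lib.render is unchanged between versions, so it is not re-shipped.
```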

~~~
lstamour
I'm reminded of [https://github.com/google/module-server/blob/master/README.md](https://github.com/google/module-server/blob/master/README.md)
from a (linked) 2012 presentation. I believe I
found it in 2013, there might be other more recent talks. If I recall, the
unique feature of the proposed Google loader was that it would dynamically
take into consideration a dependency graph of the next JS to load and serve up
just enough to show whatever page you needed, and if the next page in the
graph had different dependencies, it would load just the ones you didn't
already have. Can't remember the details right now, it's been a few years.
Made tremendous sense at the time, but the implementation might need to be
improved a bit given HTTP/2 and more modern JS loaders, including a WhatWG
spec.
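The core trick, as I understood it, is a transitive walk of the dependency graph followed by a set difference against what the client already has. A sketch, with an invented module graph:

```javascript
// Resolve a page's transitive dependencies, then subtract what the client
// reports it already holds. Graph and module names are illustrative.
const graph = {
  home: ['core', 'router'],
  search: ['core', 'router', 'autocomplete'],
  autocomplete: ['dom-utils'],
  core: [],
  router: [],
  'dom-utils': [],
};

function transitiveDeps(entry, seen = new Set()) {
  for (const dep of graph[entry] || []) {
    if (!seen.has(dep)) {
      seen.add(dep);
      transitiveDeps(dep, seen);
    }
  }
  return seen;
}

function modulesToLoad(nextPage, alreadyLoaded) {
  const needed = transitiveDeps(nextPage);
  needed.add(nextPage);
  return [...needed].filter((m) => !alreadyLoaded.has(m));
}
```

So navigating from `home` to `search` would ship only the modules `home` didn't already pull in.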

------
daemonk
I often see packages or solutions to browser/server problems on Hacker News.
Why don't the people who make browsers or servers implement these things as a
standard? For example, there seem to be so many people using jQuery/React. Why
not just implement some of their functionality natively in the browser?

There seems to be a trend of making un-opinionated software that acts more as
a sandbox for downstream developers. I think that's great, because we are now
seeing great solutions come out of that sandbox. But at what point do we reach
a general consensus and implement some of these solutions natively to remove
the resource overhead?

~~~
dexterdog
Do you really have to make them standard? There are not that many versions of
most of these. Caching every common version of jQuery is pretty trivial. The
problem is that many people host it themselves or use one of many CDNs to
serve it up, and then expose users to the potential tracking that CDNs can do.
Why don't browsers just cache based on the integrity hash of the CSS and JS
links, when present, instead of the URL? Then everybody could host their own
copies for the cases where the user doesn't have it, and cache hits would be
much more common.

~~~
dexterdog
Concept covered in more depth here: [https://mntr.dk/2016/content-addressable-browser-caching/](https://mntr.dk/2016/content-addressable-browser-caching/)

------
ricardobeat
Chrome has had native support for SDCH[1] for years - I wonder why it hasn't
been widely adopted.

[1]
[https://en.m.wikipedia.org/wiki/SDCH](https://en.m.wikipedia.org/wiki/SDCH)

------
nateguchi
Excellent use of service workers - are there any sites actively using service
workers in production? Or have we yet to see mainstream applications of these?

~~~
mbrock
I used Service Workers in production for a client to do offline mode and also
to get truly instant page reload times for their SPA, which was important for
kind of obscure reasons.

The Service Worker saved a complete HTML rendering of the current state of the
React app, which was then served on reload, so that the correct view showed up
even before any JavaScript was loaded.

Then in the next "requestIdleCallback", the React app was initialized with
store data that was also cached.
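A rough page-side sketch of the snapshot step (cache name and wiring are invented here, not the client's actual code):

```javascript
// Snapshot the rendered DOM into Cache Storage so a service worker can
// serve it instantly on the next load, before any JavaScript runs.
function wrapSnapshot(outerHTML) {
  // The serialized DOM lacks the doctype, so re-add it.
  return '<!doctype html>\n' + outerHTML;
}

// Only runs in a real browser context with Cache Storage available.
if (typeof document !== 'undefined' && typeof caches !== 'undefined') {
  const saveSnapshot = async () => {
    const html = wrapSnapshot(document.documentElement.outerHTML);
    const cache = await caches.open('page-snapshots');
    await cache.put(
      location.pathname,
      new Response(html, { headers: { 'Content-Type': 'text/html' } })
    );
  };
  // Snapshot whenever the browser is idle after a state change.
  requestIdleCallback(saveSnapshot);
}
```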

It only works on Firefox and Chrome, so it's great for performance
improvements or if you control the client's browser setup.

Other than that, try looking at your browser's debug pane for service workers
and you'll see if any site has installed them. In my experience there are
quite a few.

ProductHunt uses them to send notifications when you're off the site, which is
a bit annoying, but maybe I said yes to it at some point...

~~~
nathancahill
That sounds like a great structure for instant page reloads. Any writeups on
that?

~~~
mbrock
I didn't, but I might soon... I'll email you if I do.

------
niftich
How does this interact with the browser's cache? For each potentially-
cacheable request, the browser goes to its cache and looks up the entry by
method and URI. When no hit is found, it forwards the request to the server,
then depending on properties of the response, it may cache that response. At
what point in the flow does the sw-delta-client code intercept the request?
(Before the browser cache, or after the browser cache but before the web
request?)

The sw-delta-client rewrites the URL, and the altered request is sent to the
server. The server responds -- let's assume with a cacheable response -- and
the browser's cache gets updated.

Next time, we request the same URI, but it's already cached and not stale, so
it can be served right away from cache without having to go to the server.
Does sw-delta-client intercept such a get-from-cache request? What if the
cached entry is stale and needs to be revalidated by making a conditional GET
to the server? Does it intercept revalidation requests?

Depending on some of these answers, the addition of query strings to the
browser-perceived URI may influence whether browser caching is performed
(properly, or at all). See:

[1] [http://stackoverflow.com/questions/24354119/](http://stackoverflow.com/questions/24354119/)
[2] [http://stackoverflow.com/questions/3131518/](http://stackoverflow.com/questions/3131518/)
[3] [https://support.cloudflare.com/hc/en-us/articles/200168256-What-are-CloudFlare-s-caching-levels-](https://support.cloudflare.com/hc/en-us/articles/200168256-What-are-CloudFlare-s-caching-levels-)
[4] [http://stackoverflow.com/questions/23603023/](http://stackoverflow.com/questions/23603023/)

~~~
rictic
I haven't looked into the code yet, but the service worker has almost complete
control over network requests and caching. It can calculate the correct
response for a url (e.g. based on this delta encoding system), cache it, and
use it in place of or alongside network requests in order to respond to a
browser request.
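Roughly, a handler like this sits in front of every request the page makes, including ones the HTTP cache would otherwise answer. The `?cached=` parameter name and cache name are my guesses at the convention, not taken from sw-delta's code:

```javascript
// Sketch of a delta-requesting fetch handler. Query-string convention and
// cache name are assumptions, not sw-delta's verified protocol.
function buildDeltaUrl(url, cachedVersion) {
  const u = new URL(url);
  u.searchParams.set('cached', cachedVersion);
  return u.toString();
}

// Only runs in a real service worker context (`self` is undefined in Node).
if (typeof self !== 'undefined' && typeof caches !== 'undefined') {
  self.addEventListener('fetch', (event) => {
    event.respondWith((async () => {
      const cache = await caches.open('sw-delta');
      const cached = await cache.match(event.request.url);
      if (!cached) return fetch(event.request); // cold start: full download
      const version = cached.headers.get('x-cached-version') || '';
      const res = await fetch(buildDeltaUrl(event.request.url, version));
      // A real implementation would detect a delta response here, apply it
      // to `cached`, store the patched file, and return the patched bytes.
      return res;
    })());
  });
}
```

Because the worker answers `respondWith` itself, it decides per-request whether to consult its cache, the network, or both.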

------
alexcasalboni
Here is a Python serverside implementation:

[https://github.com/alexcasalboni/sw-delta-python](https://github.com/alexcasalboni/sw-delta-python)

