
Compression by shared resource - ricklamers
Hello Hacker News,

I would really like to get some input from the HN community on this topic I've been researching.

It's a bit of a legend/vague story, but I'm trying to find out whether there's any truth to it.

In short: Dutch electronics technician Jan Sloot (1945-1999) supposedly invented an incredible compression technique that could potentially change the digital landscape. Investors like Tom Perkins and Dutch billionaire Marcel Boekhoorn were interested in signing a deal.

There's a lot of mystery around the actual technology, but what I could find was this:

Link: jansloot.telcomsoft.nl/Sources-1/More/CaptainCosmos/Not_Compression.htm

An article by a fan website that discusses the technology a little bit. At least, their interpretation of what the technology was.

What the article claims, in summary, is this: if a (e.g.) 4 GB shared 'magic dictionary' exists on both clients and the only data transmitted is references to parts of this dictionary, massive file size reduction can be achieved when the purpose is transmitting/distributing the data to many parties (e.g. video streaming services like YouTube).

An analogy given is this: a PDF document that references fonts instead of embedding them, reducing the size of the file that's transmitted.

What I would like to know from you is whether there's any potential in the above system for file size reduction, and/or whether something like this already exists today. Perhaps it's executed in a similar fashion by other companies?

Perhaps some of these concepts are used in modern compression libraries like 7z; I'm not familiar enough with those to tell.

Long story short, I'm not experienced enough to judge the technology described, but I'm curious to hear your thoughts on the matter.

Kind regards,
Rick Lamers
======
dmfdmf
Shared information at each endpoint can be used to communicate efficiently, but
it doesn't qualify as "data compression" as that term is currently defined. For
example, if you have a library of books on your end, then I can send you a
few-byte number (the ISBN) to identify a specific book instead of sending a
normally compressed copy of the whole book. If you compare the few-byte ISBN
to the byte size of the book, the compression ratio is of course phenomenal,
but this normally isn't considered data compression, though it is related.
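
The ISBN scheme can be sketched in a few lines of Python (a toy illustration; the library contents and the identifier are made up):

```python
# Toy "shared library" that both sender and receiver hold ahead of time.
# Keys are short identifiers (like ISBNs); values are large payloads.
SHARED_LIBRARY = {
    "978-0-00-000000-0": b"...the full text of a long book..." * 1000,
}

def encode(payload: bytes) -> str:
    """Sender transmits only a short reference if the payload is in the library."""
    for isbn, book in SHARED_LIBRARY.items():
        if book == payload:
            return isbn
    raise ValueError("payload not in shared library; must be sent in full")

def decode(isbn: str) -> bytes:
    """Receiver reconstructs the payload from its own copy of the library."""
    return SHARED_LIBRARY[isbn]

book = SHARED_LIBRARY["978-0-00-000000-0"]
ref = encode(book)
assert decode(ref) == book
print(f"{len(book)} bytes conveyed by a {len(ref)}-byte reference")
```

The catch is exactly the one described above: this only works for payloads that are already in the shared library, bit for bit.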

A real-life example is the binary delta compression that Microsoft uses in its
Windows Update protocol. If you make minor changes to a file in going from
version 1.0 to 1.1, say, then instead of transmitting the whole file, WU just
sends a file identifier plus the bits that need to be modified. This is a
trade-off between sending data and computation at the sending and receiving
ends.
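
The delta idea can be sketched with Python's standard library (a simplified illustration, not Microsoft's actual format; real delta encoders like bsdiff are far more sophisticated):

```python
import difflib

def make_delta(old: bytes, new: bytes) -> list:
    """Describe `new` as regions copied from `old` plus the changed bytes."""
    ops = []
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))        # reuse bytes the receiver already has
        else:
            ops.append(("insert", new[j1:j2]))  # ship only the changed bytes
    return ops

def apply_delta(old: bytes, delta: list) -> bytes:
    """Receiver rebuilds the new version from its old copy plus the delta."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)

v1 = b"The quick brown fox jumps over the lazy dog. " * 100
v2 = v1.replace(b"lazy", b"sleepy")
delta = make_delta(v1, v2)
assert apply_delta(v1, delta) == v2
sent = sum(len(op[1]) for op in delta if op[0] == "insert")
print(f"{len(v2)} bytes reconstructed from {sent} literal bytes plus copy instructions")
```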

Every so often I read about someone who has the idea to generalize this. The
idea is to use/index the standard Windows installation (or something common on
most computers) as a "library" to reference. Then, to "compress" a file, you
send the sequence of pointers into the library necessary to recreate the file
at the other end. The problem with this is twofold. First, it takes a lot of
computing power to analyze a file and break it down into library references.
Second, you are on the wrong side of 2^n: as the file size increases, the
probability that it, or large pieces of it, are contained in the library
quickly goes to zero.
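
That counting argument can be made concrete with a rough upper bound: a 4 GB library offers at most ~2^32 starting positions, while there are 256^k possible k-byte blocks, so the chance that a random block appears anywhere in the library collapses as k grows (a back-of-the-envelope sketch that ignores overlaps and non-uniform data):

```python
# Crude upper bound on the probability that a random k-byte block
# occurs somewhere in a 4 GB library: at most ~2**32 candidate
# positions out of 256**k equally likely blocks.
LIBRARY_BYTES = 4 * 2**30  # 4 GB

def hit_probability_bound(k: int) -> float:
    return min(1.0, LIBRARY_BYTES / 256**k)

for k in (4, 8, 16):
    print(f"{k:2d}-byte block: p <= {hit_probability_bound(k):.3g}")
```

Already at 8-byte blocks the bound is about 2^-32, so pointers into the library would have to address blocks so short that the pointers themselves are no smaller than the data.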

The only way Sloot's idea would work is if he had found a way to generalize
this for random (i.e. already compressed or high-entropy) data, plus a method
to define an easily created library on the endpoints that is comprehensive and
independent (i.e. a basis) for random data. That is a tall order.

------
jgrahamc
The idea of using compression dictionaries like this is not new. Common
compression systems (e.g. zlib) allow you to do this. On the web there's a
Google spec called SDCH that does the same thing.
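
For example, Python's zlib bindings expose the preset-dictionary feature directly. A minimal sketch (the dictionary text here is invented for illustration; in practice both endpoints must agree on the dictionary out of band):

```python
import zlib

# A dictionary both endpoints hold ahead of time. It only helps if the
# data actually resembles it (illustrative JSON skeleton, not a real corpus).
zdict = b'{"user": {"name": "", "email": "", "created_at": ""}}'
data = b'{"user": {"name": "rick", "email": "rick@example.com", "created_at": "2014"}}'

# Plain deflate, no shared dictionary.
plain = zlib.compress(data, 9)

# Deflate with the preset dictionary: matches can point into zdict.
co = zlib.compressobj(level=9, zdict=zdict)
with_dict = co.compress(data) + co.flush()

# The receiver needs the same dictionary to decompress.
do = zlib.decompressobj(zdict=zdict)
assert do.decompress(with_dict) == data

print(f"without dictionary: {len(plain)} bytes, with: {len(with_dict)} bytes")
```

The gain is real but bounded: deflate's window and dictionary are small, which is a long way from a 4 GB "magic dictionary" covering arbitrary video.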

~~~
ricklamers
Kind of what I expected, thank you for referencing SDCH. Directs me to more
interesting reading material.

Edit:

Found this:
[https://aprescott.com/posts/sdch](https://aprescott.com/posts/sdch)

Maybe SDCH isn't that useful after all? Looking to find some quantitative data
on potential gains from SDCH now.

~~~
jgrahamc
Of course, Sloot might have managed to come up with a nice dictionary of
chunks of binary that are common across lots of videos which might have been
useful.

