

The Lost Quarter-Century in Data Compression - davidst
http://blog.podly.tv/the-lost-quarter-century-in-data-compression

======
jws
Huffman coding is computationally much simpler: take a histogram of the input
tokens and make a token->bitstream function. At the time it was released, a
newish computer would be an IBM 650 with 2000 words of memory on the drum and
60 words in core. (Note the absence of 'k', 'm', or 'g' in those numbers.) It
was "quite popular". 1800 of them were sold.

The LZ algorithms of the later '70s are more computationally involved with
dictionary lookups in mutating dictionaries. At this time a newish computer
would be a PDP-11 with 64k words of memory and a hard disc drive that could
store megabytes! I'm not sure how many were sold, but universities were
littered with them.
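
For contrast with Huffman's static histogram, here is a toy LZ78-style sketch
of what "dictionary lookups in mutating dictionaries" means (my own
illustration, with a simplified output format):

    def lz78_compress(data):
        # Toy LZ78: output is a list of (dictionary index, next character)
        # pairs. The dictionary starts empty and mutates as we scan.
        dictionary = {"": 0}              # phrase -> index
        phrase, out = "", []
        for ch in data:
            if phrase + ch in dictionary:
                phrase += ch              # extend the longest known phrase
            else:
                out.append((dictionary[phrase], ch))
                dictionary[phrase + ch] = len(dictionary)
                phrase = ""
        if phrase:                        # flush any leftover phrase
            out.append((dictionary[phrase[:-1]], phrase[-1]))
        return out

    print(lz78_compress("abababababa"))
    # [(0, 'a'), (0, 'b'), (1, 'b'), (3, 'a'), (2, 'a'), (2, 'a')]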

I suggest LZ style algorithms merely waited for appropriate physical tools.

~~~
davidst
It's more the rule than the exception when research is done on ideas that at
the time of publication are impractical to manifest as a working machine or
program. To pick just one (possibly the most well known) example: Einstein's
paper on the special theory of relativity was published in 1905. The most
significant outcome of this paper took 40 years to manifest itself. Einstein
certainly wasn't waiting for appropriate physical tools.

Lempel and Ziv's critical insight - that modelling and coding of probabilities
are distinct and separable - could have been understood decades earlier. It
wouldn't have mattered that their research couldn't be immediately turned into
a working tool. Information Theory would have benefitted and we would surely
be further along today.

But, yes, it is certainly true that advances in technology give us an
incentive to ask questions that had previously been ignored or misunderstood.

~~~
10ren
Computer science actually seems to be the exception. Yes, though the insight
could have been had earlier, it somehow wasn't. And this is common in computer
science. I now attempt to support this thesis.

Even the stereotypical "impractical" research algorithms (with efficient
complexity but huge constants) are _implemented_. That's the magic of computer
science: you can breathe life into abstract ideas. But if you cannot
implement it at all, there are millions of other cool ideas that you can
implement, so you may gravitate towards them. (Another argument is that if you
can't implement it, it's classed as pure maths. And if you can't prove it, or
define it clearly as a conjecture, I'm not sure what it is or what you could
do with it - perhaps futurism/science fiction, which has its place.)

The value of implementation is due to computer science ideas tending to be
complex and abstract - but once you have an actual concrete working model that
you can play with, they can become almost trivial to understand. And so,
comparatively, more progress in research is made based on working code than on
purely theoretical ideas. It's so _much_ easier to find mistakes in an idea if
you can build it. And once your theory is embodied in living, working code,
it's tremendously easier to reason from that as a base; much as it is easier
to see what is over the next hill after climbing it.

Hardware is closer to your idea of incentive: the inventor of the Wallace tree
multiplier told me that it was a simple and obvious idea, but it wasted a lot
of silicon; and no one entertained that kind of waste. It simply wasn't
conceivable. Perhaps this is due to the incredible rate of improvement in
silicon (Moore's Law), so that it's psychologically difficult to appreciate (or
even believe) that your iPhone will be 1000 times faster in 15 years, or what
that even means for algorithms. Perhaps the most basic assumptions of
the algorithm change? When a 10x quantitative improvement often translates into
a qualitative improvement, what does a 1000x change mean? Too hard to predict.

~~~
davidst
Computer science gives us a better opportunity to test ideas than, say,
particle physics. The ability to experiment so easily is a major part of the
appeal for many of us.

> Even the stereotypical "impractical" research algorithms (with efficient
> complexity but huge constants) are _implemented_.

Well, they're all Turing machines, but you're not going to find any video
processing algorithms that were implemented on a 4K Altair so let's not go too
far.

It's interesting to consider a given novel idea and wonder how far back in
time it would have been possible to implement it in a practical way and how
much further back in time it would have been useful simply to know how it
works.

Let me pick a concrete example that we are all familiar with: Hashing. Hashing
and hash lookup algorithms have been around a long time. It's an area that
people might expect would have been exhausted of significant new ideas long
ago. Not so! Cuckoo hashing was invented in 2001 and gives us dramatically
more efficient use of memory at a slight performance cost. It can make the
difference between a hash table that fits in memory and one that doesn't so
it's a meaningful advance.

Cuckoo hashing could have been implemented on a 4K Altair and it would have
been useful there. But no one had the idea then.

But wait-- it gets stranger! Cuckoo hashing was literally under our noses and
somehow no one saw it! It is a variation of the set-associative hashing that
most CPUs had already been doing in their caches for years. It's not
exactly the same algorithm, but if you check you'll see it's so close the
distinctions are minor.

Fascinating, isn't it? Here is an important algorithm that was actually live
and running in hardware yet no one considered building it in software for a
long time.
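
For anyone who hasn't seen it, a toy sketch of the two-table scheme (my own
illustration, not the code from the 2001 paper): two hash functions, and an
insert that kicks the current occupant over to its other possible slot.

    class CuckooHash:
        """Toy cuckoo hash: two tables, two hash functions. A lookup probes
        at most two slots; an insert evicts whatever is in the way and
        re-homes it in its other possible slot."""

        def __init__(self, size=16):
            self.size = size
            self.t1 = [None] * size
            self.t2 = [None] * size

        def _h1(self, key):
            return hash(key) % self.size

        def _h2(self, key):
            return (hash(key) // self.size) % self.size

        def lookup(self, key):
            for table, h in ((self.t1, self._h1), (self.t2, self._h2)):
                slot = table[h(key)]
                if slot is not None and slot[0] == key:
                    return slot[1]
            return None

        def insert(self, key, value, max_kicks=32):
            # Toy version: assumes `key` is not already in the table.
            entry, which = (key, value), 0
            tables, hashes = (self.t1, self.t2), (self._h1, self._h2)
            for _ in range(max_kicks):
                table, i = tables[which], hashes[which](entry[0])
                if table[i] is None:
                    table[i] = entry
                    return
                # Kick the occupant out and try it in its other table.
                table[i], entry = entry, table[i]
                which ^= 1
            raise RuntimeError("too many kicks: time to rehash")

    t = CuckooHash()
    t.insert("huffman", 1952)
    t.insert("lz77", 1977)
    print(t.lookup("lz77"))   # -> 1977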

~~~
jws
Isn't the last part of the Cuckoo hashing paper the part that documents it as
slightly less good compared to existing algorithms? I mean it never comes out
and says it, but if you look at the pictures… I think Cuckoo's best point was
the provability of its bounds, but it has been a while since I read that
paper.

I like the example of Parsing Expression Grammars. The idea of memoized top-
down parsing drifted past back in the '60s and early '70s, when folks like
Aho and Ullman were busy slaying wild streams of symbols with ever more clever
algorithms, but was dismissed because of its memory footprint. Then in 2002 or
so Bryan Ford said "_Oi! What do you mean too much memory? My PC has an L2
cache bigger than that._"[1] Now PEGs are widely used and I must say (having
built dynamic grammar LR systems back when my time-shared VAX had a main
memory the size of the L2 cache in my iMac) they are especially useful for
dynamic grammars.

[1] Being an entirely fabricated quote, there is no applicable reference for
this quotation.
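
To make the memoization point concrete, here's a toy packrat-style sketch (my
own, not Ford's implementation): every (rule, position) result gets cached,
which is exactly the memory cost that looked extravagant in the '60s and '70s.

    from functools import lru_cache

    # PEG-style grammar (ordered choice):
    #   Expr <- Term ('+' Term)*
    #   Term <- digit / '(' Expr ')'
    TEXT = "1+(2+3)+4"

    @lru_cache(maxsize=None)       # packrat memoization: cache (rule, pos)
    def parse_expr(pos):
        pos = parse_term(pos)
        while pos is not None and pos < len(TEXT) and TEXT[pos] == "+":
            nxt = parse_term(pos + 1)
            if nxt is None:
                break
            pos = nxt
        return pos

    @lru_cache(maxsize=None)
    def parse_term(pos):
        if pos < len(TEXT) and TEXT[pos].isdigit():
            return pos + 1
        if pos < len(TEXT) and TEXT[pos] == "(":
            inner = parse_expr(pos + 1)
            if inner is not None and inner < len(TEXT) and TEXT[inner] == ")":
                return inner + 1
        return None

    print(parse_expr(0) == len(TEXT))   # True: the whole input parses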

~~~
davidst
> Isn't the last part of the Cuckoo hashing paper the part that documents it
> as slightly less good compared to existing algorithms?

No, Cuckoo hashing is a trade of a little run-time performance for better
memory utilization. It's still O(1). The performance cost amortizes as a
constant.

If you have gobs of memory you don't need it. But if you're memory-constrained
it's a big win.

------
btilly
If you're interested in another example of how success breeds stagnation, see
_On Proof and Progress in Mathematics_ by William Thurston, available at
[http://www.ams.org/journals/bull/1994-30-02/S0273-0979-1994-...](http://www.ams.org/journals/bull/1994-30-02/S0273-0979-1994-00502-6/S0273-0979-1994-00502-6.pdf)

In particular look at section 6, where he describes how his research into
foliations had the effect of killing the field for a number of years. He
contrasts this to how his different approach to the geometrization of
3-manifolds led to the healthy growth of the field, through, among other
things, his efforts at exposition and his deliberate withholding of results he
already knew the answer to so that other people could prove them instead.

~~~
gruseom
_his deliberate withholding of results he already knew the answer to so that
other people could prove them instead_

That's extraordinary! I never heard of such a thing. Are other cases known?

~~~
gjm11
Gauss seems to have proved lots of (surely publishable) things that he didn't
publish. I don't know what his reasons were.

~~~
ajb
According to Erdos _, Gauss was very discouraging to his students, because
when they came and told him that they'd proven something, he would tell them
he'd proven it years before.

_ from "The man who loved only numbers.' by Paul Hoffman.

------
terra_t
Silly.

Practical dictionary-based schemes have an entropy encoder on the back end and
often that's a Huffman encoder or something similar in spirit.

Huffman encoding is superseded not by dictionary-based methods, but by
arithmetic encoding, which can compress beyond the Huffman limit because it
can use fractions of a bit.
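
A back-of-the-envelope version of the "fractions of a bit" point, with
made-up numbers:

    import math

    # A very skewed source: 'a' 95% of the time, 'b' 5% of the time.
    p = {"a": 0.95, "b": 0.05}

    # The ideal cost an arithmetic coder approaches:
    #   -log2(0.95) ~ 0.074 bits for 'a', -log2(0.05) ~ 4.32 bits for 'b'.
    entropy = sum(prob * -math.log2(prob) for prob in p.values())

    # Huffman has to spend a whole bit on every symbol here ("0" and "1").
    print(f"ideal: {entropy:.2f} bits/symbol, Huffman: 1.00 bits/symbol")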

7zip, for instance, beats the pants off gzip because it uses a very large
dictionary window AND arithmetic compression instead of Huffman encoding on
the back end.

~~~
davidst
I think you're missing the point.

The article does not claim that Huffman coding was superseded by dictionary
methods. It does not confuse statistical modelling with encoding.

------
ck2
I remember reading an interview with Phil Katz (pkzip) like a decade ago about
how they were looking next at Markov models for more advanced compression. It
even made me go to the library to look up "Markov".

So over a decade later, nothing really new.

