
Ask HN: Looking for compression algorithm that took 6GB to 800MB. Anyone know? - drvc33
Looking for a good media compression & archiving algorithm for an app I'm building. More specifically, I'm looking for an algorithm I used 10 years ago but have totally forgotten the name of. Description: it was a self-extracting archive (about 800MB) that spat out a folder of over 6GB (mainly media: movies & audio). The media files themselves were already in compressed formats (MP3, MP4, etc.), so I was really impressed. The only drawback (as expected) was that it took over an hour to decompress on an Intel dual core. Does this kind of performance ring any bells for anyone? I just need a name.
======
insoluble
Call me a skeptic, but there is no way there was an algorithm available to
humans 10 years ago that _reversibly_ compressed 6GB of unique, already-
compressed media files down to 800MB. The _only_ way this could have happened
is if there were shared files or shared segments between the files. For
example, if a DVD had a bunch of audio tracks but some of the tracks were
basically just direct copies of the others, then the compressor could
recognise the similarity and capitalise thereon. For lossless compression of
general data, 7-zip set on Ultra is probably the best available right now. On
the other hand, algorithms such as FLAC or PNG work well for losslessly
compressing uncompressed media.
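The shared-segment idea is basically deduplication, which works even when the individual files are themselves incompressible. A rough sketch of the principle (my own illustration, using fixed-size chunks rather than the content-defined chunking real dedup tools use):

```python
import hashlib
import os
import tempfile

def dedup_ratio(paths, chunk_size=64 * 1024):
    """Estimate a dedup ratio by hashing fixed-size chunks.

    Illustration only: real dedup tools use content-defined chunking,
    but the principle is the same -- repeated segments are stored once.
    """
    seen, total, unique = set(), 0, 0
    for path in paths:
        with open(path, "rb") as f:
            while chunk := f.read(chunk_size):
                total += len(chunk)
                digest = hashlib.sha256(chunk).digest()
                if digest not in seen:
                    seen.add(digest)
                    unique += len(chunk)
    return total / unique if unique else 1.0

# Two identical "media files" dedupe 2:1 even though their contents
# are random bytes and therefore incompressible byte-for-byte.
payload = os.urandom(200_000)
with tempfile.TemporaryDirectory() as d:
    paths = [os.path.join(d, name) for name in ("a.mp4", "b.mp4")]
    for p in paths:
        with open(p, "wb") as f:
            f.write(payload)
    ratio = dedup_ratio(paths)
print(f"dedup ratio: {ratio:.1f}:1")
```

This is consistent with the DVD-audio-track scenario above: the per-file entropy is untouched, only the repetition across files is removed.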

------
ta808945
I did a quick internet search, and the only app people mention that can achieve such a compression ratio is KGB Archiver[1]. According to its Wikipedia page, it uses PAQ[2].

[1]
[https://en.wikipedia.org/wiki/KGB_Archiver](https://en.wikipedia.org/wiki/KGB_Archiver)
[2] [https://en.wikipedia.org/wiki/PAQ6](https://en.wikipedia.org/wiki/PAQ6)

------
ChuckMcM
We used to joke at NetApp that the oil and gas industry had the best compression algorithm: it could compress 100TB of seismic imaging data into a single bit {oil / no-oil} :-)

I created a theoretical compressor, which I haven't yet been able to implement, that uses the fact that (assuming pi is normal) every finite sequence of bits appears in pi somewhere. My compressor would just return the digit offset and the length of the data. I keep looking for a source for all the digits of pi, though; I have yet to find one.

~~~
komon
After having experimented a bit in high school with ideas for compression algorithms like this, I can tell you that one of the problems you'd run into is that you may have to look so deep into pi that the offset takes the same number of bits as (or even more bits than) the sequence you're trying to encode.
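That's easy to check empirically. A small sketch (my own, using Gibbons' unbounded spigot algorithm to generate pi's digits) that finds the offset of a digit string in pi; for a random k-digit string, the first occurrence sits around offset 10^k, so writing the offset down costs about as many digits as the string it replaces:

```python
def pi_digits():
    """Gibbons' unbounded spigot algorithm: yields 3, 1, 4, 1, 5, 9, ..."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            yield n
            q, r, t, n = (10 * q, 10 * (r - n * t), t,
                          (10 * (3 * q + r)) // t - 10 * n)
        else:
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)

def pi_offset(target, limit=10_000):
    """First offset of the digit string `target` among pi's first
    `limit` digits, or -1 if it doesn't appear that early."""
    buf = ""
    for i, d in enumerate(pi_digits()):
        buf += str(d)
        if buf.endswith(target):
            return i - len(target) + 1
        if i >= limit:
            return -1

# "Compressing" a sequence as (offset, length) into pi:
print(pi_offset("26535"))  # 6 -- pi = 3.141592653[5]...
```

Short strings are found early only because there are few of them; on average the scheme saves nothing, which is the counting argument above in executable form.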

~~~
ChuckMcM
If you're up for it there is some great number-theoretic work to be had in that statement :-). Given a sequence of digits X, P(X) ∈ {offset_n .. offset_p} ⇒ len(offset_n) > len(X)

------
rahimnathwani
I don't believe there is any known lossless algorithm which can achieve 7.5:1
compression on already-lossy-compressed media files.

If you want further lossy compression for existing media files, look at the
newest algorithms supported by ffmpeg.

If you want generic lossless compression, you can do a bit better than the
usual suspects (gzip et al), but only if you're willing to put up with very
slow compression times.

If you have some other type of specific data (e.g. sparse files) then you
could do something custom, but I guess this is unlikely to be your situation.
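The underlying obstacle is that lossy-compressed files are statistically close to random data, and random data is incompressible. A quick stdlib sketch (my own illustration, with random bytes standing in for MP3/MP4 payloads) makes the point:

```python
import lzma
import os
import zlib

# High-entropy bytes stand in for already-compressed media: the payload
# of an MP3 or MP4 is statistically close to random, so a lossless
# compressor has nothing left to squeeze out of it.
data = os.urandom(200_000)

ratios = {}
for name, compress in [("zlib -9", lambda b: zlib.compress(b, 9)),
                       ("lzma -9", lambda b: lzma.compress(b, preset=9))]:
    ratios[name] = len(compress(data)) / len(data)
    print(f"{name}: {ratios[name]:.4f}x of the original size")

# Both come out at ~1.0 (in fact slightly above, due to format
# overhead) -- nowhere near the 7.5:1 described in the question.
```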

------
DanBC
You can look through the software here:
[http://www.maximumcompression.com/benchmarks/benchmarks.php](http://www.maximumcompression.com/benchmarks/benchmarks.php)

Or the software linked via here:
[http://prize.hutter1.net/](http://prize.hutter1.net/)

As other people say, what you're asking for probably isn't possible.

------
drvc33
I wanted to add: the self-extractor had no interface -- it was all in the Windows command prompt. A message in the prompt said something about a "Media compressor", and below it was the percentage extracted.

------
toreriklinnerud
Pied Piper?

~~~
samfisher83
middle-out compression algorithm

