

Analyzing unknown binary files using information entropy - egorst
http://yurichev.com/blog/entropy/
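The linked article plots Shannon entropy over regions of a binary to spot compressed, encrypted, or sparse sections. A minimal sketch of that idea (the window and step sizes here are arbitrary choices of mine, not values from the article):

```python
# Sliding-window Shannon entropy over a byte buffer.
# High plateaus (~8 bits/byte) suggest compressed or encrypted data;
# low values suggest padding, tables, or text.
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: 0.0 (constant) up to 8.0 (uniform random)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())

def entropy_profile(data: bytes, window: int = 256, step: int = 128) -> list:
    """Entropy of each window across the buffer, for plotting or scanning."""
    return [shannon_entropy(data[i:i + window])
            for i in range(0, max(len(data) - window, 0) + 1, step)]
```

Scanning the profile for sudden jumps is a cheap way to segment an unknown firmware image before disassembling anything.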

======
woliveirajr
There is a technique called "normalized compression distance" that does
something like this: it uses compression to measure how similar one piece of
data is to another.

For a similar problem, see the approach described in this answer:
[http://reverseengineering.stackexchange.com/questions/2897/t...](http://reverseengineering.stackexchange.com/questions/2897/tool-or-data-for-analysis-of-binary-code-to-detect-cpu-architecture/2900#2900)

------
snarfy
I always thought this idea could be greatly expanded upon.

I've seen it used to guess the natural language of a text file from how well
it compresses against known samples. I always believed this could be used as a
sort of universal translator: compress recordings of bird calls, throw this
algorithm at them, and extract meaningful content.

~~~
woliveirajr
To tell the truth, it has already been proposed, and there are papers about
it. Search for "NCD - normalized compression distance" and you'll find
results. It's even used to determine who wrote a given document (authorship
attribution). Very interesting.

------
rasz_pl
Cantor.Dust - the future was here, but turned out to be vaporware :(

[https://www.youtube.com/watch?v=4bM3Gut1hIk](https://www.youtube.com/watch?v=4bM3Gut1hIk)

