
Visualizing binaries with space-filling curves - dpeck
http://corte.si/posts/visualisation/binvis/index.html
======
cortesi
Why, hello there Hacker News. Just a note that I've written two follow-on blog
posts after this one, developing the idea a bit further.

First, using a color function that encodes local entropy to show how crypto
keys and other high-entropy data can be picked straight out of a
visualization:

<http://corte.si/posts/visualisation/entropy/index.html>

And next, using this in bulk on samples from a malware database, with what I
think are beautiful and interesting results:

<http://corte.si/posts/visualisation/malware/index.html>

~~~
jeremysalwen
I'm curious how it would look if you used some measure of entropy other
than local symbol frequency. A general-purpose compression algorithm would
probably give you a good idea, for example by gzipping the blocks.
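
A minimal sketch of the per-block idea, assuming fixed 256-byte blocks and zlib (the same DEFLATE algorithm gzip uses, with a smaller header) as the compressor. The function name, block size and sample data are illustrative, not taken from the posts:

```python
import os
import zlib

def block_compressibility(data, block_size=256):
    """Return the compressed/original size ratio for each fixed-size block."""
    ratios = []
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        ratios.append(len(zlib.compress(block, 9)) / len(block))
    return ratios

# Hypothetical sample: two blocks of zeros followed by two blocks of random bytes.
sample = bytes(512) + os.urandom(512)
ratios = block_compressibility(sample)
```

Ratios near or above 1.0 mark blocks the compressor cannot shrink. Small blocks carry a fixed header overhead, so the values are only comparable for a single block size.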

Hmm, I'm not sure if this could work, but perhaps with a stream-based
compression algorithm you could see fairly precisely how much compressed
data it takes to represent the file up to a certain point, in only a
single pass (rather than having to compress a huge number of local windows).
Of course, this will probably weight earlier parts of the file more heavily,
simply because the compressor won't yet have adapted to the data.
So you could also run it on a byte-reversed version of the file, a
"rotated" version of the file (i.e. file[n/2:n]+file[0:n/2]), and a rotated,
byte-reversed version, and combine those metrics in some way
(maybe min(entropy1,entropy2,entropy3,entropy4)).
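
One way this might be sketched, assuming zlib as the streaming compressor and a sync flush after every chunk to read off how many compressed bytes that chunk cost. The chunk size, the function names and the choice to rotate at a chunk boundary (rather than exactly at n/2) are assumptions made to keep the bookkeeping simple:

```python
import os
import zlib

def stream_costs(chunks):
    """Compressed bytes emitted per chunk when all chunks share one zlib stream."""
    comp = zlib.compressobj(9)
    costs = []
    for chunk in chunks:
        # Z_SYNC_FLUSH forces pending output out but keeps the history window.
        out = comp.compress(chunk) + comp.flush(zlib.Z_SYNC_FLUSH)
        costs.append(len(out))
    return costs

def min_stream_cost(data, chunk_size=256):
    """Per-chunk minimum cost over forward, reversed, rotated, rotated-reversed passes."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    k = len(chunks)
    forward = list(range(k))
    rotated = forward[k // 2:] + forward[:k // 2]
    best = [float("inf")] * k
    passes = (
        (forward, False),
        (forward[::-1], True),   # reversed chunk order + reversed bytes = reversed file
        (rotated, False),
        (rotated[::-1], True),
    )
    for order, flip in passes:
        seq = [chunks[i][::-1] if flip else chunks[i] for i in order]
        for i, cost in zip(order, stream_costs(seq)):
            best[i] = min(best[i], cost)
    return best

# Hypothetical sample: eight repeats of every byte value in order, then random bytes.
sample = bytes(range(256)) * 8 + os.urandom(2048)
costs = min_stream_cost(sample)
```

Each flush adds a few bytes of overhead, and the first chunk of each pass also pays for the stream header, so the numbers are relative rather than exact bit counts. In this sample, the first repeated chunk is expensive in the forward pass but cheap in the reversed pass, which is the effect the minimum is meant to capture.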

That way you could get an entropy measure that compensates for the sort of
alphabet runs that fooled Shannon entropy.
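
To illustrate the difference, here is a small sketch comparing byte-frequency Shannon entropy with a zlib compression ratio. The input, every byte value in order repeated 16 times, is an assumed stand-in for the kind of run being discussed:

```python
import math
import zlib
from collections import Counter

def shannon_entropy(data):
    """Shannon entropy in bits per byte, from byte frequencies only."""
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Assumed example input: 0, 1, ..., 255 repeated 16 times.
alphabet_run = bytes(range(256)) * 16

entropy_bits = shannon_entropy(alphabet_run)
zlib_ratio = len(zlib.compress(alphabet_run, 9)) / len(alphabet_run)
```

Every byte value occurs equally often, so the frequency-based measure reports the maximum of 8 bits per byte, while the compressor sees the repetition and shrinks the data to a small fraction of its size.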

~~~
cortesi
I'm working on something like this for the next evolution of the entropy
visualizations. I've toyed with compression, but have had nicer results so far
with other randomness estimators. I'll do a writeup once I have something to
show.

For a continuous compression function your idea of running on a rotated or
reversed version of the file and then taking a minimum is a cunning one! Right
now, I'm still working with sliding windows, but I'll keep it in mind if I
turn back to compression.

~~~
jeremysalwen
I'm curious what other randomness estimators you'll be using... (or maybe I'll
just have to wait for another blog post).

------
ChuckMcM
This reminds me of a guy who took the address pins and the data bus pins and
hooked them up to two digital-to-analog converters and fed them into an
oscilloscope.

The resulting 'knot ball' of traces could actually be discerned once you
watched a few of them. You can pick out all sorts of things like when the
video was refreshing, keys were being processed, etc.

~~~
ZenPsycho
"You get used to it. All I see is blonde, brunette, red-head"

------
doctoboggan
It's amazing how much data we can process through our retinas if it is just
presented intelligently.

------
0x0
I remember I did something similar (although much less sophisticated) in the
90s; I'd write a small program to switch into VGA mode 0x13 and just load a
file into the 320x200 framebuffer at 0xa0000. Even with just the default VGA
palette, identifying areas of interest and patterns was quite possible.
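
A rough modern equivalent, sketched under the assumption that a grayscale image is an acceptable substitute for the VGA palette: each byte becomes one pixel in a binary PGM image, 320 pixels wide. The function name and the zero padding of the last row are illustrative choices:

```python
def bytes_to_pgm(data, width=320):
    """Lay raw bytes out row by row as an 8-bit grayscale PGM (P5) image."""
    height = -(-len(data) // width)                 # ceiling division
    padded = data.ljust(width * height, b"\x00")    # pad the final row with zeros
    header = b"P5\n%d %d\n255\n" % (width, height)
    return header + padded

# Hypothetical input: 1280 bytes, which fills exactly four 320-pixel rows.
image = bytes_to_pgm(bytes(range(256)) * 5)
```

Writing `image` to a file with a `.pgm` extension should produce something most image viewers can open.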

Nice writeup(s) and interesting approach with the entropy calculations!

------
rsiqueira
I wish hexdump had an option to create an output image like those.

------
chanux
Can someone please explain the benefits of such a visualization?

~~~
biot
Humans are awesome at recognizing visual patterns. It's difficult to get a
sense of what a file contains from a hex editor, but if you apply the right
transformations on the data, distinct features stand out in the visualization.

This allows people to recognize similarities between different files as the
resulting patterns are revealed. Take this previous blog post by the same
author:

<http://corte.si/posts/code/sortvis-fruitsalad/index.html>

This shows various sort methods operating on a set of unsorted data. Even if
all you could see was how the data was manipulated, generating an image from
those transformations would let your pattern-matching brain quickly see which
sorting algorithm was used.

------
grannyg00se
I really enjoyed these posts. What a great use for a mathematical concept that
was developed over one hundred years ago.
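
The concept in question is the space-filling curve, of which the Hilbert curve is one well-known example. A minimal sketch of the standard conversion from a position along the curve to grid coordinates, assuming a square grid whose side is a power of two; the function name is illustrative:

```python
def hilbert_d2xy(n, d):
    """Map distance d along a Hilbert curve to (x, y) on an n-by-n grid.

    n must be a power of two and 0 <= d < n * n.
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate the current quadrant so the sub-curves join up.
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

# Walk an 8x8 grid: byte i of a 64-byte file would be drawn at points[i].
points = [hilbert_d2xy(8, d) for d in range(64)]
```

Consecutive positions always land on neighboring cells, which is why bytes that are close together in a file tend to stay close together in the image.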

