Show HN: Bin-graph: Visualize binary files (github.com/8dcc)
57 points by 8dcc 3 months ago | 17 comments
This program provides a simple way of visualizing the different regions of a binary file. Written in C, depends only on libpng.

Currently (commit 1dd42e3) it is able to generate PNG images that represent various aspects of the binary:

- Grayscale: Byte values, 00..FF.
- Ascii: Printability of each byte.
- Entropy: Of a "block", changed with --block-size.
- Histogram: Bar graph of the byte frequencies.
- Bigrams: Each point is determined by a pair of bytes.
- Dotplot: Measure self-similarity. Image width/height is N^2.

In the future, I plan on adding an SDL version that allows the user to view a section of the file interactively (sections are currently supported with --offset-start and --offset-end).

More information in the README.




From the perspective of marketing/spreading this thing--even to Engineers--I think some pictures of example output would help quite a lot.


I love these kinds of tools! For part of my PhD research I made a bunch of digraph heatmaps of (differently-obfuscated variations of) stdlib binary files (raw byte sequences and asm mnemonics shown side-by-side):

https://alexshroyer.com/misc/digraphs.mp4

There are often bright spots in these kinds of visuals that you end up seeing over and over again (e.g. clusters of ASCII).


> There are often bright spots in these kinds of visuals that you end up seeing over and over again (e.g. clusters of ASCII).

Indeed, this is especially true in the "bigrams" mode, where each point (X,Y) is set if the bytes X and Y (00..FF) appear in that order in the input. If you look at the bigrams example in the README, you can see that there is a bright zone where the lowercase ASCII characters are, since that graph is plotting the .rodata section of the binary (using the bin-graph-section.sh script). These patterns appear with other kinds of data, not just text (e.g. x86 instructions).
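As a rough illustration of that rule, here is a minimal sketch that fills a 256x256 grid from a buffer. The function name `bigram_plot` and the row-major `seen[Y][X]` layout are assumptions made for this example, not bin-graph's actual code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mark seen[y][x] when byte value x is immediately followed by byte
 * value y somewhere in the input. Hypothetical helper for illustration. */
void bigram_plot(const uint8_t *data, size_t len, bool seen[256][256])
{
    /* Start from an empty (all-false) grid. */
    memset(seen, 0, sizeof(bool[256][256]));

    /* Each adjacent pair (data[i], data[i + 1]) sets one point. */
    for (size_t i = 0; i + 1 < len; i++)
        seen[data[i + 1]][data[i]] = true;
}
```

With text-heavy input, most pairs fall inside the 0x61..0x7A range on both axes, which would produce the bright lowercase zone described above.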


Have you checked out this demo?

https://youtu.be/C8--cXwuuFQ?list=PLUyyOw61zxiJXMihb4PjYbGHE...

The 3d pan/tilt/zoom visualizations of the trigraphs are especially nice to look at.


This immediately popped into my head! I remember it blowing my mind years ago


Some examples would be nice, I'm not familiar with binary visualization, so I'm curious why and when I would personally use this


These days, with the right visualization techniques, you can easily train something like ResNet to find malware. Some of it is even plainly obvious to the eye. Some packers and encryption are easy to spot, but a neural network can identify more subtle patterns. For a reverse engineer or malware analyst, it is also possible to see which parts of a file contain 'executable code', for example if shellcode is embedded in a different file type. Check the Chris Domas video linked on the GitHub. It's pretty epic :D


I added screenshots to the README.


Kudos!


So you made a visualization tool and don't show an example image in the readme?

I am not sure if I should be shocked or impressed, but consider the following question: Would you download a random program off the internet because of the promise of it creating a useful visualization? You probably would if the visualization looked useful to you. But as you can't see it without downloading... you get my point.


The number of times I see this with projects on HN is crazy. The one thing I want to know is how the output looks.


Maybe the OP fixed it after seeing these comments, but I see output examples now in both the 'examples' folder as well as the readme.


I added them after reading these replies, yes. I usually don't add screenshots because I don't like adding images to the git repository (people don't really want to clone that), and I don't want to use an external CDN for uploading my images.

In this project in particular, I understand that it is important.


Not sure if there are any restrictions keeping you from linking directly from the project readme, but perhaps a separate repo could host the images: https://pages.github.com/


Some months ago, before a GitHub update, you could just paste an image while editing a Markdown file and it would be uploaded to their CDN. You could use that link even in other files (e.g. Org files in my case). They changed their upload system and this is not the case anymore.


No screenshots or demo - how do I know if this is better than the already awesome web-first binvis.io?


My project was inspired by Cortesi's blog posts, among other things. I thought binary visualization could be useful for reverse engineering and a fun thing to program, so I wrote it. I wanted to share it in case someone was interested in the code for generating the image, since I made it as "extensible" as I could, but I am sure there are a lot of improvements to be made.

That website is more interactive, shows a hex dump of the binary and doesn't require you to download/compile anything. It's probably more practical for most users, but my project has some other modes that might be helpful for recognizing patterns in different file formats (look at the talks linked in the README for more information on what I mean). Also, as far as I know the source code for binvis.io is not public.

P.S. I added a link to binvis.io to the README as well.



