
BARF – A Binary Analysis and Reverse Engineering Framework - galapago
https://github.com/programa-stic/barf-project/?0.2.1
======
n1x0n997
Neat! I started with radare2 recently. It has quite a steep learning curve,
but once you understand that it works like Vim (: commands etc.), you are
flying! Worth spending time on learning it. Just make sure you pull from git
daily, as they make several commits a day. I did a few CTFs with it (i.e.
trying to learn one tool, but know it well). It can binary patch and detect
variables/arguments to functions, and you can rename both args and function
names, which makes it even easier to read. The only problem is that the
documentation sucks. You have "cursor mode", so you can essentially move line
by line in your asm view, hit return on a jump and it moves you to the jump
location (I like that). It also has an ASCII-based graph: I do VV (that's the
command) and it prints the graph. From there you can move between jumps/calls
by pressing tab, but even better, if you press t (for true) it moves you to
the block that would be followed if the jump/call was taken, and f (for false)
if it wasn't. Also very neat. And that's probably like 1% of what it can do.
Amazing project.

------
OneOneOneOne
"Not in here mister! This is a Mercedes!"

Can anyone recommend a good tool for exploring and decoding other binary
formats? I am interested in analysis of non-code binary data.

~~~
dr_zoidberg
What binary formats are you interested in? For my graduate thesis I had to
read a lot about JPG, PNG, SQLite3 and MS-OLE file formats and I could give
you some references to read from and shamelessly link to my github project
where you can find some tools related to these (and other) file formats.

~~~
pmorici
How would you diagnose an unknown compression format? That's a problem I've
encountered recently and I'm hoping there is an easier way than stepping
through the reference compressor in a debugger.

~~~
Kalium
If you're looking to find out what it is and what's in it, there are some very
flexible unpackers out there. Titanium Core from Reversing Labs is excellent.

~~~
pmorici
I already know what it is, and it isn't malware or a packer. What I'm
looking for is something that can do analysis on a bit of data and say what
compression scheme is likely used.

~~~
Kalium
Ah. Well, the product I mentioned can usually identify packing schemes and
unpack them. Commercial closed-source, though.
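
For the narrower question of guessing a compression scheme from a chunk of
data, a rough first pass might look like the sketch below. It only checks a
few well-known header signatures and measures byte entropy. The names
`MAGICS`, `guess_format` and `entropy` are made up for this example, and a
headerless or custom scheme will match nothing here.

```python
import bz2
import gzip
import lzma
import math
import zlib
from collections import Counter

# Well-known leading signatures. This list is deliberately short.
MAGICS = [
    (b"\x1f\x8b", "gzip"),
    (b"BZh", "bzip2"),
    (b"\xfd7zXZ\x00", "xz"),
    (b"\x28\xb5\x2f\xfd", "zstd"),
    (b"PK\x03\x04", "zip"),
]


def guess_format(data: bytes) -> list:
    """Return the names of every known signature the data starts with."""
    guesses = [name for magic, name in MAGICS if data.startswith(magic)]
    # zlib has no fixed magic: the low nibble of byte 0 is 8 (deflate) and
    # the first two bytes, read big-endian, are a multiple of 31.
    if (len(data) >= 2 and data[0] & 0x0F == 8
            and ((data[0] << 8) | data[1]) % 31 == 0):
        guesses.append("zlib")
    return guesses


def entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


sample = b"hello world " * 200
for name, blob in [("gzip", gzip.compress(sample)),
                   ("bzip2", bz2.compress(sample)),
                   ("xz", lzma.compress(sample)),
                   ("zlib", zlib.compress(sample))]:
    print(name, guess_format(blob), round(entropy(blob), 2))
```

Entropy close to 8 bits per byte only says the data looks compressed or
encrypted. It does not say which scheme was used, so an empty result from
`guess_format` still leaves you with the debugger.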

------
amelius
Interesting. A few questions:

Is it possible to backtrack using this framework? (I.e., execute code, stop
it, go back a bit, continue, etc.)

Is it possible to save/restore the machine state to a file?

And can it deterministically (re)run multi-threaded code? And other (normally
non-deterministic) OS functions?

~~~
galapago
> Is it possible to backtrack using this framework? (I.e., execute code, stop
> it, go back a bit, continue, etc.)

In the develop branch we have a fully integrated debugger (using python-
ptrace), so this should be doable.

> Is it possible to save/restore the machine state to a file?

Nope. If you want to do this, you should try to find a more VM-oriented
framework (pyqemu maybe?)

> And can it deterministically (re)run multi-threaded code? And other
> (normally non-deterministic) OS functions?

Not yet, but if you instrument some functions using the debugger in the
develop branch, you could do it.

edit: PANDA is also a very good option
([https://github.com/moyix/panda](https://github.com/moyix/panda))

------
MichaelCrawford
I myself reverse engineered two file formats: the Movie Magic Scheduling
database files - it's an application-specific project management tool for
motion picture and TV production - as well as the Zeni 4 Electronic Design
Automation (i.e. Electronic CAD) Physical Design documents.

I've been thinking just in the last few days that it would be cool to write up
how I did it.

The basic way is to create a document with nothing in it, then a second
document with a very small difference. For example in Movie Magic I had one
new document that I saved without putting anything in it, then a second where
I entered the letter "A" into a certain field.

I made hex dumps of them both, then compared the hex.

Then I would make some modest changes and additions, such as the letter "B" in
a different field, or I would change the "A" in that first field to "AB".
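
The compare step can be sketched in a few lines of Python. This is only an
illustration: `diff_ranges` is a name made up here, and the two byte strings
are invented stand-ins, not real Movie Magic data.

```python
def diff_ranges(a: bytes, b: bytes) -> list:
    """Return (start, end) offset pairs where the two inputs differ.

    `end` is exclusive. Offsets past the end of the shorter input count
    as differing.
    """
    ranges = []
    start = None
    longest = max(len(a), len(b))
    for i in range(longest):
        byte_a = a[i] if i < len(a) else None
        byte_b = b[i] if i < len(b) else None
        if byte_a != byte_b:
            if start is None:
                start = i  # a new run of differing bytes begins here
        elif start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:
        ranges.append((start, longest))
    return ranges


# Invented examples: an "empty" document and one with "A" in one field.
empty_doc = b"MMS\x00\x01\x00\x00\x00\x00\x00\xff"
doc_with_a = b"MMS\x00\x01\x00\x01A\x00\x00\xff"

for start, end in diff_ranges(empty_doc, doc_with_a):
    print(f"0x{start:04x}-0x{end:04x}: "
          f"{empty_doc[start:end].hex()} -> {doc_with_a[start:end].hex()}")
```

A byte-for-byte compare like this only lines up when the edit does not shift
later data. If the format inserts bytes, everything after the insertion point
shows as changed, which is itself a hint about how the field is stored.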

This approach only really works if you've had experience implementing file
format code; it's particularly helpful to have designed original formats
completely from scratch. For these specific reasons I am better than most at
file format reverse-engineering, however I don't really have a clue about
doing that for network protocols as the Samba folks did.

Once I have some guess about the file format, I start writing a file
interpreter, typically to dump the binary format into human-readable text.
"Actor's Name: A", "Movie Title: B".

Rather more importantly, that binary file dumper is chock-full of assertions.
It's rather more important to get the assertions right than it is to display
the actual file payload values.

Loosely speaking, a set of correctly-implemented assertions is, in itself, the
documentation for the file format you just reverse-engineered.
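
As a sketch of what such a dumper might look like, here is a Python version
for an invented layout: a 4-byte magic `DEMO`, a 16-bit version, a 16-bit
field count, then fields of tag byte, length byte and ASCII payload. None of
this is the real Movie Magic or Zeni format. The layout, the tag numbers and
the `dump` function are all assumptions made for the example.

```python
import struct

# Hypothetical tag numbers for the invented layout.
FIELD_NAMES = {1: "Actor's Name", 2: "Movie Title"}
HEADER = "<4sHH"  # magic, version, field count (little-endian)


def dump(data: bytes) -> list:
    """Decode the invented format, asserting every guess along the way."""
    lines = []
    magic, version, count = struct.unpack_from(HEADER, data, 0)
    assert magic == b"DEMO", f"unexpected magic {magic!r}"
    assert version == 1, f"unexpected version {version}"
    offset = struct.calcsize(HEADER)
    for _ in range(count):
        tag, length = struct.unpack_from("<BB", data, offset)
        assert tag in FIELD_NAMES, f"unknown tag {tag} at offset {offset}"
        offset += 2
        payload = data[offset:offset + length]
        assert len(payload) == length, "field runs past end of file"
        offset += length
        lines.append(f"{FIELD_NAMES[tag]}: {payload.decode('ascii')}")
    # Any leftover bytes mean the guess about the layout is incomplete.
    assert offset == len(data), f"{len(data) - offset} unexplained bytes"
    return lines


sample = (b"DEMO" + struct.pack("<HH", 1, 2)
          + bytes([1, 1]) + b"A"
          + bytes([2, 1]) + b"B")
print("\n".join(dump(sample)))
```

Each assertion is one claim about the format. When a new sample file trips
one, that claim was wrong or too narrow, and the fix goes into both the code
and the notes.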

It's also important to round-trip. So I come up with some very simple text
input format, then a filter that creates documents that are readable by Zeni
or Movie Magic, then I try to open the files. If that doesn't puke on my
shoes, I edit the file a bit, then run that edited file through my
human-readable dumping program.
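
The scriptable half of that loop can be sketched as a pair of functions that
must undo each other. The `key=value` text format, the `DEMO` binary layout
and the `encode`/`decode` names are all invented for this example. Opening the
result in the real application is the part that cannot be automated here.

```python
import struct

# Hypothetical field tags for the invented layout.
TAGS = {"actor": 1, "title": 2}
NAMES = {tag: key for key, tag in TAGS.items()}


def encode(text: str) -> bytes:
    """Turn 'key=value' lines into the invented binary layout."""
    fields = [line.split("=", 1) for line in text.splitlines() if line]
    out = b"DEMO" + struct.pack("<HH", 1, len(fields))
    for key, value in fields:
        raw = value.encode("ascii")
        out += struct.pack("<BB", TAGS[key], len(raw)) + raw
    return out


def decode(data: bytes) -> str:
    """Turn the invented binary layout back into 'key=value' lines."""
    magic, version, count = struct.unpack_from("<4sHH", data, 0)
    assert magic == b"DEMO" and version == 1
    offset = 8  # size of the header read above
    lines = []
    for _ in range(count):
        tag, length = struct.unpack_from("<BB", data, offset)
        value = data[offset + 2:offset + 2 + length].decode("ascii")
        lines.append(f"{NAMES[tag]}={value}")
        offset += 2 + length
    assert offset == len(data)
    return "\n".join(lines)


source = "actor=A\ntitle=AB"
binary = encode(source)
assert decode(binary) == source          # text -> binary -> text
assert encode(decode(binary)) == binary  # binary -> text -> binary
```

Checking both directions matters: the second assertion catches bytes that the
decoder silently ignores and the encoder therefore never writes back.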

Lather, rinse and repeat. It is very tedious and slow but it is quite cool
when you discover something that works.

I've been puzzling over how I could offer a consulting service where I would
reverse-engineer documents as a service to other companies. No doubt there is
lots of need for this but I am unclear as to how to market it.

The DMCA only covers Digital Restrictions Management. If the DMCA doesn't
apply, as it did not in the case of files that you create yourself as with
Movie Magic Scheduling or Zeni 4, while there are some procedures one must
take care to follow, reverse-engineering is completely legal.

Another way to put it is that one of the reasons we have patents is so that
reverse-engineering won't be necessary.

~~~
jakobegger
That tip about assertions is great! It would tell you immediately when one of
your assumptions is wrong...

One tip I have, which might be obvious: Write docs as you work. While I was
working on an app that reads MS Access files, I simultaneously worked on a
guide that documented everything I found out:
[http://jabakobob.net/mdb/](http://jabakobob.net/mdb/)

That guide will be useful when I need to fix a bug in the future, and I hope
it will also be useful for other people.

~~~
MichaelCrawford
Indeed. I didn't mention that, but that's something I do myself.

Really the written documentation is far more important than any code. Consider
someone who wanted to write a file filter in some other programming language.
I use C or C++, which don't look a whole lot like Python or Haskell.

Really the written document should be regarded as authoritative, and not any
of the code.

However, it's not clear how to go from a written specification to a working
file filter.

One way that would work well but that would be laborious, would be to give the
doc to someone who had no prior knowledge of the format in question, so they
could write their own filter.

If I gave such a doc to you, then you and I could round-trip our files back
and forth between my implementation and yours, then we could be quite
confident that the document was correct.

To this very day, OpenOffice will not round trip my resume between .odt and
.doc. The error is subtle but I can see it every time I try.

