Reverse engineering guide for beginners: Methodology and tools (0x00sec.org)
801 points by ingve on June 12, 2017 | 63 comments

I highly recommend this guide on how Samba was written, which describes the RE techniques involved [0].

[0] - https://www.samba.org/ftp/tridge/misc/french_cafe.txt

Thanks so much. I respect the Samba developers a great deal, given how much we take them and the Wine devs for granted, and especially Tridge for RE'ing the BitKeeper protocol, taking heat from Linus, and indirectly leading to git with his unstoppable curiosity.

After brushing up on this, if you're looking for something "fun" to work through, the NSA's 2016 Codebreaker challenge is good, granted you have a .edu email address (only US .edu too, unfortunately).


I think they're going to be keeping the 2016 version up for a while longer. They generally start a new one in September each year.

Frankly, Microcorruption and Cryptopals are way better if you are looking for something to spend your time on for learning.



Do you happen to know of any similar learning resources or tools for reversing file formats? Reversing the code that actually reads it is one way, but I guess I'm thinking more along the lines of static analysis of the save file.

For example, we plug known data into the program, save it, then figure out how to extract that info from the save file.
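A minimal sketch of that "change one known value, save, compare" approach, assuming nothing about any real format: the `fake_save` layout below (a 4-byte magic, a little-endian 32-bit gold count, a 1-byte level) is entirely made up for illustration. With real files you would read the two saves with `open(path, "rb").read()` instead.

```python
import struct
from typing import List, Tuple


def diff_ranges(a: bytes, b: bytes) -> List[Tuple[int, int]]:
    """Return (start, end) byte ranges, end exclusive, where a and b differ.

    Bytes are compared position by position over the common length;
    a size difference is reported as one extra trailing range.
    """
    ranges = []
    start = None
    common = min(len(a), len(b))
    for i in range(common):
        if a[i] != b[i]:
            if start is None:
                start = i          # a run of differing bytes begins here
        elif start is not None:
            ranges.append((start, i))  # the run ended at the previous byte
            start = None
    if start is not None:
        ranges.append((start, common))
    if len(a) != len(b):
        ranges.append((common, max(len(a), len(b))))
    return ranges


def fake_save(gold: int, level: int) -> bytes:
    """Hypothetical save format, invented only to have something to diff."""
    return b"SAVE" + struct.pack("<IB", gold, level)


# Two saves that differ only in one value we control.
before = fake_save(gold=100, level=3)
after = fake_save(gold=101, level=3)

for start, end in diff_ranges(before, after):
    print(f"offset {start:#x}-{end:#x}: {before[start:end].hex()} -> {after[start:end].hex()}")
```

Repeating this with different known values (a large number, a negative one, a longer string) is what usually narrows down field widths and endianness; real formats with checksums or compression will light up many more bytes than the one field you changed.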

Bonus if anyone knows the legal status of that kind of work. My impression is that reversing file formats has been successfully defended in US courts, but I haven't started researching fully and would appreciate any leads there (cases, etc.) to review and potentially discuss with a lawyer. (What kind of lawyer would know about that kind of thing?)

Damn, too bad they don't accept .ac.uk addresses :( I feel like there should be a standardised way to ensure users are students/affiliated with an educational institution (maybe an international federation on top of SAML a la the UK Access Management Federation).

I guess at the end of the day the NSA want students in the US, who are likely US citizens and thus eligible for a job.

> I feel like there should be a standardised way to ensure users are students/affiliated with an educational institution

or you know, they could just open it for the public, given that there's such a wide gap in quality, cost, subsidy and affordability between higher education worldwide.

only after I stepped out of the university bubble, I saw how unfair this "educational license" stuff really is. so many people who just can't afford it, so many people who don't have access to actual quality education (and don't want to throw money away just to get a piece of paper), so many smart people that spent years in "lower" (level) education (often with a family background that didn't aim high enough for their daughter/son) and then not qualifying for further funding (I'm seeing this a lot in the Netherlands, recently, it's really sad).

question is, what are you really selecting for by only accepting affiliation with an educational institution? who are you excluding and why?

(note: I put "lower" in quotes, because I feel everyone should in principle receive education to the best of their abilities. learning skills/knowledge/stuff is the goal and there's nothing wrong if that's not university-level. it's a wide world out there with so many beautiful different kinds of people)

You raise some really interesting points, and it's something I've never really considered before. I'm curious as to how you think it should work, though - a universal non-commercial license that people can use personally?

Do you think there's much scope for abuse in terms of people using the licenses commercially without funding the company behind the product, though? As an example I'm considering JetBrains' IDEs, but I think the same concerns could be applied to a wide variety of products.

> I feel like there should be a standardised way to ensure users are students/affiliated with an educational institution

There's ISIC (International Student Identity Card) [1], but I don't imagine it's used widely.

[1] https://www.myisic.com/isic-card/

Not a standard but https://github.com/leereilly/swot comes to mind.

Hehe, it's nice to see Tech leading the pack. Go Jackets <3

An excellent introduction to Windows reverse engineering is lena151's series of video tutorials: https://tuts4you.com/download.php?list.17

I learned a lot from lena151's tutorials 8 years ago but I'm not sure if her tutorials could work on Windows 10.

I don't see why not. What problems are you having?


Maybe fix the title to make it clear that this is about reversing binaries? Because RE is quite a broad term, even within the field of computing and/or generally "topics of HN interest". You can reverse engineer so many other things than just executable binaries. And not just other kinds of software (web), but hardware, communication protocols, even organisations and bureaucracies, or processes in the widest sense of the word.

It's not like this article teaches much about the general "reversing mindset" (similar to the "hacker mindset", but not quite exactly the same), or the "methodology" as promised in the title. Because yes there is some very interesting overlap in skill within the broad field of RE. Ask any pentester who also picks locks.

Not to discredit the article itself, btw, which is fine given what it actually covers. Which is about Linux binaries, and in particular with the object of solving a crackme puzzle.

Maybe "Reverse engineering a crackme for beginners" would be a bit more descriptive.

Just to provide another perspective: I was expecting exactly this when I saw the title. Maybe we could also use reverse engineering for binaries and come up with new terms for all those other, largely unrelated, things. It's probably too late though and naming things "by committee" is unlikely to work anyway.

Binary Ninja is a fine piece of software, but it is more ethical to advertise this article as "nice reversing tutorial included with said software", because not-so-hidden shameless advertisement for it is worse.

IMO, I really don't see this as "shameless advertisement". As far as I know, there are really only three worthwhile static analysis tools available right now: Radare2 (free), Binary Ninja ($99-$300), and IDA Pro ($500-$5000+). Using Radare2 at the beginning can be very daunting if you are new to RE. IDA is so expensive you can really only obtain it legally through your workplace or college/university, or if you're willing to spend a very large sum of money. That really only leaves Binary Ninja.

With that said, there is nothing stopping people from starting with radare2 if they wish to; there are a lot of great tutorials for it available online. But in this case the author (Nitrax) recommended Binary Ninja "due to its low cost .. compared to the functionalities provided", and further added "A demo version is available for free and should be enough for beginners."

I can understand the insistence on free and open-source software, but a lot of OSS tools in many fields have simply not caught up to their paid counterparts in all aspects. And to me, it feels much like calling someone a shill for saying "Make sure you use a good drill for this" instead of supplying references to a free & open 3D printable drill schematic.

Here's a nice short video where someone (who I think is the same LiveOverflow that commented elsewhere ITT about something else) is using radare2 and python to reverse engineer a CTF challenge binary. https://youtu.be/y69uIxU0eI8

FWIW, I've found the IDA Freeware version to be good enough to learn from. Personally, I couldn't really tell the difference between the freeware version I was using at home, and the PRO version that we were using at school (Although that might've been an old version). https://www.hex-rays.com/products/ida/support/download_freew...

(Hopper is also $99.)

Ah, yeah. I totally forgot about Hopper. I've never used it, but it looks nice, especially for Objective-C and Swift.

I'm sad that Windows support for Hopper was discontinued.

All I see is a sponsored article to show Binary Ninja as wonderful and talk trash on r2.

And a lack of openness about that fact.

I'd love to know more about disassembly. I've recently had more and more reason to go deeper into applications I'm running as dependencies. A few issues I've found and fixed just by using strace to get an idea of the system calls.

There was one thing in particular where I knew there was a jump somewhere (if some_length < some_width) that caused bad outputs. I was playing around looking at registers etc in gdb while following along with a disassembled version of the code, but it was impossible to get any idea where to start.

I wanted something that could give me a few seconds worth of samples of where the instruction register was spending its time as a starting point, but couldn't find any such tool (linux).

Within my control:

    - giving input files to explicitly set unique numbers to watch out for
    - giving inputs that would generate bad output numbers only in the bad code path
    - giving inputs to force a load of jumps down the bad or good code paths

Does anyone have any advice on how you might approach such a situation?

AFL automates this to a large degree. It doesn't do it with the type of machine control and monitoring you are asking for, but if you can instrument it with AFL, this is a perfect use case for it. It will do the hard work for you.

If you want to control the computer, you have options. QEMU can give you the fine-grained, CPU-level logging you want: http://moyix.blogspot.jp/2014/07/breaking-spotify-drm-with-p... -- this is an article that walks through logging the instructions. (https://github.com/panda-re/panda)

Finally there are RE frameworks where you can instrument (hook) function calls (Frida being a good cross platform option).

I would try AFL first. But playing with Panda can be really fun too.

(edit: and a colleague just pointed out -- https://github.com/angr, which will also let you work right at the level you want to, I think).

What you want is a hit tracer.

A hit tracer sets a breakpoint at the beginning of every basic block, and records and clears every breakpoint hit (so the performance hit is relatively low).

You'd use a hit tracer to record a "baseline" of your target program when it's not doing the thing you care about. Then you'd run the program again and trigger the behavior. Then you'd diff the traces.

AFL is doing something related when it explores states in targets.

There are a lot of free hit tracers, but they're also kind of a build-your-own-light-saber deal. If you have a Python debugger library and a disassembly, a hit tracer is close to "hello world".
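The diffing step is the easy half and can be sketched without any debugger at all. This assumes you already have two traces as lists of basic-block start addresses; how you collect them (breakpoints via whatever debugger library you use) is not shown, and the addresses below are invented for illustration.

```python
from typing import Iterable, List


def new_blocks(baseline: Iterable[int], triggered: Iterable[int]) -> List[int]:
    """Return basic-block addresses hit in the triggered run but never in the baseline."""
    seen = set(baseline)
    return sorted(set(triggered) - seen)


# Hypothetical traces: block addresses recorded by a hit tracer.
baseline_trace = [0x401000, 0x401020, 0x401050]
triggered_trace = [0x401000, 0x401020, 0x401080, 0x4010A0, 0x401050]

# Blocks that only ran when the interesting behavior was triggered.
for addr in new_blocks(baseline_trace, triggered_trace):
    print(hex(addr))
```

The blocks that survive the diff are where you start reading the disassembly. In practice you would probably take several baseline runs and union them first, so that ordinary run-to-run noise drops out.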

>I wanted something that could give me a few seconds worth of samples of where the instruction register was spending its time

You could write this with Pin [1], but I'd be surprised if there wasn't a profiling tool with instruction level analysis available. If there truly isn't, then there are Pin examples that can be pretty quickly modified to achieve this.

1. https://software.intel.com/en-us/articles/pin-a-dynamic-bina...

It's not free, but IDA Pro has tracing functionality similar to what you want. There are bootleg copies if you look hard enough, an old version (5.0) that is free but Windows-only, and feature-limited demos.

orthogonal :

I honestly wish CMU would release the lectures and full class materials for 15-213 (the course most typically associated with the bomb lab mentioned here). The lectures combined with the accompanying text and labs form a masterpiece, and it's a shame the community at large can't take better advantage of it. It's like SICP for systems : that effing good.

The tests, however, are just awful. Those can safely be dumpstered.

Even if you can't get access to this course, I highly recommend that you read the text book for it.

Computer Systems: A Programmer's Perspective: https://www.amazon.com/Computer-Systems-Programmers-Perspect...

It's one of the best books ever written on this subject.

One of my courses at UQ (Australia) followed some of it and I'm glad that I took it. There's no better way to assess the skills involved in debugging/analyzing machine dumps than the binary bomb assignment.


That includes PDFs of the lectures and videos of said same. It looks similar to Berkeley's CS 61C.

Thanks for the link. Didn't realize they'd started both recording lectures and releasing them. Probably has something to do with the change in teaching faculty for the course. Wonder if the second half of the semester is available somewhere (I assume it'll be uploaded throughout the summer).

hackermailman's post elsewhere in these comments links to: http://csapp.cs.cmu.edu/3e/labs.html https://news.ycombinator.com/item?id=14522391 which further links to lectures: https://scs.hosted.panopto.com/Panopto/Pages/Sessions/List.a...

It sounds like you've been through the course - why not release it yourself?

I don't have the lectures. At least when I took it, they were very specifically not recorded (for some hand-wavy pseudo-pedagogic reasons).

The labs and text are already available online.

The bomblab is from CS:APP student labs section if anybody is interested https://news.ycombinator.com/item?id=14522391 specifically here http://csapp.cs.cmu.edu/3e/labs.html

Upvote if you grew up on Fravia and tKC

Was going to mention Fravia. He died in 2009 if you weren't aware, age 56.

Website here: http://acrigs.com/FRAVIA/index-old.htm

+ORC ( https://en.wikipedia.org/wiki/Old_Red_Cracker ) always interested me. Like Satoshi Nakamoto, he has never been identified. Fravia's site has his tutorials here: http://acrigs.com/FRAVIA/orc1.htm

Some pretty serious old school hackers.

Yup, blast from the past. I got serious about programming when I wanted to hack games for extra lives on my Spectrum.

When I later moved to the PC +fravia's resources were very much a sight for sore eyes.

Though I admit it was only years later when it occurred to me that +orc was almost certainly +fravia in a weak disguise. I took it literally at the time..

Obviously I don't know for sure, but the +ORC == Fravia theory seems to be rejected by most(?) +ORC stalkers.


Of course, without looking deeper he's the most obvious candidate

OpenSecurityTraining[1] videos are also a golden resource for beginning reverse engineers

[1] http://opensecuritytraining.info/Training.html

Evan's debugger is nice for Olly fans

edb is great. Very useful. Also ht editor, very similar to Hiew and very keyboard friendly.

In around 2001 I started reverse-engineering games on the PC before having any programming skills (later moved to programming).

I remember MadWizard's assembly tutorial[0] being very helpful at the time.

[0] http://www.madwizard.org/programming/tutorials/

Wow, times have changed since the days of SoftICE and OllyDbg. When I used to RE for fun, it was sad seeing how expensive programs could be rigged with a simple NOP or JNZ/JMP change.
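For anyone who hasn't seen what such a change looks like at the byte level, here is a rough sketch on a made-up four-byte snippet, not taken from any real program. The opcodes are the standard x86 short forms: 0x75 is JNZ rel8, 0xEB is JMP rel8, and 0x90 is NOP. The helper names are hypothetical.

```python
JNZ_SHORT = 0x75  # x86: JNZ/JNE rel8 (jump if the zero flag is clear)
JMP_SHORT = 0xEB  # x86: JMP rel8 (unconditional short jump)
NOP = 0x90        # x86: one-byte no-op


def force_jump(code: bytes, offset: int) -> bytes:
    """Turn the short JNZ at `offset` into an unconditional JMP, keeping its displacement."""
    if code[offset] != JNZ_SHORT:
        raise ValueError(f"expected JNZ (0x75) at offset {offset}, found {code[offset]:#04x}")
    patched = bytearray(code)
    patched[offset] = JMP_SHORT
    return bytes(patched)


def nop_out(code: bytes, offset: int, length: int) -> bytes:
    """Overwrite `length` bytes starting at `offset` with NOPs, so the branch is never taken."""
    patched = bytearray(code)
    patched[offset:offset + length] = bytes([NOP]) * length
    return bytes(patched)


# Illustrative snippet: test eax, eax ; jnz +5
snippet = bytes([0x85, 0xC0, 0x75, 0x05])

print(force_jump(snippet, 2).hex())   # branch always taken
print(nop_out(snippet, 2, 2).hex())   # branch never taken
```

Both patches keep the code the same length, which is why they were so popular: nothing after the patched bytes has to move.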

My best challenge was Brazil (3ds render engine). It had all types of checks that would only show up when rendering.. But that was no match.. Good times

Where RE is reverse engineering, as opposed to say, regular expressions.

Use regex for regular expressions (some people even use regexpr)

Looks like the title has been amended

Nobody in tech thinks RE stands for regular expression...

Search Google for "re documentation" and look at some of the results.

Google is effectively aliasing RE to regular expression. None of the links use "RE" as an acronym. Regex/Regexp has always been the accepted acronym.

They do though. re is a Python library and an Erlang library, there's a reference to perlre, Clojure calls it re, etc. Then there's the C lib PCRE.

Also Google chooses to alias certain things for a reason.

Blank page without javascript. Bye.

0) that's most definitely not a blank page; I have NoScript and it does load content.

1) what kind of response is this?

I have JavaScript enabled and it showed a blank page for me too, I had to reload the page a couple of times, I guess it's getting some traffic from HN

Using JSBlocker here. No content. Just making a futile effort to show web developers that serving some basic text without JavaScript is cool. Of course, do whatever you want with your site. Maybe I was a bit rude.

Noscript+Firefox here. There's full content - the entire page has a JS-disabled full text of every post wrapped in a large <noscript> block.

Does jsblocker completely wipe out noscript tags or something?

Idiot who voluntarily breaks their web browser, then complains that it's broken. Bye!

Context: Visiting a website written by/for reverse engineering, a subset of Hacking.

Assertion: Visiting a hacking website with Javascript disabled is a bad thing.


Not a bad thing. You are obviously more than free to configure your browser however you want. Just don't whine about it. Maybe give some constructive criticism (It would be nice if..., Maybe you could consider to...).

Or just open the developer console and inspect the source, like a hacker/reverse engineer.

This site works without javascript. If you block <noscript> tags however you won't see anything.
