
A Gentle Primer on Reverse Engineering - luu
https://emily.st/2015/01/27/reverse-engineering/
======
sdevlin
First: great article.

One nit, though. There's a subtle error in the main function:

    
    
        char* input;
        printf("Please input a word: ");
        scanf("%s", input);
    

Local variables are not automatically initialized in C, and we never assign
input to point to any particular block of memory. This means it's probably
pointing off to some random location - basically whatever address happened to
be sitting on the stack when main was called. So when scanf writes the user
password to input, it's going to go to some unpredictable location with
unpredictable results. This could lead to code execution, or at least a
straightforward denial of service.
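
A minimal sketch of one way to avoid the problem (not the article's actual code): give the input a real buffer and bound the read. The names `read_word` and `WORD_MAX` are made up for illustration, and `fgets` reads a whole line rather than a single whitespace-delimited word as `%s` does.

```c
#include <stdio.h>
#include <string.h>

#define WORD_MAX 64  /* hypothetical buffer size, chosen for illustration */

/* Reads one line from `in` into `buf` (capacity `cap` bytes) and strips
 * the trailing newline, if any. Returns 1 on success, 0 on EOF or error.
 * At most cap - 1 characters are stored, so the buffer cannot overflow. */
int read_word(FILE *in, char *buf, size_t cap) {
    if (fgets(buf, (int)cap, in) == NULL) {
        return 0;
    }
    buf[strcspn(buf, "\n")] = '\0';
    return 1;
}
```

From `main` this would be called as `char input[WORD_MAX]; read_word(stdin, input, sizeof input);`. Keeping `scanf` also works as long as the format carries a width one less than the buffer size, e.g. `scanf("%63s", input)` for a 64-byte buffer.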

~~~
zrail
The author addresses this in footnote #2. They're simplifying the C code to
get to the point of the article faster.

~~~
EpicEng
Simplifying does not mean incorrect. Initializing the variable would not add
appreciable complexity to the code. It's a snippet and should be fixed lest it
make the author appear incompetent.

~~~
zrail
I'm not 100% confident, but I'm pretty sure the author _does not care_ if you
think they're incompetent. The author's skill and apparent interest is in
explaining an inherently difficult concept (reverse engineering a compiled
executable) in a way that won't cause their audience ( _not you_ ) to tune out
and/or go running for the hills.

Could they adjust the snippet? Sure. Will it add anything to the essay to do
so? Nope.

~~~
EpicEng
Eh... I'm not really buying it. I think it was an honest mistake. C'mon; this
makes that snippet harder to read?

    
    
        char input[SIZE];
    

I really don't think so. I'm all for simplifying example code to get the core
concept across, but I wouldn't go as far as to invoke undefined behavior. I
also don't understand the use of scanf. At all. A seasoned, competent C user
would never consider using scanf.

Anyway, I didn't mean to derail things too much here. It's really a nitpick
and has little to do with the article itself.

------
_nullandnull_
Kind of sad that 75% of the comments in this thread are negative or pedantic.
I work as a reverse engineer and I thought it was a good "gentle primer".

~~~
aptwebapps
Whenever I read a comment like this of sufficient age I wish I could see a
snapshot of the page at the time. Right now there are only 48 comments in total
and I don't really notice much negativity. There's some back-and-forth
nitpicking; do you consider that negative?

------
jkubicek
This was great. If there's ever a "Gentle Primer, pt. II" I'd love to see a
walkthrough on replacing the existing hardcoded password with a different
string.

------
shanemhansen
I used emacs hexl-mode and
[http://support.amd.com/TechDocs/24594.pdf](http://support.amd.com/TechDocs/24594.pdf)
to edit a je to a jne which caused the program to think I put in the correct
password. That was fun.

~~~
palmer_eldritch
But it doesn't work anymore with the correct password...

I always found it was cleaner to either force (jmp) or remove (nopnop) the
jump rather than inverting its condition. It's more explicit.

Also, in the real world, cracking's usually a bit more than finding the right
jump to force/remove. Although, if it's enough to reach your goal, you should
do it.
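
The three options can be sketched in C as edits to a byte buffer. This assumes the check compiled to a two-byte x86 short jump (`0x74` for je, `0x75` for jne, followed by a one-byte offset); the near forms (`0F 84` / `0F 85`) are not handled. `patch_short_jump` and `patch_kind` are hypothetical names, not part of any tool mentioned here.

```c
#include <stddef.h>
#include <stdint.h>

enum {
    OP_JE_SHORT  = 0x74,  /* je rel8  */
    OP_JNE_SHORT = 0x75,  /* jne rel8 */
    OP_JMP_SHORT = 0xEB,  /* jmp rel8 */
    OP_NOP       = 0x90   /* nop      */
};

typedef enum { PATCH_INVERT, PATCH_FORCE, PATCH_REMOVE } patch_kind;

/* Patches the short je/jne at code[off]. Returns 0 on success, or -1 if
 * the offset is out of range or the byte there is not a short je/jne. */
int patch_short_jump(uint8_t *code, size_t len, size_t off, patch_kind kind) {
    if (len < 2 || off > len - 2) {
        return -1;  /* need the opcode byte and its rel8 operand */
    }
    uint8_t op = code[off];
    if (op != OP_JE_SHORT && op != OP_JNE_SHORT) {
        return -1;
    }
    switch (kind) {
    case PATCH_INVERT:  /* je <-> jne: the correct password now fails */
        code[off] = (op == OP_JE_SHORT) ? OP_JNE_SHORT : OP_JE_SHORT;
        break;
    case PATCH_FORCE:   /* always take the jump, keep the same target */
        code[off] = OP_JMP_SHORT;
        break;
    case PATCH_REMOVE:  /* never take the jump: overwrite both bytes */
        code[off] = OP_NOP;
        code[off + 1] = OP_NOP;
        break;
    default:
        return -1;
    }
    return 0;
}
```

Whether forcing or removing gives the "always accepted" behavior depends on which side of the branch leads to the success path, so that has to be read off the disassembly first.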

------
i5rider
The title seems a bit misleading, e.g. one could reverse engineer source code
into UML. Perhaps a more appropriate title would be: A Gentle Primer on Code
disassembling.

~~~
jasode
For what she's describing, " _reverse engineering_ " is actually the more
common phrase rather than "code disassembling".

If you search amazon.com, " _reverse engineering_ " is in the titles of the
first 2 books:

[http://www.amazon.com/s/ref=nb_sb_noss_1?&field-keywords=rev...](http://www.amazon.com/s/ref=nb_sb_noss_1?&field-keywords=reverse+engineering)

The way most people use the terminology now, I'd say "reverse engineering"
encompasses all the strategies of analyzing and unraveling the logic of
assembler code. On the other hand "code disassembling" is more specific about
using a tool such as IDA to disassemble the binary executable into assembly
_before_ the intellectual task of reverse engineering begins.

~~~
revelation
That's a bit misleading. Reverse engineering has, on the face, little to do
with assembler of any sort; it's about understanding your target and choosing
the best angle of attack to learn more about it. That often depends on how the
target was produced in the first place. If it's written in C or C++, sure, go
to the assembler. If it's some Adobe Air program that is really just a
launcher for a SWF, well, you're not going to have much luck. Similarly, the
_application logic_ in a program like EVE Online is almost entirely written in
Python; you wouldn't exactly know it from the outside, because they've wrapped
all the native things they need. But then you won't learn much either just
looking at the assembly of what are various layers and layers of wrappers.

------
GotAnyMegadeth
Something interesting I found out whilst following these instructions: My
version of Linux has two different echo commands: `echo` and `/bin/echo`.

The basic one cannot accept any flags and when I type `echo "\x01"` it prints
`\x01`

The one in bin can accept flags, and requires the -e flag to interpret
backslashes.

This changes the echo line of the program to this:

`/bin/echo -e "\x01"`

I found this out because it was changing the byte from 00 to 5C rather than
01: 5C is the ASCII code for a backslash (`\`), so the escape was being
written out literally.
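
For comparison, a rough C sketch of the same one-byte patch that sidesteps shell escape handling entirely, since the byte value is passed as a number. `patch_byte` is a hypothetical helper, and the file path and offset are whatever the disassembly points to; nothing here is taken from the article.

```c
#include <stdio.h>

/* Overwrites the single byte at `offset` in the file at `path` with
 * `value`. Returns 0 on success, -1 on any I/O failure. */
int patch_byte(const char *path, long offset, unsigned char value) {
    FILE *f = fopen(path, "r+b");  /* read/update, binary, no truncation */
    if (f == NULL) {
        return -1;
    }
    int ok = fseek(f, offset, SEEK_SET) == 0 && fputc(value, f) != EOF;
    if (fclose(f) != 0) {
        ok = 0;  /* a failed close can mean the write never hit disk */
    }
    return ok ? 0 : -1;
}
```

A call such as `patch_byte(path, offset, 0x01)` writes exactly one byte with value 01, with no dependence on which `echo` the shell picks.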

~~~
zrail
Interesting. You might be seeing the bash (or other shell) `echo` builtin.

~~~
GotAnyMegadeth
$ which echo

echo: shell built-in command.

------
Kenji
This is slightly offtopic but I find the vulgar example passwords amusing. I
used to use a lot of vulgar language in my code but when you have to commit it
and other people (e.g. your supervisor) read it, you have to be more careful.

~~~
jkubicek
I find it especially amusing because "poop" and "butts" are my default "I need
to enter some text here" strings.

~~~
seanp2k2
This is why we have [foo, bar, baz, quux] and example.com though:
[https://www.iana.org/domains/reserved](https://www.iana.org/domains/reserved)

~~~
caipre
I recently had to expand my list and came up with:

    
    
        foo bar baz buz qux pip pop tut rof art uff dex dom zed

~~~
juliangregorian
Except there's already foo, bar, baz, qux, quux, garply, waldo, fred, plugh,
xyzzy, thud.

[http://www.catb.org/~esr/jargon/html/F/foo.html](http://www.catb.org/~esr/jargon/html/F/foo.html)

------
AlexeyBrin
_C lacks a boolean type_

This is false, as of C99 we have booleans in C, just include stdbool.h in your
code, e.g.:

    
    
        #include <stdbool.h>
        
        ...
    
        bool test = true;
    
        ...

~~~
zrail
Let's look at the source of stdbool.h:

    
    
        #define true 1
        #define false 0
    

[http://clang.llvm.org/doxygen/stdbool_8h_source.html](http://clang.llvm.org/doxygen/stdbool_8h_source.html)

~~~
AlexeyBrin
I know that stdbool.h just contains macros defining true and false, but
_Bool is a new built-in type in C99. It is in the standard and implemented by
all major compilers, so it should be used for the sake of clarity.

IMHO this:

    
    
        bool is_valid(char* password);
    

is clearer than this:

    
    
        int is_valid(char* password);
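
A small sketch of why `_Bool` is a distinct type and not just an alias for `int`: in C99, converting any scalar to `bool` gives 0 if the value compares equal to zero and 1 otherwise, while converting to `int` truncates. The function names below are made up for illustration.

```c
#include <stdbool.h>

/* Conversion to bool: any nonzero value becomes 1. */
bool to_bool(double x) {
    return x;
}

/* Conversion to int: the fractional part is discarded. */
int to_int(double x) {
    return (int)x;
}

/* 256 does not fit in 8 bits, so the unsigned char conversion wraps to 0
 * (on the usual 8-bit char), but the bool conversion still yields 1. */
bool bool_from_256(void) {
    return 256;
}

unsigned char uchar_from_256(void) {
    return (unsigned char)256;
}
```

So `to_bool(0.5)` is `true` while `to_int(0.5)` is `0`, which a plain `#define`-based boolean over `int` could not give you.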

~~~
zrail
But here's the thing: in the assembly (the very essence of the essay),
_booleans are still represented as integers_. Using a standard macro would
obfuscate the path to getting to the payoff of being able to switch a single
byte from 0 to 1 and crack the program, because the author would then have to
explain macros and delve into a bunch of ancillary stuff.

~~~
farmdve
Starting to work with Reverse Engineering assumes prior knowledge of
programming and/or some basic information about assembly for the specific
platform.

Furthermore, as the examples are given in C, it is implied that the reader
needs to be familiar with the language. Knowing about macros should therefore
be, if not a given, then at least expected.

~~~
zrail
The original audience of this content (i.e. not HN) is a broad interest group
so you can't really assume that kind of knowledge. Again, think "hey this is
neat!" vs "this is absolutely technically correct in all ways."

------
zellyn
Again (hackernews should just prepopulate my comment box with this text), if
you enjoyed this, you'll probably enjoy
[https://microcorruption.com/](https://microcorruption.com/) Surprised to see
that it wasn't already mentioned, especially since one of the creators,
tptacek, was active on this thread.

------
kbart
Slightly off-topic. Could somebody recommend good resources on reverse
engineering (especially C and Linux)? I write C code for a living, but
binary-level security is not my strong suit and I'd like to improve at it.

~~~
Qworg
I don't do this for a living, but an accessible resource is the UIUC ACM
SIGMil guide. It is a little old, but relevant.

[http://althing.cs.dartmouth.edu/local/www.acm.uiuc.edu/sigmi...](http://althing.cs.dartmouth.edu/local/www.acm.uiuc.edu/sigmil/RevEng/)

------
davenportw15
Wouldn't it be cleaner to implement is_valid like this:

    
    
      int is_valid(char* password) {
        return strcmp(password, "poop") == 0;
      }
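
For reference, a side-by-side sketch of the two forms. The if/else version is an assumed reconstruction of the shape being discussed, not copied from the article, and both function names are invented here so the two can coexist.

```c
#include <string.h>

/* Explicit if/else form (assumed shape of the original). */
int is_valid_verbose(const char *password) {
    if (strcmp(password, "poop") == 0) {
        return 1;
    } else {
        return 0;
    }
}

/* Compact form: the == operator already evaluates to 1 or 0. */
int is_valid_compact(const char *password) {
    return strcmp(password, "poop") == 0;
}
```

Both return the same value for every input; the difference is only in how much the reader has to know about C's comparison operators.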

~~~
codygman
Maybe, but if/else is more approachable especially if you haven't seen that
pattern before.

------
mwcampbell
Would have been a bit more realistic if the program had been stripped of
symbols before disassembly. Still, the call to strcmp would be pretty obvious.

------
kschmit90
FWIW I would wager that a male would not have to inb4 pedantic arguments. Also
that a male would not get so much criticism in regards to nit picky things.

But maybe it's just a common trait of detail oriented programmer types to be
pedantic?

