
Ask HN: What's the prerequisite to become an exploit developer? - Qrius
I want to learn reverse engineering (RE) and exploit development.

There are great resources for both of them (like MBE http://security.cs.rpi.edu/courses/binexp-spring2015/ and RE https://github.com/fdivrp/awesome-reversing).

The only problem is that there is hardly any article which actually lays out a path for a complete beginner. I want to understand what the ACTUALLY NECESSARY topics are, and in the RIGHT ORDER, to MINIMIZE the TIME WASTING and wandering in between topics so that the knowledge acquired is more practical in the context of current vulnerabilities rather than being more theoretical.

Something like programming fundamentals > Python > C > Assembly > Computer Organization > Windows Internals > Reversing > Fuzzing etc. (it's only an example, please teach me the correct order)

Actually there's an article http://www.myne-us.com/2010/08/from-0x90-to-0x4c454554-journey-into.html but it is quite outdated.

Your opinions along with updated resource links and views on them will be greatly appreciated. Please focus only on topics that are actually necessary, because RE and exploit development are vast topics in themselves (e.g. do I actually need to learn computer architecture or organization? AFAIK those circuits and architecture-specific things are more helpful for an electrical or electronics engineer than for a reverse engineer. We need to learn the intricacies of a particular architecture like x86 or ARM, but in my opinion those circuits won't help any reverse engineer in their daily reversing schedule, so please focus only on the required parts, like the one I know: OS fundamentals).
======
Qrius
EDIT: Please also describe your best practices and things you learned from
your mistakes.

------
kdbg
[Part 1 of 2]

I'm sorry this didn't get more responses, it's a worthy question.

First, I've tackled this topic a few times:
[https://www.reddit.com/r/AskNetsec/comments/5i73db/path_to_e...](https://www.reddit.com/r/AskNetsec/comments/5i73db/path_to_exploit_developer/db61ken/)
is probably the most concise post of mine, but I'll give a more complete
(opinionated) guide below.

> There are great resources for both of them (like MBE
> [http://security.cs.rpi.edu/courses/binexp-
> spring2015/](http://security.cs.rpi.edu/courses/binexp-spring2015/) and RE
> [https://github.com/fdivrp/awesome-
> reversing](https://github.com/fdivrp/awesome-reversing)).

I like the RPI course, but without lectures it feels like there are too many
gaps for it to be something I actually recommend to people.

> Actually there's an article [http://www.myne-
> us.com/2010/08/from-0x90-to-0x4c454554-journ...](http://www.myne-
> us.com/2010/08/from-0x90-to-0x4c454554-journey-into.html) but it is quite
> outdated.

Don't worry about content being dated. You will not find any good resources
that cover everything right up to modern exploits; what you will find are
resources that train you in the foundations you need to understand modern
exploits with a bit of your own legwork. The "From 0x90 to 0x4c454554"
article actually looks like a very good collection of resources. Dated
resources are still valuable because modern exploits are still doing the same
thing (trying to get control of the IP register). It's just that now there are
mitigations in place that require extra steps, or impose certain restrictions
you have to work within to be successful. If you don't understand the
foundation, you can't learn the modern stuff.
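
To make "getting control of the IP register" a bit more concrete, here is a
minimal toy model in Python, not a real exploit. The frame layout (a 16-byte
buffer directly followed by an 8-byte saved return address) and the names
make_frame, unchecked_copy and saved_return_address are assumptions made up
for illustration; real stack layouts depend on the compiler, the architecture
and the mitigations in play.

```python
import struct

BUF_SIZE = 16  # assumed size of the local buffer in this toy frame


def make_frame(return_address):
    """Build a toy frame: BUF_SIZE zero bytes, then the saved return address."""
    return bytearray(BUF_SIZE) + bytearray(struct.pack("<Q", return_address))


def unchecked_copy(frame, data):
    """Copy data to the start of the frame without checking it against BUF_SIZE."""
    frame[0:len(data)] = data
    return frame


def saved_return_address(frame):
    """Read back the 8-byte little-endian value stored right after the buffer."""
    return struct.unpack_from("<Q", frame, BUF_SIZE)[0]


if __name__ == "__main__":
    frame = make_frame(0x401000)
    # 16 filler bytes fill the buffer, the next 8 bytes land on the return address.
    unchecked_copy(frame, b"A" * BUF_SIZE + struct.pack("<Q", 0x4141414141414141))
    print(hex(saved_return_address(frame)))  # 0x4141414141414141
```

Input that fits in the buffer leaves the saved address alone; anything longer
spills into it. Mitigations change how hard it is to get there, not the goal.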

> I want to understand what are the ACTUALLY NECESSARY topics required and in
> RIGHT ORDER to MINIMIZE the TIME WASTING and wandering in between topics

The problem with that is that the time wasting, and learning other topics that
are not immediately useful, are immensely beneficial on the whole. Honestly, a
big part of exploit development is spending hours researching dead
ends...seriously, that's a big part of it. It requires a wide breadth of
knowledge, not just a bunch of tips and tricks specific to exploit
development. In order to craft sophisticated exploits against a system you
need to understand that system, how it's built and how it works. This comes
from that wandering research. It may not be immediately valuable, but doing it
often leads to building up a set of topics that you are deeply knowledgeable
in, and you can draw from that knowledge when exploiting certain interactions
later on. That wandering, beating your head against a wall, getting stuck not
knowing what to do, is part of exploit development. Trying to avoid it
honestly doesn't sound like a good idea: you'll end up knowing a bunch of
tricks but lacking the background to apply them creatively.

Nonetheless, the first thing you need is some development background. If you
want to break software, start by learning how software is built.

So, I recommend starting by learning C. C is the lingua franca of exploit
development and reverse engineering. If you understand C you can break any
high-level software down into its 'C parts', and from that C you can determine
what the machine code probably looks like. C is the perfect middle ground:
thinking in machine code or assembly is too tedious, and languages like Python
or Java are too high level to capture what's going on at the CPU level.
Software exploits are ultimately about controlling the IP register on the CPU,
which is why being able to understand the whole stack is important.

Start with C, but also learn a higher-level scripting language, something that
is used for quick jobs, prototyping, etc. While C is nice, it also requires a
lot of lines to do some tasks that are very simple in higher-level languages
(string manipulation and network communication, for example). So it's useful
to know a scripting language to actually do quick jobs in. Python and Ruby are
the two common choices right now; historically Perl was the most common
choice. Now everyone I work with knows Python, but Ruby has its place too.
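
As a rough illustration of why a scripting language is handy for quick jobs,
here is a sketch of the kind of byte-string manipulation that takes a few
lines in Python and noticeably more in C. The helper name build_payload and
the sample offset and address are hypothetical, chosen only for the example.

```python
import struct


def build_payload(offset, address, bits=64):
    """Return `offset` filler bytes followed by `address` packed little-endian."""
    if bits not in (32, 64):
        raise ValueError("bits must be 32 or 64")
    fmt = "<Q" if bits == 64 else "<I"  # 8-byte or 4-byte pointer
    return b"A" * offset + struct.pack(fmt, address)


if __name__ == "__main__":
    payload = build_payload(4, 0xDEADBEEF, bits=32)
    print(payload.hex())  # 41414141efbeadde
```

The same function covers both pointer widths, which is the sort of small
convenience you end up wanting once you start working on x86 and x86_64
targets.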

If you struggle to learn C, you might find it useful to start with a scripting
language, which is often said to be better for learning to program with. Maybe
it is, but I tend to side with Evan Miller, who writes in
[http://www.evanmiller.org/you-cant-dig-upwards.html](http://www.evanmiller.org/you-cant-dig-upwards.html)
that C is a better place to start even if the results are not as immediate.
Still, if C isn't working for you, start with the scripting language and then
learn C.

Further, once you know C and a scripting language, learn one of the modern
workhorses, Java or C# (I lean towards Java). Most developers will know one
or both of these languages, and getting an understanding of Object Oriented
Programming is useful for understanding how software is built. You find OOP
design patterns in most software.

Once you know a native language (C, C++), an intermediate language (Java, C#,
etc.) and a scripting language (Python, Ruby, Perl, etc.) you'll be able to
work with the majority of code-bases out there. Even if you don't know the
specific language in use, you'll understand the core concepts that underlie
most languages. There are some other paradigms seen in less common languages
like Prolog (logic programming) or Haskell (functional programming) that you
won't understand, but those three languages cover practically all modern
software you'll encounter.

Recommended Books: you don't need to read all of these, but it's a selection
you might find useful. Also, I am a fan of the Head First series, which
usually is not very thorough, and some find the writing to be too casual and
annoying, so YMMV.

1\. Head First C - Covers the basics of C in an easily digested format. Some
people I've recommended it to have found its section on pointers useful.

2\. The C Programming Language - The Bible of the C language. You'll pick up
some bad habits from it, and it's dated, but it's a concise overview of C.

3\. Violent Python - It's a bit steep of a learning curve, but it's teaching
Python with a security focus so it fits in very well.

4\. Head First Programming - If you struggle with the basics of coding, it
might be worth checking this book out.

5\. Dive Into Python -
[http://www.diveintopython.net](http://www.diveintopython.net)

6\. Design Patterns: Elements of Reusable Object-Oriented Software - Once you
understand the programming it's time to learn the architecture of software,
and this is the classic book on the topic.

7\. Head First Design Patterns - If the last book was a bit too steep for you,
the Head First book might be a bit easier.

8\. Beej's Guide to Network Programming - You need to understand sockets and
socket programming, no exceptions, and for C this is the best guide available.

While you're learning to code you should also be trying to get some experience
working with Linux. Even if Windows is your daily driver (as it is mine),
Linux is inescapable: you'll rarely come across a network that doesn't use
Linux anywhere (and similarly, you'll rarely come across a corporate network
without some Windows servers too, so don't ignore Windows Server). I don't
have any book recommendations on this. Just set up your own Linux box, learn
the different package managers and learn how to do what you need from the
terminal.

I do recommend working through the following to get some general experience.

1\.
[http://overthewire.org/wargames/leviathan/](http://overthewire.org/wargames/leviathan/)
\- You'll learn about Linux with this one, and it doesn't require any
programming ability so it's a fair starting place.

2\.
[http://overthewire.org/wargames/bandit/](http://overthewire.org/wargames/bandit/)
\- Bandit is an easy server/wargame to start with. It'll teach you a bit
about Linux and require a bit of coding, but there is nothing that requires
technical exploit development knowledge.

3\.
[https://exploit-exercises.com/nebula/](https://exploit-exercises.com/nebula/)
\- Nebula is a little more difficult than Bandit but still non-technical and
imo a little more fun.

Working through these gives you some comfort working in a terminal and some
experience getting into the attacker's mindset and breaking things. You can do
these while learning the stuff above.

You may also want to get some practice programming with challenges such as
those at:

1\. ProjectEuler.net - These often require some mathematical insight but
they're fun so I'm including it

2\. [https://leetcode.com](https://leetcode.com) \- This is used by some
developers to practice before an interview, various levels so it'll work just
for getting some experience with programming.

~~~
kdbg
Now that you've got a foundation in software development we can move on to
exploit development. Well, not quite: first you need to bridge your knowledge
from building software to breaking it. For this you need to start learning
about how software works at a lower level, the CPU level. There is a good
book: Computer Organization and Design. It's solid, but it's also a textbook
and covers a lot more detail than you need (though still valuable to know). It
covers MIPS and how the CPU works; you can probably skip the stuff about the
hardware and microcode. MIPS is a simpler assembly language than Intel's, so
it's a nice starting place.

Intel is what you'll most often encounter though so you do need to learn x86
and x86_64.

OpenSecurityTraining.info provides a number of courses that are valuable for
this bridge to breaking software.

1\. Life of Binaries -
[http://opensecuritytraining.info/LifeOfBinaries.html](http://opensecuritytraining.info/LifeOfBinaries.html)
\- This helps you go from understanding software to understanding the system
around the software and the context in which software runs.

2\. Introductory Intel x86 -
[http://opensecuritytraining.info/IntroX86.html](http://opensecuritytraining.info/IntroX86.html)
\- Really boring/dry class on x86 instructions, but it gives you the
introduction you need.

3\. Introductory Intel x86-64 -
[http://opensecuritytraining.info/IntroX86-64.html](http://opensecuritytraining.info/IntroX86-64.html)
\- Just slides this time, good to review and get a sense of the differences
between 32-bit and 64-bit Intel assembly.

Once you've gotten the basics, it's finally time to move on to learning the
actual exploit development skills.

1\. Introduction to Software Exploits -
[http://opensecuritytraining.info/Exploits1.html](http://opensecuritytraining.info/Exploits1.html)
\- In my opinion this is simply the best resource out there to learn the
basics. It uses the book "The Shellcoder's Handbook" as its textbook and I
completely recommend the book.

2\. Hacking: The Art of Exploitation - This is the most often recommended
book. It's great and has a much better introduction than the Shellcoder's
Handbook. If you can make it through the course above without problems you
can probably skip this book, as those two resources cover more content, but
it is one of the best introductions available.

3\. Corelan's Exploit Development Tutorial Series -
[https://www.corelan.be/index.php/2009/07/19/exploit-writing-...](https://www.corelan.be/index.php/2009/07/19/exploit-writing-tutorial-part-1-stack-based-overflows/)
\- It'll start in familiar territory but it'll get into some new stuff,
overall a good series.

4\. Exploitation in the Windows Environment -
[http://opensecuritytraining.info/Exploits2.html](http://opensecuritytraining.info/Exploits2.html)
\- You'll find some overlap between Corelan's tutorial series and this course,
so you might want to take this course and reference the tutorials as you go.

5\. A Bug Hunter's Diary - Excellent book that covers some of the same topics
as the previous resources, but spends a bit more time on actually finding
vulnerabilities, not just exploiting them, and goes into more mitigations than
the previous resources. Skip the stuff you already know.

While learning all this exploit development stuff, there is another necessary
skill to actually finding vulnerabilities: reverse engineering.

There are two books that I frequently recommend on the topic:

1\. Reversing: Secrets of Reverse Engineering - This is the most popular
recommendation and it's a great resource to work through.

2\. Practical Reverse Engineering - This is a newcomer (2014) but I quite
like it. It isn't as 'complete' as Reversing is, but it covers a wider range
of topics that I find more useful.

3\. (Bonus) Malware Analyst's Cookbook - Malware analysis is probably the most
RE-heavy field you can be in, so this is a solid book on the topic. Just
because of its name I didn't give it a fair chance when I was reviewing books
to recommend, but I did review it recently and want to give it a plug: it has
a lot of practical information and labs to work on.

By this point you should have a reasonably solid foundation and a good
understanding of exploitation. You will not be up to writing the latest
browser 0day, but you'll have the foundation necessary to understand (and
learn from) modern sophisticated exploits so you can find and develop them
yourself. There are no resources to fill in the final gap. All you can do is
go out, do your research on a system and apply what you've learned to find
some way to break it and develop that weakness into an exploit.

To get experience, there are a few resources I can recommend:

1\. Exploit-Exercises - I already mentioned Nebula. Protostar should be
accessible to you once you've done the first Software Exploits course, and
Fusion after the second one.

2\. Over the Wire - I've already mentioned a couple of their servers, check
out the rest of them.

3\. Pwnable.kr - Challenges are at various levels, use the harder ones to
challenge yourself.

4\. Capture-The-Flag competitions - Every year several CTFs are run, sign up
and play in them. What is nice about CTFs is that they are bite-sized
challenges: still difficult, still involving modern techniques (the ones worth
the most points at least), but not tedious, and they don't require a big time
investment to find a weakness in. The focus of the challenge is on the exploit
development rather than on finding vulnerabilities.

5\. CVE lists - Find software that interests you, find a known vulnerability
and try to build your own exploit for it.

6\. Real world software - Go and break something of interest to you: learn how
it works, find a vuln and exploit it.

You may need to learn a new language, or research some new techniques to
handle some mitigations, but you should have the foundation necessary to
figure out what you don't know and how to learn what you need.

...and with all this content I never even touched on breaking web
applications, so I must at least mention "The Web Application Hacker's
Handbook". Work through that book, then practice against any of the many
vulnerable, meant-to-be-hacked web apps out there (Damn Vulnerable Web App,
OWASP Mutillidae 2, HackThisSite, HellboundHackers, Enigma-Group,
HackThis.co.uk, etc.)

Good Luck!

~~~
Qrius
I'm extremely grateful and was not at all expecting such an explanation.

I want to explain a few things.

Let me rephrase what I meant by "minimize the time wasting". You see, there is
a lot of great advice available online. You ask something on a subreddit or
here and people will share great resources. I love this and this kind of
learning. My concern is that sometimes these resources and advice are given
along the lines of "although it's not completely necessary, it'll still be an
experience in itself".

The problem here is that this kind of learning sometimes wastes too much time
and leaves you confused. People ask so many questions on CompSci every day,
and you'll find books starting from the complete basics of computers like Code
[https://www.amazon.com/dp/0735611319](https://www.amazon.com/dp/0735611319),
the Nand2tetris course [http://www.nand2tetris.com](http://www.nand2tetris.com)
etc. up to something very sophisticated like AI. I hope you can understand
that if a person spends too much time on these kinds of things, given that
he's got a job or he's a student in university with a sweet CompSci curriculum
(you know what I mean), then it's a problem. Although the above mentioned
resources are exceptional, there are others too which teach the same thing.
Can a person read all of them one by one "just to satisfy his curiosity and
thinking that it'll help him in future"?

RE is already an extremely sophisticated and vast field which requires
computer mastery. I'm in college and it has made me hate things I loved. I'm
an extremely curious guy and can easily spend 10-20 hours in front of a PC. I
have ~6 years of experience with Linux. Right now I'm literally not in a state
to read 2-3 books of 400-800 pages each on a single topic when I don't even
know whether it would be required in RE. There are some topics which are quite
difficult, but at least if you have an idea that a topic IS mandatory for RE
then you can be sure and refer to other resources. If you don't even know what
your syllabus is, how can you concentrate on it and master it, let alone learn
it? RE requires you to study every minute detail of a computer system, but
wasting too much time on that horrible digital logic and design is really not
worth it.

So my purpose is to make it completely clear what I actually need to know, so
that I can focus on it instead of reading each and every topic in complete
detail, thinking that if I miss the direction of even a single electron in
I/O I won't be able to do efficient reversing. I'm literally fed up with those
architecture diagrams with arrows and cramming those definitions ROM, EEROM,
EEPROM.............. again and again for tests and assignments.

I have a few questions for you:

You mentioned Computer Organization and Design, which I think is authored by
Patterson and Hennessy and is used by almost all universities. I'm just
curious about its not-so-good Amazon reviews. Also, what's your opinion
on Tanenbaum's books, which you've mentioned in that reddit link?

Now let's summarize what I've understood (PLEASE help me correct if I'm wrong)

>>>> UNDERSTANDING the system you want to hack

> Learn the most used fundamental programming languages (the way we TALK
> with computers):
>
> 1\. C (also C++ in some cases)
>
> 2\. Python or Ruby (given its dominance in industry right now thanks to its
> productive nature, also being used in exploit writing)
>
> 3\. Java or C# (object oriented programming, which along with the above
> languages completes our programming fundamentals)
>
> 4\. Assembly (obviously needed in RE)
>
> I think it need not be mentioned that we need to have a good grasp of Data
> Structures and Algorithms with the above languages (obviously not all)

> Understand each and every data flow and HOW a computer system works

Computer Organization and Design and Architecture

(OS fundamentals, memory management, virtual memory, paging, caching etc,
Linux(macOS too) and Windows internals part I think comes here)

You restored my faith in humanity when you said I can skip the hardware and
microcode part (please explain what specific topics, I swear I won't look at
them again until I'm done with required topics.)

> Network Fundamentals and Programming Basics of http, TCP/IP and other
> protocols.... Socket programming

>>>> THE HACKING PART

> Learning WHAT loopholes are there in this above process of data read write
> Types of attacks (buffer overflows, heap overflows....)

> HOW those loopholes are exploited

>Reverse Engineering (Learning tools of trade: IDA, gdb.....) learning and
practising reversing. Fuzzing

>Exploiting the bugs making exploits.

Please review and correct. Thanks again.

~~~
LiveOverflow
Shameless self-promotion. I have a YouTube channel where I basically try to
offer a path for learning exploitation. I'm done covering all the basics, and
we will soon move to more advanced stuff. I have videos on various different
security topics, but here is the probably more relevant playlist:
[https://www.youtube.com/playlist?list=PLhixgUqwRTjxglIswKp9m...](https://www.youtube.com/playlist?list=PLhixgUqwRTjxglIswKp9mpkfPNfHkzyeN)

~~~
Qrius
I know your channel very well. It's praised everywhere because of its good
content. I would be happy if you went through my main concern in the details
and read the above discussion. Thanks again for such a wonderful channel. I'll
surely learn from it once I've covered the prereqs to understand what you're
saying in those videos.

~~~
LiveOverflow
> I want to understand what are the ACTUALLY NECESSARY topics required and in
> RIGHT ORDER to MINIMIZE the TIME WASTING and wandering in between topics so
> that the knowledge acquired is more practical in context of current
> vulnerabilities rather than being more theoretical.

To be honest with you? I consider that sentence almost offensive. I hear you,
but I think you have absolutely wrong expectations. What you want to learn is
not a profession like plumbing, where a really good expert can teach you
everything you need to know with all the little tricks learned over the
years. The field is so huge, diverse and complicated that this won't work. I
think my playlist offers a rough outline that you can follow, but without
going down rabbit holes left and right, and getting stuck many, many times,
you won't become good at it.

I understand the frustration that you don't want to "waste time" and that you
are busy already. But everybody I know who is good in this field, and my own
experience as well, shows me that nobody learns this stuff through a straight
path. And everybody knows that most of the time will be spent chasing rabbits
through a labyrinth and getting stuck.

Also there is no clear path. It's a complicated web you have to learn to
traverse. For example "Learn C" \- what the f*** does that even mean? To
what extent? Hello World? Drivers? Or an Operating System? "Learn assembler"
\- which assembler? Have you looked into the Intel instruction spec once? I
doubt any human knows every instruction. Also, who said that Intel is the way
to go, why not ARM or AVR? All of these fields offer a lifetime of studying in
themselves.

The "art" in becoming good at security and RE is to get a broad knowledge of a
lot of things and try to simultaneously go deeper 'n deeper in all of them.
And if you are interested in a specific field, put more weight on those
topics.

You know how long it takes to reverse engineer something? People stare at IDA
for weeks or months at a time. You can't learn RE just by reading a book or a
blog. You gotta just start doing it, and hopefully find a few blogs and
people to keep up the spirit.

~~~
Qrius
Why is it that K&R is referred to as the greatest book on C, yet is never
recommended to a complete beginner and only seen as a reference book?

Why is it that several resources exist on buffer overflows, yet we still ask
which one is better?

Why is it that you started your channel even though resources like The Art of
Exploitation and The Shellcoder's Handbook already exist?

Why is it that there are people asking questions like "computer science books
you wish you had read earlier"?

Is the person asking or answering such a question looking for, or handing
out, a short trick to become the super h4x0r?

Internet forums exist for a reason. It is always wise to take the advice of
someone more experienced than you. I don't see anything wrong in it.

The people who are on top are there for a reason. The root of hacking lies in
outsmarting a coder by exploiting the mistakes in his code. Now even a field
like this has become a corporate profession.

But there's something that differentiates a hacker from the rest of the
people. I think learning from somebody else's mistakes is one of the smartest
things you can do.

