
Why learn Graph Theory? Here are some reasons ... - RiderOfGiraffes
http://andresosinski.com.ar/blog_view_entry/?id=1
======
daeken
I have to throw my personal favorites into the ring: decompilation and
deobfuscation.

In either case, after a certain amount of analysis you'll end up with a set of
nodes (typically basic blocks) which connect to each other, but they'll be
what I call 'immature' nodes. You know they connect, but they all effectively
end with gotos. To make any sense of them, you have to make them into mature
nodes: if, if-else, while, do-while, for, etc. Via graph reduction this is
possible.

For instance, a few years back I was reverse-engineering Apple's Fairplay DRM
and ran into an odd obfuscation scheme. There were dozens of functions which
had the same structure: a switch inside a while statement; a state machine.
After some hand-analysis, I figured out that at the end of each state, it'd
set the state variable to either a constant value (a jump) or one of two
conditional values (a conditional branch). At this point, it became clear that
they were simply taking basic blocks and turning them into a big obfuscated
state machine.

To make this into something worthwhile, I wrote a script that would go through
the function and 'execute' it, to determine the state connections
automatically. Then from there, I wired them up into a big graph, with two
types of nodes: Fallthrough (no conditional) and Branch (conditional). Now,
that helps a lot in terms of showing you what's happening, but not nearly as
much so as control flow structures do, so the battle was only half over.

Now, if you look at common control flow structures from a graph perspective,
they all look pretty straightforward (using something like dot graph
notation):

    
    
        A; if(X) {B} else {C} D  becomes A -> B, A -> C, B -> D, C -> D.
        A; if(X) {B} C  becomes A -> B, A -> C, B -> C
        A; while(X) {B} C  becomes A -> B, A -> C, B -> C
    

So if you walk over the graph looking for these patterns (only looking at
single nodes, mind you), replacing them with their mature equivalents, and
recursively do that until you don't change anything, you can end up with a
fully deobfuscated, cleanly readable result.

~~~
tptacek
Great story. Worth pointing out here that reversers have been running with
this idea full speed since ~2000. For instance, Halvar Flake and his team at
Zynamics use graph isomorphisms of basic blocks in BinDiff, which is the
industry standard tool for reversing security patches back to vulnerabilities.

Obviously, anyone who's ever used IDA Pro is familiar with control flow graphs
and bblocks; hit the space bar in IDA and you get it graphically.

~~~
daeken
Yea, there's a lot of interesting things that have been done with graphs. I'm
currently working on a reversing platform, and one of the core things I'm
doing is elegant graph manipulation. It's amazing how much you can do when you
have nice graphs, e.g. native pattern matching over graphs.

------
anatoly
Most of these examples are either naturally graphs already, or on the
contrary, can be presented as graphs through ingenious reductions (e.g.
coloring as an approach to register optimization). So transforming the domain
problem to a graph theory problem is either trivial or hard.

I found that a huge benefit of knowing some graph theory lies in being able to
approach problems where this transformation is _easy_ \- neither trivial nor
hard, but easy once you have some experience and intuition working with
graphs. Every time you have a bunch of entities, and you can kinda think of
their interdependencies as a binary relationship of some sort, there's a
warning bulb going off in your brain - "is this a graph problem?". And if it
is, then very possibly you immediately get lots of insight into your problem,
basically for free.

This occurred to me a few weeks ago as I was thinking about this puzzle:
design a circuit that inverts three inputs, but uses only two NOT gates (plus
arbitrary amount of ORs and ANDs). It's an interesting puzzle to solve
manually, but I also wanted to quickly write a program to find the solution.
Circuits are trivially graphs, but there're too many of them (an infinity) to
iterate over. Once you see the problem as a reachability problem in a certain
fixed graph, however (easy, but not trivial), the code writes itself. Someone
with no exposure to graph theory might easily just balk at this easy problem,
not knowing how to proceed.

~~~
assemble
I can get the circuit with 4 XORs and 1 NOT, or 3 XORs and 1 XNOR. I can't see
a way to simplify to just ANDs, ORs and 2 NOTs. It's an interesting problem,
were you able to find a AND, OR and 2 NOT solution? Or did you wind up using
XORs?

~~~
anatoly
No, the requirement is specifically to use just ANDs, ORs and two NOTs. There
is a solution, I did find it. It's not particularly easy to find.

------
lmkg
On a related note, Graph Theory is one of the top reasons to learn Linear
Algebra. All graphs (included directed, weighted, and multi-graphs) can be
represented intuitively by adjacency matrices, and matrix operations often end
up being meaningful in terms of the graph they represent. Seeing the
connections between a graph and its matrix helps to understand both of them,
and being able to switch back and forth between mental models is often useful.
For example, Markov Chains are the go-to tool in many fields of modeling, are
most easily thought of as weighted graphs, and are most easily manipulated as
matrices.

------
kmak
Graph theory was easily the most practical thing I learned from programming
contests. Most of the applications are "obvious", but only if you play with
enough graphs.

------
ableal
He misunderstood the circuits problem. It's not a graph planarity problem -
anything significant is not going to be planar. Only very simple circuits are
wireable with less than two layers (and then by, for instance, sneaking a
couple of lines under a resistor, etc. - not planar).

But you do get to use graphs to solve the component placement problem, in
heuristics that try to minimize total wire length. That hopefully means that
wiring delays are smaller, and routing more feasible. One of the classic
heuristics involved successive bipartions on the min-cut of the graph.

Nitpicking aside, it's a good post, well argued. My intro book recommendation:
[http://www.reddit.com/r/mathbooks/comments/aho2h/graph_theor...](http://www.reddit.com/r/mathbooks/comments/aho2h/graph_theory_with_applications_ja_bondy_and_usr/)

~~~
michael_dorfman
I like the cheap Dover intro: [http://www.amazon.com/Introduction-Graph-
Theory-Advanced-Mat...](http://www.amazon.com/Introduction-Graph-Theory-
Advanced-Mathematics/dp/0486678709/)

~~~
antester
If this is the same Dover book we used in our graph theory class, I found it
difficult to learn from without a professor explaining the unclear sections.
Not just doing the exercises, but some of the things they presented as trivial
had some important steps missing.

------
argv_empty
_This is an NP-complete problem, so you need to know that no optimal solution
can be found in a reasonable time for large problems._

Apparently, the interference graphs that arise from SSA code are a well-
behaved sort of graph, colorable in polynomial time.

<http://digbib.ubka.uni-karlsruhe.de/volltexte/documents/6532>

------
mark_l_watson
Good summary list. For most developers, there is a nice abstraction barrier
for graph theory because tools like neo4j, efficient RDF stores, search
libraries, etc. make it fairly easy to build apps without getting one's hands
dirty in the details. (But knowing the details makes it more likely to get
creative and novel apps.)

Off topic, but: since so many things are well described as graphs, the
GraphViz tool is great for communication: few lines of code to write your data
to a 'dot' file, and a few seconds later you have a visual representation of
your data.

------
simonw
What's a good curriculum / reading list for learning graph theory? What kind
of things do you need to understand before you can check it off the "things to
learn more about" list?

~~~
RiderOfGiraffes
This book is on-line, free, and the few bits I dipped into are sufficiently
accurate.

[http://www.ecp6.jussieu.fr/pageperso/bondy/books/gtwa/gtwa.h...](http://www.ecp6.jussieu.fr/pageperso/bondy/books/gtwa/gtwa.html)

Not sure what it would be like to learn from - certainly you would have to
"Read Like Math" and not "Read Like Prose". You would be strongly advised to
do the exercises properly and not just skim.

------
owinebarger
I agree with his assessment. Any data representation can be construed as a
graph - this is the basis of garbage collection. Besides the areas where
graphs (or networks) are explicit in a problem, the most powerful use comes in
analyzing a given model/algorithm. If you don't take that step to the meta-
level, you might not see the use for graph theory.

------
j_baker
I find that the word "dependency" is a huge indicator that I'm going to need
some kind of graph theory.

------
petern
Awesome post, love the pragmatic and no-bullshit approach! There are some more
real world graphy projects on the Neo4j wiki,
<http://wiki.neo4j.org/content/Neo4j_In_The_Wild>

------
swah
Hm, blog has two entries, both are on HN at this moment... is it considered ok
to ask for friends to post links to your blog? Just askin...

~~~
Zev
Perfectly fine to do; even better if you post it yourself. If the posts are
interesting, they'll get upvoted. If not, they won't.

~~~
eru
Yes, self posts are perfectly fine, as long as you don't overspam.

------
Locke1689
Can I just add in anything and everything having to do with networks?

I've run into a number of forest problems due to distributed systems.

------
wlievens
Good post, but it's not like anyone with a bit of imagination can't come up
with (most of) this list himself.

~~~
timwiseman
This is true of most things.

A few things require true genius, but most things can be done by anyone with
enough effort. But few people put in the effort so when someone does and
produces good results, they deserve the credit.

------
sid0
One very cool thing I learned today is that knowledge amongst a number of
people, with the help of modal logic and Kripke possible world semantics,
essentially reduces to a graph reachability problem.

