
Using Genetic Algorithms to Break Things - Argentum01
http://westleyargentum.github.io/blog/2014/05/18/using-genetic-algorithms-to-break-things/
======
jal278
As a researcher studying evolutionary algorithms, I can tell you one thing these algorithms are quite adept at: finding _holes_ in the objective functions you give them. They are like automated lawyers seeking loopholes.

If you want to evolve a controller for a robot that walks far, evolution will find exploits in the physics engine that defy gravity or somehow catapult the robot through the air (this sort of exploit is the bane of those doing research in virtual creatures [1]). If you want to evolve a talented player for an Atari game like Beamrider, evolution will find an infinite score-loop exploit [2]. It meets the letter of the law in maximizing your measure, but it misses the forest for the trees.

This kind of problem has led some in the community to look at driving evolution by other means -- e.g. my dissertation was on evolving neural network controllers for robots by choosing the most novel behaviors instead of those that performed best by the fitness measure [3].
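
The selection loop at the heart of novelty search fits in a short sketch. Everything below (1-D behaviors, Gaussian mutation, the archive-the-top-2 policy) is a toy illustration of the idea, not the actual code behind [3]:

```python
import random

def novelty(behavior, archive, k=3):
    """Novelty = mean distance to the k nearest behaviors in the archive."""
    if not archive:
        return float("inf")
    dists = sorted(abs(behavior - other) for other in archive)
    return sum(dists[:k]) / min(k, len(dists))

def novelty_search(evaluate, population, generations=50):
    """Select parents by how *novel* their behavior is, ignoring fitness."""
    archive = []
    for _ in range(generations):
        scored = sorted(population,
                        key=lambda g: novelty(evaluate(g), archive),
                        reverse=True)
        archive.extend(evaluate(g) for g in scored[:2])  # remember the novel ones
        parents = scored[:len(population) // 2]
        # Each parent yields two mutated children, keeping the population size.
        population = [p + random.gauss(0, 0.1) for p in parents for _ in (0, 1)]
    return population
```

Here a genome is a single float and its "behavior" is just its value; in the robot setting the behavior would be something like the walker's final position.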

[1]
[https://archive.org/details/sims_evolved_virtual_creatures_1...](https://archive.org/details/sims_evolved_virtual_creatures_1994)

[2] [http://www.cs.utexas.edu/users/ai-lab/?atari](http://www.cs.utexas.edu/users/ai-lab/?atari)

[3]
[http://eplex.cs.ucf.edu/noveltysearch/userspage/](http://eplex.cs.ucf.edu/noveltysearch/userspage/)

~~~
acadien
That sounds like overfitting. It happens with all optimization routines; is the EA somehow special in this regard?

~~~
jal278
Overfitting is when your model is over-fit to the data you are training on [1], but what we're talking about here is slightly different -- the EA has maximized the given objective function _correctly_, but it turns out that your objective function is corruptible: a high value on your simple measure doesn't match the intuition motivating it.

You had something particular in mind when you set the objective function for your biped robot to be distance travelled before it falls. Finding an exploit in the underlying physics engine that allows the biped to explode and travel quite far -- while indeed excelling at the function -- is far from what you had in mind. Yet it does reveal a vulnerability in your physics engine, which, while unexpected, may be valuable in its own right; this is the application the post's author argues for.

This isn't entirely unique to EAs -- but it seems to happen more frequently in
the kinds of domains EAs are often applied to -- complex simulations where the
gradient of an objective function is difficult or impossible to calculate.

[1]
[http://en.wikipedia.org/wiki/Overfitting](http://en.wikipedia.org/wiki/Overfitting)

~~~
darkmighty
I think he had something else in mind, somewhat related to overfitting -- you're maximizing (too much) a metric over an unrealistic solution space, and that solution, when applied to the real solution space, won't behave as intended.

So those solutions are "overfitting" to the exact quirks of the engine. I agree it's a stretch, but not unlike in ML, it shows that optimizing too hard will expose quirks if your 'objective function/simulation' pair is not exactly equal to the 'desired objective function/reality' pair, or if there is an innumerable ensemble of situations in which you expect the solution to behave well.

------
stcredzero
Thomas Ray, who wrote the A-Life program Tierra, found that genetic algorithms were really good at finding bugs in his code and exploiting them. One such bug was a register he forgot to clear between simulated viruses. A virus evolved to exploit the leftover data and use it to attack other viruses.

~~~
kordless
We should put them to work finding bugs in code. Breed one to click buttons and fill in web forms... and it gets to replicate when it breaks things.

~~~
rainforest
EvoSuite springs to mind for that one [1]. It's Java only, but their publications describe how it works well enough to port the ideas, from what I can tell.

[1]: [http://www.evosuite.org](http://www.evosuite.org)

------
fizwhiz
I'm in complete agreement. A short while back I mused on whether we could leverage genetic algorithms to actually find a single point of failure in the systems we're designing (linked here:
[https://news.ycombinator.com/item?id=7660998](https://news.ycombinator.com/item?id=7660998)).
Heck, I can totally see people using genetic algorithms to come up with nuanced unit/functional test input data just to see what happens (sorta like Haskell's QuickCheck, but on steroids, spanning subsystems). Tired of using a simple-minded tool like JMeter that just hits your service with 1000 concurrent requests that all look similar? Let's use a GA to simulate how a certain "variety" and "volume" of requests can cause a cascading failure while your app's JVM is undergoing garbage collection. That's a contrived example, but you get what I mean, right? For real though, I wonder how many people are using this on a daily basis to spruce up their test cases (or, as you said, rewarding things like colliding with walls). It would be interesting to see how much success they've had with this.
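
As a toy illustration of that "reward the breakage" idea, here is a minimal GA that evolves strings to crash an invented target function. The target, the fitness shaping, and all the parameters are made up for the example:

```python
import random
import string

def target(s):
    """Toy system under test: blows up on inputs containing 'bug'."""
    if "bug" in s:
        raise ValueError("boom")
    return len(s)

def fitness(s):
    """Higher is better; crashing inputs score 1.0. Otherwise reward the
    longest prefix of the trigger already present, so the GA has a
    gradient to climb instead of a flat pass/fail signal."""
    try:
        target(s)
    except ValueError:
        return 1.0
    return max(i for i in range(3) if "bug"[:i] in s) / 3

def mutate(s):
    """Replace one random character."""
    i = random.randrange(len(s))
    return s[:i] + random.choice(string.ascii_lowercase) + s[i + 1:]

def crossover(a, b):
    """Single-point crossover of two equal-length strings."""
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def evolve(pop_size=30, length=8, generations=200):
    pop = ["".join(random.choices(string.ascii_lowercase, k=length))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        if fitness(pop[0]) == 1.0:
            return pop[0]               # found an input that breaks the target
        survivors = pop[:pop_size // 2]
        pop = survivors + [mutate(crossover(random.choice(survivors),
                                            random.choice(survivors)))
                           for _ in range(pop_size - len(survivors))]
    return max(pop, key=fitness)
```

The shaping in `fitness` is the important part: a flat pass/fail signal gives evolution nothing to climb, which is exactly why naive GA fuzzing stalls.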

~~~
stcredzero
There are people using "generated" tests based on specs crafted by
programmers. The whole thing is called "generative" testing.

The reason this stuff works is that there are almost always lots of bugs out there, and that RNGs aren't subject to the misconceptions programmers are subject to. It's also why fuzzing works.

I could see GAs doing very well with this.

~~~
fizwhiz
I definitely think that GAs can uncover a variety of bugs, ranging from simple NPEs to more nuanced bugs like memory leaks (which would then require human dev intervention to investigate for a post-mortem), by dynamically generating the test inputs.

In addition to simply generating the input data, do you feel like GAs could broaden their span to essentially "mock" the states of other components in the system? I'm thinking of a case where you have some set of services deployed on different machines that communicate with each other. In theory, could we have the GA "mock/simulate" network jitter sporadically (thus intercepting a request from ServiceA to ServiceB and deliberately dropping it)? This extends beyond the input data for some entry point at ServiceA, and instead encapsulates some sort of "ether" surrounding all the components. Every permutation and combination of the subsystem state could in theory be controlled by the governing GA.

If someone actually built a DSL/library that handled these things I'm sure it
would benefit everybody in a remarkable way.

~~~
platz
Also: "Genetic algorithms are not really an off-the-shelf black box that you
can just plug your data into and get results."

[https://news.ycombinator.com/item?id=7712863](https://news.ycombinator.com/item?id=7712863)

~~~
fizwhiz
Your post seems to be insinuating that we deem GAs a panacea. No one here is saying that; on the contrary, we're just trying to see how we could use them for dynamically generating interesting test-input data. Beyond that, I'm just thinking out loud about whether you could build on that functionality to "prepare" more interesting test cases when multiple layers are involved.

No one is disputing the fact that you need an expert to tune these to get the desired result for hard problems. I argue that a "good enough" understanding of GAs (i.e. you don't need a PhD in the subject) should be sufficient to solve simpler problems such as the one we're discussing.

Do you have any counter-arguments to that? Can you cite any other examples
where this view is challenged?

------
justin66
"We will not discuss genetic algorithms further, to discourage you from
considering them for your applications."

Skiena in _The Algorithm Design Manual_, at the end of the three-paragraph blurb where he talks about them.

~~~
sitkack
If you are considering GAs, you have already given up at that point.

------
maspwr
As another example of using genetic algorithms for breaking things, a friend
of mine wrote his dissertation [1] on using genetic algorithms for file format
fuzzing. I believe there may even be some source code available on GitHub.

[1]
[http://trace.tennessee.edu/utk_graddiss/1347/](http://trace.tennessee.edu/utk_graddiss/1347/)

------
srean
A long time ago I had a huge interest in genetic algorithms and in other soft optimization techniques. It is what inspired me to take up graduate studies in the first place. But very quickly I got thoroughly disillusioned by the community around it.

What I am going to say is going to be very unpopular with the audience of this post. What turned me off and left a bad taste is the community's tendency to push it as magic, much like a shady snake oil salesman would.

Rather than building the theory on analysis, classification, and measurable properties and definitions of the objective functions for which GA with specific mutation and crossover operators is a good solution, the community's way is to push it as snake oil -- often taking examples of cooked-up functions that aren't really important to globally optimize, and totally ignoring speed of optimization relative to other competing tools.

The community's method of choice seemed to be to impress the audience not with how fast and effective the methods are compared to traditional ones, but with colorful analogies. Heck, a bead of perspiration rolling down an armpit optimizes a function; that does not mean we should optimize every function on a computer that way.

This is really unfortunate, because if one goes back to GA's genesis, _classifying and characterizing the functions that lend themselves to GA was a big part of David Goldberg's book_. There are indeed functions that are very amenable to GA and other such methods. Characterize them mathematically, rather than, as it seems to me, cooking up two ad hoc operations, naming them 'crossover' and 'mutation', and looking cool.

That is hardly progress.

 _On the other hand, if you prove a theorem stating that for this type of problem, if you let GA run for this amount of time, you will be close to the optimum with high probability -- that would be phenomenally useful._

 _At its core, GA is a parallelized and randomized exploration of the space. Tell me why that particular way of randomization and parallelization is the right thing to do, or for what classes of problems it is the right thing to do. Without this, they are anecdotes._

I will give you a specific example of what I think is a better way. Consider the clustering algorithm K-means. Obtaining its global minimum is an NP-hard problem, and it has been a target for GA, ant colony, swarm optimization and what have you. All of them are blown out of the park by simply better initialization of the standard K-means algorithm. For example, just initialize it with a set of furthest-first points and it will give good results. Not only that, it will provably be very close to the global optimum -- but people did not know that in the beginning. Some people had to say to themselves, "I am going to find out more about the structure of the problem so that I can find efficient methods." People like that make progress.
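
For concreteness, here is a toy sketch of furthest-first seeding followed by plain Lloyd iterations, on 1-D points (the real algorithms generalize to arbitrary dimensions and metrics):

```python
def furthest_first(points, k):
    """Seed k centers: start with the first point, then repeatedly add the
    point whose distance to its nearest chosen center is largest."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(abs(p - c) for c in centers)))
    return centers

def kmeans(points, k, iters=20):
    """Standard Lloyd iterations on top of the furthest-first seeding."""
    centers = furthest_first(points, k)
    for _ in range(iters):
        # Assign each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: abs(p - centers[i]))].append(p)
        # Move each center to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)
```

Because the seeds start out spread across the data, Lloyd's local updates land near the right clusters immediately, with no population, crossover, or pheromones required.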

Take another example, non-negative matrix factorization: another objective function that is NP-hard to optimize. I will make a different point with this example (although the same point could be made with the K-means example). The cases where the simple local algorithm for NMF does not converge near the global optimum are provably uninteresting to begin with. Even if the global optimum were indeed found, it would serve no purpose.

Another example: global optimization of low-rank matrix factorization models for recommender systems. Same problem again: the optimal solution is NP-hard to obtain. Again, simple methods do provably well if you utilize the structure of the problem.

Often the difficult function one is trying to optimize can be entirely circumvented by modeling the phenomenon with a better-behaved function -- model the problem differently so that better techniques apply. This is not always possible, but I see people not even trying; apparently it is easier to publish a few papers when you spray it with colorful analogies. An example: protein folding. It is a misguided effort to find the globally optimal shape; nature does not fold proteins optimally.

Another example: consider classifiers. Misclassification as a function is the worst nightmare: non-differentiable, non-convex, with huge swaths of zero gradient. So what do people in machine learning do? They bound it from above by a well-behaved function and optimize _that_ function. They don't stop there. They first validate empirically that optimizing the surrogate gives good performance, and then prove theoretically that this comes with guarantees on performance.
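
The classic instance of this is the hinge loss bounding the 0-1 misclassification loss from above, as a function of the classification margin (a minimal sketch):

```python
def zero_one_loss(margin):
    """What we actually care about: 1 if the example is misclassified
    (non-positive margin), else 0. Flat almost everywhere, so useless
    to a gradient-based optimizer."""
    return 1.0 if margin <= 0 else 0.0

def hinge_loss(margin):
    """Convex surrogate that upper-bounds the 0-1 loss at every margin;
    minimizing it over the training set is what an SVM does."""
    return max(0.0, 1.0 - margin)
```

Any margin with hinge loss below 1 has a 0-1 loss of exactly zero, so driving the well-behaved surrogate down drives the nasty function we care about down with it.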

I am ever more convinced that analyzing important objective functions (ones that are actually important to optimize), studying their properties, and motivating performant optimization schemes is a far better contribution than coming up with colorful analogies of genetics, evolution, swarms, etc. without characterizing precisely what it is that makes them work and when. Demystify the problem and the technique rather than mystifying them, and please compare running times and resources with more traditional optimization.

Again, I am not saying that GA or swarm optimization methods are wrong to use. There are indeed cases where they are exactly the correct thing to use. Consider gene/DNA assembly: here GA is indeed the right tool, because GA works well when the optimal solution looks like an assembly of small good solutions. Swarm methods work well when the objective looks like a smooth, well-behaved function overlaid with a high-frequency fluctuation. I wish this were what the community focused on.

~~~
toehead2000
I was going to post a comment with similar points, but instead I'll just add to srean's ideas.

If nothing is known about the objective function, no optimization algorithm
can possibly be said to be better than any other. This is the no free lunch
theorem.

[http://en.wikipedia.org/wiki/No_free_lunch_theorem](http://en.wikipedia.org/wiki/No_free_lunch_theorem)

So there is absolutely no sense in talking about optimization algorithms in
isolation from the problems they're meant to solve. An intelligent appraisal
of genetic algorithms would talk about the types of objective functions they
seem to be able to find good answers for.

~~~
sampo
> _If nothing is known about the objective function_

What "nothing is known" means, in the context of the NFL theorems, is that the values at each point are random and independent of the values at other points.

For example, if our domain is a 100-by-100 grid and we fill the grid with random numbers, we have such a function. And indeed, no optimization algorithm is of any help in trying to find the largest of the 10,000 random numbers.

But this kind of totally random function pretty much does not exist in the real world. So if our objective function came from a real application (and not from a random number generator), we already know enough about it to say that the NFL theorems don't apply to it.
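
This can be made concrete: on a grid of i.i.d. random values, any strategy that queries n distinct cells just observes n i.i.d. draws, so a "clever" local search finds, on average, exactly the same best value as blind sampling. A toy sketch (both strategies are invented examples):

```python
import random

def run(strategy, grid, n):
    """Best value found after n distinct queries chosen by `strategy`."""
    seen = {}
    for _ in range(n):
        cell = strategy(seen, len(grid))
        seen[cell] = grid[cell]
    return max(seen.values())

def random_strategy(seen, size):
    """Blind sampling: query any unvisited cell."""
    return random.choice([c for c in range(size) if c not in seen])

def hill_climb_strategy(seen, size):
    """'Clever' search: query the unvisited cell nearest the best so far."""
    if not seen:
        return size // 2
    best = max(seen, key=seen.get)
    return min((c for c in range(size) if c not in seen),
               key=lambda c: abs(c - best))

random.seed(0)
trials, size, n = 2000, 100, 10
avg = {}
for name, strat in [("random", random_strategy), ("hill", hill_climb_strategy)]:
    results = [run(strat, [random.random() for _ in range(size)], n)
               for _ in range(trials)]
    avg[name] = sum(results) / trials
# Both averages sit near n/(n+1), and neither strategy beats the other.
```

The moment the grid has structure (e.g. smoothness), the symmetry breaks and the locality assumption behind hill climbing starts paying off, which is exactly the "skewed distribution" point above.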

~~~
srean
Sure, no particular instance of a function will be so utterly random, but I think you are probably missing the point of the parent comment. All that NFL says is that if you fix an optimization algorithm and _average_ its performance over all possible measurable functions, no algorithm will be better than any other. The average is with respect to the distribution you described. However, if you skew the distribution so that certain classes of functions are more likely than others, then an ordering is imposed over some of the algorithms: some algorithms will be better than others.

Therefore, trying to design a fully generic optimization algorithm is a fool's errand. On the contrary, a specific objective function will have specific properties, and in that case it is valuable to use specific algorithms suited to those properties. To follow the argument forward, it would be very useful to characterize exactly the class of objective functions that EA, GA, swarm, and ant colony algorithms handle well. Under what assumptions is their specific randomization and parallel evaluation the right thing to do? These are resource-intensive procedures, so such a characterization would tell us when that effort is well spent.

One uncharitable but plausible way to read the tail end of your comment is that just because a particular function is not random, it will violate NFL and magically make EA/GA-style algorithms appropriate; that is not true. Glad that you were able to reword it before the edit window closed.

Edit: replying here because this thread is becoming nested too deep.

Hi @Dn_Ab, I think we may have talked past each other, so let me clarify. Of course a bias/preference will naturally be induced over algorithms when you select objective functions non-uniformly. No magic there. But that is not going to make a particular choice of algorithm (in this case EA, GA, et al.) magically appropriate for whatever specific non-uniform distribution over objective functions is chosen. The choice either has to be deliberate (in which case we would need a measurable description of the class where these algorithms work better) or one has to wait to get wildly lucky; the latter is about as productive as playing the lottery, except that the tickets are pretty expensive when we play EA, GA, etc.

@sampo > have very specific mathematically defined meanings here, and those meanings are probably very different than what a casual reader might expect.

Good point, upvoted, now I understand your previous comment better.

@sampo > I also like your "snake oil" metaphor, and I was happy to see your
top comment on this topic.

I expected it to be downvoted out of existence, given a few snarks I yielded to.

~~~
sampo
> _All that NFL says is that if you fix an optimization algorithm and average
> its performance over all possible measurable functions, no algorithm will be
> better than the other._

> _Therefore, trying to design a fully generic optimization algorithm is a
> fool 's errand._

I don't think we have any factual disagreement. In pure mathematical context,
what you say is absolutely true.

I just wish to emphasize that "all possible measurable functions" and "fully generic optimization algorithm" have very specific mathematical meanings here, and those meanings are probably different from what a casual reader might expect.

In the set of "all possible functions", an overwhelming majority are functions that are indistinguishable from random noise. They are, e.g., not continuous -- not even remotely close to continuous; they have no structure whatsoever.

On an intuitive level, everybody understands that it is a fool's errand to try
to design an optimization algorithm to find the maximum from an array of
random numbers. But when you just say "trying to design a fully generic
optimization algorithm", people may not realize that the "fully generic"
contains the requirement that it should also work on random noise.

The No Free Lunch Theorem does not say anything about the feasibility of fully
generic optimization algorithms, if we restrict the "fully generic" to mean
anything that can be expected to appear in any real world application. A lot
of people might agree that an algorithm that performs well in any real world
problem is "fully generic", even though it may not perform well in all
imaginable abstract mathematical settings (which mostly means settings of
random noise).

To your last comment: I also agree with you on this front. Also my
understanding is that for every or almost every type of problem there are
other optimization algorithms that drastically outperform genetic algorithms.
I also like your "snake oil" metaphor, and I was happy to see your top comment
on this topic.

------
Cieplak
This is a great read (Handbook of Neuroevolution Through Erlang):
[http://www.cabafx.com/trading-ebooks-collection/newpdf/Handb...](http://www.cabafx.com/trading-ebooks-collection/newpdf/Handbook%20of%20Neuroevolution%20Through%20Erlang.pdf)

I highly recommend the hard copy.

------
lalos
Good and short intro to genetic algorithms by coding a Tetris AI player:
[http://luckytoilet.wordpress.com/2011/05/27/coding-a-tetris-...](http://luckytoilet.wordpress.com/2011/05/27/coding-a-tetris-ai-using-a-genetic-algorithm/)

------
kriro
I think the best use of GAs is a sort of rapid prototyping. Throw them at the problem to get a better feel for the search space, then think about it more deeply and find the best algorithm you can come up with.

With the appropriate disclaimer that there are usually better algorithms, I think they are pretty valuable from a pedagogical point of view (actually I'm thinking of nature-inspired algorithms in general, not just GAs). It helps if you can relate an algorithm to something in nature, because most algorithms are very abstract and hard to grasp. I also think nature-inspired algorithms can help people think about problem solving in creative ways (just wander through your garden and see if you can replicate some of the stuff in algorithmic form, etc.).

------
has2k1
I like the implied statistical wisdom, i.e., "if you fail often, then help us find where we are failing" rather than "if your passing solutions are brilliant, then help us find passing solutions."

As an aside: for things like unit tests, where in most cases the result of the test is binary (fail/pass), how do you think about and ultimately represent the fitness criteria?
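
Search-based testing tools (EvoSuite, mentioned elsewhere in the thread, among them) answer this by replacing the binary verdict with a graded "distance to failure", e.g. branch distance: how far the decisive comparison was from going the other way. A toy sketch, with an invented guard constant standing in for real instrumentation:

```python
def branch_distance(lhs, rhs):
    """0 when the comparison lhs == rhs would succeed; otherwise grows
    with how far apart the operands are. Lower is better."""
    return abs(lhs - rhs)

def shaped_fitness(test_input):
    """Hypothetical instrumented system under test that fails only when
    the input hits 4242. A plain unit test reports just pass/fail; the
    instrumentation instead reports distance-to-failure, which a GA can
    minimize gradually."""
    GUARD = 4242
    if test_input == GUARD:
        return 0.0          # the failing branch was actually taken
    return branch_distance(test_input, GUARD)
```

Code coverage, "approach level" (how many nested branches away the failing path sits), and time-to-error are other common graded stand-ins for the pass/fail bit.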

------
turnersr
This line of thought has a long research history. Check it out at
[http://scholar.google.com/scholar?q=fuzzing+machine+learning...](http://scholar.google.com/scholar?q=fuzzing+machine+learning+vulnerabilities).

