

Baby Steps into Genetic Programming - aerique
http://aerique.blogspot.com/2011/01/baby-steps-into-genetic-programming.html

======
gnosis
_"As you will find out if you start experimenting with the code in this
article bloat is a big problem, as well as a successful form taking over the
entire population."_

A very simple way to fight bloat is to penalize an individual's fitness value
in proportion to how long it takes to find that individual's fitness value.

Then, if you bias selection such that fitter individuals are more likely to
survive, then the individuals that take a shorter time to evaluate should have
a slightly higher chance of surviving; and those individuals should be the
very individuals that are less bloated.

Of course, the problem is determining just how much to penalize their fitness
values. And, as in much else in evolutionary computation, that generally comes
down to simply doing a number of runs with different penalties and see how
they compare.

One down side to keep in mind when using any kind of penalty for bloat is that
some research has shown that unrestricted tree growth is often required for
solving larger and more complex problems. So there's a trade-off involved in
reducing bloat vs allowing better solutions to evolve.

Now, as to the second problem of maintaining diversity, that is a much tougher
nut to crack.

Two of the simplest techniques are using mass extinction events and separating
the larger population in to a number of smaller populations (called "demes",
"islands", or "subpopulations") and is generally referred to as "the island
model".

Mass extinctions are very simple. Just wipe out a large portion of the
population in a given number of generations (or when you've determined that
the population has become too inbred) and repopulate it with random
individuals. This will maintain diversity.

Of course, you usually don't want to wipe out your best individuals, so you'll
spare them. But that's where the problem comes in, as the best individuals
will likely quickly come to dominate the population again. But hopefully in
the mean time they'll breed with some of the new random individuals and the
population will get some new blood. And when the population starts to get too
inbred again, or a certain number of generations pass, there could be another
extinction event, etc..

In the island model, if the islands are kept totally separate and no
individuals from any island are allowed to breed with individuals from any
other islands, then this is obviously equivalent to simply running a number of
separate smaller populations at the same time. This is might be fine for
maintaining diversity, but no benefit will accrue from combining the solutions
in each of the subpopulations.

So, what's usually done is to let individuals from one subpopulation breed
with individuals from another subpopulation at given times during the run.
Again, if they are allowed to interbreed at every generation, then this will
be equivalent to just having one large population as usual. So you'll want to
keep individuals on any given island from breeding with individuals from other
islands for a number of generations. The longer you keep the subpopulations
from interbreeding, the more diverse they'll be (or so the theory goes). At
the same time, the longer you keep them from interbreeding, the slower the
convergence rate (the time it takes to reach a global optimum in the search
space) will be.

Unfortunately, there have been mixed results reported about the island model.
Some research has suggested that it's better to just use one very large
population. But on the other hand, there have also been successes reported
with the island model, and reports that smaller populations outperform larger
ones. So it's a mixed bag, and usually very problem dependent, as much of
evolutionary computation tends to be.

Finally, there are also many, many other much more sophisticated and effective
techniques for fighting bloat and maintaining diversity. But I'll leave
finding those as an exercise to the reader. Just a quick tip: type "genetic
programming" and "bloat" or "diversity" in to the form on this page:

<http://citeseer.ist.psu.edu/>

~~~
user24
re: diversity, wouldn't lack of diversity be a consequence of the selection
method? That's what I found with the genetic algorithm for hello world I wrote
(which I fully admit is decidedly less complicated than GP). When I simply
culled all candidates below a certain fitness, I quickly evolved a population
of clones. Likewise roulette-wheel still got the population converging on one
fit solution.

Whereas with tournament selection I found that the population remained
diverse.

Just my findings though, I'm no expert on GA and certainly not on GP.

~~~
gnosis
_"wouldn't lack of diversity be a consequence of the selection method?"_

Diversity of individuals in the population depends on a number of factors. The
selection method is one such factor. But another factor is what individuals
there are to select from in the first place.

Using the simple mass extinction event technique, you effectively limit the
pool of individuals that can be selected from to those few that survive the
extinction plus the new randomly created individuals that take the place of
those that went extinct.

I suppose this could be considered a special rare breeding phase that combines
an elitist selection method along with a 100% mutation of the remaining
population.

Some other factors that influences diversity are the crossover and mutation
operators that are used, along with the decisions concerning which parts of
the program trees are allowed to change and where they get the replacements
from.

For instance, if crossover were allowed only from or to certain parts of the
tree, that could decrease (or even increase) diversity compared to the
standard case of imposing no such restrictions during crossover.

Another possibility is performing crossover multiple times per generation on
the same individual. The more similar a given individual is to other
individuals in the population, the more crossovers could be performed on that
individual, hopefully increasing the diversity in the population. Multiple
mutation could similarly be employed.

Yet another factor that could affect diversity is simply the population size.
Clearly, a population of one individual is minimally diverse. As the size of
the population grows, so too (hopefully) does the diversity. There has been
some research in to dynamically sizing the population to increase or decrease
diversity as needed.

Something else that has been proposed for increasing diversity is segregating
highly fit individuals from very unfit individuals. This way the highly fit
individuals won't come to dominate the very unfit individuals, and thereby
hopefully preserve diversity. This could be done with an island model, and as
higher fit individuals on a certain island evolve, they can be transferred out
of the islands of lower fitness to other islands of higher fitness.

And there are probably dozens if not hundreds of other ways of trying to
influence diversity, as it is a critical aspect of genetic programming.

One last thing I should mention (maybe I should have mentioned it first) is
that a lot of this depends on exactly how you measure diversity (yet another
important field of research).

Is it enough to call a population diverse if every individual is different
from every other individual? Well, what if they are but they differ only in
ways that don't affect fitness? Or what if they do differ in ways that affect
fitness, but with only minor variations from individual to individual? And
what about structure differences? It's possible to have two individuals with
the same fitness value but different structures. And there are many other
measures as well.

So, depending on how you measure diversity (and when you do so, and to which
members of the population), you could come up with different approaches to
trying to influence it.

~~~
aerique
Thank for your extensive comments, gnosis. They're very elucidating.

------
tehwalrus
Koza's "gp.lisp" seems to be a prerequisite for getting any of this stuff
working, but I can only find dead links to it.

I tried copying and pasting the stuff in here: [<http://www.genetic-
programming.org/gplittlelisp.html>] into a file and loading it into clisp but
it throws a bunch of syntax errors.

anyone know where you can grab a working copy of it? (I have never used lisp
before, so I am probably not the best person to debug by hand...)

~~~
unoti
I bought this book a few years ago, by Koza. It has tons of working code in
it-- each problem he steps through building up a solution with a working
program:

[http://www.amazon.com/Genetic-Programming-Computers-
Selectio...](http://www.amazon.com/Genetic-Programming-Computers-Selection-
Adaptive/dp/0262111705)

This is one of my most prized computer books, perhaps _the_ most prized book I
have. Even though I wasn't a Lisper when I started reading this book, I ended
up building (without realizing it at first) a little Lisp inside Python for
exploring the concepts.

If you thought Koza's old website is inspiring, consider getting this book.
It's huge and expensive, but to me it feels like one of the 3 books I'd want
to bring with me to a desert island or through the destruction of
civilization.

~~~
aerique
You certainly can sell a book :-) I'll be getting it for my birthday in a
couple of months.

------
paraschopra
I had made a small library for Genetic Programming in Python. Have a look:
<http://paraschopra.com/sourcecode/GP/index.php>

The problem with genetic programming is that as soon as you start a
approaching a program of reasonable complexity, even small mutations start
wreaking havoc to whole program logic. If we make a biological analogy, it is
like evolving an elephant from an ape-like ancestor. Evolution in real world
isn't so drastic.

~~~
gnosis
_"Evolution in real world isn't so drastic."_

Remember that elephants (and all other life on this planet) evolved from
single-celled organisms (and even from non-life), which are far more different
from elephants than even apes are. So drastic evolutionary changes do happen
in the real world. But they just take a very long time.

The problem is that not only do we not have millions or even billions of years
to evolve our artificial organisms, but that the environments in which our
artificial organisms grow, their representation, and the operations performed
on them during their "lifetime" and during breeding are incredibly restricted
and simplified compared to what happens in the real world.

It's very difficult to model even a tiny fraction of that (see the
supercomputer time dedicated to creating a detailed model of even a single
neuron, for instance). And, as for the programs that are evolved via GP, it's
rare for them to be allowed to create completely arbitrary code. Their
instruction sets are usually very limited so as to reduce the search space.
They are also usually given quite limited data to work with, are modified very
little during their lifetimes (if at all), and the sorts of things that can
happen during breeding are similarly curtailed. So it's really no wonder that
their existence and evolution is not always so spectacular as what nature has
been able to achieve over the millenia.

Now, the problem you mention regarding the disruptive nature of mutation
(crossover can also have this effect) is also important. The attempts to solve
it have generally focused to finding ways to reduce this disruption (usually
by making the crossover or mutation less random). I recommend searching
Citeseer for "destructive crossover" to see examples of some proposed
solutions.

~~~
paraschopra
I meant that changes are not drastic in a single level (at least on phenotype
level). You don't expect progeny of an ape-like ancestor to give birth to an
elephant like organism. But that is what precisely happens with Genetic
Programming.

~~~
tgflynn
It sounds like maybe a speciation mechanism could help with that, ie. don't
let two individuals cross-over if they are too different by some measure.

------
anakanemison
John Koza is a name worth googling if you're interested in learning more about
Genetic Programming.

~~~
aerique
Ah yes, perhaps I should have mentioned him :-(

Some of his work has even created patentable inventions! (I'm on my phone and
cooking so no links.)

~~~
gnosis
Not only should Koza be mentioned because he's the inventor of genetic
programming, but also because his original implementation of GP was done in
Lisp!

Given that most later GP implementations have been done in other languages
(from C to Java to Python), it's good to finally see it come full circle back
to Lisp once more.

~~~
tgflynn
It seems like it should be much easier to do this in Lisp. In other languages
wouldn't you need to work on the AST instead of on the source code to have
much chance of generating syntactically correct programs ?

~~~
copper
Lee Spector has made a Forth(like) language, called PushGP, which, in my
opinion, handles syntactical correctness in a nice way. There's even a JS
implementation, which wasn't there the last time I checked.

<http://hampshire.edu/lspector/push.html>

