Evolutionary Algorithm: Evolving "Hello, World"

user24 · on Sept 28, 2011

I did the same in Javascript a while ago with a genetic algorithm. http://www.puremango.co.uk/2010/12/genetic-algorithm-for-hel... (HN discussion: http://news.ycombinator.com/item?id=2002673)

Again just evolving a string not a full program.

Mine hit the target significantly faster than OPs (around 70 generations rather than between 2 and 4 thousand).

I'm trying to work out where he went wrong.

First of all his 'mutate' function in the second example is actually performing both mutation and crossover.

As far as the mutation goes it looks fine; almost identical to the way I mutate genes. I think the problem lies in the way he's doing his crossover (breeding), just trying to work it out.

He's using one-point crossover, same as me.

The selection method is a kind of hybrid elite method where fitter members are more likely to be selected than unfit members, which is reasonable.

It might just be that his crossover and mutation probability is 100% and his population is only 20. But if you plug those figures into the interactive demo on my post (http://www.puremango.co.uk/genetic-hello-world.html) then it gets as far as "Hbumr Xqnkd" by the 70th generation. Still looks like it's evolving faster than his code. Must be the way he's selecting parents...

pgroves · on Sept 28, 2011

I would agree. With a population size of only 20 the algorithm is closer to being hill-climbing with random restart. Crossover can't do much if there isn't a lot of diversity in the population, and a small population can't be very diverse.

But it still worked.

maercsrats · on Sept 28, 2011

Right, I was thinking it was more of a random walk algorithm: http://en.wikipedia.org/wiki/Random_walk

There was too much randomness in his evolving. When I was evolving agents for my M.S., I used tournament selection (http://en.wikipedia.org/wiki/Tournament_selection) with an 80% crossover rate for the top 2 winners and only a 5% mutation rate on the offspring. I also started off with a population of 500 individuals. This approach seemed to produce much better results.

bermanoid · on Sept 29, 2011

There was too much randomness in his evolving. When I was evolving agents for my M.S., I used tournament selection (http://en.wikipedia.org/wiki/Tournament_selection) with an 80% crossover rate for the top 2 winners and only a 5% mutation rate on the offspring. I also started off with a population of 500 individuals. This approach seemed to produce much better results.

I don't know about that - with a population size of 500, every generation is very expensive to test, so your runway is a lot shorter.

I set up a test using your parameters, and it seemed to take an average of about 50 generations to evolve "Hello, World". That's a total of 50x500=25000 fitness evaluations (assuming you've cached results properly) to solve the problem. Whereas with a population size of 20, 100% crossover and mutation rates, the average is about 150 gens, for 150x20 = 3000 evaluations, beating your parameter set by a factor of 8.

Which doesn't mean much, other than that this is not such a good problem to use evolutionary methods on.

user24 · on Sept 28, 2011

Yeah, I was taught roulette wheel at university but I think that tournament is a much better way of doing selection.

smoyer · on Sept 28, 2011

And ... throwing away the least fit every time can lead you to climb the hill to a local maxima. In this case, the only way to leave a local maxima is to get lucky with a mutation, but if you cull the population randomly with a log probability that more fit parents will be kept, you could reach the goal in fewer generations.

You could even eliminate mutations altogether if you have the proper character in each position in at least one member of the starting population.

bermanoid · on Sept 29, 2011

I just ran some tests, and actually, mutation and selection probabilities of near 100% are optimal, as is a very small population size, at least if you use the mutation and selection policies as given in this article (multiply two numbers on 0->1, multiply by population size, floor the result and select the candidate out of a fitness sorted array that has that index). When I say "optimal", you have to be sure to evaluate the population size multiplied by the number of generations to estimate how well the algorithm is doing, because a bigger population takes more work to evaluate even if it requires fewer gens.

Honestly, I'm not sure why the OP's algorithm seems to take so long - with population of 20, and both crossover and mutation set at 100%, I'm seeing an average of about 190 generations needed to evolve a perfect "Hello, World" when started randomly (character codes between 32 and 127). This should be exactly the same setup as the OP has, though I haven't checked his code against mine in serious detail (mine is in JS). I just wrote my code up in a couple hours, so it's quite possible there's a mistake, but it seems unlikely that I coded a bug that made it perform better than it should...

One difference I note is that the way he does crossover is actually two point crossover, in that he selects a start and end point randomly from a uniform distribution over the length. That's a little odd, but probably not a major problem: even when I disable crossover altogether in my code, and roll with mutation-only, I still end up with an average of 342 gens to reach the target, which is an order of magnitude under the 3100 that the OP claims it took for him.

It's worth noting that the OP's first, "naive" algorithm, with a single member in the population, performed at roughly the same number of generations as the one with pop size of 20, mutation, selection, and crossover, according to his report. That's disturbing...I checked, and in my code the "naive" algo (mutate, then either accept or reject the mutation based on whether it's beneficial or not) performs at somewhere around 3000 generations to completion, which makes me suspicious that somewhere in OP's code there's a bug that's somehow missing out on the benefits (measured by gen count - see below) of having a higher population and crossover.

FWIW, for this problem, at least the way the OP set it up, the "naive" algorithm is actually a very good way to go - when I increase the population size to 20, and set the mutation/selection/crossover policies OP used, I find that the average number of fitness checks required to hit "Hello, World" (about 3510) is actually higher than the number in the naive version (in the neighborhood of 3k, usually a bit under). Also, the real time taken is larger. Which means that adding "genetic" to the algorithm has actually hurt us...

In fact, even with my full GA codebase in hand (not a substantial one, I wrote it in response to this post, but it's more flexible than the OP's), I couldn't find any situation where having a population size more than a few members helped - single member mutation (which is accepted/rejected if better/worse) always won. This is a good indication that this type of problem is vastly better suited to gradient descent than it is to a genetic algorithm.

Of course, the reason for this is that the fitness landscape is very smooth, and the local gradient very accurately tells you where to go - an decent hill climber that knew how to work with integer parameters could solve this in probably ten or fifteen steps, max.

Which reminds me, I keep meaning to write up an article on how to pick out problems where GAs are actually of some use. There are a few, but you almost never see them, and almost every demo I've seen on HN over the past few years would be better solved by gradient descent.

juretriglav · on Sept 28, 2011

Evolutionary algorithms are fun. I solved a laser ablation modeling problem with them. The idea was to improve the quality of the produced data, by applying what knowledge of the process we had.

Laser ablation measurements work something like this: Sample -> Ablation -> Measurement -> Data

Since there are several distortions that happen in the middle two steps, you of course get a distorted measurement. You'd like to get the Sample from the Data.

Since we know pretty well what happens in the middle two steps, I evolved my theoretical Sample by starting with random data, running it through the model of the middle two steps (ablation and measurement) and then comparing the modeled random data to the experimental data. The fitness function was exactly this difference between experimental and modeled data.

I took the best ones of each generation and crossed them over. I then added some mutations and watched the magic happen. It ended up taking most of the night for a very simple set (our experimental sets are mostly orders of magnitude bigger). I took screenshots of every new generation, and made a movie of the whole process. When I played the movie in the morning I had one of those rare and special Eureka! moments, when all that work pays off and then some.

I presented that movie at a conference and stole the show :)

Long story short, evolutionary algorithms are fun, but I failed to apply them at the required scale.

andrewcooke · on Sept 28, 2011

if anyone's interested, i evolved a program in malbolge (an esoteric language - http://www.lscheffer.com/malbolge.shtml) that prints "HEllO WORld" - see http://acooke.org/malbolge.html

this was years ago, the work was done in lisp, and not long afterwards someone sent me a better solution that they had done by hand...

cpeterso · on Sept 28, 2011

You are too modest. Wikipedia says you wrote (or generated) the first Malbolge program:

Malbolge was so difficult to understand when it arrived (in 1998) that it took two years for the first Malbolge program to appear. The program was not even written by a human being: it was generated by a beam search algorithm designed by Andrew Cooke.

https://secure.wikimedia.org/wikipedia/en/wiki/Malbolge

jcmoscon · on Sept 28, 2011

I bet you put a lot of effort to program this, eh? I think we can say that this is an argument in favor of intelligent design.

paisawalla · on Sept 28, 2011

If you look closely, you'll notice that for each character, I square the difference. This is to to put extra emphasis on larger differences. If we don't do this, the string "Hannp" would have a fitness of 0. You see, the difference between 'e' and 'a' is -5, between 'l' and 'n' is +2 (which we have twice) and between 'o' and 'p' is +1. Adding these up yields a fitness of 0, but it's not the string we want at all.

Squaring is a dumb way to avoid that problem. You can still have false positives: for example if a string differs from the target by two letters, who distances are +1 and -1, then squaring will not mitigate the issue.

EDIT: I realize now that -1^2 is a positive number...

I'm very surprised that the author doesn't know about the Levenshtein distance metric http://en.wikipedia.org/wiki/Levenshtein_distance

That seems like it would be the perfect utility function for this case.

sparky · on Sept 28, 2011

Err.. I would argue that (+1)^2+(-1)^2 would not yield a false positive.

You are right that the explanation is incomplete, though. The squaring provides an 'absolute value' effect as well as disproportionately weighting larger alphabetical distances.

mahcuz · on Sept 29, 2011

Regarding the squaring - would it be enough to take the absolute value of the distance? (abs -5) + (abs 5) = 10.

paisawalla · on Sept 29, 2011

It's still a problematic way to quantify distance between strings. It says that "Rello World" is as close to the target as "Ifmkn Vpqkc"

Levenshtein would quantify those differently, and I'd argue, more correctly. It would say the first is much closer than the second.

mason55 · on Sept 28, 2011

Effectively, we square each difference so that they can't cancel each other out.

He seems to sort of confuse himself here, at first he says the squaring is so that it emphasizes bigger differences, then he says it's to make positive & negative cancel each other out. I think that the second part is just a nice bonus to the first part, but I don't get why he doesn't just use the absolute value of the distance anyway. "Hellp" is no more or less fit than "Helln".

The crossover and mutate are sort of mashed together and could be made much more robust. For example, giving each gene a random chance to mutate would do a much better job at keeping the populations from evolving into one big similar mass.

This is a good start but leaves a lot to be done for any sort of real implementation of a genetic algorithm.

tel · on Sept 28, 2011

If we're going to talk loss functions, then 1-0 loss is most appropriate here. As you say, "Hellp" is really no more or less fit than "Helln" whereas even absolute distance wouldn't reflect that.

Squaring admits a function into the class of loss functions which must be strictly non-negative to be meaningful, but really it's just always been done to make derivatives easy. If you don't need derivatives (or expectations), you can probably get away without using squared loss if you don't want it.

TomNomNom · on Sept 28, 2011

I wrote something similar with a slightly longer string in PHP a while back.

Read: http://www.tomnomnom.com/posts/methinks_it_is_like_an_incest...

Code: https://github.com/TomNomNom/PHP-Evolution-Sim

tariqk · on Sept 28, 2011

That's a pretty interesting look into what happens in an evolutionary algorithm.

But for a minute there I was hoping that the OP was going to evolve an actual "hello world" program. That might be an interesting exercise to do over a Lisp-like language.

tomjen3 · on Sept 28, 2011

Somebody actually did that, in Lisp, to be able to write a hello world program in a completely fucked out language.

The problem with just generating a hello world program is that there is no guarantee that you wont end up with a bunch of endless loops.

rmc · on Sept 28, 2011

A lot of programmes have a regular grammer & parsers for their language. You can then use parsers and the like to turn a tree into a source code (and vice versa) and evolve programmes.

user24 · on Sept 28, 2011

yeah I think genetic programming is often done with Lisp.

dmoney · on Sept 28, 2011

I don't see a reason to write the genetic algorithm in Lisp, but for the code that's being evolved it seems like the logical choice due to its minimal syntax. I'm working on a PHP program for evolving programs, and the evolved programs are in a lisp-like dialect made for just that purpose.

user24 · on Sept 28, 2011

IIRC it's something to do with the fact that lisp can easily be represented as a tree and so crossover is very simple; just swap branches. I don't know lisp so I'm just regurgitating info I picked up at university.

kruhft · on Sept 28, 2011

One problem with using that technique is that it's has been patented, along with derivatives of it, for quite a while.

perone · on Sept 28, 2011

Using Python Pyevolve framework: http://pyevolve.sourceforge.net/0_6rc1/examples.html#example...

koryk · on Sept 28, 2011

I came across this example before when I was writing a small genetic algorithm library in js. https://github.com/koryk/evolvejs

I wrote a few examples as well. tic-tac-toe: http://www.korykirk.com/evolvejs/examples/tic-tac-toe/tic-ta...

circle: http://www.korykirk.com/evolvejs/examples/circle/circlega.ht...

kailashbuki · on Sept 28, 2011

For absolute beginners, here's a simple example i worked on about a year ago ... http://www.kailashbudhathoki.com.np/2010/07/genetic-algorith...

piaskal · on Sept 28, 2011

This is a nice example of simple evolutionary algorithm. Next steps could be separating crossover and mutate operations, selecting more parent pairs for crossover and applying mutation in a more random fashion (not with every crossover).

bekirdag · on Sept 29, 2011

here is the same algorithm in php http://www.bekirdag.com/en/php_mysql/a_hello_world_genetic_a...

maeon3 · on Sept 28, 2011

I want a program that will take a set of unit tests, and create a program that takes a piece of code and evolves it until all those tests pass.

And I want to be able to guide its evolution, prompt it to use certain technologies in certain places.

jacques_chester · on Sept 28, 2011

> I want a program that will take a set of unit tests, and create a program that takes a piece of code and evolves it until all those tests pass.

Look into Genetic Programming (as opposed to Genetic Algorithms, Evolutionary Programming or Evolutionary strategies, three other fields that arose independently).

In GP, the genotype being evolved is a program, usually represented as either a tree of instructions or a string of assembler instructions. There are other schemes but these are the major two.

However, there's a fly in the ointment. Averaged over every possible computable problem, no one algorithm is the "best". It's called the "no free lunch theorem". The upshot is that with some clever encoding of domain knowledge into the genotypical space, you can often produce systems that outperform human-designed systems. But you cannot start from an absolutely generic genotype and necessarily arrive at the 'best' algorithm, you will always need some degree of manual intervention. Don't ask me to explain it better, I can't because I don't follow the maths.

GP has lots of interesting nooks and crannies, I refer you to A Field Guide to Genetic Programming for a fairly readable introduction: http://www.gp-field-guide.org.uk/

udnay · on Sept 28, 2011

I did something like this for my Master's, I evolved Java objects using genetic programming based on a set of unit tests. Managed to evolve a simple bank account example and a reverse polish notation class. Here's a link to the paper http://goo.gl/cgGWM

xtracto · on Sept 29, 2011

Interesting work, the problem I see with using standard "unit tests" as fitness functions to evolve objects (or their functions) is that the resulting program may only compute the correct output for the defined test cases:

Say for example we want a function that adds two integers, we create a test with:

    int x = 0; int y=0;
    for (int x=0,y=0;x<10;x++,y++){
        int result = sumIntegers(x,y);
        assert (result== x+y);
        writeFitness(x,y,(result-x-y)/(x+y));
    }

A correct function (with fitness == 0) can be:

   int sumIntegers(int x, int y) {
       if (x == 1 and y == 1) return 2;
       if (x == 2 and y == 2) return 4;
       if (x == 3 and y == 3) return 6;
       ...
   }

Which is not semantically what we wanted.

udnay · on Oct 3, 2011

True, but that's true of most data mining and search algorithms. You are in danger of over-fitting the sample data. You can take steps to avoid this by giving more consideration to correct solutions that are shorter, in the hope that brevity avoids an exhaustive case statement.

I don't want to come off as if I disagree with you, the evolved programs will only be as good/complete as the unit tests they are evolved against.

snippyhollow · on Sept 28, 2011

That has to be related somehow to mutation testing. They used that at my engineering school to test our (Java-subset) compilers, along with a lot of unit (or not) tests.

http://en.wikipedia.org/wiki/Mutation_testing

bermanoid · on Sept 29, 2011

Actually, the fact that mutation testing is useful is a sign of one of the major difficulties with automatic algorithm generation: a tiny change in a program typically destroys a piece of functionality completely, which means it's hard to follow any sort of fitness landscape to incrementally create a working program that passes all your tests. When writing an algorithm you're looking for a needle in a haystack, not tweaking parameters to find the peak of a broad mountain, and that takes a lot of the punch out of automated methods.

What this means, in part, is that brute-force search will often be just as effective as evolutionary or hill climbing methods. Where evolutionary methods start to shine is when you allow them to reuse bits of code that have already been found to be effective, esp. in other problems - that corresponds a lot more closely to the way that humans write code, we lean on our memories of the types of things that have worked before, for everything from syntax to algorithm design to system architecture.

unignorant · on Sept 28, 2011

My old research group at UVa worked on (almost) exactly that problem. See:

http://www.cs.virginia.edu/~weimer/p/p109-weimer.pdf

nydev · on Sept 28, 2011

Sure..and put the rest of us out of business? : ) Although it is interesting to think about what the make-up of the genetic material would be. Just a string of 0s and 1s? CPU opcodes? Or maybe some higher level sub-units.

jacques_chester · on Sept 28, 2011

> Just a string of 0s and 1s? CPU opcodes?

These have been tried. Historically Genetic Algorithms have represented their genotypes as bit strings. Some Genetic Programming systems use CPU instruction strings (pros: fast as heck, cons: can produce indecipherable spaghetti code that breaks when you remove those 400 NOOPs because it's adapted to the cache size of this particular CPU).

> Or maybe some higher level sub-units.

These too. Sometimes as trees of instructions, dataflow diagrams, FSMs and so on.

nydev · on Sept 28, 2011

Very interesting.."indecipherable spaghetti code that breaks when you remove those 400 NOOPs because it's adapted to the cache size of this particular CPU" - sounds a little like our "junk" DNA (though I suspect the analogy breaks down in the details.)

jacques_chester · on Sept 28, 2011

It can get pretty Rube Goldbergesque, apparently. The main rule of thumb is that if you're doing GP with CPU strings as your representation, beware of including run time in the fitness function. Because you will almost inevitably get over adaptation that breaks on any other platform.

Rusky · on Sept 28, 2011

GP would just turn all the programmers to digital shepherds :P

It would be a really interesting challenge to get broad enough coverage in the fitness function that things like the previously-mentioned "400 NOPS" don't cause problems.

Besides unit (or otherwise) tests, it should also be possible to use static analysis to measure fitness and/or use a static type system to restrict the possibilities. I'd think this would have some of the same problems, though- anybody know of any work on that?

jacques_chester · on Sept 29, 2011

> It would be a really interesting challenge to get broad enough coverage in the fitness function that things like the previously-mentioned "400 NOPS" don't cause problems.

All the branches of evolutionary computing agree that evolved systems are really good at finding flaws in your problem statements. Most of the evolved systems you see in production are not the product of the first attempt to evolve the system.

> Besides unit (or otherwise) tests, it should also be possible to use static analysis to measure fitness and/or use a static type system to restrict the possibilities. I'd think this would have some of the same problems, though- anybody know of any work on that?

Type consistency for GPs is a whole subfield in itself. See the textbook I referred to, it has a chapter on it from memory.

bermanoid · on Sept 28, 2011

For the longest time now I've been meaning to work on this, what I envisioned was to essentially a Javascript preprocessor that allowed you to insert placeholder tags that said "figure this chunk of code, with these constraints: ..., these suggestions: ..., these requirements: ..., etc.", at least to start.

The real problem with designing a system like that is not to get a GP framework running [1], that's a couple days of work at most. The difficulty is to set up a constraints and specification system that works well and is convenient, which means that it needs to be very flexible in terms of defining the spec, yet push the user to define everything carefully enough so that it can actually make progress towards a solution. Without a push in that direction it's just too easy to write crappy unit tests that either don't test enough of the input space to pick out the right implementation, or don't provide enough feedback to an optimizer to guide it towards a useful solution. This is a particularly tricky issue when it comes to GP, because programs that are close to right usually have outputs that are completely and utterly wrong, at least in my experience, so hill-climbing barely works at all when it comes to syntax trees (though it can be really useful to point a standard hill-climber at your numerical constants once a syntax tree is created, esp. in some types of mathy code). Writing code is closer to a needle-in-a-haystack problem, and it tends to confound evolutionary searches because the fitness feedback is not very helpful.

There's also always the problem of side effects, which can cause a lot of problems for GP apps, especially when you're dealing with external resources like databases or web services where you can't afford to just poke things randomly and see if they work (you might cause changes), but you need to actually touch them to develop because you rely on their responses to figure out if you're doing things right. So the user would always need to be providing additional layers of code to roll back or prevent changes in these resources, which could get messy.

A starting point for a project like this would probably be to look closely at something like Prolog, where you can do a bit of this sort of thing, at least for certain types of problem (though Prolog essentially does it only by brute force). Where I would go from there is probably to think about how to take to heart the idea of making the search strategy used at any point a core part of the interface, so that (for instance) you could either do a randomized search through programs with some bounded complexity (some combination of limits on token count, syntax tree depth, etc.), choose to run an evolutionary strategy with a set of tokens, optimize a set of constants to maximize a fitness metric, etc.

[1] I'd probably reject standard GP as the main approach - there are perfectly reasonable ways to automatically explore program space without using evolution (at least exclusively). I'd probably look toward some sort of hybrid search strategy, with a bit of random walk, a bit of brute force, and some statistical modeling of the domain (a la MOSES [http://code.google.com/p/moses/]) to inform the "evolution" part of it, as well as possibly leaning on a centralized database that automatically catalogued commonly useful code patterns and snippets.