I was happy to see that you actually were doing genetic algorithms (usually people incorrectly refer to simpler evolutionary algorithms as "genetic", which is a pet peeve of mine).
A few points of feedback though:
Typically, what you are calling "genes" are called "chromosomes". Calling them "genes" is confusing because genes typically refer to smaller hereditary units (generally an individual bit on the chromosome). It would also be helpful if you mentioned that what you call "mating" is usually called "crossover".
Also this bit is rather inaccurate in terms of theory:
>Mating alone has a little bit of a problem: in-breeding. If all you do is mate your candidates to go from generation to generation, you run the risk of getting stuck near a “local optimum”: an answer that’s pretty good but not necessarily the “global optimum” (the best you can hope for).
You're sort of looking at all of GA backwards here. Mutation is the main way that a GA arrives at new innovations, and an algorithm with no crossover will usually do pretty well, but will be prone to getting stuck in local optima. This is because it is possible to arrive at a solution where no incremental change can improve the fitness.
Crossover is somewhat helpful against local optima because it allows the algorithm to try combinations of innovations from different solutions. Depending on the fitness landscape, this can be either helpful or useless. Regardless, a GA will, on any non-trivial problem, almost certainly get stuck on a local optimum, which is a very important point for users of GAs to realize.
But what is not the case is that mutation prevents crossover from hitting a local optimum. Crossover by itself basically just won't optimize at all. It's generally best to think of a GA as being driven primarily by mutation and selection, with crossover as a tweak that can make the search more efficient.
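To make that framing concrete, here is a minimal Python sketch of a GA on the toy "OneMax" problem (maximize the number of 1s in a bit string), with mutation and selection doing the driving and crossover as an optional switch. All names and parameter values here are made up for illustration, not taken from the article:

```python
import random

def onemax(bits):
    """Fitness: number of 1s in the bit string (toy 'OneMax' problem)."""
    return sum(bits)

def evolve(n_bits=32, pop_size=20, generations=200, use_crossover=True, seed=0):
    """Tiny generational GA: truncation selection + bit-flip mutation,
    with one-point crossover as an optional tweak."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fitter half of the population (elitist).
        pop.sort(key=onemax, reverse=True)
        parents = pop[:pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            if use_crossover:
                point = rng.randrange(1, n_bits)  # one-point crossover
                child = a[:point] + b[point:]
            else:
                child = list(a)  # plain copy; mutation alone drives search
            # Mutation: flip each bit with probability 1/n_bits.
            child = [bit ^ (rng.random() < 1.0 / n_bits) for bit in child]
            children.append(child)
        pop = parents + children
    return max(onemax(ind) for ind in pop)
```

With or without `use_crossover`, this climbs toward the optimum; the crossover flag only changes how efficiently innovations spread, which is the point being made above.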
> ...an algorithm with no crossover will do usually pretty well, but will be prone to getting stuck in local optima. This is because it is possible to arrive at a solution where no incremental change can improve the fitness.
False. There is no theoretical or comprehensive empirical study I've ever seen to suggest this is the case. At best, in some scenarios (e.g., multiplex, ordered subset selection) crossover will help speed up the search because it effectively assumes the problem is separable, meaning different parts of the problem can be optimized independently. So if you're trying to get all 1s in a bit string, you could imagine separating that into two parts: maximize the first and second halves of the bit string, then combine the best individuals to get an optimal answer to the original problem. This is why, for instance, crossover rarely helps in evolving neural networks; the non-separable nature of their weights makes it difficult to select a meaningful subset of the network.
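The separability assumption can be shown with a toy Python snippet (the population values are hypothetical): for a fitness like counting 1s, each half of the string can be optimized independently, and gluing the best halves together — exactly what one-point crossover does — can only help:

```python
def ones(bits):
    """Separable fitness: total number of 1s; each bit contributes independently."""
    return sum(bits)

# A hypothetical population of 8-bit candidates.
population = [
    [1, 1, 1, 0, 0, 0, 1, 0],
    [0, 1, 0, 1, 1, 1, 1, 1],
    [1, 0, 1, 1, 0, 1, 0, 0],
]

# Optimize each half independently: pick the half-string with the most 1s.
best_left = max((ind[:4] for ind in population), key=ones)
best_right = max((ind[4:] for ind in population), key=ones)

# Because the fitness is separable, gluing the best halves together
# yields a candidate at least as fit as any single parent.
child = best_left + best_right
```

If the fitness were non-separable (as with neural network weights), picking "best halves" in isolation would be meaningless, which is the point above.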
> Crossover is somewhat helpful against local optima because it allows the algorithm to try combinations of innovations from different solutions. Depending on the fitness landscape, this can be either helpful or useless. Regardless, a GA will, on any non-trivial problem, almost certainly get stuck on a local optimum, which is a very important point for users of GAs to realize.
Also false. I'll especially note that this is not the case if you're using a less-than-naive EA. One great example is CMA-ES, which is particularly well suited to handling non-convex, deceptive optimization problems (and uses no crossover).
> But what is not the case is that mutation prevents crossover from hitting a local optimum. Crossover by itself basically just won't optimize at all
Again, not true. Consider multiplex. You start with a bunch of random bit strings. If you have 1s at every index somewhere in the population, it's possible (though unlikely) that the algorithm will find the optimal answer. The issue arises when you select against the last 1 bit at a certain index in the population; it's now lost forever, and thus the GA will at best reach a local optimum (optimal in the sense that it's the best it can do with future populations).
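That failure mode is easy to demonstrate in a few lines of Python (the population here is made up): once every individual has a 0 at some index, crossover alone can never reintroduce the 1 there, no matter how many generations you run:

```python
import random

def crossover_only_generation(pop, rng):
    """One generation of one-point crossover with no mutation: children
    are built purely by recombining existing parents."""
    n = len(pop[0])
    children = []
    for _ in range(len(pop)):
        a, b = rng.sample(pop, 2)
        point = rng.randrange(1, n)
        children.append(a[:point] + b[point:])
    return children

rng = random.Random(1)
# A population where index 2 is 0 in every individual: the '1' allele
# at that position has already been lost.
pop = [[1, 1, 0, 1], [0, 1, 0, 1], [1, 0, 0, 0], [1, 1, 0, 1]]

for _ in range(100):
    pop = crossover_only_generation(pop, rng)
# No amount of recombination can reintroduce the lost bit, so the
# all-1s optimum is unreachable: the population is stuck.
```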
>False. There is no theoretical or comprehensive empirical study I've ever seen to suggest this is the case.
It is absolutely the case. What is not established is whether or not crossover will actually help. My main point was that mutation is not there to escape local optima, although I should have been clearer about the efficacy of crossover for that purpose.
>Also false. I'll especially take note that this is not the case if you're using a less-than-naive EA.
Yes, for specific classes of problem, you can converge on a global optimum. In the general case, there is no way to ensure that you will get to a global optimum.
>You start with a bunch of random bit strings. IF you have 1s in all indexes somewhere in the population, it's possible (though unlikely) that the algorithm will find the optimal answer.
Yes, that's why I said it basically won't optimize at all. Random guess-and-check might arrive at an optimal answer too, but that is only "optimizing" in a very pedantic sense.
> My main point was that mutation is not there to escape local optima, although I should have been clearer about the efficacy of crossover for that purpose.
Mutation is absolutely there for the primary purpose of escaping local optima. If mutation's job was only to climb the gradient, then you would just use a much more efficient gradient ascent method. By having random mutation, you are effectively saying you want to stay in the known-good region most of the time, but occasionally explore a new area even if it goes against the perceived gradient.
Bingo. So many genetic/evolutionary algorithms assume you need a large population with crossover to avoid these supposed 'local minima'.
Very often, the most efficient evolutionary search algorithm involves a population of one (or two, depending on how you count): generate a mutation and compare it to the current best. If it's worse, throw it away; if it's better or the same, keep it and throw the original away. Rinse and repeat.
Search for "neutral networks" for more information about research into determining the "shape" of the fitness landscape. (Unfortunately, Google throws in plenty of search results about neural networks and network neutrality).
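That population-of-one scheme is essentially the classic (1+1) evolution strategy. A minimal Python sketch (function and parameter names are my own, not from any particular library):

```python
import random

def one_plus_one_ea(fitness, genome, steps=2000, mut_rate=None, seed=0):
    """(1+1) EA: keep a single current-best bit-string genome, mutate it,
    and accept the child if it is at least as fit as the parent."""
    rng = random.Random(seed)
    n = len(genome)
    if mut_rate is None:
        mut_rate = 1.0 / n  # standard choice: flip one bit on average
    best, best_fit = list(genome), fitness(genome)
    for _ in range(steps):
        child = [bit ^ (rng.random() < mut_rate) for bit in best]
        child_fit = fitness(child)
        if child_fit >= best_fit:  # ties accepted: allows neutral drift
            best, best_fit = child, child_fit
    return best, best_fit
```

Accepting equal-fitness children ("better or the same") is what lets the search drift along the neutral networks mentioned above instead of freezing on a plateau.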
There's a bit of goofiness about crossover in this thread.
So far as I know, since its inception in the 1970s, crossover has never been thought of as a mechanism for "avoiding local minima". Indeed, the early trend of crossover without mutation was what resulted in studies in so-called "premature convergence," where the population would converge to a single genome and be permanently locked in a local optimum.
Rather, crossover is a procedure for spreading good subsolutions (so-called "building blocks") rapidly through the population. This is effectively projecting the search space into many smaller spaces, so it has the potential of searching much faster if -- and this is a big if -- the problem in question has very low dependence among its parameters. In the GA world, this is known as parameter "linkage" or "epistasis". Other techniques do something similar: for example, EDAs or coevolution or ant colony optimization all perform a related projection and likewise make a related assumption of low dependence.
> Very often, the most efficient evolutionary search algorithm involves a population of one (or two, depending how you count them): generate a mutation, compare it to the current best.
Not really true. There are lots of EAs which are touted as "the most efficient". A few, like CMA-ES, are by and large mutation-only algorithms. But others, like cooperative coevolution or hBOA (and other EDAs), are essentially extremes in crossover-like mixing.
> how is cooperative coevolution a crossover-dependent approach?
It's not. What I meant was that CCEAs and univariate EDAs (and most forms of ACO) make the same basic linkage assumption inherent in crossover, indeed to even more of an extreme: that you can assemble individuals from bits and pieces scattered about the space without any dependence issues.
Very interesting tutorial! I've had exposure to GAs before, but not in the context of machine learning and so it's interesting to see how my experience can apply to that area.
I had a course on design optimisation in the last year of my degree and I'm constantly surprised by just how relevant it is to a lot of different fields. It really is worth familiarising yourself with optimisation techniques if this kind of stuff interests you, because they can be applied to a very broad range of problems.
This is wonderful! Exactly what I was looking for, a practical introduction to machine learning using real-world code. Thank you so much!
I only have one request: instead of switching to PHP for the next GA exercise, can you please keep everything in Javascript? JS is a more widely used, general-purpose language, which means those of us who don't use PHP can still enjoy it :) Thank you!
I always liked visual simulations of genetic algorithms. Makes it really easy to explain to people. My favorite is this one - http://www.qubit.devisland.net/ga/ It tries to create a car, which must navigate a 2d terrain and carry a couple of weights.
Great article! It's not easy to develop a new, easy-to-understand framework for GAs, but we recently tried in Scala, using the cake pattern to help users define their algorithms: https://github.com/romainreuillon/mgo
I would love to solicit suggestions for other ML topics and languages to cover as part of this series (please don't suggest "SVMs in brainfuck"). Let me know what you think!
I really liked the way you have presented the topic in the article.
One suggestion I have is to just stick with one language instead of using multiple languages in the series. I understand that you are using different languages to prove the point that ML can be implemented in any language, but when you switch the language between articles, it might get a little awkward to follow. I would suggest teaching in one language which you think can better represent the logic and then provide a link to a github repo or something which has the same code in different languages.
I started a little genetic algorithm project that I haven't gotten to work on too much yet, which could make for a cool tutorial. I've got a GA that solves the traveling salesman problem in the Go programming language. It actually solves it in N isolated populations, where you tell it to use N go routines. (Optimally I believe N would be 2x the number of cores your machine has).
Since Go is a systems language, I have it running as an HTTP server that sends the results back to a webpage, which renders the JSON on an HTML5 canvas.
What I really wanted to do is experiment with "migrating" solutions from one population to another, to see if I can get any speedup that way. Right now, the isolated populations sort of converge at the same speed, but some do better than others, because they converge to different local minima.
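For what it's worth, a ring-migration island model like the one described above is easy to prototype. Here's a rough sequential Python sketch (not the Go/goroutine version; all names and parameters are made up) of "migrate each island's best individual to the neighboring population", using OneMax as a stand-in fitness:

```python
import random

def onemax(bits):
    return sum(bits)

def step(pop, rng, mut_rate):
    """One elitist generation: keep the fitter half, fill the rest with
    mutated copies of random parents (mutation-only, no crossover)."""
    pop.sort(key=onemax, reverse=True)
    parents = pop[: len(pop) // 2]
    children = [
        [bit ^ (rng.random() < mut_rate) for bit in rng.choice(parents)]
        for _ in range(len(pop) - len(parents))
    ]
    return parents + children

def island_ga(n_islands=4, pop_size=10, n_bits=30, generations=100,
              migrate_every=10, seed=0):
    rng = random.Random(seed)
    islands = [[[rng.randint(0, 1) for _ in range(n_bits)]
                for _ in range(pop_size)] for _ in range(n_islands)]
    for g in range(1, generations + 1):
        islands = [step(pop, rng, 1.0 / n_bits) for pop in islands]
        if g % migrate_every == 0:
            # Ring migration: each island sends a copy of its current best
            # to the next island, replacing that island's worst individual.
            bests = [max(pop, key=onemax) for pop in islands]
            for i, pop in enumerate(islands):
                pop.sort(key=onemax)
                pop[0] = list(bests[(i - 1) % n_islands])
    return max(onemax(ind) for pop in islands for ind in pop)
```

Tuning `migrate_every` trades off diversity (isolated islands converging to different local optima) against spreading the current best solution everywhere.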
If you're going that route, I suggest instead looking at Grammatical Evolution, which is much more popular and does a very closely related thing. Or heck, stick with plain Genetic Programming (GP).
It'd only be a toy though. Because of the high number of evaluations used by these kinds of methods, and the cost of a single evaluation, most serious work is done in engines based on faster languages, notably Java, C, or C++.