For anyone who's interested, here's an article I published in 2008; it has about 175 citations.
I’d also recommend the work by Peter Eggenberger from the late 90s.
Isn't there an analogue to this in software engineering?
There is an extremely large space of programs that produce exactly the same output on a given input, but we humans favor programs that we can read and understand. We also favor programs that are modular and composable, because bits of those programs tend to get reused to create new programs and are easier to write than new 'blob of code' programs.
One can view programs as memetic 'self-replicators', with a selection process dominated by the preferences of human readers and writers.
If we analyzed many dependency graphs between code modules, or a graph of function call usage between snippets, I bet you'd find that actual human-written programs have the same 'sparse' connectivity that you see, because maintainability implies writing code along sane API abstraction boundaries.
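Out of curiosity, here's a quick sketch of what that measurement could look like -- the module edge list below is made up, and extracting a real import graph is left out:

```python
import networkx as nx

# Made-up module dependency graph; in practice you'd extract this from a
# real codebase, e.g. by parsing import statements.
edges = [
    ("app", "api"), ("app", "models"),
    ("api", "models"), ("api", "utils"),
    ("models", "utils"), ("cli", "api"),
]
G = nx.DiGraph(edges)

n = G.number_of_nodes()
density = G.number_of_edges() / (n * (n - 1))  # 1.0 = fully connected
print(f"density: {density:.3f}")
print(f"avg out-degree: {G.number_of_edges() / n:.2f}")
```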
TL;DR: evolutionary pressure might mean nature naturally evolves abstractions within genetic code.
On the other hand, if a cell is already making X, then a new process can easily leverage the existence of that X when doing something new.
Net result: copy-paste coding + dependency hell.
I suppose this is wrong since people still find these techniques useful, but what are the advantages of these techniques compared to gradient descent? (other than the fact that you don't need your fitness function to be differentiable)
Evolutionary algorithms work by sampling a combinatorial solution space and learning about the interactions between solution components. This is made explicit (and much more elegantly) in the definition of EDAs (estimation of distribution algorithms).
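A minimal sketch of the idea, using UMDA (about the simplest EDA) -- note UMDA models each variable independently, while richer EDAs like BOA explicitly model interactions between components; the OneMax fitness here is purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n_bits, pop, elite = 40, 100, 25

p = np.full(n_bits, 0.5)                    # one Bernoulli parameter per bit
for gen in range(50):
    X = (rng.random((pop, n_bits)) < p).astype(int)  # sample from the model
    fitness = X.sum(axis=1)                 # OneMax: count the ones
    best = X[np.argsort(fitness)[-elite:]]  # select the elite samples
    p = 0.9 * p + 0.1 * best.mean(axis=0)   # refit the distribution
print("best fitness:", fitness.max(), "of", n_bits)
```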
By contrast, gradient descent makes the assumption that such interaction effects don't exist or don't matter, which doesn't work for most optimisation problems - you get stuck in local optima.
> doesn't work for most optimisation problems - you get stuck in local optima.
This bit is not necessarily true. In practice, and given the high dimensionality, it is (apparently) quite hard to get stuck in local optima. 
In almost every optimisation problem I've encountered in practice, you do get stuck in local optima using simple techniques like SGD. Interaction effects are everywhere, and create local optima.
How do evolutionary algorithms let you evolve the architecture? You define a model space, and explore it stochastically (with different evolutionary strategies). The model space you are exploring is linked to a specific architecture.
Regardless, there are also efforts in Deep Learning to automatically learn optimal network architectures ("AutoML").
In reinforcement learning: evolutionary algorithms work by applying perturbations in the parameter space, not in the action space. If your problem is sensitive to action-space perturbations (requires consistent strategies) it might not be possible or efficient to use "standard" RL. Evolutionary strategies (ES) are quite competitive in RL, especially paired with some dimensionality reduction technique (unsupervised VAE).
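A rough sketch of the parameter-space-perturbation idea (in the style of OpenAI's ES); `evaluate` is a stand-in for running the perturbed policy through a whole episode:

```python
import numpy as np

def evaluate(theta):
    return -np.sum(theta ** 2)   # toy stand-in for an episode return

rng = np.random.default_rng(0)
theta = rng.normal(size=10)
sigma, alpha, n_samples = 0.1, 0.02, 50

for step in range(200):
    eps = rng.normal(size=(n_samples, theta.size))   # parameter perturbations
    returns = np.array([evaluate(theta + sigma * e) for e in eps])
    adv = (returns - returns.mean()) / (returns.std() + 1e-8)
    theta += alpha / (n_samples * sigma) * eps.T @ adv  # estimated gradient step
print("final return:", evaluate(theta))
```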
Local minima are not really an advantage. ES are nearly local - they sample the space around the current solution to approximate the gradients. They'll get stuck in local minima sometimes too (if there are any). Bayesian optimization is more interesting, as it's actually global.
Well, the really big advantage is that simple gradient descent can get you stuck in local maxima (or minima, depending on how you look at things). Many fitness landscapes are guaranteed to produce suboptimal results if you use gradient descent. By contrast, a sufficiently large random variation will eventually "get out of the local hole"/"get past the local hump".
It doesn't necessarily have anything to do with differentiability; consider the graph of 0.5x+sin(x). A gradient descent technique is going to get stuck on that one right away, while random exploration won't.
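Easy to check in a few lines (restricting the search to an interval, since 0.5x+sin(x) is unbounded below and has no global minimum on the whole line):

```python
import numpy as np

f = lambda x: 0.5 * x + np.sin(x)
grad = lambda x: 0.5 + np.cos(x)

x = 5.0
for _ in range(1000):
    x -= 0.1 * grad(x)                       # plain gradient descent
print("gradient descent:", x, f(x))          # stuck near a local minimum

rng = np.random.default_rng(0)
samples = rng.uniform(-20, 20, 10_000)       # blind random exploration
best = samples[np.argmin(f(samples))]
print("random search:   ", best, f(best))    # finds a far better point
```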
Edit to address some stuff that dragontamer brought up below:
A good genetic algorithm is mostly going to produce offspring that are close to the parents (recombination, which de facto is similar to hillclimbing/gradient descent/gradient ascent) but occasionally it's going to try something that's completely off the chain (mutation). Some of the early work with genetic algorithms focused on mutation, but it soon turned out that recombination was better most of the time. Nonetheless, you still need some mutation, or you run the risk of getting stuck at local extrema.
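A bare-bones sketch of that balance -- mostly recombination, a little mutation, with OneMax fitness purely for illustration:

```python
import random

N_BITS, POP, MUT_RATE = 32, 60, 0.01

def fitness(ind):
    return sum(ind)

def crossover(a, b):
    cut = random.randrange(1, N_BITS)        # single-point recombination
    return a[:cut] + b[cut:]

def mutate(ind):
    return [bit ^ (random.random() < MUT_RATE) for bit in ind]  # rare bit flips

pop = [[random.randint(0, 1) for _ in range(N_BITS)] for _ in range(POP)]
for gen in range(100):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:POP // 2]                 # truncation selection
    children = [mutate(crossover(random.choice(parents),
                                 random.choice(parents)))
                for _ in range(POP - len(parents))]
    pop = parents + children                 # keep superior parents around
print("best:", fitness(max(pop, key=fitness)), "of", N_BITS)
```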
I would also say that there are multiple ways to escape local optima (setting a larger learning rate, multiple random initialisations, ensembling).
Those aren't the same thing at all.
This is presuming that the problem has a continuous formulation anyways.
As long as you're clear on what the decisions and objectives are, and you're able to run the model yourself, layering evolutionary optimization on top is easy.
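Something like this, where `run_model` is a hypothetical stand-in for whatever model you can already run yourself (here with a simple (1+1)-style strategy layered on top):

```python
import numpy as np

def run_model(decisions):
    # Hypothetical black box: takes a decision vector, returns an objective.
    return -np.sum((decisions - 3.0) ** 2)

rng = np.random.default_rng(0)
best = rng.normal(size=5)
best_score = run_model(best)

for _ in range(2000):
    child = best + rng.normal(scale=0.2, size=best.size)  # perturb decisions
    score = run_model(child)
    if score > best_score:                   # keep the child only if better
        best, best_score = child, score
print("best decisions:", np.round(best, 2), "score:", round(best_score, 4))
```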
This may indicate you chose redundant features, though.
Here's an interesting application that provides some motivation.
If, in a particular region, your model outputs are particularly sensitive to the inputs, that region can be overrepresented in a feature map. What I mean by this is that we cannot effectively choose between the solutions in this region -- the region is small compared to our ability to control inputs. Overrepresenting such a region in a population-based search starves the rest of the problem space of the algorithm's attention.
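A toy sketch of the effect in a MAP-Elites-style feature map -- `behaviour` and `fitness` are made up, with the behaviour descriptor deliberately hypersensitive near zero:

```python
import numpy as np

rng = np.random.default_rng(0)

def behaviour(x):
    return np.tanh(5.0 * x[0])    # tiny input changes span many cells near 0

def fitness(x):
    return -np.sum(x ** 2)

n_cells = 20
archive = {}                      # cell index -> (solution, fitness)
for _ in range(5000):
    x = rng.normal(size=3)
    cell = min(int((behaviour(x) + 1) / 2 * n_cells), n_cells - 1)
    if cell not in archive or fitness(x) > archive[cell][1]:
        archive[cell] = (x, fitness(x))      # per-cell elitism
print("cells filled:", len(archive), "of", n_cells)
```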
Thanks for the link, by the way -- this is a particularly interesting application.
Simulated Annealing works because it doesn't get stuck in local maxima. Do a simple gradient ascent on this simple 2d plot, and it's very, very unlikely to achieve the true global value: https://upload.wikimedia.org/wikipedia/commons/d/d5/Hill_Cli...
But if you do Simulated Annealing (aka: random walk), you get to the global maximum much more reliably. Genetic Algorithms have a similar effect to Simulated Annealing because Genetic Algorithms are strongly randomized. But they still "hill climb" explicitly, because it's rare for parents to be thrown away if they were superior to the children.
I don't think that's a fair characterisation, and it could be misleading for people new to SA. I'd say SA is a hill-climbing algorithm that has a probability of accepting an inferior move, a probability which shrinks asymptotically over time.
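A minimal sketch of that description, reusing the toy function from upthread (clamped to [-20, 20] so a global minimum exists):

```python
import math
import random

f = lambda x: 0.5 * x + math.sin(x)

x, T = 5.0, 2.0
for step in range(5000):
    candidate = min(20.0, max(-20.0, x + random.gauss(0, 0.5)))  # local move
    delta = f(candidate) - f(x)
    # Always accept improvements; accept a worse move with probability
    # exp(-delta/T), which shrinks as the temperature cools.
    if delta < 0 or random.random() < math.exp(-delta / T):
        x = candidate
    T *= 0.999
print("annealed to:", x, f(x))
```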
Evolutionary strategies are doing (natural) gradient descent using an estimate of the gradient. Here's a good article: http://www.inference.vc/evolutionary-strategies-embarrassing...
But it does make sense that a random walk would be more efficient in very high dimensional problems!
When you don't know the traits that are beneficial in an environment, a significant change adversely affects all sexually reproducing algorithms to a significant degree, whereas it affects asexually reproducing algorithms to varying degrees.
There is a joke to be made here.
Isn't evolvability only about robustness? What other criteria would improve evolution? Mutations just happen, so the question is how well the phenotype can deal with these mutations. If it can incorporate mutations well, it can tunnel to different useful phenotypes and therefore is robust.
I mean starting at a very primitive level, you could first aim for something like ant-level intelligence, then go to higher and higher intelligences from there.
These are mathematical models that optimise a specific fitness function, they borrow some biological concepts but they are not going to "evolve into ant-level intelligence".
How do we know until we've tried it?
I mean, at a certain level of abstraction, you can just view humans as a bunch of atoms that have been placed in a heat bath for a very long time. And yet intelligence seems to have spontaneously self-organized from it.
They are nowhere close to the complexity of a brain, and certainly do not have enough capacity to be a realistic simulation of the universe.
> How do we know until we've tried it?
Tried what, is the question? What's an ant-level intelligence? We have algorithms that can recognize images/speech at a superhuman level (at least in some conditions). Worthy questions, but unrelated to these algorithms.
I am thinking, in particular, of an experiment done by Richard Feynman where he got ants walking across small sheets of glass. After shuffling the sheets around so the chemical trail became a loop, he was able to get the ants tirelessly marching around in a circle (which they would presumably do until they dropped).
Heck, you probably don't even need an ANN at all to do that, much less an evolutionary one.
They didn't? I'm not sure what you're saying here. Are you arguing intelligent design or something? Because the standard evolutionary theories postulate that humans did, in fact, spontaneously arise from a heat bath (the heat coming from the Sun).
That's not what I typically think of when I hear the word "spontaneous". Spontaneous emergence sounds to me like lightning hitting a swamp over a period of time, and then by a pure chance arrangement of molecules, a fully formed swamp man comes out of it.
That's way different than an evolutionary process.
> Do you think creating a 3D universe (like that being worked on by open.ai), putting the evolvable agents in it and trying to make them evolve learning and intelligence is a good idea
Basically simulating a universe, seeded with some things we think are already somewhere along the road to intelligence, then seeing if progress is made just by simulating for a long time.
The question is, what is that function?
Not necessarily. Everything points to the fact that the human brain (at least; I don't know much about ants) does not work in any way similar to neural networks, and as such there's no guarantee that you can represent its behaviour with the current algorithms.
For example, real neurons have no supervision signal. Memory is also a big issue (see the work being done by DeepMind and FAIR on differentiable neural computers, and memory networks) and Reinforcement Learning still struggles with long-term planning, switching strategies, etc.
Yes, the brain is most likely not simply modelling one giant ANN, but it's much more plausible that ANN-like components have a role in it.
Doesn't the universal function approximation theorem provide just such a guarantee? It doesn't guarantee any algorithm will converge on that representation, but the capability to represent it is there.
An ant’s fitness function is different from its ancestor’s fitness function, which is different from its own ancestor’s fitness function, which is different from...
You don’t get from zero to complexity with a single fitness function.
The question we should ask is: How do we vary the fitness function over time in order to evolve something complex?
I have no idea what the answer is, but...
“If you know the question, you know half.”
-Herb Boyer, geneticist
Sexual reproduction basically outsources some of the selection effort to the cognitive apparatus of the species itself, thereby introducing a massive amount of additional selection signals (mainly by the much increased necessity to model other minds, namely minds of the opposite sex). Many of these signals promote traits that are useful for survival (mainly intelligence and health).
Your point about physical features made me think of how physical attractiveness plays into human development as well.
Research suggests that perceived beauty in humans is strongly associated with physical symmetry. So "novelty" in our case might be defined as someone who is really ugly - the Elephant Man.
So in this case, beauty wouldn't really fit into the robot walking example, as it's neither fitness (moving the foot forward) nor novelty. More a different type of fitness that increases the odds of reproduction.
Perhaps the lack of success so far has been due to the fact that the faculties of the brain we attribute to higher-order complex thought are themselves based upon brain regions that are much older, evolutionarily speaking. So it's not just one system you have to evolve--it's many hierarchical subsystems. And given the amount of time it took before human-level intelligence arose in the animal kingdom, it's pretty clear that randomly evolving a human-level intelligence isn't something that's going to happen very often.
It also seems to me that the only reason intelligence evolved in the first place was because of the complex environment provided by our natural world. I have a feeling that we're not going to see much in the way of intelligent learning systems unless we can provide those systems with a sufficiently complex environment to learn and evolve within.
At the point that the system can, unprompted, change what it shows the user, it'll be interactive enough that we'll presumably just have to wait for adequate computational backing for it to "wake up".
If I'm understanding this right, it seems like most approaches up to this point are focused on evolving a single "unit" or brain / person.
You have the concept of "nature" that selects which units will advance to the next generation, passing on "DNA" -- first based on fitness, and now, with the new approach, on novelty.
This may seem a bit naive, but has anyone explored adding a social dimension?
For the robot walking example, you have nature choosing novelty and those that made progress.
A basic social dimension could have units observing other units and sharing information.
But then there's a variety of other dimensions - a unit blocking another unit from walking (cheating), a bigger unit destroying a smaller one, maybe success via misc factors like physical symmetry / popularity.
Idk. Just curious to learn more and where the research is at.
You'd have to be running every instance simultaneously and have them all observing and reacting to each other.
Also gets into some interesting "morality" type of questions. As in if an instance cheats to get ahead, are there consequences?
What has been useful for me is just setting up experiments and seeing what happens, and observing the way things go wrong. Typically evolution will find the flaws in any fitness score (metric) that is defined, often resulting in solutions that give good scores, but only because evolution was 'cheating', or rather, exploiting the flaws in the metric. This problem comes up a lot and it is educational to go through that process and try to think up more robust tasks and ways of measuring success on those tasks. Ideally we want metrics that are continuous, so that we can 'follow the gradient' of success rather than e.g. having a fitness score just go from zero to a perfect score in one step.
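A toy illustration of that last point, for a hypothetical 'move the robot to x = 10' task:

```python
def fitness_step(x):
    return 1.0 if abs(x - 10.0) < 0.1 else 0.0   # all-or-nothing: no gradient

def fitness_continuous(x):
    return -abs(x - 10.0)                        # closer is always better

# With the step metric almost every mutant scores 0 and selection is blind;
# with the continuous one, even moving from x = 2 to x = 3 registers.
print(fitness_step(3.0), fitness_continuous(3.0))
```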
Also it's been educational seeing how networks evolve. One of the early modifications I made to NEAT was to periodically disable additive mutations (add nodes and connections), so as to strip away any redundant structure that could be slowing down evaluation of each neural net. This came from just seeing the networks grow rapidly, and not always with fitness score gains to show for it.
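Roughly like this -- the genome representation and operators below are made-up stand-ins, not NEAT's actual API:

```python
import random

def add_node(g):        g["nodes"] += 1
def add_connection(g):  g["conns"] += 1
def perturb_weights(g): g["weights"] = [w + random.gauss(0, 0.1) for w in g["weights"]]

def mutate(genome, generation, grow_for=40, prune_for=10):
    growing = (generation % (grow_for + prune_for)) < grow_for
    if growing and random.random() < 0.03:
        add_node(genome)            # additive mutations disabled during
    if growing and random.random() < 0.05:
        add_connection(genome)      # the periodic pruning window
    perturb_weights(genome)         # weight mutation always stays on

genome = {"nodes": 3, "conns": 3, "weights": [0.0, 0.5, -0.5]}
for gen in range(100):
    mutate(genome, gen)
print(genome["nodes"], "nodes,", genome["conns"], "connections")
```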
In summation I don't have a good answer for you, but if you can get interested in setting up an experiment and seeing how evolution tries to solve it all by itself, then maybe you'll get the 'bug' and start to think about where the limits to this approach are, and different ways of overcoming some of the present limits. From there you might want to pick through the occasional research paper to see what others are doing - I've found the NEAT-based research to be quite accessible, e.g. not requiring a lot of maths knowledge as is the case with deep learning and the like. Although I admit I do follow that world and have had to learn quite a bit of maths to keep up - and I think that's useful for drawing ideas from that area.
Hope that is of some help anyway.
There's also a social factor here, I think. Neural networks were disregarded due to over-claims and also perhaps due to some bias for machine learning methods that relied on understandable statistical models. In the same way, evolutionary computing is often looked down upon by the rest of the AI community - it's a lot of hacking and hand waving and "try it and see". It doesn't help that there's been a lot of low-quality work in EC. But this bias has perhaps starved EC of talent somewhat; much of the progress in deep learning is probably a result of increased research activity and optimism, i.e. progress is self-reinforcing.
At least in terms of the number of trials, we can easily replicate the number of trials evolution has to work with using an everyday desktop. Why don't we see comparable evolution on the desktop? There is some sort of magic to evolution we have not yet fathomed.
The author doesn't seem to understand what backpropagation is - gradient descent is performed using the gradients computed by backpropagation; they are not forms of one another.