
The quest to evolve neural networks through evolutionary algorithms - hardmaru
https://www.oreilly.com/ideas/neuroevolution-a-different-kind-of-deep-learning
======
rdlecler1
I spent more than 10 years working on this problem. One major challenge is how
to represent the genotype-phenotype map so that genetic variation leads to
favorable phenotypic changes. For this you start going down a rabbit hole of
developmental evolution and the evolution of evolvability and then a seemingly
unrelated subject: the evolution of robustness.

For anyone who’s interested, here’s an article I published in 2008; it has
about 175 citations.

[https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2538912/](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2538912/)

I’d also recommend the work by Peter Eggenberger from the late 90s.

~~~
halflings
I have always seen evolutionary algorithms as a version of gradient descent
that throws away the gradient and instead just goes randomly in all
directions until it finds something that works (which is extremely wasteful).

I suppose this is wrong since people still find these techniques useful, but
what are the advantages of these techniques compared to gradient descent?
(other than the fact that you don't need your fitness function to be
differentiable)

~~~
Turing_Machine
"what are the advantages of these techniques compared to gradient descent?"

Well, the really big advantage is that simple gradient descent can get you
stuck in local maxima (or minima, depending on how you look at things). Many
fitness landscapes are guaranteed to produce suboptimal results if you use
gradient descent. By contrast, a sufficiently large random variation will
eventually "get out of the local hole"/"get past the local hump".

It doesn't necessarily have anything to do with differentiability; consider
the graph of 0.5x+sin(x). A gradient descent technique is going to get stuck
on that one right away, while random exploration won't.
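
To make the local-minimum point concrete, here is a small sketch (the function is from the comment above; the step sizes and the naive random-mutation search are my own illustration) comparing plain gradient descent on 0.5x + sin(x) with random exploration:

```python
import math
import random

def f(x):
    # 0.5*x + sin(x): trends downward overall but has periodic local minima
    return 0.5 * x + math.sin(x)

def grad(x):
    return 0.5 + math.cos(x)

# Plain gradient descent: slides into the nearest local minimum and stays.
x = 0.0
for _ in range(1000):
    x -= 0.1 * grad(x)
gd_x = x  # converges near x = -2*pi/3, a local minimum

# Naive random-mutation search: occasionally takes a large jump, so it can
# keep "getting out of the local hole" and follow the overall downward trend.
random.seed(0)
best = 0.0
for _ in range(1000):
    candidate = best + random.gauss(0, 5.0)  # large jumps allowed
    if f(candidate) < f(best):
        best = candidate

print(gd_x, f(gd_x))
print(best, f(best))  # ends up far below the gradient-descent result
```

The random search is wasteful per step, but it is not permanently trapped the way the pure descent is.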

Edit to address some stuff that dragontamer brought up below:

A good genetic algorithm is _mostly_ going to produce offspring that are close
to the parents (recombination, which is de facto similar to
hillclimbing/gradient descent/gradient ascent), but _occasionally_ it's going
to try something that's completely off the chain (mutation). Some of the
early work with genetic algorithms focused on mutation, but it soon turned out
that recombination was better _most_ of the time. Nonetheless, you still need
_some_ mutation, or you run the risk of getting stuck at local extrema.
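
As an illustration of that recombination-plus-rare-mutation recipe, here is a minimal GA sketch on the classic OneMax toy problem (all parameters are illustrative, not taken from any particular paper):

```python
import random

random.seed(1)

TARGET_LEN = 32
POP_SIZE = 40
MUTATION_RATE = 0.02  # mutation stays rare; recombination does most of the work

def fitness(bits):
    # OneMax: count the 1-bits; a classic GA toy problem
    return sum(bits)

def crossover(a, b):
    # Single-point recombination: offspring stay "close" to both parents
    point = random.randrange(1, TARGET_LEN)
    return a[:point] + b[point:]

def mutate(bits):
    # Occasional bit-flips supply the jumps that escape local extrema
    return [1 - b if random.random() < MUTATION_RATE else b for b in bits]

pop = [[random.randint(0, 1) for _ in range(TARGET_LEN)] for _ in range(POP_SIZE)]
for _ in range(100):
    pop.sort(key=fitness, reverse=True)
    parents = pop[: POP_SIZE // 2]  # truncation selection; elites survive
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(POP_SIZE - len(parents))]
    pop = parents + children

best = max(pop, key=fitness)
print(fitness(best))
```

With mutation alone the search is far slower; recombination lets good partial solutions from different parents combine in one offspring.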

~~~
halflings
I think the premise (that neural nets get stuck in local optima) is not
trivially true, and there has been a lot of research on non-convex
optimisation suggesting that this is not much of an issue. I am not a
researcher, but this answer [0] points to that research.

I would also say that there are multiple ways to escape local optima (setting
a larger learning rate, multiple random initialisations, ensembling).

[https://www.quora.com/How-come-neural-networks-dont-get-stuck-in-poor-local-optima-Why-are-there-so-many-high-quality-local-optima](https://www.quora.com/How-come-neural-networks-dont-get-stuck-in-poor-local-optima-Why-are-there-so-many-high-quality-local-optima)

~~~
Turing_Machine
You say "not much of an issue" but your link says that it's an "open question
that is _probably_ being worked on in the community".

Those aren't the same thing at all.

------
Jizzle
While perhaps not immediately pertinent to the problem, the emergence of
sexual reproduction is credited as a major factor in the explosion of
variation in multicellular organisms. Part of the advantage, ironically, is
that sexual reproduction is a rather large burden compared to evolutionary
strategies that came before. The fact that sexual reproduction is so
challenging perhaps ensures that less is left to chance by filtering out
individuals that cannot meet the burden and rewarding those that tend to be
better at achieving reproduction itself. Likely there is some relation between
improved reproductive success and some novel traits that helped to achieve it.
I don't have the knowledge to weigh in on AI analogs, but I could imagine
roughly a strategy that involves co-related burdens and goals to improve the
chances of choosing the right individuals.

~~~
mannigfaltig
I think, the main effect of sexual reproduction is that, much like GANs and
competitive self-play, it creates species-internal competition: Both sexes
need to impress, which makes _cheating_ an obvious strategy (makeup, steroids,
Shakespeare quotes, LISP etc., but many such examples can be found in the
animal world), and hence both sexes also need to be able to _detect_ cheating.
Some species are rather asymmetric in that regard. For example, in humans it
is mainly women who attract (they masquerade as fruits [makeup is likely a
cross-cultural phenomenon; and, well, breasts], tapping into male food
gathering circuitry); men compete in hierarchies trying to impress and women
select men from the top of the hierarchy. Complex dynamics emerging from this
likely lead to the immense growth of the human cortex.

Sexual reproduction basically outsources some of the selection effort to the
cognitive apparatus of the species itself, thereby introducing a massive
amount of additional selection signals (mainly by the much increased necessity
to model other minds, namely minds of the opposite sex). Many of these signals
promote traits that are useful for survival (mainly intelligence and health).

~~~
mfrye0
I was thinking the same thing. I think it would be interesting to explore this
area more and try to model it computationally.

Your point about physical features made me think of how physical
attractiveness plays into human development as well.

Research has shown that beauty in humans is strongly tied to physical
symmetry. So "novelty" in our case might be defined as someone who is really
ugly: the Elephant Man.

So in this case, beauty wouldn't really fit into the robot walking example,
as it's neither fitness (moving the foot forward) nor novelty. It's more a
different type of fitness that increases the odds of reproduction.

~~~
mannigfaltig
My guess would be that symmetry is a simple heuristic measure of physical
fitness. Visual attraction is basically a strong regularizer that restricts
the search space to phenotypes with particular traits. Asymmetry means that
the joints wear out more quickly and muscles might not coordinate optimally,
leading to less strength and a reduced ability to hunt and to fight predators.
AFAIK it is also a quite robust predictor of all kinds of diseases, because it
often means that growth signalling is out of tune throughout the system.
Visual selection basically performs environmental selection more immediately
and more effectively: an asymmetric person might still survive, but their
offspring have a lower overall chance of surviving, and the teaching signals
from that are much weaker.

~~~
mfrye0
Good points. Never thought of it that way.

------
wei_jok
For those new to the subject, here is a nice visual primer to Evolution
Strategies.

[http://blog.otoro.net/2017/10/29/visual-evolution-strategies/](http://blog.otoro.net/2017/10/29/visual-evolution-strategies/)

~~~
xoroshiro
Thanks for that. My undergrad was Industrial Engineering, but I always loved
the Operations Research stuff (which I'm hoping will lead me to more CS-y
stuff if I can get a job or a Masters or something). All that OR with LPs and
MIPs was a headache, but somehow stochastic programming, GAs and NNs were
scary and I never even tried to understand them. I always thought people
describing them as simple were probably geniuses or something. Maybe one day
I'll understand the second half of that link.

------
fnl
How do evolutionary hyper-parameter optimizations compare to Bayesian ones? My
impression always was that Bayesian optimization is more targeted, and
therefore more efficient and ultimately finds optimal parameters faster.
However, evolutionary algorithms are easier to parallelize, so maybe EAs
indeed have a place in a non-research-oriented, applied DL setting?

~~~
levesque
Covariance matrix adaptation is comparable on real-valued hyperparameters,
but outside of that you're stuck. More typical genetic algorithms waste a TON
of compute time, so while they might end up finding a good solution, the
computational budget is out of reach of normal companies.
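
For a flavor of how evolution strategies handle real-valued parameters, here is a sketch of the much simpler (1+1)-ES with the classic 1/5th success rule (CMA-ES additionally adapts a full covariance matrix rather than one global step size). The objective is a hypothetical stand-in for a validation loss:

```python
import random

random.seed(0)

def loss(params):
    # Hypothetical stand-in for a validation loss over two real-valued
    # hyperparameters (e.g. log learning rate and log regularization)
    x, y = params
    return (x - 1.0) ** 2 + (y + 2.0) ** 2

# (1+1)-ES: one parent, one mutated child per step; the step size sigma
# adapts so that roughly 1/5 of mutations succeed.
parent = [0.0, 0.0]
sigma = 1.0
for _ in range(500):
    child = [p + random.gauss(0, sigma) for p in parent]
    if loss(child) < loss(parent):
        parent = child
        sigma *= 1.22  # success: widen the search
    else:
        sigma *= 0.95  # failure: shrink it (factors balance near ~1/5 successes)

print(parent, loss(parent))  # converges near (1.0, -2.0)
```

Each candidate evaluation is independent, which is why this family of methods parallelizes so naturally.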

------
briga
I came up with a similar idea recently--it figures that every original idea I
ever have turns out to be not original and decades-old.

Perhaps the lack of success so far has been due to the fact that the faculties
of the brain we attribute to higher-order complex thought are themselves based
upon brain regions that are much older, evolutionarily speaking. So it's not
just one system you have to evolve--it's many hierarchical subsystems. And given
the amount of time it took before human-level intelligence arose in the animal
kingdom it's pretty clear that randomly evolving a human-level intelligence
isn't something that's going to happen very often.

It also seems to me that the only reason intelligence evolved in the first
place was because of the complex environment provided by our natural world. I
have a feeling that we're not going to see much in the way of intelligent
learning systems unless we can provide those systems with a sufficiently
complex environment to learn and evolve within.

~~~
XorNot
My personal hypothesis is that strong AI will evolve when Google decide to add
live feedback to google search suggestions.

At the point that system can unprompted change what it shows the user, it'll
be interactive enough that we will presumably just have to wait for adequate
computational backing for it to "wake up".

------
mfrye0
I found this article really intriguing as someone new to this field.

If I'm understanding this right, it seems like most approaches up to this
point are focused on evolving a single "unit" or brain / person.

You have the concept of "nature" that selects which units will advance to the
next generation and pass on "DNA": first based on fitness, and now, with the
new approach, on novelty.

This may seem a bit naive, but has anyone explored adding a social dimension?

For the robot walking example, you have nature choosing novelty and those that
made progress.

A basic social dimension could have units observing other units and sharing
information.

But then there's a variety of other dimensions - a unit blocking another unit
from walking (cheating), a bigger unit destroying a smaller one, maybe success
via misc factors like physical symmetry / popularity.

Idk. Just curious to learn more and where the research is at.

~~~
rdlecler1
This is mostly in the realm of Artificial Life. Check out Chris Adami’s work
from the late 90s as a starting point.

~~~
mfrye0
Awesome, thanks! I've been reading up a bit and this is exactly what I was
hoping to find.

------
deepnotderp
One thing I find interesting to note is that modern day policy gradient deep
reinforcement learning methods are incredibly similar to evolutionary
algorithms and DRL methods have been applied quite successfully to neural
architecture search.

~~~
tonmoy
I wonder why evolutionary algorithms never managed to be that successful.

~~~
uoaei
They're quite successful on complex multiobjective problems, but you need a
lot of compute power to cover the solution space sufficiently. Most problems
either 1) don't have such stringent requirements on multiobjective
optimization (i.e. the problem can be recast with a single objective function,
likely one with a gradient), or 2) more typical machine learning algorithms
are "good enough" and require less compute power. But for instance designing
complex geometries under numerous constraints is well-suited to evolutionary
algorithms. EAs also shine when the evaluation for fitness is computationally-
intensive, as you can defer the fitness evaluations until absolutely
necessary.
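
The multiobjective bookkeeping described above boils down to Pareto dominance: keeping solutions that no other solution beats on every objective. A minimal sketch (the design data is made up for illustration):

```python
def dominates(a, b):
    # a dominates b if it is no worse on every objective and strictly
    # better on at least one (all objectives minimized here)
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    # Keep only the nondominated points -- the core per-generation step in
    # multiobjective EAs such as NSGA-II
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Toy trade-off: (cost, weight) pairs for candidate designs
designs = [(1, 9), (2, 7), (3, 8), (4, 4), (5, 5), (6, 1)]
print(pareto_front(designs))  # [(1, 9), (2, 7), (4, 4), (6, 1)]
```

Note that the front retains a whole spread of trade-offs rather than a single winner, which is exactly what a single scalarized objective would throw away.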

~~~
yters
My understanding is that MOEAs are not a whole lot better than random search.

------
ponderingHplus
Nice article, lots of good tidbits and insights. The thoughts on indirect
encoding seem like an especially interesting area of research. I actually just
finished up a project that used a genetic algorithm to tune some of the
hyperparameters of an RNN [1]. I didn't evolve the architecture except which
cell to use (LSTM or GRU), but it wouldn't be much of an extension to the
proof of concept I coded. Thanks for sharing.

[1] [http://cole-maclean.github.io/blog/Evolving%20the%20StarCraftII%20Build%20Order%20Meta/](http://cole-maclean.github.io/blog/Evolving%20the%20StarCraftII%20Build%20Order%20Meta/)

------
grblovrflowerrr
I wonder how much damage (physical damage) comes into play with evolving
robust systems and creating interesting variations in nature. Using the
robotic gait example, in the real world organisms have a high enough chance of
losing limbs that having the neural circuitry to survive in the absence of one
or more limbs is advantageous. Maybe these simulations could increase their
ability to produce robust networks by modeling accidental damage (from tiny to
catastrophic) to random parts of networks and selecting those that survive in
spite of such damage.

~~~
robotresearcher
See the work of Josh Bongard and Hod Lipson, examining exactly this topic.

[http://science.sciencemag.org/content/314/5802/1118](http://science.sciencemag.org/content/314/5802/1118)

------
redcalx
Author of SharpNEAT here if anyone has any questions.

~~~
pinouchon
What would be your recommendation to learn about these topics ? Moocs, books,
having some pet project ?

~~~
redcalx
I don't know of any online teaching material; I think this is because it is a
relatively niche and under-explored area of research, which is one of the main
reasons I continue to maintain and develop SharpNEAT.

What has been useful for me is just setting up experiments, seeing what
happens, and observing the way things go wrong. Typically evolution will find
the flaws in any fitness score (metric) that is defined, often resulting in
solutions that give good scores, but only because evolution was 'cheating', or
rather, exploiting the flaws in the metric. This problem comes up a lot, and
it is educational to go through that process and try to think up more robust
tasks and ways of measuring success on those tasks. Ideally we want metrics
that are continuous, so that we can 'follow the gradient' of success rather
than e.g. having a fitness score jump from zero to a perfect score in one
step.
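
The difference between an all-or-nothing score and a continuous one can be sketched like this (a hypothetical goal-reaching task, not from SharpNEAT itself):

```python
import math

GOAL = (10.0, 0.0)

def binary_fitness(final_pos):
    # All-or-nothing: zero everywhere until the task is solved, so
    # evolution has no gradient of success to follow
    return 1.0 if math.dist(final_pos, GOAL) < 0.5 else 0.0

def shaped_fitness(final_pos):
    # Continuous: closer attempts score higher, so partial progress is
    # rewarded and selection can 'follow the gradient' of success
    return 1.0 / (1.0 + math.dist(final_pos, GOAL))

# Two failed attempts; only the shaped metric can tell them apart
near_miss, far_miss = (9.0, 1.0), (2.0, 3.0)
print(binary_fitness(near_miss), binary_fitness(far_miss))   # 0.0 0.0
print(shaped_fitness(near_miss) > shaped_fitness(far_miss))  # True
```

Of course, a shaped metric is also the kind evolution can exploit, which is why designing it robustly takes iteration.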

Also it's been educational seeing how networks evolve. One of the early
modifications I made to NEAT was to periodically disable additive mutations
(add nodes and connections), so as to strip away any redundant structure that
could be slowing down evaluation of each neural net. This came from just
seeing the networks grow rapidly, and not always with fitness gains to show
for it.

In summary, I don't have a good answer for you, but if you can get interested
in setting up an experiment and seeing how evolution tries to solve it all by
itself, then maybe you'll get the 'bug' and start to think about where the
limits to this approach are, and different ways of overcoming some of the
present limits. From there you might want to pick through the occasional
research paper to see what others are doing. I've found the NEAT-based
research to be quite accessible, i.e. not requiring the heavy maths knowledge
that deep learning and the like demand. Although I admit I do follow that
world, and I have had to learn quite a bit of maths to keep up; I think that's
useful for drawing ideas from that area.

Hope that is of some help anyway.

------
yters
Evolutionary algorithms are perennially intriguing, but other approaches are
generally favored. Odd, considering the incredible success of evolution with
extremely limited trials from a computational perspective. I wonder why
evolutionary computation does not live up to expectations?

~~~
cjalmeida
I'd guess limited computing resources. Nature had billions of years times
trillions of "experiment" instances to reach its amazing current state.

~~~
jamesrom
Not only that, but a virtually unbounded population space to test. Count every
single organism alive today.

~~~
yters
Our top supercomputers can reach tens of petaflops. That seems to easily
dwarf these biological time scales you mention. There appears to be some magic
in evolution that surpasses mere trial limitations.

~~~
jamesrom
I'd like to double down and say the top supercomputers today couldn't, in
reasonable time, simulate a single generation of the genetic variation in your
gut flora alone.

~~~
yters
Sure, but that means the magic does not lie in evolution, but in the extreme
amount of information in biological organisms. Evolution is supposed to be
able to generate incredibly complex organisms from very simple beginnings, all
within a number of trials that can be simulated on today's desktops. Insofar
as we displace the problem from evolution to the organism or environment, we
are saying there is nothing special about evolution, which raises the question
of what is so special about evolution in the first place.

------
jamez1
> At the time, its small group of practitioners thought it might be an
> alternative to the more conventional ANN training algorithm called
> backpropagation (a form of stochastic gradient descent).

The author doesn't seem to understand what backpropagation is: gradient
descent is performed using the gradients computed by backpropagation; they
are not forms of one another.

~~~
jhj
Yep, and SGD is not necessarily implied either (it could, for example, be
full-batch gradient descent).

------
yters
Isn't detecting unlimited novelty impossible due to Chaitin's incompleteness
theorem?

