
Genetic algorithms for training deep neural networks (2017) - 1gor
https://eng.uber.com/deep-neuroevolution/
======
wholemoley
The article is from last year but it's still extremely valuable and
interesting.

Exploring this topic is currently my primary hobby. Specifically, I've been
using OpenAI's Gym Retro (Sonic, Contra, Mario, Donkey Kong and, more
recently, F-Zero) and comparing the ancient NEAT with more fashionable stuff
like DQN, PPO, A3C and DDPG.

In my extremely limited experience, NEAT seems to outperform all of these
other algorithms. I believe the advantage is its potential for strange/novel
network structures.

And the best part is that NEAT doesn't require a powerful GPU.

Apologies for the shameless plug, but here's a link to a YouTube series I
made about using Retro and NEAT together to play Sonic.
[https://www.youtube.com/watch?v=pClGmU1JEsM&list=PLTWFMbPFsv...](https://www.youtube.com/watch?v=pClGmU1JEsM&list=PLTWFMbPFsvz3CeozHfeuJIXWAJMkPtAdS)
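
If you just want the shape of the glue code: each genome gets scored by
letting its network play an episode and using the in-game reward as fitness.
A minimal sketch with neat-python and gym-retro (the downsampling is purely
illustrative, and the NEAT config file has to declare matching input/output
counts, with 12 outputs for the Genesis pad):

    import neat          # neat-python
    import retro         # gym-retro
    import numpy as np

    def eval_genomes(genomes, config):
        env = retro.make(game='SonicTheHedgehog-Genesis',
                         state='GreenHillZone.Act1')
        for genome_id, genome in genomes:
            net = neat.nn.FeedForwardNetwork.create(genome, config)
            obs = env.reset()
            fitness, done = 0.0, False
            while not done:
                # crude downsample: grayscale + stride, then flatten
                small = obs[::8, ::8].mean(axis=2).flatten() / 255.0
                outputs = net.activate(small)
                # threshold each output into a button press
                action = [1 if o > 0.5 else 0 for o in outputs]
                obs, reward, done, info = env.step(action)
                fitness += reward
            genome.fitness = fitness
        env.close()

    config = neat.Config(neat.DefaultGenome, neat.DefaultReproduction,
                         neat.DefaultSpeciesSet, neat.DefaultStagnation,
                         'config-feedforward')
    winner = neat.Population(config).run(eval_genomes, 50)

In practice you'd parallelize the evaluations; this serial version is just to
show the plumbing.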

~~~
i_phish_cats
You are evolving the topology, but using regular gradient descent/backprop for
any given network, correct?

~~~
jawarner
No, in NEAT both the weights and topology are evolved. It is totally
gradient-free.
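
If it helps to see how that can work at all, here is the weight half of the
idea in a toy sketch: no gradients anywhere, just score, select, and mutate
(real NEAT also mutates topology and uses speciation, fitness sharing, etc.):

    import numpy as np

    def fitness(w):
        # any black-box score works; nothing needs to be differentiable
        return -np.sum((w - 3.0) ** 2)

    pop = [np.random.randn(10) for _ in range(50)]
    for gen in range(100):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:10]                       # truncation selection
        pop = [p + 0.1 * np.random.randn(10)     # Gaussian weight mutation
               for p in parents for _ in range(5)]

    best = max(pop, key=fitness)
    print(fitness(best))   # approaches 0 as the weights converge to 3.0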

------
jayro
This is a fun book (published in 2001) about how a professor and his graduate
assistant developed a world-class checkers-playing algorithm using
neuroevolution:

[https://en.wikipedia.org/wiki/Blondie24](https://en.wikipedia.org/wiki/Blondie24)

[https://www.amazon.com/Blondie24-Playing-Kaufmann-Artificial...](https://www.amazon.com/Blondie24-Playing-Kaufmann-Artificial-Intelligence/dp/1558607838)

(Edit) One of the funniest parts I remember is that they had to leave it
running on a Pentium III for like a month or something.

------
guybedo
For those interested, I built a Python app on top of TensorFlow/Keras to do
neural network architecture & hyperparameter search with genetic algorithms.

[https://github.com/guybedo/minos](https://github.com/guybedo/minos)
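
The gist of the encoding (a simplified sketch, not the actual minos
internals) is that a genome is a small spec that decodes into a compiled
Keras model, so the usual GA operators can work on architectures and
hyperparameters:

    import random
    from tensorflow import keras

    # hypothetical genome: hidden layer sizes + learning rate
    def random_genome():
        return {'layers': [random.choice([32, 64, 128, 256])
                           for _ in range(random.randint(1, 4))],
                'lr': 10 ** random.uniform(-4, -2)}

    def build_model(genome, n_classes):
        model = keras.Sequential()
        for units in genome['layers']:
            model.add(keras.layers.Dense(units, activation='relu'))
        model.add(keras.layers.Dense(n_classes, activation='softmax'))
        model.compile(optimizer=keras.optimizers.Adam(genome['lr']),
                      loss='categorical_crossentropy',
                      metrics=['accuracy'])
        return model   # input shapes are inferred on the first fit() call

Fitness is then something like validation accuracy after a short training
run, and crossover/mutation recombine the layer lists and learning rates.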

~~~
antirez
Note that in the article the weights themselves are evolved, not just the
hyperparameters. Gradient descent is not used at all, AFAIK.

------
BrandoElFollito
I lost it at "emerging revolution".

My PhD thesis in 2000 already used genetic algorithms for seeds and it was
hardly new then.

~~~
ur-whale
Genetic algorithms are (and were already back in 2000) a pretty decent and,
more importantly, generic solution to the problem of global optimization (as
opposed to local optimization) when the problem being optimized has some sort
of (maybe not-so-smooth) structure.

Many recent "AI" developments boil down to finding a local extremum using
some sort of ski-down-the-slope optimization procedure (aka "training").
These techniques very rarely tackle global optimization, or when they do,
bundle it up under the moniker "hyper-parameter tuning".

A good example of something that falls under "global optimization" and isn't
often tackled in deep learning would be finding the correct deep net
architecture for a given problem.

The problem doesn't lend itself very well to local optimization, but might
yield to GA-type optimization.
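
To make that concrete: take something like the Rastrigin function, which is
littered with local minima. A gradient step just slides into the nearest
basin, while even a naive GA keeps sampling across basins (a toy sketch,
nothing rigorous):

    import random
    import numpy as np

    def rastrigin(x):
        # highly multi-modal: gradient descent gets trapped in the
        # nearest of many local minima; global minimum is at x = 0
        return 10 * len(x) + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))

    pop = [np.random.uniform(-5.12, 5.12, 5) for _ in range(100)]
    for gen in range(200):
        pop.sort(key=rastrigin)                  # minimization
        elite = pop[:20]
        children = []
        for _ in range(80):
            a, b = random.sample(elite, 2)       # uniform crossover
            mask = np.random.rand(5) < 0.5
            children.append(np.where(mask, a, b) + 0.3 * np.random.randn(5))
        pop = elite + children

    print(rastrigin(min(pop, key=rastrigin)))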

~~~
foxes
How are genetic algorithms guaranteed in any way to give you a global
optimum? I thought they might just work better when your search space is
non-smooth.

~~~
siekmanj
They're not, as far as I know. In fact, that was one of the big selling
points of reinforcement learning - that it tends to reach a better optimum
than GAs do.

------
mark_l_watson
I tried training small RNN models using GAs around 1990. I had lunch with
John Koza (the genetic programming pioneer) and he suggested that it was an
interesting idea but would not scale. The Uber team, by controlling mutation,
got it to scale - good for them. A few years later I used this as an example
in my book "C++ Power Paradigms", McGraw-Hill 1994 (Genetic Algorithms,
Neural Networks, and Constraint Programming).

------
bayesian_horse
Too many ideas, too little time! I have been thinking for a while about how
Deep Learning and Genetic Algorithms could benefit each other.

GAs allow optimization of parameters without a differentiable loss function;
the lack of one is a major problem when evaluating the behavior of a neural
model, for example.

But GAs could also benefit from ML/DL: predicting fitness from a chromosome
representation (to save computing time), learning to select promising pairs,
and even learning crossover operators.
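
The surrogate idea in particular is cheap to prototype: fit a regressor from
chromosome to measured fitness, then only spend the expensive real
evaluations on candidates the model ranks highly. A sketch, assuming a
fixed-length real-valued chromosome and placeholder data:

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    # archive of (chromosome, measured fitness) pairs from past generations
    X_seen = np.random.rand(500, 20)             # placeholder data
    y_seen = np.random.rand(500)

    surrogate = RandomForestRegressor(n_estimators=100).fit(X_seen, y_seen)

    # screen a large batch of offspring cheaply...
    candidates = np.random.rand(1000, 20)
    predicted = surrogate.predict(candidates)

    # ...and run the expensive true evaluation (e.g. training a net)
    # only on the top fraction the surrogate likes
    top = candidates[np.argsort(predicted)[-50:]]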

------
DrBazza
This is the not-so-secret sauce when training neural nets and backtesting for
algorithmic trading. It dramatically reduces the time taken.

~~~
hacker_9
Sounds interesting, are you able to go into more detail?

------
vmchale
step 1: hook it up to a car

