Hacker News new | comments | ask | show | jobs | submit login
Genetic algorithms for training deep neural networks (2017) (uber.com)
101 points by 1gor 4 months ago | hide | past | web | favorite | 20 comments

The article is from last year but it's still extremely valuable and interesting.

Exploring this topic is currently my primary hobby. Specifically, I've been using OpenAI's retro (Sonic, Contra, Mario, Donkey Kong and, more recently FZero) and comparing the ancient NEAT with more fashionable stuff like DQN, PPO, A3C and DDPG.

With my extremely limited experience, NEAT seems to outperform all of these other algorithms. I believe the advantage is the potential for strange/novel network structure.

And the best part is that NEAT doesn't require a powerful GPU.

Apologies for the shameless plug but here's a link to a series on youtube I made about using Retro and NEAT together to play Sonic. https://www.youtube.com/watch?v=pClGmU1JEsM&list=PLTWFMbPFsv...

You are evolving the topology, but using regular gradient descent/backprop for any given network, correct?

No, in NEAT both the weights and topology are evolved. It is totally gradient-free.

Yeah, topology and weights. It's highly subject to initial conditions. You almost need another NEAT network to evolve the initial conditions. I believe it's turtles all the way down.

This is a fun book (published in 2001) about how a professor and his graduate assistant developed a world-class checkers-playing algorithm using neuroevolution:



(Edit) One of the funniest parts I remember is that they had to leave it running on a Pentium III for like a month or something.

For those interested, i built a python app on top of tensorflow/keras to do neural networks architecture & hyper parameters search with genetic algorithms.


Note that here the weights are evolved, not just the parameters. Gradient descent is not used at all AFAIK.

I lost it at "emerging revolution".

My PhD thesis in 2000 already used genetic algorithms for seeds and it was hardly new then.

Your missing the point I think. GAs aren't new, same as neural networks which were thought up half a century ago. It's the finding that GAs work at such a scale, with 100 of layers and millions of nodes and even outperform SGD, that is of interest here.

Genetic algorithms are (and were already back in 2000) a pretty decent and -more importantly- generic solution to the problem of global optimization (as opposed to local optimization) when the problem to optimize has some sort of (maybe not-so-smooth) structure.

Many of the recent "AI" development very often boil down to finding a local extremum using some sort of ski down the slope optimization program (aka "training"). These techniques very rarely tackles global optimization, or when they do, bundle it up under the moniker "hyper-parameter tuning".

A good example of something that falls under "global optimization" and isn't often tackled in deep learning would be finding the correct deep net architecture for a given problem.

The problem doesn't lend itself very well to local optimization, but might yield to GA-type optimization.

Agreed. However, there is no reason to believe that a GA can give a better result than just randomly picking points in search space and doing local optimization around those points. GA's are evo-babble with much hand waving about 'correlation' during mutation or crossover rubbish. The 'curse of dimensionality' in global optimization doesn't go away by using GA's nor can you talk about their effectiveness without characterizing the landscape you are searching (the no free lunch theorem).

How are genetic algorithms guaranteed in any way to give you a global optimum? I thought it might just work better when your search space is non smooth.

They're not, as far as I know. In fact, that was one of the big selling points of reinforcement learning - that it tends to reach a better minimum than GA.

I had a genetically trained neural network driving a car simulation* in qbasick in 1997, and it wasn't "novel" it was quite common back then to combine them.

*a very limited one, imagine 2d, no inertia, 5 sensors at 15° each other each reporting distance from the nearest "grass", with the road 20px wide, drawn by hand gray on green and fitting the screen 13 space. network was 5x5 neuron, with weights all part of the gneetic code. car never made past the fourth corner, let alone did a lap.

I tried training small RNN models using GA around 1990. I had lunch with John Koza (the genetic programming pioneer) and he suggested that it was an interesting idea but would not scale. The Uber team, my controlling mutation, got it to scale - good for them. I did a few years later use this as an example in my book "C++ Power Paradigms", McGraw-Hill 1994 (Genetic Algorithms, Neural Networks, and Constraint Programming).

Too many ideas, too little time! I am already thinking for a while about how Deep Learning and Genetic Algorithms could benefit each other.

GAs allow optimization of parameters without a differentiable loss function - a major problem with evaluating behavior of a neural model, for example.

But also GAs could benefit from ML/DL. Predicting loss functions from a chromosome representation (to save computing time), learning to select promising pairs and even learning cross-over operators.

This is the not-so-secret sauce when training neural nets and backtesting for algorithmic trading. It dramatically reduces the time taken.

Sounds interesting, are you able to go into more detail?

step 1: hook it up to a car

Applications are open for YC Summer 2019

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact