
Multi-Task Learning in Atari Video Games with Emergent Tangled Program Graphs - sengork
https://dl.acm.org/citation.cfm?id=3071303
======
bomdo
I was a little surprised at the headline, since I expected 'outperforms' to
mean that it had better end results, which is of course not the case. GP is
just much faster due to its relative simplicity, and the results are close
enough to those achieved with NNs and deep learning.

> Finally, while generally matching the skill level of controllers from neuro-
> evolution/deep learning, the genetic programming solutions evolved here are
> several orders of magnitude simpler, resulting in real-time operation at a
> fraction of the cost.

> Moreover, TPG solutions are particularly elegant, thus supporting real-time
> operation without specialized hardware

This is the key takeaway, and yet another reminder not to make deep learning
the hammer for all your fuzzy problems.

~~~
symmetricsaurus
> I was a little surprised at the headline, since I expected 'outperforms' to
> mean that it had better end results, which is of course not the case. GP is
> just much faster due to its relative simplicity, and the results are close
> enough to those achieved with NNs and deep learning.

From figure 3 in the paper it seems like it outperforms DQN on all games but
one. So, it has better end results as well.

Edit: There are other results linked in this thread that are better than the
2015 DQN results that the paper refers to.

------
smdz
One of the huge benefits of GPs over NNs is the ease of reverse engineering a
GP tree compared to NN models. It's not effortless, however; it's just not
mathematically complex like NNs, i.e. a programmer who isn't a mathematician
can analyze GPs with a lot of patience.
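As a sketch of why that is (a toy example of my own, not the paper's instruction set): a GP individual is literally an expression tree, so the whole model can be printed and stepped through by hand, unlike a weight matrix:

```python
# A toy GP individual as a nested tuple (an expression tree); the variable
# names and operators here are illustrative, not from the paper.
tree = ("if_gt", ("sub", "ball_x", "paddle_x"), 0, 1, -1)

def evaluate(node, env):
    """Recursively evaluate an expression tree against named inputs."""
    if isinstance(node, str):        # terminal: named input variable
        return env[node]
    if not isinstance(node, tuple):  # terminal: numeric constant
        return node
    op, *args = node
    if op == "sub":
        return evaluate(args[0], env) - evaluate(args[1], env)
    if op == "if_gt":                # (if_gt a b then else)
        if evaluate(args[0], env) > evaluate(args[1], env):
            return evaluate(args[2], env)
        return evaluate(args[3], env)
    raise ValueError(f"unknown op: {op}")

# Reads directly as: "move right (1) when the ball is right of the paddle,
# otherwise move left (-1)".
evaluate(tree, {"ball_x": 80, "paddle_x": 40})  # → 1
```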

EDIT: I have found GPs to be relatively slow to very slow, but very likely
that is because of the lack of interest and development compared to NNs.

~~~
kensoh
Would some form of computer-aided tool be able to help the programmer analyze
GPs? It sounds like there's some repetitive process when you say 'GPs with a
lot of patience'. I was thinking along the lines that if such a tool is
possible, then it might be possible to let the development and evolution of
GPs iterate faster than deep learning. I found this thread really interesting
because, given the small size, such implementations can easily be done locally
without an expensive and huge hardware setup. I'm definitely amazed at what DL
has achieved so far, but it looks to me like a brute-force way to solve a
problem: gathering huge amounts of data and crunching on huge amounts of
hardware to solve a very specific classification problem. Not saying that is
not good, just saying it is hard for mass participation at a high
production-quality level due to lack of hardware and such.

------
nivwusquorum
Those are really old results. They should compare to this one:
[https://arxiv.org/pdf/1511.06581.pdf](https://arxiv.org/pdf/1511.06581.pdf)

~~~
csl
How can the results be old when the paper is from 2017?

~~~
aqsalose
They are comparing their genetic programming results with a deep learning
paper published in 2015. [1]

[1]
[https://www.nature.com/nature/journal/v518/n7540/abs/nature1...](https://www.nature.com/nature/journal/v518/n7540/abs/nature14236.html)

~~~
phoe-krk
What a time to live in, when papers from 2015 are "really old" in 2017.

~~~
posterboy
What is "exponential growth"!

~~~
sk0g
It's more logarithmic, isn't it? Right now we're at the beginning stage, where
there are massive discoveries and changes happening, but in, say, 50 years'
time, there won't be much changing year to year.

~~~
jcranberry
Logarithmic or logistic?

------
partycoder
The convenient thing about Atari games is that there is usually a numerical
score that can be used as input for the fitness function.
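That convenience can be made concrete with a minimal sketch (function names here are mine, not from the paper): the emulator's episode score serves directly as the fitness, so evaluating an individual is just "play one episode and return the score":

```python
import random

# Hypothetical sketch: fitness is just the raw game score, no reward shaping.
def fitness(individual, play_episode):
    return play_episode(individual)  # e.g. the Atari episode score

def step(population, play_episode, mutate):
    """One generation: rank by score, keep the top half, mutate to refill."""
    scored = sorted(population,
                    key=lambda ind: fitness(ind, play_episode),
                    reverse=True)
    survivors = scored[: len(scored) // 2]  # truncation selection
    children = [mutate(random.choice(survivors)) for _ in survivors]
    return survivors + children
```

Any problem without such a built-in scalar signal needs a hand-designed fitness function, which is where GP setups get harder.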

------
cshenton
This is super cool, but it doesn't outperform deep learning based RL methods.

In fact, I'm not sure how much more compute-efficient it would be than
something like A3C, which can produce 4x the score of DQN in a comparable
number of hours (and on a CPU).

~~~
habitue
A3C is only ever run on one game at a time[0]. This paper gets good
performance on all games with the same agent

[0] read as: I have only seen papers with 1 agent per game for A3C

~~~
cshenton
So it can train on one game and play without training on a previously unseen
(but also atari) game? That's pretty neat, DQN and A3C certainly can't do
that.

~~~
habitue
No, in this case it is trained on all the games, but retains good scores on
all of them. If you train basic A3C on all the games, you'll get poor
performance on all of them due to catastrophic forgetting.

------
gourou
Genetic Programming seems lightweight, what are some cool applications they
have?

~~~
hchasestevens
There are actually lots of very exciting GP applications! One of my favorites
is "Fixing 55 out of 105 bugs for $8 each", in which GP is used to
automatically repair code:
[https://www.cs.virginia.edu/~weimer/p/weimer-icse2012-genprog-preprint.pdf](https://www.cs.virginia.edu/~weimer/p/weimer-icse2012-genprog-preprint.pdf)
. They've also achieved better-than-human-level results in antenna design for
NASA (
[https://ti.arc.nasa.gov/m/pub-archive/1244h/1244%20(Hornby).pdf](https://ti.arc.nasa.gov/m/pub-archive/1244h/1244%20\(Hornby\).pdf)
) and in discovering novel quantum computing algorithms (
[http://faculty.hampshire.edu/lspector/pubs/GP-quantum-GP98-with-cite.pdf](http://faculty.hampshire.edu/lspector/pubs/GP-quantum-GP98-with-cite.pdf)
).

I'll also shamelessly hock here my GP framework for Python, in case you're
interested in experimenting:
[https://github.com/hchasestevens/monkeys](https://github.com/hchasestevens/monkeys)

------
gourou
What's a good starting point for someone interested in building game AI?

~~~
tdb7893
Current game AI is vastly different from this, I think. There is a good
writeup on the AI from F.E.A.R. that might be a decent read.

~~~
uoaei
[http://alumni.media.mit.edu/~jorkin/goap.html](http://alumni.media.mit.edu/~jorkin/goap.html)

------
nocoder
This sounds interesting. I would like someone from the field of genetic
programming to explain how this works and how it differs from current DL
approaches.

~~~
henning
Kelly's approach involves evolving teams of programs.

His basic strategy is to have a scalable problem decomposition strategy.

So programs that process pixels and the teaming of those programs are grouped
together. The groupings (teams) themselves are co-evolved with the programs,
simultaneously.

This enables niching and specialization behavior.

This builds on earlier work on 'symbiotic bid-based genetic programming' from
other people at Dalhousie, the same university Kelly is at.

The innovation of this paper is that teams can reference other teams.

This allows for the creation of hierarchical teams. (There are rules to
prevent cycles and other edge cases.)

Everyone commenting here is probably going to just look at the numerical game
scores and ignore the runtime performance of Kelly's tangled program graphs.
They are 1000 times smaller than a deep neural network, which matters for
things like running on mobile/embedded devices.
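To make the structure above concrete, here's a rough sketch of the tangled-program-graph idea (class names and details are my own invention, not the paper's API): every program bids on the current state, and the winning program's action is either an atomic action or a reference to another team, which is what makes the hierarchy possible:

```python
# Hypothetical sketch of a tangled program graph. Each Program bids on the
# state; the highest bidder's action is either an int (atomic action) or a
# pointer to another Team, giving hierarchical teams-of-teams.
class Program:
    def __init__(self, bid_fn, action):
        self.bid_fn = bid_fn  # maps state -> scalar bid
        self.action = action  # int, or a Team to delegate to

class Team:
    def __init__(self, programs):
        self.programs = programs

    def act(self, state, visited=None):
        visited = visited if visited is not None else set()
        visited.add(id(self))  # guard against cycles between teams
        winner = max(self.programs, key=lambda p: p.bid_fn(state))
        if isinstance(winner.action, Team) and id(winner.action) not in visited:
            return winner.action.act(state, visited)  # descend the hierarchy
        # fall back to a default atomic action if a cycle was blocked
        return winner.action if isinstance(winner.action, int) else 0

# A parent team whose winning program defers to a sub-team:
leaf = Team([Program(lambda s: s, 2), Program(lambda s: -s, 3)])
parent = Team([Program(lambda s: 10.0, leaf)])
parent.act(1)  # → 2 (the sub-team's highest bidder chooses action 2)
```

In the paper both the programs and the team memberships are evolved, so this whole graph structure emerges rather than being designed by hand.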

~~~
Normal_gaussian
> That matters for things like running on mobile/embedded devices.

Ding ding ding. This is where the money is at, good yet cheap sensors that
sense human level actions are needed for IoT to be impactful.

------
jerianasmith
I like GP, but the problem is the ASTs: these can get huge. And the only
advantage is ease of reverse engineering.

------
99mistakes
Slightly relevant, here's a state-of-the-art drone AI built using genetic
fuzzy systems:
[https://www.forbes.com/sites/jvchamary/2016/06/28/ai-drone/#50908d8b7081](https://www.forbes.com/sites/jvchamary/2016/06/28/ai-drone/#50908d8b7081)

