
Playing Atari with Six Neurons - togelius
https://arxiv.org/abs/1806.01363
======
pjrule
As someone working on a reinforcement learning/neuroevolution problem right
now, I find this to be extremely exciting. Fewer parameters, _ceteris
paribus_, are always better. The fact that the experiments in this paper were
run on a single workstation, rather than on a massive farm of TPUs à la
AlphaGo, implies quicker development iteration and greater accessibility for
the average researcher.

The staging of components in this paper (compressor/controller), where
neuroevolution is only applied to a low-dimensional controller, reminds me of
Ha and Schmidhuber's recent paper on world models (which is briefly cited)
[1]. They employ a variational autoencoder with ~4.4M parameters, an RNN with
~1.7M parameters, and a final controller with just 1,088 parameters! Though
it's recently been shown that neuroevolution can scale to millions of
parameters [2], the technique of applying evolution to as few parameters as
possible and supplementing with either autoencoders or vector quantization
seems to be gaining traction. I hope to apply some of the ideas in this paper
to multiple co-evolving agents...

[1]. [https://worldmodels.github.io](https://worldmodels.github.io)

[2]. [https://arxiv.org/abs/1712.06567](https://arxiv.org/abs/1712.06567)
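
To make the staging concrete, here is a toy sketch of the split (purely
illustrative: the compressor below is just a stand-in for the VAE/VQ
component, and none of the names come from either paper):

```ruby
# Toy compressor/controller split (illustrative only). A fixed or
# separately trained compressor maps a high-dimensional observation to a
# short code; only the tiny controller is evolved.

def compress(observation, centroids)
  # e.g. vector quantization: describe the frame by its similarity to
  # each centroid, giving a code as long as the dictionary
  centroids.map { |c| c.zip(observation).sum { |ci, oi| ci * oi } }
end

def act(code, weights)
  # tiny linear controller: one weight vector per action, pick the argmax
  scores = weights.map { |w| w.zip(code).sum { |wi, ci| wi * ci } }
  scores.index(scores.max)
end

# Evolution only searches over `weights`: num_actions * code_size
# parameters, i.e. hundreds rather than millions.
```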

~~~
giuse
You may be interested in an even older paper:
[http://www.idsia.ch/~juergen/icdl2011cuccu.pdf](http://www.idsia.ch/~juergen/icdl2011cuccu.pdf)

~~~
pjrule
Thanks so much! I read this (and a few related papers) today. Besides the
novel algorithm discussed in the new Atari paper, do you have a reference
implementation of online vector quantization you might be able to recommend? I
think I could probably figure it out from the paper alone, but sometimes it's
nice to see code other people have already optimized. :)

~~~
giuse
Uhm, unfortunately I do not; I could search for some on Google, but I doubt I
would fare better than you at it. I ended up coding my own version, and it is
quite straightforward. You can find it here:
[https://github.com/giuse/machine_learning_workbench/blob/mas...](https://github.com/giuse/machine_learning_workbench/blob/master/lib/machine_learning_workbench/compressor/vector_quantization.rb)
Although it's polluted by research trial and error, you can easily pick out
the minimal code needed to run it. Here's an example of how to use it:
[https://github.com/giuse/machine_learning_workbench/blob/mas...](https://github.com/giuse/machine_learning_workbench/blob/master/examples/image_compression.rb)
Let me know if that works for you or if you have further questions!
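
For reference, the core of plain online VQ fits in a few lines. Here is a
minimal Ruby sketch (textbook online VQ with a fixed learning rate, not the
specific variant from the paper or from the workbench code above):

```ruby
# Minimal online vector quantization: keep a codebook of centroids and,
# for each incoming vector, pull the nearest centroid toward it.
class OnlineVQ
  def initialize(ncentroids:, dims:, lrate: 0.1)
    @lrate = lrate
    @centroids = Array.new(ncentroids) { Array.new(dims) { rand } }
  end

  # Index of the centroid closest to `vec` (squared Euclidean distance)
  def encode(vec)
    dists = @centroids.map { |c| c.zip(vec).sum { |ci, vi| (ci - vi)**2 } }
    dists.index(dists.min)
  end

  # One online training step: move the winning centroid toward `vec`
  def train_one(vec)
    idx = encode(vec)
    @centroids[idx] = @centroids[idx].zip(vec).map do |ci, vi|
      ci + @lrate * (vi - ci)
    end
    idx
  end
end

# Usage: stream vectors through the codebook as they arrive
vq = OnlineVQ.new(ncentroids: 6, dims: 4)
100.times { vq.train_one(Array.new(4) { rand }) }
```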

~~~
pjrule
That’s excellent! Thanks!

------
kthejoker2
Cool article, lots to digest, one thing caught my eye:

"To the best of our knowledge, the only prior work using unsupervised learning
as a pre-processor for neuroevolution is (cite)."

Just amazing how much low-hanging fruit there still is in the space.

~~~
giuse
Author here. The idea is low-hanging indeed; several friends (including
@togelius!) commented "I always wanted to do that -- eventually". Realizing it
is another matter. Have a look at the mess necessary to make it work: we had
to discard UL initialization in favor of online learning, accept that the
encoding would grow in size, adapt the network sensibly to these changes, and
tweak the ES to account for the extra weights.
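
To make the "growing encoding" point concrete, here is a toy illustration (my
sketch, not the code from the paper): every time the dictionary gains a
centroid, the controller's input gains a feature, so each action's weight
vector gains an entry, and the ES search distribution has to be extended to
cover the new weights as well.

```ruby
# Toy illustration: when the dictionary grows by one centroid, append a
# weight (per action) to the controller, initialized to zero so behavior
# is unchanged until evolution adjusts it.
def grow_controller(weights, init: 0.0)
  weights.map { |w| w + [init] }  # weights: one Array of input weights per action
end

weights = Array.new(3) { Array.new(5) { rand - 0.5 } }  # 3 actions, 5 features
weights = grow_controller(weights)                      # now 3 actions, 6 features
```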

------
markatkinson
I have been wolfing down RL articles, videos, and publications for some time
now, after an intro to deep learning via Manning's Deep Learning, and while
the overall concept of RL is easy to grasp (agents, actions, state, etc.),
some of the finer details and processes are quite confusing.

I am tempted to blame inconsistent terminology and implementations for this
lack of understanding, but I suspect it has more to do with approaching the
field through the lens of a developer rather than a researcher or academic:
trying to understand the code without completely grasping the "science" of
the mechanisms.

Either way, if you feel you're in a similar spot, check out this resource:
[https://reinforce.io](https://reinforce.io) and its GitHub repo:
[https://github.com/reinforceio/tensorforce](https://github.com/reinforceio/tensorforce).

Just reading through their code, and documentation has made a lot of the
concepts clearer.

And a few more resources I found really helpful:
[http://karpathy.github.io/2016/05/31/rl/](http://karpathy.github.io/2016/05/31/rl/)
[https://www.analyticsvidhya.com/blog/2017/01/introduction-to...](https://www.analyticsvidhya.com/blog/2017/01/introduction-to-reinforcement-learning-implementation/)
[https://www.oreilly.com/ideas/reinforcement-learning-with-te...](https://www.oreilly.com/ideas/reinforcement-learning-with-tensorflow)

Edit: The point I forgot to mention was that I always feel like I'm playing
catch-up half the time, since the amount of new content being released
exceeds what I can absorb.

------
kthejoker2
And the Github library:

[https://github.com/giuse/DNE/tree/nips2018](https://github.com/giuse/DNE/tree/nips2018)

~~~
giuse
Ruby should be perfectly legible if you have a Python background, but for any
questions just ping me on Twitter. I would be happy to start a dialog :)

~~~
mkrum
Just curious, why did you pick ruby over python? Personal familiarity?

~~~
vidarh
It's particularly interesting that they've chosen to wrap Python using Pycall.
I'd love to hear about the tradeoffs of that.

~~~
giuse
Sure! It's quite simple: it works like a charm. Completely transparent. You
`import` with Python-like syntax, and you get a Ruby object that forwards any
message (i.e. method call) to the corresponding Python object on the
underlying Python interpreter.

This means that Ruby does not need to know _anything_ about the Python object:
whatever you call on the Ruby object is just forwarded to the Python one, and
whatever result is passed back to Ruby.
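
For the curious, a minimal example of what that looks like (assuming the
`pycall` gem and numpy are installed):

```ruby
require 'pycall/import'
include PyCall::Import

pyimport :numpy, as: :np          # Python-like import

a = np.arange(12).reshape(3, 4)   # a Ruby object wrapping a numpy array
b = np.sin(a)                     # method calls are forwarded to Python
puts b.sum                        # results come back usable from Ruby
```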

As for the overhead, I sincerely do not know. I expected some, so my code
does part of the image pre-processing directly in Python (`narray`) in order
to pass a smaller object to Ruby, but besides that I could perceive none --
grain of salt advised, as any overhead was possibly hidden from me because my
computation in Ruby was orders of magnitude more complex and time-consuming
than what was going on on the Python interpreter.

Definitely ping Murata-san, either on GitHub
[https://github.com/mrkn/pycall.rb/](https://github.com/mrkn/pycall.rb/) or
on Twitter `@mrkn`; I will send him a link to this thread so he can
contribute if he feels like it. Personally, I am a fan of his work and
elegant approach; I owe him for enabling me to keep working in Ruby while
everybody else publishes code in Python :)

~~~
vidarh
Sounds great - I also prefer Ruby very strongly, and have tended to avoid
Python code because I didn't expect to have an easy way of wrapping it, but I
will definitely have to play with PyCall.

------
kabdib
... and that's three more than the average Atari marketing exec had back then.
No wonder they had trouble understanding the game industry :-)

~~~
comboy
OK, you're so grayed out but your bio says you've been programming since '79
and you've written games for Atari. So perhaps all we need is some
elaboration? They seem like a successful company, don't they?

~~~
taneq
Story time maybe? Atari were successful for a while but they crashed pretty
hard.

~~~
wmblaettler
A story from the GP:
[http://www.dadhacker.com/blog/?p=987](http://www.dadhacker.com/blog/?p=987)

~~~
taneq
Ooh, thanks - I read a bit of the blog but didn't find this one!

------
coldseattle
I can post on hacker news with only 4.

~~~
a_t48
Yeah, that's pretty apparent. :)

