
Learning Game of Life with a Convolutional Neural Network - DanielRapp
http://danielrapp.github.io/cnn-gol/
======
shaggyfrog
This is the perfect place to quote one of the hacker koans:

In the days when Sussman was a novice, Minsky once came to him as he sat
hacking at the PDP-6.

"What are you doing?", asked Minsky.

"I am training a randomly wired neural net to play Tic-Tac-Toe" Sussman
replied.

"Why is the net wired randomly?", asked Minsky.

"I do not want it to have any preconceptions of how to play", Sussman said.

Minsky then shut his eyes.

"Why do you close your eyes?", Sussman asked his teacher.

"So that the room will be empty."

At that moment, Sussman was enlightened.

~~~
claar
If you don't understand this (like me), see
[https://en.wikipedia.org/wiki/Hacker_koan#Uncarved_block](https://en.wikipedia.org/wiki/Hacker_koan#Uncarved_block)
for an explanation.

According to that link, it's based on a true story:

      So Sussman began working on a program. Not long after,
      this odd-looking bald guy came over. Sussman figured the
      guy was going to boot him out, but instead the man sat
      down, asking, "Hey, what are you doing?" Sussman talked
      over his program with the man, Marvin Minsky. At one point
      in the discussion, Sussman told Minsky that he was using a
      certain randomizing technique in his program because he
      didn't want the machine to have any preconceived notions.
    
      Minsky said, "Well, it has them, it's just that you don't
      know what they are." It was the most profound thing Gerry
      Sussman had ever heard. And Minsky continued, telling him
      that the world is built a certain way, and the most
      important thing we can do with the world is avoid
      randomness, and figure out ways by which things can be
      planned. Wisdom like this has its effect on
      seventeen-year-old freshmen, and from then on Sussman was
      hooked.

------
mark_l_watson
Neat example! I had not run across convnet.js before, so thanks. Karpathy was,
I believe, one of Hinton's students. I looked over my archived courses last
night and saved all of Hinton's videos and viewgraphs from his 2012 Coursera
class. Really good stuff, and it is a real shame that Coursera does not
continually re-run that course on auto-pilot. (I am sure that Hinton and his
students are too busy to participate in the forums, but the videos, auto-
generated homework, and quizzes would stand on their own without TA
intervention.)

Convolutional networks, deep learning, and machine learning in general are
arguably the most disruptive technologies in our lifetimes. BTW, pardon the
shameless plug, but I just started a new blog this morning to host ML
resources and my own experiments:
[http://blog.cognition.tech/?view=classic](http://blog.cognition.tech/?view=classic)

~~~
JabavuAdams
That course was fantastic. Work prevented me from doing the assignments, but
I've also archived the materials.

------
tlarkworthy
Cool, but conv nets are not the right tool here. Neural networks have an
implicit smoothness and locality prior (which is what backpropagation
exploits), and this game does not really possess those properties. Decision
trees (e.g. random forests) handle sharp discontinuities better and would be
more appropriate. I would expect less training and perfect results.

~~~
azakai
I agree decision trees are more natural here, but even they are overkill, I
think: all that needs to be learned is a function from the 8 neighbors to the
center, which means from 8 bits to 2 bits, or in other words, there are just
256 values to be learned.

Literally any reasonable learning algorithm, even nearest neighbor, will learn
that fairly quickly. The article had to use millions of samples, which is
extremely excessive - a more efficient algorithm should only require
thousands.

It might be interesting to see whether the neural network learns the
underlying rule: that placement does not matter, only the sum matters, and
that the crucial value is "3". That would be cool to see.
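The 512-case claim is easy to check directly. Here is a minimal Python sketch (my own, not from the article; the rule function is just Conway's standard rule written out) that enumerates every neighbourhood and shows how small the learning problem really is:

```python
from itertools import product

# Conway's rule, written out: a cell is alive next step iff it has
# exactly 3 live neighbours, or it is alive and has exactly 2.
def life_rule(center, neighbors):
    s = sum(neighbors)
    return int(s == 3 or (center == 1 and s == 2))

# Every possible case: 256 neighbour patterns x 2 center states = 512.
table = {}
for center in (0, 1):
    for neighbors in product((0, 1), repeat=8):
        table[(center, neighbors)] = life_rule(center, neighbors)

print(len(table))  # 512 -- the entire "dataset" a learner must cover

# The rule depends only on (center, neighbour sum), so the table
# collapses to 2 x 9 = 18 distinct cases.
distinct = {(c, sum(n)): v for (c, n), v in table.items()}
print(len(distinct))  # 18
```

Since the rule collapses to 18 (center, neighbour-sum) cases, a learner that discovers the sum structure needs far fewer samples than one memorizing all 512 patterns, which is exactly the generalization question here.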

~~~
mdpopescu
_... there are just 256 values to be learned_

This is a very good point, and it's the reason I don't find much value in
GAs. There is an inherent conflict between "an algorithm for a specific
problem", which can be optimized to run in a short time (the above problem
reduces to a lookup in a 256 x 2-bit matrix and thus runs in O(1)), and "an
algorithm for any problem (in a class)", which, while theoretically capable
of finding a solution, is prohibitive in time and/or space.
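As an illustration of that 256 x 2-bit lookup (my own sketch, not code from the article), the whole "learned" model fits in a 256-entry table, and each cell update is one index plus one shift:

```python
# Build the 256-entry lookup: index by the 8 neighbour bits, store
# 2 bits per entry -- next state when the center is dead (bit 0)
# and when it is alive (bit 1).
table = [0] * 256
for idx in range(256):
    s = bin(idx).count("1")               # live-neighbour count
    dead_next = 1 if s == 3 else 0        # birth rule
    alive_next = 1 if s in (2, 3) else 0  # survival rule
    table[idx] = dead_next | (alive_next << 1)

def step_cell(center, neighbor_bits):
    """O(1) next state: one table index plus one shift."""
    return (table[neighbor_bits] >> center) & 1

print(step_cell(0, 0b00000111))  # dead cell, 3 live neighbours -> 1
print(step_cell(1, 0b00000001))  # live cell, 1 live neighbour  -> 0
```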

As far as I can tell, GAs are barely any better than a random walk and thus
not useful for any non-trivial problems. I am not yet sure if NNs are in the
same category, though I suspect they are.

------
boothead
Convolution is a comonad. [http://kukuruku.co/hub/haskell/cellular-automata-using-comonads](http://kukuruku.co/hub/haskell/cellular-automata-using-comonads)

I love it when there's a nugget of rigor behind things like this that can be
pulled out and applied elsewhere!

------
hughes
I guess it works for the few specific cases mentioned, but the example
animation at the end diverges after twelve steps.[1]

[1] [http://imgur.com/a/gcUCH](http://imgur.com/a/gcUCH)

~~~
karpathy
It does seem like it wasn't trained optimally.

\- The conv layer has only 20 ReLU units; I'm not sure that suffices to
memorize the rules. The net might be "underfitting" the rule set, and
therefore making mistakes on rare pixel arrangements.

\- More worryingly, it's also strange that the author uses a fully connected
layer right after the conv layer (this is implicitly added in ConvNetJS when
you specify a loss layer). This means that the output neurons are a function
of the entire preceding CONV layer's activations everywhere, while the game of
life rules are local. The way to do this would instead be to use a 1x1 CONV
layer with 2 neurons on top of the first 3x3 CONV layer, and interpret it as
computing the class scores at every spatial position. However, in this case
you'd want to apply the loss at every spatial position, and this "fully-
convolutional" loss use case is not supported out of the box in ConvNetJS,
though it could be written.

\- However, with a fully-convolutional loss, the zero-padding of 1 used around
the borders might cause trouble. This is normally okay with images, but here
the neurons all share parameters spatially (and hence compute the same
function) and don't "know" whether they are at the border or in the middle of
the image. I'm not sure how this game of life handles boundary conditions, but
if borders obey different dynamics then you'd want to distinguish the border
pixels with a special "border" feature vector at the input, e.g. each pixel is
a 3-vector, with a 1-hot encoding for (border, positive pixel, negative
pixel).

\- And it's also strange that the author uses a "regression" loss for what is
a binary classification problem.

So, a nice attempt, but several funny choices, and it's clear why it didn't
fully work :)

~~~
JabavuAdams
Comments like this, and Hinton's Coursera course, remind me that there's a
whole "black art" to training these systems.

Do you think a similar approach could be used to learn to generalize Navier-
Stokes by looking at fluid flows?

~~~
dave_sullivan
It's not any more of a black art than programming is. There's no magic to it;
you just have to know what you're doing. You should also use some kind of
scheme (like Bayesian optimization) to optimize the hyperparameters across
many experimental models. That's the best way to make sure you're getting the
best results (though it takes a while, so you do have to be selective about
_which_ parameters you optimize or which combinations you allow).

And I'll also acknowledge that "programming" is a vastly more mature field
than "deep learning" or even "machine learning". So it's fair to argue that
there's much we don't know, but there's more and more we do.

