Yeah, this is what I didn't understand about that particular Kaggle competition. It doesn't seem possible that you could learn some general set of rules that would allow you to predict previous states with much accuracy.
You might be aware of genetic programming? There may be papers already out there, but the idea is representing a program's source as a genome, and then performing the familiar processes (mutation, crossover, selection) on those genomes.
Anyhow, that would produce something more general. A genetic algorithm is traditionally a single-shot thing.
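To make that loop concrete, here's a toy sketch of the mutate/crossover/select cycle (my own illustration, not from any paper) on bit-string genomes; real genetic programming would evolve program trees or graphs instead of bits:

import random

# Toy GA: a "genome" is a bit list; fitness counts ones, standing in
# for "how well the evolved program performs".
def fitness(g):
    return sum(g)

def mutate(g, rate=0.02):
    return [b ^ (random.random() < rate) for b in g]

def crossover(a, b):
    cut = random.randrange(len(a))
    return a[:cut] + b[cut:]

pop = [[random.randint(0, 1) for _ in range(64)] for _ in range(100)]
for gen in range(200):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:20]  # selection: keep the fittest fifth
    pop = parents + [mutate(crossover(*random.sample(parents, 2)))
                     for _ in range(80)]  # refill via crossover + mutation
print(max(fitness(g) for g in pop))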
I've actually been playing with Cartesian Genetic Programming (CGP) recently.
> Anyhow, that would produce something more general.
Not really, at least as far as I can tell with CGP. I was evolving a circuit to solve the 8-bit parity problem (kind of the 'Hello world' of CGP - it solves this problem surprisingly quickly). There wasn't any way to use that information to evolve a circuit to solve the 16-bit parity problem - you've gotta start again from scratch. There may be some proposed solutions to this kind of issue in CGP - I'm still reading papers.
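For anyone curious what "evolving a circuit" means here, this is a rough sketch of how a CGP-style genome can be evaluated (my simplification: real CGP genomes also encode a node grid and carry inactive nodes). It also shows why the 8-bit solution doesn't transfer: the connection genes are indices wired to a fixed input width:

from itertools import product

# Each node gene is (function id, input index a, input index b); indices
# refer to the primary inputs or to earlier node outputs.
FUNCS = {0: lambda a, b: a & b,        # AND
         1: lambda a, b: a | b,        # OR
         2: lambda a, b: a ^ b,        # XOR
         3: lambda a, b: 1 - (a & b)}  # NAND

def evaluate(genome, inputs):
    values = list(inputs)
    for f, a, b in genome:
        values.append(FUNCS[f](values[a], values[b]))
    return values[-1]  # convention: the last node is the circuit output

def parity_fitness(genome, n_bits=8):
    hits = 0
    for bits in product((0, 1), repeat=n_bits):
        if evaluate(genome, bits) == sum(bits) % 2:
            hits += 1
    return hits  # perfect score: 2 ** n_bits

xor2 = [(2, 0, 1)]  # a hand-written 2-bit parity "circuit"
print(parity_fitness(xor2, n_bits=2))  # 4: correct on all inputs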
I've not dug into CGP yet, though I've read mentions of it. Would you perhaps have a good resource for learning? I'm a little weak on the math side.
In general, I think that tends to be one of the advantages of genetic algorithms/programming over neural nets/deep learning - the math is comparatively quite easy.
This is the paper by Julian Miller that introduced CGP:
I found this good survey of CGP here (from 2018): https://link.springer.com/content/pdf/10.1007/s10710-019-093...
Edit: only 1 step though, not 4, as in the OP. I couldn't get my differentiable version to converge more than 1 step into the past.
Could it be a case of … (but on a less grand scale)? :P I can list my sources of inspiration: …
I also tried training convolutional networks, using the soft life set-up, but failed to get them to converge.
Do you have any idea why that might be? It seems like convolution would be a natural fit for this problem.
I had the gradients going through the soft life approximation (i.e. it was part of the model), rather than simply training a normal CNN with life boards as the inputs and outputs. But I think the approximation may not have good enough gradient signals.
Here, gradient descent is used to try to predict random Game of Life games.
Isn't this excruciatingly slow? I remember my old 486 ran Life full-screen at full resolution in real time (probably 25 fps at 800x600?). How has it come to this after 20 years? How can a 20-year-old CPU outperform a modern GPU by an order of magnitude? Is it all the fault of lots of useless intermediary language layers?
The JAX implementation of GoL he used should be able to do 10 billion cells / second:
> Speed of execution
> Looking into speed of execution was not a primary goal, but in case people are interested: initial study suggests this code will do about 10 billion cells / second on the Google Colab GPU runtime.
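I haven't seen the exact kernel he used, but a vectorized JAX step in that spirit might look something like this (my sketch, with wrap-around edges):

import jax
import jax.numpy as jnp

@jax.jit
def life_step(board):
    # Sum the 8 neighbours by shifting the whole board.
    neighbours = sum(jnp.roll(board, (dy, dx), axis=(0, 1))
                     for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                     if (dy, dx) != (0, 0))
    born = neighbours == 3
    survives = (board == 1) & (neighbours == 2)
    return (born | survives).astype(board.dtype)

key = jax.random.PRNGKey(0)
board = jax.random.bernoulli(key, 0.5, (1024, 1024)).astype(jnp.int32)
board = life_step(board)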
For comparison, this simple numba-optimized GoL propagation function achieves roughly one billion cells per second on a CPU (i7-9700K):
import numpy as np
from numba import njit

@njit
def propagate_jit(arr, result):
    h, w = arr.shape
    for y in range(1, h-1):
        for x in range(1, w-1):
            result[y, x] = arr[y, x]
            # Count the 8 neighbours (centre cell excluded).
            num_neighbors = (
                arr[y-1, x-1] + arr[y-1, x] + arr[y-1, x+1] +
                arr[y, x-1]                 + arr[y, x+1] +
                arr[y+1, x-1] + arr[y+1, x] + arr[y+1, x+1]
            )
            if num_neighbors == 3:
                result[y, x] = 1
            elif num_neighbors < 2 or num_neighbors > 3:
                result[y, x] = 0

arr = np.random.randint(0, 2, (700, 480))
arr2 = arr.copy()
%timeit propagate_jit(arr, arr2)
# 333 µs ± 2.64 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
People have been proposing similar algorithms for the last few decades. It's really become the "Free energy is being suppressed by the man!" of compression algorithms. One of the more popular is "just identify where your value is in pi!", which fails for that reason.
One problem is that in order to analyze such algorithms you really need to count your bits carefully. For instance, in the pi case people sometimes think "well, I just need two numbers, the distance into pi and the length", but those "two numbers" may need an unbounded number of bits to represent.
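The counting argument is easy to check empirically (toy sketch, using a random bit stream as a stand-in for pi's digits): the expected offset of an n-bit pattern is around 2^n, so writing the offset down already costs about n bits:

import random

def first_match_offset(pattern, rng):
    # Scan a random bit stream until the pattern appears; return the offset.
    window, offset = [], 0
    n = len(pattern)
    while True:
        window.append(rng.randint(0, 1))
        if len(window) > n:
            window.pop(0)
            offset += 1
        if window == pattern:
            return offset

rng = random.Random(0)
for n in (4, 8, 12, 16):
    pattern = [rng.randint(0, 1) for _ in range(n)]
    offset = first_match_offset(pattern, rng)
    print(n, offset, offset.bit_length())  # ~n bits just to store the offset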
GoL has its own problem which is that if you construct a "random" state that didn't start "naturally" the odds of it being a Garden of Eden state are essentially 1. Can you even call it an "image compression" tool when it can't represent an image of any pure color at all, let alone in a compressed format?
Then, "referencing a region" of the state is basically the same as just giving the value. The nth value is just n itself.
So clearly that doesn't give you any advantage. Why would randomizing the data sequences make it any more efficient on average?
Although if you specifically choose the ordering of the sequences so that common sequences are easy to represent and uncommon sequences are hard to represent, then you basically have Huffman coding/arithmetic coding.
Note: I'm not in any way a compression expert, so while it may be obvious to you/others, it's a little interesting for me to think of compression in this way (sending parameters to a shared process, instead of sharing the raw data).
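To make the "common sequences get short representations" idea concrete, here's a minimal Huffman sketch (the standard textbook construction, nothing GoL-specific):

import heapq
from collections import Counter

def huffman_codes(data):
    # Heap entries are [weight, tiebreak, payload]; a payload is either a
    # symbol (leaf) or a pair of child entries (internal node).
    heap = [[w, i, sym] for i, (sym, w) in enumerate(Counter(data).items())]
    heapq.heapify(heap)
    i = len(heap)
    while len(heap) > 1:
        lo, hi = heapq.heappop(heap), heapq.heappop(heap)
        heapq.heappush(heap, [lo[0] + hi[0], i, (lo, hi)])
        i += 1
    codes = {}
    def walk(node, prefix=""):
        payload = node[2]
        if isinstance(payload, tuple):
            walk(payload[0], prefix + "0")
            walk(payload[1], prefix + "1")
        else:
            codes[payload] = prefix or "0"
    walk(heap[0])
    return codes

print(huffman_codes("aaaaabbbc"))  # frequent 'a' gets the shortest code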
You may be wondering why we even do this. The answer is simple: "the image is encrypted, but everyone can see the penguin". Less cryptically: encrypting blocks simply maps them from one space to another, but it is a reversible operation and, like any function, is deterministic and returns the exact same value every time. The consequence of this determinism is that identical blocks will always produce the same value, so the content is still somewhat visible and in some cases could leak information. By chaining blocks together along with a PRNG, we hide the mapping by requiring that the recipient solve the first block before decrypting further data, and we don't end up leaking information.
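The penguin effect is easy to demonstrate (my sketch using the pyca/cryptography package, not anything from the parent comment): under ECB, identical plaintext blocks encrypt to identical ciphertext blocks, while a chained mode like CBC hides the repetition:

import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(16)
plaintext = b"A" * 16 * 2  # two identical 16-byte blocks

# ECB: each block is encrypted independently, so the repetition leaks.
ecb = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
ct = ecb.update(plaintext) + ecb.finalize()
assert ct[:16] == ct[16:]

# CBC: each block is mixed with the previous ciphertext block first.
cbc = Cipher(algorithms.AES(key), modes.CBC(os.urandom(16))).encryptor()
ct = cbc.update(plaintext) + cbc.finalize()
assert ct[:16] != ct[16:]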
It would be an interesting experiment to implement lossless compression, but it likely wouldn't achieve high compression ratios or be nearly as efficient as a purpose-built algorithm.
I also wonder how effective this would be as a compression algorithm (just for the fun of it). Are the black-and-white spots too noisily distributed to be compressible, or do they compress better than the resulting image after running GoL?
Also interesting that so little data is needed for my brain to understand it's a picture of the David statue.
My guess is it would be the opposite of compression in many cases, due to the number of required cells you need to store. Using weird, creative ways of encoding data tends to be bad for compression, and in the very best case you'd have something that competes with JPEG but is much slower.
It's an enticing idea, just like that concept of encoding arbitrary data by looking for the exact same bit pattern in decimal places of Pi and then storing that position. But reality is often disappointing, and it doesn't really work because more often than not you'll need more space to store the decimal position than the original information.
And the HN discussion from the time: https://news.ycombinator.com/item?id=23095190
If that's the case, would a nice solution be to model it as a recurrent neural network (e.g. in PyTorch or JAX) with a single convolutional layer with a single kernel, which you don't backprop to, and then try to minimise the loss with reference to the target image by backpropagating to the input image?
It then becomes highly analogous to Google's DeepDream, but instead of optimising the image for the output class, you optimise it to match the output image.
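Something along those lines can be sketched in a few lines of PyTorch (my guess at a set-up, not the article's code): a fixed 3x3 kernel counts neighbours, a smooth rule stands in for the hard B3/S23 threshold, and only the input board receives gradients:

import torch
import torch.nn.functional as F

# Fixed neighbour-counting kernel; it is never trained.
kernel = torch.ones(1, 1, 3, 3)
kernel[0, 0, 1, 1] = 0.0  # exclude the centre cell from the count

def soft_life_step(x, k=10.0):
    # Smooth approximation of one B3/S23 step; k controls sharpness.
    n = F.conv2d(x, kernel, padding=1)
    three = torch.sigmoid(k * (n - 2.5)) * torch.sigmoid(k * (3.5 - n))  # n ~ 3
    two = torch.sigmoid(k * (n - 1.5)) * torch.sigmoid(k * (2.5 - n))    # n ~ 2
    return (three + x * two).clamp(0, 1)

target = (torch.rand(1, 1, 32, 32) > 0.5).float()  # board we want to reach
logits = torch.zeros(1, 1, 32, 32, requires_grad=True)
opt = torch.optim.Adam([logits], lr=0.1)
for _ in range(500):
    opt.zero_grad()
    pred = soft_life_step(torch.sigmoid(logits))   # keep cells in (0, 1)
    loss = F.mse_loss(pred, target)
    loss.backward()
    opt.step()
guess = (torch.sigmoid(logits) > 0.5).float()      # binarized predecessor

Whether the binarized guess actually steps to the target depends heavily on the sharpness k.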
* If the transitions are too sharp it will behave like the stepwise, non-differentiable original
* If the transitions are too soft, the optimization will converge to some middle-state compromise that will not behave as desired when binarized
But that's just speculation. Also interesting to note that in the very relevant Kaggle competition, top solutions don't mention differentiable approaches (as far as I've seen; I admit I haven't looked too deeply).
Here's a video of it in action: https://youtu.be/xgAigVfpIYc
Has anyone run GOL on "random" noise images to see if they result in any decipherable output? :)