Finding Mona Lisa in the Game of Life (avinayak.github.io)
408 points by mseri 42 days ago | 57 comments



Kaggle recently hosted a very relevant competition - "Conway's Reverse Game of Life" [0]. A write-up of the 1st-place solution might be an interesting read [1].

[0] https://www.kaggle.com/c/conways-reverse-game-of-life-2020/o...

[1] https://www.kaggle.com/c/conways-reverse-game-of-life-2020/d...


Very nice! Some states have multiple ancestors, some have none, so the latter are "dead ends" when going backwards. But when you are just looking for an approximation of the final state, as in TFA, you can probably tolerate quite a bit of perturbation, which could make it significantly easier to go backwards naively, sometimes missing a few cells in the process.


> Some states have multiple ancestors, some have none

Yeah, this is what I didn't understand about that particular Kaggle competition. It doesn't seem possible that you could learn some general set of rules that would allow you to predict previous states with much accuracy.


Yeah, I think it's very interesting because there's a lot of freedom in measuring the similarity and also other aspects of the process, for example optimizing for a "simple" starting configuration as mentioned in other comments.


I read the write-up where someone applied a genetic algorithm to this problem, and I guess what I found disappointing was that it wasn't a general solution that could learn any general cases or rules; it was very specific to each example provided. It's not like you could train it on a set of examples and then give it a new example and it would be able to do it any quicker or more accurately.


Well, that's sort of the case for genetic algorithms.

You might be aware of genetic programming? There may be papers already out there, but the idea is representing a program's source as a genome and then performing the familiar processes (mutation, crossover, selection) on those genomes.

Anyhow, that would produce something more general. A genetic algorithm is traditionally a single-shot thing.
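
For what it's worth, here is a minimal, hypothetical sketch (my own, not from any write-up) of the per-instance idea being discussed: a stripped-down GA (selection and mutation only) evolving a population of candidate boards so that, after one GoL step, they resemble a fixed target. Nothing learned carries over to a new target, which is exactly the limitation described above.

    import numpy as np

    def life_step(board):
        # count neighbours by summing the 8 shifted copies of the board
        n = sum(np.roll(np.roll(board, dy, 0), dx, 1)
                for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0))
        return ((n == 3) | ((board == 1) & (n == 2))).astype(np.uint8)

    def fitness(board, target):
        return -(life_step(board) != target).sum()   # fewer mismatched cells is better

    def evolve(target, pop_size=50, generations=200, seed=0):
        rng = np.random.default_rng(seed)
        pop = rng.integers(0, 2, (pop_size, *target.shape), dtype=np.uint8)
        for _ in range(generations):
            scores = np.array([fitness(p, target) for p in pop])
            parents = pop[np.argsort(scores)[-pop_size // 2:]]                     # selection
            children = parents.copy()
            children ^= (rng.random(children.shape) < 0.01).astype(np.uint8)       # mutation
            pop = np.concatenate([parents, children])
        return max(pop, key=lambda p: fitness(p, target))

    target = np.zeros((16, 16), dtype=np.uint8)
    target[7:9, 6:10] = 1
    best_seed = evolve(target)    # a board whose next generation approximates `target`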


> You might be aware of genetic programming?

Actually been playing with Cartesian Genetic Programming (CGP) recently.

> Anyhow, that would produce something more general.

Not really, at least as far as I can tell with CGP. I was evolving a circuit to solve the 8-bit parity problem (kind of the 'Hello world' of CGP - it solves this problem surprisingly quickly). There wasn't any way to use that information to evolve a circuit to solve the 16-bit parity problem - you've gotta start again from scratch. There may be some proposed solutions to this kind of issue in CGP - I'm still reading papers.


Ah, I see. I guess I meant general in the parameterization sense, not necessarily in the abstraction sense. Sorry, that could've been more clear.

I've not dug into CGP yet, though I've read mentions of it. Would you perhaps have a good resource for learning? I'm a little weak on the math side.


> I'm a little weak on the math side.

In general, I think that tends to be one of the advantages of genetic algorithms/programming over neural nets/deep learning - the math is comparatively quite easy.

This is the paper by Julian Miller that introduced CGP: https://thelackthereof.org/docs/library/cs/gp/Miller,%20Juli...

I found this good survey of CGP (from 2018): https://link.springer.com/content/pdf/10.1007/s10710-019-093...


I did the same thing, but with gradient descent. You can create a soft version of the Game of Life that is differentiable. Here is my messy Colab notebook: [1]

[1] https://colab.research.google.com/drive/12CO3Y0JgCd3DVnQeNSB...

Edit: only 1 step though, not 4, as in the OP. I couldn't get my differentiable version to converge more than 1 step into the past.
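
Not the notebook's code, but a minimal sketch of what a "soft", differentiable GoL step can look like in PyTorch (the product-of-sigmoids rule and the sharpness value are my own choices, not necessarily what the notebook does):

    import torch
    import torch.nn.functional as F

    def soft_life_step(board, sharpness=10.0):
        # board: (H, W) tensor of values in [0, 1]
        kernel = torch.ones(1, 1, 3, 3)
        kernel[0, 0, 1, 1] = 0.0                     # neighbours only, not the cell itself
        n = F.conv2d(board[None, None], kernel, padding=1)[0, 0]
        # hard rule: survive on 2-3 neighbours, be born on exactly 3;
        # approximate that window condition with a product of two sigmoids
        lower = 2.5 - board                          # 1.5 for alive cells, 2.5 for dead cells
        return torch.sigmoid(sharpness * (n - lower)) * torch.sigmoid(sharpness * (3.5 - n))

    seed = torch.rand(64, 64, requires_grad=True)    # continuous "seed" to optimise
    target = (torch.rand(64, 64) > 0.5).float()
    loss = ((soft_life_step(seed) - target) ** 2).mean()
    loss.backward()                                  # gradients flow back into the seed

As the sharpness grows the step approaches the hard rule, which is presumably part of why chaining many such steps (going further into the past) kills the gradient signal.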


See also, a post from mid-2020 that does something similar with a "softened" Life: http://hardmath123.github.io/conways-gradient.html


That's a really nice write up. It's insane how similar our approaches are.

Could it be a case of [1] (but on a less grand scale) :P? I can list my sources of inspiration: [2] [3] [4].

I also tried training convolutional networks, using the soft life set-up, but failed to get them to converge.

[1] https://en.wikipedia.org/wiki/Multiple_discovery

[2] https://kevingal.com/blog/mona-lisa-gol.html

[3] https://arxiv.org/abs/1910.00935

[4] https://nicholasrui.com/2017/12/18/convolutions-and-the-game...


> I also tried training convolutional networks, using the soft life set-up, but failed to get them to converge.

Do you have any idea why that might be? It seems like convolution would be a natural for this problem.


I didn't work on it long enough to be able to draw any conclusions, but I can speculate.

I had the gradients going through the soft life approximation (i.e. it was part of the model), rather than simply training a normal cnn with life boards as the inputs and outputs. But I think the approximation may not have good enough gradient signals.


Note, skip to the bottom to see the resulting plots.

Here, gradient descent is used to try to predict random Game of Life games [1].

[1] https://colab.research.google.com/drive/1NKWRarxM-ar18x1ON71...


> Running ~1000 iterations for a 483px wide Mona Lisa on the google colab GPU runtime only takes around 40 seconds!. Compared to the CPU version which takes several hours to do the same for a smaller image

Isn't this excruciatingly slow? I remember my old 486 running Life full-screen at full resolution in real time (probably 25 fps at 800x600?). How has it come to this after 20 years? How can a 20-year-old CPU outperform a modern GPU by an order of magnitude? Is it all the fault of lots of useless intermediary language layers?


Regarding the speed of GoL itself:

The JAX implementation of GoL he used should be able to do 10 billion cells / second

http://www.bnikolic.co.uk/blog/python/jax/2020/04/19/game-of...

> Speed of execution

> Looking into speed of execution was not a primary goal, but in case people are interested: initial study suggests this code will do about 10 billion cells / second on the Google Colab GPU runtime.

For comparison, this simple numba-optimized GoL propagation function achieves roughly one billion cells per second on a CPU (i7-9700K):

    import numba
    import numpy as np

    @numba.jit
    def propagate_jit(arr, result):
        h, w = arr.shape

        for y in range(1, h-1):
            for x in range(1, w-1):
                # a cell keeps its current state unless the rules below change it
                result[y,x] = arr[y,x]
                # sum the 8 neighbours (the centre cell itself is excluded)
                num_neighbors = (
                    arr[y-1,x-1] + arr[y-1,x] + arr[y-1,x+1] +
                    arr[y,x-1]   +              arr[y,x+1] +
                    arr[y+1,x-1] + arr[y+1,x] + arr[y+1,x+1]
                )
                if num_neighbors == 3:
                    result[y,x] = 1
                elif num_neighbors < 2 or num_neighbors > 3:
                    result[y,x] = 0

    arr = np.random.randint(0,2,(700,480))
    arr2 = arr.copy()
    %timeit propagate_jit(arr, arr2)
    # 333 µs ± 2.64 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    700*480*1e6/333
    # 1009009009.009009
Btw: the non-numba-optimized implementation runs in 806 ms on my system, so the optimization gives a speedup of over 2400x.


Not entirely sure what @numba.jit is doing here as I haven't used numba - is it compiling to native CPU code? Another approach to speed up GoL would be to use a lookup table with 2^9 entries - it's kind of like having the sum pre-calculated (roughly like the sketch below). Using the lookup table approach with @numba.jit would probably be even faster yet.
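
A rough sketch of the lookup-table idea in plain Python (my own illustration): pack each 3x3 neighbourhood into a 9-bit index and read the next state from a precomputed 512-entry table.

    import numpy as np

    # precompute the next state for every possible 3x3 neighbourhood (bit 4 = centre cell)
    table = np.zeros(512, dtype=np.uint8)
    for idx in range(512):
        bits = [(idx >> b) & 1 for b in range(9)]
        centre, neighbours = bits[4], sum(bits) - bits[4]
        table[idx] = 1 if neighbours == 3 or (centre == 1 and neighbours == 2) else 0

    OFFSETS = [(-1,-1), (-1,0), (-1,1), (0,-1), (0,0), (0,1), (1,-1), (1,0), (1,1)]

    def propagate_lut(arr):
        out = np.zeros_like(arr)              # borders left dead for simplicity
        h, w = arr.shape
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                idx = 0
                for b, (dy, dx) in enumerate(OFFSETS):
                    idx |= int(arr[y + dy, x + dx]) << b
                out[y, x] = table[idx]
        return out

    arr = np.random.randint(0, 2, (700, 480))
    out = propagate_lut(arr)

Decorating propagate_lut with @numba.jit would be the obvious next step, but I haven't benchmarked it.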


That's not 1000 iterations of GoL. That's a thousand iterations of trying to find the best seed.


I stand corrected. The numbers were way too far off to make sense at all!


probably 99% of time shuffling data around, 1% of time calculating


The parenthetical "(Consider a loaf of bread, with each slice being dithered Mona Lisa)." is one of my favorites of any technical article


As a next step: how simple can you make the initial state? Can you have something that occupies much less space, but then grows into something that resembles the target image?


Isn't this the goal of an image compression tool? Can we have efficient lossless compression with GoL? That would work with any data. Would it be better than bzip2 or png?


The general problem with this approach is that the number of bits it takes to identify the correct decoded value exceeds the number of bits it takes to simply write down the correct encoded value.

People have been proposing similar algorithms for the last few decades. It's really become the "Free energy is being suppressed by the man!" of compression algorithms. One of the more popular is "just identify where your value is in pi!", which fails for that reason.

One problem is that in order to analyze such algorithms you really need to count your bits carefully. For instance in the pi case people sometimes think "well, I just need two numbers, the distance into pi and the length", but those "two numbers" may need arbitrarily unbounded numbers of bits to represent.
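
A toy illustration of that bit-counting argument, using a random bit stream as a stand-in for pi's digits (an assumption made only for the sake of the sketch):

    import random

    rng = random.Random(0)
    n = 16
    pattern = format(rng.getrandbits(n), f"0{n}b")                       # the data to "compress"
    stream = "".join(str(rng.randint(0, 1)) for _ in range(2 ** (n + 2)))
    offset = stream.find(pattern)
    # a random n-bit pattern typically first appears at an offset of order 2**n,
    # so writing the offset down takes about n bits - no saving over the pattern itself
    print(pattern, offset, offset.bit_length() if offset > 0 else offset)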

GoL has its own problem, which is that if you construct a "random" state that didn't start "naturally", the odds of it being a Garden of Eden state are essentially 1. Can you even call it an "image compression" tool when it can't represent an image of any pure color at all, let alone in a compressed format?


Exactly - I know firsthand that it's easy to believe in such admittedly creative ideas. But thinking it through was fun and cleared up my misunderstanding: in the end the information itself cannot be compressed, one can only reduce redundancies. And an arbitrarily chosen algorithm tends to do a really bad job of that on average.


Haha that's really creative though. Makes me wonder if there is anything useful "stored" in pi that could be represented with, say, 2 64-bit integers.


Further "compression" -- you only want the index of the first occurrence of a given string. Thus, you create a table of offsets for each string length, where index 0 is the first unique string of length n, index 1 is the second, etc. This allows you to elide the issue that indices can be arbitrarily large. Of course, the table is hard to compute for large n, and the resulting "compression" has indices at worst the size of the original data... so it's gonna be super efficient at compressing the bits of pi, and no worse than unity on average.


I've often thought of that, a compression thing that says "start with this small board state, run for so many generations, go sample this area of the board, and maybe apply this run-length-encoded binary patch" but computing that initial state is, uhhhhh, a lot. A huge amount. Too freaky. But maybe there's a cellular automaton that is easier to run in reverse to find such a thing? This becomes a question for the real computer scientists.


This is a fascinating idea. Using some kind of sufficiently chaotic (and deterministic) process shared among two people, one could simply reference a time and region of the state of that process that contains the data to be sent.


Imagine the "sufficiently chaotic process" is simply listing out all the possible data sequences in order, for example 0 1 10 11 100 etc.

Then, "referencing a region" of the state is basically the same as just giving the value. The nth value is just n itself.

So clearly that doesn't give you any advantage. Why would randomizing the data sequences make it any more efficient on average?

Although if you specifically choose the ordering of the sequences so that common sequences are easy to represent and uncommon sequences are hard to represent, then you basically have Huffman coding/arithmetic coding.


Because you can bias the process to be more likely to represent data relevant to your domain.

Note: I'm not in any way a compression expert, so while it may be obvious to you/others, it's a little interesting for me to think of compression in this way (sending parameters to a shared process, instead of sharing the raw data).


This is part of modern encryption. To be more specific: we have blocks and a deterministic, cryptographically secure RNG; we seed the RNG with a public key that is broadcast, encrypt blocks, and then combine them with the number produced by the RNG to create the next block of encrypted data.

You may be wondering: why do we even do this? The answer is simple - "the image is encrypted, but everyone can see the penguin". In less cryptic terms, encrypting blocks simply maps them from one space to another, but it is a reversible operation and, like any function, is deterministic and returns the exact same value every time. The consequence of this determinism is that identical blocks will always produce the same value, and thus the content is still somewhat visible and in some cases could leak information. By combining blocks with a PRNG, we hide the mapping by requiring that the recipient solve the first block before decrypting further data, and we don't end up leaking information.
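
A toy, insecure sketch of that chaining idea (XOR stands in for a real block cipher, and the seeded PRNG stands in for the broadcast value; this is an illustration, not real cryptography):

    import random

    def chain_encrypt(blocks, seed):
        rng = random.Random(seed)           # shared deterministic RNG
        prev = rng.getrandbits(32)          # first chaining value derived from the seed
        out = []
        for block in blocks:
            c = block ^ prev                # "encrypt" the block combined with the chain value
            out.append(c)
            prev = c                        # each block now depends on the previous ciphertext
        return out

    def chain_decrypt(cipher, seed):
        rng = random.Random(seed)
        prev = rng.getrandbits(32)
        out = []
        for c in cipher:
            out.append(c ^ prev)
            prev = c
        return out

    blocks = [0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF]     # identical plaintext blocks...
    print(chain_encrypt(blocks, 42))                  # ...encrypt to three different values
    assert chain_decrypt(chain_encrypt(blocks, 42), 42) == blocks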


This is the same as giving the starting digit in π, right? Not usable for compression, and encryption folks would file it under security-by-obscurity.


This is I think what cryptographic hashes are meant to achieve.


The approach described here is very lossy, and very inefficient since you have to search a huge number of permutations to find reasonable results.

It would be an interesting experiment to implement lossless compression, but it likely wouldn't achieve high compression ratios or be nearly as efficient as a purpose built algorithm.


This comment may be too meta but this post just made me appreciate Hacker News so much! Classic creative hacking guided by nothing but pure curiosity. Love it!


A similar post can be found here (2020, implemented with backsearch):

https://news.ycombinator.com/item?id=22552006

https://kevingal.com/blog/mona-lisa-gol.html


Gotta be used for some capture-the-flag or the like, where after running it a clue to the next step is revealed.

I also wonder how effective this would be as a compression algorithm (just for the fun of it). Are the black&white spots too noisily distributed to make it compressible, or does it compress better than the resulting image after running GoL?

Also interesting that so little data is needed for my brain to understand it's a picture of the David statue.


> I also wonder how effective this would be as a compression algorithm

My guess is it would be the opposite of compression in many cases, due to the number of required cells you need to store. Using weird, creative ways of encoding data tends to be bad for compression, and in the very best case you'd have something that competes with JPEG but is much slower.

It's an enticing idea, just like that concept of encoding arbitrary data by looking for the exact same bit pattern in decimal places of Pi and then storing that position. But reality is often disappointing, and it doesn't really work because more often than not you'll need more space to store the decimal position than the original information.


Similar post from mid-2020, using PyTorch instead of JAX, and using a "continuous" (gradient-based) hill-climbing: http://hardmath123.github.io/conways-gradient.html

And the HN discussion from the time: https://news.ycombinator.com/item?id=23095190


So my understanding is that the Game of Life is an undecidable system, so you basically can't write a solvable system of equations that will tell you whether your initial state will produce a Mona Lisa. Even after reading the article I don't really understand what he's doing to make this happen, especially after stating that most states are Garden of Eden states! Can anyone ELI5?


It's a decidable problem to evolve the Game of Life forward a fixed number of generations. The undecidable thing is to determine the eventual fate of a pattern, e.g. 'does it ever die out completely?'. Even then you can answer this question for large classes of patterns, just not all of them.


Moreover, it is just like any other code you write. It's undecidable whether a function you write will ever stop in the general case, but I'm pretty sure you are able to easily prove most of your functions are going to finish.


Okay, well, you can 'simulate' the GoL forward and see what it's going to be after a fixed number of generations. That's basically how you do anything with these undecidable systems. I'm not clear on how he got some state that looks like a Mona Lisa when most or almost all states that look like a Mona Lisa are Garden of Eden states, and then from there worked backwards (I'm not clear on whether/how you can go backwards in GoL).


For a similar challenge: https://en.wikipedia.org/wiki/Proof_game


My understanding is Conway's game of life can be done by simply convolving a 3x3 array of all ones (except a central 0) through every pixel, and then looking at the results of that convolution to see if the pixel is dead or alive, e.g. https://stackoverflow.com/questions/46196346/why-does-my-gam...
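
For what it's worth, a quick sketch of that convolution formulation (my own, not the code from the StackOverflow link):

    import numpy as np
    from scipy.signal import convolve2d

    def life_step_conv(board):
        kernel = np.array([[1, 1, 1],
                           [1, 0, 1],          # central 0: don't count the cell itself
                           [1, 1, 1]])
        neighbours = convolve2d(board, kernel, mode="same", boundary="fill")
        # birth on exactly 3 neighbours, survival on 2 or 3
        return ((neighbours == 3) | ((board == 1) & (neighbours == 2))).astype(np.uint8)

    board = np.random.randint(0, 2, (64, 64))
    board = life_step_conv(board)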

If that's the case, would a nice solution be to model it as a recurrent neural network, e.g. in PyTorch or JAX, with a single convolutional layer, with a single kernel which you don't backprop to, and then try to minimise the loss with respect to the target image by backpropagating to the input image?

It then becomes highly analogous to Google's deepdream, but instead of optimising the image for the output class, you optimise it to the output image.


That's an interesting approach. The problem is that the function applied after the convolution is binary, hence not differentiable. You could replace it with a soft variation; my (wild) guess is that it would still be difficult to get it to converge:

* If the transitions are too sharp it will behave like the stepwise, non-differentiable original

* If the transitions are too soft, the optimization will converge to some middle-state compromise that will not behave as desired when binarized

But that's just speculation. Also interesting to note that in the very relevant Kaggle competition [0], top solutions don't mention differentiable approaches (as far as I've seen, I admit I haven't looked too deeply).

[0] https://www.kaggle.com/c/conways-reverse-game-of-life-2020/d...


What you are suggesting is the idea behind StyleTransfer. Given two images, A, B, and a neural network F, you repeatedly modify B using ∂ L(F(A),F(B))/∂B, where L is the loss function, A contains the style and B is the input image.


It never fails to amaze me what an infinite number of monkeys with keyboards are able to achieve if you give them enough time :)


Did you consider something like the following to speed up the calculation? This method can go forward _really_ fast. Not sure if it can also go backward...

https://pzemtsov.github.io/2015/04/24/game-of-life-hash-tabl...


In fact I wanted to point to the following... Googling 'hash' sent me to the wrong page.

https://en.wikipedia.org/wiki/Hashlife


Some years ago I did a similar thing using a genetic algorithm. I was researching its use in generative art.

Here's a video of it in action: https://youtu.be/xgAigVfpIYc


I suspect slightly better (or faster) results would be achieved if, instead of comparing against the specific dithering pattern nominated by the author, they compared downscaled candidate patterns against the greyscale target image.


This could be useful with least-significant-bit steganography to embed the start state, with maybe a QR code as the end state, or something else able to withstand the noisy output.



A very interesting form of steganography.

Has anyone run GOL on "random" noise images to see if they result in any decipherable output? :)



