If you want more performance, falling sand simulators can be parallelized further by implementing them with Margolus neighbourhoods, as in Falling Turnip: https://github.com/tranma/falling-turnip
The idea is that a single iteration divides the world into 2x2 squares and then applies effects sequentially within each square, but not between the squares. This means each square can be processed independently. In the next iteration, the division into squares shifts right and down by one cell in each direction. This does mean you need more steps than in a sequential implementation, but I found it to be quite a principled approach to parallelizing cellular automata when I first read about it. One interesting side effect of this design is that falling particles end up being separated by blank space, as shown here: https://futhark-lang.org/static/falling-sand-2016.12.04.webm I wonder if that is fixable. (Although at larger scale it's not particularly visible: https://sigkill.dk/junk/diving-beet.webm)
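To make that concrete, a minimal JavaScript sketch of one such pass might look like this (illustrative only; the helper names and the 0 = empty / 1 = sand encoding are assumptions, and none of this is taken from Falling Turnip):

// One Margolus pass over a width*height grid stored row-major.
// `offset` alternates between 0 and 1 each step, shifting the 2x2
// block partition. Each block only touches its own four cells, so
// the two loops could be split across threads or a GPU.
function margolusStep(grid, width, height, offset, applyBlockRule) {
  for (let by = offset; by < height - 1; by += 2) {
    for (let bx = offset; bx < width - 1; bx += 2) {
      const i00 = by * width + bx;  // top-left cell of the block
      const i10 = i00 + 1;          // top-right
      const i01 = i00 + width;      // bottom-left
      const i11 = i01 + 1;          // bottom-right
      applyBlockRule(grid, i00, i10, i01, i11);
    }
  }
  // Edge cells left uncovered by a shifted pass would need wrapping
  // or padding; that is not handled here.
}

// Example block rule (0 = empty, 1 = sand): a grain falls straight
// down within its block if the cell below it is empty.
function sandBlockRule(grid, i00, i10, i01, i11) {
  if (grid[i00] === 1 && grid[i01] === 0) { grid[i01] = 1; grid[i00] = 0; }
  if (grid[i10] === 1 && grid[i11] === 0) { grid[i11] = 1; grid[i10] = 0; }
}

Diagonal sliding can be handled in the same rule by letting a grain move into the other column's empty bottom cell when its own is blocked.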
Another way to do this (albeit without simple support for fluids) is to use the Moore neighborhood and use a left and right "bias" to decide which direction a grain should fall towards if it can't fall straight down. This works pretty well as a shader, and has the same side effect with the horizontal lines.
I think the ultimate in performance and parallelism would be to run it on a GPU. As long as you are able to define your algorithm in such a way that the new value of each pixel can be calculated just from itself and its neighbours it will work very quickly.
For fun, I've written a Game of Life implementation in GLSL, and even though it is naively implemented it runs at thousands of iterations per second at 4K resolution on a six-year-old desktop GPU.
A GPU would more likely give performance that's independent of the pattern and has constant stepping time, but given the existence of Hashlife (https://en.wikipedia.org/wiki/Hashlife), I don't think it's a given that a GPU would provide the ultimate in performance.
If the 2x2 cell is moving horizontally and vertically by 1 block, then wouldn't that mean you miss the 2 corners you don't pass over for the cell you're currently calculating?
Unfortunately the demo video in the link you posted seems to be private, so I might be misunderstanding.
Yes, but the next iteration will take care of those interactions. You're not really "moving" anything; that was poor imagery on my part. The illustration on the Wikipedia article shows it well: https://en.wikipedia.org/wiki/Block_cellular_automaton
There are some diagonal interactions that are not directly expressible in this way, but they can propagate through the non-diagonal neighbours instead.
Do you know if this is a technique they're using in Noita? I've implemented falling sand simulators but I'm baffled how you can make a game like that performant.
Here's a talk from the creator of Noita about the underlying tech. From 9:20 he talks about their multithreading implementation. https://www.youtube.com/watch?v=prXuyMCgbTc
Typically a cellular automaton simulation will have some edge condition, like wrapping around or mirroring the adjacent cell.
A nice optimization trick is to make the cell buffers 2 cells wider and taller (or two times whatever the neighborhood radius is), and then before each generation you update the "gutter" by copying just the wrapped (or mirrored) pixels. Then you run the rule on the inset rectangle, and the code (in the inner loop) doesn't have to do bounds checking, and can assume there's a valid cell to read in all directions. That saves a hell of a lot of tests and branches in the inner loop.
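A minimal sketch of that trick, assuming a wrapping world and a radius-1 neighborhood (the names here are just for illustration):

// The buffer is (width + 2) x (height + 2); interior cells live at
// rows/columns 1..height and 1..width, and the outer ring is the gutter.
function fillGutter(cells, width, height) {
  const stride = width + 2;
  // Copy the opposite interior rows into the top and bottom gutters.
  for (let x = 1; x <= width; x++) {
    cells[x] = cells[height * stride + x];                 // top gutter <- bottom interior row
    cells[(height + 1) * stride + x] = cells[stride + x];  // bottom gutter <- top interior row
  }
  // Copy the opposite interior columns into the left and right gutters
  // (the corners come along for free via the rows filled above).
  for (let y = 0; y <= height + 1; y++) {
    cells[y * stride] = cells[y * stride + width];          // left gutter <- right interior column
    cells[y * stride + width + 1] = cells[y * stride + 1];  // right gutter <- left interior column
  }
}

After that, the generation loop only visits the interior cells and can read i - 1, i + 1, i - stride, i + stride and the diagonals without any bounds checks.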
Also, you can define the Margolus neighborhood in terms of the Moore neighborhood + horizontal phase (even/odd column x) + vertical phase (even/odd row y) + time phase (even/odd time). With that information the rule can tell if it's at an even or odd step, and which of the four squares of the grid it's in.
That's how the CAM6 worked in hardware: it used the x/y/time phases as additional bits to determine the Margolus neighborhood lookup table index.
Here's how my CAM6 emulator computes the CAM6-hardware-compatible Margolus lookup table index, based on the 9 Moore neighbors + phaseTime, phaseX, and phaseY:
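In rough outline, that computation looks something like this (a simplified sketch with made-up names, not the actual emulator code):

// Work out which corner of its 2x2 Margolus block a cell occupies,
// and which Moore neighbours complete the block, from the parity bits.
function margolusPosition(phaseX, phaseY, phaseTime) {
  // On odd time steps the block partition is shifted by one cell in
  // x and y, which flips both quadrant bits.
  const quadX = (phaseX + phaseTime) & 1;  // 0 = left column of the block, 1 = right
  const quadY = (phaseY + phaseTime) & 1;  // 0 = top row of the block, 1 = bottom
  const dx = quadX === 0 ? 1 : -1;         // offset to the horizontal partner
  const dy = quadY === 0 ? 1 : -1;         // offset to the vertical partner
  return { quadX, quadY, horizontal: [dx, 0], vertical: [0, dy], diagonal: [dx, dy] };
}

The quadrant bits plus the values of those four cells are then enough to index a block-rule lookup table.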
When I made it with WebGL (with no clue what I was embarking on, I just wanted to try it), I ended up with 8 substeps. Looking at it now it confuses the hell out of me, but IIRC it had to do with checking each cardinal direction both ways ("both ways" meaning both "do I want to swap positions with this other pixel" and "does this other pixel want to swap positions with me").
It's not visualized in the video, but the pixels also have temperature, and materials have a sort of stickiness (basically sand that takes longer to form into a pyramid); that's about it.
The Moveable Feast Machine is kind of like a cellular automaton, but it has a larger neighborhood extending out in a radius of several cells, it has read/write access to every cell in that neighborhood (not just the center), and it's non-deterministic in the order the rules are applied to the cells. It's embarrassingly parallel because it executes randomly chosen non-overlapping neighborhood regions in parallel.
>the division into squares shifts right and down by one cell each direction.
The MFM non-deterministic equivalent of the Margolus neighborhood's alternating x,y offset is to randomly execute non-overlapping neighborhoods in parallel. That has the same effect of incrementally diffusing information evenly in all directions over time, just not perfectly, synchronously, symmetrically, and deterministically like the Margolus neighborhood does.
Moveable Feast Machine rules make it much easier to implement particle-based simulations than traditional cellular automata rules, which compute just the center cell. Instead, the MFM rules can simply "pick up and move" any number of particles within the wider neighborhood.
They can implement all kinds of interesting chemical and biological behaviors between locally interacting particles, like holding molecules of different particles together with atomic bonds, and constructing membranes like cell walls that separate and contain and transport other particles, and implementing magical computational "force fields" and "tractor beams" and "traffic lights" that route other particles around. They're even good for simulating computational DNA-like behaviors:
>Programming the Movable Feast Machine with λ-Codons
>λ-Codons provide a mechanism for describing arbitrary computations in the Movable Feast Machine (MFM). A collection of λ-Codon molecules describe the computation by a series of primitive functions. Evaluator particles carry along a stack of memory (which initially contains the input to the program) and visit the λ-Codons, which they interpret as functions and apply to their stacks. When the program completes, they transmute into output particles and carry the answer to the output terminals (left).
The Moveable Feast Machine is similar to cellular automata, but different in some important ways that make it extremely robust and fault-tolerant:
Robust programs running on massively parallel unreliable hardware can actually tolerate hardware failure and repair themselves. The Demon Horde Sort algorithm is an inherently robust sorting algorithm for the Moveable Feast Machine.
The "Distributed City Generation" video demonstrates a Movable Feast Machine rule that builds a self-healing city that fills all available space with urban sprawl, with cars that drive between buildings, and city streets that adaptively learn how to route the cars to their nearest destinations, and the city even repairs itself after disasters!
Nice work, Jason. I really enjoyed a talk [1] at GDC by Petri Purho (Nolla Games, and before that Kloonigames) that explains the "Falling Everything Engine" used in their game Noita (which I would love to play on my Mac...). Highly recommended if you want to dive deeper into the topic.
Thank you! What an enjoyable talk! Adding solid bodies and Box2D physics into the mix would certainly complicate things. It does sound fun to work on...
One more step: self-organized criticality... At a 'critical angle', any additional grains of sand can trigger an avalanche. (Be careful though, this is a deeply fascinating rabbit hole!)
I love this. As a visual learner this helps get the ideas across extremely clearly. It’s a lot of fun, too.
As you noted at the bottom to reach out if we want to see more like this - yes please, I think that would be great.
Coincidentally I’ve been curious about writing something like this since playing a game with very primitive materials and physics with my son a few years ago. This made me realize we could actually build something quite easily, so thanks!
In the post you wrote, as some people already mentioned, the sand distributes first to one side and then the other. It would be cool to get something non-deterministic happening instead.
Performance improvements are always very interesting to me, regardless of the context. That’s a definite yes.
Otherwise I suppose other materials would be great. How could you make a vine grow? Or a tree? Far too complex? I’m not sure but I’d enjoy the hell out of learning how.
I see the final result here has a left bias: just press and hold to see it stacking towards the left before the right. So the next fix could be a random choice.
It does indeed! It's a great note. As you say, randomly choosing between left and right if they both exist would fix the bias.
Another issue is that it forms a perfect diagonal, which isn't the most pleasing to look at. It can be addressed by randomly allowing for stacks of two granules and allowing spreading further than just one position to the left or right.
I figured where I left it was appropriate for this post, but I'm happy to write follow-ups for other particle types and the kind of improvement you mentioned!
Choosing between left and right if they're both open fixes the bias when there is only a single pixel falling, but not when a circle is used. This is because the update function runs from right to left across the screen, so the falling pillar of sand behaves like a solid wall, as the update has not had the chance to move the other side of the circle yet. The best example of this is dropping a pillar on the right slope of an existing cone: a solid line will form on the right side of the falling segment, but a slope will form against the left side of the pillar. When you try this with the left side of a cone, the behavior is correct.
One fix I found is relatively straightforward:
class FallingGrid2 extends FallingGrid {
  // Draw from the end of the list to the beginning
  update() {
+   const direction = Math.random() > 0.5 ? 'ltr' : 'rtl';
    for (let i = this.grid.length - this.width - 1; i > 0; i--) {
-     this.updatePixel(i);
+     if (direction === 'rtl') {
+       this.updatePixel(i);
+     } else {
+       // Mirror the column within the current row, so this pass sweeps left to right.
+       const offset = i % this.width;
+       const flipped = this.width - 1 - offset;
+       this.updatePixel(i - offset + flipped);
+     }
    }
  }
}
Cheers for making the source so easily hackable :)
Interesting, I solved it differently, by just randomly selecting the order in which belowLeft and belowRight are checked, inside the updatePixel function:
updatePixel(i) {
  const below = i + this.width;
  const belowLeft = below - 1;
  const belowRight = below + 1;
  // with 50 % probability, try the left side before the right side
  const leftFirst = Math.random() <= 0.5;
  if (this.isEmpty(below)) {
    swap(i, below);
  // check the left side first (ltr)
  } else if (leftFirst && this.isEmpty(belowLeft)) {
    swap(i, belowLeft);
  // otherwise check the right side
  } else if (this.isEmpty(belowRight)) {
    swap(i, belowRight);
  // if the right side was tried first, fall back to the left side now (rtl)
  } else if (!leftFirst && this.isEmpty(belowLeft)) {
    swap(i, belowLeft);
  }
}
I may be testing wrong, but I believe that approach causes dropping a steady stream of sand on the right side of a cone to produce very strange falling behaviour as compared to the left, because the stream is more than one pixel wide. Looks like you changed some other bits around so you may have addressed this some other way.
Ah! I see now! Yes, we should alternate the direction of the rowwise pass, or pick randomly, as you demonstrate. This would be a great step to start a next post with.
It's true! This has to do with how I wrote the "belowLeft" and "belowRight" variables. I don't ensure they are in the same row. You can do this by taking a pair of indices, dividing by width and flooring and checking for equality.
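For example, a guard along these lines (a sketch, not code from the post) would reject a neighbour index that wraps into a different row:

// True only if indices a and b land on the same row of a row-major,
// width-wide grid.
function sameRow(a, b, width) {
  return Math.floor(a / width) === Math.floor(b / width);
}

// e.g. only treat belowLeft as a candidate if it really is one row down:
// if (sameRow(belowLeft, below, this.width) && this.isEmpty(belowLeft)) { ... }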
It's because the left pixel is checked before the right pixel (the settling behaviour section). This is common in games where you handle movement input (left, right arrow keys) - if you press both of them you'll always go one way, because that way is checked first for movement.
Enjoyable demo, but Chrome is suddenly taking high CPU when dropping sand and my laptop is sounding like a plane about to take off. Why are canvas demos always so inefficient in the browser?
These are usually iterating over a screen-sized buffer every frame. There's no nice way to slice that, just a bunch of work for the machine to constantly do.
EDIT: I'm out of quota. For games like this it's pretty hard. You can move it into a fragment shader, or maybe mark certain rectangles in the canvas as "empty" (there's a very fast algorithm for cellular automata which pretty much has the same issue). There are no "easy" solutions though.
I was thinking: maybe JS is not the way to do this? Maybe similar things should be written in WebAssembly instead? Thoughts? I mean, even with paused simulations, I don't see why an i7 CPU should be working at 100% just to draw something in a tiny window in a browser. It seems like a flaw in canvas and JS to me.
Just saw a familiar word there that feels like a hundred years old: Fidonet. A 1984 BBS network, it seems. I only got to experience the final years of BBSes but it feels very nostalgic still.
P5 is such a fun library.
I'm constantly impressed at how quickly you can get an idea from paper to screen with it. I can't speak to its viability in production, but for research, prototyping and just fun there are few alternatives out there right now.
It reminds me a bit of old BASIC on Amiga/Commodore systems, where getting something drawn on the screen and making a beep was trivial and built into the environment.
In the 1990s we were writing a virus at school which applied this algorithm to all characters on the DOS screen after some period of inactivity. Destroying Norton Commander on a computer used by a teacher or sysadmin was quite fun (the virus had to be manually installed, did not survive a reboot, and the screen was restored on a keypress, so it was quite harmless). Nobody punished us for this kind of joke.
Love this. I also was wondering how to do this cleanly, and in my mind it seemed to make sense to apply the rules to the sand, but onto a blank canvas so as not to have the sand interfere with the sand below.
Doing it from the bottom up solves this problem; however, I think in a general sense (e.g. with different sources of gravity) it would become more tricky!
I presume the answer is that we need to iterate the other direction but what I'm wondering is how we can mix these. Do we need to iterate the grid twice?
This was great. Before Powder Game, there were several games in this genre, and a very active Falling Sand Game BBS forum. I got my start in programming by writing mods for one of those games, Burning Sand. I wonder if anyone from those forums is around here...
I was, and my story was very very similar to yours!
I remember fighting Burning Sand 2 triggers to make a little cursor that would follow your mouse, and totally failing at it.
And writing my own little planetary simulator. It wasn't one of the popular ones, but I gave it my best shot.
I'd love to see a new "game engine" like BS2, but modernized and cleaned up and easier to code on. The Powder Toy is awesome but not nearly as hackable, and isn't super performant.
>The Abelian sandpile model (ASM) is the more popular name of the original Bak–Tang–Wiesenfeld model (BTW). The BTW model was the first discovered example of a dynamical system displaying self-organized criticality. It was introduced by Per Bak, Chao Tang and Kurt Wiesenfeld in a 1987 paper.
>Three years later Deepak Dhar discovered that the BTW sandpile model indeed follows abelian dynamics and therefore referred to this model as the Abelian sandpile model.
>The model is a cellular automaton. In its original formulation, each site on a finite grid has an associated value that corresponds to the slope of the pile. This slope builds up as "grains of sand" (or "chips") are randomly placed onto the pile, until the slope exceeds a specific threshold value, at which time that site collapses, transferring sand into the adjacent sites and increasing their slope. Bak, Tang, and Wiesenfeld considered the process of successive random placement of sand grains on the grid; each such placement of sand at a particular site may have no effect, or it may cause a cascading reaction that will affect many sites.
>Dhar has shown that the final stable sandpile configuration after the avalanche is terminated, is independent of the precise sequence of topplings that is followed during the avalanche. As a direct consequence of this fact, it is shown that if two sand grains are added to the stable configuration in two different orders, e.g., first at site A and then at site B, and first at B and then at A, the final stable configuration of sand grains turns out to be exactly the same. When a sand grain is added to a stable sandpile configuration, it results in an avalanche which finally stops leading to another stable configuration. Dhar proposed that the addition of a sand grain can be looked upon as an operator, when it acts on one stable configuration, it produces another stable configuration. Dhar showed that all such addition operators form an abelian group, hence the name Abelian sandpile model.
>The model has since been studied on the infinite lattice, on other (non-square) lattices, and on arbitrary graphs (including directed multigraphs). It is closely related to the dollar game, a variant of the chip-firing game introduced by Biggs.
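To make the toppling rule described above concrete, here is a minimal sketch (assuming a threshold of 4 on a square grid, with grains simply falling off the boundary; the function name is made up):

// Add one grain at (x, y), then topple every site holding 4 or more
// grains, sending one grain to each of its four neighbours. Grains
// pushed past the edge of the grid are lost.
function addGrainAndTopple(heights, width, height, x, y) {
  heights[y * width + x] += 1;
  const pending = [[x, y]];
  while (pending.length > 0) {
    const [cx, cy] = pending.pop();
    const i = cy * width + cx;
    while (heights[i] >= 4) {
      heights[i] -= 4;
      for (const [nx, ny] of [[cx - 1, cy], [cx + 1, cy], [cx, cy - 1], [cx, cy + 1]]) {
        if (nx >= 0 && nx < width && ny >= 0 && ny < height) {
          heights[ny * width + nx] += 1;
          pending.push([nx, ny]);
        }
      }
    }
  }
}

Because the dynamics are abelian, as quoted above, the order in which pending sites are processed does not change the final stable configuration.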
>Luis David Garcia-Puente discusses sandpiles, and how they produce amazing "fractal zeroes". Dr Garcia-Puente is an associate professor at Sam Houston State University and was interviewed while attending an MSRI-UP summer program.