Nice! I'd suggest embedding the simulation in the blog. I had to scroll up and down for a while before finding a link to the actual simulation.
(You might want to pick a value that runs reasonably well on old phones, or have it adjust based on frame rate. Alternatively, just put some links at the top of the article.)
See https://ciechanow.ski/ (very popular on this website) for a world-class example of just how cool it is to embed simulations right in the article.
(Obligatory: back in my day, every website used to embed cool interactive stuff!)
--
Also, I think you can run a particle sim on GPU without WebGPU, e.g. https://news.ycombinator.com/item?id=19963640
That's one of the best examples of an explanatory blog that I've ever seen.
I wish this would become the standard for how information is shared - if it's worth sharing, it's worth making it easy to understand.
I have done a few blog posts with interactive doodads like this. It takes a lot (like really a _lot_) more time to do, but I think it's the right way to go. There is so much noise on the internet caused by people casting their 2 cents into the void.
Interactive thingywotsits may slow down individuals making posts, but there are a lot of individuals out there.
The "hard" way is often the simple way with this sort of thing. What makes it easier is that while building out your code, you make little pieces of UI to visualize what you're doing. Think of them like unit tests or test-driven development. Then you can take those, clean them up a little, and publish them.
p5.js is a great medium. I did a short series in this style - you can inspect it to see full source (non minified / obfuscated) with some comments here and there.
I do agree about embedding. I thought about embedding each version but was worried about having too many workers all going at once. I'll update the article to include the final version embedded at the end. Thanks for the feedback.
That blog is amazing. Each example is so polished. I love it.
edit: I tried adding an embedded version but the required headers didn't play well with other embeds. The older versions are all still stuck in codesandboxes.
Woah, it works with multiple fingers! This is wild for pure JS. Interestingly, more fingers means more lag; I guess there's more stuff being sent between threads.
Random question (genuine, I do not know if it's possible):
> I decided to have each particle be represented by 4 numbers an x, y, dx, and dy. These will each be 32-bit floating point numbers.
Would it be possible to encode this data into a single JS number (a 53-bit integer, given that MAX_SAFE_INTEGER is 2^53 - 1 = 9,007,199,254,740,991)? Or into the -3.4e38 to 3.4e38 range of the Float32Array used in the blog?
For example, I understand that for the screen position you might have a 1000x1000 canvas, which can be represented with 1,000,000 values. Even if we add 10 sub-pixel divisions per axis, that's still only 100,000,000, which fits very comfortably within JS.
Similarly for speed (dx, dy), I see you are doing "(Math.random()*2-1)*10" to calculate the speed, which gives -10 to +10 with arbitrary decimal accuracy, but I wonder if limiting it to 1 decimal would be enough for the simulation. That would be [-10.0, +10.0], which maps to the -100 to +100 integer range, so roughly 200 values per component, or about 40,000 combinations for the (dx, dy) pair.
If you put both of those together, that gives 40,000 * 100,000,000 = 4,000,000,000,000 (4T) possible values per particle, which still fits within JS' MAX_SAFE_INTEGER. So it seems you might be able to fit all of the data for a single particle within a single JS number (though not a single Float32Array element, whose 24-bit mantissa can't represent integers that large)? Then you don't need the stride and can be a lot more sure about data consistency.
It might be that the encoding/decoding of the data into a single number is slower than the savings in memory and so it's totally not worth it though, which I don't know.
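Roughly what I have in mind, as a sketch (the quantization steps are made up for illustration, not taken from the article):

    // Quantize each field to an integer and mix them into one JS number
    // that stays below MAX_SAFE_INTEGER (2^53 - 1).
    const POS_STEPS = 10_000; // 0..9999 per axis: 1000px * 10 sub-pixel divisions
    const VEL_STEPS = 201;    // -10.0..+10.0 in 0.1 steps

    function pack(x, y, dx, dy) {
      const xi = Math.round(x * 10);          // 0..9999
      const yi = Math.round(y * 10);          // 0..9999
      const dxi = Math.round(dx * 10) + 100;  // 0..200
      const dyi = Math.round(dy * 10) + 100;  // 0..200
      return ((xi * POS_STEPS + yi) * VEL_STEPS + dxi) * VEL_STEPS + dyi;
    }

    function unpack(n) {
      const dyi = n % VEL_STEPS; n = (n - dyi) / VEL_STEPS;
      const dxi = n % VEL_STEPS; n = (n - dxi) / VEL_STEPS;
      const yi = n % POS_STEPS;  n = (n - yi) / POS_STEPS;
      return { x: n / 10, y: yi / 10, dx: (dxi - 100) / 10, dy: (dyi - 100) / 10 };
    }

The worst case is about 4e12, so it fits in a double, but every read and write now pays for a pack/unpack round trip.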
I also did some experimenting with number packing and ended up creating a QuickSet implementation[0]. However, it turned out that operating on TypedArrays proved more performant, so that's what I settled on in the end. I've collected some related packages here:
Of note is FastIntSet, which uses the technique you described, but I think is only able to store 4 unsigned integers as one JS value (I might be wrong).
As you stated, encoding/decoding would kill your performance.
Float16Array would immediately halve your memory requirements.
Another possibility would be to have separate precision arrays, e.g. Float16Array for x, y and even Int8Array for dx/dy, but in both cases you would get some motion artifacts, especially for Int8, from the clamping and aliasing of dx/dy.
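As a rough sketch of that split-precision layout (my own illustration, not the article's code; Float16Array is also still quite new in browsers):

    const COUNT = 1_000_000;
    const xs  = new Float16Array(COUNT); // 2 bytes per position component
    const ys  = new Float16Array(COUNT);
    const dxs = new Int8Array(COUNT);    // 1 byte per velocity, stored as velocity * 10
    const dys = new Int8Array(COUNT);

    // Velocities are quantized on write and rescaled on read; the rounding and
    // clamping here is where the motion artifacts come from.
    function setVelocity(i, dx, dy) {
      dxs[i] = Math.max(-127, Math.min(127, Math.round(dx * 10)));
      dys[i] = Math.max(-127, Math.min(127, Math.round(dy * 10)));
    }
    function getVelocityX(i) {
      return dxs[i] / 10;
    }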
> Javascript does support an Atomics API but it uses promises which are gross. Eww sick.
With the exception of waitAsync[1], the Atomics APIs don't appear to use promises. I've used Atomics before and never needed to mess with any async/promise code. Is it using promises behind the scenes or is there something else I'm missing?
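For example, this kind of usage is fully synchronous (a minimal sketch, assuming the array lives in a SharedArrayBuffer shared with a worker):

    const shared = new Int32Array(new SharedArrayBuffer(4));

    Atomics.add(shared, 0, 1);               // atomically increment, returns the old value
    const current = Atomics.load(shared, 0); // atomic read

    // Blocking wait/notify also returns plain values (wait is only allowed
    // on worker threads, where blocking is permitted):
    // Atomics.wait(shared, 0, current);     // sleeps until notified
    // Atomics.notify(shared, 0, 1);         // wakes one waiter

    // Only Atomics.waitAsync hands back a promise.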
The videos look awesome but the "try it out here" codesandbox links don't work for me on MacOS Chrome desktop. I get 'Uncaught ReferenceError: SharedArrayBuffer is not defined' and some CORS errors: 'ERR_BLOCKED_BY_RESPONSE.NotSameOriginAfterDefaultedToSameOriginByCoep'.
You have to open the previews in a dedicated tab, as codesandbox's inline editor blocks the required headers from being set. It also may get blocked if you are using a privacy-focused browser.
I'll try to include embedded examples in the future.
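For anyone hitting this, the gist is that SharedArrayBuffer only exists on cross-origin isolated pages, so the server has to send two headers. A rough sketch of a server that does this (illustrative only, not the article's http.ts):

    Bun.serve({
      port: 8080,
      fetch(req) {
        const path = new URL(req.url).pathname;
        const file = Bun.file(path === "/" ? "./index.html" : "." + path);
        return new Response(file, {
          headers: {
            // Without these, SharedArrayBuffer stays undefined and cross-origin
            // embeds get blocked (the COEP error above).
            "Cross-Origin-Opener-Policy": "same-origin",
            "Cross-Origin-Embedder-Policy": "require-corp",
          },
        });
      },
    });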
The problem is that idiomatic JS and blazing fast JS are diametrically opposed to each other; in practice, the latter is more like a bad C dialect. You're not allowed to allocate GC objects in fast JS, but the language doesn't have good non-allocating alternatives. Nobody is actually going to make a complex JS app where all memory allocations are pointers into a giant ArrayBuffer; it's easier to just switch to WebAssembly at that point.
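To illustrate, the "fast JS" style ends up looking something like this (a sketch along the lines of the article's layout, not its exact code): particle fields packed into one flat Float32Array, no per-particle objects, so nothing for the GC to chew on.

    const STRIDE = 4;                       // x, y, dx, dy per particle
    const COUNT = 100_000;
    const particles = new Float32Array(COUNT * STRIDE);

    function step(dt) {
      for (let i = 0; i < COUNT * STRIDE; i += STRIDE) {
        particles[i]     += particles[i + 2] * dt;  // x += dx * dt
        particles[i + 1] += particles[i + 3] * dt;  // y += dy * dt
      }
    }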
If JS had typed structs (like it has typed arrays) it would definitely be more convenient.
However, that's not where the problem starts. A lot of web sites are slow because they simply run too much code that doesn't need running in the first place and allocates objects that don't need to be allocated.
We don't need lower-level constructs if we can simply start by removing cruft and being more wary of adding it. Go back to KISS/YAGNI.
"Too bad we cant just rely on JS only and have to involve a bunch of DOM operations, which is usually the slow part of the UIs we create"
No? With WebGL and soon WebGPU, or in this case by writing to an image buffer and just passing that to canvas, you haven't had to use the DOM for quite a while now.
(But then you don't get all the nice things HTML offers, like displaying and styling text, etc.)
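For example, the raw-pixel path looks roughly like this (a sketch; the particle layout here is hypothetical):

    const canvas = document.querySelector("canvas");
    const ctx = canvas.getContext("2d");
    const image = ctx.createImageData(canvas.width, canvas.height);
    const pixels = image.data;              // Uint8ClampedArray, RGBA bytes

    const STRIDE = 4, COUNT = 10_000;       // hypothetical flat x, y, dx, dy layout
    const particles = new Float32Array(COUNT * STRIDE);

    function draw() {
      pixels.fill(0);                       // clear to transparent black
      for (let i = 0; i < COUNT * STRIDE; i += STRIDE) {
        const x = particles[i] | 0;
        const y = particles[i + 1] | 0;
        if (x < 0 || x >= canvas.width || y < 0 || y >= canvas.height) continue;
        const p = (y * canvas.width + x) * 4;
        pixels[p] = 255;                    // red
        pixels[p + 3] = 255;                // opaque
      }
      ctx.putImageData(image, 0, 0);        // single blit, no DOM nodes per particle
      requestAnimationFrame(draw);
    }
    requestAnimationFrame(draw);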
Great article and very relevant for me, since I'm building a game in JavaScript based on "falling sand" physics, which is all about simulating massive amounts of particles (think Noita meets Factorio - feel free to wishlist if you think it sounds interesting).
My custom engine is built on a very similar solution using SharedArrayBuffers but there are still many things in this article that I'm eager to try, so thanks!
It depends on the display type. When run on something with low per-pixel lighting, it can flicker a bit due to how quickly the average light changes frame to frame. Anything with local dimming zones may struggle. I looked at ways to fix this but could not come up with anything other than running a blur filter, which ends up looking terrible.
About caches, the most important thing is to know they exist. Which you do know now :) The general idea of a cache is exactly how he explains it in the article, and is useful to know about as a general concept. Note that the very hardware-specific bit of info that the M1 chip has a "chungus big" cache is not mentioned until very late in the article, which I didn't know yet either.
I'm not super skilled with the Chrome profiler either; it seems better suited for certain tasks than others, but I might just be doing it wrong ...
Love this. Enjoyed riding your train of thought from challenge conception through each performance pass to the final form. Surprisingly fun to play around with this sim too. Looking forward to more posts!
This was prototyped on codesandbox before they nuked their product. Each link goes to a specific version, which you can test by running "bun http.ts" in the terminal, which serves the content. I updated the article to include this info.
In the future I will keep everything self hosted to avoid this issue. I appreciate the patience.
There's also numpy and scipy in the WebAssembly Python distro (Pyodide), but the "kinda not" part refers more to first-class scientific/numerical computing support. It's possible, but the libraries are all disjoint or are WebAssembly ports, etc.