The instruction pointer is synchronized across all lanes, giving you fewer states to reason about.
Then GPUs mess that up by letting us run blocks/thread groups independently, but now GPUs have highly efficient barrier instructions that line everyone back up.
It turns out that SIMD's innate guarantee of instruction synchronization at the lane level is why warp-based / wavefront coding is so efficient, though: none of those barriers are necessary anymore.
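As a minimal CUDA sketch of that difference (hypothetical kernel names, block size assumed to be 256): a block-wide sum has to barrier after every step, while a warp-wide sum over 32 lanes needs no barrier at all, because the lanes share one instruction stream.

    #include <cuda_runtime.h>

    // Block-wide reduction: warps communicate through shared memory,
    // so an explicit barrier is required after every step.
    // (Assumes blockDim.x == 256 for brevity.)
    __global__ void blockSum(const float* in, float* out) {
        __shared__ float buf[256];
        int t = threadIdx.x;
        buf[t] = in[blockIdx.x * blockDim.x + t];
        __syncthreads();                        // line everyone back up
        for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
            if (t < stride) buf[t] += buf[t + stride];
            __syncthreads();                    // and again, every iteration
        }
        if (t == 0) out[blockIdx.x] = buf[0];
    }

    // Warp-wide reduction: the 32 lanes already execute in lockstep,
    // so register shuffles replace shared memory and no block barrier is needed.
    __device__ float warpSum(float v) {
        for (int offset = 16; offset > 0; offset /= 2)
            v += __shfl_down_sync(0xffffffffu, v, offset);
        return v;                               // lane 0 ends up with the warp's sum
    }

(The _sync shuffles take an explicit participation mask because Volta and newer can let lanes diverge, but within a converged warp the synchronization is part of the execution model rather than a separate barrier instruction.)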
We use threads to solve all kinds of things, including 'More Compute'.
SIMD is limited to 'More Compute' (unable to process I/O like sockets concurrently, or other such thread patterns). But as it turns out, more compute is a problem that many programmers are still interested in.
Similarly, you can use async patterns for the I/O problem (which seem to be more efficient than threads anyway).
--------
So when we think about a 2024-style program, you'd have SIMD for compute-limited problems (Neural Nets, Matrices, Raytracing). Then Async for sockets, I/O, etc.
Which puts traditional threads in this weird jack-of-all-trades position: not as good as SIMD methods for raw compute, not as good as Async for I/O. But threads do both.
Fortunately, there seem to be problems with both a lot of I/O and a lot of compute involved simultaneously.
It's not just I/O, it's data pipelining. Threads can be used to do a lot of different kinds of compute in parallel. For example, one could pipeline a multi-step computation, like a compiler: make one thread for parsing, one for typechecking, one for optimizing, and one for codegen, and then have functions move as work packages between the threads. Or, one could have many threads each running every stage in serial for different functions in parallel. Threads give programmers the flexibility to do a wide variety of parallel processing (and sometimes even get it right).
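A rough sketch of that first shape (hypothetical stage names, plain host-side C++): one thread per stage, connected by blocking queues that carry the work packages.

    #include <condition_variable>
    #include <cstdio>
    #include <mutex>
    #include <queue>
    #include <string>
    #include <thread>

    // Hypothetical work package: one function moving through the pipeline.
    struct Job { std::string name; std::string payload; };

    // A tiny blocking queue; each stage pops from one and pushes to the next.
    class JobQueue {
        std::queue<Job> q;
        std::mutex m;
        std::condition_variable cv;
        bool closed = false;
    public:
        void push(Job j) { { std::lock_guard<std::mutex> l(m); q.push(std::move(j)); } cv.notify_one(); }
        void close()     { { std::lock_guard<std::mutex> l(m); closed = true; } cv.notify_all(); }
        bool pop(Job& j) {
            std::unique_lock<std::mutex> l(m);
            cv.wait(l, [&]{ return !q.empty() || closed; });
            if (q.empty()) return false;        // closed and fully drained
            j = std::move(q.front()); q.pop(); return true;
        }
    };

    int main() {
        JobQueue parsed, typechecked;

        // Stage 1: "parse" (stands in for real parsing work).
        std::thread parser([&]{
            for (auto name : {"foo", "bar", "baz"})
                parsed.push({name, "ast"});
            parsed.close();
        });

        // Stage 2: "typecheck", feeding stage 3.
        std::thread typechecker([&]{
            Job j;
            while (parsed.pop(j)) { j.payload += "+types"; typechecked.push(std::move(j)); }
            typechecked.close();
        });

        // Stage 3: "codegen".
        std::thread codegen([&]{
            Job j;
            while (typechecked.pop(j)) std::printf("emitted %s (%s)\n", j.name.c_str(), j.payload.c_str());
        });

        parser.join(); typechecker.join(); codegen.join();
    }

The second shape just replaces the per-stage threads with N identical workers that each run parse -> typecheck -> codegen end to end for different functions.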
IMHO the jury is still out on whether async I/O is worth it, either in terms of performance or in terms of the complexity that applications incur trying to do it via callback hell. Many programmers find synchronous I/O to be a really, really intuitive programming model, and the lowest levels of the software stack (i.e. syscalls) are almost always synchronous.
The ability to directly program for asynchronous phenomena is definitely worth it[0]. Something like scheduler activations, which imbues this into the threading interface, is just better than either construct without the other. The main downside is complexity; I think we will continuously improve on this but it will always be more complex than the inevitably-less-nimble synchronous version. Still, we got io_uring for a reason.
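For what it's worth, a single io_uring operation is not that scary at the API surface. Here is a minimal liburing read (error handling elided, and it blocks on the one completion just to stay short; the complexity only shows up when you juggle many in-flight operations):

    #include <liburing.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        struct io_uring ring;
        io_uring_queue_init(8, &ring, 0);            // small submission/completion queues

        int fd = open("/etc/hostname", O_RDONLY);    // arbitrary example file
        char buf[256];

        struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
        io_uring_prep_read(sqe, fd, buf, sizeof(buf), 0);   // describe the read...
        io_uring_submit(&ring);                              // ...and submit it

        struct io_uring_cqe *cqe;
        io_uring_wait_cqe(&ring, &cqe);              // a real server would reap many completions here
        printf("read %d bytes\n", cqe->res);
        io_uring_cqe_seen(&ring, cqe);

        close(fd);
        io_uring_queue_exit(&ring);
        return 0;
    }

The interesting part is the split between describing the operation (prep + submit) and reaping its completion later, which is what lets one thread keep many I/Os in flight.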
Fair. It's not like GPUs are entirely SIMD (and as I said in a sibling post, I agree that GPUs have substantial traditional threads involved).
-------
But let's zoom into raytracing for a minute. Intel's raytracing implementation (and indeed, the DirectX model of raytracing) consolidates ray dispatches in rather intricate ways.
Intel will literally move the stack between SIMD lanes, consolidating rays into shared misses and shared hits (to minimize branch divergence).
There are some new techniques in today's SIMD models that cannot easily be described by the traditional threading models.
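You can get a feel for the lane-level bookkeeping with a small warp-compaction sketch (this is not Intel's or DXR's actual mechanism, just the general "pack rays into coherent groups" idea, assuming a block size that's a multiple of 32): lanes vote on who hit, hits get packed densely into a queue, and a later pass shades that queue with far less divergence.

    #include <cuda_runtime.h>

    // Hypothetical, stripped-down ray record and hit test, for illustration only.
    struct Ray { int id; };
    __device__ bool intersect(const Ray& r) { return (r.id & 1) == 0; }  // stand-in for BVH traversal

    // Warp-level compaction: lanes whose rays hit pack them into a dense queue,
    // so a later shading pass runs over coherent work instead of divergent lanes.
    __global__ void traceAndCompact(const Ray* rays, int n,
                                    Ray* hitQueue, int* hitCount) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        bool hit = (i < n) && intersect(rays[i]);

        unsigned mask = __ballot_sync(0xffffffffu, hit);     // which lanes hit?
        int lane = threadIdx.x & 31;
        int rank = __popc(mask & ((1u << lane) - 1));        // my position among the hitters

        int base = 0;
        if (lane == 0)                                       // one atomic per warp, not per ray
            base = atomicAdd(hitCount, __popc(mask));
        base = __shfl_sync(0xffffffffu, base, 0);            // broadcast the base to the warp

        if (hit)
            hitQueue[base + rank] = rays[i];
    }

Intel's scheme goes further and actually migrates stack contents between lanes, but the payoff is the same: the expensive, divergent work ends up running over packed, like-minded lanes.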
GPUs in particular have a very hyperthread/SMT like model where multiple true threads (aka instruction pointers) are juggled while waiting for RAM to respond.
Still, the intermediate organizational step where SIMD gives you a simpler form of parallelism is underrated and understudied IMO.