"but compiling "high-level" (i.e. C-level or above) code into SIMD is an open research problem (AFAIK)"
In general, "taking a language never written for X and applying X on it" is an open problem. See also, for instance, trying to statically type a program in a dynamically-typed language. Of course it's hard to statically type a language that was designed to be dynamically typed. Of course it's hard to take C and make it do SIMD operations. You'd have to write a higher-level language designed to do SIMD from the beginning, if you really want it to be slick.
"C matches NUMA and caching just as well as assembly does."
I'm going to make a slightly different point, which is that C does not support caching as well as a language could. For one particular example, C still lays structs out as rows, and has no support for laying them out in columns. (It doesn't stop you, but you're going to be rewriting a lot of code if you change your mind later. Or doing some really funky things with macros, which amounts to rewriting a lot of code anyway.)
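To make that concrete, here's a minimal C sketch of the two layouts (the struct, field, and variable names are just made up for illustration); notice that every access site depends on which one you picked:

    /* Array-of-structs ("rows"): what a plain C struct array gives you. */
    struct particle {
        float x, y, z;
        float mass;
    };
    struct particle particles_rows[1024];

    /* Struct-of-arrays ("columns"): friendlier to the cache (and SIMD)
       when a pass only touches one field, but a different type entirely. */
    struct particle_columns {
        float x[1024];
        float y[1024];
        float z[1024];
        float mass[1024];
    };
    struct particle_columns particles_cols;

    /* Rows:    particles_rows[i].x
       Columns: particles_cols.x[i]
       Changing your mind later means touching every line like these. */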
Of course C doesn't support this crazy optimization... when it was written, the speed difference between CPUs and memory was much smaller, indeed outright nonexistent on some architectures of the time (though I can't promise C ran on them). Computers changed.
(I'll give another idea that may or may not work: creating a struct with built-in accessors designed to mediate between the in-RAM representation and the program interface, where the compiler analyzes them and decides whether to inline them into the in-memory struct or not. For instance, suppose you have two enumerations in your struct, one with 9 values and one with 28. There are 252 possible combinations (9 × 28), which could be packed into one byte, but naively, languages won't do that. This language could transparently decide whether to compute the values upon access, or unpack them into two bytes.)
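Here's roughly what I mean, hand-written in C with made-up names; in the hypothetical language you'd only declare the two enumerations, and the compiler would decide whether to generate something like this packed form or a plain two-byte form:

    /* Two enums with 9 and 28 values: 9 * 28 = 252 combinations,
       so both fit in one byte if packed arithmetically as a*28 + b. */
    struct flags {
        unsigned char packed;   /* holds a * 28 + b */
    };

    static inline unsigned get_a(const struct flags *f) { return f->packed / 28; }
    static inline unsigned get_b(const struct flags *f) { return f->packed % 28; }

    static inline void set_a(struct flags *f, unsigned a) {
        f->packed = (unsigned char)(a * 28 + get_b(f));
    }
    static inline void set_b(struct flags *f, unsigned b) {
        f->packed = (unsigned char)(get_a(f) * 28 + b);
    }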
Of course, virtually nothing does support this sort of thing, which is why I said C isn't really uniquely badly off... almost no current high-level languages are written for the machines we actually have today. (I'm hedging. I don't know of any that truly are, but maybe one could argue there are some scientific languages that are.) C is the water we all swim through, even if we hardly touch it directly, and ever since GPUs became standard-issue the mismatch between our programming languages and our hardware has become comically large.
Assembly supports everything, but in doing so, supports nothing. It will of course permit you to lay your structs out in rows or columns or anything in between, but it doesn't particularly support any of them, either.
Incidentally, unlike some people my age, I don't actually complain about this; it is what it is, there are reasons for it, and what we have in hand is still pretty darned powerful. But, again, I'd suggest that if you are a language designer looking to write the Next Big Thing, you could do worse than figure out how to start integrating this stuff into a programming language that can cache smarter and use the GPU in some sensible manner and all these other things. It's one way you might actually be able to create a language that will not merely tie C, but straight-up beat it.
I don't know of any way to compile non-array-ish, non-explicitly-SIMD-ish code into efficient SIMD code. Thinking about it, this seems to be the general case with parallelism: you can take fairly naïvely-written code, maybe add some memoization and hash-table dictionaries, and it turns into not-too-inefficient sequential code, but I don't know of a way to do the equivalent with parallel code (especially if you add the memoization). STM promises this, but (AFAIK) still doesn't deliver.
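(For the sequential half of that claim, here's a toy C illustration of the kind of mechanical fix I mean, naive recursion plus a lookup table; it's just a sketch, and the point is that I don't know an equally mechanical transformation that makes this parallel:)

    #include <stdint.h>

    /* Naive recursive Fibonacci, made not-too-inefficient simply by
       memoizing results into a flat table. */
    static uint64_t memo[94];   /* fib(93) is the largest value that fits in 64 bits */

    uint64_t fib(unsigned n) {
        if (n < 2) return n;
        if (memo[n] != 0) return memo[n];
        return memo[n] = fib(n - 1) + fib(n - 2);
    }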
Of course, the best way to have an SoA representation is to use an ECS :-).
C's main claim to fame is being "an HLL" (in the classical sense) while still being able to do essentially everything Assembly does. Also, having relatively simple semantics surely helps (C is the only imperative language I know of to have completely formalized semantics).
@qznc:
I don't know of a way to do it natively in Rust (you must write accessors).
"I don't know of any way to compile non-array-ish, non-explicitly-SIMD-ish code into efficient SIMD code."
Which is why I'm proposing that somebody creating one might get some traction, after all...
"C's main claim to fame is being "an HLL" (in the classical sense) but still being able to do essentially everything Assembly does."
And I discussed at length the things that assembly can do today that C cannot, without callouts to assembler. I mean, I know the party line, I've heard it for like twenty years now, and my entire point is that it's not true anymore. C is not a "high level assembler" for a 2015 machine. It's a high-level assembler for a PDP-11. Which is still useful enough, thanks to backwards compatibility, but it's high time for it to get out of the way and stop being "the high level assembler", just like it's high time for it to get out of the way and stop being "the systems language".
C certainly can do SIMD just as well as assembly (via intrinsics). It does not allow high-level code to be compiled to SIMD, but there is no known way to compile high-level general-purpose code to SIMD. Finding such a method is an Open Research Problem AFAIK (of course, solving it would be Very Welcome).
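For example, the intrinsics route looks like this (an SSE sketch assuming an x86 target and n divisible by 4; the function and parameter names are just for illustration):

    #include <stddef.h>
    #include <immintrin.h>

    /* Add two float arrays four lanes at a time with SSE intrinsics.
       n is assumed to be a multiple of 4 to keep the sketch short. */
    void add_f32(const float *a, const float *b, float *out, size_t n) {
        for (size_t i = 0; i < n; i += 4) {
            __m128 va = _mm_loadu_ps(a + i);
            __m128 vb = _mm_loadu_ps(b + i);
            _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
        }
    }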
It's possible to do this manually in all those languages, but I'm not sure it can be done without programmer intervention: C++ and Rust both allow interior pointers/references to point to fields, which inhibits automatic application of many of the craziest layout changes.
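A tiny illustration of the interior-pointer problem, written in C here (the same thing is expressible with C++ references and Rust borrows):

    struct point { float x, y; };
    struct point pts[100];

    void demo(void) {
        /* A pointer straight into one field of one element. A compiler
           that wanted to silently split pts into separate x[] and y[]
           arrays would have to prove no such pointer escapes, which it
           generally cannot. */
        float *py = &pts[42].y;
        *py = 1.0f;
    }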
I had to add "naively, languages won't do that" because of course there's no trick to it otherwise; even JavaScript can do it, it just won't be any faster. It's a simple example to fit a simple paragraph of text. One could imagine other possible optimizations to harness the fact that it's faster to do work on what you have than to pull more stuff in from RAM, like perhaps a string type that transparently compresses itself with a fixed dictionary (or optional dictionary) or something, depending on runtime performance heuristics.
Of course anything I say is possible in an existing language, with enough work, enough assembler, and enough compromises, but it's not what languages are based around.
In general, "taking a language never written for X and applying X on it" is an open problem. See also, for instance, trying to statically type a program in a dynamically-typed language. Of course it's hard to statically type a language that was designed to be dynamically typed. Of course it's hard to take C and make it do SIMD operations. You'd have to write a higher-level language designed to do SIMD from the beginning, if you really want it to be slick.
"C matches NUMA and caching just as well as assembly does."
I'm going to make a slightly different point, which is that C does not support caching as well as a language could. For one particular example, C still lays structs out as rows, and has no support for laying them out in columns. (It doesn't stop you, but you're going to be rewriting a lot of code if you change your mind later. Or doing some really funky things with macros, which is also rewriting lots of code, too.)
Of course C doesn't support this crazy optimization... when it was written the order-of-magnitude difference between CPUs and memory was much smaller, indeed outright nonexistent on some architectures of the time (though I can't promise C was on them). Computers changed.
(I'll give another idea that may or may not work: Creating a struct with built-in accessors designed to mediate between the in-RAM representation and the program interface, which the compiler will analyze and determine whether to inline into the memory struct or not. For instance, suppose you have two enumerations in your struct, one with 9 values and one with 28. There's 252 possible combinations, which could be packed into one byte, but naively, languages won't do that. This language could transparently decide whether to compute the values upon access, or unpack them into two bytes.)
Of course, virtually nothing does support this sort of thing, which is why I said C isn't really uniquely badly off... almost no current high-level languages are written for the machines we actually have today. (I'm hedging. I don't know of any that truly are, but maybe one could argue there's some scientific languages that are.) C is the water we all swim through, even if we hardly touch it directly, and ever since GPUs become standard-issue the mismatch between our programming languages and our hardware has become comically large.
Assembly supports everything, but by so doing so, supports nothing. It will of course permit you to lay your structs out in rows or columns or anything in between, but it doesn't particularly support any of them, either.
Incidentally, unlike some of my age, I don't actually complain about this; it is what it is, there are reasons for it, and what we have in hand is still pretty darned powerful. But, again, I'd suggest that if you are a language designer and you're looking to write the Next Big Thing, you could do worse than figure out how to start integrating this stuff into a programming language that can cache smarter and use the GPU in some sensible manner and all these other things. It's one way you might actually be able to create a language that will not merely tie C, but straight-up beat it.