AoS is useful when you want fast (i.e. few cache misses) random access to many fields of a single object; the common OO approach. SoA is useful when you want fast iteration across only a couple of fields of many objects; the common simulation-loop approach. Effectively what Jai allows is deciding which to use at the point where you declare a chunk of memory.
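As a rough illustration (plain C++ with a made-up particle type, nothing Jai-specific), the same data in the two layouts might look like this:

```cpp
#include <vector>

// Array of Structs: each particle's fields sit next to each other,
// so touching many fields of one particle costs roughly one cache line.
struct ParticleAoS {
    float x, y, z;
    float mass;
};
std::vector<ParticleAoS> particles_aos;

// Struct of Arrays: each field gets its own contiguous array,
// so a loop that only reads x streams through memory with no wasted bytes.
struct ParticlesSoA {
    std::vector<float> x, y, z;
    std::vector<float> mass;
};
ParticlesSoA particles_soa;

// The simulation-loop case SoA favors: only the position fields are touched.
void integrate(ParticlesSoA& p, const std::vector<float>& vx, float dt) {
    for (std::size_t i = 0; i < p.x.size(); ++i)
        p.x[i] += vx[i] * dt;
}
```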
A lot of the concerns I have could in theory be fixed in user space, but the same can be said of any language, which sort of defeats the purpose of having a keyword at all. The point of the keyword is that it can inspect the fields of a type and decide how many arrays of what size to allocate. Here are some issues which would be annoying to implement in user space if AoS/SoA were merely an orthogonal keyword (i.e. with only limited operator overloading and without Turing-complete templates or macros):
* Alignment and padding so certain fields can be loaded into specific registers. This affects the layout of memory differently in each mode.
* Multiple buffers for some fields. A heavily used field (say position) could have a number of buffers which are rotated in SoA mode (e.g. render positions, last frame step, next frame step; allowing for reader/writer systems), but are simply an array of fields in AoS mode, whereas other fields retain their singular arity (a rough sketch follows this list).
* Concurrency concerns. Partitioning arrays in SoA so cores don't fight over the lower caches. Aligning AoS on cache lines so whole objects can be locked at once without causing cache problems with neighboring objects.
* Sparse fields. If a field is generally empty, a naive layout either still allocates space for it or pays a cache miss by chasing a pointer; in SoA the field could instead be backed by a dictionary rather than an array (also sketched below).
* But to answer the actual question, it comes down to data structures: if I wanted to iterate across the objects in an arbitrary quad-tree node and its children, there are efficient ways to lay out memory so that this and similar useful operations are fast (e.g. minimal cache misses). Yes, I could put the data structure in a separate array, but then I am looking up values in the data array randomly (i.e. cache misses). Many useful data structures have efficient schemes for laying out their values and their structure in the same memory (sketched below as well).
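To make the buffer-rotation, cache-line, and sparse-field bullets concrete, here is a rough C++ sketch of what a layout-aware container might have to generate; all the names (PositionsSoA, EntityAoS, and so on) are invented for illustration:

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// SoA mode: the hot position field gets three rotating buffers
// (render / last step / next step); everything else stays single.
struct PositionsSoA {
    std::array<std::vector<float>, 3> x, y;  // one buffer per phase
    int write = 0;                           // rotated each frame
    void rotate() { write = (write + 1) % 3; }
};

// AoS mode: one cache line per object so a whole object can be locked or
// written without causing false sharing with its neighbors.
struct alignas(64) EntityAoS {
    float x[3], y[3];  // the same three phases, inline per object
    float mass;
};

// Sparse field: mostly-empty data lives in a dictionary keyed by index
// instead of reserving a slot (or a pointer to chase) for every entity.
struct Entities {
    PositionsSoA positions;
    std::vector<float> mass;                                  // dense field
    std::unordered_map<std::size_t, std::string> debug_name;  // sparse field
};
```

The point of the keyword is that the compiler could derive all of this from one annotated declaration, instead of hand-writing two divergent versions of the container.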
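And for the quad-tree point, a minimal sketch (field names invented) of laying the tree's structure and its values out in the same contiguous memory, so visiting a node and its children is a forward scan rather than pointer chasing:

```cpp
#include <cstdint>
#include <vector>

// Flat quad-tree: nodes are stored in depth-first order, so a node's
// entire subtree occupies the contiguous range [i, i + subtree_size).
struct FlatQuadTree {
    struct Node {
        float min_x, min_y, max_x, max_y;  // bounds
        std::uint32_t subtree_size;        // this node plus all descendants
        std::uint32_t first_value;         // offset into `values`
        std::uint32_t value_count;
    };
    std::vector<Node>  nodes;    // structure
    std::vector<float> values;   // payload, ordered to match the nodes

    // Visit every value under node `i` (itself and its children) with a
    // single linear scan: minimal cache misses, no pointers to chase.
    template <class F>
    void for_each_in_subtree(std::uint32_t i, F&& f) const {
        std::uint32_t end = i + nodes[i].subtree_size;
        for (std::uint32_t n = i; n < end; ++n)
            for (std::uint32_t v = 0; v < nodes[n].value_count; ++v)
                f(values[nodes[n].first_value + v]);
    }
};
```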
To elaborate on the last one a bit more: maybe I have a structure with four fields, two of which are keys that can be structured (e.g. a string name, and an x,y pos), and two of which are actual data members. Regardless of `AoS` or `SoA`, I would have to write custom data structures to maintain string (trie) and pos2 (quad-tree) indices, which would also be doing slow lookups. Alternatively, if I could set the layout of an array to `Trie.name` (a trie using the name field), I would be saying: lay this container out to be as fast as possible (e.g. minimal cache misses) for string-based lookups by making that the primary layout, without having to change any code that uses it. You could even go the next step with something like `SoA | Trie.name | QuadTree.pos`, so that reordering a single line of code affects the performance of all code referencing it, while the semantics of that line still act like normal bit flags.
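No such syntax exists in Jai or C++ today, but as a hedged sketch of just the "pick the primary layout in one place" part, here is roughly how one might fake it with a C++ policy template (every name here is hypothetical, and a std::map stands in for a real trie):

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct Record { std::string name; float x, y, payload; };

// Two interchangeable storage policies with the same interface.
struct SoAStorage {                       // fast field iteration
    std::vector<std::string> name;
    std::vector<float> x, y, payload;
    void insert(const Record& r) {
        name.push_back(r.name); x.push_back(r.x);
        y.push_back(r.y);       payload.push_back(r.payload);
    }
    float* find(const std::string& key) {
        for (std::size_t i = 0; i < name.size(); ++i)
            if (name[i] == key) return &payload[i];
        return nullptr;
    }
};
struct NameKeyedStorage {                 // fast name lookups
    std::map<std::string, Record> by_name;
    void insert(const Record& r) { by_name[r.name] = r; }
    float* find(const std::string& key) {
        auto it = by_name.find(key);
        return it == by_name.end() ? nullptr : &it->second.payload;
    }
};

// Call sites only ever see Table<>; swapping the policy on this one line
// changes the layout everywhere without touching the code that uses it.
template <class Layout = SoAStorage>
struct Table : Layout {};

using Records = Table<NameKeyedStorage>;  // the single line you retune
```

This captures the change-one-line property, but not the per-field restructuring a compiler-level keyword could do after inspecting the type, which is the whole reason for wanting it in the language.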
tl;dr: Think database indices on columns, but improving runtime performance by rearranging the data itself into the index so as to better fit the algorithm and CPU requirements (rather than the more abstract decisions databases make, where the model of a particular piece of hardware isn't usually a concern).
I mean I'm not worrying about it at that level, just that it's something relevant to the idea of layout. C++'s templates are, however, a powerful language feature for doing layout optimization, and they're sort of the thing to beat in power.