> high-performance code was “work on structs of arrays, not arrays of structs”
Wikipedia's article "Array of Structure (AoS) and Structure of Arrays (SoA)" explains the trade off between performance (SoA) and intuitiveness/lang support (AoS): https://en.wikipedia.org/wiki/AoS_and_SoA
They also get into software support for SoA, including data frames as implemented by R, Python's Panda's package, and Julia's DataFrames.jl package which allows you to access SoA like AoS.
I’d push back on this by writing the Pandas and Polars etc of the world could be a lot better if they supported generic types. The fact the DataFrame is not connected to your struct / types is a big problem because everyone everywhere has to write at least two functions and probably more like 14 functions to enable basic CRUD operations on Struct of Arrays correctly given Structs.
For example, you need a StructOfArraysToArrayOfStructs function and ArrayOfStructsToStructOfArrays function and StructToRow and RowToStruct at least. Making sure those work for all the various types of thing in a programming language like python is no easy feat. Not having those functions built in, means everyone has to make sure those functions work on their own.
The downside is losing the conceptual understanding of why one or the other organization is the fastest for the data in question. From the POV of a general table data structure how does it decide which kind of memory organization is the most performant?
Looking at the examples referenced it looks like sugar over SoA so things look comfortably like normal but SoA isn't necessarily the most performant layout in memory depending on how things are being accessed.
Isn't SoA basically always more performant though? You're either traversing individual "columns" and leveraging cache locality, or you're not and then you can traverse multiple columns in parallel to rebuild the structs on the fly.
I could only see this degrade or be suboptimal if you have a very small struct that still all in-cache (like an array of 3d coordinates or something)
If you’re always accessing all the members then no. SoA shines when you want to access a different set of a few members of a struct in different contexts. Then there’s also the option of building temporary arrays of just the data you need then spinning through it which can be faster for some things. For example rendering pipelines. Even more so if you cache the results and can use them next frame due to temporal locality. There’s no silver bullet you’ve got to sit down and look at your access patterns and profile.
Wikipedia's article "Array of Structure (AoS) and Structure of Arrays (SoA)" explains the trade off between performance (SoA) and intuitiveness/lang support (AoS): https://en.wikipedia.org/wiki/AoS_and_SoA
They also get into software support for SoA, including data frames as implemented by R, Python's Panda's package, and Julia's DataFrames.jl package which allows you to access SoA like AoS.