I think you may have skimmed part of the original post, because I mention using sparse arrays (linear arrays with ‘empty’ slots) and their benefits and trade-offs.
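For anyone who hasn't seen the sparse-array idea, here is a minimal sketch under simplifying assumptions. Each component type gets one array indexed directly by entity id, and a missing component is just an empty slot. The class and method names (`SparseComponentArray`, `insert`, `remove`, `iter_present`) are hypothetical and illustrative, not taken from the original post or from any ECS library.

```python
from typing import Generic, Iterator, List, Optional, Tuple, TypeVar

T = TypeVar("T")


class SparseComponentArray(Generic[T]):
    """Hypothetical component storage indexed directly by entity id.

    Slots for entities that lack this component hold None ("empty" slots).
    """

    def __init__(self, capacity: int) -> None:
        self.slots: List[Optional[T]] = [None] * capacity

    def insert(self, entity: int, component: T) -> None:
        # O(1): the entity id *is* the index.
        self.slots[entity] = component

    def remove(self, entity: int) -> None:
        # Removal only empties the slot; no other component moves.
        self.slots[entity] = None

    def get(self, entity: int) -> Optional[T]:
        return self.slots[entity]

    def iter_present(self) -> Iterator[Tuple[int, T]]:
        # The trade-off: iteration must walk past every empty slot.
        for entity, component in enumerate(self.slots):
            if component is not None:
                yield entity, component
```

Lookups and removals are constant time and never shuffle data. The cost is that iteration touches every slot, including the empty ones, so sparsely populated components waste memory and bandwidth.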
This archetype-based approach is used in quite a few big ECS projects, Unity’s ECS and Bevy among them.
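To make "archetype-based" concrete, here is a rough sketch of the general idea. Every entity with exactly the same set of component types is stored together in one archetype, with one densely packed column per component type. A query then only visits archetypes whose component set is a superset of what it asks for. This is a toy illustration with hypothetical names (`Archetype`, `World`, `spawn`, `query`). It is not how Unity's ECS or Bevy actually implement it, and it leaves out moving entities between archetypes when components are added or removed.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Iterator, List


@dataclass
class Archetype:
    """Hypothetical storage for all entities sharing one exact component set."""

    component_types: FrozenSet[str]
    entities: List[int] = field(default_factory=list)
    columns: Dict[str, list] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # One densely packed column (list) per component type.
        for name in self.component_types:
            self.columns.setdefault(name, [])

    def add(self, entity: int, components: dict) -> None:
        # Row i of every column belongs to entities[i].
        self.entities.append(entity)
        for name in self.component_types:
            self.columns[name].append(components[name])


class World:
    def __init__(self) -> None:
        self.archetypes: Dict[FrozenSet[str], Archetype] = {}
        self.next_entity = 0

    def spawn(self, **components) -> int:
        # The set of component names selects (or creates) the archetype.
        key = frozenset(components)
        archetype = self.archetypes.get(key)
        if archetype is None:
            archetype = self.archetypes[key] = Archetype(key)
        entity = self.next_entity
        self.next_entity += 1
        archetype.add(entity, components)
        return entity

    def query(self, *names: str) -> Iterator[tuple]:
        """Yield (entity, comp1, comp2, ...) from every matching archetype."""
        wanted = frozenset(names)
        for archetype in self.archetypes.values():
            if wanted <= archetype.component_types:
                cols = [archetype.columns[n] for n in names]
                # Columns are dense, so there are no empty slots to skip.
                yield from zip(archetype.entities, *cols)
```

For example, after `world.spawn(position=(0, 0), velocity=(1, 0))` and `world.spawn(position=(5, 5))`, a `query("position", "velocity")` only walks the first archetype. Compared with the sparse-array version, iteration is dense, but adding or removing a component means moving the entity to a different archetype.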
As with anything performance-related, though, and particularly given the underlying principles of data-oriented design, you should be analysing the performance of your approach on the target hardware.
Or is the CPU cache really so slow that it can literally only look at one stride of memory at a time?
I'm skeptical this kind of optimization is necessary.