Hmm, I was separating the concept of "thread-per-core" from sharding. I would argue that typical cooperative task schedulers (e.g. work-stealing ones) get the performance benefits of thread-per-core without requiring any static partitioning.
But if thread-per-core is fundamentally tied to the idea of sharding, then I think I see what you're saying.