
x86 (or, more generally, any ISA) doesn't have an inherent cache-line size; a given implementation of that ISA does. std::hardware_destructive_interference_size and friends are compile-time constants, and are defined based not just on the target ISA but on the target implementation (see -march and -mtune/-mcpu for gcc and clang).



True, but the answer to the question as asked is: if you target a 64B cache line and run on a machine with a 128B line, you may get more false sharing, with the consequent performance and scalability problems. Of course, if you are that sensitive to such matters, why are you running on an emulated machine? And Apple Silicon doesn't offer many hardware threads anyway, so contention problems are never very severe.


I'm not aware of any widespread x86 chips with a line size bigger than 64 bytes. Hand-waving a bit, the ISA also practically implies a minimum line size through its memory model.

My point is that the line size varies at runtime, but the constant in the standard is constexpr, so you can't actually rely on it unless you control where the binary executes.


It does sort of bring up the general question of why this is a compile-time and not a runtime constant. I doubt x86 will double its cache-line size any time soon, but if it did -- and people are running binaries with cache padding at 64 bytes -- expected behaviour is going to differ. Not in a way that's going to make anybody lose their minds, mind you, but this kind of micro-optimization will just cease to be effective.

EDIT: naturally I understand that a compile-time constant makes sense for e.g. statically sizing arrays.


Raymond Chen discussed that in a relatively recent blog post [0]. The main points are that those constants are typically used to influence struct layouts/alignments, and those must be decided at compile time. The alternative is to generate multiple versions of a struct/other code with different alignments/layouts and choose among them at runtime, but that comes with other tradeoffs.

[0]: https://devblogs.microsoft.com/oldnewthing/20230424-00/?p=10...


99% of the time {constructive,destructive}_interference_size is used as a parameter to alignas, which necessarily takes a constant value. It simply replaces a lot of "const size_t cache_size = 64" in user code, making it slightly more portable. Having a runtime value can sometimes be useful, but it is beyond the scope of the feature.

C++11 std::atomic had similar scope creep: atomic::is_lock_free was a runtime query. Nobody ever used it, as it is simply not something you care about at runtime. So C++17 added is_always_lock_free as a compile-time query, which is actually actionable.


> I doubt x86 will double its cache-line size any time soon

Well, why not? They doubled it from 32B to 64B between the Pentium III and the Pentium 4.


I don't know enough about CPU design to say, really, but the Pentium 4 is a long time ago now. And since then, I suspect that a lot of assumptions about 64B line sizes have been baked in.

Apple could go to 128B because they were rolling out a whole new ISA, so were breaking compat anyway.

That said, they have amazing performance on the M1, and I wonder how much of that has to do with the wider cache lines.



