On one hand it means that the run of replicated page table entries for a single mapping takes up half a cache line in the 16KB case, and two whole cache lines in the 64KB case. This really cuts down on the page walker hardware's ability to effectively prefetch TLB entries, leading to basically the same issues as in this classic discussion of why tree-based page tables are generally more effective than hash-based page tables (shifted forward in time to today's gate counts): https://yarchive.net/comp/linux/page_tables.html This is why ARM shifted from an Svnapot-like solution to the "translation granule queryable and partially selectable at runtime" solution.
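To make that footprint concrete, here's a quick C sketch of the arithmetic, assuming 8-byte RV64 PTEs, a 64-byte cache line, and a 4KB base granule (my numbers for a typical setup, not anything quoted from the spec):

```c
#include <stdio.h>

/* Under a Svnapot-style scheme, the translation for a larger page is a run
 * of contiguous replicated 8-byte leaf PTEs at the 4 KB base granularity. */
int main(void) {
    const unsigned pte_bytes  = 8;    /* RV64 PTE size */
    const unsigned line_bytes = 64;   /* typical cache line */
    const unsigned base_page  = 4096; /* base translation granule */

    unsigned page_sizes[] = { 4 << 10, 16 << 10, 64 << 10 };
    for (int i = 0; i < 3; i++) {
        unsigned ptes  = page_sizes[i] / base_page;  /* replicated PTEs */
        unsigned bytes = ptes * pte_bytes;           /* PTE footprint */
        printf("%3u KB page: %2u PTEs = %3u bytes = %.2f cache lines\n",
               page_sizes[i] >> 10, ptes, bytes, (double)bytes / line_bytes);
    }
    return 0;
}
/* 4 KB -> 8 bytes (1/8 of a line), 16 KB -> 32 bytes (half a line),
 * 64 KB -> 128 bytes (two whole lines). */
```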
Another issue is that a big reason to switch to 16KB or even 64KB pages is to allow a larger index range for VIPT caches. You want high-performance implementations to be able to look up the cache line in parallel with the TLB lookup, then compare the tag with the result of the TLB lookup. This means that, practically, only the untranslated bits of the address can be used by the set-selection portion of the cache lookup. With 12 untranslated bits in an address and 64-byte cache lines, you get 64 sets; multiply that by 8 ways and you get the 32KB L1 caches very common in systems with 4KB pages (sometimes with heroic effort to throw a ton of transistors/power at the problem and build a 64KB cache by essentially duplicating large parts of the cache lookup hardware for that extra bit of address). What you really want is for the architecture to be able to disallow 4KB pages, as on Apple silicon, which is the main thing that allows their giant 128KB and 192KB L1 caches.
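A small sketch of that set-count arithmetic (the 8 ways and 64-byte lines are assumptions, chosen to match the common configurations above):

```c
#include <stdio.h>

/* VIPT sizing constraint: only the page-offset bits are untranslated, so
 * sets * line_size must fit within one page, and the largest cache that
 * needs no alias tricks is page_size * ways. Illustrative parameters. */
int main(void) {
    const unsigned line = 64, ways = 8;
    unsigned page_sizes[] = { 4 << 10, 16 << 10 };
    for (int i = 0; i < 2; i++) {
        unsigned sets = page_sizes[i] / line;    /* index bits come from the page offset */
        unsigned max_cache = sets * line * ways; /* largest "free" VIPT cache */
        printf("%2u KB pages: %4u sets x %u ways x %u B = %3u KB L1\n",
               page_sizes[i] >> 10, sets, ways, line, max_cache >> 10);
    }
    return 0;
}
/* 4 KB pages  ->  64 sets -> 32 KB  (the classic L1 size)
 * 16 KB pages -> 256 sets -> 128 KB (Apple-silicon-class L1 sizes) */
```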
> What you really want is for the architecture to be able to disallow 4KB pages, as on Apple silicon, which is the main thing that allows their giant 128KB and 192KB L1 caches.
Minor nit, but they do allow 4k pages. Linux doesn't support 16k and 4k pages at the same time; macOS does, but is very particular about 4k pages being used only for scenarios like Rosetta processes or virtual machines, e.g. Parallels uses them for Windows-on-ARM, I think. Windows will probably never support non-4k pages, I'd guess.
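(Which is one reason portable userspace code should query the granule at runtime rather than assume 4k; a minimal POSIX sketch:)

```c
#include <stdio.h>
#include <unistd.h>

/* Don't hard-code 4 KB: the page size differs across kernels and
 * configurations, as described above. */
int main(void) {
    long page = sysconf(_SC_PAGESIZE); /* POSIX: runtime page size */
    printf("page size: %ld bytes\n", page);
    return 0;
}
```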
But otherwise, you're totally right. I wish RISC-V had gone with the configurable-granule approach like ARM did. Major missed opportunity, but maybe a fix will get ratified at some point...
> This means that, practically, only the untranslated bits of the address can be used by the set-selection portion of the cache lookup
It's true that this makes things difficult, but Arm have been shipping D-caches with way size > page size for decades. The problem you get is that virtual synonyms of the same physical cache block can become incoherent with one another. You solve this by extending your coherence protocol to cover the potential synonyms of each physical block (so for example with 16 kB/way and 4 kB pages, there are four potential indices for each physical cache block, and you need to maintain their coherence). It has some cost, and the cost scales with the ratio of way size to page size, so it's still desirable to stay under the limit, e.g. by just increasing the number of cache ways.
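For concreteness, a trivial sketch of that synonym count, using the same example numbers as above:

```c
#include <stdio.h>

/* The index bits above the page offset can differ between virtual aliases
 * of one physical block, giving way_size / page_size candidate indices
 * that the coherence protocol has to keep consistent. */
int main(void) {
    const unsigned way_size = 16 << 10, page_size = 4 << 10;
    unsigned synonyms = way_size / page_size; /* 4 potential indices */
    printf("%u potential synonym indices per physical block\n", synonyms);
    return 0;
}
```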