Can you share details? What modern platforms/environments does this not work on? Are you saying the intersection of available bits on all platforms is just 6? Or are there platforms that actually use 58 bits?
It wasn’t anything clever. A couple years ago I did a dive into x86 and ARM literature to determine what bits of a pointer were in use in various environments or were on a roadmap to be used in the future. To be honest, it was more bits than I was expecting.
Note also that this is the intersection of bits that are available on both ARM and x86. If you want it to be portable, you need both architectures. Just because ARM64 doesn’t use a bit doesn’t mean that x86 doesn’t and vice versa.
Both x86 and ARM have proposed standards for pointer tagging in the high bits. However, those bits don’t perfectly overlap. Also, some platforms don’t fully conform to this reservation of high bits for pointer tagging, so there is a backward compatibility issue.
Across all of that, I found six high bits that were guaranteed to be safe for all current and future platforms. In practice you can probably use more but there is a portability risk.
> Note also that this is the intersection of bits that are available on both ARM and x86. If you want it to be portable, you need both architectures. Just because ARM64 doesn’t use a bit doesn’t mean that x86 doesn’t and vice versa.
Your mask/tag doesn't need to use the same bits on x86 and ARM to be portable, though.
If one platform uses the upper 56 bits and another uses the lower 56 bits that doesn’t mean you have 0 bits available for tagging. It means you have 8 bits and have to go through a conversation when moving from one platform to another. This is perhaps annoying but perfectly fine.
Kinda weird to materialize pointers across architectures rather than indices.
But in any case surely the relevant consideration is “fewest number of free pointer bits on any single platform”. And not “intersection of free bits across all platforms”. Right?
Eh? The values you can store in the tags are absolutely dependent on the number of available bits. That’s a simple type safety problem. This requirement is architecture independent.
You can’t cram 8 bits of tag in 7 bits if the latter is all the architecture has available. Hence why you have to design for the smallest reliable target.
How about taking the advantage of max_align_t pointer alignment guarantees, which on x86-64 Linux (glibc) is 16-bytes? This would leave you with the 4 lowest bits to be used.
Unfortunately, low bits vary considerably from a large number to zero. If you need a minimum number of bits to be reliably available then you have to look at the high bits of a pointer. Naturally, implementations use the low bits of alignment makes them available.
Hm, what I have seen so far is that pointers returned by system malloc are usually aligned either to 8-byte boundary (windows) or 16-byte boundary (linux). I think jemalloc interprets the C standard guarantees a bit differently and will return the 8-byte aligned pointer for allocations whose size <= 8. But even this still leaves us with 3 LBSs to use.
Page tables can optionally consume a very large number of bits on x86 (57?). Not every platform enables it but your code may run on a platform that uses it. There are a bunch of proposals from Intel, AMD, ARM, et al about officially recognizing some set of high bits in 64-bit pointers as tags in user space, with implementations. Unfortunately, these “standards” don’t agree on which high bits can be safely reserved for tagging.
IIRC, the 6 high bits I mentioned was the intersection of every tag reservation implementation and/or proposal. In other words, it was the set of bits that Intel, AMD, and ARM agreed would be safe for tagging for the foreseeable future.
Fewer bits than I would like and can probably exploit, but nonetheless the number I can reasonably rely on. If a consistent standard is ever agreed upon, the number of bits may increase.
If you include ARM then PAC and MTE will consume a few of those precious bits. Don’t think any platforms use PAC for pointers to allocated objects though unless they’re determined to be exceptionally important like creds structure pointers in task structures in the kernel.
Would be great to hear some actionable details.