This is a pretty well-known technique (tagged pointers themselves go back forever, and papers using tags for this purpose are easy to find from the 2000s).
In fact, it was already being implemented in ASAN on architectures with hardware support :)
Where there is no hardware support, the real-world overhead is 35%+ (which this paper finds as well), and you end up with other issues to contend with (you have to de-tag the pointers in a bunch of cases, most kernels don't like them, etc.), so software-only approaches have mostly not been pursued.
Paper author here, thanks for the interest :). I just wanted to elaborate on the overhead numbers you mentioned.
I am not specifically familiar with the 35%+ numbers, but you should take into account the cost of whatever defense runs on top of the cost of putting tags in pointers, because that makes all the difference.
We found in our experiments on SPEC CPU2006 that, without static analysis for optimizations, the cost of pointer tagging itself (tagging + masking) is 22% geomean, which is almost entirely due to masking on loads/stores. This is with a 64-bit masking constant (0x80000000ffffffff) which is unique to our approach and is relatively inefficient on x86. With 32-bit masking (0xffffffff constant), which is more common, the number is about 14%.
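To make the masking cost concrete, here is roughly what each instrumented dereference does (an illustrative C sketch, not our actual compiler pass; load_masked is a made-up name):

    #include <stdint.h>

    #define DELTA_MASK 0x80000000ffffffffULL  /* our 64-bit masking constant */
    #define MASK32     0x00000000ffffffffULL  /* the more common 32-bit one  */

    static inline uint8_t load_masked(uintptr_t tagged) {
        /* Clear the tag in bits 62..32 but keep bit 63, so a pointer whose
         * overflow bit is set stays non-canonical and faults on access.
         * Doing this on every load/store is where the 22% geomean goes. */
        return *(volatile const uint8_t *)(tagged & DELTA_MASK);
    }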
This only covers metadata management and is not useful on its own, so you need to implement some kind of defense on top of it, which is usually costly. The hardware support used by HWAsan only removes the masking overhead by having the (ARM) hardware ignore the upper byte of the pointer, which is a nice way to avoid paying for masking.
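For comparison, a hedged sketch of what top-byte tagging looks like when ARM's top-byte-ignore is available (tag_top_byte is a made-up name):

    #include <stdint.h>

    /* With ARM TBI ("top byte ignore") enabled, the MMU disregards bits
     * 63..56 on loads/stores, so a pointer tagged like this can be
     * dereferenced directly, with no masking instructions at all. */
    static inline void *tag_top_byte(void *p, uint8_t tag) {
        uintptr_t u = (uintptr_t)p & ~(0xffULL << 56);
        return (void *)(u | ((uintptr_t)tag << 56));
    }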
"I am not specifically familiar with the 35%+ numbers, "
This is confusing to me. Last sentence of the second paragraph of your paper:
"We show that Delta Pointers are effective in detecting arbitrary buffer overflows and, at 35% overhead on SPEC, offer much better performance than competing solutions."
:)
"
This only covers metadata management and is not useful on its own, so you need to implement some kind of defense on top of this which is usually costly. The hardware support used by HWAsan only removes masking overhead by having the (ARM) hardware ignore the upper byte of the pointer, which is a nice way to not having to pay for masking."
It really is a clever hack to catch a bounds error by forcing a segfault.
Where to put the tag is a problem for replacing pointers generically, but I think it's still pretty reasonable to use the 64-bit pointer version with just a minimum of language support. The trickiest buffers are often under 64K, and we usually want to keep them small so they live in cache, so if you have a "safe_alloc" function, all your I/O loops, parsers, etc. that deal with user input could benefit from this.
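Something like this hypothetical safe_alloc, sketching the delta-tag scheme under the paper's assumption that allocations land in the low 32 bits of the address space:

    #include <stdint.h>
    #include <stdlib.h>

    /* Hypothetical safe_alloc: the upper 32 bits carry a delta tag
     * initialized to 2^31 - size. Assumes the heap address fits in
     * the low 32 bits. */
    static inline uintptr_t safe_alloc(uint32_t size) {
        uint8_t *p = malloc(size);
        return (uintptr_t)p | (((1ULL << 31) - size) << 32);
    }

    /* Instrumented arithmetic: a single add moves address and tag in
     * lockstep (forward increments only in this sketch). Stepping past
     * the end of the object carries into bit 63, so the masked
     * dereference shown upthread faults. */
    static inline uintptr_t safe_add(uintptr_t tagged, uint32_t n) {
        return tagged + (((uint64_t)n << 32) | n);
    }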
I still don't understand why CPUs don't provide support for 'fat pointers'.
ARM has 'load pair' to update two registers in one go; it would just need a 'load/store from array' instruction that checks the pointer against the loaded address pair to have a very efficient mechanism (efficient from an instruction-count point of view at least; the data bandwidth and cache impact are of course still there).
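In software, the check such an instruction would fuse looks something like this (illustrative names, not a real ISA):

    #include <stdint.h>
    #include <stdlib.h>

    /* The two bounds map onto the registers a 'load pair' would fill. */
    typedef struct {
        uint8_t *lo;   /* base of the object */
        uint8_t *hi;   /* one past the end   */
    } bounds_t;

    static inline uint8_t checked_load(const uint8_t *p, bounds_t b) {
        if (p < b.lo || p >= b.hi)   /* the compare the instruction would do */
            abort();                 /* out-of-bounds access */
        return *p;
    }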
"Fatter than normal" pointers have a lot of disadvantages (it's been explored extensively as part of CHERI, etc) for the upsides they give you. For the purpose of use after free/buffer overflow finding, it's IMHO, not a great fit (there are other uses, of course).
Gwydion Dylan used a two-word representation for values - a full word as a type tag, and then a word for the value itself (whether pointer or int). It was an interesting experiment, but multicore basically killed it. Once your basic value representation is more than a word, then every load/store/manipulation of a value requires a lock around it, to prevent corruption if the thread is preempted when the type has been written but the value has not. The locking overhead kills performance.
Or you can go with a GIL (but for compiled code) the way Python and early Ruby/JS implementations did, but that hasn't worked out terribly well for them either.
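The hazard looks roughly like this (an illustrative sketch, not Gwydion Dylan's actual runtime):

    #include <pthread.h>
    #include <stdint.h>

    /* A two-word value: without the lock, another thread could observe
     * the new tag paired with the stale value. */
    typedef struct {
        uintptr_t tag;    /* full-word type tag   */
        uintptr_t value;  /* pointer or immediate */
    } boxed_t;

    static pthread_mutex_t box_lock = PTHREAD_MUTEX_INITIALIZER;

    static void box_store(boxed_t *dst, uintptr_t tag, uintptr_t value) {
        pthread_mutex_lock(&box_lock);   /* this per-store lock is the cost */
        dst->tag = tag;                  /* preemption between these two    */
        dst->value = value;              /* writes is what the lock stops   */
        pthread_mutex_unlock(&box_lock);
    }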
No, by fatter than normal I mean, say, 96 or 128 bit pointers.
I'm aware of SPARC ADI :)
It does not have fatter-than-normal pointers; it uses 4-bit tagging of the memory address by reusing bits 63-60 of the pointer.
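Roughly (an illustrative sketch, not the actual SPARC ADI API):

    #include <stdint.h>

    /* A 4-bit version goes in pointer bits 63..60; the hardware checks
     * it against the version assigned to the target memory on access. */
    static inline void *adi_tag(void *p, unsigned version) {
        uintptr_t u = (uintptr_t)p & ~(0xfULL << 60);
        return (void *)(u | ((uintptr_t)(version & 0xf) << 60));
    }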
The IBM iSeries, or whatever its latest name is, has 128-bit software pointers with 65-bit HW pointers, where the extra bit indicates whether the value has been tampered with in userland. AFAIR, checks are still all implemented in software, relying on a runtime trusted code generator.
In the paper, the authors claim that the additional pointer arithmetic consists only of register operations, so one wouldn't expect any overhead. It is then surprising that the paper reports 35% overhead on the integer benchmarks; I wonder where that comes from...
https://clang.llvm.org/docs/HardwareAssistedAddressSanitizer...