This is a pretty well-known technique (tagged pointers themselves go back forever, and papers using tags for this purpose are easy to find from the 2000s).
In fact, it was already being implemented in ASAN on architectures with hardware support :)
Where there is no hardware support, the real-world overhead is 35%+ (which this paper finds as well), and you end up with other issues to contend with (you have to de-tag the pointers in a bunch of cases, most kernels don't like them, etc.), so software-only approaches have mostly not been pursued.
Paper author here, thanks for the interest :). I just wanted to elaborate on the overhead numbers you mentioned.
I am not specifically familiar with the 35%+ numbers, but you should take into account the cost of whatever defense runs on top of the cost of putting tags in pointers, because that makes all the difference.
We found in our experiments on SPEC CPU2006 that, without static analysis for optimizations, the cost of pointer tagging itself (tagging + masking) is 22% geomean, which is almost entirely due to masking on loads/stores. This is with a 64-bit masking constant (0x80000000ffffffff) which is unique to our approach and is relatively inefficient on x86. With 32-bit masking (0xffffffff constant), which is more common, the number is about 14%.
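To make the masking cost concrete, here is roughly what each instrumented dereference does (an illustrative C sketch, not our actual compiler pass; load_masked is a made-up name):

    #include <stdint.h>

    #define DELTA_MASK 0x80000000ffffffffULL  /* our 64-bit masking constant */
    #define MASK32     0x00000000ffffffffULL  /* the more common 32-bit one  */

    static inline uint8_t load_masked(uintptr_t tagged) {
        /* Clear the tag in bits 62..32 but keep bit 63, so a pointer whose
         * overflow bit is set stays non-canonical and faults on access.
         * Doing this on every load/store is where the 22% geomean goes. */
        return *(volatile const uint8_t *)(tagged & DELTA_MASK);
    }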
This only covers metadata management and is not useful on its own, so you need to implement some kind of defense on top of it, which is usually costly. The hardware support used by HWAsan only removes the masking overhead by having the (ARM) hardware ignore the upper byte of the pointer, which is a nice way to avoid paying for masking.
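For comparison, a hedged sketch of what top-byte tagging looks like when ARM's top-byte-ignore is available (tag_top_byte is a made-up name):

    #include <stdint.h>

    /* With ARM TBI ("top byte ignore") enabled, the MMU disregards bits
     * 63..56 on loads/stores, so a pointer tagged like this can be
     * dereferenced directly, with no masking instructions at all. */
    static inline void *tag_top_byte(void *p, uint8_t tag) {
        uintptr_t u = (uintptr_t)p & ~(0xffULL << 56);
        return (void *)(u | ((uintptr_t)tag << 56));
    }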
"I am not specifically familiar with the 35%+ numbers, "
This is confusing to me. Last sentence of the second paragraph of your paper:
"We show that Delta Pointers are effective in detecting arbitrary buffer overflows and, at 35% overhead on SPEC, offer much better performance than competing solutions."
:)
"
This only covers metadata management and is not useful on its own, so you need to implement some kind of defense on top of this which is usually costly. The hardware support used by HWAsan only removes masking overhead by having the (ARM) hardware ignore the upper byte of the pointer, which is a nice way to not having to pay for masking."
It really is a clever hack to catch a bounds error by forcing a segfault.
Where to put the tag is a problem for replacing pointers generically, but I think it's still pretty reasonable to use the 64-bit pointer version with just a minimum of language support. The trickiest buffers are often under 64K, and we usually want to keep them small so they live in cache, so if you have a "safe_alloc" function, all your I/O loops, parsers, etc. that deal with user input could benefit from this.
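Something like this hypothetical safe_alloc, sketching the delta-tag scheme under the paper's assumption that allocations land in the low 32 bits of the address space:

    #include <stdint.h>
    #include <stdlib.h>

    /* Hypothetical safe_alloc: the upper 32 bits carry a delta tag
     * initialized to 2^31 - size. Assumes the heap address fits in
     * the low 32 bits. */
    static inline uintptr_t safe_alloc(uint32_t size) {
        uint8_t *p = malloc(size);
        return (uintptr_t)p | (((1ULL << 31) - size) << 32);
    }

    /* Instrumented arithmetic: a single add moves address and tag in
     * lockstep (forward increments only in this sketch). Stepping past
     * the end of the object carries into bit 63, so the masked
     * dereference shown upthread faults. */
    static inline uintptr_t safe_add(uintptr_t tagged, uint32_t n) {
        return tagged + (((uint64_t)n << 32) | n);
    }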
I still don't understand why CPUs don't provide support for 'fat pointers'.
ARM has 'load pair' to update two registers in one go; it would just need a 'load/store from array' instruction that checks the pointer against the loaded address pair to have a very efficient mechanism (efficient from an instruction-count point of view at least; the data bandwidth and cache impact are of course still there).
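In software, the check such an instruction would fuse looks something like this (illustrative names, not a real ISA):

    #include <stdint.h>
    #include <stdlib.h>

    /* The two bounds map onto the registers a 'load pair' would fill. */
    typedef struct {
        uint8_t *lo;   /* base of the object */
        uint8_t *hi;   /* one past the end   */
    } bounds_t;

    static inline uint8_t checked_load(const uint8_t *p, bounds_t b) {
        if (p < b.lo || p >= b.hi)   /* the compare the instruction would do */
            abort();                 /* out-of-bounds access */
        return *p;
    }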
"Fatter than normal" pointers have a lot of disadvantages (it's been explored extensively as part of CHERI, etc) for the upsides they give you. For the purpose of use after free/buffer overflow finding, it's IMHO, not a great fit (there are other uses, of course).
Gwydion Dylan used a two-word representation for values - a full word as a type tag, and then a word for the value itself (whether pointer or int). It was an interesting experiment, but multicore basically killed it. Once your basic value representation is more than a word, then every load/store/manipulation of a value requires a lock around it, to prevent corruption if the thread is preempted when the type has been written but the value has not. The locking overhead kills performance.
Or you can go with a GIL (but for compiled code) the way Python and early Ruby/JS implementations did, but that hasn't worked out terribly well for them either.
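The hazard looks roughly like this (an illustrative sketch, not Gwydion Dylan's actual runtime):

    #include <pthread.h>
    #include <stdint.h>

    /* A two-word value: without the lock, another thread could observe
     * the new tag paired with the stale value. */
    typedef struct {
        uintptr_t tag;    /* full-word type tag   */
        uintptr_t value;  /* pointer or immediate */
    } boxed_t;

    static pthread_mutex_t box_lock = PTHREAD_MUTEX_INITIALIZER;

    static void box_store(boxed_t *dst, uintptr_t tag, uintptr_t value) {
        pthread_mutex_lock(&box_lock);   /* this per-store lock is the cost */
        dst->tag = tag;                  /* preemption between these two    */
        dst->value = value;              /* writes is what the lock stops   */
        pthread_mutex_unlock(&box_lock);
    }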
No, by fatter than normal I mean, say, 96 or 128 bit pointers.
I'm aware of SPARC ADI :)
It does not have fatter-than-normal pointers; it uses 4-bit tagging of the memory address by reusing bits 63-60 of the pointer.
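Roughly (an illustrative sketch, not the actual SPARC ADI API):

    #include <stdint.h>

    /* A 4-bit version goes in pointer bits 63..60; the hardware checks
     * it against the version assigned to the target memory on access. */
    static inline void *adi_tag(void *p, unsigned version) {
        uintptr_t u = (uintptr_t)p & ~(0xfULL << 60);
        return (void *)(u | ((uintptr_t)(version & 0xf) << 60));
    }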
The IBM iSeries, or whatever its latest name is, has 128-bit software pointers with 65-bit HW pointers, where the extra bit indicates whether the value has been tampered with in userland. AFAIR, checks are still all implemented in software, relying on a runtime trusted code generator.
In the paper, the authors claim that the additional pointer arithmetic consists only of register operations, so one wouldn't expect any overhead. It is then surprising that the paper reports 35% overhead on the integer benchmarks; I wonder where that comes from...
https://clang.llvm.org/docs/HardwareAssistedAddressSanitizer...