
Pointer Authentication - mpweiher
https://github.com/apple/llvm-project/blob/apple/master/clang/docs/PointerAuthentication.rst
======
ComputerGuru
I'm guessing this was developed by or at the behest of Apple and ARM, based
on the supported hardware and languages? Are any versions of the iOS or macOS
kernels (or even userlands) using this "across the board" now? I'd read papers
and theory on strong pointer authentication to mitigate control-flow attacks a
very long time ago, but I did not realize this was now "mainstream" in a
consumer compiler (with support for multiple C-like languages to boot!); it is
certainly a tough sell without hardware support, though, both for security and
(perceived) performance reasons. (I say perceived because it turns out that a
lot of runtime safety checks are actually virtually undetectable: they are
perfect fits for speculative execution and branch prediction, with the branches
involved being very easily predicted with high success rates, as demonstrated
by Rust benchmarks with and without runtime safety checks enabled showing such
close performance on modern x86_64 architectures.)
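
For concreteness, a toy sketch of the kind of check being discussed (the
checked_get helper is hypothetical, not from the article): the bounds branch is
almost never taken on the hot path, so the predictor handles it nearly for free.

    #include <stdlib.h>

    /* Hypothetical example: the bounds check adds one well-predicted branch,
     * which a modern out-of-order core hides almost entirely. */
    int checked_get(const int *buf, size_t len, size_t i)
    {
        if (i >= len)       /* almost never taken in correct code */
            abort();
        return buf[i];
    }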

ARMv8.3 shipped with the instructions needed for this implementation of signed
pointers in 2016, and presumably Apple played a large role in contributing this
feature to the Clang codebase, as no other hardware-accelerated authentication
scheme is supported, per the document. I wonder if there are any plans to bring
this to the desktop by either Intel or AMD. AMD is now in a position to
actually develop new extensions rather than largely playing catch-up to Intel's
extensions (in recent years). (Then again, AMD remains the only one to really
offer hardware acceleration for SHA [0], and that doesn't seem to have really
motivated developers to take advantage of those instructions.)

[0]: https://neosmart.net/blog/2017/will-amds-ryzen-finally-bring-sha-extensions-to-intels-cpus/

~~~
rurban
> AMD remains the only one to really offer hardware acceleration for SHA

Intel added SHA with Goldmont, for smaller laptops. But it's only SHA-1 and
SHA-2, which are already outdated. Good enough for ptrauth though.

~~~
ChrisSD
SHA-2 isn't outdated if used correctly (e.g. combined with HMAC if length
extension attacks are a threat to your protocol). SHA-3 is the "break glass in
case of emergency" hash.

~~~
loeg
You don't even need HMAC to defeat length extension. You can just truncate the
result ("SHA-512/256"), or use SHA_d(x) = SHA(SHA(x)) or SHA(0^B || SHA(x)).
(Although SHA_d starts to look a lot like HMAC.)
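
For concreteness, a minimal sketch of the double-hash variant, assuming
OpenSSL's one-shot SHA256() helper is available (the sha256_d name is just
illustrative):

    #include <stddef.h>
    #include <openssl/sha.h>

    /* SHA_d(x) = SHA(SHA(x)): hashing the digest again hides the internal
     * state an attacker would need in order to extend the message. */
    void sha256_d(const unsigned char *msg, size_t len,
                  unsigned char out[SHA256_DIGEST_LENGTH])
    {
        unsigned char inner[SHA256_DIGEST_LENGTH];
        SHA256(msg, len, inner);           /* inner hash of the message */
        SHA256(inner, sizeof inner, out);  /* outer hash of the digest  */
    }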

~~~
ComputerGuru
Absolutely but it’s all besides the point for this particular application,
anyway. If you fix the pointer size (e.g. payload (i.e. both the secure and
raw pointers) must be 64 bits, no less, no more) then length extension is a
non-issue.

(Rebuttal: Some languages have fat pointers or maybe some architectures
support 32-bit binaries.. I think having the key tied to the payload size
would address the theoretical weakness there? In practice signed pointers are
already fat pointers and the contents of fat pointers are not actually
addresses in the first place, and neither are supported in the current
implementation.)

------
SCHiM
Although nice in theory, contemporary implementations of similar schemes leave
a lot to be desired. Take Windows' Control Flow Guard (CFG) as a practical
example. The set of valid indirect branch targets is so vast that even with
CFG enabled it's almost always possible for an attacker to craft a ROP chain
to achieve arbitrary code execution.

That doesn't invalidate the concept of pointer authentication as a whole, but
it does reduce the number of situations in which you should consider applying
it. If you have a large codebase, 'bolting it on' will make pointer
authentication much less effective. And when you're starting a new project,
why not start it in a language that offers stronger memory safety from the
start, such as Rust?

~~~
olliej
The point of PAC is that you can't create an arbitrary return pointer: it's
not based on ahead-of-time knowledge of what the full CFG state is. The return
address is signed as a byproduct of making the call in the first place, so to
return to a different location you need to be able to ROP through a pointer
with a forged signature.
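
As a rough illustration (hand-written comments, not actual compiler output;
exact instruction selection varies), a return-address-signing
prologue/epilogue looks roughly like this:

    /* Rough sketch: the prologue signs lr against sp before spilling it, and
     * the epilogue authenticates it before returning, so a return address
     * overwritten on the stack fails the check and traps. */
    void victim(void)
    {
        /*  pacibsp                     ; sign lr, with sp as the modifier   */
        /*  stp  x29, x30, [sp, #-16]!  ; spill frame pointer and signed lr  */
        /*  ... function body ...                                            */
        /*  ldp  x29, x30, [sp], #16    ; reload (possibly corrupted) lr     */
        /*  retab                       ; authenticate and return, or trap   */
    }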

------
geocar
If you don't use the standard library, and you don't need JIT, you can simply
not use pointers to callbacks. You can still have something like qsort() but
you need to have statically defined:

    
    
    typedef void *(*callback)();          /* generic callback type   */
    extern const callback callbacks[256]; /* fixed, read-only table  */
    

and qsort() takes an index instead of a raw pointer to a callback.
"Validating" a callback is cheap: Just make sure it's <256 (how many do you
need anyway?). If you don't do an unchecked call* or a jmp* then you don't
have anything an attacker can exploit, and I find it hard to believe a cached
load is going to be slower than something like this.
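
A minimal sketch of that idea (hypothetical names; this uses qsort()'s usual
comparator signature rather than the generic callback type above):

    #include <stdlib.h>

    typedef int (*comparator)(const void *, const void *);

    /* Hypothetical comparator, just for illustration. */
    static int cmp_int(const void *a, const void *b)
    {
        int x = *(const int *)a, y = *(const int *)b;
        return (x > y) - (x < y);
    }

    /* Fixed, read-only table of approved callbacks (the 256-entry "ABI limit"). */
    static const comparator callbacks[256] = { cmp_int };

    /* qsort() wrapper that takes an index instead of a raw function pointer. */
    static void sort_by_index(void *base, size_t n, size_t size, unsigned idx)
    {
        if (idx >= 256 || callbacks[idx] == NULL)
            abort();            /* cheap validation: bounds check + non-null */
        qsort(base, n, size, callbacks[idx]);
    }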

~~~
choeger
Neat. But how do you make this modular and still safe? If the code invoking
qsort() does not know how many callbacks there are, can you still handle it?

~~~
geocar
Our qsort() caller is an application with at most 256 different comparators.
Let us call that an ABI limit. The addresses are in read-only memory so they
cannot be changed; new addresses simply cannot be introduced at run time, so
there is no risk of qsort being tricked into “jumping into the middle of a
function”. The implementation of my qsort() does not need to know if there are
fewer than 256: the entry will be null and the program will crash.

~~~
zozbot234
This is a whole-program transformation, though; it doesn't seem possible to
make it modular. Unless you manage to map some module-specific dispatch table
whenever you're running code from that module, in a way that cannot easily be
subverted by exploit code! Not sure how to do that, however.

~~~
yuubi
Have a section of callback pointers and check against the bounds of the
section? G++ uses this mechanism for static constructors, but it's general
purpose (Linux uses it to collect lists of drivers to initialize, for
instance, with macros like IRQCHIP_DECLARE that populate a section for each
type of entity, which then gets scanned at boot time).

You would need a linker script to collect the callbacks into a section and
provide a symbol for the end, and define variables something like

    int (*my_sort_callback_ptr)(void *, void *)
        __attribute__((section("sort_callbacks"))) = my_sort_comparator;

You would, of course, use a macro for that.

~~~
geocar
gcc (ld, actually) makes __start_sort_callbacks and __stop_sort_callbacks for
you, so you shouldn't need a custom linker script.
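
A hedged sketch of how that might fit together with GCC and GNU ld (the macro
and function names are illustrative):

    #include <stddef.h>

    typedef int (*sort_cb)(const void *, const void *);

    /* Register a comparator by dropping a pointer to it into the
     * "sort_callbacks" section; "used" keeps the compiler from discarding it. */
    #define REGISTER_SORT_CALLBACK(fn) \
        static const sort_cb fn##_entry \
            __attribute__((section("sort_callbacks"), used)) = fn

    /* ld defines these automatically for sections named like C identifiers. */
    extern const sort_cb __start_sort_callbacks[];
    extern const sort_cb __stop_sort_callbacks[];

    /* A callback is acceptable only if it appears in the read-only table. */
    static int is_registered(sort_cb cb)
    {
        for (const sort_cb *p = __start_sort_callbacks;
             p < __stop_sort_callbacks; ++p)
            if (*p == cb)
                return 1;
        return 0;
    }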

------
totorovirus
There are so many mitigation techniques out there to protect against
execution-takeover attacks... I wonder how exploit writers survive them.

I forecast that it will become impossible to hack a personal computer within 10
years. Maybe there will be some vulnerabilities left on IoT devices.

~~~
badrabbit
I have to disagree with that. In my experience, attackers have been relying
much less on exploits. A software exploit is one "initial access" tactic: there
are those who use targeted exploits, zero-day exploits, and most commonly
exploit kits, but setting those aside, phishing is the number one technique,
and web drive-by downloads are common too (e.g. "flash update for your PC"), in
other words social engineering attacks. There are also logic flaws,
misconfigurations, and bad architecture (e.g. RDP is exposed on the intranet
and it so happens another compromised host can reach the computer).

Let's say you have really good security hygiene: apps and sites are
whitelisted, no exploits are possible, things can't execute from removable
drives, etc. What happens when someone you know sends you an email containing a
link to a whitelisted service (say OneDrive, Dropbox, etc.) and that link
downloads a zip file with a malicious JAR, JavaScript, mshta, or macro-enabled
document, basically anything that uses a whitelisted app to run some code?
Let's say your email security is top notch: are you going to ban people from
accessing their personal email? Let's say you do; what if a whitelisted site
has XSS used to inject JS that tells the user "you need to download and install
this font to view this site" (something I have seen)? Even if you whitelist
everything, there are bypass techniques, code signing certs get compromised, a
new technique to use some existing known app to run code may appear, etc.

I think initial access will get a lot more difficult, but not impossible. Up to
the point where someone can run code, it will be very difficult to lock down
well, but there is a lot being done to harden systems and monitor events to
catch when someone does something afterwards.

I personally think endpoint software and technology continue to get more and
more complicated. I can see big companies being resilient to many types of
attacks, but consumers in general are too defenseless.

Take something as simple as a USB worm: a company might make a calculated
decision to block USB execution, but what laptop will ship with that turned
off? A Windows shortcut (.lnk) running a whitelisted executor will continue to
be abused for at least 5 more years, but I dare not speculate as far out as 10
years.

------
ngneer
Wow. Incredible to see the new heights of complexity that the von Neumann
architecture has led to. Corruption of data leading to corruption of control
leading to control flow guarantees through the addition of cryptography. Wow.

------
EddieCPU
Is it technically possible to design an MMU that prevents a process from
reading or writing a region of memory that doesn't belong to it?

~~~
moonbug
that is exactly what an MMU is for.

~~~
saagarjha
MMUs work on much larger regions than what is useful for many classes of
memory safety issues. Luckily, ARMv8.5 adds support for memory tagging at a
more granular level.

~~~
twic
Aha, a chance to mention my favourite dead processor design, the 432, which
had an MMU designed to work at much finer, object-sized, granularity:

https://en.wikipedia.org/wiki/Intel_iAPX_432#Object-oriented_memory_and_capabilities

------
bla3
Does anyone here know how this compares to hwasan?

~~~
saagarjha
Hardware-assisted AddressSanitizer is intended to protect against memory
corruption in general, while pointer authentication helps ensure control-flow
integrity. (And I think it's incompatible with the implementation that iOS
uses, because they both use TBI.)

~~~
rjmccall
They can be compatible. Memory tagging does use the TBI bits. Pointer
authentication uses an arbitrary number of bits, and the kernel configures the
width and whether the TBI bits are preserved. So you can use both, it just
costs you 8 bits of signature.

Moreover, this can be configured independently for code and data pointers. iOS
turns off TBI on code pointers to get 8 more bits of signature. That's not a
problem for memory tagging because memory tagging isn't particularly useful
for code pointers anyway.

~~~
saagarjha
> Moreover, this can be configured independently for code and data pointers.
> iOS turns off TBI on code pointers to get 8 more bits of signature.

Ooh, this is cool. Does iOS currently use different signature sizes? Can I
write an application that uses the top bits of data pointers?

~~~
rjmccall
> Does iOS currently use different signature sizes?

Code and data live in the same address space, and the address-space needs of
the system are the main input to the basic signature width, so the basic
signature widths are currently the same, and the only difference is TBI.

You could imagine a system where code was always loaded into a restricted
subset of the address space and so code pointers could use wider signatures.

> Can I write an application that uses the top bits of data pointers?

Apple's ABIs actually consider the top 8 bits of data pointers to be outside
the addressable range on all its 64-bit targets, including x86_64. ARM64 TBI
just means that you don't need to explicitly mask off those bits before doing
loads and stores. But there are caveats:

- ARMv8.5 memory tagging uses bits 56-59, so you should probably stick to
just the top four bits in case Apple ever uses memory tagging.

- IIRC the first ARM64 iOS release didn't enable TBI, so if your deployment
target goes really far back, you do still need to mask.

- The ABI for pointers expects those bits to be clear on normal ABI
boundaries. This means you need to mask before handing pointers off to other
code; on the upside, however, you don't need to worry about those bits being
set when you receive a pointer.
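
To make the caveats concrete, a small illustrative sketch (not an Apple API) of
stashing a tag in the top four bits of a data pointer and masking it off before
the pointer crosses an ABI boundary or is dereferenced on a target without TBI:

    #include <stdint.h>

    /* Use only bits 60-63, leaving the ARMv8.5 MTE bits (56-59) alone. */
    #define TAG_SHIFT 60
    #define TAG_MASK  ((uintptr_t)0xF << TAG_SHIFT)

    static inline void *ptr_with_tag(void *p, unsigned tag)
    {
        return (void *)(((uintptr_t)p & ~TAG_MASK) |
                        ((uintptr_t)(tag & 0xF) << TAG_SHIFT));
    }

    static inline unsigned ptr_get_tag(const void *p)
    {
        return (unsigned)(((uintptr_t)p & TAG_MASK) >> TAG_SHIFT);
    }

    /* Clear the tag before handing the pointer to other code. */
    static inline void *ptr_strip_tag(void *p)
    {
        return (void *)((uintptr_t)p & ~TAG_MASK);
    }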

