
Task_t considered harmful - rivert
https://googleprojectzero.blogspot.com/2016/10/taskt-considered-harmful.html
======
0x0
Copying my comment from the earlier submission that didn't gain much traction
here:

What an absolutely amazing tour-de-force of a devastating design flaw in all
versions of macOS and iOS and tvOS and watchOS!

The negotiations detailed in the bug report timeline about meetings between
"senior apple and google leadership" for keeping this secret past the general
deadline really underline that.

~~~
blinkingled
Yeah - the failed mitigations followed by a "long term" fix was interesting as
well. Apple literally had to change execve() this late in the OS's development
cycle to allocate new task and thread structs (that's two extra allocations and
copies in hot path!) to fix it for good. That this design problem lingered
around for so long doesn't look good for Apple - it's one thing for a
use-after-free bug in an obscure piece of code to linger, but for a bad design
affecting a ton of their own frequently used code to stand so long is somewhat
brown bag category!

I wonder what other fallout we might experience from this.

~~~
blumentopf
Business as usual with macOS. The other day I was browsing the ocspd source
code. Turns out it calls openssl using system(). So openssl is officially
deprecated on macOS and yet they're using it internally to handle
certificates?! And there's an enlightening comment:

    
    
        /* Given a path to a DER-encoded CRL file and a path to a PEM-encoded
         * CA issuers file, use OpenSSL to validate the CRL. This is a hack,
         * necessitated by performance issues with inserting extremely large
         * numbers of CRL entries into a CSSM DB (see <rdar://8934440>).
    

[http://opensource.apple.com/source/security_ocspd/security_o...](http://opensource.apple.com/source/security_ocspd/security_ocspd-55128.40.1/server/ocspdServer.cpp)

ocspd was introduced with 10.4. A decade ago. And that's really the problem
with macOS: There's no refactoring of old hacks, but rather just bolting on of
ever more new stuff.

~~~
wyager
Apple needs to take a bit of those tens of billions of dollars they have
sitting around and spend it on starting from scratch with something that's not
horrifically crufty. The quality of their software is lagging so far behind
the quality of their hardware right now. Realistically, I think we may just be
at the point where operating systems and all the stuff the companies put on
top of them are too complicated to keep developing in the traditional way with
traditional tools. Formal verification might be the cheapest way forward at
this point.

~~~
tptacek
So far as the current state of the art in computer engineering goes, we don't
know how to completely rewrite a system as complicated as XNU without creating
fresh batches of implementation errors. So this is a little like suggesting
Apple use its hundreds of billions of dollars to build an iPhone battery that
only needs to be recharged once a month.

We may someday get an XNU rewrite, but probably not until software engineering
produces a new approach to building complex systems reliably that works at the
scale (here: number of developers and shipping schedule) Apple needs.

~~~
notalaser
This is so, so true, that I wish there were enough beer in this world to gift
you with. There's a lot of cruft in XNU, and there's even more of it in the
rest of the system, _but_ all this heap of hacks isn't just useless cruft that
we'll be better off without. That heap of code also contains almost twenty
years' worth of bugfixes and optimizations from more smart engineers than
Apple can hope to hire and get to work together in a productive and meaningful
manner. All this unpleasant cruft is what keeps the system alive and well and
the users happy enough to continue using it.

More often than not, systems that get boldly rewritten from scratch end up
playing catch-up for years. Frankly, I can't remember a single case when a
full rewrite with an ambitious timetable wasn't a full-scale disaster. The few
success stories, like (what eventually became) Firefox, took a vastly
different approach and took a lot longer than users would have wanted.

A lot of idealistic (I was about to write naive) engineers think it's all a
matter of throwing everything away. That's the easy part. Coming up with
something _better_ is the really hard part, and it's not achieved by just
throwing the cruft away. If you innocent souls don't believe me, come on over
to the Linux side, we have Gnome 3 cookies. You'll swear you're never going to
touch anything that isn't xterm or macOS again.

~~~
sedachv
> This is so, so true, that I wish there were enough beer in this world to
> gift you with. There's a lot of cruft in XNU, and there's even more of it in
> the rest of the system, but all this heap of hacks isn't just useless cruft
> that we'll be better off without. That heap of code also contains almost
> twenty years' worth of bugfixes and optimizations from more smart engineers
> than Apple can hope to hire and get to work together in a productive and
> meaningful manner. All this unpleasant cruft is what keeps the system alive
> and well and the users happy enough to continue using it.

This whole premise is a false dichotomy. Apple does not have to throw away Mac
OS X, and it does not have to keep piling crap on without fixing things. If
you stop the excuses and rationalizations and commit to code quality you can
ship an operating system with quality code and minimal bugs. The OpenBSD
project has been doing this for two decades with minimal resources. There is
no valid excuse other than "we are too lazy and incompetent."

~~~
notalaser
I was (obviously...) responding to this:

> Apple needs to take a bit of those tens of billions of dollars they have
> sitting around and spend it on _starting from scratch with something that's
> not horrifically crufty_.

They certainly don't have to throw everything away. Not having thrown
everything away is one of the reasons why OpenBSD is a good example here.
Remember all that quality code that was in place before Cranor's UVM? (Edit:
actually, the fact that UVM is an improvement over it should say something,
too...)

And, at the risk of sounding bitter, in my experience, very few companies have
the capability to "commit to code quality", and I don't think Apple is one of
them.

Edit: BTW, I really like your blog. You should write more often :-).

~~~
sedachv
> Remember all that quality code that was in place before Cranor's UVM?

So much before my time I was not even aware of it. For the uninitiated:
[https://www.usenix.org/legacy/events/usenix99/full_papers/cr...](https://www.usenix.org/legacy/events/usenix99/full_papers/cranor/cranor.pdf)

> Edit: BTW, I really like your blog. You should write more often :-).

Thank you. :) Just this week I started thinking of getting back into it.

------
a-no-n
Ever since installing 10.12.1, I've been having a bunch of processes randomly
entering a quasi-paused SIGSTOP-ish state (not closable, apps not "bouncing"
(loading), and just not responding). Running Instruments, correlating
logs and such doesn't identify any clear cause. I'm having to `sudo kill -CONT
-1` in order to get things moving again. I'm wondering if it's related to XNU
mitigations or just some spurious "system configuration entropy" on my box.

~~~
cormacrelf
I did exactly this when my Mac ran out of memory yesterday. Safari hung with a
'your computer is running out of memory' warning (168 tabs open!) and I didn't
want to lose them all by force quitting. But the Safari process itself wasn't
"Not Responding" and we were back to 0% CPU.

So I quit everything else, SIGCONT'd Safari, and it started responding again,
so I tried unsuccessfully to close some tabs. Of course, Safari somewhat
isolates pages in separate processes, so I ran `ps aux | grep WebContent |
grep -v grep | cut -d' ' -f11 | xargs kill -SIGCONT` as well.
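
The pipeline above appears to depend on `cut -d' ' -f11` landing on the right column, which is fragile because `ps aux` pads its columns with a variable number of spaces. A minimal sketch of the same idea in Python, assuming the process list comes from `ps -A -o pid= -o args=`; the function names `matching_pids` and `resume_matching` are made up for illustration:

```python
import os
import signal
import subprocess


def matching_pids(ps_output, needle):
    """Return PIDs from lines shaped like '<pid> <command line>'
    whose command line contains needle."""
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 1)  # split once on any run of whitespace
        if len(fields) == 2 and fields[0].isdigit() and needle in fields[1]:
            pids.append(int(fields[0]))
    return pids


def resume_matching(needle):
    """Send SIGCONT to every process whose command line contains needle."""
    listing = subprocess.run(
        ["ps", "-A", "-o", "pid=", "-o", "args="],
        capture_output=True, text=True, check=True,
    ).stdout
    resumed = []
    for pid in matching_pids(listing, needle):
        if pid == os.getpid():
            continue  # skip this script itself
        try:
            os.kill(pid, signal.SIGCONT)
            resumed.append(pid)
        except (ProcessLookupError, PermissionError):
            pass  # process already exited, or belongs to another user
    return resumed
```

Calling `resume_matching("WebContent")` would then play the role of the one-liner, without the `grep -v grep` step.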

It all sprang back to life, and all the tabs I'd shut in vain zipped away. Got
that one saved for later. It's probably easier just to use -1 now I've learned
what that is!

I do wonder what's suspending these processes indefinitely. I should have done
more inspection to see what state they were in. I'm not familiar with how
WebKit content threads communicate though, so that's for another day.

~~~
a-no-n
I have 16 GiB and 0 GiB was occurring.

------
softawre
Interesting timeline stuff here:

[https://bugs.chromium.org/p/project-zero/issues/detail?id=83...](https://bugs.chromium.org/p/project-zero/issues/detail?id=837)

~~~
tptacek
It's funny, but I think it's written that way on purpose, not just as snark.

It's a little tricky to keep track of what happened here. There are 4 bugs in
this post, and (I think) 2 different timelines: the UAF timeline for the first
bug, and the TOCTTOU timeline for the 3 subsequent bugs. What's important to
understand about the three TOCTTOU bugs is that there's a "right" fix for that
bug, and a series of wrong fixes that delay the inevitable. Ian Beer and GPZ
probably go into this whole process knowing what the right fix is, and with
predictions on how they'll defeat any of the wrong fixes.

So it looks like GPZ reported a bug and then found flaws in the mitigations,
but really all three of the flaws they found were known, at least
conceptually, when GPZ reported the TOCTTOU race to Apple.

In the TOCTTOU timeline, Apple got an extension. Subtextually, it sounds like
Tim Cook called Sundar Pichai. GPZ does not want to give extensions. They have
a 90 day disclosure timeline, it's very well known, and probably the
healthiest disclosure process in the industry. It's problematic for GPZ to
give extensions because next time Tavis Ormandy finds a vulnerability in
Norton Antivirus, Symantec is going to try to play chicken, and GPZ doesn't
want to be at day 89 having to decide whether to drop zero-day versus being
held hostage by a patch schedule.

But if a bug escalates all the way to Tim Cook, GPZ is probably pretty OK just
with the degree to which that raises the profile of their bug --- it's hard to
look at that and think Apple isn't taking your bug extremely seriously. So
they'll trade the raised profile for the 5 week extension.

So they include a bunch of fuck-yous to Apple in the disclosure timeline,
messaging to other vendors that GPZ is not going to budge even if your dumb
original fix turns out to have a flaw that Ian Beer will notice and exploit.
If you want the extension, you'd better have a Tim Cook.

Or maybe they're just having fun. Either way, a good read!

~~~
yuhong
I wonder whether it was really the CEO or someone else; whoever it was is
probably on here.

~~~
tptacek
It could have been Craig, too.

------
drinchev
I'm still with 10.11. I don't plan to update soon, since the benefit of Siri,
Photos and the other major features is quite small compared to the risk that
I might lose working days if something goes wrong (I'm a freelancer).

As far as I read in the article there will be 10.12.1 (the final fix) which
will have that part of the kernel refactored. I hope Apple will also support
10.11 and issue an update with the same fix.

~~~
gulpahum
I got an update to 10.11 El Capitan yesterday, which probably fixes the
vulnerability. You can see the fix in Apple's support page, have a look at the
bottom of the page about "System Boot":

[https://support.apple.com/en-us/HT207275](https://support.apple.com/en-us/HT207275)

It looks like even 10.10 Yosemite got a fix. Here's a full list of security
fixes Apple has made to its software:

[https://support.apple.com/en-us/HT201222](https://support.apple.com/en-us/HT201222)

~~~
0x0
Sure looks like it, 10.10 + 10.11 + 10.12 are listed under the "System boot"
CVE-2016-4669 entry. That must have taken quite the effort to backport!

------
amluto
I would argue that the original underlying problem here is the idea that
having execve() increase privilege is acceptable. It's necessary for legacy
reasons (sudo, anyone?), but even then, it's barely necessary. "sudo foo"
could be implemented by asking a privileged daemon to run foo and handing off
access to the console to the daemon.

On Linux, you can do PR_SET_NO_NEW_PRIVS to turn off this type of privilege
gain, and it's even required for certain purposes. I would _love_ to see
someone develop a distribution that enables no_new_privs for all processes.
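
For anyone who wants to poke at this, here is a minimal sketch of flipping the flag from Python through `prctl(2)`, assuming Linux and a libc that exports `prctl`. The constants are the values from `<linux/prctl.h>`; the helper names are made up for illustration:

```python
import ctypes
import os
import sys

# Values from <linux/prctl.h>.
PR_SET_NO_NEW_PRIVS = 38
PR_GET_NO_NEW_PRIVS = 39


def _prctl(option, arg2):
    if not sys.platform.startswith("linux"):
        raise OSError("prctl(2) is Linux-only")
    libc = ctypes.CDLL(None, use_errno=True)
    zero = ctypes.c_ulong(0)  # unused arguments must be zero for these options
    result = libc.prctl(ctypes.c_int(option), ctypes.c_ulong(arg2), zero, zero, zero)
    if result < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return result


def enable_no_new_privs():
    """One-way switch: once set, execve() no longer grants extra privileges
    (for example via setuid binaries) to this process or its descendants."""
    _prctl(PR_SET_NO_NEW_PRIVS, 1)


def no_new_privs_enabled():
    """Report whether the flag is set for the calling process."""
    return _prctl(PR_GET_NO_NEW_PRIVS, 0) == 1
```

The flag cannot be cleared once set and is inherited across fork and execve, which is why a launcher could set it once before starting everything else.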

~~~
navaati
> "sudo foo" could be implemented by asking a privileged daemon to run foo and
> handing off access to the console to the daemon.

FYI that exists, it's called pkexec (yes, from polkit) :)

------
empath75
Can someone explain what this means for the end user?

~~~
tptacek
It's a pattern of privilege escalation bugs. If you run untrusted code on your
machine, that code can obtain root or alter the kernel, potentially even if
it's running as nobody.

There is a relatively long sequence of attempts to band-aid the bug, all of
which failed, because Ian Beer found a systemic flaw, not just a single point
flaw. So, the other implication for users is a general sense of foreboding.

~~~
Jerry2
So Apple still has no real solution to this bug?

~~~
thenewwazoo
The real solution (refactoring how execve creates tasks) was released with
10.12.1 (and updates to prior versions of OS X too).

------
tptacek
Bug 1: Many XNU drivers save task_t's on the heap without bumping their
refcount.

    
    
        1. Attacker creates process A and B
        2. B->A send task port Bt
        3. A->XNU request IOKit framebuffer client for Bt
        4. A ditches Bt, retains client
        5. Kill B; Bt in client now dangling
        6. Trigger creation of privileged C, unrelated to A & B
        7. C inherits memory once used by Bt
        8. A use retained framebuffer client to write C's memory
    

What's important to understand is that this is not just a single UAF, but a
pattern of UAFs scattered throughout XNU.
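
To make the dangling-reference step concrete, here is a toy model in Python. It is not XNU code: `ToyKernel`, its slot allocator, and `run_scenario` are all invented for illustration, and the only assumption carried over from the steps above is that a freed task's memory can be handed to the next task created. The `retain` variant is there purely for contrast and is not a claim about how Apple fixed anything.

```python
class ToyKernel:
    """Toy slot allocator: a freed slot is handed to the next allocation."""

    def __init__(self):
        self.slots = []  # slot index -> task dict, or None when free
        self.free = []   # indices of freed slots, reused first

    def create_task(self, name, privileged=False):
        task = {"name": name, "privileged": privileged, "refs": 1}
        if self.free:
            index = self.free.pop()
            self.slots[index] = task
        else:
            self.slots.append(task)
            index = len(self.slots) - 1
        return index

    def retain(self, index):
        self.slots[index]["refs"] += 1

    def release(self, index):
        task = self.slots[index]
        task["refs"] -= 1
        if task["refs"] == 0:
            self.slots[index] = None
            self.free.append(index)


def run_scenario(driver_retains):
    """Return the name of the task the driver's saved reference ends up pointing at."""
    kernel = ToyKernel()
    kernel.create_task("A")
    b = kernel.create_task("B")
    saved = b                      # the "driver" stashes a raw reference to B's task
    if driver_retains:
        kernel.retain(b)           # bump the refcount while the reference is held
    kernel.release(b)              # B is killed
    kernel.create_task("C", privileged=True)
    return kernel.slots[saved]["name"]
```

In this model `run_scenario(False)` ends with the saved reference pointing at the privileged task C, while `run_scenario(True)` keeps B's slot alive so C lands somewhere else.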

Fix: at step 3, check to make sure the task being given to IOKit is owned by
the task making the IOKit request.

Bug 2: IOKit drivers cache task details on the heap; the lifetime of that
cached task is the lifetime of the IOKit kernel object, not of the program
that made the request. In particular: if you execve() an SUID, the task_t is
repurposed.

    
    
        1. Attacker creates process A and B
        2. B->XNU request IOKit framebuffer for Bt, Bc
        3. B->A send client Bc
        4. B execve /bin/su. B is now running as root.
        5. A use retained framebuffer client to write B's memory
    

The tricky thing here is that this isn't just one bug, but a pattern of bugs:
every place where a driver stashes a task_t on the heap and exposes
functionality through a passable object is a place where colluding processes
can potentially take advantage of SUIDs to raise privileges.

Fix: Lifetime of IOKit clients now tied to lifetime of creating process.

Bug 3: Even if a driver doesn't save a task_t on the heap, they're saved on
the stack during the servicing of system calls and kernel mach message
handlers, so there are race conditions.

    
    
        1. Attacker creates process A and B
        2. B->A send task port Bt
        3a. A->XNU task_threads(Bt), retrieving thread ports for Bt
        3b. (simultaneously) execve /bin/su. B is now running as root.
        4a. task_threads converts Bt to a task_t
        4b. execve modifies the same task_t to replace thread ports
        4c. task_threads retrieves the (now privileged) thread ports.
        5. A uses thread ports to overwrite registers and take control of B.
    

Fix: Kernel objects now check to see if a task_t has been touched by execve
before returning them to userland. Even if you win the race, that failsafe
prevents the kernel from giving you privileged objects.

Bug 4: You don't need the kernel to give you a privileged object directly; all
you need is to be able to influence a privileged object.

    
    
        1. Attacker creates process A and B
        2. B->A send task port Bt
        3a. A->XNU task_set_exception_port(Bt), wiring A to B's exceptions
        3b. (simultaneously) execve /bin/su with rlimited stack. B is now running as root, briefly.
        4a. task_set_exception_port converts Bt to a task_t
        4b. execve modifies the same task_t to replace thread ports
        4c. task_set_exception_port rewrites the exception port.
        5. stack access in B, running /bin/su as root, causes a SEGV
        6. XNU generates an exception message, passing with it the thread ports, to A    
        7. A uses thread ports to overwrite registers and take control of B.
    

Fix: table flip. Rewrite execve so it generates entirely new task_ts when
loading binaries, rather than repurposing old task_t.
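
A toy Python model of why that last fix closes the whole class, with every name (`Task`, `execve_in_place`, `execve_fresh`, `stale_handle_reaches_root`) invented for illustration rather than taken from XNU. The only thing it assumes is the difference stated above: the old execve mutates the existing task object, the new one creates a fresh object and retires the old one.

```python
class Task:
    def __init__(self, uid):
        self.uid = uid
        self.active = True


def execve_in_place(task, new_uid):
    """Old behaviour in this model: the same object is repurposed."""
    task.uid = new_uid
    return task


def execve_fresh(task, new_uid):
    """Reworked behaviour in this model: a new object replaces the old one."""
    task.active = False
    return Task(new_uid)


def stale_handle_reaches_root(execve):
    """Does a reference taken before the execve end up naming a live root task?"""
    b = Task(uid=501)
    handle = b          # A's reference, obtained before B execs
    b = execve(b, 0)    # B execs a setuid-root binary
    return handle.active and handle.uid == 0
```

With `execve_in_place` the stale handle silently becomes a handle to a root task, no matter when the holder decides to use it. With `execve_fresh` it only ever names the dead pre-exec task, so there is nothing left to race against.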

This is all pretty magnificent. What's best about it is that it totally
justifies the title of the post: pretty much every place in XNU where they
save a task_t creates a TOCTTOU bug.

~~~
0x0
In particular about "pattern of UAFs scattered throughout XNU", there was
missing memory management of task_t references in the _sample code for kext
drivers_. So it wouldn't be enough to just add the missing retain calls in the
Apple XNU kexts, because there may be an unknown number of third party kexts
out there. Perhaps not as many as windows has device drivers, but it's still
the same type of thing. Can you imagine if every windows device driver turns
out to have copy-pasted privesc bugs?

In fact I think there was a similar bug-in-the-templates requiring a world-
wide recompile in Microsoft MFC/ATL, perhaps it was this one
[https://blogs.msdn.microsoft.com/vcblog/2009/08/05/active-te...](https://blogs.msdn.microsoft.com/vcblog/2009/08/05/active-template-library-atl-security-updates/)

------
saynsedit
The sad reality is that black hats have been exploiting this class of bugs for
_years_.

~~~
45h34jh53k4j
GPZ finds an entirely new class of vulnerability, Apple takes 4 months to patch
and resolve. And you claim this has been exploited for years. There is 0
evidence of this, and such a claim demands proof.

I would be happy to apologise if you could find one example of exploitation
prior to a few days ago when it became public.

~~~
saynsedit
The point of the 0-day black market is to not reveal these attacks publicly.
If there were public proof of this in the past it would have been fixed in the
past.

Take my word for it when I say there are upper echelons of black hats that are
stockpiling unknown 0-day exploits like this and presently using them in the
wild.

Or dismiss me as irrational and continue with the belief that all bugs are
unknown until white hats share them with Apple.

~~~
rictic
There is a middle ground between "all bugs are known on the black market" and
"no bugs are known only on the black market."

