Linux kernel: CVE-2017-6074: DCCP double-free vulnerability (local root) (seclists.org)
182 points by QUFB on Feb 22, 2017 | 94 comments

  $ echo "install dccp /bin/true" | sudo tee -a /etc/modprobe.d/dccp.conf
  $ sudo rmmod dccp   # in case it's already loaded
This is a good idea for any modules you don't expect to ever need. In my case:

  $ cat /etc/modprobe.d/disabled_modules.conf
  install appletalk /bin/true
  install bluetooth /bin/true
  install cramfs /bin/true
  install dccp /bin/true
  install firewire-core /bin/true
  install tipc /bin/true
  install udf /bin/true
  install usb-storage /bin/true
  install vfat /bin/true
The original recommendations for this -- as I first encountered them -- were in some "hardening guide" for RHEL (CIS, NSA, etc.), although I don't remember which.

See also the "modprobe.blacklist=" kernel parameter, which you'll have to use for "modules" that are compiled into the kernel itself (i.e., they are not actually loadable kernel modules).

15 years ago, when building your own kernels was a normal everyday thing, I simply built my kernels with everything compiled in and modules disabled. This (would have) prevented attacks such as kernel-level rootkits.

In addition, "one neat trick" was that you could halt (not poweroff) the machine (!) -- such as in the case of a Linux box simply acting as a router/firewall -- and the kernel would still be running. Good luck compromising that! :-)

This is a useful post. I'm going to nerd out at you though and point out that disabling LKMs doesn't really do anything to disable kernel-resident rootkits; any kernel memory corruption vulnerability can be used to create an ad-hoc module loader.

If you Google for amodload, you can find (uuencoded, in a Bugtraq message --- it's 2017 and I just uudecoded something!) Josh Daymont's old SunOS 4.1.3 modloader hack, which overwrote a kernel socket callback and used socket buffers to hold module text.

> This is a useful post. I'm going to nerd out at you though and point out that disabling LKMs doesn't really do anything to disable kernel-resident rootkits; ...

I really hate to question you but... are you sure?

The particular case that I'm thinking of was probably ~15 years ago. There was some 0-day or worm or something (details are a little vague now, obviously) that would install a kernel module and then load it to hide itself.

I can recall a discussion at the time and this was a recommendation. It wasn't just disabling individual modules -- it was actually disabling support for loading kernel modules. That is, whatever the kernel itself needed in order to support dynamic loading of modules at run-time was removed; the kernel was simply not able to load a module.

Like I said, it was probably 15 years ago and I'm sure I'm not remembering everything 100% correctly, so you may very well be correct... but I'm almost certain you could entirely disable loading of kernel modules (the kernel itself didn't support loading them).

Googling "amodload" now, thanks for the pointer...

Surely if you can write to /dev/mem, it doesn't matter whether your kernel has support for modules or not. You could just create a rootkit which is not in the form of a loadable module.

EDIT: Here's an example of this very thing. https://www.blackhat.com/presentations/bh-europe-09/Lineberr...

Check out amodload; that was what was so interesting about it at the time: it built its own module loader back into a kernel compiled without loadable modules. Of course: table stakes for modern rootkits!

Thanks for the replies and pointers, Thomas. You've given me even more to read and learn about.

The kernel has function pointers and dynamic memory. You can "build" a module if you need to.

Interesting. Thanks.

Great advice. Using /bin/false also works, as I found on the Arch Wiki. Do you know what the difference is?

  modprobe dccp   # with /bin/false, this fails with:
  modprobe: ERROR: Error running install command for dccp
  modprobe: ERROR: could not insert 'dccp': Operation not permitted

/bin/true sets the return code to 0, signifying success to the modprobe process; /bin/false sets it to 1, signifying failure and producing said message.

Due to the way modprobe may be hooked by various things, it's best to use /bin/true to ensure less breakage.

Like throwaway2048 mentioned, the only difference is the return code.

I think the original documentation described them something like this: /bin/true does nothing, successfully. /bin/false does nothing, unsuccessfully.

Sounds like you're referring to STIG.

Among others, yes.

Isn't VFAT required for UEFI?

Wouldn't it only be required if you wanted to mount the UEFI system partition? (Which is formatted in a variant of the FAT FS, IIRC.) If you don't want to mount the system partition, you shouldn't need the module. (Although, my system puts kernel images there. IDK if this is typical of UEFI systems. IIRC, Grub is capable of loading from ext FSs, so this shouldn't be a hard-and-fast requirement.)

Well, even if kernel images didn't go there, you would still need to access it for grub updates. So it might not be a good recommendation for everyday users.

No idea, that particular host doesn't use UEFI.

This is a good reason for systems running untrusted code to disable module automatic loading. Almost nobody uses DCCP, and as a result, almost nobody looks at the DCCP code, writes bad userspace apps that trigger kernel bugs that get debugged, etc. We rarely see double-frees in the TCP or UDP implementations.

On my Debian kernel, CONFIG_IP_DCCP is set to "m" (in /boot/config-`uname -r`), which means that DCCP support is built as a module. The code isn't loaded until the first program tries to call socket(...IPPROTO_DCCP). At that point, the kernel will look at /proc/sys/kernel/modprobe and run that program, /sbin/modprobe by default, to load dccp.ko.

Automatic module loading is great when e.g. udev runs and detects what hardware you have, but it's probably not something you'd ever need once a system has completed boot. A very simple hardening measure for machines running untrusted unprivileged code is to echo /bin/false > /proc/sys/kernel/modprobe, late in the boot process (e.g., in /etc/rc.local).

The downside is that system administrator won't be able to run tools that require loading modules, of which iptables is probably the most notable one. A better option than /bin/false is a shell script that logs its arguments to syslog, e.g., `logger -p authpriv.info -- "Refused modprobe $*"`. The sysadmin can manually run modprobe on whatever module name got syslogged (or temporarily set /proc/sys/kernel/modprobe back to /sbin/modprobe). And you can alert on that syslog line to see if there's an attack in progress.

(Does anyone know if it's possible to disable module auto-loading for a tree/namespace of processes, e.g. a Docker container, but keep it working for the base system?)

...or just roll your own kernel with CONFIG_MODULES=n.

If you only need a specific set of modules, you might as well just build them in and just forget LKMs altogether.

It's not very convenient when you run into some obscure hardware/fs/protocol, but depending on your use case, that might not be an issue (and even in the rare case where it does come up, it's not the end of the world – a minimal kernel build only takes a couple of minutes on a reasonably fast machine).

The only case where LKMs really seem necessary (besides in generic distro kernels) is for module development/debugging, but you're probably not going to do that with a production kernel anyway.

The last thing I want is to be manually rolling package updates for kernels. That's one of the reasons I use a distro in the first place, to offload testing to a dedicated team of people. If I was still interested in that, I would run Linux From Scratch again.

As soon as you've administered 10+ servers at the same time, your patience for variation from the norm and manual patch building goes way down.

Once you've administered 100+ servers at the same time, any thoughts of doing that go right out the window, along with the person that brought it up.

At that point, any changes are diffs applied to the original package (or a brand new package from scratch) that you put in your local repo. Kernels actually get updates fairly often, which means you either have a lot of work, or you ignore the non-essential updates. Neither are ideal.

Yes. Updated packages for this CVE are already out, and you can just `apt-get dist-upgrade && reboot`. Unless you're prepared to invest organizational effort in a kernel build process, the amount of security you gain from running a custom kernel, in exploits that don't affect you, is vastly outweighed by the amount of security you lose when an exploit does affect you and you have to get around to doing a local kernel build. Good luck if the person who usually does it is sick that day.

It's certainly possible to build infrastructure to automate compiling, testing, and pushing out a new kernel, but very few organizations are going to justify that much development effort just for security reasons. If you're already building your own kernels because you have other technical reasons for it, and therefore have already put this effort in, then yes, just turn off CONFIG_IP_DCCP and call it done.

I can see why it might be a burden when you're dealing with hundreds of servers, but on a personal machine, where the occasional disruption won't get you sacked, it's pretty straightforward (though I say this as someone who follows kernel development as a hobby and has the time to track down bugs).

On the other hand, for a personal machine, caring about local root exploits is almost certainly outside your threat model. You have a https://xkcd.com/1200/ architecture, where everything other than software updates is running as uid 1000. The things you actually care about, your emails, your IMs, your tax documents, etc., are all accessible to uid 1000. Any random malware you might download will run as uid 1000. root is honestly a less interesting target. And the non-root account runs sudo often enough that an attacker with access to your account can get to root with a bit of patience, anyway, no kernel exploits required.

I used to maintain a laptop with two user accounts, one of which I used for running sudo and doing important work, and one of which had the Flash and Java plugins enabled and was used for Pandora, YouTube, etc. It sorta worked, but it was a pain, and I eventually gave up on it. If you do have a setup like this, then caring about local root exploits starts to make a bit of sense.

I now have a Chromebook, which sandboxes any attacker-controlled executable code on the machine. If you actually care about the security of your personal computer, do that, or get Qubes or something—and just use the vendor's provided OS and keep it up-to-date.

Multiple accounts can be quite usable if you get the separation right. Separate uids for personal emails/banking and porn browsing should be a given, at least.

At that point, why stop at separate uids and not just use separate virtual servers? It's a bit more costly in space and RAM (when running), but it's pretty good at reducing the attack surface. If you really want to be paranoid, restore from a snapshot every time you start the VM, and occasionally start it just to update it and create a new snapshot. Even if an attacker does get a local account, they have to do something useful with it before you close the virt, essentially destroying anything local they've set up.

firejail [0] looks interesting, though I haven't played with it yet.

[0]: https://firejail.wordpress.com/

Honestly, this is all completely a red herring if you ask me. The real solution is to neutralize the attack vectors by identifying the means -- not to waste countless cycles playing whack-a-mole by recompiling and rebooting your kernel all the time.

The best back-of-the-napkin estimate for "how many bugs does my software have?" is a linear function of your LOC. You have 2x as many LOC as you did yesterday? Constant factors aside, it's reasonable to assume you have 2x as many bugs as you did yesterday.

Now you are dealing with a system (Linux) that has almost no self-protection features to stop attack methods. You are also dealing with a language in which a single error like a double free is not only very easy to make, but can lead to full system compromise. The whole system is tens of millions of LOC. It is not hard to see that this approach of playing "recompile every week" is going to scale very poorly overall, and it cannot be done by everyone (due to logistic and cognitive overhead).

Furthermore, stopping actual attack vectors has massive bonuses compared to the whack-a-mole game. For one, it protects against current threats only discovered in the future. This vulnerability is a decade old. There are likely on the order of _millions_ of machines with affected kernels. Many of these machines likely will never get upgraded again (vendor kernels, EOL, whatever).

You won't be able to set CONFIG_MODULES=n on the 8 year old Linux-based router/firewall some SMB has running (after a shitty Outlook server employee password gets hacked, and someone downloads a still-active VPN certificate in their corporate email and logs in to begin pivoting -- and it isn't long till they can persist on something like that if they see it).

Second, along the same lines -- whack-a-mole does nothing to prevent targeted attacks. A theoretical targeted attacker is going to be vastly more capable than someone running a pre-canned exploit -- you cannot assume they are somehow both A) competent enough to pull off a deeply targeted attack, yet B) too stupid to get past basic measures like `CONFIG_MODULES=n`.

A recompiled kernel is not going to stop a dedicated attacker from finding a stable exploit in the other N-million lines of code inside Linux. You're playing an unwinnable game in this scenario, where the adversary only needs time. So both of these methods fail: in the small scale, it will fail for the vast majority of already-affected systems. On the large scale for targeted attacks, it will fail to truly competent attackers.

Sitting around and investing in massive kernel building infrastructure can be completely obviated by doing one thing: running grsecurity. Or improving Linux's real self-protection/security features. Only one of those is viable at the moment.

Then, none of these attacks will work. While there is the aspect that exploits may not be tuned to attack grsecurity -- it also fundamentally makes many attack vectors impossible. This should be the goal -- to make exploits almost pointlessly difficult, even with a bug at hand. Even with 5 bugs it should be maddeningly difficult. For example, it completely stops refcount based overflows. PAX_MEMORY_SANITIZE would stop the most major attack vector of this particular bug -- the ability to write or invoke a function pointer through a UAF on the affected, allocated block. UDEREF and KERNEXEC stop almost every major userland/kernel-land cross execution attack, especially the most trivial ones which you see relatively often -- and it works on every platform you can think of (where SMAP/SMEP-like features are limited to people who have new computers, Intel only).

We're clearly trying the whack-a-mole game now. CONFIG_MODULES=n is just another version of it. It isn't working.

I'm familiar with grsec (having an interest, albeit not as an expert, in some of the issues it addresses, and as a long-time Hardened Gentoo user), but that doesn't override my point that 'CONFIG_MODULES=n' seems neater than 'echo /bin/false > /proc/sys/kernel/modprobe'. If you only need a predefined set of modules and have no need to load them after boot, why not just disable the facility altogether?

Because nobody is actually going to set CONFIG_MODULES=n as any kind of "security measure", first off -- or really ever do it at all. And if CONFIG_MODULES=n is a completely separate issue from security -- just "don't build what you don't need" -- why did we ever bring it up in this thread at all? Clearly there is some connection: the idea that less code is better. Don't allow flexibility where none is needed; the principle of least power. These two things -- minimizing code and hardening against attacks -- are very much intertwined in how we think about all of this, which is exactly why a clearly security-focused piece of kit like grsecurity is relevant to your point. Dig on that.

My argument is that it seems like doing this helps. It obviously means less LOC, right? So clearly it's strictly less attack surface. That's clearly better, no questions asked, right? I mean, it's not anti-memory-corruption. But that's why they aren't the same! So they aren't connected in that sense, right? But it isn't attacking the real root problem. The root problem is extremely important in this case, because millions and millions of computers are impacted by it. You can't kick-the-can forever.

You're saying "why not just disable it", but you need to ask yourself another question first: will anyone actually do that, anyway? I mean, aside from massive nerds like us -- with free time. Another question still: can the problem be solved without this, without requiring more than is necessary while covering every perceivable use? The answer is: aside from people running kernels on their laptops, nobody at scale -- unless they have very specific needs or resources -- is going to do this. That ship sailed long ago. And you can definitely solve this without hacking your kernel config.

It also does not address the root issue. So we all turn off CONFIG_MODULES in every distro. What's next? The next major weak point in Linux's infrastructure? Then everyone sets CONFIG_FOO=n in order to avoid everything until Linux rolls in 10-million more lines of code, then there's CONFIG_BAZ=n to set? This is why it's whack-a-mole. BTW: remember the other millions of devices _for which you cannot turn off CONFIG_MODULES_? You won't be able to turn off CONFIG_BAZ in 5 years either, I'm afraid. Because you won't be able to on your shitty router.

Let me put it this way: if Linux had a vulnerability in its ethernet stack and it seemed to be a source of problems continuously -- is the answer turning off the ethernet stack for everyone? No. It is to find the root cause.

If browsers are attacked relentlessly through vulnerabilities -- do we stop using the web forever and everyone just deletes their browser? No. Do we tell every user they're doing it wrong by "trusting the browser" when they should "know" software has flaws? No -- we instead engineered browsers to be as resilient as possible in the face of a very, very hostile internet. Chrome is an example that we have made real progress here. There is more to do. Why do this? Because this is simply the way most people use the internet: with a browser. With an ethernet stack. With CONFIG_MODULES=y.

Try using the "5 W's" here:

Why are so many vulnerabilities here? What trigger mechanisms do people use to attack it? Where are they introduced? When are they exploitable?

The reality is: a lot of the answers to these questions have very little to do with module autoloading.

Maybe there are lots of vulnerabilities because the code was not extensively hardened, or designed around the scenarios it is used in now (user namespaces are a good example). Perhaps people very often use the same trigger mechanisms for a payload (for example, overwriting a function-pointer struct, such as the ones that handle socket callbacks or read() syscalls). It might be that they were introduced long ago. They might be actively and reliably exploitable right now, or maybe very difficult to exploit in any realistic scenario.

These are all very common attack scenarios for Linux exploits, historically, over the past few years. Global function pointers overwritten trivially (because there was no `__ro_after_init` until 4.8+). Simple payloads because there wasn't even a way to block `commit_creds` for a long time. The exact same triggers: refcounts, UAFs, double frees.

This is beyond autoloading. It is a process problem.

Again, "rebuild your kernel with a config option changed" does not logistically scale for the vast majority of people. It's basically a non-starter. People will go with their distro kernel instead (as they very likely should, to be honest).

Just to be clear: I am not saying CONFIG_MODULES=n is bad -- by all means, turn it off. If it makes you feel good, speeds up your kernel builds -- whatever. But it does not really address the root problem here, so suggesting it is a red herring; simply turning off this stuff is just a bandaid, it doesn't address actual attack vectors. And if you're bringing up CONFIG_MODULES=n not as a security measure but as a way to just "reduce attack surface via LOC" -- those are the same thing! It's just not an actually good security measure. It doesn't scale.

I suppose my point is there's a bigger gap here. We think very "simplistic" things like this are good, because they "strictly must improve everything and are justified by that", but often they treat symptoms and not the real disease. They don't really stop the attacks. They "shake the table" a little, to quote someone on Twitter recently. They can even increase complexity and make real defenses more brittle.

FWIW: Here's an example of that, where these kinds of "soft" mitigations actually backfire - that SLAB randomization code is only a small obstacle for a skilled attacker (and most kernel attackers _will_ be skilled): https://github.com/torvalds/linux/commit/c4e490cf148e85ead0d... -- so, given this: why exactly are we paying the cost of these regressions, of half-baked fixes that do not stop SLAB-based exploits and real attackers? If my kernel is going to have a shitty, bizarre regression, I might as well get actual security out of it.

There's a patchset called "Timgad" floating around to do exactly this.

Not what you ask, but you can blacklist that module with a fake install.


> Does anyone know if it's possible to disable module auto-loading for a tree/namespace of processes, e.g. a Docker container, but keep it working for the base system?

Block CAP_SYS_MODULE or filter init_module syscalls. Container software should have this enabled as a sensible default.

That is the default in Docker.

This is module auto-loading, i.e., you don't need CAP_SYS_MODULE, nor do you make an init_module syscall. The only syscall you make is socket(...IPPROTO_DCCP), and the kernel asks /sbin/modprobe (which it runs with full privileges) to load the appropriate module. The userspace program does not have any privileges.

Does Docker actually block that? amluto's comment about the Timgad LSM implies that such functionality doesn't exist yet. (And the replies on LKML show the same confusion between explicit module loading, which requires privilege, and auto-loading, which does not. https://lkml.org/lkml/2017/2/6/279)

The easy way to try this is to lsmod and look for dccp and dccp_ipv4, `modprobe -r` them if they're already there, run

    python -c 'from socket import *; socket(AF_INET, 6, 33)'
in your container, and lsmod again and see if they're present.

Yeah true. Docker master has a seccomp profile that blocks this socket family though.

Simpler: statically compile in everything you know your hardware requires. No modules

    make menuconfig && make dep && make clean && make bzImage 
And build a package from your new shiny kernel...

That's really good advice, thank you.

root can undo that with echo /sbin/modprobe > /proc/sys/kernel/modprobe at any time so you are better off using:

echo 1 > /proc/sys/kernel/modules_disabled

If you're root, a local privilege escalation isn't going to get you any more privileges than you already have.

If you're root in a container, but not root on the outside system, you shouldn't be able to write to /proc/sys/kernel/modprobe, no?

I just wanted to mention that the path to modprobe is something reversible (containers aside) if the sysadmin wants autoloading. /proc/sys/kernel/modprobe is not writable from a container.

Disabling module loading is not reversible, you need to reboot.

Does this work in a container that shares a kernel with the host?

No, /proc/sys/kernel/{modules_disabled,modprobe} are not writable from a container. Tested with LXD on Ubuntu 16.04.

The diff from the patch:

  - goto discard;
  + consume_skb(skb);
  + return 0;
One of the rare cases in the wild where a goto really was considered harmful! ;-)

Rare? What about goto fail?

in 'goto fail', the problem looked like:

    if (foo() != 0)
      goto fail;
      goto fail;
That extra line of code could just as likely have been a return statement or anything else. The problem was not with the goto statement itself.

Edited to add: 'goto fail' is a valid construct and can be used to handle finalization in C functions. Consider:

    int foo(char *bar, int baz) {
      sometype_t *obj = NULL;
      int fd = -1;
      if ((obj = sometype_new(bar, baz)) == NULL) {
        goto fail;
      }
      if ((fd = open(sometype_path(obj), O_RDONLY)) < 0) {
        goto fail;
      }
      /* ... */
      return 0;
    fail:
      if (fd >= 0) {
        close(fd);
      }
      if (obj != NULL) {
        sometype_free(obj);
      }
      return -1;
    }
There's nothing inherently wrong about that.

It is often much clearer to use goto than using a wrapper and switching on enum'ed return value to destruct a failed transaction.

"rare", not "non-existent". "goto fail", while well-known, is only one occurrence.

That was 3 years ago. And kernel code is covered in gotos.

Tbf, the return acts like a goto.

Here's an overview of DCCP I wrote if anyone is interested: https://www.anmolsarma.in/post/dccp/

The below should be followed at the user's own risk. Perhaps someone can confirm if the following is sane.

For Ubuntu versions:

  Ubuntu 12.04.3 LTS
  Ubuntu 14.04.5 LTS
The following should get you patched up, you will need a reboot though:

  sudo apt-get update
  sudo apt-get install linux-headers-3.13.0-110 linux-headers-3.13.0-110-generic linux-image-3.13.0-110-generic
After reboot:

  uname -a
  Linux hostname 3.13.0-110-generic #157-Ubuntu
Any Digital Ocean users may need to power down their droplet and switch the kernel to the following version:

  DigitalOcean GrubLoader v0.2 (20160714) Ubuntu
Without this the new kernel may not be used.

Digital Ocean hasn't required that you change the kernel through the UI for some time now. Are you sure that's necessary?

It's only necessary for older servers (e.g. Ubuntu 12.04 and 14.04) where people haven't done this already.

you can also use the API for this https://developers.digitalocean.com/documentation/v2/#change...

for example for precise the kernel ID is 7515 for the "DigitalOcean GrubLoader v0.2 (20160714) Ubuntu" kernel. If you do this be sure to double-check the id by listing the kernels using the API though. You can also do this with libcloud using 'ex_change_kernel' from the 'DigitalOcean_v2_NodeDriver'

RedHat: https://bugzilla.redhat.com/show_bug.cgi?id=1423071

Edit: I'm looking for something for CentOS, but I can't even find a tracker where they put CVEs?

CentOS should get their stuff when RedHat is finished, as is usually the case, I think.

Yeah, just look for a kernel package with the same name (including minor version numbers) as the upstream RedHat one. You can find info on RedHat Errata here[1]. CentOS looks to have some form of errata listing (possibly non-official) here[2].

You could theoretically locate the SRPM for the kernel and build from that, but I haven't done that in years, and the last I heard RedHat was going to make that harder because they weren't happy Oracle was copying their work wholesale and then selling it as a competitor, so that may be harder now?

Redhat is finished, CentOS is probably just lagging a little bit.

1: https://rhn.redhat.com/errata/

2: http://centoserrata.nagater.net/list.html

I use RHEL6 at work, so we already get errata notifications :).

Already fixed on ubuntu

linux (4.4.0-64.85) xenial; urgency=low

  * CVE-2017-6074 (LP: #1665935)
    - dccp: fix freeing skb too early for IPV6_RECVPKTINFO

 -- Stefan Bader <stefan.bader@canonical.com>  Mon, 20 Feb 2017 10:06:47

Let me see if I got this right: this can be used to DoS a system by consuming all free memory?

In the CVE it almost hints that this specifically is UDP related.

Am I right in thinking this?

This is a double-free vulnerability. In userspace, this would basically be

    void *x = malloc(16);
    free(x);
    free(x);  /* double free */
When malloc makes the allocation, it actually allocates a bit more than 16 bytes so it knows how much to free. Usually what it'll do is store a structure right before the pointer it returns; for instance, it might allocate 24 bytes, use the first 8 bytes to store the number 24, and then return the rest of that allocation. Then, when free is called, it subtracts 8 from what it was given, and frees that many bytes. (More complicated malloc implementations will have more complicated structures, possibly including pointers to other, shared structures.)

If you call free a second time on the same pointer, a lot of things could happen. In particular, if there was another call to malloc() in between, it might reuse that same location, and you might free that. Or if some other sequence of allocations happened, there might be a completely nonsensical number stored at x-8.

If the second call "frees" data that's still in use, two parts of the program will be writing to the same memory. Maybe one thing is storing the return value of a call, and the other thing is storing the current UID. So when the first thing writes 0, now you're UID 0.

Alternatively, if the second call frees something nonsensical, you might "free" memory that was never the allocator's in the first place - and cause the next call to malloc() to return a pointer to something important elsewhere in memory, perhaps even the program's stack frame. If the program then reads attacker-controlled data into that newly-allocated memory, the attacker takes over the program's control flow.

DCCP is a protocol that is kind of like UDP, in that it's message-oriented, but kind of like TCP, in that it's reliable and implements congestion control. It's not very common, and uncommon networking protocols have been a historic good source of vulnerabilities in the Linux kernel.

> but kind of like TCP, in that it's reliable and implements congestion control

DCCP implements congestion control but is not reliable.

I assume the reason why this can happen is that the default implementation of malloc/free doesn't check if the pointer has already been freed, right? And the reason it doesn't check is performance?

You can take a variety of approaches to fighting this but basically, TL;DR yes, it'll cost you somewhere.

OpenBSD's malloc implementation for example actually will let you enable an option that instead marks unused pages in the freelist maintained by malloc as PROT_NONE, i.e. reuse of a page will cause a segfault.

However, there are just various tradeoffs here. For example, OpenBSD's malloc does not guarantee that a `free` will immediately cause a freelist entry to become unused (AFAICS reading the man page -- option `F` vs `U`). So it will not immediately be marked as PROT_NONE, but only once a number of unused entries get 'flushed'. This is (presumably) an optimization so malloc can keep unused chunks around for quick-reuse, without having to do remappings.

In the case of the Linux kernel, obviously it is using a different allocator, but the same basic ideas apply.

For a protection against this particular exploit, you can look to grsecurity. In particular, I believe PAX_MEMORY_SANITIZE nullifies the major attack vector of this bug, because it erases slab objects/pages immediately upon free. If a freed object contained a pointer (the traditional mode of attack requires dereferencing it after redirecting it at attacker-controlled code), PAX_MEMORY_SANITIZE's scrubbing will trip on the use of the freed pointer and kill the kernel task.

Other techniques for exploiting this are probably handled by similar defenses.

(So I think this is like, kernel exploit #93,234,893 stopped immediately by a grsecurity feature?)

EDIT: mikeash also points out another good bit I forgot, which is that the basic defenses don't immediately catch a kind of 'ABA-style' double free. e.g. you free, something else gets the allocation instantly after, then you cause another free again. You'll only crash later.

But nullifying the attack vector is what's really important here, which is: you need to stop this from extending into an (almost arbitrary) read/write vector.

I'm not sure it would catch all the double free errors anyway. Imagine a case where we have

    x_1 = malloc(n)
    free(x_1)
    x_2 = malloc(n) // malloc reuses the address, i.e. x_1 == x_2

How would the runtime know the difference between a double free of x_1 and freeing x_1 then later x_2? And now we think x_2 points to valid data when it doesn't, which can lead to a use-after-free error.

How do you detect a "pointer that has already been freed"?

You could keep a separate list of all live allocations. If free() is called with a pointer that's not on the list, it's an error.

This won't immediately detect the case where you free, something else mallocs in that exact spot, and then you free again. It will eventually catch it when the something else tries to call free. It will catch cases where the second free call happens before the memory is reused, or when the memory is reused but the new chunk starts earlier.

I assume this isn't done because maintaining and checking the list would cost too much.

If the bookkeeping weren't so massive you could cycle through the entire 64-bit address space before re-issuing the same memory address. That would be similar to a bump allocator; once all the allocations on a page were freed you could unmap it so except for pathological cases you couldn't explode the number of page table entries too horribly.

At 1 GB/sec of allocations it would take 500 years to reach the end of the 64-bit address space in the kernel. (It's actually 544 years ignoring leap years and leap seconds, but subtract 44 years for reserved address ranges).

The benefit would be making it impossible to use-after-free because the same address would never be re-used.

The problem isn't the bookkeeping. It's the massive implicit memory leak that you get when there's only one 4-byte allocation left on your 4-KB page.

That's what I meant by "except for pathological cases". Existing memory allocators have the same problems with fragmentation.

One might reasonably describe the 4kB granularity of virtual memory as bookkeeping overhead, although it's probably not what people would usually think of if they heard that.

The answer to every problem in computer science is another level of indirection.

Pointers point to descriptors that point to allocations. Of course, you can't dynamically manage descriptors without reintroducing the same problem, but the answer to every problem in computer science...

You overwrite the page you just freed (e.g. point it entirely to a fixed page of junk bytes) or mark it as PROT_NONE immediately upon free().

It says in the post that you can execute arbitrary code in the kernel. So, full compromise of the system. (In the case of Docker, perhaps you can even escape to the host system?)

AFAIK Docker is still not considered ready to run untrusted code on shared machines. If you want to run untrusted code, you should just use a VM anyway.

Even with a VM you need to be somewhat careful as hypervisor breakouts are not unknown.


That said, the Linux kernel is a helluva lot more attack surface than the relatively few hypercalls provided by most Type 1 hypervisors.

Should be able to escape docker with this. Docker provides host kernel access, so if you get the kernel you're good to go.

DCCP is something even a basic hardening should already have taken care of... but of course many people don't do those.

Quick solution: "echo "install dccp /bin/true" >> /etc/modprobe.d/modprobe.conf"

what would be included in a "basic hardening"?

The "Center for Internet Security" Benchmarks [0] -- such as the one for Ubuntu 16.04 (PDF) [1] -- address your question.

The DISA IASE [2] publishes hundreds of "Security Technical Implementation Guides" (STIGs) for various operating systems (including several Linux distributions [3]), software applications, networking devices, and so on.

The OpenSCAP [4] Security Guides [5] are also a good reference. They are primarily aimed at compliance w/ security requirements (C2S, PCI-DSS, USGCB) -- for example, the "U.S. Government Commercial Cloud Services (C2S)" [6].

I don't 100% fully implement any of these, but I have a lot of RHEL and CentOS boxes publicly accessible on the Internet and they don't become accessible until they've had the majority of these recommendations implemented.

P.S. Look around. There are tons of shell scripts, Ansible roles, Puppet modules, etc., that will take care of 90% of this for you. There's no excuse for having public-facing machines that aren't locked down.

[0]: https://benchmarks.cisecurity.org/downloads/benchmarks/

[1]: https://benchmarks.cisecurity.org/tools2/linux/cis_ubuntu_li...

[2]: http://iase.disa.mil/Pages/index.aspx

[3]: http://iase.disa.mil/stigs/os/unix-linux/Pages/index.aspx

[4]: https://github.com/OpenSCAP/scap-security-guide/

[5]: https://www.open-scap.org/security-policies/choosing-policy/

[6]: http://static.open-scap.org/ssg-guides/ssg-rhel7-guide-C2S.h...

Thanks, very interesting reads.

Makes one wonder why these things aren't the default on a Linux distro...

As an embedded software developer it's a bit inconvenient that it's all focused towards the big distributions (Ubuntu, RHEL, ...), but still interesting input for securing an embedded Linux device :-)

While these guides are aimed at specific distributions, many (most?) of the concepts "translate" quite easily to other flavors of Linux. sysctl's, for example, might be configured in a different file but they will still exist and work the same (disclaimer: usually!).

syzkaller strikes again!

Is Gentoo hardened vulnerable too?

It looks legit.

I can't wait until the Linux kernel is ported to Rust ^^

It's never going to happen

In the meantime, there's Redox OS:


Why isn't there any Cryptocurrency for fuzzing well-known softwares? https://security.stackexchange.com/questions/152036/why-isnt...

Cryptocurrencies need a relatively stable coin generation scheme. Hits from fuzzing are typically very bursty, often produce nothing, and changing the fuzzing scheme slightly can reveal completely different results.

It's just too variable.
