The status of Linux kernel hardening (lwn.net)
214 points by cheiVia0 on Nov 10, 2016 | 75 comments

(I'm on the Mill team :) )

It's good to see classic ISAs moving away from memory-protection 'rings' towards arbitrary 'zones', even if retrofitting it (e.g. SMEP/SMAP) gives horrendous APIs that are a nightmare to keep checked and balanced! ;)

The Mill comes at this from the other direction, starting with 'zones' (termed "turfs" in Mill jargon) and emulating 'rings' (if your kernel wants that) with overlapping access rights between turfs.

On the Mill you can have lots of turfs that may or may not have disjoint memory access, and you move between turfs synchronously with a special kind of indirect function call termed a "portal". There are provisions for passing across specific transient rights to memory in these calls, so you can pass a pointer to a buffer and other aspects that facilitate the 'usercopy()' mentioned in the article but with full hardware rather than software protection.

We have tightened the portal/turf concept extensively since the Security talk http://millcomputing.com/docs/#security but it does give a gentle high-level intro to turfs and portals.

These days, we have facilities for passing buffers without exposing memory pointers, and other niceties to make it easy to write correct yet efficient code. They can now all be made public, but there's oh so little time; I'm hoping to get a white paper out about it by the end of this month. Watch this space ;)

Happy to elaborate if anyone has Mill or general questions :)

PS an example of 'zoning' is http://elfbac.org/ , which is not getting enough attention. It's another way to facilitate memory separation, albeit by abusing the classic MMU and with inherent runtime cost. Elfbac is userspace, but the hardware could be abused to protect kernels on classic CPUs too. Well worth everyone's time to read :)

Can you give us any hint about how much time is there until we can buy a Mill to play with?

From that security presentation, I was left with the idea that you wouldn't want a Linux kernel on the Mill. You'll more likely want something smaller, with a Linux virtualization layer for device drivers. That's because your security layer is extremely flexible, making it possible to push a lot of kernel-space code into some less powerful context while keeping performance the same.

So are you working on a Linux port for it? (Maybe breaking it into pieces in the process?) Or do you intend to start with something else? (Maybe building up from a microkernel?)

I was excited about the Mill when I first read about it 5 years ago. But at this point is there any reason to believe I'll ever get to write code for one? (Even via an emulator or something like that)

(I can't/won't watch videos, so will have missed anything that was only in videos)

We are completely serious about building it. We are working hard towards that goal. The team is 90% industry vets with very long careers designing and making real chips in devices that really ship and are really still used in their billions today, so we know what we're doing and just how long it's going to take. We're bootstrapped, so we can't compete with chipzilla on timescales, but we really desperately want to get Mill goodness out there as quickly as humanly possible :)

we know what we're doing and just how long its going to take

So... how long is it going to take, specifically?

Chisel isn't that hard.

Can we test it out on an FPGA?

The official AARCH64 emulator is a Linux binary-only application. I would be happy to play with a Mill equivalent.

>(I can't/won't watch videos, so will have missed anything that was only in videos)

There's something like 15 hours worth of talks on the Mill. If you don't want to watch any of them then you're missing out on most of the design of the Mill.

Even if the Mill never gets to see the light of day, the talks were quite interesting IMHO.

It's not Mill-specific. I'm happy/frustrated to hear that these talks are good, but that doesn't change anything. I figure/hope that for anything important/valuable enough, someone will eventually write it down.

It would help if the Mill people wrote a peer reviewed paper and produced a lower case ultra small font mVP that people could actually use.

I've been hearing Mill stuff since 2012. They had a weird non LLVM/GCC sorta kinda compiler port. Today in 2016, the toolchain support for way Out Of The Box stuff is great. They could do an LLVM port and a Chisel simulation and have Linux on top of that. But instead we get

  The team is 90% industry vets with very long careers
And no, I'm not going to watch 15 hours of Ivan Godard whinging on. Get off the pot and produce something.

I'm rooting for you guys, I want to buy one! Maybe do a kickstarter once you have a stable chip that could be manufactured?

Looks pretty cool. Out of interest, have you considered adding support for something like the enclaves provided by Intel SGX, e.g. for cloud computing use cases?

Yes, although we put that into our "virtualization" bag which we haven't begun mapping out in detail yet. I do try and keep up with the SGX roadmap and exploits; there have been some truly awesome attacks on it via hyperthreading, for example :) It will be interesting to see what AMD Zen does in this direction too. When the Mill is made, there will be all these orthogonal systems needed for these kind of use-cases available, but they won't pollute the portable ISA proper.

The reality is that the majority of systems out there use distribution vendor supplied kernels. If you are in this camp, note that one of the best things you can do for kernel security in production is run a custom kernel with all of the features you don't need removed.

If you go this route, definitely consider grsec as well.

Reasonably tuning your kernel can also offer speed (eg. via more specific CPU targeting), size and - critically for embedded environments - startup time improvements.
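To make that concrete, here's a minimal sketch of the trimming workflow (assumes you have a kernel source tree; the localmodconfig target has been in mainline since 2.6.32):

```shell
# Build a config containing only what the running system actually uses:
make localmodconfig    # drops every module not currently loaded
make menuconfig        # optionally set "Processor family" to your exact CPU
make -j"$(nproc)"      # build the trimmed kernel
```

Run localmodconfig while all the hardware you care about is attached and its modules are loaded, or their drivers will be dropped from the config.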

> critically for embedded environments - startup time improvements.

This is so true. Back in the day when I was involved in embedded Linux development, the quickest bootup time was about 40 seconds. This was booting a v2.6 kernel in a minimum configuration on an ARM7 system off SPI NAND flash. Hopefully, bootup times are in the subseconds by now. Are we getting to those speeds yet?

"Hopefully, the bootup time is in the subseconds by now. Are we getting to these speed yet?"

You can do that[0] with a minimal system. Actually, it's not that difficult to get boot time down to a few seconds on most embedded systems. But in the real world, boot time depends heavily on the modules and drivers that must be loaded to bring the system up and running.
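If you want to see where the time actually goes, the stock tools already help (systemd-analyze assumes a systemd userland, so it won't apply to every embedded image):

```shell
systemd-analyze time    # split between firmware, loader, kernel and userspace
systemd-analyze blame   # slowest units first
# On any Linux, dmesg timestamps show when the kernel handed off to userspace:
dmesg | grep -i 'freeing unused kernel'
```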


I'm working on a system with some decently advanced functionality, booting a Linux 3.14 kernel on an ARM. Currently I'm getting around ~5 seconds from turning the power on to having the system fully loaded and ready.

I could possibly get it lower but it meets spec and any additional shaving off would probably require plenty of work while sacrificing debug-ability etc. so for now I'm fine where it is :)

> This is so true. Back in the day when I was involve in embedded Linux development, the quickest bootup time was about 40secs. This was booting a v2.6 kernel in a minimum configuration on an ARM7 system over SPI NAND FLASH.

The older NAND file systems were very slow. I've had a similar case where repartitioning the huge NAND and only using a small partition reduced the boot time to 10 seconds without any other change. Modern UBIFS systems apparently don't have this problem.

As for subsecond boot times: no chance with Linux.

I've booted Linux in sub-second times using User Mode Linux so there's nothing actually preventing it other than the need to probe hardware in various slow ways, which could be improved.

So this is going from connecting power to having a full system ready to go?

The question doesn't really make sense since there's no "power" involved -- the kernel neither has nor lacks power... but if I understand the nature of the question, then the answer is "yes" -- the system goes from not existing to being at a bash prompt in under 0.3s on my laptop.

Just looked up what UBIFS is, and saw the reference to JFFS2. We tried JFFS; it was painfully slow. JFFS2 was emerging and we tried that too. In reality, it did not really work for booting up. Once it was up, it was pretty good. Unfortunately, we were developing a handheld consumer-electronics device, so boot-up times were absolutely critical.

Since you mentioned majorities, how many Linux kernel systems out there belong to parties which can afford to maintain a custom-built kernel with things like grsec integrated?

Oh, wouldn't things be better if we had all that candy upstream?

There's Alpine Linux, which ships with a grsec-patched kernel. It does a great job and, thanks to its small size, has in fact become the officially preferred base distro for Docker containers. I really wish more companies got behind it.

One thing I routinely do is compile a monolithic kernel without module support, with exactly the drivers the hardware needs built in. This way injecting a module into the kernel should be a little harder.

On a standard kernel you can still set kernel.modules_disabled=1 after everything you need is loaded. There should be no big difference in behaviour between the two cases. Not that module injection is common anyway...
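For reference, that knob is a one-way switch; a minimal sketch (needs root, and every module you'll ever need must already be loaded, since only a reboot re-enables loading):

```shell
# Late in boot, after all required modules are in:
sysctl -w kernel.modules_disabled=1
# From now on any modprobe/insmod fails with EPERM until reboot:
sysctl kernel.modules_disabled
```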

I urge everyone to use linux-grsec, and avoid the security hole that the vanilla kernel is.

This stuff has been there in grsecurity patchsets for more than 10 (ten) years already.

From the article's section on grsecurity: Linus added that this kind of problem is exactly why he has never seriously considered merging the grsecurity patch set; it's full of "this kind of craziness."

How do you reconcile that with suggesting people run this patch? If it were good, Linus would merge it. For me, the fact that it has existed for 10 years and _not_ been merged does not speak highly to its quality.

I feel that any non-kernel dev applying a patch to their kernel is the opposite of a good security recommendation. I'm not nearly as qualified about the tradeoffs between performance and security or even code quality as Linus and the kernel team. That's why I delegate the decision about what code goes in my kernel to them.

> If it were good, Linus would merge it.

Linus hasn't ever been security-minded; in fact, half of the article is about Linus making complaints to Kees with things like "it will be slow to compile, it's a PITA to maintain, I don't understand it therefore it's crazy and nobody needs this". So if you value security above everything else, Linus isn't the best person to rely on for advice on the topic.

> For me, the fact that it has existed for 10 years and _not_ been merged does not speak highly to it's quality

Parts of the grsec patch set have been implemented over the years, but not the whole; that's mostly because Linus doesn't see the need for most of the features, not for quality reasons.

> I feel that any non-kernel dev applying a patch to their kernel is the opposite of a good security recommendation. I'm not nearly as qualified about the tradeoffs between performance and security or even code quality as Linus and the kernel team. That's why I delegate the decision about what code goes in my kernel to them

The fact that you don't understand why you need it is the very reason why _you_ shouldn't make that call. Leave the decision to someone on your team with experience handling incidents, not to Linus et al.

What's the easiest way to start?

Some distributions carry kernel images with the patchset, e.g. https://wiki.archlinux.org/index.php/Grsecurity

Use Alpine Linux on your servers; it uses grsec by default. And if you can bear it, even on the desktop with xfce4.

ArchLinux has linux-grsec as a package; it's enough to pacman -S linux-grsec linux-grsec-headers and boot into it.

I can recommend Debian as well; I used its grsec flavour of the Linux kernel package successfully for a few months on my desktops.

I would recommend taking a look at NixOS as well; they have it integrated, and it can be as easy as adding an option to your system configuration. If you further add any customization, you will get a unique kernel build for your system, which is said to be ideal security-wise. You can read the details in their manual:


They mention the downside of address space randomization - it kills bug repeatability. The effort to reproduce a crash is much higher. The result is bugs closed with "cannot reproduce" comments from developers.

At least they're trying to reduce the attack surface. But the kernel is just too big.

Couldn't they do something that's "repeatably random"? So that in case of a bug, you can extract some information from your kernel on its current randomisation, and then another kernel can use this information to repeat your random layout.

E.g. use pseudorandom numbers, store the seed somewhere. In case of a bug, extract that seed, pass it on to the dev, and he'll run his kernel with that seed to reproduce.
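As a toy illustration of the "repeatably random" idea (using bash's seedable RANDOM as a stand-in for the kernel's offset generator, purely for illustration):

```shell
# Seeding the PRNG makes the "random" layout reproducible:
bash -c 'RANDOM=42; echo "slot: $((RANDOM % 512))"'
# Replaying the seed from a bug report reproduces the same layout:
bash -c 'RANDOM=42; echo "slot: $((RANDOM % 512))"'
```

Both invocations print the same slot, which is exactly the property a dev would want when replaying a crash.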

How do you prevent an attacker from getting the seed? This would just create a possible attack vector that could be used to effectively disable address space randomization.

e.g. only root can get access to the seed. Then the attacker would already need to have root, so you're in huge trouble anyway. And yes, it may be weaker than a fully random solution. But a pseudorandom system that gets accepted is more secure than a fully random system that doesn't get accepted.

This would sort of negate the purpose of ASLR, as afaik the whole point is that an attacker would not know the memory layout. The very fact that it's not reproducible is both the solution and the problem!

No, not if the seed were made available only in a kernel bug report.

Yes, that is probably what frederikvs means, but access to the seed could become the new weakest link of ASLR. It would presumably only be available to CAP_SYS_ADMIN/uid 0, but designing the feature that allows determining what the seed was warrants a great deal of caution.

Further adding to that, the seed could be changed before the bug report is submitted. A dedicated tool with an FSM and minimal privileges could do it, which would let us verify it strongly.

I know of a few "one time in a billion" bugs that became "one time in a hundred" bugs after randomization. Then they were fixed, instead of continuing to linger.

That's a form of fuzzing. With fuzzing, when you find something unexpected, you replicate the situation and explore that neighborhood. You don't immediately go on to the next random try.

Yes. And fuzzing works better by conscripting more users. :)

I didn't really get that argument. Specifically, do we know the number of bugs which are harder to debug because they now disappear vs. the number of bugs which were detected because KASLR breaks invalid assumptions? Maybe KASLR just exposed some ticking bombs in the code.

I've certainly seen this for user ASLR. However, it's easy to work around: you just turn off ASLR while reproducing your bug. There's a knob to tweak in /proc/sys/kernel.
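To make the knobs concrete (setarch ships with util-linux, and the per-process variant needs no root):

```shell
# The global knob:
cat /proc/sys/kernel/randomize_va_space   # 2 = full ASLR, 1 = partial, 0 = off
# Disable ASLR for a single reproduction run instead of system-wide;
# with -R the first mapping should stop changing between runs:
setarch "$(uname -m)" -R sh -c 'head -n 1 /proc/self/maps'
setarch "$(uname -m)" -R sh -c 'head -n 1 /proc/self/maps'
```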

Doesn't help much. Turning ASLR off for a memory-clobbering bug adds repeatability, but it may be repeatable as in "never fails" if the memory clobbered isn't important. Memory is big today; most random stores won't crash the system. Now it's a "cannot reproduce" bug.

Crash reports from production systems come in, and you try to classify them. If the crashes look similar, you have something to look for. (Microsoft has a classifier to do this for Windows crashes, and that's one reason the Windows Blue Screen of Death is rare today.) Then you try to reproduce the crash situation. ASLR adds noise to that data. It's harder to match up similar bug reports.

It'll take at least a decade until kernel developers are convinced that KASLR is not helping anyone, and infoleaks are defeating it on a daily basis.

Here, have 3: https://twitter.com/R00tkitSMM/status/796617449823236096

(no, I didn't get confused, KASLR is not helping regardless of the OS)

I've been experimenting with Tomoyo Linux lately. To me, it's the simplest LSM to reason about (although I have misconfigured it before). In the spirit of Russell Coker's SELinux play machine, I have an initial Tomoyo test machine that users may experiment with as root (uid 0). Feel free to ssh in and try it out. If you find an issue, or bypass Tomoyo somehow, please don't damage anything and let me know. Also, please no fork bombs. You don't need root to do that:

    ssh -x -a root@montani.w8rbt.org
    Password = tomoyo1
Also, if you want to share this information with a friend, please use this URL:


I feel like such a moron when I read stuff like this, because I don't know what half the acronyms and technologies even are.

KASLR: Kernel Address Space Layout Randomization. See https://en.wikipedia.org/wiki/Address_space_layout_randomiza... for an overview on what this does.

SMAP/SMEP: Intel/x86-specific security features. See http://j00ru.vexillium.org/?p=783 for an early (2011!) take on SMEP. (j00ru is great reading, in any case.) See https://lwn.net/Articles/517475/ for SMAP from LWN.

PAN: Privileged Access Never, basically ARM's equivalent of SMAP: https://community.arm.com/groups/processors/blog/2014/12/02/...
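A quick way to see the tension between KASLR and infoleaks on your own box (kptr_restrict is the knob that masks kernel pointers in /proc and friends):

```shell
# Kernel symbol addresses are an obvious KASLR infoleak, so they can be masked;
# depending on kptr_restrict you'll see real addresses or all zeros here:
head -n 3 /proc/kallsyms
cat /proc/sys/kernel/kptr_restrict   # 0 = show, 1 = hide from unprivileged, 2 = hide from everyone
```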

Aye, I wouldn't be surprised if the authors of this stuff might feel similar listening to you talk to somebody in whatever your area of expertise is. Experts use acronyms amongst themselves because it's more efficient for communication. If you're interested in it, google that shit, read, and learn :)

EDIT: See also: https://en.wikipedia.org/wiki/Curse_of_knowledge

Most of the stuff mentioned (except ro/no-exec) are poor after-the-fact band-aids; just-in-case, security-by-obscurity solutions :( What's worse, some of them add a baggage of kludges and cost extra processing, all in the name of slowing down the attacker (not stopping; slowing down). This is why Linus was mostly skeptical of, if not opposed to, this (and grsec).

And all of them are band-aids for continuing to use monolithic OSes written in C.

I'd still count overflow protection among the good changes. If there's a counting logic error, there's not much you can do apart from locking down access to the resource. Provided you're using a language that doesn't prevent it in the first place, of course.
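For userspace code you can get a cheap version of that overflow protection from the compiler; here's a sketch assuming gcc or clang (the trap variant needs no sanitizer runtime). The kernel-side analogue would be something like grsecurity's PAX_REFCOUNT, which keeps reference counts from wrapping:

```shell
cat > overflow.c <<'EOF'
#include <limits.h>
/* A "counting logic error": without instrumentation the counter silently wraps. */
int main(void) {
    int refcount = INT_MAX;
    refcount += 1;   /* signed overflow: the sanitizer traps here */
    return refcount < 0;
}
EOF
cc -fsanitize=signed-integer-overflow -fsanitize-undefined-trap-on-error overflow.c -o overflow
./overflow || echo "overflow trapped"
```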

Lmao, same.

Don't waste your time with vanilla if you really care about security, just use grsec.

This would be great, but unless you've got a budget for support, you probably shouldn't do it. When they say the free version is for testing, they mean it. Probably on servers it's less likely, but on desktop it has a tendency to crash xorg every few versions.

> Probabilistic protections can be defeated if the information leaks out, but they are still effective and worth doing.

What is the argument here? Is there something about this randomization that distinguishes it from classic security through obscurity?

Exploit mitigation may be all that stops a vulnerability being successfully exploited.

Here are some excellent slides on exploit mitigation in general: https://events.yandex.com/events/ruBSD/2013/talks/103/

Off the top of my head, there are four approaches to stopping memory vulnerabilities:

1) have no bugs, e.g. formal verification etc

2) use a memory-safe language

3) accept that there can be vulnerabilities, and use exploit mitigation to harden it

4) capability-based addressing as a mitigation (it doesn't solve use-after-free, for example; it relies on software to do that etc)

Of these, (3) is the one you can retrofit to existing C/C++ codebases... a route you are usually forced to travel.

(There may still be other kinds of bugs, e.g. the obvious sql injections etc; I am talking above about memory bugs specifically)

Yeah, makes sense. Thanks for the link.

Security through obscurity can be very useful. It just shouldn't be the only thing you rely on to secure your systems.

Totally agree. I just don't know too much about kernel security and wanted to clarify whether this was mitigation via security through obscurity or something else.

I'd love to see the Linux kernel and distributions put more focus on easy programs / workflows that detect insecure behaviors.

(Good old ZoneAlarm type GUI / workflow would be very nice.)

* Report and block if someone is able to run any kind of privilege-escalation exploit.

* Report and block if any non-whitelisted app attempts to make network connections to the internet or to any external IP.

* Report and block if any non-whitelisted app or script tries to run and execute any program.

SELinux seems to claim it can do most of these. But the barrier to entry for setting it up and using it effectively is high (at least for a noob like me).

Please read up on RBAC. If you don't want to learn SELinux, you can use AppArmor or grsecurity's RBAC system.
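For a taste of what the AppArmor flavour of this looks like, here's a minimal profile sketch; the binary path and rules are made up for illustration, and the network rule expresses roughly the "block non-whitelisted network connections" wish upthread:

```
# /etc/apparmor.d/usr.bin.foo -- confine a hypothetical /usr/bin/foo
/usr/bin/foo {
  #include <abstractions/base>

  /usr/bin/foo mr,      # map/read its own binary
  /etc/foo/** r,        # read-only config
  /var/lib/foo/** rw,   # its state directory
  deny network,         # no network access at all
}
```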


You're not allowed to post baseless accusations of shillage on Hacker News.

Neither Corbet nor Cook could ever be considered grsecurity shills. Honest question: did you read the article?

He may not be talking about the LWN stuff, but about the comments on HN. Most of the root comments are pitching grsec in some way or other...

Prepare for the lovely blog post from the grsecurity author who is going to proclaim how he is so much smarter than everyone else, and that upstream Linux doesn't know what they're doing.


What a non-constructive, cynical, depressing response. Grsecurity is a good product, components have been added to the kernel without proper attribution, and although Brad can admittedly add drama, how is this different from the drama your post adds? Pot, meet kettle.

grsec will sadly, slowly become irrelevant; they could have worked with mainline but instead acted like spoiled little children.

I disagree with this statement. grsec has been a proving ground for a lot of hardening technologies throughout the years, many of which are just starting to make it upstream through the Linux kernel political process. As long as the team that works on the grsecurity patchsets continues to innovate, it will not become irrelevant; instead, it will continue to be a testbed for things mainline will pick up and re-implement at their discretion.

You seem to think grsec isn't in Linux due to the politics of submitting patches, but grsec hasn't submitted piecemeal patches, and their attitude is to derail conversations into name-calling. I recommend reading the comments in http://lwn.net/Articles/663474/ or really anything where the grsec team interacts with people. You can be the most intelligent human alive, but if you cannot interact with people in a productive way, it doesn't matter.
