1. It's reasonable to claim that amd64 (x86_64) is more secure than x86: the larger address space means higher ASLR entropy. The exploit needs 10 minutes to crack ASLR on x86, but 70 minutes on amd64. If an alerting system has been deployed on the server (the attacker has to keep crashing systemd-journald throughout the attack), that buys time. In other cases, it makes exploitation infeasible.
2. CFLAGS hardening works. Together with ASLR, it's the last line of defense for all C programs. As long as there are still C programs running, patching all memory corruption bugs is impossible; mitigation techniques and sandbox-based isolation are the only two ways to limit the damage. All hardening flags should be turned on by all distributions unless there is a special reason not to. Fedora has enabled "-fstack-clash-protection" since Fedora 28 (https://fedoraproject.org/wiki/Changes/HardeningFlags28).
If you are releasing a C program on Linux, please consider the following flags:
-D_FORTIFY_SOURCE=2 glibc hardening
-Wp,-D_GLIBCXX_ASSERTIONS libstdc++ hardening (C++ only)
-fstack-protector-strong stack smash protection
-fstack-clash-protection stack clash protection
-fPIE -pie better ASLR protection
-Wl,-z,noexecstack don't allow code on stack
-Wl,-z,relro ELF hardening
-Wl,-z,now ELF hardening
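For example, a one-shot invocation with all of the above might look like this (illustrative only; adapt it to your build system, note that -D_FORTIFY_SOURCE needs at least -O1 to do anything, and the -Wp,-D_GLIBCXX_ASSERTIONS line only applies to C++ builds):

    gcc -O2 -D_FORTIFY_SOURCE=2 \
        -fstack-protector-strong -fstack-clash-protection \
        -fPIE -pie \
        -Wl,-z,noexecstack -Wl,-z,relro -Wl,-z,now \
        -o server server.c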
For a more comprehensive review, check
* Recommended compiler and linker flags for GCC:
* Debian Hardening
> The exploit needs 10 minutes to crack ASLR on x86, but 70 minutes on amd64.
Is there any realistic threat model under which the difference between 10 minutes and 70 minutes is the difference between "insecure" and "secure"?
> Mitigation techniques and sandbox-based isolation are the only two ways to limit the damage.
I'm not at all convinced that mitigation techniques represent a real improvement in security, because by definition a mitigation technique is not backed by a solid model. If you're letting an attacker control the modification of memory that your security model assumes isn't modifiable, how confident can you be that ad-hoc mitigations for all the ways you could think of to exploit that cover all the possible ways to exploit it? E.g. I can remember a time when ASLR was touted as a solution to C's endemic security vulnerabilities; now cracking ASLR as part of vulnerability exploitation is routine, as seen here. Mitigations appear to give a security improvement because an app with mitigations is no longer the low-hanging fruit, but I suspect this is a case of "you don't have to outrun the bear": as long as there are C programs without mitigations, attackers will go after those first. That's different from saying that mitigations provide substantial protection.
So in an “attack was detected, break all the glass” scenario, the difference between 10 and 70 minutes is sufficient to allow human operators to render the attack moot by offlining its target, while the attackers are still trying to break through API servers.
At both big corps I’ve been at, the incident response plan for an exfiltration attack on customer data was to invalidate DB creds and take the system down ourselves.
Better to be out of service than lose custody of customer data.
How about an intrusion detection system that flags up a human response? 10 minutes is hardly any time at all to respond, an hour gives you a chance to roll out of bed.
That seems like something good to be able to turn on in a stock kernel, but not with that high a timeout. Imagine your shell failing to start another process for 30 seconds while you're debugging a segfault, or your browser failing to open another tab for 30 seconds after one crash.
500ms would drastically slow brute-force attacks without noticeably inconveniencing the user (and then they can always turn it off manually when doing something like fuzz testing).
30 seconds, predictably? I'd take it any day over that.
In either case, it still feels like pulling everything into systemd creates a much harder-to-protect attack surface. Why should init care if your logger crashes, let alone be taken down by it? I am not an anti-systemd person, but I honestly do see the tradeoffs of the "let me do it all" architecture as a huge penalty.
It cares in the same way it cares about all the other processes. There's nothing systemd-specific here. The journald service is configured to restart on crash, the same as many other services.
It's not taking down init when journald crashes either.
Well, except journald itself.
100% this. Also, as I understand it, the exploit would not exist if it were literally just writing log lines to a file in /var/log/systemd/ ?
EDIT: Also, as I understand it, appending directly to a file is just as reliable as the journald approach, given that many, many disk controllers and kernels are known to lie about whether they have actually flushed their caches to disk (actually more so, because journald's binary format is arguably more difficult to recover into proper form than timestamped plaintext -- please correct me if I'm wrong, though!!)
It depends what you mean by recover. To get the basic plaintext, you can pretty much run "strings" on the journal file and grep for "MESSAGE=". It's append-only so the entries are in order. Just because it's a binary file doesn't mean the text itself is mangled. (Unless you enable compression)
The format reference may look complicated, but that's all extra features you can ignore for "recovery in an emergency".
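Concretely, on a system with persistent journals, something like this recovers readable lines in an emergency (the path is the usual default and may differ on your setup):

    strings /var/log/journal/*/system.journal | grep '^MESSAGE=' | less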
They're separate processes. Logger crashes do not take down init.
> I am not a anti-systemd person
Whether you are or not, you are (inadvertently) repeating misinformation about it.
A few minutes of high load can easily get overlooked.
Some systems run hard like this by default; see transcoding, for example.
The times given here are just examples. Cracking systemd takes only 70 minutes, but in general, brute-forcing ASLR on 64-bit systems can take as little as 1.3 hours or as long as 34.1 hours, depending on the nature of the bug. On the other hand, the ~20 bits of entropy on 32-bit systems is trivial to crack in 10 minutes in nearly all cases and does not provide an adequate security margin.
On a 64-bit system there are ~32-40 bits of ASLR entropy available to a PIE program, which forces an attacker to brute-force it. Unlike other protections, no matter how cleverly the system is analyzed beforehand, it taxes the exploit by making it solve a computational puzzle. This fact alone is enough to stop many "Morris Worm"-type remote exploits (which have suddenly become a serious consideration again, given the future of IoT), since an exploit would take months or years to crack a single machine.
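A back-of-the-envelope sketch in C makes the gap concrete. The guess rate here is purely an assumption (it depends on respawn throttling and network latency); 1000 guesses per second is, if anything, generous to the attacker:

    /* Expected brute-force cost: 2^(bits-1) guesses on average.
       The guess rate is a hypothetical, attacker-friendly assumption. */
    #include <stdio.h>

    int main(void) {
        const double rate = 1000.0;            /* guesses per second (assumed) */
        const int bits[] = { 20, 28, 32, 40 }; /* typical ASLR entropy levels */
        for (int i = 0; i < 4; i++) {
            double seconds = (double)(1ULL << bits[i]) / 2.0 / rate;
            printf("%2d bits: ~%.1f hours on average\n", bits[i], seconds / 3600.0);
        }
        return 0;
    }

With these assumptions, ~20 bits falls in minutes, while 40 bits takes years.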
If that's not enough (it is not; I acknowledge ASLR by itself cannot be enough), an intrusion detection system should be used, and many systems already use one. For example, PaX offers an optional, simple yet effective anti-bruteforce protection: if the kernel sees a crash, any `fork()` attempt by the parent process is blocked for 30 seconds. It would take years for an attacker to overcome the randomization (so the attacker is likely to try something else). In addition, it writes a critical-level message to the kernel log buffer, so the sysadmin can be notified and possibly uncover the 0-day exploit the attacker used. I'd call that a realistic threat model.
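The same idea can be approximated in userspace. A minimal sketch (this is not the PaX code; "./victim" and the exact delay are stand-ins):

    /* Respawn throttle: after a crash, each new brute-force guess
       costs at least 30 extra seconds. */
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        for (;;) {
            pid_t pid = fork();
            if (pid < 0) { perror("fork"); return 1; }
            if (pid == 0) {
                execl("./victim", "victim", (char *)NULL); /* hypothetical service */
                _exit(127);
            }
            int status;
            if (waitpid(pid, &status, 0) < 0) { perror("waitpid"); return 1; }
            if (WIFSIGNALED(status)) {
                fprintf(stderr, "crash (signal %d); throttling respawn\n",
                        WTERMSIG(status));
                sleep(30); /* the anti-bruteforce delay */
            }
        }
    }

systemd's own RestartSec= and StartLimitIntervalSec= options give a similar effect declaratively.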
Finally, information leaks are a great concern here. Kernels and programs leak memory addresses like a sieve, effectively rendering ASLR useless. The Linux kernel is already actively plugging these holes (though with limited effectiveness; HardenedBSD would make a better case study), and so should other programs.
> e.g. I can remember a time when ASLR was touted as a solution to C's endemic security vulnerabilities; now cracking ASLR as part of vulnerability exploitation is routine, as seen here.
You can make the same comment on NX bit, or W^X/PaX, or BSD jail, or SMAP/SMEP (in recent Intel CPUs), or AppArmor, or SELinux, or seccomp(), or OpenBSD's pledge(), or Control Flow Integrity, or process-based sandboxing in web browsers, or virtual machine-based isolation.
Better defense leads to better attacks, which in turn leads to better defense. By playing the game, you may not be able to win, but by not playing it, losing is guaranteed. In this case, systemd was exploitable despite ASLR due to a relatively new exploit technique called "Stack Clash", and for exactly this problem GCC had already replaced its -fstack-check with the new -fstack-clash-protection long before the systemd exploit was discovered. Where this mitigation is used (e.g. by Fedora and openSUSE), the bug causes a simple crash and is not exploitable. At least until the attacker finds another way around.
Early kernels and web browsers had no memory or exploit protections whatsoever: a single wrong pointer dereference or buffer overflow was enough to completely take over the system. Nowadays, an attack needs to overcome at least NX, ASLR, sandboxing, and compiler-level mitigations, and we still see exploits. So is the conclusion that all mitigations are completely useless? If that's your opinion, I'm happy to agree to disagree; many sensitive C programs need to be rewritten in a memory-safe language anyway. But as I see it, as long as there are still C programs running with undiscovered vulnerabilities, and as long as attackers have to keep adding up-to-date workarounds and cracking techniques to their exploit checklists (ROP, anyone? though the most sophisticated attackers are now moving to data-only attacks), we are not losing the race by increasing the cost of attacks.
On the other hand, if an attacker doesn't have to use up-to-date cracking techniques, then we have serious problems. Broken and incomplete mitigations are common in the real world, and they are the real trouble. Recently, the ASLR implementation in the MinGW toolchain was discovered to be broken, allowing attackers to exploit VLC using shellcode tricks from the 2000s (https://insights.sei.cmu.edu/cert/2018/08/when-aslr-is-not-r...). And we still see broken NX-bit protection and the total absence of ASLR or -fstack-protector in virtually ALL home routers (https://cyber-itl.org/2018/12/07/a-look-at-home-routers-and-...).
The principle of defense in depth is that, if the enemy is powerful enough, it's inevitable that all protections will eventually be overcome. As in the Swiss Cheese Model (https://en.wikipedia.org/wiki/Swiss_cheese_model), a cliché in accident analysis, eventually something will manage to find a hole in every layer of defense and pass through. What we can do is our best at each layer of defense, to prevent the preventable incidents and to add more layers as technology permits.
My final words are: at least, do something. ASLR had already been implemented as a prototype, analyzed, and exploited by clever hackers back in 2002 (http://phrack.org/issues/59/9.html), but only saw major adoption ten years later. It would be a surprise if ASLR-breaking techniques had not improved, given the inaction of most vendors.
> "Proof" suggests a level of absolute confidence that this example certainly does not give.
I agree. I should've used "given more empirical evidence" instead of "given a proof".
For real security, I believe memory-safe programming (e.g. Rust) and formal verification (e.g. seL4) are the way forward, although they still have a long way to go.
I can, and I would.
> or virtual machine-based isolation
A little different because a VM can be designed to offer a rigid security boundary (with a solid model behind it) rather than as an ad-hoc mitigation technique.
> So is the conclusion that all mitigations are completely useless? If that's your opinion, I'm happy to agree to disagree; many sensitive C programs need to be rewritten in a memory-safe language anyway. But as I see it, as long as there are still C programs running with undiscovered vulnerabilities, and as long as attackers have to keep adding up-to-date workarounds and cracking techniques to their exploit checklists (ROP, anyone? though the most sophisticated attackers are now moving to data-only attacks), we are not losing the race by increasing the cost of attacks.
> The principle of defense in depth is that, if the enemy is powerful enough, it's inevitable that all protections will eventually be overcome. As in the Swiss Cheese Model (https://en.wikipedia.org/wiki/Swiss_cheese_model), a cliché in accident analysis, eventually something will manage to find a hole in every layer of defense and pass through. What we can do is our best at each layer of defense, to prevent the preventable incidents and to add more layers as technology permits.
> For real security, I believe memory-safe programming (e.g. Rust) and formal verification (e.g. seL4) are the way forward, although they still have a long way to go.
I think the defense in depth / swiss cheese approach has shown itself to be a failure, and exploit mitigation techniques have been a distraction from real security. It's worth noting that systemd is both recently developed and aggressively compatibility-breaking; there really is no excuse for it to be written in C, mitigations or no. Even if you don't think Rust was mature enough at that point, there were memory-safe languages that would have made sense (OCaml, Ada, ...). Certainly there's always more to be done, but I really don't think there's anything that would block the adoption of these languages and techniques if the will was there.
"You can make the same comment on NX bit, or W^X/PaX, or BSD jail, or SMAP/SMEP (in recent Intel CPUs), or AppArmor, or SELinux, or seccomp(), or OpenBSD's pledge(), or Control Flow Integrity, or process-based sandboxing in web browsers, or virtual machine-based isolation."
You can indeed say that about all those systems, since they mix insecure, bug-ridden code with probabilistic and tactical mechanisms that they pray will stop hackers. In high-assurance security, the focus was instead to identify each root cause; prevent, detect, or fail safe on it with some method; and add automation where possible. Since a lot of that is isolation, I'd say the isolation-based method would be separation kernels running apps in their own compartments or in deprivileged, user-mode VMs. Genode OS is following that path with stuff like seL4, Muen, and NOVA running underneath. The first two are separation kernels; NOVA is just correctness-focused, with a high-assurance design style.
Prior systems designed like those did excellently in NSA pentesting, whereas the UNIX-based systems with extensions like MAC were shredded. All we're seeing is a failure to apply the lessons of the past in both hardware and software, with predictable results.
"Better defense leads to better attacks, and it in turns leads to better defense. By playing the game, it may not be possible to win, but by not playing it, losing the game is guaranteed. "
Folks using stuff like Ada, SPARK, Frama-C with sound analyzers, Rust, Cryptol, and FaCT are skipping the game entirely by knocking out whole attack classes. Add memory-safety methods for legacy code, like SAFECode in SVA-OS or SoftBound+CETS. Throw in Data-Flow Integrity or Information-Flow Control (e.g. the JIF/SIF languages). Then you just have to increase hardware spending a bit to make up for the performance penalty that comes with your desired level of security. That trades a problem that takes geniuses decades to solve for one an average IT person with an ordering guide can handle quickly on eBay. Assuming the performance penalty even matters, given how much code isn't CPU-bound.
I'd rather not play the "extend and obfuscate insecure stuff for the win" game if possible, since defenders have been losing it consistently for decades. Obfuscation should just be an extra measure, on top of methods that eliminate root causes, to further frustrate attackers. Start with the most cost-effective methods for incremental progress, like memory-safe languages, contracts, test generation, and static/dynamic analysis. Save the heavyweight stuff for ultra-critical components such as compilers, crypto/TLS, microkernels, clustering protocols, and so on. We already have a lot of that, though.
"For real security, I believe memory-safe programming (e.g. Rust), and formal verification (e.g seL4) are the way forward, although they still have a long way to go. "
Well, there you go saying it yourself. :)
"Early kernels and web browsers have no memory and exploit protections whatsoever"
Yeah, we pushed for high-assurance architecture to be applied there. Chrome did a weakened version of OP. Here's another design if you're interested in how to solve... attempt to solve... that problem:
I hesitate to call stack probing "hardening". IMO it's better understood as a failure of compilers to emit proper code in the first place, and it's been a glaringly obvious deficiency for years, if not decades.
The proper job of the compiler is to make sure the generated code doesn't blow the stack in a way that overwrites random memory, whether because of alloca, a large stack-allocated object (of static or dynamic size), or recursion. It's true that blowing the stack in C is undefined, whereas in a language like Rust it's supposed to terminate the program. But that's beside the point, because Rust didn't actually implement stack probing either and was therefore just as susceptible to these vulnerabilities.
The only sane, acceptable behavior for the compiler is to generate stack probes for any stack allocation that may exceed the page size; not only for alloca, but even for regular, non-array objects that happen to be large. Both programmers and compilers for complex languages like C++, Rust, and Swift aggressively attempt to stack-allocate as much as possible, which makes the issue even more acute for those languages. As others have hinted, both alloca and dynamic arrays have been frowned upon in C for a long time (C99 added dynamic arrays principally for the Fortran crowd, who typically consume trusted data anyhow). The fact that most of the stack smash and stack clash exploits you see are for C is a consequence of most popular software libraries being written in C, and of researchers being most familiar with developing PoCs for C-based codebases.
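To make the failure mode concrete, here's a hedged sketch of the pattern at issue (the function and its callers are invented for illustration, not taken from journald). Without probing, one large attacker-sized allocation can leap straight over the guard page; with -fstack-clash-protection, the compiler touches each intervening page, so this becomes a clean crash instead of a write into a neighboring mapping:

    #include <alloca.h>
    #include <string.h>

    void log_message(const char *msg, size_t len) {
        /* If len is attacker-controlled, this single allocation can
           jump past the guard page into an adjacent memory region. */
        char *buf = alloca(len + 1);
        memcpy(buf, msg, len);
        buf[len] = '\0';
        /* ... format and write buf somewhere ... */
    }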
Crucially, C99 dynamic arrays have block-scoped lifetimes, whereas alloca allocations have function-scoped lifetimes; calling alloca in a loop is doubly crazy. Early C99 implementations that simply reused the pre-existing alloca intrinsic were buggy. C99 compound literals were also buggy for several years for somewhat similar reasons.
In truth it's de facto undefined in most languages, because most language designers and compiler authors have historically been content to ignore the issue.
To be clear, I'm not saying that C only seems error-prone because it's popular. I'm only speaking to the particular issue of stack allocation overflow.
You can start with Ada83 as an example of this feature.
Without evidence to the contrary, I would assume that a GCC- or LLVM-based Ada implementation would also be susceptible to stack overflow in the same way--pathological allocation patterns that silently bypass the system's "this could never happen in the real world" assumptions. And just like with C, the fault would lie with the compiler, not the language.
Again, I realize the behavior in C is technically undefined, but it's undefined precisely to permit the implementation to do the most sane thing for the environment, such as sharing mechanisms and semantics with sister languages like Ada or Rust.
Not every language community is as full of UB love as the C and C++ ones, especially those where safety trumps performance in language design.
Naturally, if the code segment is writable and one uses assembly rewriting as the attack vector, then anything goes.
Indeed, I presume stack probing took so long partly because, short of memset'ing the entire stack frame on entry, ensuring contiguous initialization is non-trivial. But no matter how difficult, I'd bet it's less difficult than proving the generated code is safe without explicit stack probing.
I'm reminded of the infamously brilliant design of Soft Updates for FFS, where the order of operations was meticulously rearranged in the file system implementation and formally proven to result in a stream of atomically consistent disk writes, without having to change the on-disk layout. Modifying softdep filesystem code is notoriously tricky. By contrast, a journal is both easier to write and easier to hack on.
EDIT: Perhaps you meant that triggering SIGSEGV would be non-compliant? Stack probes don't necessarily need to touch a guard page. AFAIU on Windows you can just query the TCB for the stack size, but it's substantially faster and in some respects easier to simply trap SIGSEGV (pretty sure Java does this), and the runtime is still free to rethrow SIGSEGV. If you mean that stack overflow in Ada is supposed to throw a language-level exception, that's a rather trivial detail that can be accomplished equally well whether probe failures occur inline or asynchronously. In any event, I think my larger point about how to frame the issue and where culpability and responsibility reside still stands.
AdaCore, Green Hills, PTC (owns the former IBM and Aonix compiler divisions), DDC-I, RR Software, OC Systems.
If the implementation is not able to validate stack size correctness on function entry and throw a stack allocation exception on failure, then it is a compiler bug.
In Ada this is a required runtime check, unless explicitly disabled.
The only way the stack layout can be corrupted, in a bug-free Ada compiler, is to explicitly disable that check and make use of unchecked pointers in unsafe code.
Stack probing isn't the only option. Other options include fixed maximum sized stacks, architectures with separate stack and heap address spaces, etc.
However, you seem to be right that the program you linked is technically well-defined C, because the C11 spec doesn’t explicitly address stack usage. Not only does it not set a minimum requirement for the limits of local variable usage or function recursion, as far as I can tell, it doesn’t even acknowledge that such limits could exist! But if the program is well-defined, it ought to be able to execute to completion. Aborting the process, even cleanly, is no more acceptable than corrupting memory. Thus, a compliant implementation would have to have an infinite amount of memory. Since that’s a bit unreasonable to ask… it’s probably better to treat stack overflow as implicitly UB. The allowable level of stack usage could then be treated as implementation-defined.
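I don't have the linked program handy, but something in the same spirit is easy to construct (the depth and padding size here are arbitrary): no rule in C11 forbids it, yet no real implementation can run it to completion.

    #include <stdio.h>

    /* Each call pins ~4 KiB of stack; volatile defeats tail-call and
       frame-merging optimizations, so the frames really accumulate. */
    static unsigned long deep(unsigned long n) {
        volatile char pad[4096];
        pad[0] = 1;
        if (n == 0) return pad[0];
        return pad[0] + deep(n - 1);
    }

    int main(void) {
        printf("%lu\n", deep(1000000UL)); /* demands ~4 GiB of stack */
        return 0;
    }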
Which is not to say that compilers shouldn’t try to handle stack overflow sanely. Stack probing was long overdue, and I’d love to see better support in mainstream compilers for static max-stack-usage analysis, among other things. It’s just that the C standard is probably not the right place to mandate such things, considering how conservative and compatibility-oriented it tends to be.
-fcf-protection=full ROP protection
It's relatively expensive, but it's considered essential for stopping attackers from hijacking control flow (e.g. via ROP), and it's enabled in most web browser engines.
-fstack-clash-protection stack clash protection
-fPIE -pie better ASLR protection
Red Hat has a performance analysis.
Arch Linux has also benchmarked its performance impact before deciding to enable it.
-Wl,-z,noexecstack don't allow code on stack
-Wl,-z,relro ELF hardening
-Wl,-z,now ELF hardening
If your malloc() implementation is slow, fix that! tcmalloc and jemalloc have been around for a long time.
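For what it's worth, either can usually be tried without a rebuild via LD_PRELOAD (the library path below is a guess; it varies by distro):

    LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 ./your-program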
or it at least made it harder for the code to be analysed by automated tools:
> CFLAGS hardening works. Together with ASLR, it's the last line of defense for all C programs. [...] Mitigation techniques and sandbox-based isolation are the only two ways to limit the damage.
Mitigations are here to limit the damage, not to eliminate vulnerabilities and exploits.
Look for bugs and improve the code's security as hard as you can, and in case you missed one (you certainly will), CFLAGS hardening and ASLR are your last line of defense. You can only hope they buy you some time to patch, before a better exploit appears...