Mandatory enforcement of indirect branch targets (undeadly.org)
234 points by peter_hansteen on July 14, 2023 | 130 comments



For anybody unfamiliar with this, as I was, this appears to refer to Intel's Indirect Branch Tracking feature[1] (and the equivalent on ARM, BTI). The idea is that an indirect branch can only pass control to a location that starts with an "end branch" instruction. An indirect branch is one whose target is loaded or computed at run time, from a register or memory, rather than fixed in the instruction itself: think calling a function pointer in C.

Without IBT, you'd have this equivalence between C and assembly:

    void foo(void);

    int main(void) {
        void (*f)(void);
        f = foo;
        f();
    }

    void foo(void) { }

    ---

    main:
        movq $foo, %rdx
        call *%rdx
        ret

    foo:
        ret
If IBT is enabled, the above code faults with a control-protection exception, because foo doesn't begin with an "end branch" instruction. When the compiler emits IBT-compatible code, the above gets assembled as:

    main:
        endbr64 
        movq $foo, %rdx
        call *%rdx
        ret

    foo:
        endbr64
        ret
Now the compiler inserts endbr64 at the start of each function prologue. The reason for this feature is to serve as defense in depth against JOP and COP attacks: it means the only "gadgets" available to an attacker are entire functions, which can be far harder to exploit and chain.

[1]: https://www.intel.com/content/dam/develop/external/us/en/doc...


Fun fact: older CPUs decode ENDBR64 as a slightly weird NOP (with no architectural effects), but it'll fault on original Pentiums: https://stackoverflow.com/questions/56120231/how-do-old-cpus...
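
For reference, the encoding (bytes as documented; decode behavior per that Stack Overflow answer):

    endbr64    # encodes as f3 0f 1e fa: an F3 prefix plus 0f 1e fa, which
               # falls in the 0f 18-0f 1f hint-NOP space, so later CPUs
               # decode it as a NOP with no architectural effect; the
               # original Pentium predates that space and raises #UD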


There's a good question in the comments there that I still don't see the answer to. How does this work if there's an interrupt between the branch and the endbranch? Does the OS need to save/restore the "branchness" bit?


Yes, on ARM the branch type is saved in the BTYPE field of SPSR_EL1, the Saved Program Status Register for Exception Level 1 (the kernel's exception level). https://developer.arm.com/documentation/ddi0595/2021-12/AArc...


There is no branchness bit; if there's an endbranch you can jump to it.


Ah so when you return from an interrupt, the check is no longer done?


I'd assume so, since it wouldn't be a call/jmp coming from a computed address in a register. That said, I haven't read the documentation for any of this. But an interrupt involves a stack pointer change and other things that make it different, which is why it uses the IRET instruction and not RET.


Various architectures do other interesting things with NOPs, IIRC one convention on PowerPC had something vaguely related to debugging or tracing (I can't remember the details or find any references right now).


Not just architectures, but different OSes and ABIs have found ways to repurpose no-ops. One example[1] is Windows using the 2-byte "MOV EDI, EDI" as a hot-patch point: it gets replaced by a "JMP $-5" instruction, which jumps to the 5 bytes reserved for patching just before the start of the function. Those 5 bytes are enough to hold a full jump instruction that can then jump wherever you need it to.

The article: "Why do Windows functions all begin with a pointless MOV EDI, EDI instruction?"

[1]: https://devblogs.microsoft.com/oldnewthing/20110921-00/?p=95...
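
Roughly, the layout looks like this (a sketch based on the article; my_hook is a made-up name):

    # Each hot-patchable function is preceded by 5 bytes of padding:
            .byte 0x90, 0x90, 0x90, 0x90, 0x90   # padding (NOPs or INT3s)
    Func:
            mov  %edi, %edi       # the 2-byte no-op hot-patch point
            push %ebp             # normal prologue continues

    # To install a hook, the patcher rewrites the padding with a 5-byte near
    # jump (e9 rel32) to the hook, then rewrites mov edi,edi with a 2-byte
    # short jump (eb f9) that lands on it:
    #
    #         jmp  my_hook        # in the former padding bytes
    # Func:   jmp  .-5            # former mov edi,edi
    #         push %ebp           # original body continues unmodified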


Interesting, thanks for pointing this out! Just yesterday I was gazing at some program containing two consecutive xor rax, rax instructions. I thought: what's the point? But as you point out, it might be a NOP sled designed to be that specific length.


That would be surprising. xor is often used like that to set a register to 0, which is far from a nop. I'm not sure why it would do it twice, but it might be as simple as the compiler being stupid.


The second one is effectively a nop though.

The fact that it’s xor rax, rax rather than xor eax, eax is also interesting as it’s one byte longer for exactly the same effect (modifying the bottom 32 bits of a register clears the upper 32 bits). It makes me think there’s something weird going on other than compiler stupidity. I’d be interested in seeing the code it was compiled from.
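
For reference, the encodings (the extra byte is just the REX.W prefix):

    xor %eax, %eax    # 31 c0: 2 bytes; writing the low 32 bits of a register
                      # zeros the upper 32 too, so all of rax becomes 0
    xor %rax, %rax    # 48 31 c0: 3 bytes; REX.W prefix, identical effect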


I wonder if this is still true. Whenever I go to hook Win32 API functions, I use an off-the-shelf length disassembler to create a trampoline with the first n bytes of instructions and a jmp back, and then just patch in a jmp to my hook, but if this hot-patch point exists it'd be a lot less painful since you can avoid basically all of that.

Though, I guess even if it was, it'd be silly to rely on it even on x86 only. Maybe it would still make for a nice fast-path? Dunno.


Good read. Thank you.

This just worsens my fear of changing "unnecessary" code when I don't know the original motivation for it.


Intel VTune will do this with 5-byte NOPs directly. I think LLVM's XRay tracing suite also did this with a much bigger NOP, to capture more information.


RISC-V has a whole HINT space that's basically just variants of instructions that write to the zero register.

AArch64 has a similar space: https://developer.arm.com/documentation/ddi0596/2020-12/Base...

And yes, PowerPC has a similar space as well, holding hints like 'give priority to the other hardware threads on this core' and the like. https://utcc.utoronto.ca/~cks/space/blog/tech/PowerPCInstruc...


I was wondering where I had read about PowerPC, and this is exactly the article! So, it was for thread priority. Strikes me as an odd design choice; this probably should've been managed by the OS more explicitly.


I think the idea of exposing it to user space is to better handle concurrency before trapping into the kernel.

So consider a standard mutex under contention. Normally the code will spin for a little while before informing the kernel scheduler, on the off chance that the thread that owns the lock is currently scheduled on another hardware thread. In that case it's in the best interest of the thread trying to grab the lock to shift most of the intra-core priority to the other hardware threads, potentially helping the one holding the lock reach the point where it releases it sooner.


https://www.ibm.com/docs/en/aix/7.3?topic=h-hpmstat-command:

“random_samp_ele_crit=name

Specifies the random criteria for selecting the instructions for sampling. Valid values for this option are as follows:

ALL_INSTR

All instructions are eligible. This value is the default setting.

LOAD_STORE

The operation is routed to the Load Store Unit (LSU); for example, load, store.

PROB_NOP

Sample only special no-operation instructions, which are called Probe NOP events.

[…]”


Some MIPS cores had a superscalar NOP that would stall every ALU by one cycle, which was necessary because they lacked synchronization instructions.


That’s really clever use of the opcode space. Thanks for passing that along.


NOP on Intel chips (opcode 0x90) is in fact xchg eax, eax.


It was an old joke that the opposite of "goto" is "come from", or that if goto is considered harmful, nobody said anything about a "come from". Marking something as a branch target reminds me of this.

https://en.m.wikipedia.org/wiki/COMEFROM


> GOTO considered harmful

COMEFROM considered harm-mitigating

It ingeniously makes jump-oriented programming (JOP, the sibling of ROP) a lot harder.


> COMEFROM considered harm-mitigating

You know, that’d be a fantastic OpenBSD release name.

Here’s hoping a dev sees this comment; there’s already been a few commenting in this thread.


Interesting. Seems like enforcement on Intel CPUs is supported since Tiger Lake (so ~2020). Windows has basically the same feature implemented in software since 2015, called Control Flow Guard [1]. I wonder what the story there is, and if Windows has any plans to (get everyone to) switch to the hardware version once those CPUs have sufficient market share.

1: https://learn.microsoft.com/en-us/windows/win32/secbp/contro...


Windows also recently implemented a far better version of this called Extended Flow Guard (XFG) that not only checks whether the location is a valid destination, but also whether it's a valid destination for that specific source.

For example, for any virtual function call or function pointer call, the destination must have a correct tag with the hash of the arguments. It's much more secure, and also faster, since loading the tag from memory can be merged with loading the actual code after it.
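
Conceptually the call site looks something like this (my sketch of the idea; the hash value and symbol names here are invented, not the actual Windows ABI):

    # caller: r10 carries a hash derived from the target's type signature
        movabs $0x9a3e5f01c2d47b66, %r10   # invented hash for this prototype
        call   *xfg_dispatch(%rip)         # invented name for the check thunk

    # check thunk: rax holds the intended target
        cmpq   %r10, -8(%rax)    # tag stored in the 8 bytes before the function
        jne    guard_fail        # signature mismatch: kill the process
        jmp    *%rax             # tag matches, proceed to the real target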

I wish this was the one implemented in hardware..


There’s a great article on XFG here [0], but it observed that a failed XFG check downgrades to a regular CFG check instead of a denial, meaning it adds zero extra protection? Perhaps this behavior has changed since the preview they tested, though!

[0]: https://www.offsec.com/offsec/extended-flow-guard/


That can't be right, it would be entirely pointless then. It looks like the article was written during a pre-release time, so maybe it wasn't fully enabled?

I've not yet been able to use XFG in any production software, due to the requirement of rebuilding every statically linked library with it enabled. But it didn't seem to fall back to CFG when I was testing it in a toy program.


That does sound like it would be more robust, but it'd definitely require a lot more silicon than the IBT they did implement. Something like it might come in a future revision.


ARM does it!


Interesting. I was able to get Clang to generate this using `-fcf-protection=branch`: https://godbolt.org/z/rooP8vPsM

It looks like endbr64 is a 4-byte instruction. That could be a significant code size overhead for jump tables with lots of targets: https://godbolt.org/z/xTPToaddh
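
For illustration, a switch lowered to a jump table dispatches through an indirect branch, so every case label needs its own endbr64 (a hand-written sketch, not actual compiler output):

    # int f(long i) { switch (i) { case 0: return 10; default: return 20; } }
    f:
        endbr64
        movq   .Ltable(,%rdi,8), %rax
        jmp    *%rax                  # indirect branch: IBT checks the target
    .Lcase0:
        endbr64                       # 4 bytes of overhead, repeated per case
        movl   $10, %eax
        ret
    .Lcase1:
        endbr64
        movl   $20, %eax
        ret
        .section .rodata
    .Ltable:
        .quad  .Lcase0
        .quad  .Lcase1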


OpenBSD disables jump tables in Clang on amd64 due to IBT; some architectures also had jump tables disabled as part of the switch to --execute-only ("xonly") binaries by default, e.g. powerpc64/sparc64/hppa.

https://marc.info/?l=openbsd-cvs&m=168254711511764&w=2

E.g: https://marc.info/?l=openbsd-cvs&m=167337396024167&w=2


Any idea what the performance impact is?


Why should every function start with an endbr64 instruction? Aren't functions usually called directly?

Also, is it required to insert an endbr64 instruction after function calls (for the return address)?


As to why they're not always called directly, imagine some code like this:

    #include <stddef.h>

    int FooWithoutChecks(void *p);

    int Foo(void *p) {
      if (p == NULL) return -1;
      return FooWithoutChecks(p);
    }
In general the caller is expected to call Foo if they aren't sure whether the pointer can be null; if they already know it's not null (e.g. because they checked it themselves), they can call FooWithoutChecks and skip a null check they know will never fire.

The naive way to emit assembly for this is to emit two separate functions and have Foo call FooWithoutChecks the usual way. But notice that the call to FooWithoutChecks is a tail call, so the compiler can do better: it lays out FooWithoutChecks's body directly inside Foo, emitting one blob of code for Foo with the logic of FooWithoutChecks inlined into it. This is nice because a call to Foo no longer pays for an inner call/ret pair. But what if someone calls FooWithoutChecks? Simple: you just call the offset into Foo just past the pointer comparison. This works because Foo already ends in a ret instruction, so the call to FooWithoutChecks reuses it. The optimization also saves some space in the binary, which has various benefits in and of itself.
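
In assembly the result looks roughly like this (a sketch: register use, labels, and the stand-in body are assumed, not taken from any particular compiler's output):

    Foo:
        testq  %rdi, %rdi        # if (p == NULL) return -1;
        jz     .Lnull
    FooWithoutChecks:            # second entry point; direct calls land here
        movl   $0, %eax          # stand-in for the real work
        ret                      # one ret serves both entry points
    .Lnull:
        movl   $-1, %eax
        ret

(Under IBT, each entry point that can be reached indirectly would additionally need its own endbr64.)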

The example here with the null pointer check is kind of contrived, but this kind of pattern happens a LOT in real code when you have a small wrapper function that does a tail call to another function, and isn't specific to pointer checks.


> Why should every function start with an endbr64 instruction? Aren't functions usually called directly?

They're usually called directly, but unless the compiler can prove that they always are (e.g., if they're static and nothing in the same file takes the address), endbr64 is required.

> Also, is it required to insert an endbr64 instruction after function calls (for the return address)?

No, IBT only covers indirect jmp and call. The shadow stack (the other half of CET) is the equivalent mechanism for ret.


> but unless the compiler can prove that they always are (e.g., if they're static and nothing in the same file takes the address), endbr64 is required

Then why not just have the compiler break down every non-static function into two blocks: a static function that contains all the logic, and a non-static function that just contains an IBT and a direct jump to the static function? (Or, better yet, place the non-static label just before the static one, and have the non-static fall through into the body of the static.) Then the static direct callsites won't have to pay the overhead of executing the IBT NOP.


The IBT NOP is "free" in that it will evaporate in the pipeline; it still has to be fetched and decoded to some extent, but it does not consume execution resources.

From a tooling perspective, what you're describing (two entrypoints for a function, the jump you mention is pointless) would require changes up and down the toolchain; it would affect the compiler, all linkers, all debuggers, etc. By contrast, just adding an additional instruction to the function prolog is relatively low-impact.

It's also worth noting that at the time code for a function is emitted, the compiler is not aware of whether the symbol will be exported and thus discoverable in some other module, or by symbol table lookup, so emitting the target instruction is essentially mandatory.


Doesn't seem like it'd be that difficult to make the change in the other direction, i.e. keep endbr64 as-is in the default case, but if there's a direct jump/call to anywhere that starts with endbr64, offset the immediate by 4 bytes; that could be done in any single stage of the toolchain that has that info, with no extra help. But yeah, quite low impact, and it might not even affect decode throughput and cache usage for at least one of the direct or indirect cases.


> Doesn't seem like it'd be that difficult

Show me the code -- better yet, submit it to the relevant projects! :)


That's absolutely doable, just... How much slower or faster is a predicted unconditional jump than ENDBR64? What's the ratio of virtual/static calls in real-world programs? And while your last proposal ("foo: endbr64; foo_internal: <code>") evades those questions, it raises questions about maintaining function alignment (16 bytes IIRC? Is this even necessary today?) and restructuring the compiler to distinguish the inner/external symbol addresses. Plus, of course, somebody has to actually sit down and write the code to implement that, as opposed to just adding "if (func->is_escaping) emit_endbr(...);" at the beginning of the code that emits the object code for a function body.


That sounds a lot like “add a prefix to the function with an endbr64 instruction”.


What is the overhead of executing the IBT NOP?


It's not "executed" per se. It consumes space in the cache hierarchy, and a slot in the front-end decoder. It won't ever be issued, but depending on the microarchitecture in question it might result in an issue cycle having less occupancy than it might have had in the case where the subsequent instruction was available.

With that said, the first few instructions of a called function often stall due to stack pointer dependencies, etc. so the true execution cost is likely to be even smaller than the above might suggest.


C allows any function to be called via a function pointer, and functions can live in different translation units, so the compiler can't simply assume a function will never be called indirectly; it has to pessimistically insert endbr64 in order to maintain a reasonable ABI.

And no, as I understand it, this is only for branches/calls, not returns.


Well, if the function is marked "static", the compiler can actually check whether the function's address is taken in the current compilation unit and omit/emit ENDBR64 accordingly (passing pointers to static functions to code in other compilation units is legal, and should still work).


Good catch. Yeah, as long as the function's address is never taken, the compiler has a lot of leeway with static functions; it can even avoid emitting code for them entirely if it can prove they're never called, or if it's able to compute their results at compile time.


Yep. Or inline them at every call site if that makes sense to do based on the optimization level and flags.


Is this theoretically something LTO could remove?


If you disable dlopen and LD_PRELOAD.


dlopen() "sees" only functions marked as exported (with a macro like DLLEXPORT on Windows), not every function, or am I wrong? Is C that bad?


On openbsd at least, every global symbol is exported unless you use an explicit symbol list. It's unusual for executables.


A traditional compiler needs to insert them for all external functions, because other compilation units may make an indirect call.


In case anyone wants a very simple introduction to JOP/COP exploits and mitigations of this type: <https://www.theregister.com/2020/06/15/intel_cet_tiger_lake/>


Thank you for the explanation!


Theo had to get his digs in against Linux in that announcement. Why not just focus on what OpenBSD is doing, and maybe contrast it with what Linux does, without the speculation that they will still be doing the same thing in 20 years?

He's unquestionably brilliant, but I've had a few encounters with him on the mailing lists and he is so quick to take offense where none was meant and drop into name-calling and insults. I don't really get it. He may have some deep insecurities.


It's an important comparison of the mechanisms: even in 2023, 20 years later, you can still find binaries on modern Linux distributions with executable stacks due to the fail-open design.

The fact that Linux hasn't learned the right lessons in 20 years, and has chosen to "double down" with respect to IBT/BTI, does not inspire confidence that they will ever fix it. I'd say his 20 year estimate was in fact pretty generous given the evidence available.

https://news.ycombinator.com/item?id=21554975


It's the price you pay for never-break-userspace. OpenBSD is fine with the very small probability that an executable which doesn't do branch tracking will fail to run under the enforced rules. The answer to that is to recompile because you've still got the source, and if not, well, tough cookies.


> OpenBSD is fine with the very small probability that an executable which doesn't do branch tracking will fail to run under the enforced rules.

To clarify slightly, OpenBSD is fine with the very high probability that an executable will fail under new rules. Otherwise, yes.


> the very small probability that an executable which doesn't do branch tracking will fail to run under the enforced rules

Isn't it any indirect branch in any program that will trip BTI/IBT? So most programs? I guess I disagree with the "small probability" part.


Tough cookies translates for many people into: OpenBSD is not for me. The 'very small probability' likely approaches 1 for sufficiently old stuff. And even if you do have the source, does it still build without substantial work? Backwards compatibility is not something to toss out the window without thinking through the consequences.


> It's an important comparison of the mechanisms: even in 2023, 20 years later, you can still find binaries on modern Linux distributions with executable stacks due to the fail-open design.

Unfortunately, for C code using GCC’s nested functions extension (or for languages that want to be ABI-compatible with C and support nested functions, like that paragon of advanced features called Pascal /s ), there’s no other compilation strategy in current ABIs. The patches to switch C (and not just Ada) to function descriptors[1] with an ABI break have been sitting on the GCC mailing list since approximately forever[2], but it doesn’t seem like there’s been any progress.

[1] The strategy is basically to compile (*fp)() not as

  call *%rax
but as (untested)

     test $1, %rax
     jz 1f
     mov 8(%rax), %r10
     mov (%rax), %rax
  1: call *%rax
thus essentially inlining the (currently stack-allocated) closure calling thunk at all indirect call sites. It is ABI-compatible on x86 and x86-64 with all code that does not involve nested functions, place functions at odd addresses, or tag function pointers itself (and I think with all arm64 and riscv code, although arm32’s usage of the low pointer bit for Thumb interworking is bound to make this trickier).

[2] https://gcc.gnu.org/legacy-ml/gcc-patches/2019-01/msg00735.h...


That strategy won't fly with IBT.

Now all software must pay the price and miss out on important mitigations, for all eternity, just because of some largely unused feature in one compiler?


IBT is already further along here. The hypothetical solution for executable stacks is to recompile all of your nested-function-using or -calling code with -ftrampolines (except that won’t work without the patch above—silently, really GCC?..). The already real and working solution for IBT is to recompile all of your indirect-branch-using code with -fcf-protection=branch. So, ignoring the fact that nested functions are in practice much rarer, if you accept the former as valid you’ll need to accept the latter as well, as far as logic as concerned.

I wouldn’t characterize this as a “largely unused feature in one compiler” screwing things up, but rather as the ABI on most Linux and -adjacent platforms (except SysV Itanium and FDPIC IIRC) being incapable of supporting closures (without executable stacks). That these are missing from standard C, and only present in languages that are either niche (Pascal, Ada) or don’t care about following the platform ABI (Rust, Go, C++’s lambdas), is a defect of C (and that’s at least a somewhat popular opinion among ISO C committee members[1]).

Of course, OpenBSD essentially does not have a stable ABI, so it’s much freer to experiment here.

[1] https://thephd.dev/lambdas-nested-functions-block-expression...


The funny thing is that this attitude towards breaking changes is one of the reasons why Theo is able to make this comment at all. If he didn't allow breaking changes, OpenBSD adoption would likely be higher, and that in turn would force him to resist exactly the kind of changes that Linux cannot get away with.

It's clearly different philosophies leading to different outcomes with neither of them clearly better than the other, it just depends on what you need. It would be possible to make that statement in a more graceful way.


"I have altered the ABI. Pray I do not alter it further." -- Theo de Raadt

https://marc.info/?l=openbsd-tech&m=157489277318829&w=2


Theo himself considers OpenBSD a “research” OS, so I don’t think he’ll ever consider OpenBSD going mainstream, especially as it allows stuff like this to happen.


Indeed, so it's apples-to-oranges.


That part doesn't look like a "dig" or an insult to me.

It seems like a reasonable, relevant, and plausible assessment of how the long-term outcomes may likely differ between OpenBSD's stricter approach versus a looser approach, specifically when it comes to the degree of security offered (which is one of OpenBSD's main focuses), based on a past situation that's similar.

How do you know that you aren't being, to use your words, "quick to take offense where none was meant" in this case?


> How do you know that you aren't being, to use your words, "quick to take offense where none was meant" in this case?

Past knowledge about Theo?


I wouldn't have it any other way. I love the OpenBSD mailing lists. Always an entertaining read when Theo gets involved.


Upvoted and +1. Theo has been an important leader in OSS for decades: his brevity and impatience are a net positive. Also, he is usually correct.


That's the problem with many brilliant people: what they perceive as their interlocutors being deliberately obtuse on some completely obvious point is actually their interlocutors being just as smart as they always are on some point that is not obvious at all to them.


Perception of relative intelligence or sensible decision making is irrelevant. Just because you think you're doing a better job doesn't mean you need to shit on the other person.

You could not mention Linux at all, or you could even say "we think this is better than Linux's approach because of X" and it would be a great improvement.

I have always found it interesting that Rust purposefully avoided doing language comparisons - "we're better than Python like this and better than C like that". Their message purposefully avoided any positioning of it as a competition, instead focusing just on articulating Rust's value. It was an eye opening approach given our instinct is normally to pit things against each other.


I think parent agrees with you.


This is my main takeaway too. As a one time OpenBSD enthusiast (and still admirer), now I'm a bit older I find the continual smugness starts to grate.

Truth is, Linux has a lot more constraints on how it can implement something because it has users. Users that have all sorts of different ways they need it to work.


I think it's great that he's calling Linux's choices out. TBH Theo's attitude is borne of Linux's culture - most older school sec people have learned that this is the only way to get things to improve.


Are Theo and Linux more alike than OpenBSD and Linux?


Now, yes. Linus wasn't always so abrasive though. At some point he caught up to Theo.


Linus has been trying to calm down in recent years, in large part because he decided he no longer wanted to be lumped in with the crowd that endlessly complains about political correctness.

https://www.bbc.com/news/technology-45664640


Yeah this is good stuff, and why I felt bad about making the comparison. Not saying Theo is in that camp, but Linus is trying to be less abrasive in general, and Theo is not.


Perhaps we're reading into their personalities more than we should, based on public social-media appearances.

Egos tend to become exaggerated when benevolent dictator types make public statements. Their candor and bluntness on a mailing list or Twitter may be completely different than their demeanor and their kindness toward collaborators in private.

There was, of course, the very public drama between Theo and that "other BSD" team that created the original schism. But have we had any subsequent drama that caused breakups or forks? I don't know. OpenBSD manages to plug away and push releases out the door on schedule, right?

Linus doesn't seem to have a lot of internal contributor drama, judging by the way they also push releases out the door and merge pull requests and add features.

Really, if either Theo or Linus were unreasonable men, their teams would fall apart and they would cease to be leaders of anything. I think their leadership abilities speak for themselves: they've both been committed and dedicated to the same project since decades ago, and they've both built and maintained cohesive teams of contributors who seem to mostly stick around long enough to make a difference.

They are "thought leaders", if you will; perhaps not charismatic ones, but canny businessmen who know how to nurture their pet projects.


"Running a successful open source project is just Good Will Hunting in reverse, where you start out as a respected genius and end up being a janitor who gets into fights."

-- Byrne Hobart, https://web.archive.org/web/20200909035546/https://diff.subs...


> They are "thought leaders", if you will; perhaps not charismatic ones, but canny businessmen who know how to nurture their pet projects.

The problem is that for every Linus, Theo, or RMS, you have a dozen tactless buffoons who aren't a tenth as talented as any one of these individuals, are a chore to work with, and couldn't manage their way out of a paper bag. I've even seen some developers defend their lack of social skills by drawing comparisons to people like Linus and Theo.

That's why Linus shows an incredible amount of insight and maturity by purposefully and vocally trying to distance himself from that image and set a better example. He might be able to make being abrasive work, but most people can't.


Both their jobs are largely reading and writing emails on mailing lists. They are some of the few famous people where the paper trail is what counts for many intents and purposes.


> Are Theo and Linux more alike than OpenBSD and Linux?

Is a Canadian kernel developer more like a POSIX operating system than a POSIX operating system is like a POSIX operating system?

I'm not sure I understand. Perhaps you meant to write "Linus" since Linus is also a kernel developer? That seems more like apples to apples.


> He may have some deep insecurities.

Explains why he spends all his time developing mitigations


I’d just like to interject for a moment. What you’re referring to as Linux, is in fact, NotOpenBSD/Linux, or as I’ve recently taken to calling it, Linux as opposed to OpenBSD…


I still run OpenBSD where I can, especially where security is more important. Yes, it's still missing A LOT of functionality compared to other UNIX-like systems, but the security bases tend to be well covered.


I find OpenBSD's hardware support especially lacking. It doesn't really work that well on at least 3 devices where I tried it (all Dell laptops from various generations, 3-10 years old), whereas Linux runs perfectly out-of-the-box on all three.

Which is sad, as I kinda like the *BSD approach to things


Not my experience at all; it works very well with a new Acer laptop I own: the graphics work (Intel Xe, 12th gen processor), audio, touchpad, keyboard (and special keyboard keys like brightness), wifi... All I had to do was download the firmware with fw_update, nothing more.

Also I was pleasantly surprised to hear they support Apple M1/M2 Macs. Asahi Linux gets a lot of press around here but I had no idea OpenBSD supported it.


> Yes, it's still missing A LOT of functionality compared to other UNIX-like systems

Could you give some examples of things you have run into, off the top of your head?


Sure. Poor SMP support (but this has improved heavily over the years), ancient file system, no Bluetooth (not important if you don't need this), reduced performance (due to a lack of optimizations and security mitigations overhead), limited Wi-Fi support (this is for numerous reasons, but it's better than other BSDs)...

I could go on, but, for my needs, it works very well and some of its simplicities are a godsend.


I don't really buy their approach to security, honestly. Trying to fix all bugs is great, but they provide little to prevent unknown bugs being exploited (pledge is nice for software that opts in to use it, but otherwise not so much). I'd love to see them implement something like AppArmor with their approach; it would probably be amazing.

I actually think NetBSD is a pretty interesting alternative, it has some nice security features like veriexec that don't get talked about much.


I think in the past they tried to fix all the bugs, and realized they couldn't, so they started to build all sorts of mitigations in the same vein as the one you see posted here today. As for pledge, and the related mitigations, yes, they're not useful if you don't use them, but I see this as them innovating in the space and giving application developers more tools to build hardened applications.

I see tools like AppArmor as band-aids to fix problems that shouldn't exist in the first place. The problem with these approaches is that the band-aids tend to break things in unexpected ways, and when that happens they simply get removed and go unused.


> I see tools like AppArmor as band-aids to fix problems that shouldn't exist in the first place.

I fundamentally disagree on that. I think tools like that are amazing at protecting against unknown threats/exploits. They let you lock down software and protect against future unknown exploits, badly behaving software, malicious employees etc. I think something similar should be a part of any OS claiming to be security focused. Basic DAC is woefully insufficient.

On the other hand, the industry has largely found other solutions like sandboxing, but I still think MAC or RBAC or whichever has a place, certainly as part of a defense in depth strategy.


> they provide little to prevent unknown bugs being exploited

They provide plenty of mitigations (https://www.openbsd.org/innovations.html). In fact OP's article is for preventing unknown bugs from being exploited.


They don't provide any mitigations of the sort I was clearly referencing: specifically, for restricting malicious code or users that already have access to the system, exploiting insecure software that was not compiled with pledge support.


What kind of mitigations would help here?


SELinux/RSBAC/AppArmor/grsecurity and similar.


These largely require buy-in from applications just like pledge.


They absolutely don't, that's the key difference.

What makes you think otherwise?


You can’t just stick sandboxing around arbitrary apps without them breaking.


The technologies I listed are not sandboxing, as that term refers to a different category of technology.

And you're right, kind of; you need to set the permissions for apps, but that doesn't mean they need cooperation from the software developers. The whole point is that they don't. With those technologies you can lock down complex closed source programs, something not possible with pledge.


Those seem to be of the category of “I have a program and I want to restrict what it does” which seems like a sandbox to me. The problem here is that trying to figure out what goes on this list is difficult for arbitrary programs, even when you’re the one writing it. When you’re just applying it to third party software it’s very likely something will not function correctly.


It's not a sandbox though, because it's a different type of technology. You can say it's a type of sandbox in concept, and you could make an argument, but referring to it as a sandbox in a technical discussion simply isn't correct.

> The problem here is that trying to figure out what goes on this list is difficult for arbitrary programs, even when you’re the one writing it. When you’re just applying it to third party software it’s very likely something will not function correctly.

That's why there are things like, for example, SELinux permissive mode, where you run the software as needed and observe the permissions it needs, and then grant it those permissions while denying everything else.


I mean the typical term used for such things is “mandatory access control” but they always get used to implement a sandbox so that’s what I call them.

Also, watching a program to see what it does is exactly the issue I’m talking about. You’re stuck with whatever behaviors you tested and everything else that you didn’t hit will fail (loudly if you’re lucky, silently if you’re not). There are platforms that do exactly what you’re talking about and believe me working on these rules is miserable. You’ll have reports on your desk like “the profiler doesn’t work anymore” (nobody tested this) or “on desktop controls don’t render anymore” (someone changed the implementation and it needs something you didn’t include in your rules). Again, this is when you control the stack, doing this for arbitrary programs is an order of magnitude harder.


Some implement role based access control or other access control paradigms as well. I just don't think sandbox is a good term, but I see where you're coming from.

I agree initial setup can be cumbersome, but I think it's worthwhile. I'm a fan of RSBAC personally, it's as powerful as SELinux but a lot simpler. If people run in permissive mode and test properly, not just run it and do a few things, but test every function exhaustively before setting up permissions, it should be good.

Really, it only has to be done once, and I think it's a worthwhile investment given the security gained.

That's what I was saying higher up in the thread though. OpenBSD is known for having good, simple implementations of complex stuff like this, so if they were ever interested in implementing a version, it would probably be amazing.


OpenBSD has these turned on at compile time.


I'm working on adding ENDBR support to the DMD D compiler backend.


That's good to hear. Is this ENDBR stuff new? Has it always been there on amd64? i686, etc.?


I don't know when it arrived.


Is this protection really all that helpful? Surely there are functions you can call into the top of to do your diabolical deeds for you.

It would be more helpful if callers stored some machine-specific hash of the function prototype and the function itself checked the hash, so that you could only redirect to a function with the right signature.

But that would also increase the overhead further. Already this is bad enough that it makes jump tables unattractive (which is too bad, considering that jump tables usually have little to no risk of control flow redirection).


The entire field of ROP exploits would basically never have been developed if it were as simple as just calling the function you want.


A software solution provided by the OS or language can make this hardware solution irrelevant.


Windows has done this in software for approximately 8 years[1].

An advantage of the software solution is that you don't need the feature compiled into every library for it to work; you just lose protection in those parts. That makes for a much quicker rollout. Also faster iteration: in the Windows Insider Preview you can get the extended version that also checks that the hashed function signature matches.

1: https://learn.microsoft.com/en-us/windows/win32/secbp/contro...


You've got it backwards: this hardware solution makes the software solutions irrelevant.


Nope. Here's the actual problem: in these crappy languages it's really easy for mistakes to result in a stack smash, so these types of hacks aim to make it harder for the bad guys to turn that into arbitrary remote code execution. Not impossible, just harder. Specifically in this case the idea is that they won't be able to abuse arbitrary bits of a function without calling the whole function, at the cost of some hardware changes and emitting unnecessary code. So maybe they can't find a whole function which works for them and they give up.

Using better languages makes the entire problem disappear. You don't get a stack smash, the resulting opportunities for remote code execution disappear.

It suggests that maybe the "C magically shouldn't have Undefined Behaviour" people were onto something after all. Maybe C programmers really are so wedded to this awful language that even it being much slower than Python wouldn't deter them. There is still the problem that none of them can agree on how this should work, but if they'll fund it, maybe it's worth pursuing to find out how much they will put up with to keep writing C.


I think one could argue that all the software mitigations that aren't based on compile time proofs result in quite a bit more "emitting unnecessary code", if "unnecessary" is taken to mean "not strictly intrinsic to the task of the program". And undefined behavior is bad, but getting rid of it wouldn't be a silver bullet for this problem in C, I think. All undefined behavior could become "implementation defined" tomorrow, where the C compiler becomes more like a high-level assembler (again), and you could still jump the instruction pointer into arbitrary program text.


> All undefined behavior could become "implementation defined" tomorrow, where the C compiler becomes more like a high-level assembler (again), and you could still jump the instruction pointer into arbitrary program text.

Try to work this through in your head. Imagine how you need to specify the working of the abstract machine in order to allow this. How do we talk about an "instruction pointer" on the abstract machine? What are the instructions it's pointing to? Am I defining an entire bytecode VM?

Nah, instead you're going to do one of two things. One: "Undefined Behaviour" which we explicitly took off the table, or Two: "If this happens the program aborts". And with that the big problem evaporates. Does it make those C programmers happy? I expect not.


Implementation defined means the compiler must specify the behavior, but it has near total freedom, and it can define it specific to the target system. There is no abstract machine. If I use GCC on Linux x86-64, then there very much is an instruction pointer.


In the real world, compilers just specify that the behaviour is undefined and tell you to suck it up. But we're talking about a hypothetical where we aren't allowing Undefined Behaviour. Saying "Oh, but we can if we say it's the implementation choosing" is a get out which is meaningless for the hypothetical. Just refuse to engage with the hypothetical instead if you don't like it.


I'm using specific, standards defined language, that's relatively well known. For example, sizeof(int) is implementation defined, meaning it must have a documented definition, specific to the implementation (e.g., gcc x86_64-linux-gnu, it's 4).

In languages like C that are closer to the machine, not everything has to be specified strictly in terms of a generic abstract machine.

I'm not trying to be hostile or evasive or derisive, I'm just genuinely responding to your original comment, that I think missed on some important info. And my point was that if we imagine a different world from the real world we're in right now, where in this new world, all undefined behavior became implementation defined behavior, then there would still be a need for mitigations like endbr64. So I'm not painting a rosy picture for C. I just think undefined behavior is a red herring. Assembly doesn't have undefined behavior, but obviously you can have all sorts of issues there.


> Assembly doesn't have undefined behavior, but obviously you can have all sorts of issues there.

The machine is in the real world and is thus obliged to have some actual behaviour, but it is not always practical to discern what that behaviour would be let alone make it reliable across a product line and document it in an understandable way. As a result actually your CPU's documentation does in effect include "Undefined Behaviour".


True, when writing my comment I wanted to qualify it to the same effect, but thought it would be an unnecessary subtlety to the general thrust of my point. That is, we can ignore this kind of "undefined behavior in the machine itself" for the purposes of this particular discussion.


I don't see how to ignore it though. If we're defining the behaviour but then our "definition" just doesn't specify the actual behaviour because it's specified in terms of hardware with no clearly defined behaviour for that situation then it's just word play, we're not really doing what I set out.


If for the purposes of this discussion we can't ignore it at the machine level (because we're assuming higher level languages, crappy or otherwise, are unlikely to generate machine instructions that exhibit undefined behavior), then why were we discussing higher level languages and their crappiness at all? I'm not saying this to be snarky, I just mean that I really think the likelihood of machine undefined behavior being an issue is on the order of likelihood for cosmic rays to flip bits -- happens, and can't be ignored (buy ECC memory), but more interesting to talk about the things that we are many orders of magnitude more likely to experience, e.g., bugs in C programs, bugs in unsafe Rust, bugs in managed language runtimes, etc. I think those things are not all equally likely, but could all benefit from endbr64 type mechanisms, including in JIT output.

To be clear, unlike the comment root, I don't think this particular hardware mechanism obviates the need/benefits of related software mechanisms. But in terms of cost/benefit/applicability, endbr64 type mechanisms look pretty good all around.


I’m always amused by how many of OpenBSD’s mitigations are patching over something as basic as lack of bounds checking, yet they’ll never add bounds checking. And, as you said, those are all just speed bumps, not fixes.


It's only irrelevant if the hardware solution is available on all the supported architectures/systems. As long as it's not, the software version must be maintained anyway, and might suffer from bitrot if it's no longer exercised on the major architectures.




