I am not a fan of binfmt_misc. Please hear me out. This introduces the same framework for future vulnerabilities that Microsoft introduced long ago by having the system interrogate file type associations. This turned into easy mode for attackers to trick people into executing malicious files on their system. This has not happened _yet_ in Linux AFAIK, but this sets the framework for it.
In Windows, you can see this mapping by opening a command prompt and typing:
assoc
It was common practice for some time to change the VBS mapping to text, along with several other file types.
In my opinion, we are bringing a dangerous behavior into systems that could one day be commonly used as desktops.
Interesting point, and one I hadn't considered. While I'm not trying to discount your valid concerns, Linux does have one other thing in its favour: files still need the executable bit set for binfmt_misc to even get called. On Windows, by contrast, the only requirement for something to be executable is that its file extension has an association with an interpreter. So on Linux you wouldn't get the same default behaviour, because the executable bit isn't set by default.
Moreover, if you can get around that executable bit problem, then you're no worse off with binfmt_misc than you would be without it, since you could legitimately create a text file called "cat.jpg" with a Bash shebang and it would launch Bash in the same way somescript.sh would.
Of course, you could take things one step further and say if a malicious actor has access to your system then they might get around the execution bit problem by just calling the interpreter directly (eg "bash somescript.sh" or the /lib64/ld-linux-x86-64.so.2 /bin/sleep 30 example in the article). But that requires the actor has remote code execution rights rather than just permissions to write to the file system. And honestly by that point you've already lost the game anyway.
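The executable-bit and cat.jpg points above are easy to check on a Linux box. Below is a minimal Python sketch, assuming /bin/sh exists and the scratch directory is not mounted noexec; the file name cat.jpg and the helper name demo_exec_bit are made up for illustration. It writes a shebang script with a .jpg extension, shows that execve() refuses it while no execute bit is set, then sets u+x and runs it.

```python
import os
import stat
import subprocess
import tempfile

def demo_exec_bit(directory):
    """Write a shebang script named cat.jpg (hypothetical) and run it before/after chmod u+x."""
    path = os.path.join(directory, "cat.jpg")
    with open(path, "w") as f:
        f.write("#!/bin/sh\necho hello from a .jpg\n")

    results = {}
    try:
        # New files get mode 0666 & ~umask, so no execute bits: execve() fails with EACCES.
        subprocess.run([path], check=True)
        results["before_chmod"] = "ran"
    except PermissionError:
        results["before_chmod"] = "permission denied"

    # Equivalent of `chmod u+x cat.jpg`.
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    # The kernel sees "#!" and starts /bin/sh; the .jpg extension plays no part.
    out = subprocess.run([path], check=True, capture_output=True, text=True)
    results["after_chmod"] = out.stdout.strip()
    return results
```

Calling demo_exec_bit() on a directory from tempfile.TemporaryDirectory() should report a permission error first and the script's output second.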
Ironically NTFS DACLs have an execute bit (on directories this allows traversal of the directory to reach a subdirectory you have permission to access, on files it controls execute). The kernel only enforces this on executables directly. Scripts are handled via user-space file extension associations.
Unfortunately the default inherited DACL on NTFS grants execute to all files, either via CREATOR OWNER, EVERYONE, POWER USERS, or ADMINISTRATOR.
Almost no interpreters bother checking for execute permission, even though nothing stops Python, WScript, et al. from checking it before parsing a file. This probably comes down to the lack of a #!-type mechanism on Windows, so there's no way to distinguish a script invocation from the user manually invoking the interpreter; you'd want to enforce +x in the former case but not the latter.
Like UAC vs sudo, culturally Windows simply didn't grow up enforcing these things so attempting to retrofit them causes tremendous pain and massive incompatibilities, despite the fact that the underlying system has robust support.
I'm curious what advantages that brings for security since executables can still be called via relative (./) and absolute paths.
The biggest advantage that immediately springs to mind is writable directories included in PATH, but even there some Linux distributions do include ~/bin in their users' PATH. Thankfully PATH is searched in order of precedence and ~/bin comes last, which prevents a script in ~/bin from "overriding" /usr/bin et al. (i.e. `~/bin/sed` wouldn't become the default sed). I believe Windows (though I'm not in a position to check) also gives the working directory the lowest precedence when searching PATH.
There may be some other vulnerability I'm completely unaware of, but that's what I've experienced when messing around with Windows.
To those that modded me down, I'll explain my point a bit better: the order of precedence I was referring to is that of the individual directories in the delimited PATH environment variable (semicolon-delimited in Windows, colon-delimited in Linux / UNIX).
In both Windows and Linux, the directories that appear earlier in PATH get priority over those that appear later. So if your PATH is
/bin:/home/laumars/bin
Then any executables named 'sed' in my home directory's bin folder would NOT become my system default sed utility so long as an executable named 'sed' also existed in /bin. However if I swapped that order around:
/home/laumars/bin:/bin
now my home directory's bin folder DOES become the first place the system looks for executables.
Windows also applies this order of precedence to its PATH variable.
This is why I believe having a writable local directory in PATH is a relatively mitigated attack vector, so long as the PATH variable is set correctly from the outset. Sure, you could plonk down a randomly named executable and get that called, since there aren't any overriding executables higher up in the precedence, but then you need some way of launching that irregular file name. By that point your attack might just as easily use a relative or absolute path to the executable, sidestepping the need for a targeted attack against the machine's specific PATH hierarchy in the first place.
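The first-match-wins lookup described in this comment can be modelled in a few lines. This is a simplified sketch, not any shell's actual implementation (Python's own shutil.which does a more complete job); the helper names and the two-directory layout are hypothetical. os.pathsep gives the platform's delimiter, ":" on Linux and ";" on Windows.

```python
import os
import tempfile

def resolve(command, path_var):
    """Return the first executable named `command`, searching path_var left to right."""
    for directory in path_var.split(os.pathsep):
        # Treat an empty entry as the current directory, as POSIX shells do.
        candidate = os.path.join(directory or ".", command)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate  # first hit wins; later directories are never consulted
    return None

def make_tool(directory, name):
    """Create a trivial executable shell script called `name` in `directory` (for demos)."""
    path = os.path.join(directory, name)
    with open(path, "w") as f:
        f.write("#!/bin/sh\n")
    os.chmod(path, 0o755)
    return path
```

With two directories (say, stand-ins for /bin and ~/bin made with tempfile.TemporaryDirectory()) that both contain an executable called sed, resolve("sed", ...) returns the copy from whichever directory is listed first, which is why the position of ~/bin matters.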
One thing I didn't mention in my previous post, but which is worth noting, is that on Linux any zero-length entries in PATH are treated as the current working directory. This could be two adjacent colons ( /bin::/usr/local/bin ), a leading colon ( :/bin ) or a trailing one ( /bin: ). This can be really easy to miss if you build up PATH through other variables, e.g.
export PATH=$PATH:$GOBIN
Where you've forgotten to define GOBIN before appending it to PATH (i.e. $GOBIN expands to "", leaving PATH with a trailing colon).
So it is worth being extra careful about how you build up PATH when using variables.
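A small helper for spotting the empty-entry trap described above. It is a sketch that assumes a colon-delimited (Linux/UNIX) PATH and simply maps empty entries to "." the way the shell would; the function names are illustrative.

```python
def path_entries(path_var):
    """Split a colon-delimited PATH, treating empty entries as "." like POSIX shells."""
    return [entry or "." for entry in path_var.split(":")]

def implicit_cwd_positions(path_var):
    """Return the indexes of empty entries, i.e. places where the cwd sneaks in."""
    return [i for i, entry in enumerate(path_var.split(":")) if entry == ""]

def check_path(path_var):
    """Raise if PATH contains an empty (current-directory) entry."""
    positions = implicit_cwd_positions(path_var)
    if positions:
        raise ValueError(f"PATH has empty entries at positions {positions}: {path_var!r}")
```

For example, if PATH was /usr/bin:/bin and GOBIN is empty, the export line above produces /usr/bin:/bin:, and path_entries on that string returns ["/usr/bin", "/bin", "."]. Running check_path(os.environ["PATH"]) early in a login script would flag the mistake.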
It's presumably easier for an attacker to get a malicious ls binary into some local directory and convince a privileged user to execute ls there than putting a malicious ls binary into /usr/bin.
Yeah, I already covered that point re writable directories in PATH (though maybe I wasn't all that clear?). But in your specific case you also need to get the privileged user to change to that directory, and there must be no executable with the same name in another PATH directory with higher precedence. By that point it's easier to just get said administrator to run an executable via its absolute path than to social-engineer it into PATH.
There are certainly some mitigating controls that make this a better story than what Windows implemented.
The malicious actor may have indirect access, or access via a web service that might give them limited ability to call a command or file. So in those cases, this framework can potentially make the complexity level of a vulnerability lower and the overall risk higher.
For now, I just limit binfmt and debugfs mounts so that regular users (especially service accounts) can't even access them. I think this only partially reduces the risk, since systemd acts as a proxy to binfmt. SELinux also reduces some risks, but so many people disable it that I can't factor that in.
There is probably a better way to disable this behavior, but I am working around many new features of systemd that are blurring the line between desktops and servers. The above also fixed several race conditions in systemd on Hyper-V VMs that are considered low-priority bugs.
"Modern" desktop environments of course do have file type associations, but this is unrelated to binfmt_misc.
I can't think of any realistic threat model in which binfmt_misc would reduce security. If you are running untrusted code, it doesn't matter whether it's an ELF, a shell script, or a VBS script run through Wine. It's game over anyway.
You don't know if you're running any code at all. You double click on a .jpeg file, and suddenly a python script is running. There are many ways to trick systems and users into running something when they didn't want to run anything, but binfmt_misc makes this easier for the attacker.
We're talking about a very specific situation where the attacker has enough privilege to add an association but not enough privilege to just run the malicious program.
Generally the file managers on Linux use "magic" to determine file type rather than extensions, so even if it would run you should know it's really a Python script.
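To illustrate the difference between trusting a name and trusting content, here is a toy magic-number check in Python. The signature table is a small hand-picked assumption; real tools such as file(1) consult a much larger database with far more rules.

```python
# A handful of well-known leading-byte signatures (illustrative, far from complete).
MAGIC = [
    (b"\x7fELF", "ELF executable"),
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"MZ", "DOS/Windows executable"),
    (b"#!", "script with shebang"),
]

def sniff(header: bytes) -> str:
    """Classify content by its leading bytes, ignoring the file name entirely."""
    for magic, kind in MAGIC:
        if header.startswith(magic):
            return kind
    return "unknown"

def sniff_file(path: str) -> str:
    """Read just enough of the file to cover the longest signature above."""
    with open(path, "rb") as f:
        return sniff(f.read(16))
```

A file called budget.pdf whose first line is #!/bin/bash is reported as "script with shebang", which is exactly the mismatch a name-based check would miss.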
The point is that binfmt_misc could allow you to run untrusted code in a context where you were not expecting to run any code.
For instance, if I do "./budget.pdf", I expect it to run the code from my pdf reader. In this case, an attack would require exploiting a vulnerability in the pdf reader. That is to say, budget.pdf is not untrusted code, but rather untrusted data.
However, in this case the untrusted data can claim that it is code, in which case it would be executed as such.
In contrast, when I do `evince budget.pdf`, the only way for budget.pdf to become code is to trick the pdf reader into thinking it is code.
Granted, this is largely a moot point, as most people would be using a GUI file manager, which already has a common interface for running programs and opening data files and already determines automatically which action to take.
> For instance, if I do "./budget.pdf", I expect it to run the code from my pdf reader. In this case, an attack would require exploiting a vulnerability in the pdf reader. That is to say, budget.pdf is not untrusted code, but rather untrusted data.
You cannot simply do ./budget.pdf. If you don't have the executable bit set you'd get a permission denied error (since that file doesn't have permission to execute). And even if you did have the execution bit set, it would fail because there isn't an interpreter configured to execute a pdf (which is the point of binfmt_misc in this context).
The executable bit is what Linux / UNIX / et al use to set whether something is executable code or not. Not the file extension (like in Windows).
> However, in this case the untrusted data can claim that it is code; in which case it would be executed as such.
Not without setting the executable bit first. (eg chmod +x budget.pdf)
> In contrast, when I do `evince budget.pdf`, the only way for budget.pdf to become code is to trick the pdf reader into thinking it is code.
Two points:
1/ PDFs technically are code though. They're Turing complete.
2/ running `evince budget.pdf` is exactly the same as running `./budget.pdf` where PDF's are mapped to evince via binfmt_misc. Just as running `./my-script.sh` is literally the same as running `/bin/sh ./my-script.sh` (where the shebang is /bin/sh). So your example and counter-example are actually the same process getting called by the kernel.
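It's the same thing with binaries that don't have a shebang, such as ELFs. There is a magic header (which the article mentions, so I won't reiterate it), and if the file is identified as an ELF binary, the kernel loads it and, for a dynamically linked binary, hands control to the dynamic loader (ld.so) named in its header. The effect is much like invoking the loader yourself, e.g.: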
/lib64/ld-linux-x86-64.so.2 /bin/sleep 10
(you can actually run this yourself from Bash too)
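To see where a path like /lib64/ld-linux-x86-64.so.2 comes from, here is a minimal Python sketch that reads an ELF file's program headers and returns its PT_INTERP entry, the loader path the kernel consults when starting a dynamically linked binary (`readelf -l` shows the same thing as "Requesting program interpreter"). It assumes a well-formed ELF file and decodes only the header fields it needs.

```python
import struct

PT_INTERP = 3  # program header type that names the dynamic loader

def elf_interpreter(path):
    """Return the PT_INTERP loader path of an ELF file, or None if it has none."""
    with open(path, "rb") as f:
        data = f.read()
    if data[:4] != b"\x7fELF":
        raise ValueError(f"{path} is not an ELF file")
    is64 = data[4] == 2                    # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    order = "<" if data[5] == 1 else ">"   # EI_DATA: 1 = little-endian, 2 = big-endian
    # ELF header fields after the 16-byte e_ident; only phoff/phentsize/phnum are used.
    ehdr = "HHIQQQIHHHHHH" if is64 else "HHIIIIIHHHHHH"
    fields = struct.unpack_from(order + ehdr, data, 16)
    phoff, phentsize, phnum = fields[4], fields[8], fields[9]
    # Leading program header fields up to p_filesz (the layout differs by class).
    phdr = "IIQQQQ" if is64 else "IIIII"
    for i in range(phnum):
        values = struct.unpack_from(order + phdr, data, phoff + i * phentsize)
        p_offset, p_filesz = (values[2], values[5]) if is64 else (values[1], values[4])
        if values[0] == PT_INTERP:
            return data[p_offset:p_offset + p_filesz].rstrip(b"\x00").decode()
    return None  # statically linked: no loader requested
```

On a typical x86-64 glibc system, elf_interpreter("/bin/sleep") is expected to return the same loader path used in the command above; a statically linked binary returns None.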
Wrong comparison. PDF is code only in the same sense as LaTeX is. PDF is based on PostScript, just as LaTeX is a set of macros on top of TeX, and we all know TeX and PostScript are programming languages and are Turing complete (look for a web server written in PostScript).
It's a common misconception that PDF is an extension of PostScript. PDF is only loosely based on PostScript. In particular, it doesn't support any control flow, like conditions or loops. It is not a programming language, unless you count embedded JavaScript.
Yes, but unfortunately the executable bit would be preserved if the file was delivered by way of a .tar.gz bundle. A reasonable user expectation would be to double-click on the .tar.gz to unravel it and then double click on the pdf to open it.
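The archive point is easy to confirm: tar records each member's permission bits in its header, and extraction restores them. A minimal sketch using Python's tarfile module; the helper names, the file name budget.pdf and its contents are illustrative.

```python
import io
import os
import stat
import tarfile

def make_archive(name, payload, mode=0o755):
    """Build an in-memory .tar.gz holding one file with the given permission bits."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        info.mode = mode  # the mode is stored in the member's tar header
        tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf

def extracted_mode(archive, name, dest):
    """Extract the archive into dest and return the permission bits of `name`."""
    with tarfile.open(fileobj=archive, mode="r:gz") as tar:
        # The "data" filter (Python 3.12+, plus some backports) blocks unsafe members
        # but keeps the owner's execute bit; older versions extract unfiltered.
        extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest, **extra)
    return stat.S_IMODE(os.stat(os.path.join(dest, name)).st_mode)
```

Packing a shell script as budget.pdf with mode 0o755 and extracting it yields a file that is still 0o755, ready to run as soon as someone tries to "open" it with ./budget.pdf.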
But again, the PDF would just open with the same program it would have regardless of whether it's called via binfmt_misc or not.
Besides, a much easier vector for attack in your example would be to create a shell script called "budget.pdf" since you then need to make fewer assumptions about the target. And since Linux doesn't care about file extensions, it's perfectly fine having a shebang prefixed script with a .pdf extension.
Which is one of many reasons why common advice is not to blindly run any executables you've just imported into your system.
> 1/ PDFs technically are code though. They're Turing complete.
That's not really relevant to the concern being raised, though. Although implied, the real concern is running code that has the same system access as your typical executable. That means opening arbitrary files or executing arbitrary executables. I doubt a PDF can do that.
> 2/ running `evince budget.pdf` is exactly the same as running `./budget.pdf` where PDF's are mapped to evince via binfmt_misc. Just as running `./my-script.sh` is literally the same as running `/bin/sh ./my-script.sh` (where the shebang is /bin/sh). So your example and counter-example are actually the same process getting called by the kernel.
This also misses the point. The concern is not that evince will be called, but that it trains the user to be less explicit about what he/she wants to do, and that this could cause the system, at some point, to infer the user's intention wrongly.
If the first bytes of a file called `./budget.pdf` are "#!/bin/bash", the user will most probably expect it to be opened by his trusted evince executable, as PDF content that he/she doesn't need to place particular trust in, because the PDF reader keeps it completely restricted from accessing other OS interfaces. What will actually happen, though, is that it will execute code that has access to everything the user has access to.
We know that this is similar to how scripts are treated, but we don't want that. We need to put way different amounts of trust between data and code files, and having the way we express opening a file differ from running a file expresses that difference of trust to the system.
>> For instance, if I do "./budget.pdf", I expect it to run the code from my pdf reader. In this case, an attack would require exploiting a vulnerability in the pdf reader. That is to say, budget.pdf is not untrusted code, but rather untrusted data.
>You cannot simply do ./budget.pdf. If you don't have the executable bit set you'd get a permission denied error (since that file doesn't have permission to execute). And even if you did have the execution bit set, it would fail because there isn't an interpreter configured to execute a pdf (which is the point of binfmt_misc in this context).
>The executable bit is what Linux / UNIX / et al use to set whether something is executable code or not.
You can't have your cake and eat it too. :) I'm not even sure if you're really refuting gizmo686's point here. The discussion in this thread is about the potential use of binfmt_misc to open up data files. Since binfmt_misc requires it, it's a given that the executable bit will be set, and this leads us to extend the meaning of "executing" something to opening up data files. I mean, "[executing] a pdf"? This would also lead to "executing" png, jpg, zip, mp4, etc. If use of binfmt_misc for this becomes widespread, the executable bit will hardly protect anything anymore.
I get your point but the risk of this becoming widespread enough that your nightmare scenario could happen is zero. It's not a standard way to use binfmt_misc, the only people who do are going to be nerds like us so on their own head be it. Plus regular users wanting to open PDFs et al will do so in the GUI rather than command line. The GUI has its own checks to make double clicking safe.
However I do take your point that if for some bizarre reason Ubuntu and derivatives did start configuring binfmt_misc to open documents and recommending users getting into the habit of setting the executable bit on docs, that would be a bad thing.
> For instance, if I do "./budget.pdf", I expect it to run the code from my pdf reader. In this case, an attack would require exploiting a vulnerability in the pdf reader. That is to say, budget.pdf is not untrusted code, but rather untrusted data. However, in this case the untrusted data can claim that it is code; in which case it would be executed as such.
I do not see the difference between running `./budget.pdf` and `evince budget.pdf`: in both cases all the attacker could rely on is a bug in the PDF viewer. There is no magic going on with `./budget.pdf`; the command would be transformed into the latter.
binfmt_misc is awesome. Little known fact: on most distros, you can combine the power of binfmt and qemu to run binaries from another architecture: either in a chroot/container or directly in debian:
I was once extremely confused by this black magic. For some reason, the binaries that I cross-compiled for an embedded ARM system ran on the host system! I had forgotten that I'd installed QEMU a while ago, and someone on Stack Overflow had to point me at binfmt_misc [1].
I experimented recently with writing a user-space ELF loader. Why should ELF loading need to happen in the kernel? Ok, there are some good reasons (the kernel has an easier time of single-handedly cleaning up all file descriptors, memory mappings, etc. in one fell swoop inside execve(). Are there others?). But I think it's an interesting proof-of-concept to show that ELF loading from user-space is totally possible.
When I started down this path, I learned some interesting details of the kernel/userspace interface that I wasn't aware of before, like the ELF auxiliary vector, which the kernel places on the new process's stack alongside its arguments and environment to communicate basic information such as its uid/gid, page size and entry point.
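For the curious, the auxiliary vector is easy to inspect from user space on Linux: the kernel exposes a copy at /proc/self/auxv as native (key, value) pairs of unsigned longs ending in AT_NULL. The sketch below parses it in Python; the tag numbers come from the Linux <elf.h> definitions and only a few are listed. A C program would normally call glibc's getauxval(3) instead.

```python
import struct

# A few auxiliary vector tags from the Linux <elf.h> definitions.
AT_NULL, AT_PHDR, AT_PAGESZ, AT_ENTRY = 0, 3, 6, 9
AT_UID, AT_EUID, AT_GID, AT_EGID = 11, 12, 13, 14

def read_auxv(path="/proc/self/auxv"):
    """Parse the auxiliary vector the kernel handed to this process at exec time.

    Each entry is a pair of native unsigned longs (tag, value); AT_NULL ends the list.
    """
    with open(path, "rb") as f:
        raw = f.read()
    entries = {}
    for tag, value in struct.iter_unpack("@LL", raw):
        if tag == AT_NULL:
            break
        entries[tag] = value
    return entries
```

For example, read_auxv()[AT_PAGESZ] should match the system page size and read_auxv()[AT_UID] the real uid the process started with, both delivered by the kernel rather than discovered by the program.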
> the kernel has an easier time of single-handedly cleaning up all file descriptors, memory mappings, etc. in one fell swoop inside execve()
fork() / execve() isn't a good mechanism for launching new processes; it's one of the worst aspects of early Unix design.
A properly designed syscall should require the caller to specify the desired behavior explicitly:
1. Do I want to inherit file descriptors, etc or do I want a clean slate?
2. Do I want a separate address space or to share address space?
3. What happens to threads, mutexes, locks, etc in the new child process?
4. If a separate address space should I inherit my parent's VM mappings or should I just load and execute a new binary image?
That gives flexibility to the caller and maximum information to the kernel to optimize. Instead we have a bunch of ad-hoc patches like vfork(), clone(), and posix_spawn() (+ non-standard attributes like POSIX_SPAWN_CLOEXEC_DEFAULT, POSIX_SPAWN_SETEXEC, and POSIX_SPAWN_USEVFORK).
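For contrast with fork() followed by execve(), here is a minimal sketch of the posix_spawn() route mentioned above, through Python's os.posix_spawn wrapper (Unix only; os.waitstatus_to_exitcode needs Python 3.9+). The child's descriptor setup is declared up front as a list of file actions rather than performed by code running between fork and exec. The helper name and the stdout redirection are illustrative choices.

```python
import os

def spawn_to_file(argv, out_path):
    """Run argv (argv[0] must be a full path) with stdout redirected; return the exit code."""
    actions = [
        # Applied in the child only: open out_path and install it as fd 1 (stdout).
        (os.POSIX_SPAWN_OPEN, 1, out_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644),
    ]
    pid = os.posix_spawn(argv[0], argv, dict(os.environ), file_actions=actions)
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)
```

For instance, spawn_to_file(["/bin/sh", "-c", "echo hi; exit 3"], path) returns 3 and leaves "hi" in the file. Descriptors without close-on-exec would still leak into the child; Python marks its own descriptors non-inheritable by default, which is one of the knobs the comment wishes the syscall exposed explicitly.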
Not nearly an ELF loader, but during my security class I wanted to test my payloads, so I wrote a load_bin program which read a file into a buffer and then jumped (jmp) to the start of the buffer.
Like the author, I'm sure many of us imagined that the shell was inspecting our scripts and calling the interpreter. Good to know it happens in the kernel.
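A rough user-space model of the rewrite the kernel performs for a "#!" file (the real logic lives in the kernel's binfmt_script handler). It is a sketch under simplifying assumptions: it ignores the kernel's limit on shebang line length and the case where the interpreter is itself a script. One real quirk it does reproduce is that everything after the interpreter path is passed as a single argument.

```python
def shebang_argv(path, argv):
    """Return the argv Linux would build for execve(path, argv) on a '#!' script.

    Result: [interpreter, optional single argument, path, *argv[1:]], or None
    if the file does not start with '#!'.
    """
    with open(path, "rb") as f:
        first_line = f.readline()
    if not first_line.startswith(b"#!"):
        return None  # not a script; the kernel would try other binary formats
    # Interpreter path, then (at most) one argument holding the rest of the line.
    parts = first_line[2:].decode().strip().split(None, 1)
    if not parts:
        return None  # "#!" with no interpreter: execve fails with ENOEXEC
    return [parts[0], *parts[1:], path, *argv[1:]]
```

For a file starting with #!/bin/sh, shebang_argv("./my-script.sh", ["./my-script.sh", "x"]) gives ["/bin/sh", "./my-script.sh", "x"], i.e. the `/bin/sh ./my-script.sh` form discussed earlier. A line like #!/usr/bin/env python3 -u yields the single argument "python3 -u", which is why such shebangs often surprise people.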
I wonder why more packages don't register handlers. It would be cool if simply installing dosbox made EXEs executable.
I did this a few years ago. To be honest, I find it kind of annoying, since I use many Wine prefixes and don't usually want to execute Windows applications that way. I don't know, it's just a preference.
Yes, binfmt_misc is quite useful. In WSL (Windows Subsystem for Linux), we register Windows executables in binfmt_misc and have a handler that launches a Windows process and relays stdin, etc.:
This is nice because it uses standard Linux functionality; we didn't have to extend our Linux-compatible kernel interface to support running Windows processes.
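Handlers like these are registered by writing a single line of the form :name:type:offset:magic:mask:interpreter:flags to /proc/sys/fs/binfmt_misc/register, as described in the kernel's binfmt-misc documentation. The Python sketch below parses such a line and approximates the kernel's matching rules; it is a user-space model rather than the kernel code, and it only decodes \xHH escapes. The DOSWin line in the usage note is modelled on the Wine example in the kernel documentation; the PDF rule is a hypothetical example in the spirit of this thread.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    name: str
    kind: str         # "M" = match magic bytes, "E" = match file extension
    offset: int       # where the magic starts in the file (ignored for "E")
    magic: bytes      # magic bytes, or the extension for "E" rules
    mask: bytes       # optional per-byte mask (empty means compare every bit)
    interpreter: str  # program the kernel runs, passing it the file
    flags: str

def _unescape(field):
    """Decode \\xHH escapes (the only escape form handled here) into raw bytes."""
    decoded = re.sub(r"\\x([0-9a-fA-F]{2})", lambda m: chr(int(m.group(1), 16)), field)
    return decoded.encode("latin-1")

def parse_rule(line):
    """Parse a ':name:type:offset:magic:mask:interpreter:flags' registration line."""
    delim, body = line[0], line[1:]  # the first character is the field delimiter
    fields = body.split(delim)
    if len(fields) == 6:             # trailing flags field omitted
        fields.append("")
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    name, kind, offset, magic, mask, interpreter, flags = fields
    if kind not in ("M", "E"):
        raise ValueError(f"unknown type {kind!r}")
    magic_b, mask_b = _unescape(magic), _unescape(mask)
    if mask_b and len(mask_b) != len(magic_b):
        raise ValueError("mask must be as long as magic")
    return Rule(name, kind, int(offset or 0), magic_b, mask_b, interpreter, flags)

def matches(rule, filename, header):
    """Approximate whether the kernel would hand this file to rule.interpreter."""
    if rule.kind == "E":
        base = filename.rsplit("/", 1)[-1]
        return "." in base and base.rsplit(".", 1)[1] == rule.magic.decode("latin-1")
    window = header[rule.offset:rule.offset + len(rule.magic)]
    if len(window) < len(rule.magic):
        return False
    mask = rule.mask or b"\xff" * len(rule.magic)
    # A byte matches when every bit selected by the mask agrees with the magic.
    return all(((d ^ m) & k) == 0 for d, m, k in zip(window, rule.magic, mask))
```

With parse_rule(":DOSWin:M::MZ::/usr/local/bin/wine:"), any file whose first two bytes are MZ matches regardless of its name, while the hypothetical parse_rule(":pdf:E::pdf::/usr/bin/evince:") matches on the .pdf extension alone. Either way, the file still needs its executable bit before the kernel consults these rules.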
I'm not usually that excitable about these kind of feature demonstrations, but when I first made an old DOS game execute from my Linux terminal I did something reminiscent of Lucille Bluth spotting Gene Parmesan.
This is a good write-up. I remember reading through the kernel source code trying to understand this a few months ago because I was curious about what it would take to add binary whitelisting to the kernel. The Linux source code is not too hard to understand (if you have some background in how OSes work and know C), especially if you have a topic that you want to know about.
If the kernel parts are confusing you, I highly recommend "Linux Kernel Development" by Robert Love as a high-level introduction to the kernel architecture.
If you like that, proceed to "Linux Kernel Architecture" by Wolfgang Mauerer. The first chapter contains information similar to this article, but at a much greater level of detail.
I read Code by Charles Petzold so I think I have at least some concept of how the operating system and hardware communicate. So now I guess I need a better understanding of how applications communicate with the operating system, as recommended by another reply, more for the sake of knowing rather than any practical reason.
I'm not sure if you'll see this late reply, but here's an idea that may be interesting (ie, accessible and hopefully not too challenging): write a Linux program that doesn't link a standard library (aka glibc). The result will be something that only uses kernel syscalls.
Your first such program probably wouldn't do much; low expectations would be ideal. Returning an arbitrary exit code could be one starting point, or printing a fixed string could work too (you have write() to write bytes to the TTY, but not printf() (and %s, %d, etc!), as that's a stdlib function :D). This would be a "journey" exercise, not an "end goal" exercise.
You'll need to learn how to
- Disable linking the C runtime (crt0 et al.; compile a .c file with 'gcc -v', pore over the output and you'll spot several crt* things at the link stage), which will mean learning about '-nostdlib', '-nostartfiles' and related parameters
- Deal with the fact that you don't have a main() anymore, you're expected to define _start() instead
- Rummage through a standard library to figure out how to do, well, anything. I recommend poking at a smaller C library, like musl or klibc, for ideas (it's mostly boring drudgery you won't need to follow up on extensively just to get started, and there are enough "I'm making a tiny stdlib!" projects floating around GitHub that it may be worth weighing up whether to make a 6,983rd; that said, by all means take the opportunity to implement your own strlen() and the like, as they're standard entry-level whiteboard challenges)
A good way to get your feet wet might be to first link with musl/klibc/similar, so you can get an idea of the fragility at play (mostly compilers' faults) without blowing everything up instantly, so you can learn how "actually don't use glibc please" is handled in practice. (Read: gcc works, but only within a very narrowly defined scope; clang may work, but may not have as much (easily-found) documentation as gcc. The IRC channel/mailinglist will probably be useful if you're sufficiently interested, but the responses you get may take a day to understand/unpack :) )
There's just enough documentation floating around on stackoverflow and related places that the process isn't overly terrible to start with, and this particular learning process will very clearly explain a heap of stuff.
If you like I can try digging out a bunch of links, but it might take a while (my bookmarks are a thoroughly under-categorized tangle :( ).