Supporting four operating systems in a 400 byte ELF executable (justine.lol)
155 points by jart on Aug 29, 2022 | 42 comments



It's worth noting that at one point there was an effort to standardise ABIs on Unix-like OSs, https://en.wikipedia.org/wiki/Intel_Binary_Compatibility_Sta... although it seems to have mostly been forgotten by the Internet.

(I dug into that a little bit, inspired by a previous article here: https://news.ycombinator.com/item?id=31456401 )

Four operating systems in under 400 bytes is pretty good!

I have a 26-byte Hello World that works in DOS 1.x, 2.x, 3.x, 4.x, 5.x, 6.x, Windows 1.x, 2.x, 3.x, 95, 98, 98se, Me, 2k, XP, Vista, 7, 8, 10. If I used CALL 5 I might get down into CP/M-land but possibly miss some of the newer OSs ;-)


I love the tricks that can be done with the old COM format. It's so versatile and simple, it's almost like scripting. You can even make it ASCII clean while retaining some semblance of Turing completeness!

I had a weird and pointless project somewhat recently where I wanted to bootstrap a programming environment using only the stock tools on NT 4. I didn't actually finish, but what I did write was pretty contrived. I wrote some code for JScript, which had to run in IE since there was no scripthost, and it would parse some horrible language using a very inefficient IE3 JScript compatible implementation of parser combinators and generate x86 code... And output it in base64 into a textarea. Which I would then need to somehow get back into proper binary data...

So my approach was to write a small DOS compatible COM binary with a really simple base64 decode routine. I'd do this with the DEBUG command, and save it to disk. Then, I'd use redirection in command prompt to feed it base64 and redirect the output to a file.

I was pretty ecstatic to see this awful hack job working somewhat end-to-end.

(I don't have the code, I think it got wiped out when the drive in the laptop I was doing this whole game on died. But it was quite straightforward to implement base64 in very little code; I believe what I wound up doing was accumulating 3 bytes of output from 4 bytes of input with a few conditionals to shift base64 symbols into the corresponding bits and then just some shifting and ORing. No LUTs necessary.)
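For illustration, here is a minimal Python sketch of that kind of table-free decoder. It is not the lost COM program, only the same idea as described: each symbol is mapped to its 6-bit value with range checks, four symbols are shifted and ORed into a 24-bit accumulator, and up to three bytes come out. The names `sextet` and `decode_base64` are made up for this example.

```python
def sextet(ch: int) -> int:
    """Map one base64 symbol (a byte value) to its 6-bit value using range checks only."""
    if ord("A") <= ch <= ord("Z"):
        return ch - ord("A")
    if ord("a") <= ch <= ord("z"):
        return ch - ord("a") + 26
    if ord("0") <= ch <= ord("9"):
        return ch - ord("0") + 52
    if ch == ord("+"):
        return 62
    if ch == ord("/"):
        return 63
    raise ValueError(f"not a base64 symbol: {ch!r}")


def decode_base64(text: bytes) -> bytes:
    """Decode standard base64: every 4 input symbols yield up to 3 output bytes."""
    # Drop whitespace such as line breaks before grouping.
    symbols = [c for c in text if c not in b" \r\n\t"]
    if len(symbols) % 4 != 0:
        raise ValueError("input length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(symbols), 4):
        group = symbols[i:i + 4]
        # Count and remove trailing '=' padding in this group.
        pad = 0
        while group and group[-1] == ord("="):
            group.pop()
            pad += 1
        if pad > 2:
            raise ValueError("too much padding")
        # Shift each 6-bit value into the accumulator.
        acc = 0
        for sym in group:
            acc = (acc << 6) | sextet(sym)
        # Padded symbols contribute zero bits.
        acc <<= 6 * pad
        out += acc.to_bytes(3, "big")[:3 - pad]
    return bytes(out)
```

Calling `decode_base64(b"TWFu")` gives `b"Man"`. The sketch only checks padding per group, so it is more lenient than a strict decoder would be.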


Not a base64 decoder (written those too), but here's an ASCII-only .COM file that works on both x86 and 8080:

x)y'!B:)),DM!G@))p,T]l%)),@@@Hello, world!$XP5]A%=!P[PZ^V!wV!wX5.<(GV(gX534,!(GWr!

it could have been a few bytes shorter, but on 486 you need a jump to flush the prefetch queue


Don't you need negative displacement to be Turing complete? And wouldn't negative displacement always entail non-ASCII characters?


You'll either need self modifying code or a chain of short jumps that wraps around the 64K code segment.

The latter is used by http://tom7.org/abc
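To make the wrap-around idea concrete, here is a rough Python sketch of the arithmetic. It assumes 16-bit IP arithmetic that wraps at 64K, two-byte conditional short jumps (opcodes 0x70 to 0x7E are printable), and a displacement byte restricted to printable ASCII (0x20 to 0x7E), so every jump moves forward. The function names are invented for this example, and this is not the layout the ABC paper actually uses.

```python
SEGMENT_SIZE = 0x10000            # IP is 16 bits wide, so addresses wrap at 64K
MIN_DISP, MAX_DISP = 0x20, 0x7E   # printable ASCII range for the rel8 byte
JUMP_LENGTH = 2                   # opcode byte + displacement byte


def plan_backward_jump(start: int, target: int) -> list:
    """Return printable displacement bytes for a chain of forward short jumps
    that begins at `start` and lands on `target` after wrapping around."""
    distance = (target - start) % SEGMENT_SIZE
    if distance == 0:
        return []
    min_hop = JUMP_LENGTH + MIN_DISP
    max_hop = JUMP_LENGTH + MAX_DISP
    if distance < min_hop:
        raise ValueError("target is too close to reach with a printable jump")
    hops = -(-distance // max_hop)       # ceiling division
    extra = distance - hops * min_hop    # bytes still to cover beyond minimal hops
    disps = []
    for _ in range(hops):
        add = min(extra, MAX_DISP - MIN_DISP)
        disps.append(MIN_DISP + add)
        extra -= add
    return disps


def land(start: int, disps: list) -> int:
    """Simulate the chain: each jump adds its own length plus its displacement."""
    ip = start
    for d in disps:
        ip = (ip + JUMP_LENGTH + d) % SEGMENT_SIZE
    return ip
```

Jumping "back" 0x100 bytes from 0x0200 to 0x0100 takes 510 forward hops under these assumptions. A real program also needs a jump instruction sitting at every landing address and a condition that is known to hold, which this sketch ignores.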


I was reading the comments this morning and thinking "I know somebody did this fairly recently..." but couldn't remember who.

Of course it was Tom7. Of. Course.

Tom7 is a treasure.


I'll grant DOS, maybe win16, Windows 9x, and NT, but it's a bit much to count multiple versions of the same OS and I'm on the fence about compatibility layers - can I include WINE and say that the same binary also works on Darwin, Linux, and Illumos? (If that binary works on DOS 1.x, I seriously doubt that it's any more "native" on modern Windows than on, say, Illumos; they're both just using an ABI compatibility layer)


It comes with some other caveats: no AMD64 version of Windows has ever shipped NTVDM, so for newer OSes it will only work on the 32-bit x86 variant. (Though, I do believe some variants of Windows on other architectures had NTVDM with an x86 emulator, and so this 26-byte binary is also cross architecture!)


For supporting both CP/M and DOS, you can skip over an 8080 jump instruction using a single extra byte:

80 C3 xx xx

The first instruction will be harmless on both processors, and even have the same mnemonic: it decodes as either "ADD BX,xxxx" or "ADD B"!


small correction: the first byte is 81, "ADD C" on the 8080
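A toy Python decoder shows how the corrected bytes `81 C3 xx xx` read on each CPU. It handles only the two or three opcodes involved (nothing close to a full disassembler), and the address 0x1234 is an arbitrary placeholder for `xx xx`. With 81, the 8086 consumes a 16-bit immediate and so swallows the whole jump address, while the 8080 sees a one-byte ADD followed by JMP.

```python
def decode_8086(code: bytes) -> str:
    """Decode only the single 8086 form needed here: 81 /0 iw with a register operand."""
    reg16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]
    opcode, modrm = code[0], code[1]
    mod, op, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if opcode == 0x81 and mod == 0b11 and op == 0:
        imm = int.from_bytes(code[2:4], "little")
        return f"ADD {reg16[rm]},{imm:04X}h"
    raise NotImplementedError("this sketch only knows ADD r16,imm16")


def decode_8080(code: bytes) -> list:
    """Decode only ADD r (80-87) and JMP a16 (C3) on the 8080."""
    reg8 = ["B", "C", "D", "E", "H", "L", "M", "A"]
    out = []
    i = 0
    while i < len(code):
        b = code[i]
        if 0x80 <= b <= 0x87:
            out.append(f"ADD {reg8[b & 7]}")
            i += 1
        elif b == 0xC3:
            addr = int.from_bytes(code[i + 1:i + 3], "little")
            out.append(f"JMP {addr:04X}h")
            i += 3
        else:
            raise NotImplementedError("this sketch only knows ADD r and JMP")
    return out


prefix = bytes([0x81, 0xC3, 0x34, 0x12])  # 0x1234 stands in for "xx xx"
```

Here `decode_8086(prefix)` returns `"ADD BX,1234h"`, so the DOS code simply follows the four bytes, and `decode_8080(prefix)` returns `["ADD C", "JMP 1234h"]`, so the CP/M code lives at the jump target.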


Really interesting projects on the index [1], just wish there was an RSS feed!

[1] https://justine.lol/index.html


Beyond the tiny binary result, I understand that Justine wants the OS maintainers to understand that there is value in supporting a common ABI, and that very few changes are needed for that once they have already standardized on the CPU arch and POSIX. But changes would need to be synchronized across all OSes, so there would be a large communication cost. So this boils down to how much innovation is expected in system ABIs.


Interesting that this is the opposite of the Gentoo argument ("you should recompile on every target system for maximal optimization"), and that in practice people seem to be converging on Docker as "the standard cross-platform ABI".


I thought Gentoo did that more for the increased customisability than performance, since you have the compilation options to tweak (USE flags). Never used Gentoo though.


docker is not cross-platform


I think I’ve written shellcode for CTFs that’s less fragile than this :P


Unlike a vulnerability, there is no particular reason for anyone to break the tricks used here. Platform detection almost always relies on quirks like this if it hasn’t been designed in from the start.


I disagree. It's pretty easy to detect the OS using things that can't possibly change like ABI calling convention (the BSDs differ from Linux.) There's thousands of ways to differentiate Windows from Linux that can't be changed...


Breaking something like this would generally be for refactoring or performance reasons


Did you read the article? The only one I could foresee breaking is if OpenBSD decided to one day implement auxiliary values.


I did. Putting aside syscall numbers, which I've talked to you about before, there's a lot of things that you are depending on. One of them would be if OpenBSD implements auxv, but relying on the various registers being zeroed out or equal to each other at program startup is dicey on most platforms.


Well, if you consider trolling to be talking: https://news.ycombinator.com/item?id=32562674. Whatever they break, I'll just ask them to fix. I've had to make plenty of reasonable asks in the past to make APE possible. For example, I got a nasty bug fixed in the NetBSD /bin/sh. I got FreeBSD /bin/sh patched. We got POSIX to change their rules to allow it. Let's not forget zsh and fish.


I think most OSes will be happy if you come to them with actual bugs, yes. But if you ask them to stop changing syscall numbers or various other "ABI" that is not actual ABI, I feel like they are not going to be as happy to accommodate you.


Are you an OS developer?


Actually, yes! I don't work on any of the OSes you mentioned here though, and I haven't touched the kernel in a while.


Yeah, same here, but for proof of concept viruses.


That's very interesting, but it seems rather dependent on implementation details - or is ELF parsing on each of these OSes considered stable?

I mean, if one of the non-FreeBSD OSes started insisting on a certain value for e_ident[EI_OSABI] then presumably he'd have to redo all this from scratch with a different technique.
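For reference, the field in question is a single byte at index 7 of the 16-byte `e_ident` array. A small Python sketch of reading it follows. The `sample` bytes are a made-up, well-formed `e_ident` and not the header from the article, and `elf_osabi` is a name invented for this example.

```python
EI_OSABI = 7  # index of the OS/ABI byte inside e_ident

# A few ELFOSABI_* values from the ELF specification.
OSABI_NAMES = {0: "SYSV", 2: "NetBSD", 3: "Linux", 9: "FreeBSD", 12: "OpenBSD"}


def elf_osabi(header: bytes) -> str:
    """Return a readable name for e_ident[EI_OSABI] of an ELF header."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    value = header[EI_OSABI]
    return OSABI_NAMES.get(value, f"unknown ({value})")


# Hypothetical e_ident: magic, ELFCLASS64 (2), little-endian (1),
# EV_CURRENT (1), OSABI = FreeBSD (9), then ABI version and padding.
sample = b"\x7fELF" + bytes([2, 1, 1, 9]) + bytes(8)
```

`elf_osabi(sample)` returns `"FreeBSD"`. A loader that insisted on its own value here would reject this header, which is the scenario the comment above worries about.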


This is akin to fuzzing in parallel across several OS's to keep it all working :)


Brilliant, as usual. Her Patreon is woefully underfunded, given the feats being performed. Hopefully that's not her primary source of funding.

Reminds me of when Slack was sold for 30 billion dollars, and Boston Dynamics sold for 1.5 billion. Feels like the market rewards the wrong things sometimes.


The tech industry is mostly a marketing/sales industry. Sell the thing to more people and it is worth more money.


people generally value kinda cool and useful much more than cool and mostly useless.


Slack is far more useful than anything Boston Dynamics has made


Slack is a chat program, of which there are countless. Most of them are literally free.

If you can't see the value in what Boston Dynamics has created, I have a ToDo list application to sell you for several billion dollars.


And yet nothing in your comment contradicts the parent comment.


I think that demonstrates an incredible lack of vision/imagination on your part.


Sounds like it was pure coincidence that this works at all.


I don't really think so. They're OSs that, when they added support for x86-64 CPUs, all opted to use the externally-defined ELF executable file format with the x86-64-sysv ABI. Note also that the architecture/ABI already includes a dedicated `syscall` assembly instruction.

On top of that, 3 of the OSs (FreeBSD, NetBSD, OpenBSD) share a common source code ancestry, meaning that the oldest and most fundamental Unix system calls (like `write(2)`) had an excellent chance of remaining compatible.

Yes, calling write(2) on Linux is different, but only in the syscall number. That's also not a coincidence, as the behaviour of write() is highly constrained by POSIX which Linux and the BSDs both adhere to, and the way to pass parameters to the call is specified by the `syscall` instruction.

That there are enough differences in the environment to be able to figure out which of Linux or *BSD the program is running on is a bit of luck, but probably not that much.
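To illustrate the write(2) point, here is a small Python sketch of the register setup for the call on x86-64. The syscall numbers (1 on Linux, 4 on the three BSDs) are the well-known values. The helper name and the example buffer address are made up, and the sketch only builds the register values; it does not execute a system call.

```python
# Syscall number for write(2) on x86-64, per kernel.
SYS_WRITE = {"linux": 1, "freebsd": 4, "netbsd": 4, "openbsd": 4}


def write_syscall_registers(os_name: str, fd: int, buf_addr: int, length: int) -> dict:
    """Return the register values to load before executing `syscall` for write(2)."""
    return {
        "rax": SYS_WRITE[os_name],  # the only value that differs between kernels
        "rdi": fd,                  # first argument: file descriptor
        "rsi": buf_addr,            # second argument: buffer address
        "rdx": length,              # third argument: byte count
    }


# 0x400078 is an arbitrary placeholder address for the message.
linux_regs = write_syscall_registers("linux", 1, 0x400078, 13)
openbsd_regs = write_syscall_registers("openbsd", 1, 0x400078, 13)
```

The two dictionaries differ only in `rax`, so once the program knows which kernel it is running on, a single number has to be patched.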


Clever and carefully researched, yes, but by coincidence, no.


Pure coincidence, perhaps not (see the other comments on this thread), but this did come across a few times as being a hack.

For instance, on the OS ABI field in ELF headers:

> Fortunately we can get around this by just setting it to the FreeBSD ABI. This is because FreeBSD is the only UNIX operating system that checks this field.

Or on loading a zero-byte segment into memory:

> However if the size in memory is zero too, then OpenBSD will refuse to run the program, whereas the other kernels just don't care.


I think you could indeed call it a coincidence because it's exploiting behaviors that are not really a proper ABI, just emergent behavior based on implementation details.

Someone working on the various operating systems could change those behaviors without knowing that they're breaking this thing. And they wouldn't necessarily be in the wrong to do so. Because it's not really an ABI.


There is always a certain amount of luck in successful runtime feature/platform detection when the system doesn’t provide an explicit mechanism for it. I would not call it a coincidence though. It would be rare for the systems to be different enough to require customization but also impossible to differentiate between at runtime.


I may be totally wrong here but perhaps this looks like it might be coincidental if one doesn't know many truly different OSes.

The modern OS space is dominated by 3 big OSes, and a bunch of smaller relatives, that are basically siblings. They share a huge amount of ancestry.

Whereas there are many, many others that are vastly unlike them in almost every way.

What you see as coincidence reflects the fact that these OSes are closely related.

The 3 big OSes now are Windows NT, macOS and Linux. The close relatives I mentioned are the BSDs, and then relative to them very slightly further away, Minix 3, QNX, and then things like GNU HURD, (Open)Solaris, other Unixes, etc.

All implemented mainly in C, all descendants of or inspired by one of just 2 OSes for just 2 DEC minicomputers.

Unix and all Unixes from a DEC PDP-7 OS, that became a lot more like modern Unix when ported to the PDP-11.

All Windows this century is Windows NT. The other forms (Win9x, WinCE etc.) are dead.

All NT is related to VMS, a DEC VAX OS, where the VAX is a 32-bit PDP-11.

They are all siblings.

Some OSes with the same conceptual model (disks, partitions, files, binaries, users, etc.) that are totally different:

* RISC OS.

* Classic MacOS

* AmigaOS (and MorphOS, AROS etc.)

None of these have replaceable "shells". None use file extensions. None have any trace of 8.3 filenames.

Others, like the large family that sprung from CP/M, including Concurrent CP/M, Atari TOS/GEM, etc. are broadly similar because they also came from DEC PDP OSes, but different ones, and were not originally implemented in C or anything because their shared rootstock is contemporaneous with the invention of C.

There are many many OSes that are almost nothing like even this basic model of "disks" with "filesystems" and "source code" that is compiled to "binaries" that are CPU-native and governed by "configuration files" that are probably "plain text".

Symbolics Genera and OpenGenera. IBM OS/400, now called IBM i. Taos and its successors Intent and Elate. Novell Netware, in some ways. Arguably, Inferno, the last of the Unix line.

You're looking at a cluster of closely-related OSes for a single CPU platform and saying hey, one binary can work on all of them, what a coincidence.

In fact, this very clever and ingenious hack is _because they are close relatives_.

You are pointing at a dog, a cat, a lizard, a cow, and a frog and saying "hey, look, what a coincidence: they all have 4 legs and 2 eyes and 1 mouth!"

No, it's because they are all relatives. They're all tetrapods.

But there are also insects and spiders and crustaceans and worms and comb jellies and starfish. All basically unrelated except they're all animals.

(I was going to include jellyfish, but looked at this way, jellyfish and corals and sea anemones are all related: they're all coelenterates.)

I mean, it's OK if the only animals you're familiar with are 4-legged vertebrates, but you should at least be aware that there is, literally and exactly, more to life. I mean, you've seen flies and cockroaches and things, haven't you?



