Hacker News
Cosmopolitan Libc: build-once run-anywhere C library (justine.lol)
599 points by pantalaimon on Dec 28, 2020 | hide | past | favorite | 166 comments

This project sounds amazing and very interesting! I have one question: how does it work on bare metal? I tried to look for an explanation in the documentation but I couldn't find anything. The specific questions I have are:

1) How does this interact with a boot loader like GRUB? Or does it come with its own boot loader? But in this case, shouldn't this go in a specific location on the boot disk?

2) How does it handle I/O in bare metal mode? Specifically: sockets.

Author here. Bare metal so far has been a "we built it because we could" kind of feature, so please be warned it's not as polished. Here's a quick tutorial.

Basically, ape/ape.S is a traditional BIOS bootloader. It also embeds a GRUB stub, which basically just switches back to the real mode BIOS bootloader entrypoint, so it should work with or without GRUB. The boot sector loads the rest of your executable off disk and prints "actually portable executable" to your display. It then configures standard i/o to use a serial UART, sets up the page tables, enters long mode, and calls _start(). That's all I've got working so far. I would love to have e1000 and virtio though!

To try it out, here's how you build the included PC BIOS system emulator and boot an x86-64 ape executable that prints a spinning deathstar:

    m=tiny; make -j12 MODE=$m o/$m/tool/build/blinkenlights.com o/$m/tool/viz/deathstar.com &&
      o/$m/tool/build/blinkenlights.com -rt o/$m/tool/viz/deathstar.com
Video screencast: https://justine.lol/cosmopolitan/deathstar.html

See also: https://justine.lol/blinkenlights/manual.html


Thank you, so if I understood correctly, I should write the binary to the disk starting at the boot sector, and it can overflow onto subsequent sectors too, because ape/ape.S will be at the beginning of the binary, so the boot sector will have the necessary bits in place to load the rest of the program into memory. Am I right? Can't wait to try it!

I have Wine installed on my Linux system, which uses "binfmt_misc" to automatically run Windows-looking executable files through Wine. This causes an issue with Cosmopolitan, since it uses an output format that triggers binfmt_misc. I had to run this to disable binfmt_misc on my system:

    sudo systemctl mask proc-sys-fs-binfmt_misc.automount
...and this to disable it for just the current boot:

    sudo bash -c 'echo 0 > /proc/sys/fs/binfmt_misc/status'

Give this a try:

    sudo sh -c "echo ':APE:M::MZqFpD::/bin/sh:' >/proc/sys/fs/binfmt_misc/register"
That command should allow Actually Portable Executables to co-exist with Wine: Wine's rule matches on the MZ prefix alone, while APE binaries always use the longer MZqFpD prefix.

Thanks! That worked.

I also tried adding the line to a new file /etc/binfmt.d/APE.conf, but it didn't seem to work (after a reboot); after renaming the file to zz_APE.conf, it did work after a reboot. systemd's binfmt.d registers files in alphabetical filename order, so I'm guessing it's important that /usr/lib/binfmt.d/wine.conf is registered before the APE handler, meaning its filename must be alphabetically after "wine.conf".

It appears to build a "naked" Thompson shell script without a shebang line that doubles as a PE file for Windows/DOS, and uses `exec` to feed a binary to the shell via a pipe.

Interesting approach for a universal binary.

A downside of this approach is the binary can't be executed with the exec system call because the kernel doesn't recognize scripts without a shebang. It can only be executed from within a shell.

> A downside of this approach is the binary can't be executed with the exec system call because the kernel doesn't recognize scripts without a shebang. It can only be executed from within a shell.

It's a little bit more nuanced than that, depending on the platform. You don't necessarily need to have a shebang. This behaviour is guaranteed in Linux, and traditional on BSD, but not POSIX:

> If the header of a file isn't recognized (the attempted execve(2) failed with the error ENOEXEC), these functions will execute the shell (/bin/sh) with the path of the file as its first argument. (If this attempt fails, no further searching is done.)

Which means that it _should_ work outside a shell on those systems.

This quote is about execlp(), execvp() and execvpe() functions, which are supposed to emulate what shell does.

POSIX has similar wording:

> In the cases where the other members of the exec family of functions would fail and set errno to [ENOEXEC], the execlp() and execvp() functions shall execute a command interpreter and the environment of the executed command shall be as if the process invoked the sh utility using execl() as follows:

  execl(<shell path>, arg0, file, arg1, ..., (char *)0);
> where <shell path> is an unspecified pathname for the sh utility, file is the process image file, and for execvp(), where arg0, arg1, and so on correspond to the values passed to execvp() in argv[0], argv[1], and so on.


This is one of the most interesting projects I have seen this year. Congratulations for the idea and for the execution.

It's beautiful, isn't it? Making something simultaneously compatible with multiple formats. A far more interesting use of the PE/COM “MZ” header than simply printing “This program cannot be run in DOS mode”.

This is such a nicer compliment than it will be in a week.

This is very clever. I wonder if it could be adopted for Python C extensions (where possible) to make it easier to distribute cross-platform binary distributions. Of course it wouldn’t be suitable for gui toolkits, but I imagine that the majority of Python C-extensions could be handled.

Cosmopolitan would actually be a great fit for Python extensions, since manylinux1 requires RHEL5 support and Cosmopolitan is probably the only modern C library that still supports it (glibc these days needs rhel8+). Maybe someone will figure out a way to compile a single-file library that's simultaneously a .dylib / .dll / .so just like how Actually Portable Executable does for programs.

> For an example of how that works, consider the following common fact about C that's often overlooked. External function calls such as the following: memcpy(foo, bar, n); are roughly equivalent to the following assembly, which leads compilers to assume that most cpu state is clobbered:

I think most modern C and C++ compilers know that memcpy is special and optimize accordingly. I believe some C++ compilers recognize when you are using memcpy just to do bit casting to different representations and avoid doing a memory copy at all.

Even if you are trying to avoid the standard library, Visual Studio will insert memcpy into a loop that is just copying memory, and then the linker will error out when it can't find it.

This is true, but it's possible to prevent this behavior by turning off optimizations for just one function (by adding `#pragma optimize("g", off)` before the function definition and `#pragma optimize("g", on)` after it).

A better way to do this with GCC, at least, is -fno-builtin.

You can disable them individually too: -fno-builtin-memcpy

Replace "modern" with clang, and "awful" with gcc. I have no idea why gcc is still so fast overall, but its memcpy is horrible, as shown in the blog post. My own memcpy in my libc also beats the gcc/glibc one. The compilers are only good for short memcpy's. For the rest, the asm hinders modern compiler optimizations.

Not sure what your point is in calling gcc both ‘awful’ and ‘so fast overall’. Nor why you're conflating gcc and glibc.

FWIW my experience has been that clang produces a superior development experience for C++ code, but that gcc is better for C. LLVM does have a more modular architecture than gcc, but it's not really fair to say that it's more modern overall. GCC has gotten several features ahead of LLVM, like hot/cold LTO separation and parallel compilation (the former is now in LLVM and the latter apparently is impractical due to architectural problems).

> The compilers are only good for short memcpy's. For the rest the asm hinders modern compiler optimizations

Also not sure what you mean by that. TFA isn't performing memcpys with inline assembly. They're using inline assembly to perform a call to the exact same memcpy function you would have called anyway, but also to inform the compiler that memcpy clobbers fewer registers. So it's true that this will mainly improve performance for copies of small memory regions; but the performance of large copies will be unaffected.

If anyone's curious, here's the link to memcpy() as it's actually implemented in the Cosmopolitan headers: https://github.com/jart/cosmopolitan/blob/de09bec215675e9b0b... One thing that the web page doesn't mention (for the sake of simplicity) is that the Cosmopolitan headers do call __builtin_memcpy() as well, but only for 2-power constexpr sizes. That's the only time when GCC and Clang both do the optimal thing. In all other cases it's faster to use asm("call MemCpy") which implements something faster than the builtin would otherwise generate. See https://github.com/jart/cosmopolitan/blob/de09bec215675e9b0b...

> __builtin_constant_p

Aww, I was hoping you would use the evil ICE_P - https://lkml.org/lkml/2018/3/20/805

In theory the linked APE library [1] could also be used by languages like Rust and Go to make fat portable binaries as well, right?

[1] https://justine.storage.googleapis.com/ape.html

Yes! Cosmopolitan and APE are now permissively licensed under ISC terms, so they'd be a perfect fit for language authors. One of the biggest obstacles language authors encounter is portability between operating system APIs. Cosmopolitan takes care of that. You can just use the normal textbook unix APIs and have it work on Windows/Linux/BSDs. Best of all, Cosmopolitan is tiny, embeddable, and unobtrusive. As in, these "fat binaries" are actually more like 16kb in size. So any language author who chooses to use it is going to have more time to focus on their vision, while reaching a broader audience from day one, with a sexy lightweight form.

This approach of using inline asm wrappers around functions which specify exactly the clobbered registers is neat, but...

I don't think the approach of making a call from inline asm is safe. In particular, it clobbers the redzone [1], which the compiler may have used and expect to be intact after the call.

In general, at least some compilers (e.g., gcc) assume there are no calls in an inline asm block.


[1] https://godbolt.org/z/e4ecKr

Cosmopolitan doesn't use the "red zone" because these binaries boot on bare metal, and operating system code can't assume a red zone. GCC supports this; otherwise the Linux kernel wouldn't work, since it's compiled the same way. So invoking call from inline asm is perfectly safe. The only real issue is that the called function can't assume the stack is 16-byte aligned. But that's fine, since Cosmo only uses asm("call ...") to call functions written in assembly.

Well you edited the compile command on the page just now to add -mno-red-zone, but it wasn't there before. So do you mean it is safe as long as -mno-red-zone is used?

Even then, I am not quite sure: redzone is just an example of one of the assumptions that might be violated when calls are made from an asm block, but at least some gcc devs are on record saying that calls from inline asm are not supported: the main problem is that leaf functions can be compiled differently than non-leaf, and a hidden call breaks this (i.e., a non-leaf function may appear as leaf to the compiler).

Agree. This is a great little micro-optimization, but definitely requires some compiler support to do safely (on the other hand, compiler devs should want to support it, because it's a great little micro-optimization, so I don't think that it would be _too_ hard to add).

Note that one way to get the code size wins for the caller in a compiler-blessed way is to use an alternative calling convention, like Clang's preserve_most; this isn't as good as explicitly providing a clobber list, but allows a library to provide a stable interface.

I can confirm we came up with this optimization inside tcmalloc, and our compiler experts NAKed it as unsafe, even when only calling tightly controlled code. A pity, as it is really beautiful to see the improved clobbering (and even, in some cases, hacked calling conventions).

Sorry for being naive, but how is it possible that nobody thought of this approach before, given that it actually works? What has been preventing people from coming up with something like this?

Author here. Here's the best explanation I've thought of so far:

> If it's this easy, why has no one done this before? The best answer I can tell is it requires a minor ABI change, where C preprocessor macros relating to system interfaces need to be symbolic. This is barely an issue, except in cases like switch(errno){case EINVAL:...}. https://justine.lol/ape.html

It also took an act of willpower. This is a novel approach to building software that says the conventional wisdom is wrong. When I first stumbled upon the idea, I couldn't believe it was even possible, until I actually did it. Now I'm glad that I did. Actually Portable Executable has been so useful for my work. I used to work on the TensorFlow team, so I've seen the worst c/c++ build toil imaginable, and looking back at those experiences I just can't believe how I managed to get by for so long without having portable build-once binaries without interpreters.
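
The switch(errno) wrinkle quoted above can be illustrated with a few lines of C. In Cosmopolitan the errno value is chosen at startup to match the host kernel; here we hard-code Linux's EINVAL (22) just to keep the sketch runnable, and the names are hypothetical:

```c
#include <assert.h>

/* Hypothetical runtime-chosen errno constant; a real Cosmopolitan
   build fills this in at startup for the host OS. */
static const int EINVAL_rt = 22;

int classify(int err) {
    /* `case EINVAL_rt:` would not compile: C requires case labels to
       be integer constant expressions, and a const variable isn't
       one. Code that needs symbolic constants must use if/else
       chains instead of switch. */
    if (err == EINVAL_rt) return 1;  /* invalid argument */
    return 0;                        /* something else */
}
```

This is why the errno macros going symbolic is "barely an issue": only switch statements over errno break, and they rewrite mechanically to if/else.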

Have you tried to code-sign executables produced by this method?

On macOS, these executables can be signed with a detached signature. Surprisingly, the embedded codesignature also works (but the signature is stored in an extended attribute on the filesystem).

On Windows, Authenticode signatures rely on the PE header - I'd also love if the author could answer this one. If it does work, I'd also like to know why it works :)

Fair enough, and inspiring. However I am still wondering how to advocate for changes at such a scale.

For example, Linux has introduced some fixes for the Y2038 problem over the past few years. I suppose the community of this C library should do something about those changes, right?

Why not keep constants constant (so your switch example works), but have an extra translation step in front of the syscalls?

I can think of two reasons: It's arguably a hack and it's not needed.

While this is awesome, it can easily break (zsh already seems to do so) and, as the author states, it is not meant for GUI apps or programs relying deeply on OS APIs, which includes a lot of apps.

Additionally, it does not solve an important problem. Doing a build for each supported OS is not that hard and is easily automated. The actual problem is being able to reuse code, and that is solved for the most part.

Author here. I intend to upstream a patch to zsh soon that restores backwards compatibility with the Thompson shell.

As for GUIs, while Cosmopolitan isn't intended for writing complex GUI apps (since I normally build web apps for that), it can still be used to build WIN32 GUIs. See https://justine.lol/apelife/index.html as an example of Conway's Game of Life built with Cosmopolitan, where it manages to embed a UNIX TUI and a WIN32 GUI in the same binary! Even if you turn the WIN32 off, it'll still run the TUI inside the Windows Command Prompt.

As for breakages, Cosmopolitan only depends on stable kernel ABIs with long term stability promises. The binaries you build will work on all Linux distros dating back to RHEL5, Windows versions dating back to Vista, etc. There's no reason to believe there will be any radical breaking changes to these ABIs. Why is it so stable? Alignment of self-interest. By using canonical kernel interfaces, we ensure platforms can't break us without first breaking themselves.

> Cosmopolitan only depends on stable kernel ABIs with long term stability promises

Unfortunately, on macOS a stable kernel ABI does not exist. macOS has had kernel ABI breakage before, even recently, and they explicitly warn against making your own system calls. On Windows things are more stable but that's only because Windows cares about backwards compatibility to the point of preserving the behavior of even the most badly written software.

Linux is the only one of the three that promises a stable kernel ABI. On macOS and Windows the system libc is the only safe way to talk to the kernel.

Ntdll is the kernel ABI on windows; it is stable, and cosmopolitan uses it.

Not sure what you mean by ‘on Windows things are more stable’; syscall numbers change all the time. See https://j00ru.vexillium.org/syscalls/nt/64/

Cosmopolitan uses the stable WIN32 API on Windows. The word ABI is used loosely in this context. It's somewhat accurate since Cosmopolitan still configures the assembler to generate the binary linkage by hand. NTDLL APIs are only used on rare occasions, such as sched_yield(). When it is used, it's used in accordance with Microsoft's recommendations: namely having fallback. It'd be great if we were authorized to use the SYSCALL instruction on Windows, however we have to respect Microsoft's recommendations if we want long-term stability promises.

Mac however is a different story. One of the reasons Apple came back is that they adopted the UNIX interface. They literally copied the magic numbers right out of the UNIX codebase. They shouldn't have the right to stand on the shoulders of giants, pull an about-face, and then declare it their own internal API. Apple actually broke every single Go application a few years ago by doing that, and I'm astounded that Google didn't do more to push back. We need to give Apple as much feedback as possible about keeping the SYSCALL interface stable. It's the right thing to do.

They didn't have to copy anything; NeXTSTEP already did it by having a FreeBSD kernel copied into their hybrid kernel.

That doesn't mean it is going to stay like that forever, the UNIX network stack is already considered legacy for modern code.



Also I have some doubts about support in non-POSIX OSes like IBM i, z/OS, ClearPath MCP, or RTOS like INTEGRITY.

Microsoft documents ntdll as subject to change and makes no promises not to break it. When I was at MS it was advised not to use it for things that ship outside Windows. They had automated scripts to flag such a use.

In practice it is pretty stable, and many use it. Many things cannot be changed without causing massive internal refactors and app compat issues. But it is regarded as internal.

Ntdll is not meant to be used as public API, whatever works does so by pure luck.

I think the right way to add GUI support to this is to use the web. Simply embed an HTTP server in your binary and open the user's default web browser on launch. I have a framework for this here: https://github.com/jdarpinian/web-ui-skeleton

Author here. I agree. Check out redbean, which is a single-file distributable web server built with Cosmopolitan: https://justine.lol/redbean/index.html redbean is a forking web server (works great on Windows) that can serve 1 million+ gzip-encoded responses per second (only Linux and FreeBSD are capable of that kind of performance) because APE binaries are isomorphic to the zip format, which enables kernelspace copies. You just bundle all your assets inside the .com executable by changing the extension to .zip, and the web server will deliver them via HTTP.

Yeah, very similar idea. The zip trick is awesome; especially letting the browser do the decompression is brilliant. My version has a couple features you might want: automatically opening the browser when launched and shutting down the server when the last tab is closed, and a debug mode where it serves directly from the filesystem to make editing easier. Man, I can think of so many cool features I'd want to add to this as well:

CGI support. Maybe also an option to embed node or deno, or even tcc for directly executing C source as CGI scripts.

Embedded writable SQLite DBs. Appending would clearly be tricky but probably doable.

PWA support, giving you native notifications, launcher icons, and top-level windows sans browser chrome just like a native GUI app.

Option to morph into various app formats (apk, appx, ipa) and instant install on a connected phone or upload to app stores. (Wait, actually all of these formats are based on zip! So maybe this wouldn't even need that much morphing? Just some extra cruft in the zip file, some code signing BS, and maybe an embedded copy of adb.)

With all these features it might not be tiny anymore but it could make a pretty complete cross platform native app dev environment contained in a single file. Still way smaller than Electron.

Why vista only? Is it 64 bit only? Otherwise for TUI-only this should go back to at least NT4. Or is it using anything clever that makes older versions barf?

Edit: Ah, maybe it's bypassing ntdll and syscalls changed?

Author here. It's Vista+ because Cosmopolitan focuses on x86-64. Cosmopolitan does however support i8086 since these executables boot from BIOS. You can even define a _start16 entrypoint if you want to write real mode apps. No effort has been made to natively target i386, however it's still technically supported since Actually Portable Executables will re-exec themselves under qemu for other architectures.

I think it's been pondered a bunch, but the legwork required to build the headers for each system, so that you can execute the same binaries on every platform, was tough.

Not an answer, but Splunk currently has a large deployment of a C++ application that supports a large variety of platforms. At least as of a few years ago, they deployed C++ indexing and querying nodes on customer machines. They've spent a great deal of time to accommodate the peculiarities of enterprise customers, old compilers, etc. One might imagine that a "build once run anywhere" platform would benefit them, but they've found a business model where they were able to work around the peculiarities and still be profitable.

So while there's probably some interesting historical reasons that explain why there's no "JVM for C++", Splunk is a good example of a modern technical / business model where conquering platform fragmentation was feasible.

> Please note this is intended for people who don't care about desktop GUIs, and just want stdio and sockets [...]

Not to diminish the achievement at all, but a significant limitation to be noted (and not a very surprising limitation).

I also wonder about things like sizeof(long). On 64-bit Linux, this is 8, but on Windows, it's 4. Well, "on windows"... maybe this throws all the APIs that make such assumptions, uh, out the window (puts on sunglasses).

Author here. That's not an inherent limitation. See https://justine.lol/apelife/index.html for an example of a GUI + TUI that's built using Cosmopolitan. The reason why I said what you quoted, is I'm simply trying to calibrate expectations. I don't view desktop GUIs as a productive area of focus, because there's such a lack of consensus surrounding the APIs that requires, and web browsers do a great job.

As for sizeof(long) it's 64-bit with Cosmopolitan, which uses an LP64 data model. That hasn't impacted its ability to use the WIN32 API. See https://github.com/jart/cosmopolitan/tree/master/libc/nt where WIN32 API functions are declared using uint32_t, int64_t, etc. It works great. I compile my Windows apps on Linux and it feels wonderful to not have to touch MSVC. Cosmopolitan even takes care of the chore of translating your UTF-8 strings into UTF-16 when polyfilling functions like open() which delegates to CreateFile() on Windows.
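
The UTF-8 to UTF-16 chore mentioned above can be sketched in a few lines. This is a minimal, hypothetical converter, not Cosmopolitan's actual code: it handles only 1-3 byte sequences (the Basic Multilingual Plane) and does no validation, whereas a real open() polyfill also needs surrogate pairs and error handling:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert a NUL-terminated UTF-8 string to UTF-16 (BMP only).
   Returns the number of UTF-16 code units written, not counting
   the terminator. */
static size_t utf8_to_utf16(const char *in, uint16_t *out, size_t outmax) {
    size_t n = 0;
    const unsigned char *p = (const unsigned char *)in;
    while (*p && n + 1 < outmax) {
        uint32_t c;
        if (*p < 0x80) {                   /* 1 byte: ASCII */
            c = *p++;
        } else if ((*p & 0xE0) == 0xC0) {  /* 2 bytes: U+0080..U+07FF */
            c = (uint32_t)(*p++ & 0x1F) << 6;
            c |= *p++ & 0x3F;
        } else {                           /* 3 bytes: U+0800..U+FFFF */
            c = (uint32_t)(*p++ & 0x0F) << 12;
            c |= (uint32_t)(*p++ & 0x3F) << 6;
            c |= *p++ & 0x3F;
        }
        out[n++] = (uint16_t)c;
    }
    out[n] = 0;
    return n;
}
```

On Windows the resulting uint16_t buffer is what a CreateFileW-style call consumes; on the other platforms the polyfill skips this step entirely.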

> Cosmopolitan even takes care of the chore of translating your UTF-8 strings into UTF-16 when polyfilling functions like open() which delegates to CreateFile() on Windows.

That shortcoming of the MSVC runtime has been the bane of multiple projects I've done on Windows. Even without the portable executable trick, Cosmo might be useful as an alternative Windows CRT that can be used with GCC or clang. (Edit: Except that ARM64 Windows is now a thing, whereas Cosmo seems to assume the continued hegemony of x86-64.)

Regarding consensus desktop APIs, maybe it's time to re-invent/-introduce Win32s ;)

(Although adding that to the library would add considerable bloat, and cross-platform runtime dynamic linking or a "dynamic polyfill" might be a bit beyond the scope of this)

Or raw X11 protocol, of course (WSL made X11 servers on Win10 quite common). If that wouldn't be so horrible, it would be interesting to see how small one could get basic widgets starting from scratch.

It should be possible to compile wxWidgets with Cosmopolitan, one variant per platform (win32, POSIX/X11, MacOS) and initialise the correct one at runtime depending on where you're launched, no?

> it would be interesting how small one could get basic widgets starting from scratch.

Please don't do this for any application with a wide audience. Small, simple widgets are missing many important things such as accessibility and internationalization.

Sure, as do most TUIs and a lot of web apps. I'm merely suggesting a hack that goes along with this hack, not the foundation technology for a mono-binary Office clone.

Re-inventing the UI wheel seems somewhat common within the Go and Rust communities (as is re-inventing everything else); this would just have raw X11 protocol messages via socket as the foundation layer, not Vulkan/GL/SDL. And then just enough to be useful, e.g. text, input, buttons, canvas…

For something more serious, I suggest re-inventing PostScript/NeWs as a service, of course.

> I'm merely suggesting a hack that goes along with this hack, not the foundation technology for a mono-binary Office clone.

Sure. I just worry that one person's clever hack will then become another person's indispensable tool, which they'll then impose on their colleagues, employees, or contractors, blocking a third person from doing their job. I know a blind person who temporarily lost his job because his employer's largest client required an application that didn't work with screen readers. So I just want to make sure developers are aware of accessibility, and in general, the risks of reinventing things that are necessarily complex.

That's one of the biggest issues some developers have: We pride ourselves in the wisdom to see "bloat", but as you say, a lot of complexity arose out of necessity, even if the developer themselves doesn't need it. Privileged minimalism.

I have used XWin32 and Hummingbird since 2000.

Thanks for the informative reply.

> and web browsers do a great job.

Suppose one had a native linux game that uses POSIX, OpenGL and SDL. Is there any hope?

Having given the apelife source a glance, it seems quite possible to me; it already shows you how to create a window on Win32.

And SDL2 has already been ported to Emscripten/WASM after all, and OpenGL is really no trouble, since it's usually accessed through a mostly platform-agnostic C API wrapper already (see for example, https://github.com/Dav1dde/glad ).

And maybe you could use MinGW to support POSIX on Win32...

Now I want to see somebody better at C than me try smashing cosmopolitan and the C version of Dear Imgui together and see what happens.

This is an interesting, and potentially compelling, argument that WebAssembly binaries and runtimes actually aren't the future of compile-once-run-anywhere systems software: these support syscalls, sockets, and stdin/stdout, all of which require platform-specific glue code to achieve using WebAssembly and standard libc (or JS).

Author here. WebAssembly has played an important role protecting CI on the Internet and it's cool that it enables us to run LLVM in the browser. I was very surprised when I learned that some people were using it offline outside of browsers, since doing so requires a JVM-like runtime such as wasmtime. Cosmopolitan proves that it's possible to just fix C instead, which might fix all the stuff that's built on top of C too. We can in fact have first-class portable native binaries, which will only require JIT compilation if they're launched on a non-x86 machine. This is probably how Java would have been designed, if Intel's instruction set hadn't been encumbered by patents at the time. That's changed. The x86-64 patents just expired this year. So I say why not use it as a canonical software encoding.

If Sun had picked a ‘real’ CPU, I doubt it would have been x86 (did they even sell any of them at the time?)

Also, picking a real CPU is only worth it for running on that CPU, and that particular one. If they had picked x86 in 1996, they wouldn’t even have supported MMX.

In a platform-independent design, using n registers in the virtual CPU is only a good idea for n = 0 (i.e. a stack machine, as in the JVM, .NET CLR and Web Assembly) and n = ∞ (as in LLVM).

If you pick any other number, your platform-independent code needs to handle both the case where the number of real registers is smaller (so you need to have a register allocator) and the case where it is larger (there, you could just ignore the additional registers, but that means giving up a lot of performance, so you have to somehow figure out which data would have been in registers if the CPU had more of them).

Why write and optimize both of these complex problems, if you can pick a number of registers at either end of the scale, and spend all your resources on one of them?

And that’s even more true for x86-64, which doesn’t have a fully orthogonal instruction set, has I don’t know how many ways to do vector instructions, none of which allow you to clearly express how long the vectors you’re iterating over are (making it hard to optimally map them to other CPUs or newer vector instructions), has 80-bit floats, etc.

There’s a reason Google stopped work on https://en.wikipedia.org/wiki/Google_Native_Client in favor of Web Assembly.

Also, the high-level nature of Java bytecode enables/simplifies many optimisations in the JIT.

For example, dynamic dispatch is handled by the JIT, it's not encoded into low-level Java bytecode instructions, so if the JIT can see there's only one class loaded that implements a certain interface, it can generate code that directly invokes methods of that implementing class, without going through a vtable. It can do this even across library boundaries. That wouldn't be possible (or at least, would be greatly complicated) if Java bytecode provided pre-baked machine code.

Modern JVMs also have pretty deep integration of the GC and the JIT, if I understand correctly. The Java bytecode format is high level so the JIT is quite free to implement memory-management however it likes. If the JVM took a truly low-level approach to its IR, we'd presumably be stuck with 90's GC technology.

I imagine it would also have implications for the way the JVM handles concurrency. It seems right that it defines its own memory model with its own safety guarantees, rather than defining its model as whatever x86 does.

It's telling that .Net took the same high-level approach that Java bytecode did.

"Only require JIT compilation if they're launched on a non-x86 machine" is a heck of a caveat.

The x86-64 instruction set is not a good portable IR. Some well-known problems:

* It's much more irregular to decode than other instruction sets/IRs

* There is a truly vast set of instructions, very many of which are almost never used. So in practice you need to define the subset of x86-64 you're using. E.g. does your subset contain AVX-512? AVX2? AVX? BMI? X87? MMX? XGETBV? etc etc etc. These decisions will impact the performance of your code on actual x86 CPUs as well as non-x86 CPUs.

* x86-64 assumes the TSO memory model which is horribly expensive to support on CPUs whose native model is weaker (e.g. ARM) (which is why Apple added a TSO mode for M1; no other ARM chips have this)

Honestly, declaring x86-64 your portable IR and then claiming that as a technological breakthrough sounds like a trick to me. I'd agree it's a breakthrough if you define your x86-64 subset, show your x86-to-ARM compiler, and show that it produces code competitive with your competitors (e.g. WASM compilers).

> * It's much more irregular to decode than other instruction sets/IRs

According to this blog post, it's not just "much more irregular to decode": instruction decoding is the Achilles heel of x86, which is part of what allows the M1 to be so fast by comparison.


Because x86 instruction encoding is a mess. Take a look at this[1] flowchart from the amd manual (vol. 3).

That doesn't even tell the whole story; there's a ton of path dependence, and the operand format is quite complicated and has tons of special cases.

1. https://0x0.st/-rce.png

I thought x86 was pretty straightforward once I understood the octal encoding. Cosmopolitan provides a 3kb x86 instruction length decoder that supports all past, present, and future ISAs: https://github.com/jart/cosmopolitan/blob/master/third_party... The Cosmopolitan codebase also provides a debugger / emulator for i8086 + x86-64 that's similar to GDB TUI and Bochs. See https://justine.lol/blinkenlights/index.html For example, here's the ALU code: https://github.com/jart/cosmopolitan/blob/master/tool/build/...

I wouldn’t call 3kb “pretty straightforward”, certainly not relative to the competition. The instruction length decoder for quite a few other CPUs is trivial.

I don’t know any other that has more than a few cases for immediate constants and jump addresses.

xoreaxeaxeax's videos about how to systematically parse the asm space of x86 really hit home to me how bad the encoding is.


> doing so requires a JVM-like runtime such as wasmtime. Cosmopolitan proves that it's possible to just fix C instead

This doesn't sound like a fair comparison. WASM isn't just providing portability, it's also providing security and runtime safety. Cosmopolitan doesn't, it presumably requires you to trust the codebase, and doesn't protect you from C's undefined behaviour. Of course, WASM also imposes a considerable performance penalty, I presume the Cosmopolitan approach easily outperforms WASM.

> This is probably how Java would have been designed, if Intel's instruction set hadn't been encumbered by patents at the time.

To mirror the comment by Someone, I sincerely doubt this. Java bytecode is a very high level stack-based IR, nothing like a CPU ISA. They could easily have made it resemble a CPU ISA if they'd wanted to. The lesser-known GNU Lightning JIT engine takes this approach, for instance.

WASM only provides sandboxing. That is not the same as security, nor does it mean runtime safety or protection from undefined behavior.

> WASM only provides sandboxing. That is not the same as security

The relevant Wikipedia article is named Sandbox (computer security).

> nor it means runtime safety nor protection from undefined behavior

It puts stronger constraints on what mischief undefined behaviour can lead to, and guarantees that various runtime errors are handled with traps. [0] This isn't the same as a hard guarantee that execution will terminate whenever undefined behaviour is invoked, but it's still a step up.

[0] https://webassembly.org/docs/security/

Speaking as someone who would love to see WebAssembly succeed as a cross-platform, cross-language way for me to write sandboxed plugins and CLI utilities, I do have to point out that, for out-of-browser use, WASM's MVP does regress various things:

* https://00f.net/2018/11/25/webassembly-doesnt-make-unsafe-la...

* https://www.usenix.org/system/files/sec20-lehmann.pdf

(eg. Under WebAssembly's memory model, dereferencing NULL=0 won't lead to a segfault.)

> Under WebAssembly's memory model, dereferencing NULL=0 won't lead to a segfault.

Thanks that's a curious one. It won't always lead to a segfault with a conventional compiler either, undefined behaviour being what it is. [0][1] Fortunately GCC can be asked to add such checks at runtime, [2] this approach could also be taken with WebAssembly.

[0] https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=63...

[1] https://blog.regehr.org/archives/213

[2] https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.h...

I was more referring to how it's standard practice on modern OSes to leave the zero page unmapped in a process's address space so that, if the compiler converts the NULL dereference into a dereference of address 0x0, it'll trigger a segfault.

> We can in fact have first-class portable native binaries, which will only require JIT compilation if they're launched on a non-x86 machine.

This is nothing new, IBM and Unisys mainframes have been doing it since the 60's.

IBM System/360 and NexGen i586 were big inspirations for the design of Cosmopolitan.

Are you familiar with IBM i (née AS/400)? There's a similar idea there, where programs are compiled to a virtual instruction set called TIMI then statically recompiled to whatever the machine's native instruction set is at the first load on a given machine. As a result, you can just drop a System/36 binary on a modern PowerPC system and not even notice that you've just crossed two changes of instruction set.

excuse my ignorance on this, but this sounds amazing. game changing. yet it's being "served" here on ice. casually. what am I missing?

WORA's been promised for decades. Even if it's true no one will believe it (including me). And portability is not the only reason people choose Rust or Go over C.

That's before you get into whether you can aesthetically convince everyone that x86-64 is good, actually.

Still very interesting though, especially compared to the usual Show HN crap.

Your comment would be better without the last sentence.

Portability was a long-desired attribute of computer software, up until it turned out that it doesn't matter at all.

When software was deployed on floppy disks and there were 10k architectures going around, portability was critical to expand the market reach of any distributed software.

But in the modern age, when software runs "in the cloud" and there are dominant platforms out there, it turns out that portability doesn't matter - nobody changes platforms when they don't need to.

So this makes this achievement a pretty cool demo without any industry impact, since people don't distribute C binaries anymore.

The idea behind it and the technology developed is pretty cool :)

WebAsm is not performance competitive:

"We find that the mean slowdown of WebAssembly vs. native across SPEC benchmarks is 1.55× for Chrome and 1.45× for Firefox, with peak slowdowns of 2.5× in Chrome and 2.08× in Firefox."


x86 isn’t performance competitive either if you’re targeting a CPU that doesn’t run it natively. In fact, it’s much worse than WebAssembly.

Even Apple Rosetta takes roughly a 1.25x-1.50x hit on SPEC compared to native [1], and that’s with ahead-of-time compilation and no software sandboxing, whereas WebAssembly VMs typically use JIT compilation and do provide software sandboxing, both of which come with overhead. In contrast, inNative [2] is a WebAssembly compiler that has ahead-of-time compilation and no software sandboxing, and it can supposedly reach 95% of native performance (no idea if the benchmarks are cherry-picked though). This makes sense: WebAssembly is higher-level, closer to a compiler IR, whereas x86 assembly has already decided on low-level details (like register allocation and calling conventions) that are suboptimal to emulate on another architecture.

And other x86 emulators will do worse than Rosetta. For one thing, Rosetta can take advantage of the Total Store Ordering mode added to M1 chips for the specific purpose of emulating x86 faster. An emulator running on pretty much any other CPU has to emulate a stronger memory model in software on top of a weaker one, which comes with a massive unavoidable penalty for multithreaded workloads. Also, most x86 emulators are just not as efficient as Rosetta, adding an additional penalty.

[1] https://www.anandtech.com/show/16252/mac-mini-apple-m1-teste... [2] https://innative.dev/news/

These binaries wouldn't provide the isolation that wasm provides... although it looks like they're essentially unikernels, which means that they could potentially be run per-invocation by a hypervisor.

Doesn’t the “strlcpy” example near the bottom of the page overwrite the buffer by one byte when the source string is as long or longer than the buffer?

Imagine I have a three byte buffer, n is 3 and I copy in “foo”. The result of the MIN is going to be three. And d[3] is the fourth byte of my three byte buffer.

Author here. The example and repository have been updated. Many eyes make all bugs shallow. I was hoping to do more testing before posting on Hacker News. Now that the cat's out of the bag, I'd like to explain what's being done to make sure bugs like that never happen.

Cosmopolitan implements a simplified version of the Address Sanitizer (ASAN) runtime. See https://github.com/jart/cosmopolitan/blob/master/libc/log/as... I've been systematically working towards using it to vet the entire codebase. It's helped to address more than a few memory issues so far. However more work needs to be done polishing the ASAN code so it can be bootstrapped for the purposes of having memory safety in the lowest level libraries like LIBC_STR, due to the way that the Makefile requires acyclicity between libraries.

This has been less of an issue, because bugs are normally very evident in functions like strcpy(). The BSD version is an exception though, since it was only used to port programs from the OpenBSD codebase which were used sparingly. But in either case, the real focus here is having continual incremental improvements over time, and I'm very committed to addressing any and all concerns.

Cosmopolitan also has extensive unit tests. This is something that C libraries have traditionally done somewhat sparingly, if at all, since C library authors conventionally rely on an ecosystem of ported software that very quickly breaks if everything isn't perfect. If you build the Cosmopolitan repository, you'll notice there's 194 independent test executables in the o/test folder. That's one of the great things about Actually Portable Executable. It makes binaries so tiny that we can have lots of tests, completely isolated from each other!

Have you ever tried C formal verification tools? Check out open source Frama-C https://frama-c.com which uses ACSL (ANSI/ISO C Specification Language). The free documentation and tutorials are pretty good.

I think you are right. `strlcpy` is only supposed to copy `n-1` bytes from src to dest.


strlcpy always NUL-terminates the string (that is its big selling point). So it copies up to n-1 bytes from src to the destination. If src is bigger than the destination, it is truncated and the n-th byte is set to NUL.

Seems that way to me as well.

Cute trick, but probably fragile. It basically requires either writing functions like memcpy in assembly by hand or parsing the compiler-generated assembly post-codegen to determine the actual set of register clobbers. Otherwise it is bound to be broken as soon as the register allocator decides to clobber a different set of registers when generating the function.

Also, this effectively makes the body of the function a part of its public ABI, which makes it morally equivalent to inlining; it requires users of the function to be recompiled each time the body (and in consequence the clobber set) is changed. But since this seems to be static-linking-only library, this seems not that much of a problem in this case.

Still, I’m expecting improvements to whole-program LTO and inter-procedural optimisations to render this hack obsolete.

I'm a huge fan of the ideas behind this library. Using the stable ABI of "magic numbers" is brilliant. Would love to see this solution gain more usage!

Sadly, the "magic numbers" are much less stable than you'd think.

That's like saying the '9' in kill -9 might change someday. However it really does depend on which numbers. In the APE blog post I mention the importance of focusing on the subset of numbers that systems share in common. In other words, the really old ones. https://github.com/jart/cosmopolitan/blob/master/libc/sysv/c...

Not quite–kill is part of POSIX, and platforms that expose a libc interface must keep signal numbers consistent for applications. Syscall numbers need not be stable if they are not exposed, as is true on pretty much every platform but Linux.

In case it helps anyone else, this is a compact libc for an x86_64 binary format that can bootstrap and make syscalls on several kernels but relies on static linking (it doesn’t use any platform’s shared libc).

A couple more links that may be helpful in this pursuit:

https://sagargv.blogspot.com/2014/09/on-building-portable-li... and


Watch out for the implications of licenses etc.

This is what I've been working on with RISC-V, and you have no idea how much resistance I get when I try to fix the problems relating to calling conventions, inline assembly and system calls.

I don't want to have to explain what I'm doing from the beginning every time I have a complex question about how to make a function call from inline assembly. It has really turned me off from stackoverflow as a whole. Sorry for the rant.

I think your project is awesome, and really shows that we can improve the base upon which everything rests.

I have the same experience. Stack Overflow is not great when you try to achieve something novel or do something new from scratch.

It's also a crap shoot when you try to solve a routine task. You might read through 1500 lines of people arguing before you get to the right answer, get an almost right answer, or no answer.

If you are working in a language which has good documentation (say Python or Java) you are best off learning how to look up the right answers in the official manual quickly and let your competitors waste time with stack overflow.

I barely even use SO these days, it's usually more fruitful to engage with people on various official discord/slack/mailing lists.

SO has become a mound of garbage people dig through to find some piece of garbage solution to their garbage problems

What is a way to fix this? Is this some part of an endless long-cycle of adoption for social networks over some kind of crossing-the-chasm style adoption curve? Or is there a structural way to fix it? Stackoverflow is better than its predecessors, but clearly is much less valuable now than for its first few 5 years or so.

For me SO could be more powerful if more answers would link or refer to documentation/github landing pages supporting/documenting their solution.

Often I see some weird way to implement something in the .NET world, especially Core/.NET 5, search Microsoft docs and find an 5-10-min read article updated last Wednesday that explains in-depth (relatively to a SO post) three different newer way of doing it.

A big weakness of SO, as I see it, is aging. The most-upvoted and marked-as-answer posts are most likely also the oldest.

Maybe the page could warn for old answers or highlight newer answers that are gaining traction.

Maybe questions and answers should be encouraged to specify version of tools. Asking identical questions for V1 and V2 of a tool should not be met by "POSSIBLE DUPLICATE" warning or get downvoted to oblivion. Posts shouldn't need "EDIT 1 updated for V1", "EDIT 2 updated for V2", as this is OK with 2 versions but what about 10?

Maybe points (or some score system) should decay over time? Idk. But I think something needs to be done.

It's not just that social networks go bad the way cheese goes bad, but also that any "social network" business becomes opposed to the interests of its user base at the moment it becomes a "sustainable" business (e.g. the users would want 100% of the surplus to go to improving the system, but management, employees, investors and such won't agree)

Once you are #1 in the Google Economy there are many reasons to say "why try harder?" Questions might have better answers some place else but there is no one site that threatens the SO hegemony. Also, a site like SO has a proven compatibility with Google SEO -- if you made any big changes to a site like that there is a high risk that your traffic would drop catastrophically.

What I would do:

StackOverflow Q/A are all available by a "cc-by-sa" license, so it would be a good starting place for something better: an "antisocial" project curated by a small group of people. Take the top 0.1% of questions from SO and only the best answer and strip away the "socialjunk" (e.g. see "chartjunk") and you are starting to get there.

A system like that could justify itself it were your own personal knowledge base, but to make something generally useful I'd expect to curate 100 questions a day for 3 months or so: about 10,000 questions in all.

There are a number of methods that could speed up that curation a lot, not least of which is that you'll find some authors who consistently write great answers and others who consistently get upvoted with mediocre answers, etc. No amount of analytics will make up for zero manual curation work, you could be maybe 5x as productive once you've curated 10,000 questions and developed some automation.

You just reminded me of The Well.

I had the notion their membership was limited, like a country club or fraternity or something, to a fixed number. To join, someone else had to first leave. I can't now confirm this, so I probably misremembered. Regardless, I've always liked this idea.

Wiki claims a distinction between virtual communities and social media. The Well, MetaFilter, Ravelry and others have membership fees. Some have verified identities. Those frictions would certainly deter much of the dysfunctional, antisocial behavior now plaguing social media.

I think curated social media is mostly fruitless. Having done a lot of online moderation, I honestly don't know how u/dang and others pull it off.

Culture and civility is much easier with preventative measures, like verified identity.

Can you guys link to an example?

A question I asked that springs to mind is https://stackoverflow.com/questions/37554058/fast-display-of...

It was down voted to a negative score initially. People posted irrelevant/unhelpful comments that I had to hurriedly refute before the question was closed and deleted.

To be fair, I have tweaked the question since, so it probably makes more sense than it did when I initially asked it. But I feel that is something the community should help with.

I reached the same conclusion about stackoverflow.

Is there an incentive from the real world to answer questions, given that so many beginners try it?

> Is there an incentive from the real world to answer questions

Apparently so - if you look at the “Customer Questions and Answers” section of Amazon product pages, there is a remarkably high proportion of answers like, “I don’t know, I haven’t opened it yet” and “I’m not sure, I bought it as a gift”.

No, that is sleazy as anything “growth hacking” from Amazon.

If you bought product X and some months or years later “Sam” posts a question on Amazon that isn’t answered, Amazon will send you an email with a subject line that is a variation of “Sam has a question for you about Product X, can you help?” and the body would be something along the lines of “Dear foo, Sam wants to know if <insert question regarding Product X here>? Click here to respond to Sam and help her out!”

In that context, you can easily see why people reply with such otherwise useless “answers.” When someone personally asks you a question you don’t know the answer to, responding with “sorry, I bought it for my husband as a gift and I don’t know,” is not just perfectly valid, it’s also the polite thing to do. But that answer is posted to the website in the general Q&A as if you had viewed the question and then, instead of passing on a question you had no answer for, chose to reply with a useless answer.

Very interesting. I had not considered that possibility.

Occasionally AMZN will send out questions others have asked about products to you in the form of an email if you have previously purchased the same product. Some people mistake those for questions directly asked to them instead of realizing where their answer will be posted. StackOverflow doesn't do that AFAIK.

See my other reply. It’s not fair to even say the user is making a mistake; it’s Amazon purposely manipulating users into thinking they’re being personally asked a question to game their way into getting more answers.

Oh. I didn’t think of that. I’m too naïve. Thanks!

I guess it's an ROI issue. Low-effort answers to many easy questions get you many points, while novel questions that require thorough understanding are high effort for few points.

> So what Cosmopolitan does for memcpy() and many other frequently-called core library leaf functions, is defining a simple macro wrapper, which tells the compiler the correct subset of the abi that's actually needed

By not marking all memory as clobbered for a call to memcpy, couldn't the memcpy overwrite a local variable, in which case reads post memcpy could miss the updated value?

> The above two compiles used the same gcc flags

Why does the before have frame pointers and post does not?

Now all we need is a good package manager for it and you're looking at a C renaissance

Can you code sign these binaries? This is necessary for software distribution on Windows and Mac if you don't want ugly warnings.

Can somebody explain the whole thing to me? I know how to programm on windows and linux, but thats basically at the level of calling the compiler and linker, and some parameters to sometimes build a dll. So what does this do, and how does it work? Does it apply to DLLs as well? Also, 'run anywhere' means on the same processor type?

It’s a libc implementation accompanied by a linker script that tricks the linker into generating a polyglot executable file, simultaneously interpretable as an MZ executable, x86 boot sector and a Unix shell script, the latter of which re-writes the executable into ELF or Mach-O format and then executes it again. I haven’t analysed it all that thoroughly, but that’s the gist of it. Here’s more information: https://justine.storage.googleapis.com/ape.html

Polyglot files are not a particularly new invention, but devising a reproducible process to generate those can be quite tricky, so most don’t bother unless there’s a special need for it. (For example, the GRUB4DOS fork of GRUB contains a ‘bootlace’ executable that is simultaneously executable as a DOS .COM and an ELF file.)

The libc itself contains a number of specially-crafted functions and header files that expose the functions’ clobbered register set as part of the functions’ public ABI, which allows the compiler to use that knowledge to better allocate registers and optimise more aggressively. The downside is that if the clobbered register set changes, it requires everything using the function to be recompiled.

How is the libc polyglot? I mean, don't system calls vary widely between Windows and Linux?

    int ftruncate(int fd, int64_t length) {
      if (!IsWindows()) {
        return ftruncate$sysv(fd, length);
      } else {
        return ftruncate$nt(fd, length);
      }
    }
You get the idea. The OS is detected at startup and then checked each time a function is invoked.

Would this not make the implementation inefficient?

IsWindows is likely a macro and therefore you have if(0) or if(1) which the compiler can easily optimize away.

I thought this implementation was meant for compile-once-run-everywhere usage, so I can't see how a compiler would do away with this if-statement. Could you please say more?

The compiler keeps the branch. The branch doesn't impact performance because it's fully predictable and therefore costs less than 1 nanosecond. Please note however that fadvise() is a system call and the SYSCALL instruction o/s context-switch generally costs 1 microsecond, which is 1000x slower than any overhead incurred by the portability benefits Cosmopolitan offers.

You can however tune the build so the `IsWindows()` branch is dead code eliminated anyway:

    make MODE=tiny CPPFLAGS=-DSUPPORT_VECTOR=0b11111011
Basically unsetting the Windows bit causes the Windows polyfills to not be linked in your binary. It's only useful for making binaries tinier. So rest assured, if you find your 16kb APE .com binary to be too bloated, you can always use SUPPORT_VECTOR to squeeze it down to the order of a 4kb ELF executable. See: https://github.com/jart/cosmopolitan/blob/1df136323be2ae806f...

Wow I am very surprised that this could happen. An executable that can run on every platform? I am curious about it

When you think about it, 99% of the code of a compiled C program is processor specific object code which in theory should not care about the operating system or environment at all. It is the 1% like API/ABI and header magic that makes the same object code incompatible under different OSes.

This looks amazing, I just gave it a try but sadly got this:

    /usr/bin/ld: crt.o: unable to initialize decompress status for section .debug_aranges
This on a WSL2 Ubuntu 18.04 install with gcc 7.5.0. Any ideas what the problem could be?

I have the same problem also in 18.04 with gcc 7.5 - it's a bug in binutils:


I have been unable to find a backported package for 2.32 or better as a deb or ppa, and there's no way I'm installing binutils from source in case I clobber my build system. Any ideas?

Works fine for me on Ubuntu 20.04 (also WSL2) and binutils 2.34. Have you tried updating binutils to newer version? (>=2.32)

Not to be "that guy" but DevOps doesn't really relate to the tag line. It's like saying "build-once run-anywhere c without lean / agile / six sigma". What's a C library got to do with a people management strategy?

> you won't need to configure a CI system to build separate binaries for each operating system

If that's where the DevOps part comes from, CI predates DevOps. DevOps is just a general strategy for coordinating work that incorporates CI among other things. And you'd still use CI with this because CI is more about an SDLC than portable builds.

I like the idea of the project. Is there a list of apps that use it?

How close is this to being able to compile something like Redis or SQLite?

Impressive and well written article. But unfortunately I get an error message when attempting to run from a cmd line in Windows 10.

C:\Users\Foo>emulator.com -t life.com

"This app can't run on your PC"

Maybe I missed something?

That's an old version of the emulator from the APE blog post back in August. Download blinkenlights.com which should work fine in the windows command prompt. https://justine.lol/blinkenlights/index.html

This is plain delightful.

Bravo, this is really interesting, applicable or not these kinds of projects are really important. Looking forward to seeing this evolve.

That would be a great target for Zig.

If this could be used to build a Python interpreter that runs anywhere, and maybe even be combined with PyInstaller, then this would solve one of the most annoying problems of Python as a systems programming language.

Run anywhere? On Dreamcast? On some exotic Toshiba microcontroller?

Bare metal amd64 with BIOS, amd64 Windows, amd64 platforms that understand shell scripts and ELF or Mach-O, and any platform that understands shell scripts and has QEMU installed.

Any chance this could be made to work with tcc instead of gcc on Windows? I don't understand enough of how it is tied into gcc and linux to even take a shot at it.

Author here. tcc is tired and chibicc is wired. Check out https://github.com/jart/cosmopolitan/blob/master/third_party... where I've been working on building a 300kb actually portable executable c11 compiler + assembler + linker that supports all the best gnu extensions.

Wow, I'll have to look at chibicc in detail. Sounds great!

There is a typo in this fragment: "the following common fact about C which that's often overlooked". Just trying to help.

Fixed. Thanks!

Is there any danger for this allowing cross platform viruses?

Wow. I am so surprised that this could happen

"Run anywhere" appears several times in the page, but it seems to be a wild exaggeration, especially given the comparison to Java.

Considering that for Java it really means "run anywhere you have a JVM/JRE already installed", the marketing slogan is an exaggeration in both cases.

Indeed: as far as I understand it, Cosmopolitan works across different OS's (or even without OS), but not across different CPU architectures. Comparing that to Java is a gross exaggeration indeed.

What if you include an emulator in the executable? eg x86-64 on ARM. Then you could slowly run the x86 code on the pi for various values of "slow".

can you bundle shared objects in the (blob) executable?

For the insight, I tip my hat. For the timeline where people actually use this, I put it back on.

Bob Barton is surely turning in his grave.

wow, most likely this means there could be a virus that infects all operating systems...

It's not really necessary for that case. At the level of taking over control, you're attacking a specific system, so you need to know what it is and can provide specific payload. For the later phases, the dropper knows where it's running so it can request a specific version of the virus as well.

So there's nothing stopping multi-os viruses right now.

Not sure why you're being down voted, perhaps for telling the obvious?

I can foresee a Morris Worm v2 coming sooner rather than later, but this time it won't just target UNIX systems. It is the obvious killer application for a WORA library that uses C, whether we like it or not.

On the bright side, for a start this can probably solve the problem of the many redundant VC++ compiler versions installed on a single computer, just before we get to the 2030 VC++ compiler version :-)

Anyway, kudos to the library's author; when the late Dennis Ritchie invented C for writing portable OSes, he probably never envisioned how far its portability could be taken.
