Note that a typical "Hello World" executable written in C does not contain any code capable of actually printing anything. It links to libc shipped with the operating system. That is about 30MB of code that C executables get for free.
You can link libc statically and dead-code-eliminate everything except print. That's what Rust does too, but Rust has fancier machinery for formatting and handling of panics with backtraces. These bits don't get dead-code-eliminated as thoroughly in Rust (probably because the print could theoretically fail).
And below libc is probably X or Wayland which tells the kernel where to put pixels. And below that is the kernel and graphics driver which figures out how to do that and communicates it to the GPU. And below that is the actual GPU which figures out how to modulate an array of pixels into an HDMI signal. When do we stop quibbling about the size of binaries? It’s pointless to point out that small code relies on other abstractions. No one complains about tiny demo scene programs using tons of other stuff built into the OS. So why here?
Your understanding of computers does not seem correct at all... But maybe it's my understanding that's wrong. I don't know.
C64 demos and 8088 DOS demos use very little OS infrastructure. Obviously things like JavaScript demos do.
The challenges presented by a particular piece of hardware or software in the demo scene are chosen for different reasons.
Sometimes, the demo author wants to show how well they can use the existing OS infrastructure. Sometimes they want to show what can be done without it. In both cases, small sizes are impressive.
I'm not complaining that a Hello World exe doesn't come with its own operating system.
It's just worth knowing when you compare exe sizes that C doesn't have an order of magnitude more efficient code generation. It just had a >30-year head start to get its standard library shipped with every operating system. Other languages typically don't have that advantage, and either have to reuse libc, or take a hit from statically linking their own libstd.
The entire exercise is an attempt to compare apples to apples.
In your example, both the C bin and Rust bin are using X, GPU, etc., so they warrant no distinction. So what is different? What is included in the Rust binary that isn't in the C? Because that's what is being tested in these types of comparisons.
This difference may be because some language emitted inefficient instructions, has bloated includes, etc. Or it could be because you're comparing Apples to Oranges, e.g. one ships with libc in the binary, and one dynamically links to it.
I’m not attempting to compare apples and oranges. I’m ranting about how almost every time someone posts something about tiny programs (like 100 line JS programs or wherever), someone has to chime in about how they’re able to do something so small because they’re using all these abstractions (code) below them and, as such, it’s not an achievement.
Specifically, your rant is here because the parent comment mentioned libc, but libc came up because the Rust version isn't using it and the C version is.
So it's Apples to Oranges because they're fundamentally doing different things, and yet we're comparing their Line Count. That's not only reasonable, it's necessary if you want an Apples to Apples comparison.
That's why we're not concerned about the code to render with GPU or some other junk - but we are fundamentally talking about two programs that do different things - but then comparing their line count.
Your rant may be valid in other cases, but it seems super off base here. Your rant has, imo, nothing to do with what's happening in people pointing out the discrepancy.
This is my attempt at making the smallest exe with rust for windows using the msvc toolchain. If y'all have any pointers how to get it smaller let me know :)
I think the thing being measured by GP is "easiest way to get to the 1kb mentioned in the title" not "most complicated way to get as small as possible".
You just need to add some compiler options; /MD should get the size down quite a bit already, then decrease the file alignment and merge sections together:
I notice you moved from asm! to llvm_asm! [0]. Since the commit message doesn't explain why, would you be able to do so here? (Purely for my own curiosity)
They are the same: asm! was replaced by llvm_asm! because it currently only supports llvm asm and the name might suggest otherwise [1]. A backend-independent syntax for asm! was recently specified [2], but that will take some time to implement.
They are not the same, and the asm! version is already implemented; the RFC came with an implementation.
That first PR moved the old asm to llvm_asm, and then the new asm was built after a period of time so folks could migrate their code to llvm_asm, as a temporary measure. This commit happened after that transition so this has to be from the new asm back to the transition version.
No problem :) It was a feature that had no changes for a long time, and now has had a bunch of improvements and is finally on a path to stable. It can be hard to keep up with things sometimes.
The resulting hello.exe is 3072 bytes, of which 2560 are zero bytes. The only external calls from the resulting binary are to ExitProcess, GetLastError and WriteFile from KERNEL32.dll. Additionally, changing the release optimizations:
produces a hello.exe of 2560 bytes, of which 1849 are zeroes.
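For anyone who wants to reproduce the "how much of this exe is padding" numbers, here is a minimal sketch of the counting step in Rust. The names `ZeroStats`, `zero_stats` and `zero_stats_for_file` are made up for illustration; nothing here is taken from the commenter's actual tooling.

```rust
/// Summary of how much of a file consists of zero bytes.
/// (Hypothetical helper type, not part of any existing tool.)
#[derive(Debug, PartialEq)]
struct ZeroStats {
    total: usize,
    zeros: usize,
}

/// Count the zero bytes in an in-memory copy of a file.
fn zero_stats(bytes: &[u8]) -> ZeroStats {
    ZeroStats {
        total: bytes.len(),
        zeros: bytes.iter().filter(|&&b| b == 0).count(),
    }
}

/// Convenience wrapper: read the file at `path` and count its zero bytes.
fn zero_stats_for_file(path: &str) -> std::io::Result<ZeroStats> {
    Ok(zero_stats(&std::fs::read(path)?))
}
```

Calling `zero_stats_for_file("hello.exe")` on your own build should give you the same kind of total/zero split quoted above, assuming the file is in the current directory.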
Note to all: winter_blue is right to mention Zig. It's about what's provided out of the box in that new language, and the provided optimizations are really effective.
Edit2: Please note that zig doesn't eventually link to printf and libc there, but links and resolves the std calls from the source to the calls to KERNEL32.dll.
Ah, yeah, I wasn't trying to shrug off the usefulness of zig, sorry. I think when it comes to compiled size zig does a bit better due to its smaller std. Although, when compiling rust with just a link to libc and a printf call, I believe you get an exe of ~2048 bytes when optimizing for size, so I'm not certain how well that scales.
The 100 line hack I used was to noop out the `.idata` section which link.exe refused to merge/shrink, that only saved me 1Kb in the end.
Looking at the section info of the zig exe, it looks like there is some black magic going on with its alignment, so I may have to dig deeper to see what zig is doing here. Also thank you both for pointing zig out; I hadn't heard of it until now and it may come in handy for something else I'm planning on making :)
@Edit2 I saw that, and that's pretty cool in regards to how the zig std is a wrapper rather than a rewrite. Although, if we were optimizing for size in this case it might be more optimal to have only one entry in the import table :D
Maybe "zig std is a wrapper rather than a rewrite" was the slip and what's written is the opposite of the idea? I wouldn't call zig's std "a wrapper" in this case where it implements the functionality without the C standard library? Had it called something in the C library, then I'd call it a wrapper.
Yeah, my choice of words wasn't great. It seems as if the zig std acts as a macro rather than a wrapper. That could just be the aggressive optimization though.
It's all based on the language features and the actual implementations of zig's standard library code, and at least I don't see anything "wrapped" there, and nothing that C++, C or asm programmers would understand as "a macro".
See an example calling firstNPrimes. "When we compile this program, Zig generates the constants with the answer pre-computed." "Note that we did not have to do anything special with the syntax of these functions. For example, we could call the sum function as is with a slice of numbers whose length and values were only known at run-time."
I'm a little short on time so I didn't read through all of it, but comptime generics sound similar to how rust compiles generics, and comptime parameters are similar to const fns, although zig appears to have more control over what is deemed "compile time".
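To make the const fn comparison concrete, here is a rough Rust counterpart of the firstNPrimes idea. This is only a sketch of the analogy, not a translation of the Zig documentation's code; the function names are chosen to echo the quoted example and are otherwise arbitrary. The constants are evaluated by the compiler, and the same `sum` function can still be called on a slice whose contents are only known at run time.

```rust
/// Trial-division primality test; a `const fn`, so it can run at compile time.
const fn is_prime(n: u32) -> bool {
    if n < 2 {
        return false;
    }
    let mut d = 2;
    while d * d <= n {
        if n % d == 0 {
            return false;
        }
        d += 1;
    }
    true
}

/// Fill an array with the first N primes.
const fn first_n_primes<const N: usize>() -> [u32; N] {
    let mut primes = [0u32; N];
    let mut found = 0;
    let mut candidate = 2;
    while found < N {
        if is_prime(candidate) {
            primes[found] = candidate;
            found += 1;
        }
        candidate += 1;
    }
    primes
}

/// Sum a slice; nothing in the signature is specific to compile-time use.
const fn sum(values: &[u32]) -> u32 {
    let mut total = 0;
    let mut i = 0;
    while i < values.len() {
        total += values[i];
        i += 1;
    }
    total
}

// Both constants are computed during compilation.
const PRIMES: [u32; 10] = first_n_primes::<10>();
const PRIME_SUM: u32 = sum(&PRIMES);
```

The main difference from comptime, as far as I can tell, is that Rust makes you opt in with `const fn` and restricts what such functions may do, whereas the Zig quote stresses that ordinary functions can be evaluated at compile time as they are.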
The size-limited demos use techniques like those in the OP: scrapping the runtime, no std, filling the unused exe header sections with their own data, quantizing floating point values, etc. And/or exe packers like Crinkler and kkrunchy.
COM files are no longer used.
Regular com programs won't run, but I'm waiting for the day a vulnerability is discovered that lets you run a specially crafted binary because of leftover code from the old com launcher. Is the com extension still recognized in 64-bit windows?
Considering that COM programs are headerless I can't imagine any way a specially crafted binary could do anything since there is no parsing going on. The "COM loader" just hands it off to NTVDM which does not exist on 64-bit Windows (there are 3rd-party implementations though)
And yes, the com extension is still recognized and will pop up a dialog telling you "This app can't run on your PC".
But actually .com and .exe files are handled equivalently, so you can just rename an .exe (i.e. PE executable) to .com and it will run - that doesn't make it a COM executable though...
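To illustrate why the extension doesn't matter, here is a small sketch that tells a PE image from a headerless COM file purely by content. It relies on two facts about the PE layout: the file starts with the `MZ` DOS magic, and the 32-bit little-endian `e_lfanew` field at offset 0x3C points at the `PE\0\0` signature. The function name `looks_like_pe` is made up, and this is not a claim about how the Windows loader itself is implemented.

```rust
/// Return true if `bytes` starts like a PE image:
/// "MZ" magic, then a "PE\0\0" signature at the offset stored in e_lfanew.
/// (Illustrative check only; a real loader validates much more.)
fn looks_like_pe(bytes: &[u8]) -> bool {
    // The DOS header is 0x40 bytes long and begins with "MZ".
    if bytes.len() < 0x40 || !bytes.starts_with(b"MZ") {
        return false;
    }
    // e_lfanew: little-endian u32 at offset 0x3C.
    let e_lfanew =
        u32::from_le_bytes([bytes[0x3C], bytes[0x3D], bytes[0x3E], bytes[0x3F]]) as usize;
    match e_lfanew.checked_add(4) {
        // `get` returns None if the range runs past the end of the file.
        Some(end) => bytes.get(e_lfanew..end) == Some(&b"PE\0\0"[..]),
        None => false,
    }
}
```

A DOS COM program has no header at all, so its first bytes are just machine code and the check fails, whatever the file happens to be called.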
No, like really impossible on a hardware level (most 16-bit instructions are hidden on hardware level on entering 64-bit mode) and on a software level (NTVDM is separate from Windows NT from day one (since it only works on x86 hardware and not on DEC Alpha for example) and it was an optional component since Vista but only on an x86 installation though).
If you're willing to rely on undocumented behavior, you can always skip kernel32 and use ntdll.dll directly. Or you could go even further and directly invoke the right syscalls. But either way, you'd be relying on behavior that Microsoft can change without notice.
I thought of that but couldn't figure out how to get the syscalls for my system and the assembly around it. Being on win10, kernel32 links to api-console-... and I haven't been able to find it on my system. A few online downloads had it but, they don't export `GetStdHandle` and `WriteFile`. I thought maybe there was dynamic export generation going on because they did have those names as strings.
Edit: Also I believe ntdll doesn't export `GetStdHandle` anymore.
The Win32 loader will always load kernel32 no matter what. You don't even need it in your import table. It's just a matter of finding where it's loaded and calling into the right functions.
Between a 40-byte strcmp routine (if I had SSE4.2, or if it were more widely supported, I could use an SSE4.2 strcmp) and ~74 bytes for resolving said exports, a couple of syscalls sounds pretty nice, provided the context switch isn't more expensive than resolving the imports.
Well if you really want to you can get the output handle from the PEB[0]. You can then call NtWriteFile[1] using syscall `0x0008` (Windows 10 x64 only)[2].
I completely forgot stdout handle is in the PEB! Thanks! After learning how the windows x64 syscall calling convention works I got it working on win10.
I know making tiny Windows executables like this is done just for fun, but I'm curious if there's technically a performance loss from merging sections. IIRC the sections are split up and 4k aligned because they have to be page aligned in memory anyway.
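On the alignment arithmetic itself: a PE file has a file alignment (what sections are rounded up to on disk) and a section alignment (what they are rounded up to in memory). A minimal sketch of the rounding, assuming power-of-two alignments; `align_up` is just an illustrative helper name:

```rust
/// Round `size` up to the next multiple of `alignment`.
/// Assumes `alignment` is a power of two (true of the 512 and 4096
/// values used in the examples below).
const fn align_up(size: u64, alignment: u64) -> u64 {
    (size + alignment - 1) & !(alignment - 1)
}
```

For example, `align_up(2561, 512)` is 3072 and `align_up(1, 4096)` is 4096. The 3072 and 2560 byte sizes quoted earlier in the thread are both exact multiples of 512, which is consistent with a 512-byte file alignment, though that is my inference and not something the posters stated.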
That contains a lot of platform-specific build instructions for Linux (and for the ELF file format used on Linux but not Windows), so it's never going to work with the msvc toolchain.
I linked these guys here https://github.com/pts/pts-tinype. They managed to do something similar with the PE headers, and I might have to try that next although I'm doubtful win64 exes are permitted to run if they are >1Kb