
Why is a Rust executable large? (2016) - new4thaccount
https://lifthrasiir.github.io/rustlog/why-is-a-rust-executable-large.html
======
lifthrasiir
Wow, I wrote this 3 years ago and I feel nostalgic. Nowadays I consider this
to be obsolete, as the official Q&A has exactly the same question and many
things (especially jemalloc) have changed in the Rust world. I personally
found the HN discussion at that time [1] greatly fruitful.

[1]
[https://news.ycombinator.com/item?id=11823949](https://news.ycombinator.com/item?id=11823949)

~~~
klmr
Unfortunately that link to the Rust FAQ is broken.

~~~
maddening
They removed the FAQ as it didn't fit their newer site.

You can find it at the old version of the site:
[https://prev.rust-lang.org/en-US/faq.html](https://prev.rust-lang.org/en-US/faq.html)

~~~
intertextuality
I'm confused as to why they redesigned the website but removed prior
functionality.

A few weeks ago people pointed out that the internationalization also suffered
due to this. Edit: by suffered, I mean "was removed entirely". I just checked
and couldn't find a way to change the locale, nor could I manually set it e.g.
by appending /de-DE, etc.

If the new design requires a link to the old design to access removed
functionality, is it really ready to launch?

\---

edit: these were the languages offered: Deutsch, English, Español, Français,
Bahasa Indonesia, Italiano, 日本語, 한국어, Polski, Português, Русский, Svenska,
Tiếng việt, 简体中文

Now if I want to refer to someone about Rust in Korean I have to link to
prev.rust-lang.org, which is weird.

~~~
andrepd
It's the "redesign culture" in a nutshell. Change things (expending effort)
for no discernible reason, break familiarity, arrive at subjectively worse
appearance, and at objectively worse functionality. It's hair-pullingly
frustrating.

~~~
lisper
It's a cargo cult. For any technology, there is a period of time when it
improves more or less monotonically, and so most new things are cool. Some
people over-extrapolate this and conclude that new things are always cool, and
so is born the cult-of-the-latest-thing, which persists long after actual
improvements stop.

------
nineteen999
This is a good write-up from the point of view of a C/C++ programmer, since it
gives a fair breakdown of where that extra space is going and rightly points
out that statically linked C/C++ executables are going to be large as well.
It's also a fun tour of how to access the syscall interface directly from Rust
and how to perform optimizations that most C and C++ programmers wouldn't ever
perform, perhaps outside of the most space-constrained embedded projects.

I take minor issue with the handwaving away (not just in this article but in
others as well) of glibc and the suggestion that it can just be replaced
with musl or another libc. There's a reason that glibc doesn't _officially_
support static linking, and that is NSS and PAM.

If all you need is a static executable for your Docker container or whatever
that reads user/group information and authenticates out of flat files in /etc,
then go for it. But in the "enterprise" space things like LDAP, Active
Directory, 2FA etc. are real, and if your application needs to support those,
then you're going to need glibc and its dependency on dlopen() and friends.

And this goes for every language which has a dependency on your chosen libc as
well (which, let's face it, is the large majority, at least in the Linux
world), if you want to use NSS and PAM modules.

~~~
nwmcsween
OK, first: I'm not defending C or C++, but trying to fix incorrect information:

> ...rightly points out that statically linked C/C++ executables are going to
> be large as well.

This is false; it depends both on how a library is structured and on how
linkers work. If you statically link parts of musl, expect a fairly tiny size
increase.

> There's a reason that glibc doesn't officially support static linking, and
> that is NSS, and PAM.

That, and the code isn't structured for static linking, as there are
dependency chains pulling in many symbols.

> ...most C and C++ programmers wouldn't ever perform perhaps outside of the
> most space constrained embedded projects.

This is because syscalls are OS-specific and, depending on the OS, unstable.
Also, Linux has an interesting way of encoding errno into the return value
(maybe others do too?), not to mention vDSO 'syscalls'.

> ...glibc and its dependency on dlopen() and friends.

If $code doesn't work on musl, 99.9% of the time it's due to $code. Also, musl
has dlopen and a dynamic linker; otherwise Alpine wouldn't work.

~~~
nineteen999
Thank you for the technical clarifications. In the boring, corporate,
enterprise software bubble in which I have mostly worked for 25 years,
musl/busybox/Alpine are barely a blip on the radar. I have never seen an
Alpine Linux install in 25 years, unless it is the basis of various
busybox-based appliances and I've not noticed. I've certainly never seen it
used for running "mission critical" bloated Java enterprise apps etc.

So within my comment I thought it was implicitly clear I was referring to
glibc based distributions, for example Redh^H^H IBM.

You CAN statically link binaries against glibc on these distributions, and in
many cases it will work. However, NSS and PAM will not work in my experience.
Is it generally a good idea to statically link against glibc? No.

> If $code doesn't work on musl 99.9% of the time its due to $code, also musl
> has dlopen and a dyn linker otherwise alpine wouldn't work.

Is sssd officially supported on Alpine Linux? Does it just work out of the
box? It seems to be in the "testing" branch from what I can see, with over 600
open bugs.

There is a reason that shareholders of large organizations want to pay a
large, established Linux vendor for support, regardless of whether their
engineers will ever use that support. They want to pay for stability/security
updates. They are paying for a Linux distribution that 3rd-party application
providers have certified their applications for.

They want LDAP/AD and other pluggable PAM modules to work out of the box
without too much tinkering. Alpine Linux may fit those criteria for all I
know. Doesn't matter. My employers wouldn't use it whether I wanted to or not.

Not saying that I like how things have turned out for Enterprise Linux
necessarily, but it is what it is. Redhat and clones absolutely DOMINATE this
particular space, at least in the USA, UK and Australia.

~~~
nwmcsween
> They want LDAP/AD and other pluggable PAM modules to work out of the box
> without too much tinkering.

PAM works fine with musl, even though the main implementation of PAM is
horrible for security. Also, PAM isn't tied to a libc, so I don't understand
why it's mentioned; the glibc implementation of NSS is tied to libc, but there
are other implementations of NSS.

------
monocasa
For those writing embedded firmware: one of the subpoints of a talk I did
recently (about a Rust dev kit for the Nintendo 64) was that you want to avoid
the standard model where, in C and C++, you might have most of your code in a
static library and build the executable as a second step. Instead you really
want all of your code, including your asm, in Rust source files so it's as
painless as possible to generate an LTOed binary. I had binary sizes go from
~1MB to about ~70KB. Rust really depends on LTO if you care about binary size.
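For Cargo-based projects, the size-relevant knobs live in the release profile. A minimal sketch (the exact values are illustrative; tune them for your target):

```toml
# Cargo.toml - size-oriented release profile (a sketch, not a prescription)
[profile.release]
lto = true        # whole-program LTO so unused std/core paths can be pruned
opt-level = "z"   # optimize for size rather than speed
codegen-units = 1 # a single codegen unit gives LTO the most room to work
panic = "abort"   # drop unwinding machinery, if your target allows it
```

`lto = true` alone is usually the biggest win; the other settings shave off progressively smaller amounts.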

~~~
StrangeDoctor
Was this talk recorded anywhere? it sounds really interesting.

~~~
monocasa
No, unfortunately. I really just need to take the content and stick it in a
blog, but life gets in the way...

~~~
naikrovek
So kick life's ass, tell it to back off, and blog that info somewhere!

It would be good info for the rest of us, and it would be great if you could
chronicle that stuff.

(Don't tell Life I said to kick its ass, please. It will crush me.)

------
edflsafoiewq
Both executables have grown on my machine since 2016. On Linux, with no
optimization or stripping:

* C Hello World - 19K

* Rust Hello World - 1.6M (!)

The C gain is attributable to a change in ld: since ld 2.30 it puts read-only
data and executable code in separate segments by default as a security
hardening measure. I don't see why this shouldn't produce three segments (R,
RE, RW), but here I get four (R, RE, R, RW). (Anyone know why four?) If I pass
-znoseparate-code to disable this, the C binary goes back to the 8K shown in
the article. (No effect on the Rust.)

The Rust gain appears to be mostly debug info. Stripped, I get 187K, which is
similar to the 121K the article gives after stripping and removing jemalloc
(which isn't in by default anymore).

Does anyone know why Rust has grown so much from 2016 to 2019?

~~~
dathinab
Did you compile with the --release flag? Without it, cargo will build a debug
build and put all the debug info in there (which, mainly due to println!, is a
lot).

~~~
Grollicus

      $ cargo --version && cargo init hello && cd hello && cat ./src/main.rs && cargo build --release && du -h ./target/release/hello
      cargo 1.34.0 (6789d8a0a 2019-04-01)
           Created binary (application) package
      fn main() {
          println!("Hello, world!");
      }
         Compiling hello v0.1.0 (/tmp/foo/hello)
          Finished release [optimized] target(s) in 0.31s
      1,6M ./target/release/hello

Edit:

      $ strip target/release/hello && du -h ./target/release/hello
      192K ./target/release/hello

~~~
aasasd
Note that, while it doesn't matter much in this case, with `du` you'll get up
to ~4KB out of nowhere because it measures by fs blocks. You need
`--apparent-size` for the actual file size, which doesn't exist in the BSD
version (= macOS version).

------
pornel
The real answer is that they're not. It's just that C has a head start of
having a 10MB libc already installed on your system, and Rust doesn't.

If you compile C with libc statically linked in, it'll make executables as
large as Rust's. If you compile Rust for dynamic linking, it'll give you hello
world as small as you get from C.

~~~
mahkoh
"If you compile C with libc statically linked in, it'll make executables as
large as Rust's."

    
    
        $ cat test.c && musl-gcc -static -O2 test.c && echo && du -h a.out 
        #include <stdio.h>
    
        int main(void) {
            printf("Hello World\n");
        }
    
        20K a.out

~~~
pornel
I stand corrected. Congrats to MUSL!

~~~
saagarjha
I would also imagine that the compiler is performing a couple of optimizations
that would significantly reduce the resulting binary’s size: the call to
printf would likely resolve to puts, which might be further optimized to a raw
write call by a smart enough compiler.

~~~
wbl
And possible dead code elimination in the linker.

------
nickcw
> I didn’t mention this because it doesn’t improve such a small program, but
> you can also try UPX and other executable compressors if you are working
> with a much larger application.

I don't recommend this, as it will increase the memory used by your program,
which is usually more important than how much disk space it takes up.

OSes only page in the bits of the executable that are actually run; however,
when UPX decompresses the binary it has to load all of it into RAM. That RAM
can be swapped out, but for a normal executable backed by a file the kernel
would just drop the pages from the page cache, so no write IO is needed.

In my tests with rclone, using upx made the binary 31% of the size, but made
it use 42% more RAM. YMMV!

~~~
lifthrasiir
While you are technically correct, nowadays the executable file size is
minuscule compared to the available memory. Since upx uses the UCL algorithm,
which AFAIK requires no additional memory besides the target buffer, the
memory increase can only be attributed to the original file size; I think
rclone weighs about 20 MB, so it is probably the case that rclone otherwise
consumes unusually little memory! :p

The real problem with upx is the invocation latency, which is a big blocker
when your program is started often (common for CLIs). Executable compression
has been much more common on Windows because download size matters there and
programs tend to run much longer.

~~~
ChrisSD
I've found that UPX triggers AntiVirus programs more often than uncompressed
binaries on Windows.

------
pslam
I have routinely created firmware written in Rust that is just a handful of
KB. The issues people are having are related to system integration (such as
static linkage); they can be mitigated in the cases anyone really cares about,
and will disappear over time.

------
new4thaccount
Also, the author mentions this is in the Rust FAQ
([https://prev.rust-lang.org/en-US/faq.html](https://prev.rust-lang.org/en-US/faq.html))
now, under the following question:

Why do Rust programs have larger binary sizes than C programs?

~~~
Kurtz79
I find it interesting, though, that the Rust code that generates a larger
binary is about as complex and lengthy as the C code, while the Rust code
which generates a smaller binary is clearly clunky and nobody would ever use
it.

Of course it is a toy example and I imagine in most real-world examples the
size of the binary is a moot point.

~~~
simias
I believe that rust now defaults to the system's malloc implementation instead
of bundling jemalloc, so that bit is no longer necessary:

    
    
        #![feature(alloc_system)]
        extern crate alloc_system;
    

As for the part that removes libstd, that's how you'd generally do it for a
bare-metal program (i.e. a bootloader/OS), since you can't really use libstd
without an OS unless you manually reimplement all sorts of primitives (memory
allocation being the most obvious one).

Now, if you do use libstd, it's true that it's significantly larger than libc,
but it's also vastly more powerful. You don't have anything like String, Vec
or iterators in libc, for instance; just a bunch of relatively low-level
functions and wrappers around syscalls. It's also a fixed cost, so obviously
for a simple Hello World there's a massive overhead, but for a more complex
application it should be less noticeable.

I mean when you think about it even the C Hello World is ridiculously bloated.
If you were to write the equivalent program in assembly without any dependency
on the libc you could probably get a binary that would be significantly
smaller (and most of its size would really be the ELF metadata).

~~~
chaosite
Well, if you're going to go that route, you can in fact make an ELF executable
that's smaller than the size of the ELF header (under Linux, anyway):
[https://www.muppetlabs.com/~breadbox/software/tiny/teensy.ht...](https://www.muppetlabs.com/~breadbox/software/tiny/teensy.html)

------
ChrisSD
While comparing the sizes of "hello world" programs is cute, how do real world
small utilities compare?

------
JohnFen
That's an interesting breakdown. The size of Rust binaries is probably the
largest reason why Rust is not terribly compelling for me.

Seeing that it's possible to make that more reasonable is enlightening!

------
codedokode
The author doesn't even understand what's wrong with large binaries and writes
nonsense about Internet connection speed. The main problem is that statically
linked libraries are not shareable. For example, if you have two Electron
applications, each with its own copy of Chrome, then each Chrome will take
about 100 MB just for the code section (not counting data and stack, which
will typically be much larger), and together they take 200 MB. But if those
applications didn't ship their own copy of Chrome and used the same shared
libraries, the code would be shared and would take only 100 MB of RAM. That
is, 100 million bytes of RAM are saved.

The more RAM such applications take, the higher the probability that the
system will run out of RAM and start swapping. And in that case the system
will become so slow that it won't matter what kind of Internet connection you
have.

The problem of non-shareable code also applies to interpreted languages like
JavaScript, Python or Java. When running such applications, their code, and
the bytecode generated from it, are often not shareable. So if you run two
copies of a desktop JS application that uses some JS library, you can get two
copies of this library in memory. As I recall, Android uses a clever trick
that allows sharing compiled Java code. Most other interpreted languages
cannot do it.

Also, a fun fact: if you are using GNOME Shell on Wayland and the system
starts swapping, the computer can stop responding even to mouse movement, and
you won't even be able to switch to another VT.

~~~
monocasa
But part of the point of Electron is a fixed base that you control. You only
have to validate against one specific version of Electron. Forcing people to
use the same Electron is directly against part of the value proposition of
using it in the first place.

Like, I say this as someone who hates the Electron-ification of the desktop.

Additionally, there are some pretty good arguments that, when you're not in a
dual interpreted/compiled world and can do full LTO, as a pure Rust app can,
the benefits of having everything shared between processes are outweighed by
stripping out everything that isn't necessary.

------
ncmncm
The REAL answer is still the same as in 2016: the language is not mature yet.
It will be fixed, in time, after other things that matter more.

Binary size has less priority than important goals such as feature
completeness, compiler speed, optimization performance, target platform
coverage, performance tooling support, library coverage, and myriad others.
Memory has become cheap enough that excess binary size is not typically the
limiting factor preventing deployment, where that happens.

As the language matures, numerous impediments to industrial deployment will
fall, one at a time. The language and its ecosystem are maturing with
impressive, even stunning rapidity, but it will still be ten years before the
language is an obviously safe choice for any random project -- if it gets
there at all. Historically, odds are against that, but the only real risks for
Rust are whether its adoption rate can be grown fast enough to retain
relevance, and whether its development jumps some unforeseen shark not easily
recovered from.

------
saagarjha
> it somehow aborted. Probably a libbacktrace issue, I don’t know, but that
> doesn’t harm much anyway.

Are you talking about the SIGILL? That's probably a ud2 inside of panic; panic
is presumably marked as non-returning for LLVM, which causes that instruction
to be placed there.

------
titzer
It always amazes me when new languages start out and their implementations
include a huge pile of crap by default.

Virgil is different: it only includes the stuff which is reachable from
main(). In fact, the entire compiler is organized around compiling only
reachable code into the binary.

Even including the entire runtime and garbage collector (about 8KiB), there
isn't much smaller you can get:

HelloWorld.v3:

    def main(a: Array<string>) {
        System.puts("Hello World!\n");
    }

    -rwxr-xr-x 1 titzer wheel   80 May 16 13:59 HelloWorld-jar
    -rwxr-xr-x 1 titzer wheel 9088 May 16 13:59 HelloWorld-x86-darwin
    -rwxr-xr-x 1 titzer wheel 5744 May 16 13:58 HelloWorld-x86-darwin-nogc
    -rwxr-xr-x 1 titzer wheel 4132 May 16 13:58 HelloWorld-x86-darwin-nort
    -rwxr-xr-x 1 titzer wheel 4692 May 16 13:59 HelloWorld-x86-linux-nogc
    -rwxr-xr-x 1 titzer wheel  340 May 16 13:59 HelloWorld-x86-linux-nort
    -rw-r--r-- 1 titzer wheel 6309 May 16 13:59 HelloWorld.jar
    -rw-r--r-- 1 titzer wheel   63 May 16 13:57 HelloWorld.v3
    -rw-r--r-- 1 titzer wheel 3486 May 16 14:04 HelloWorld.wasm
    -rw-r--r-- 1 titzer wheel  256 May 16 14:46 HelloWorld-nogc.wasm

(the executable files are, well, the executables). The Linux executable
without runtime or GC is literally 340 bytes, the wasm executable without GC
is 256 bytes. The x86-darwin-nort binary is large because apparently Mach-O
executables don't work right unless they are at least one 4KiB page in size.

~~~
rmu09
I wanted to check that out, but it seems the homepage at
[http://compilers.cs.ucla.edu/virgil/](http://compilers.cs.ucla.edu/virgil/)
is broken; the webserver isn't processing the server-side includes (e.g.
linkbar.inc, footer.inc).

~~~
titzer
[https://github.com/titzer/virgil](https://github.com/titzer/virgil)

