
Why is a Rust executable large? - adamnemecek
https://lifthrasiir.github.io/rustlog/why-is-a-rust-executable-large.html
======
haberman
I've been playing around with a tool to answer the more generic question: why
is my binary (written in C, C++, Rust, etc) so large?

We use CPU profilers to tell us why our programs are slow. My tool is intended
to be a _size_ profiler: why is my program big?

It turns out there are some really interesting analyses and visualizations you
can perform to answer this question.

A few direct comments on the article:

\- I do not recommend stripping your binaries completely! Use "strip
--strip-debug" instead (or just don't compile with -g). You should realize
that debug information and the symbol table are two separate things. The
debug information is much larger (4x or more) and much less essential. If you
strip debug information but keep the symbol table, you can still get
backtraces.
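
For example (a minimal sketch; exact flags and savings vary by toolchain):

        $ gcc -g -O2 -o hello hello.c   # built with debug info
        $ strip --strip-debug hello     # drops debug info, keeps the symbol table
        $ strip hello                   # drops the symbol table too -- not recommended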

\- I don't believe -Wl,--gc-sections has any effect unless you also compile
with -ffunction-sections and/or -fdata-sections, which these examples for
C/C++ do not.

If you care about binary size in C or C++ you should probably be compiling
with -ffunction-sections/-fdata-sections/-Wl,--gc-sections these days. Sadly,
these are not the default.
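
Something like this (a sketch; the flag spellings are GCC/binutils):

        $ gcc -Os -ffunction-sections -fdata-sections -c hello.c
        $ gcc -Wl,--gc-sections -o hello hello.o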

~~~
lifthrasiir
\- It is fine to strip both the debug information and the symbol table if you
want a distributable (I mentioned the ISP at the beginning for one reason)
and the program does not rely on them in any way. Actually, I wonder why
separate debug files [1] are not the norm on Unixes.

\- `-Wl,--gc-sections` does have an effect even in the absence of
`-ffunction-sections` (and so on), because libc and libstdc++ are already
compiled in a way that allows for GC---perhaps necessarily. The example had a
single function anyway, and I was lazy enough to exploit the fact that
`-ffunction-sections` wouldn't have made a difference...

[1] [https://sourceware.org/gdb/onlinedocs/gdb/Separate-Debug-Files.html](https://sourceware.org/gdb/onlinedocs/gdb/Separate-Debug-Files.html)

~~~
haberman
Without symbols you can't get backtraces, profile the program, use
function-based DTrace probes, readably disassemble it, etc. I'm not saying
it's impossible to distribute stripped binaries, I'm just saying I don't
recommend it. Compared with the debug information, I think the symbol table
is a much bigger bang for the buck, considering how much smaller it is.

It's strange -- I can verify with -Wl,--print-gc-sections that this is indeed
discarding some sections from a static glibc link, so I was wrong about that.
On the other hand I can also see plenty of sections in libc.a that have more
than one function in them -- not sure why this would happen if libc was indeed
compiled with -ffunction-sections.

I agree that separate debug files are nice, and OS X's implementation of them
is especially nice, since it does a very good job of finding the right debug
information for an executable.

~~~
lifthrasiir
Yeah, I don't doubt the usefulness of the symbol table. I'm probably thinking
of distributables on _Windows_, where you... don't really do such things.

IIRC the coding convention of glibc is that most functions (and probably all
public functions) live in their own files, which effectively achieves the
same result even when `-ffunction-sections` is missing.

------
pslam
The short version is: as small or large as you want, depending on how you
trade off convenience vs. performance vs. size.

I've made useful firmware for a micro-controller (yes in Rust) which is just
5KB. You can, as the article shows, dynamically link, and then it's about the
same size as the equivalent C/C++.

The point is, there's nothing inherent about Rust - the language - which
results in binaries appreciably different in size from what you can achieve
in C/C++.

~~~
davidcuddeback
> I've made useful firmware for a micro-controller (yes in Rust)

Would you mind sharing which microcontroller and how to get a Rust compiler
for it? I would love to use Rust to program microcontrollers. Every time I've
looked into this, I've thought it wasn't possible, because I don't see any
microcontrollers in the list of supported platforms here
([https://doc.rust-lang.org/book/getting-started.html#platform-support](https://doc.rust-lang.org/book/getting-started.html#platform-support))
or here
([https://github.com/rust-lang/rust/tree/master/mk/cfg](https://github.com/rust-lang/rust/tree/master/mk/cfg)).
I've considered trying avr-rust
([https://github.com/avr-rust/rust](https://github.com/avr-rust/rust)), but
the README says "NOTE: This does not currently work due to a bug." Any
pointers to get started would be appreciated.

~~~
kam
I've used Rust on a variety of ARM Cortex-M based microcontrollers including
Atmel SAMD21, NXP LPC1830, and STM32F4. AVR is tougher because that's an 8-bit
architecture with very new support in LLVM.

On microcontrollers, you use libcore instead of the full libstd, for lack of
an OS or memory allocator. Libcore is easy to cross compile because it has no
dependencies. You provide a target JSON file to specify LLVM options, and a C
toolchain to use as a linker, and nightly rustc can cross-compile for
platforms supported by LLVM.
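
For example, a stripped-down target file might look something like this (a
sketch; the exact set of required keys and the invocation vary by nightly):

        $ cat thumbv7m-none-eabi.json
        {
            "llvm-target": "thumbv7m-none-eabi",
            "target-endian": "little",
            "target-pointer-width": "32",
            "arch": "arm",
            "os": "none",
            "linker": "arm-none-eabi-gcc",
            "executables": true
        }
        $ rustc --target=thumbv7m-none-eabi.json -C opt-level=2 main.rs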

[https://github.com/hackndev/zinc](https://github.com/hackndev/zinc) is a good
starting point.

Things like interrupt handlers and statically-allocated global data require
big chunks of unsafe code. Rust has a promising future in this space, but it
will take more experimentation to get the right abstractions to make
microcontroller code more idiomatic.

~~~
TylerE
How does programming a "modern" 8-bit architecture work anyway? Do you have
16-bit longs? 32-bit long longs?

~~~
Sanddancer
For 8-bit platforms, 8 bits are still bytes/sbytes. Most of them have native
support for 16-bit integers; the values just take up a pair of registers. You
have access to the same array of integer sizes: plain int is 16 bits, longs
are 32 bits, and long longs are still 64. Still, I much prefer using the
fixed-width integer types introduced in C99, where you state the bit size
explicitly. uint16_t is a lot more explicit than int, especially if you've
got code that's being shared between a few different micros with different
word sizes.
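
For example (a hypothetical snippet, just to illustrate):

        $ cat sensor.c
        #include <stdint.h>
        
        /* uint16_t is 16 bits on an 8-bit AVR and a 32-bit ARM alike,
           whereas plain int changes width between the two. */
        uint16_t scale(uint16_t raw) {
            return (uint16_t)(raw >> 2);
        }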

------
Someone1234
A related but distinct question might be: how does Rust file size scale?
Meaning, a Hello World application is X, and X is bigger than in C/C++. But
does a Rust project outgrow its C/C++ equivalent as complexity increases? Or
is the code generation actually comparable, and all of this simply about the
corelib size and nothing more?

~~~
kibwen
Given that Rust monomorphizes generic functions like C++, and given that Rust
uses a high-quality C++ backend for code generation, I'd assume that default
binary sizes would be comparable. Producing a more scientific comparison would
require implementing a large project more-or-less identically in both
languages, which is unlikely in the near future (it might be instructive to
compare Servo to Gecko, but Servo isn't near complete yet, and even then Servo
does many things differently that might influence the comparison).
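
To make "monomorphizes" concrete: each concrete type gets its own compiled
copy of a generic function, just as with a C++ template (an illustrative
file, not from either project):

        $ cat mono.rs
        fn max_of<T: PartialOrd>(a: T, b: T) -> T {
            if a > b { a } else { b }
        }
        
        fn main() {
            // rustc emits two separate copies of max_of here:
            // one for i32 and one for f64.
            println!("{} {}", max_of(1, 2), max_of(1.0, 2.0));
        }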

------
glandium
_> Just to be cautious: Is it a valid question to ask after all? We have
hundreds of gigabytes of storage, if not some terabytes, and people should be
using decent ISPs nowadays, so the binary size should not be a concern,
right?_

        $ find /usr/bin -type f | wc -l
        2254
        $ du -sh /usr/bin
        445M    /usr/bin

If all these programs were written in Rust and statically compiled, assuming
only a 600K difference per binary, that would make my /usr/bin 1.3G (or 300%)
larger.

But in reality all those programs are dynamically linked against many
libraries in /usr/lib, so the difference would be even bigger, with libraries
duplicated between all those programs.

Sure, you can dynamically link with Rust too, but then, you hit the other
problem that there is no stable ABI (yet?), and that upgrading Rust (every 6
weeks) means recompiling _everything_.

~~~
yazaddaruvala
Forgetting about Rust for a second. Talking about dynamically linked libs in
general.

Dynamic linking was an optimization which came about when memory was
expensive. Memory is no longer expensive.

Is 1.3GB (or even 13GB) a lot on your hardware[1]?

According to "Do not prematurely optimize": Pretending we never had dynamic
linking, and given today's hardware constraints[2], as a community would we
choose to reimplement and universally adopt this optimization?

[1] Keep in mind that a single "modern" application, on average, weighs in at
tens of MBs, if not GBs.

[2] I'm talking about the general case, for the majority of OS distributions,
ignoring the relatively exceptional case of embedded systems, which do in
fact still need it.

~~~
rtpg
Dynamic linking also lets you update libraries for things like security
issues; it's not just a memory thing. Kinda agree on the space point, though
(plus much less chance of things like buffer overflows...).

~~~
yazaddaruvala
FWIW: I think everything has its place, and everything has tradeoffs. I can
definitely see a lot of usefulness for dynamic linking. The point you raise
is probably the best _current_ reason for it.

... but since I'm already playing devil's advocate :)

Dynamic linking also lets you update libraries ... and cause security issues
simultaneously across all applications, increasing the number of attack
vectors that can successfully exploit that vulnerability.

~~~
jacques_chester
Actually, it's a wash. If all we had was static linking, people would
statically link the _same_ common libraries. So you'd have to update multiple
binaries for a single vulnerability.

I've seen this in my day job at Pivotal. The buildpacks team in NYC manages
both the "rootfs"[0] of Cloud Foundry containers and the buildpacks that run
on them.

When a vulnerability in OpenSSL drops, they have to do two things. First, they
release a new rootfs with the patched OpenSSL dynamic library. At this point
the Ruby, Python, PHP, Staticfile, Binary and Golang buildpacks will be up to
date.

Then they have to build and release a new NodeJS buildpack[1], because NodeJS
_statically_ links to OpenSSL.

Buildpacks can be updated independently of the underlying platform. The
practical upshot is that anyone who keeps the NodeJS buildpack installed has
a higher administrative burden than someone who uses the other buildpacks.
The odds that the rootfs and the NodeJS buildpack get updated out of sync are
higher, so security is weakened.

Dynamic linking is A Good Thing For Security.

[0]
[https://github.com/cloudfoundry/stacks](https://github.com/cloudfoundry/stacks)

[1] [https://github.com/cloudfoundry/nodejs-buildpack](https://github.com/cloudfoundry/nodejs-buildpack)

------
mraleman
> Just to be cautious: Is it a valid question to ask after all? We have
> hundreds of gigabytes of storage, if not some terabytes, and people should
> be using decent ISPs nowadays, so the binary size should not be a concern,
> right?

Phones? IoT? Embedded? OS devs? Someone just checked in a 14KB binary size
reduction to our shell (by removing unnecessarily virtual methods in some C++)
that was widely celebrated.

Also, smaller .dll's load faster from disk.

------
AndyKelley
Here's Zig[1] for comparison:

    
    
        $ cat ../example/hello_world/hello.zig 
        const io = @import("std").io;
        
        pub fn main(args: [][]u8) -> %void {
            %%io.stdout.printf("Hello, world!\n");
        }
        $ ./zig build ../example/hello_world/hello.zig --export exe --name hello --release --strip
        $ wc -c hello
        5760 hello
        $ ldd ./hello
    	not a dynamic executable
        $ ./hello 
        Hello, world!
    
    

5 KB. There are a few unfortunate reasons this isn't even smaller, one of them
being a bug report[2] I filed in LLVM.

Also note that Rust is a lot further along than Zig right now. Zig does not
have backtraces or threads yet. But I believe that the hello world executable
in release mode will still not contain backtrace code, threads, or a memory
allocator, even once Zig catches up to Rust in terms of std lib
functionality.

[1]: [http://ziglang.org/](http://ziglang.org/)

[2]:
[https://llvm.org/bugs/show_bug.cgi?id=27610](https://llvm.org/bugs/show_bug.cgi?id=27610)

------
DannyBee
"C and C++ folks had been fine with that approximation for decades, but in the
recent decade they had enough and started to provide an option to enable the
link-time optimization (LTO)"

LTO is much older than this, even for C/C++

Just not in GCC :)

~~~
eatonphil
Where has it been around?

~~~
plorkyeran
VC++ first shipped it in Visual Studio .NET (early 2002), and I don't remember
it being touted as the first production-quality implementation of the concept,
so I assume it was around elsewhere before that.

------
neopallium
Golang binaries are also very large [0].

    
    
      package main
      func main() {
        println("Hello, world")
      }
    

That compiles to 1.1M using go 1.5.

0\.
[https://github.com/golang/go/issues/6853](https://github.com/golang/go/issues/6853)

~~~
cgag
Go doesn't pride itself on not having a runtime though. People aren't
surprised that there's space taken up by the garbage collector and green
thread management and such.

~~~
ComputerGuru
I don't understand your remark. On the contrary, a runtime should decrease
binary size. The same PHP script (interpreted, not JIT'd or VM'd, obviously)
will be only a few hundred bytes.

~~~
cgag
The runtime in this case is compiled into the binary. With PHP, the runtime
is all contained in the php binary that runs your scripts; in Go's case, the
runtime is copied into every binary it produces.

~~~
nowprovision
For a truer comparison with external scripting runtimes, one can do: go run
main.go

Although this would be ill-advised, kind of like PHP in general...

~~~
cgag
I'm not sure what you mean. Isn't this equivalent to `go build main.go &&
./main`?

~~~
nindalf
It is. Not sure if the other guy has ever used Go.

------
dkhenry
Before I found Rust I used to be a big advocate for Scala. What eventually
drove me away from Scala and towards Go and Rust was that my Scala jars were
clocking in at 300 to 400 MB, so the fact that a Rust binary has a pretty
fixed overhead of only a few hundred KB is a big win in my book.

------
stcredzero
I recall someone once getting a Squeak Smalltalk image down to 384k. However,
there was a Digitalk-originated project called "Firewall" that could produce
Smalltalk images as small as 45k, suitable for writing command-line programs,
even on early 90's machines. (Even recent versions of VisualWorks can get
their memory footprint down to around that of Perl 5, and even beat Perl 5 in
startup speed, provided you are prepared to dig around and shut a whole lot
of things off.)

~~~
kbenson
That's fairly impressive. Perl 5 is pretty quick to start:

    
    
        # perl -E 'my $cmd = shift; use Time::HiRes; my @times; for (1..10) { my $start=Time::HiRes::time; my $out = system($cmd); my $stop = Time::HiRes::time; die if $out>>8; my $time = $stop - $start; push @times, $time; printf "%0.4f\n", $time; } @times = sort {$a<=>$b} @times; @times = @times[1..8];  my $cumulative=0; $cumulative += $_ for @times; my $average = $cumulative/8; printf "Average time of 10 runs of \"%s\", dropping best and worst: %0.4f\n", $cmd, $average;' "perl -e '1;'"
        0.0039
        0.0032
        0.0031
        0.0031
        0.0032
        0.0035
        0.0036
        0.0030
        0.0033
        0.0032
        Average time of 10 runs of "perl -e '1;'", dropping best and worst: 0.0033
    

That's Perl 5.22.1. For "python -c '1'" (Python 2.7.5) I get 0.0173. A
minimal C program (just return success from main after including stdlib and
stdio) built with default gcc opts is <9k in size, and vacillates between
averaging 0.0007 and 0.0010 in the benchmark above when I run it.

~~~
thwarted
That perl code uses Time::HiRes to measure the time it takes to start perl
via system(), which includes the time it takes to fork a shell and parse the
command and then fork to spawn perl. `time perl -e '1;'` is more
representative of the raw perl startup time, which on my machine is reliably
0.002 real seconds, ⅔ of your times (that is, there's a lot of overhead in
your measurement).

 _A minimal C program (just return success from main after including stdlib
and stdio)_

You do realize that "including stdlib and stdio" means nothing for a C
program, right? These are, literally, just the API definitions, in the
Oracle-vs-Google Android sense. The default gcc options probably produced a
dynamically linked executable; if you compile it statically, you might be
able to shave a few more microseconds off the startup time.
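
For instance (a minimal sketch):

        $ gcc -O2 -o hello hello.c              # default: dynamically linked
        $ gcc -O2 -static -o hello-static hello.c
        $ ldd ./hello-static
        not a dynamic executable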

~~~
kbenson
> includes the time it takes to fork a shell and parse the command and then
> fork to spawn perl.

The only difference from the shell builtin time, or /usr/bin/time, should be
the shell startup and exec call. Every command is fork+exec, so you can't
really get away from that. If I _really_ cared, I would try to account for
that, but I don't. I thought it was pretty obvious that I wasn't being very
rigorous. I just did the minimum to make the values I got not useless.

> which on my machine is reliably 0.002 real seconds, ⅔ of your times

And on mine it fluctuates between 0.002 and 0.009. You mention your times
relative to my times, but that's useless: I ran mine on a 512MB VPS. The only
relevant measure of the overhead would be my _method_ vs the shell builtin.
Is that what you were actually referring to?

> You do realize that a "including stdlib and stdio" means nothing for a C
> program, right?

Truthfully, so I wouldn't have to remember the setup for C, I just googled
"minimal C program" and removed the printf it had. I only mentioned the
includes for completeness' sake, and I didn't want to clutter the comment
with the source.

> if you compile it statically, you might be able to shave a few more
> microseconds in startup time

Possibly, but that's not really what I was trying to convey. I was just
pointing out that a minimal Perl program isn't slower than a minimal C
program, and it's notable if you can get your startup times close to that for
a system with a virtual machine.

~~~
kbenson
Err "minimal Perl program isn't slower than a minimal C program" was supposed
to be "minimal Perl program isn't _that much_ slower than a minimal C
program".

------
grenoire
I really like executable-shrinking posts. However, isn't it the case that the
size of the executable won't increase significantly if you use --release to
distribute bigger programs? After all, the size comes from the standard
library and memory allocator being included in the executable. As long as the
libraries are not heavy, the executable should stay reasonably small.

~~~
kbenson
That matches my (amateur) understanding of how it works. The executable would
increase by small amounts because presumably your own code is a relatively
small portion of all the code included in the default binary. Then again,
since it's not dynamically linking, every crate you use increases the size...

------
amag
Dunno what GCC is doing but a statically linked C++ "Hello World" built by
MSVC is about 140k and for C it's about half that.

------
jokoon
Another example of how languages don't matter, and of how developing a
good-enough language, with all the nice ecosystem that goes with it, is a
massive amount of work that can only be accomplished by OS vendors or years
of dedicated OSS contributions.

Given that, I think most language developers should turn to LLVM to spread
their languages more rapidly, rather than trying to wrestle with all the
realities of language-making on their own, especially when it involves
statically typed and compiled languages.

~~~
pjmlp
*BSD and GNU/Linux ecosystems are a bit different in this regard, but on
other OSes, system programming languages that aren't part of the OS vendor's
offerings tend to have a hard time getting wide adoption.

------
halestock
How much of that initial 650k is constant in size? Would we expect it to
increase with programs of substantial complexity, or is the overhead
relatively constant?

~~~
tatterdemalion
A large portion of it is constant. As this article discusses, much of the bulk
comes from statically linking the standard library and jemalloc.
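
For instance, on current nightlies you can opt out of jemalloc (a sketch
relying on the unstable alloc_system feature, so the details may change):

        $ cat src/main.rs
        #![feature(alloc_system)]
        extern crate alloc_system; // use the system allocator instead of jemalloc
        
        fn main() {
            println!("Hello, world!");
        }
        $ cargo build --release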

------
Kiro
The tl;dr doesn't explain why it's so big, only how to fix it.

------
masgui
`ls -lh` please!

------
astro1138
Why not link the Rust libstd dynamically?

~~~
jontro
They did, in the article:

    
    
        $ cargo rustc --release -- -C prefer-dynamic 
        $ ls -al target/release/hello
        -rwxrwxr-x 1 lifthrasiir 8831 May 31 21:10 target/release/hello*

------
miander
Good analysis. Now that good information on the causes is available, the
larger community can discuss solutions.

~~~
toss1941
I don't understand the article's point about ISP speeds. Really, if you're on
an ISP slow enough that a <1MB file is too large to download, then
practically speaking a somewhat smaller binary isn't going to be any more
downloadable.

~~~
voltagex_
Because it's not that single ~1MB file that's the problem; it's the
cumulative effect. There are people on dialup and GPRS, and even people who
pay by the megabyte.

