
Why are my Go executable files so large? - caiobegotti
https://www.cockroachlabs.com/blog/go-file-size/
======
nickcw
> prior to 1.2, the Go linker was emitting a compressed line table, and the
> program would decompress it upon initialization at run-time. in Go 1.2, a
> decision was made to pre-expand the line table in the executable file into
> its final format suitable for direct use at run-time, without an additional
> decompression step.

This is a good choice, I think, and the author of the article missed the most
important point: an uncompressed table uses less memory.

This sounds paradoxical but if a table has to be expanded at runtime then it
has to be loaded into memory.

However if a table is part of the executable, the OS won't even load it into
memory unless it is used and will only page the bits into memory that are
used.

You see the same effect when you compress a binary with UPX (for example) -
the size on disk gets smaller, but because the entire executable is
decompressed into RAM rather than demand paged in, then it uses more memory.

~~~
pcwalton
No, there's no need for a tradeoff at all in a properly implemented compiler.
You can certainly compress a table in memory and still have that table be
indexable at runtime. Entropy coding is not the only type of compression that
exists. DWARF solved this a decade ago.

~~~
harikb
I thought Go did use DWARF for debugging info. Your remark sounds as if some
easy optimization was ignored. Can you please elaborate?

~~~
pcwalton
This "pclntab" format duplicates what DWARF does, but is not DWARF:
[https://golang.org/src/debug/gosym/pclntab.go](https://golang.org/src/debug/gosym/pclntab.go)

------
PeterisP
Is it just me, or should something like runtime.pclntab not be included in
production builds at all?

I mean, it makes sense while you're developing and testing, but it should be
reasonably possible to strip it out of production build binaries and instead
put it in a separate file, so that if you _do_ get a crash with a stack trace,
some external script can transform the program counters to line numbers,
rather than have it embedded in every deployed binary.

~~~
benesch
The Go language literally requires that pclntab be included in release builds.
I'm with you—it seems kind of crazy that this was designed into the
language—but there you have it.

The reason is that Go's standard library provides functions that allow
retrieving a backtrace and symbolicating that backtrace at runtime:

* [https://golang.org/pkg/runtime/?m=all#Callers](https://golang.org/pkg/runtime/?m=all#Callers)

* [https://golang.org/pkg/runtime/?m=all#CallersFrames](https://golang.org/pkg/runtime/?m=all#CallersFrames)

Unlike in C or C++, where you can link in something like libbacktrace [0] to
get a best-effort backtrace, those Go functions are guaranteed to be correct,
even when functions have been inlined. This is no small feat, and indeed
programs compiled with gccgo will often be incorrect because libbacktrace
doesn't always get things right when functions have been inlined.

[0]:
[https://github.com/ianlancetaylor/libbacktrace](https://github.com/ianlancetaylor/libbacktrace)

~~~
PeterisP
Is it _that_ common for these functions to be used? Perhaps transitively
through some popular libraries? Just being in the standard library doesn't
even necessarily mean that these functions (and the data they need) should get
included by the linker.

~~~
benesch
Any program that uses a logging framework, including the stdlib log package,
will wind up depending on runtime.Callers at least transitively. That’s
probably most Go programs; certainly most of the programs large enough to be
worrying about binary size.

Unlike in C, there are no macros like __FILE__ and __LINE__, so there is no
alternative to runtime.Callers (short of preprocessing your Go source code).

~~~
umanwizard
You can still get a backtrace without symbols though.

Why couldn't the Go team introduce a flag that strips symbols, while making
clear to people that they should only use it if they are okay with backtraces
looking like

    
    
      #1  0x00007ffff7ddb899 in __GI_abort () at abort.c:79
      #2  0x0000555555555156 in ?? ()
      #3  0x0000555555555168 in ?? ()
      #4  0x000055555555517d in ?? ()
      #5  0x0000555555555192 in ?? ()
    

or similar

~~~
giovannibajo1
Because one of the mantras of Go is _not_ telling users to have a "debug"
build and a "release" build. The development build is the one that goes into
production, with no difference in optimizations, symbols, and whatnot. This
has pros and cons, like all tradeoffs.

~~~
jasonhansel
The "26M pclntab" issue sounds like an excellent argument against that design
decision.

~~~
loeg
Counterpoint: our work infrastructure for symbolizing stripped binary stacks
sucks. I'd love to get backtraces with symbols from the field.

But also this entire section could be compressed and seekable and only
expanded on demand. No need to leave it as a bloated uncompressed blob.

~~~
umanwizard
> our work infrastructure for symbolizing stripped binary stacks sucks. I'd
> love to get backtraces with symbols from the field.

If stripped binaries don't work for you, nobody is saying you should be forced
to use them. You would presumably just not avail yourself of the option.

(As an aside -- what's wrong with your symbolication infrastructure at work?
This is something I spent significant time working on at a past job, so I'm
always curious how other people are doing it.)

~~~
loeg
My employer's internal infrastructure for this just sucks; it's not endemic to
stripping binaries.

Specifics? The automation to associate cores with specific builds is bad or
non-existent; the automation to load up GDB with the right files and
corresponding sources or source branch is bad/non-existent; the fileserver
storing symbol data is slow and sometimes remote to the developer across a
very thin pipe; etc, etc. All to some extent foot-shooting by my organization.
But we have TBs and TBs of build artifacts and hundreds of unique daemons
producing cores and also have to debug kernel cores and multiple product
branches.

Part of the pain is perhaps that we're cross-building for an embedded FreeBSD-
derived system, but the majority of our developers use Linux or Mac, so we
can't necessarily use host-native (or host-native-only) tools.

I think basically the situation is begging for someone familiar with the
problem-space to sit down and bang out a FUSE filesystem or two (e.g. fetching
sources on-demand as GDB reads them, instead of checking out a full multi-GB
repo copy) along with a shell or Python script to load up a core. But no one
has done it yet and management doesn't really prioritize developer tools.

------
swiftcoder
This is where letting a large enterprise guide the development of a piece of
widely-used software becomes questionable. At a FAANG the constraints are
fundamentally different.

At work I routinely see CLI programs clocking in at a gigabyte, because it
was simpler to statically link the entire shared dependency tree than to
figure out the actual set of dependencies; and once your binary grows that
big, running LTO adds too much time to one's builds. And disk space is
basically free at FAANG scale...

~~~
jrockway
> This is where letting a large enterprise guide the development of a piece of
> widely-used software becomes questionable. At a FAANG the constraints are
> fundamentally different.

I see where you're going with this, but the conclusion of the article is that
Go isn't (what you call) FAANG-y enough for them.

From the article: _"This design choice was intended to lower the start-up
time of programs [...] This performance goal is not relevant to server
software with long-running processes, like CockroachDB, and its incurred space
cost is particularly inconvenient for large, feature-rich programs."_

What they want is a way to pass 1000 parameters to the compiler so that nobody
ever gets a working binary unless they have a team of release engineers to
make one for them. The Go team and their users at Google easily have access to
said team of release engineers, but took the opposite approach. They optimized
for programs that start fast even though most of those teams at Google are
writing programs that run for a long time. They optimized for a compiler and
runtime that don't have a billion knobs to turn, even though they can afford
to pay an army of engineers to tune those knobs.

So I actually think this has very little to do with optimizing for FAANG. This
is more about every program getting the same compiler options, which is merely
a philosophy that some people on the Go team have, not really a philosophy
that every FAANG has. (They use Java at Google too, and Java is more than
happy to give you knobs to turn. It also doesn't start up very fast.)

~~~
BossingAround
> It also doesn't start up very fast

This has been changing rapidly with GraalVM and projects like Quarkus, which
can build a static binary of a Java program that starts, as you'd expect, much
faster (still not performance of a C/C++ program, but way faster than a
regular Java program).

~~~
jhgb
Did I miss something important about GraalVM? I thought it was another runtime
on top of JVM, not a replacement for it. For a Java program, for example,
GraalVM would presumably have no impact since it wouldn't be used for it.

~~~
thu2111
GraalVM is several things.

In this context people are talking about the native-image tool which uses a
small JVM written in Java (SubstrateVM) to compile to native code ahead of
time, no HotSpot or other JVM dependencies.

There's also Truffle which is a way to put other languages onto the JVM (both
HotSpot and SubstrateVM).

------
tytso
I'm not sure why people are so worried about the size of the executable file
here. If the runtime.pclntab table is never[1] used then it won't be paged
into memory, and disk space is mostly free these days.

[1] Well, hardly _ever_! (Sorry not sorry for the obligatory Gilbert and
Sullivan reference.)

If you're using the Go executable on a system without virtual memory support,
yeah, that's going to suck, but it appears the Go runtime is horribly bloated
and not really suited for super-tiny 16-bit processors in the micro-embedded
space. But for something like CockroachDB, why worry about the file size?

~~~
F-0X
> disk space is mostly free these days.

This is the only "argument" ever presented, and I don't think it is any good.
I care about file sizes. I want to get the most out of my hardware. Not
needing to buy another drive is always going to be cheaper for me and every
other user.

~~~
everdev
> Not needing to buy another drive is always going to be cheaper for me and
> every other user.

128GB+ drives are standard on mid-range laptops. Even at 64GB are you really
going to fill up disk space because of Go executables?

CockroachDB (a large software project) is only 123MB. I doubt most people even
have 100 pieces of non-default software on their laptop or that executables
are going to fill up storage and break anyone's bank these days.

If I'm short on HD space, I typically target photos and videos, not
software.

~~~
mimi89999
Well, grow your entire system by 60%. Even with 128GB that will be
non-negligible.

------
nneonneo
This is where go’s insistence on reinventing the wheel feels terribly
misplaced. Every major debug format has a way to associate code locations with
line numbers. Every major debug format also has a way to separate the debug
data from the main executable (.dSYM, .dbg, .pdb). In other words, the problem
that the _massive_ pclntab table (over 25% of a stripped binary!) is trying to
solve is already a well-trodden and solved problem. But go, being go, insists
on doing things their own way. The same holds for their wacky calling
convention ( _everything_ on the stack even when register calling convention
is the platform default) and their zero-reliance on libc (to the point of
rolling their own syscall code and inducing weird breakage).

Sure, the existing solutions might not be perfect, but reinventing the wheel
gets tiresome after a while. Contrast this with Rust, which has made an overt
effort to fit into existing tooling: symbols are mangled using the C++ mangler
so that gdb and friends understand them, rust outputs nice normal DWARF stuff
on Linux so gdb debugging just works, Rust uses platform calling convention as
much as possible, etc. It means that a wealth of existing tooling just works.

~~~
antics
I am not a fan of Go, and I also wish these things were true (and more[1],
actually), but I find it hard to agree that its priorities are "terribly
misplaced." Inside the context of Go's goals ( _e.g._ , "compile fast") and
non-goals ( _e.g._ , "make it easy to attach debuggers to apps replicated a
zillion times in Borg") these trade-offs make a lot of sense to me. Like: Go
rewrote their linker, I think, 3 times, to increase the speed. If step 1 was
to wade through the LLVM backend, I am not sure this would have happened. Am I
missing something?

I love Rust, but Go is focused on a handful of very specific use cases. Rust
is not. I don't know that I can fault Go for choosing implementation details
that directly enable those use cases.

[1]: [http://dtrace.org/blogs/wesolows/2014/12/29/golang-is-trash/](http://dtrace.org/blogs/wesolows/2014/12/29/golang-is-trash/)

~~~
mwcampbell
> non-goals (e.g., "make it easy to attach debuggers to apps replicated a
> zillion times in Borg")

But wouldn't it still be nice to have a standardized way to analyze post-
mortem dumps across languages?

~~~
nine_k
Google's anointed production languages used to be five: C++, Java, JavaScript,
Python, and Go. Not much to reasonably standardize across, especially if a
standardized solution ends up with more compromises than a custom one.

~~~
pcwalton
But DWARF uses less space than Go's native format. So inventing a custom
"linetab" format seems like the compromise, not using DWARF.

~~~
nine_k
I suspect that the format is again copy-pasted from somewhere in Plan9, and
existing Plan9 tools for it are ported, too.

------
aequitas
There is a project called TinyGo [0] which brings Go to embedded systems
where binary and memory size matter even more.

[0]
[https://archive.fosdem.org/2019/schedule/event/go_on_microco...](https://archive.fosdem.org/2019/schedule/event/go_on_microcontrollers/)

------
evmar
The next time you need to make an HTML treemap like this, try my tool:
[https://github.com/evmar/webtreemap](https://github.com/evmar/webtreemap)

It provides a command line app that accepts simple space-delimited data and
outputs an HTML file. See the doc:
[https://github.com/evmar/webtreemap#command-line](https://github.com/evmar/webtreemap#command-line)

(It also is available as a JS library for linking in web apps, but the command
line app is the one that I end up using the most. I actually built it to
visualize binary size exactly like this post and then later generalized it.)

~~~
pacoverdi
Another option is to generate a text file in the format expected by
flamegraph. Especially useful when the data is hierarchical.

[https://github.com/brendangregg/FlameGraph](https://github.com/brendangregg/FlameGraph)

Example for Java:

[https://github.com/pcdv/deps-flamegraph/blob/master/README.md](https://github.com/pcdv/deps-flamegraph/blob/master/README.md)

------
kris-s
Great writeup. I believe there is an open issue from Rob Pike from 2013 that
this would fall under:
[https://github.com/golang/go/issues/6853](https://github.com/golang/go/issues/6853)

~~~
ChrisSD
Except that this write-up says this was a deliberate design decision to trade
space for speed. That's not something to be fixed, unless you convince Go to
make different trade-offs or to provide more optimization options.

~~~
kris-s
The linked issue is tagged "NeedsFix".

------
lazyjones
The author guessed a few things wrong:

* fmt.Println pulling in 300KB isn't proof that Go's standard library isn't "well modularized". It's the wonders of Unicode and other code that is actually used.

* 900K for the runtime isn't surprising when you have complex garbage collection and goroutine scheduling, among other things.

~~~
majewsky
> fmt.Println pulling in 300KB isn't proof that Go's standard library isn't
> "well modularized". It's the wonders of Unicode and other code that is
> actually used.

I would guess a large part is that it has to pull in the entire reflection
library to support format verbs like %#v which renders the argument as a Go
literal.

------
Carpetsmoker
You can compress with upx (at the cost of increased startup time in the order
of hundreds of ms, which is okay for servers) and/or not include all debug
symbols. Doing both usually shaves >60% off a binary.

~~~
amscanne
UPX transforms demand-paged, reclaimable page cache memory into a blob of
unreclaimable anonymous memory.

It makes no sense for most use cases where I’ve seen it. It adds runtime costs
both in terms of start-up and memory usage.

Maybe it helps in terms of binary sizes for downloads — but those are often
compressed anyways! E.g. Your docker images are compressed and UPX’ed binaries
in a layer aren’t buying you anything (just adding runtime costs).

~~~
Carpetsmoker
For a lot of applications the increase in startup time and memory usage is
negligible. The startup time is increased by dozens or hundreds of ms, which
is not a lot for a server application. I tried to measure the increase in
memory but wasn't really able to, so either it's a very subtle difference or
it's very small.

Not everything uses Docker.

~~~
charleslmunger
UPXed binaries have their code pages mapped as dirty - this means the OS can't
page them back to disk if it needs or wants to. In some cases, that's an
acceptable cost to pay - for low-latency servers you might want to mlock all
your executable pages so there's no risk of a page fault and disk read killing
your tail latency. Of course if you're doing that then you have to either pay
the full cost of your binary size in memory, or you have to have some warm up
phase and then hope that everything you need is loaded then. In the first case
you suddenly care a lot about binary size, because memory is quite a bit more
expensive than disk.

But one valuable reason to use something like UPX is that you can attach a
crappy and thus inexpensive disk to servers that you're not using for actual
storage. Compression on disk lets you load from a slow disk faster, and if you
weren't paging to disk anyway then UPX doesn't have much of a cost.

But if you're on a traditional desktop operating system, UPX will increase
your effective memory footprint, and force writing to swap instead of merely
dropping pages. On Android, which doesn't swap, you'll significantly increase
your memory footprint.

------
todotask
Rob Pike submitted an issue:
[https://github.com/golang/go/issues/36313](https://github.com/golang/go/issues/36313)

------
throw_m239339
Being an Electron and a Go developer, I'm not complaining too much about the
size of Go executables. Electron-backed software, on the other hand...

~~~
dana321
Have you tried this? I've played around with it in the past.

[https://github.com/asticode/go-astilectron](https://github.com/asticode/go-astilectron)

------
selljamhere
Would it be possible to add a flag for the compiler to disable the line table
pre-expansion?

~~~
w0rd-driven
I can’t exactly tell if we’re saying the same thing but my thought was a flag
to switch between the 1.2 way for faster startup and the earlier approach for
longer running processes. The trade off is added complexity in identifying
your binary usage patterns and keeping both methods in the tooling.

These kinds of changes may not be breaking in a technical sense, but it's very
unexpected behavior if you're one to notice patterns like file sizes changing
in such a significant way over time. An answer of "stick with v1.1x
indefinitely if you want the old behavior" only feels like a very temporary
answer.

------
kccqzy
Language flame wars aside, Bloaty McBloatface is a wonderful tool to analyze
why the binary size is big:
[https://github.com/google/bloaty](https://github.com/google/bloaty)

I frequently use this tool to answer questions for C++ binaries, another
language that has a penchant for producing large executables.

------
zerr
70MB of source seems somewhat bloated for such a project, no?

~~~
thrill
For an ACID compliant, resilient, consistent, distributed, auto-sharding and
auto-tuning for low latency, highly scalable SQL database? No.

~~~
BubRoss
70MB of source is such an extreme amount that I don't know how it could be
reasonably justified; there must be an enormous amount of waste. All of SQLite
is 6MB.

~~~
paulddraper
When you have the features of SQLite, you can have the size of SQLite.

PostgreSQL is 36MB. [1] Granted, Go is much terser than C and has a larger
standard library, but we're not at absurd levels.
[1]

    
    
         find src -name '*.c' -o -name '*.h' | xargs cat | wc -c

------
warent
So what is the solution then? Will they just have to fork Go and compress the
table again like before? It's completely insane that it would eventually
surpass the size of the program itself.

~~~
kseistrup
Is the line table in go executables absolutely essential? Shouldn't they be
strippable, the way you can strip debug symbols from a C binary?

~~~
Carpetsmoker
They are, and many people don't include them in production builds. This
article is incomplete for not mentioning it.

~~~
caiobegotti
You're right, but that wouldn't make the main problem go away. I just built a
simple plugin for Kubernetes's kubectl and it's about 32MB with go build's
-ldflags "-s -w", where 16MB of that is still the pclntab mentioned in the
article.

~~~
Carpetsmoker
The problem with removing that completely is that you won't get _any_
information on panics. I don't think this is what you really want, and the
current behaviour strikes me as a reasonable middle ground.

32M is really large for a simple plugin, and to be honest I think that says
just as much about Kubernetes as it says about Go.

------
_bxg1
> prior to 1.2, the Go linker was emitting a compressed line table, and the
> program would decompress it upon initialization at run-time. in Go 1.2, a
> decision was made to pre-expand the line table in the executable file into
> its final format suitable for direct use at run-time, without an additional
> decompression step.

Sounds like a good case for a flag.

------
Timothycquinn
Why are they not considering keeping the pre-1.2 compressed runtime.pclntab
and having the data be decompressed to a separate file on first run? That way,
the memory footprint is kept low whilst keeping executable size down.

------
bsaul
Sidenote: currently trying to get up to date on the best way to get
distributed ACID key-value storage these days. Is CockroachDB the new
standard? I tried to find benchmarks comparing it to things like Postgres for
various use cases, but only found articles that read like ads.

~~~
redis_mlc
> the best way to get distributed ACID key-value storage

You will need to define "best way", "distributed" and "acid" based on your
requirements.

For most people, multi-master MySQL and Redis with vector clocks is a great
combination.

> Is CockroachDB the new standard

No database less than 5 - 10 years old works in production, so no, it's not a
standard.

> I tried to find benchmarks comparing it to things like Postgres for various
> use cases

Well, without more detailed requirements, good luck. Also comparing pg to
distributed databases is ... like comparing apples and oranges.

Source: experienced DBA.

~~~
bsaul
Distributed: able to scale horizontally, ideally over multiple availability
zones, to ensure both performance and data safety.

ACID: able to perform bank-account money-transfer-style transactions.

Last time I checked, pg was able to shard and work in multiple replication
configurations, ensuring a good scalability story up to terabytes of active
data.

Does multi-master MySQL enable row locking?

------
aasasd
> _runtime.pclntab_

Ah yes, an example of Google's long and descriptive identifier names, a lament
of which surfaced here recently.
([https://news.ycombinator.com/item?id=21843180](https://news.ycombinator.com/item?id=21843180))

> _I’m glad we now live in a futuristic utopia where keyboard farts like p,
> idxcrpm, and x3 are rare._

~~~
esprehn
What's interesting is that Go encourages letter soup while the rest of Google
styles encourage long descriptive names.

Sometimes I think it was intentionally done to troll the rest of Google that
used long_descriptive_names_ with an 80 col limit (Java argued up to 100 but
no one else did, not even JS which uses Java's long names).

I always thought it was funny walking around the office seeing everyone with
their squished and wrapped code on a huge 32in monitor. Lots of the code in
google3 looks like haiku squished to the right margin.

------
dana321
Go is a _hybrid_ language.

It's not a JVM, but its runtime has JVM-like features such as garbage
collection and reflection, plus a thread scheduling system called goroutines.

I love the fact that it is monolithic in nature: one exe is all you need no
matter which platform you use. Everything is statically compiled into the
binary.

No bundling the JVM and a load of jar files, or lib*.so dependencies.

~~~
int_19h
Toolkits to precompile a Java app into a single native binary with the runtime
and stdlib inside have been around for 20 years or so.

------
kazinator
> _there is about 70MB of source code currently in CockroachDB 19.1_

 _That's_ what is insane here; way more so than Go executable size issues.

------
ausjke
Go really should be about the same size as statically linked C, yet somehow
it's 4x larger. Why is that?

UPX can help, but even with UPX a static C binary is still much smaller.

Both had debugging info removed.

------
fxtentacle
TLDR: They included debugging information when they did not intend to.

~~~
techsin101
What's the fix?

~~~
layoutIfNeeded
Don’t use Go.

------
andy_ppp
Because Go isn’t C?

------
boulos
Disclosure: I work on Google Cloud (but this applies generally).

There seem to be a lot of arguments about disk space (arguably free in 2020),
memory (free-ish because as tytso points out, they won’t get backed) and then
bandwidth.

I _think_ most people saying “bandwidth” might mean “time to fetch”, because
GCS, GCR, S3, etc. all have free egress on the transfer from them to GCE/EC2.
If you have a self-hosted Docker Hub or something on an EC2 instance, that’s
not the case (you may pay Zone to Zone egress of $.01/GB).

If you were paying a penny per gigabyte egress, a 100MB-ish binary is only .1
GB and therefore at best 1/1000th of a dollar. On the 16 vCPU hosts that
CockroachDB prefers (see their recent benchmarks), that’s equivalent to about
5 _seconds_ of runtime.

A fair retort is that in container land, this becomes death by 1000 cuts as
each binary includes all the same stuff over and over again (for minikube, the
.iso is nearly 1 GB [1] now, but not because of the 50MB binary itself).

Even so, a _1GB_ image takes almost as long to pull from an object store like
GCS as a 100 MB image (at many Gbps, the constant factors dominate). If you're
trying to run something this large as a function on Cloud Run or Lambda or
Knative, you’ll probably be sad (you’ve burned about 800 vcpu seconds,
economically, of compute time) but that’s why there are layers.

tl;dr: your 50 or 100 MB binary doesn’t “cost” much, but a 1 GB container
image without shared layers does.

[1]
[https://github.com/kubernetes/minikube/issues/5013](https://github.com/kubernetes/minikube/issues/5013)

------
narven
Use upx

[https://upx.github.io/](https://upx.github.io/)

------
sys_64738
I think the Go runtime libraries are linked into the binary, so there are no
external dependencies for Go itself.

~~~
turk73
Exactly. The whole point of Go is to fix the issue with Python where you have
to have all the right libraries installed wherever you intend to run your
program.

