The Rust Compilation Model Calamity (pingcap.com)
236 points by WTTT on Jan 30, 2020 | 176 comments

Looking at the TiKV compile time graph, if that's for 2 million lines of real-world (not synthetically generated) code, that actually seems kind of reasonable.

~12 min for a full release build? ~5 min for a full dev build? What looks like several seconds for an incremental build?

I suppose it's good that a (former?) core contributor and large user of Rust holds the language to such high standards, but this doesn't seem especially bad.

I'm kind of curious how much faster compiling a large Go codebase is. How fast, e.g. does the entirety of Kubernetes compile? I'd imagine it's probably under a minute, but is it several seconds?

> I'm kind of curious how much faster compiling a large Go codebase is. How fast, e.g. does the entirety of Kubernetes compile?

I was curious too, so I just tried it. On my 6-core previous-gen MacBook Pro, I get 2m18s. That's for SLoC = 3296486 (assuming all deps are vendored and non-deps are not vendored, which I think is true), so about 16K SLoC/s.

Or to put it another way, if Rust compiled as quickly as Go, we'd expect to compile a release build of TiKV in 2m.

It would be interesting to have a `reverse` blog post, "Why Go compilation time is fast". E.g. which optimizations are overlooked or done quickly by the Go compiler? Could the binary quality be improved by increasing the compilation time? AFAIK you don't have optimization flags in Go (or the defaults are assumed optimal).

And to be honest, I have no idea what the compiled Go code looks like, much as I have no idea what actual instructions are executed by the Python interpreter.

Well, one big reason is the absence of the feature people are complaining about the most: generics. According to [1], there's a fundamental trade-off with respect to generics:

"The generic dilemma is this: do you want slow programmers, slow compilers and bloated binaries, or slow execution times?"

At the moment, Go has no generics, which translates to "slow programmers", but not "slow compilers". It's likely that the simplicity of the Go language as a whole -- which leads to many complaints -- is a major factor in the fast compile times.

EDIT: Lest it seem like I'm bashing Go here, the quote above is from one of the core Go developers (in 2009, no less), and is echoed in the generics proposal [2] from another core Go developer. It's meant to be shorthand for, "Generics do actually save programmer time and effort, and Go programmers are working harder than necessary because the language doesn't have generics yet." The point of this post is that generics have a cost (slow compilation time) and that fast compilation time has a cost (no generics -- at least, not without inventing a new way of doing generics).

It's possible to love something while still seeing its deficiencies and wishing its improvement. In fact, I'd argue that's the only way to truly love anyone or anything.

[1] https://research.swtch.com/generic

[2] https://go.googlesource.com/proposal/+/master/design/15292-g...

What I don't like about the "generics dilemma" framing is that the problem exists regardless of whether your language has generics or not.

Here's what I mean. Say you have a Vector class that can operate on ints or floats. You could make that a generic, in which case the compiler can either (a) duplicate the code for each type (monomorphize) or (b) do dictionary passing and get slower runtime. But if your language doesn't have generics, you have exactly the same problem: you as the programmer must (a) duplicate the code for ints and floats or (b) use an interface and get slower runtime. Not having generics doesn't solve anything. It just means that you, the programmer, have to do things that the compiler would otherwise do for you.
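To make the (a)/(b) choice concrete, here is a minimal Rust sketch (hypothetical `sum` and `render_all` helpers): the generic `sum` gets monomorphized into a separate copy per type, while the `dyn` version keeps one copy of the code and pays a vtable call instead.

```rust
use std::fmt::Display;
use std::ops::Add;

// (a) Static dispatch: the compiler emits a fresh copy of `sum` for every T
// it is instantiated with (monomorphization) -- fast at runtime, more codegen.
fn sum<T: Add<Output = T> + Copy + Default>(xs: &[T]) -> T {
    xs.iter().copied().fold(T::default(), |acc, x| acc + x)
}

// (b) Dynamic dispatch: one copy of the code, every call goes through a
// vtable -- the "dictionary passing" side of the trade-off.
fn render_all(items: &[&dyn Display]) -> String {
    items.iter().map(|i| i.to_string()).collect::<Vec<_>>().join(",")
}

fn main() {
    println!("{}", sum(&[1, 2, 3]));    // instantiates sum::<i32>
    println!("{}", sum(&[1.0, 2.25])); // instantiates sum::<f64>
    println!("{}", render_all(&[&1, &"a"]));
}
```

Without generics, the programmer writes `sum_i32` and `sum_f64` by hand, or takes the interface-based route; the trade-off itself doesn't go away.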

Although, one interesting aspect of the trade-off here is that a programmer who manually monomorphizes only has to do so once(^) -- their effort is reused across multiple compilation runs -- whereas the compiler normally has to monomorphize on each compilation run.

(^) Of course, you then have the burden of keeping multiple monomorphized implementations consistent when you make a change that needs to apply to all of them.

Monomorphization tends to happen "on the fly", and even if it didn't, it would still be cheaper than duplicating all of the work of parsing and type-checking (which is what happens when the user manually monomorphizes).

> At the moment, Go has no generics, which translates to "slow programmers", but not "slow compilers".

Also note that one of these assertions is measurable, and the other an unsubstantiated opinion.

That can't be right though. Java and Kotlin have generics, fast programmers, and fast compile times, and since generics are erased, there's no more binary bloat than without them.

I think the assumption implicit in that statement is "generics over value types compiled AOT". But that's not the only way to do it.

But slower than optimal execution times! Since everything is boxed.

For generics over primitives, yes, but the JVM only has a small number of those so they can be easily hand specialised:


The rest of the time you're dealing with references anyway, so there's no boxing.

Well, sure, in the context of a language that doesn't really have value types and is already doing everything w/ dynamic dispatch, you can have "zero cost" generics, in the sense that there's no code bloat and no perf penalty, vs what you already had.

But that's because your language is already leaving some performance on the table!

If you want a language that's as fast as possible, you want something like

    GenericContainer<Foo> foos = ...;
    for(var f in foos) f.DoSomethingFooish()
to compile down to a loop over a contiguous chunk of "Foo"s in memory, with no dynamic dispatch (and potentially inlining!) in each of the "DoSomethingFooish" calls.

I don't think you can have a generics system capable of achieving that level of performance without also bringing with it the downside of more code generation and extended compile time.
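A minimal Rust sketch of that pattern (with a hypothetical `Foo`): the slice of values is stored contiguously, and the method call is statically dispatched, so the optimizer is free to inline it.

```rust
// Hypothetical value type standing in for the "Foo" above.
#[derive(Clone, Copy)]
struct Foo(u64);

impl Foo {
    // Small enough that the optimizer will typically inline it at each
    // call site -- no vtable lookup involved.
    fn do_something_fooish(self) -> u64 {
        self.0 * 2
    }
}

// `Vec<Foo>`/`&[Foo]` stores the Foos contiguously; iterating is a tight
// loop over that buffer with direct (statically dispatched) calls.
fn process(foos: &[Foo]) -> u64 {
    foos.iter().map(|f| f.do_something_fooish()).sum()
}

fn main() {
    let foos = vec![Foo(1), Foo(2), Foo(3)];
    println!("{}", process(&foos));
}
```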

(P.S. Also, Java is getting support for user-defined value types, right? How will those interact w/ the generics system?)

Yes, that's a good point. I'd counter though that in C++ and similar languages value types and memory management get conflated in ways that hurt performance. Java has a really, really fast heap and allocations get laid out contiguously by the GC in ways that have a measurable + significant impact on cache hits and performance.

In C++ you see std::vector with large-ish values all the time, even when it doesn't really have any memory layout justification because that way you get semi-automatic memory management and with pointers you don't. This can easily lead to large amounts of pointless code bloat, hurting icache hit rates, compile times, binary sizes and more, even in cold paths where memory layout is the least of your concerns.

Not sure yet how generics in Java and value types will interact. There have been some prototypes of generics specialisation so it'll probably end up like C++ but, hopefully, with way less use of value types - restricted only to places where they make a real improvement. That'll be a lot easier to measure in Java because as long as you stay within the value-allowed subset of features you will be able to convert between value and non-value types transparently without rewriting use sites. So you can just toggle it on and off to explore the tradeoffs between code generation and pointer indirection.

Ada, D, Delphi, Eiffel, and .NET Native all compile quite fast and have generics support.

Is this with gccgo or gc? Compiled code quality is worse for gc, though that is mostly apparent on CPU bound code that most likely won't be written in Go (but might be written in Rust).


Can you see how long incremental builds take (under a minor change)?

it was very slow, almost a minute! but it has a custom build script that looks like it's not really made for incremental builds? it looks like there's also bazel support, which i would guess is much faster, but i couldn't figure out how to get it to work...

Yeah those build times are insanely good for a 2 million line project. I suspect, however, that the codebase makes judicious use of generics.

A clean dev build for one of my more experimental projects that uses generic heterogeneous lists takes five minutes to compile for fewer than two thousand lines of code and a single test case. In that project, however, I'm pushing Rust's type resolution algorithm to its limits (basically as a giant constraint solver looking for a unique solution).

Our ~1M line C++ codebase takes around ~15 mins for a "full" [1] rebuild, and that's with the support of a custom distributed build system. That's bearable, considering that running the unit tests takes as long again.

12 mins for a full build of 2M lines of code on a single box is quite reasonable, and it is great if they are pushing for better performance.

[1] Because of aggressive caching it is hard to really do a clean build; the closest you can get is touching one of the 'god' headers included everywhere.

Yeah, I'm thinking back to an unnamed game engine where touching any of the core headers incurred a ~1 hour rebuild, ~30 minutes with Incredibuild.

Heck, if you've ever had a complex codebase with LTCG, I've seen cases where linking alone took ~25 minutes.

Does it start with Un and end with real Engine?

There will always be applications for doing a fresh complete build. Like, searching for a bug in different code revisions. You're just not doing that if each compile takes 5 minutes.

As a baseline, a non optimizing compiler for a simple language should be able to do 1 million lines of code per second. Of course, most languages are not simple.

Incremental compilation is an essential operation. I probably do it more than 100 times a day. Just like editor responsiveness, it can almost not be fast enough, and if it takes too long it can bring me out of the flow. I would say that over 0.1 seconds any speed improvement is welcome. More than 3 seconds is definitely a nuisance. More than 15 seconds is extremely frustrating when dealing with certain kinds of code.

> You're just not doing that if each compile takes 5 minutes.

I've bisected large codebases for bugs several times where compiles take hours.

Yeah but I bet you wished it took less time.

I'm curious, are there any 1 million LoC projects out there that compile in under a second?

Sure. TCC (the tiny C compiler) runs at over 3 million LoC/s on my machine (on a single core!) and GCC debug builds aren't that far behind, so for C (or very orthodox C++) projects it's doable. I keep my own C++ codebases under an arbitrary target of 10s for fully optimized release builds, and over half of that is taken by NVCC.

Are there any 1 million LoC codebases compatible with TCC that can be run in under a second with it?

I ask because there's various unexpected things that can make large codebases compile more slowly out in the wild that don't show up in smaller codebases (as well as non-linear scaling of certain components), often making simple extrapolation of how quickly a smaller codebase compiles to a larger one inaccurate.

One million lines is a good if arbitrary point where a lot of those can be sussed out.

I don't know the answer to your question, but with regards to the million LoC/s target it's worth noting that an AMD 3970X can compile the Linux kernel in under 24 seconds [1] (that's an optimized GCC release build), which is around 28 million LoC. Even though I'm sure conditional compilation will cut some of it out, it's close enough to be in the ballpark.

[1] https://www.phoronix.com/scan.php?page=article&item=amd-linu...

I don't have concrete numbers but I keep hearing OCaml's compiler is extremely fast.

Even when bisecting you can still reuse the built dependencies, at least if you're using sccache.

The only relevant info I could dig up was this issue from Kubernetes in 2016: https://github.com/kubernetes/kubernetes/issues/27444

It seems to be discussing the speed of building Kubernetes on their build-servers, targeting multiple architectures at once. Unfortunately all the build servers are internal to Google so I don't know exactly what's being included (or what's being run), but this comment (https://github.com/kubernetes/kubernetes/issues/27444#issuec...) details the breakdown of time spent:

    30m of build
    7m on cluster startup
    12m on tests
    7m on cluster teardown
So, taken with an enormous hunk of rock salt, it seems like Kubernetes might take around 30m on their infrastructure to build (possibly multiple) release artifacts. Seems in line with Rust.

That's compiling for 10 os/arch combos, so ~3m per.

That makes a lot more sense. Go was specifically designed to compile fast.

On the other hand, these are running on some big cluster with a whole bunch of CPUs, so I wonder if you’d still get 3m per build on something like a laptop?

Any individual build they're referencing there runs on a single vm (the cluster referenced there is the k8s cluster being built up and torn down on that vm to run e2e tests on). It looks like at the time they were using GCP standard-8 instances, which are 4 core/8 thread w/30GB of ram. It's hard to say exactly what the current build time is, as it looks like they do some build caching now; on my desktop a fully clean build of current kubernetes master (3.2 million loc) for linux/amd64 takes just under 2 minutes.

What’s the state of parallel builds? We’re getting 16/32 core cpus. In a few years, we’re all going have Threadrippers.

Of course everything can’t be done in parallel.

Almost all parts of the compiler (at least for LLVM) are still strictly single-threaded. You can't even use threads yourself if you write your own LLVM module pass.

Some parts of a compiler are inherently serial, e.g. parsing a single file. But if you have to parse multiple files, then hey presto, parallelism. The linker has to work as a spooler for writing the resulting executable or shared objects, but afaik there's no reason for symbol resolution not to be done in parallel. And if you want to optimize your GOT, that could be sorted by a PageRank-style algorithm, a classic 'embarrassingly parallel' workload.

Given that most sweatshop IT departments provide i7s and i5s with 8GB of RAM and HDDs, good luck getting that kind of hardware adopted there.


If we keep complaining about Rust build times being "slow" they'll keep making it better!

Yeah, I have a C++ project a fraction of that size that takes longer to compile. I'm quite happy with Rust :)

Glad a core contributor has such high standards though.

Is it template heavy? One thing I've noticed since switching to Rust from C++ is that I can write generics-heavy code without seeing massive delays in compile time. Conversely, in C++ I'd have to create source files with explicit instantiations to get incremental builds of my template usage, which is a pain.

i love scala. it would not only be longer but you’d have to compile it 3 times for each major version!

So you're not cross-compiling your code for Scala.js (0.6/1) and Scala Native as well, meaning 6 full compile cycles?

Shame on you.

The Linux kernel takes ~60 seconds on Threadripper hardware according to Phoronix; SLoC is what, ~15-18 million?

A clean build of the Linux kernel can be very fast, but the config is very relevant. A custom build specific to your hardware will touch only a small fraction of the overall kernel tree, whereas a generic build including a large swath of hardware will take much longer.

(AFAICT there's no easy way to measure the LoC actually compiled - but one rough way to estimate it would be to take all the .o files listed in the build process and count the LoC in the corresponding .c files and the .h files they depend on).

An easy way is to grab the debuginfo files, run them through addr2line, and count.

> AFAICT there's no easy way to measure the LoC actually compiled

I take it you can't just run it through the preprocessor then count the lines of code that are output from that?

The kernel is composed of a bunch of independent source files which share headers; thus, running them through a preprocessor would massively overcount the LoC. (In theory, this would get you the “actual compiled LoC” but that’s a useless metric because most header files are just declarations that are easy to parse and “compile”).

Most of those speed compile benchmarks are done with "make tinyconfig" -- where a small fraction of the kernel is being built (a few million SLoC at most). A kernel build using "make allyesconfig" takes significantly longer than a minute.

Eh, I was wrong: on Epyc, a `make defconfig && make` takes _16_ seconds. But I don't really follow what is actually in defconfig, so it might be bare bones.


Compared to Swift, it does look pretty reasonable.

It's interesting seeing these kinds of compilation times being "long" from a machine learning perspective.

The equivalent to compiling code in machine learning is training a model. Even on good hardware you can spend hours training a single model. Some of the really big pre-trained models like BERT can take days to train on a farm of top-of-the-line purpose-built GPUs, which is why people almost never re-train them from scratch without similarly huge amounts of computing power and very specific needs.

The equivalent to compiling in machine learning is compiling :). You just ALSO have to train your model.

As far as I can tell most of the people complaining about Rust having painfully long compile times are just parroting second hand information and don't actually know. Kinda like the pervasive "but isn't Java really slow?" thing that still doesn't seem to have died either.

Edit: yes I know this article was written by someone who knows what they're talking about (and as they're a steward of the project I understand why they're calling this out for improvement, and somewhat exaggerating how bad it is). I was talking about the occasional comments I see on hacker news about how people aren't even bothering to try rust because compile times are "unusably" bad. 5 mins for a dev build of 2 million lines of code is not anywhere near reason to not even bother trying the language. No personal project or learning scratchpad is ever going to come close to 2 million lines. And it's not worse than many other languages that are used for large projects.

I have a few beefs with this article, but to suggest Brian doesn’t know much about Rust is just plain incorrect. He is literally the #2 all time (human) contributor to the project: https://thanks.rust-lang.org/rust/all-time/

We do have experience: my C++ projects compile much faster than their rewrites in Rust.

On common laptops that you buy at the shopping mall, not compiler rigs.

I am not doing anything special, other than making all my third-party artifacts binary dependencies, not overusing templates, and using incremental compilation and incremental linking.

Cargo currently doesn't do binary dependencies, so already here I have to wait for the world to compile.

And while incremental compiling is already supported, lld still isn't supported on stable.

Plus I do have experience using other AOT compiled languages with modules support, since back in the day.

So we know what we are talking about, and plenty of people on the Rust team are quite aware of the problem; however, not everything can be fixed at the same time.

> We do have experience, my C++ projects compile much faster than their rewrite in Rust

Wow. And C++ projects can be quite slow to compile.

They can, but then you have to compile everything from scratch, not use binary dependencies, disable pre-compiled headers, not have an incremental compiler and linker available, and be a heavy metaprogramming user.

> As far as I can tell most of the people complaining about Rust having painfully long compile times are just parroting second hand information and don't actually know.

My experience is the opposite. Most rust devs don't work on very large projects from what I can tell, so compile times tend to stay manageable. It's a small percentage that work on projects large enough to hit this problem, and when you hit it it hurts.

I think that it depends on where you are coming from. For Go, C#, Java devs and for those coming from dynamically typed languages, the compilation times will be unbearably slow. C++, scala, swift developers are probably used to slow compilation times.

C++ can be relatively fast when making use of binary dependencies and having a modular build.

This is the one place where Rust differs from many other languages: dependencies and libraries are typically brought in as source dependencies and built with your own code.

This typically makes the first build of any project considerably slower than in other languages, since you also have to build all your dependencies, but I think it allows for a really simple and predictable way to work, not to mention support for any kind of architecture, across all packages.

It's obviously a trade-off, but I think it's a trade-off which is clearly worth it.

A trade-off that requires buying a compiler rig; it is almost unusable to compile Rust stuff from scratch on my travel netbook.

If I plan to do some Rust coding on the go (no pun intended), better do a full build at home before packing.

A 5m C++ build turns into 30m Rust one, for the same project, ported across languages.

Not to mention that it means on a large team project everyone is compiling the same stuff over and over again, given that cargo does not yet support code cache servers.

Yes, there are workarounds like sccache, but that is extra stuff one needs to install.

> Not to mention that it means on a large team project everyone is compiling the same stuff over and over again,

That's a bit dramatic. You typically only compile your dependencies once during the project's lifetime.

If you have done an initial full build on your netbook, you'll only get incremental builds from there on, just like with C++.

But unlike C++, dependency management isn't hell, and you are almost always guaranteed to have your dependency supported on the target you are building for, OOB, without any fiddling or setup.

It's a trade-off, but I definitely think the Rust-team made the right decision here.

Dependency management isn't an issue when using OS packages, and Conan also does binary dependencies.

Compiling from source also does not solve the problem that a crate author hasn't taken a specific OS into account.

Yes, I am being a bit dramatic, but these kinds of issues do impact adoption, and there are many industries where shipping source is just not an option.

> Dependency management isn't an issue when using OS packages

Come on. What do you find most developer friendly?

1. "cargo build && done"

or 2. Try to uncover what dependencies this project really has, then proceed to map out what the packages (and dev-packages) for those dependencies are called on the Linux distro you are using (Debian, Ubuntu, Fedora, Arch, etc.), not to mention what they are called on your specific version of that distro, round up and install all that stuff... and then try ./configure yet again?

> but this kind of issues do impact adoption

Indeed. In 2020 I wouldn't bother adopting any kind of language which prefers the latter flow to the former one.

"cargo build && done" only works if every crate author has taken my OS into consideration and not used OS specific API or file locations.

And it is more like "cargo build && off to lunch".

I am usually on Windows, and most commercial vendors nicely sell us their already compiled binaries. No need to hunt for anything.

Regarding Linux distros, if it isn't on the official repositories, Conan provides exactly the same experience as cargo, only faster because it supports binary libraries.

>Regarding Linux distros, if it isn't on the official repositories, Conan provides exactly the same experience as cargo, only faster because it supports binary libraries.

Don't use the operating system's packages unless you are packaging your application for that same repo. Instead, use Conan, or you will find yourself in dependency hell trying to get users running it on Fedora, Ubuntu 12.04, 14.04, 16.04, 18.04, 20.04, Debian, etc., where they all use different versions of the dependencies you want.

The official repositories are for the sysadmin and for other packages in the official repositories.

> But unlike C++, dependency management isn't hell

There is nothing hellish about C++ if you are using the right tools, meaning a proper package manager: Nix/Guix, Conan or Spack.

It is currently much more hellish to integrate Rust+Cargo behind another build system (behind pip/gem/npm, for instance) than it is with C++.

I think you wanted to reply to the upper comment. :)

And how many C/C++ projects use Conan by default, versus those that don't?

For comparison, all Rust projects use cargo.

The ergonomic value of that is something you simply cannot overstate.

For many users of AOT compiled languages, the ergonomic factor is being able to enjoy fast builds, without getting a compiler rig.

Turbo Pascal 7, which was already relatively complex, was compiling several thousand lines per minute on a single-core 4 MHz computer limited to 640KB.

More modern examples, with more complex type systems, are languages like Delphi, FreePascal, Ada/SPARK, D, F# via .NET AOT, and OCaml.

So yeah, cargo is nice, but not if I have to build everything from scratch; Rust isn't a scripting language.

>So yeah cargo is nice, but not if I have to build everything from scratch, Rust isn't a scripting language.

FWIW, Rust builds faster than Node applications, as my laptop can only handle so many IOPS.

And Rust compile times are in line with OCaml's (probably faster by now, due to the optimization work over the past year).

I don't necessarily disagree, but this was written by one of the original authors of the Rust compiler, so I think he knows what he's talking about.

Well, if you had read the article you would have known the author is one of the co-founders of Rust.

I don't know how the Rust compiler is built. However, I am implementing an Ownership/Borrowing system for the D programming language, and to make it work requires Data Flow Analysis. DFA is slow. It's normally only used for optimized builds, which is why optimized builds are slow.

But when DFA is a required semantic feature of the language, it has to be done for debug builds, too, and so they're going to be slow.

IIRC Microsoft released a paper a few years back that pioneered some techniques which significantly improved DFA efficiency, and TypeScript utilized those in their control-flow analysis implementation. Maybe I'm imagining all that.

Can you elaborate on why data flow analysis is slow, necessarily? For example, does it need whole-program information, or does it do some sort of fixed point, or is the algorithmic complexity super-linear, or something else?

It tends to be quadratic, based on the number of variables and the cyclomatic complexity. The DFA equations cannot be solved directly, but only iteratively until a solution is reached.

Debug code generation, on the other hand, involves none of that and the compiler just blasts out code expression by expression. The time goes up linearly with the number of expressions.

DFA (and the O/B DFA) is done at the function level. It can be done globally, but I try to keep the compile times reasonable.
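For readers unfamiliar with why this is iterative, here is a hedged, minimal sketch (not D's actual implementation) of a forward dataflow fixpoint: a hypothetical "definitely assigned" analysis over a tiny CFG. The equations IN[b] = intersection of OUT[p] over predecessors, OUT[b] = IN[b] union DEFS[b] can't be solved directly, so we re-run them until nothing changes; that re-running is where the super-linear cost comes from.

```rust
use std::collections::BTreeSet;

type VarSet = BTreeSet<&'static str>;

// Iterative forward dataflow: for each basic block, compute the variables
// guaranteed to be assigned on exit, by repeating the transfer equations
// until a fixpoint is reached.
fn definitely_assigned(preds: &[Vec<usize>], defs: &[VarSet]) -> Vec<VarSet> {
    let everything: VarSet = defs.iter().flatten().copied().collect();
    // Block 0 is the entry; start the other blocks optimistically at "all
    // variables" so the intersections can only shrink toward the answer.
    let mut out: Vec<VarSet> = vec![everything; preds.len()];
    out[0] = defs[0].clone();
    let mut changed = true;
    while changed {
        changed = false;
        for b in 1..preds.len() {
            // IN[b] = intersection of OUT[p] over predecessors p.
            let in_b = preds[b]
                .iter()
                .map(|&p| out[p].clone())
                .reduce(|a, c| a.intersection(&c).copied().collect())
                .unwrap_or_default();
            // OUT[b] = IN[b] union DEFS[b].
            let new_out: VarSet = in_b.union(&defs[b]).copied().collect();
            if new_out != out[b] {
                out[b] = new_out;
                changed = true;
            }
        }
    }
    out
}

fn main() {
    // Diamond CFG: 0 -> {1, 2} -> 3. Only `x` is assigned on every path to 3.
    let preds = vec![vec![], vec![0], vec![0], vec![1, 2]];
    let defs: Vec<VarSet> =
        vec![["x"].into(), ["y"].into(), ["z"].into(), VarSet::new()];
    let out = definitely_assigned(&preds, &defs);
    println!("{:?}", out[3]); // {"x"}
}
```

Each pass over the blocks is linear, but the number of passes grows with the number of variables and the loop structure of the CFG, which is where the quadratic behavior mentioned above comes from.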

Would this help? <https://dl.acm.org/doi/10.1145/3302516.3307352> GPU-accelerated fixpoint algorithms for faster compiler analyses.

Brute force but maybe that is what is needed.

Can you cache the solution from the last compile and check that it still works in linear time without introducing nondeterminism?

Many implementors try to cache the results of DFA and patch it incrementally. Anecdotal evidence is that they spend endless hours dealing with weird optimization bugs because they patched it wrong.

I decided to redo the DFA from scratch anytime the AST changed, and have had pretty reliable optimization as a result.

Is an expensive DFA something that could be mitigated with source code hints? An 80% solution might do the trick.

People using DFA tend to want a 100% solution. Rust, for example, sets great store by a 100% solution.

I feel like the breakdown of problems should include real-world data to back it up. Hopefully future articles will do so. In particular, people like to put the blame on the borrow checker, when I thought it wasn't all that bad. From my understanding, the worst offenders are LLVM (because rustc uses it in a way that it hasn't been optimized for) and linking (where there are experiments with lld, with which people report huge gains).

Besides the tone, my main gripe with the article and some discussions I've seen elsewhere is mixing implementation trade offs with design trade offs. For example, LLVM and not doing your own optimization passes can be important for time-to-market. The only reasonable alternative that I can think of without sacrificing time-to-market is between LLVM or a C backend. Delaying Rust would have made it irrelevant.

Now for some context for those not as familiar with Rust:

> Stack unwinding — stack unwinding after unrecoverable exceptions traverses the callstack backwards and runs cleanup code. It requires lots of compile-time book-keeping and code generation.

This is for asserts (panics) and can be toggled with a flag. It isn't inherent to the language though some older code uses it extensively (like rustc) because it predates the current language design (from what I've read).
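For reference, the usual way to toggle this is the `panic` setting in a crate's Cargo.toml profile (a minimal config sketch; the same choice can also be made via rustc's `-C panic` codegen option):

```toml
# Trade unwinding for abort-on-panic, avoiding the landing-pad
# bookkeeping and cleanup codegen described above.
[profile.release]
panic = "abort"
```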

> Tests next to code — Rust encourages tests to reside in the same codebase as the code they are testing. With Rust's compilation model, this requires compiling and linking that code twice, which is expensive, particularly for large crates.

This is in a list of negative impacts of features, but people outside of Rust reading this might not catch why this is done. Tests inline with your code have access to your private interfaces. External tests are for integration testing and build against your library like anyone else would.
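A minimal sketch of that inline-test convention (with a hypothetical `double` function): the `tests` module is compiled only under `cargo test`, and because it lives in the same file, it can call private items.

```rust
// Private function: external integration tests could not call this,
// but the inline test module below can.
fn double(x: i32) -> i32 {
    x * 2
}

#[cfg(test)]
mod tests {
    use super::*; // brings the private `double` into scope

    #[test]
    fn doubles() {
        assert_eq!(double(21), 42);
    }
}

fn main() {
    println!("{}", double(21));
}
```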

The worst offenders vary a lot by codebase. I had a crate where the compile times were bad due to trait bounds. I was able to refactor and drop times from ~2 minutes to ~6 seconds! That refactor shouldn't have been necessary, though.

Sometimes it's LLVM, sometimes it's Rust macros encouraging HUGE functions, which then leads to slow LLVM optimization passes. Sometimes it's heavy use of generics. Sometimes it is the linker. There are various ways to hit slow compile times.

How did you figure out what was responsible for the slowness?

If it is even remotely like C++, most of the time will be spent on template instantiation (for template-heavy projects) and linking. Linking is even slower for template-heavy code because it tends to generate a lot of extremely long symbols, to the point that release builds are often faster to build because they eliminate a lot of functions.

Rust does not have C++-style templates.

Rust generics are sanely statically typed instead of duck typed, so parsing should be cheaper, but they can still cause duplicate codegen that the optimizer then needs to sift through and deduplicate, optimize, etc.
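A small sketch of that definition-site checking (hypothetical `show` function): the body of a Rust generic may only use what its trait bounds promise, so an error surfaces once, at the definition, rather than at every instantiation as with a duck-typed C++ template.

```rust
use std::fmt::Display;

// The bound `T: Display` is checked where `show` is *defined*: calling a
// method not promised by `Display` here would be a compile error even
// before anyone instantiates `show`.
fn show<T: Display>(value: T) -> String {
    format!("value = {value}")
}

fn main() {
    println!("{}", show(7));
    println!("{}", show("hi"));
}
```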

Sorry, I didn’t mean to imply that Rust’s generics don’t slow down compilation at all. They surely do, as does any form of code generation.

My bad for not being more clear — I just wanted to point out that Rust doesn’t have anything like the full, accidentally Turing complete template metalanguage that C++ has.

> My bad for not being more clear — I just wanted to point out that Rust doesn’t have anything like the full, accidentally Turing complete template metalanguage that C++ has.

That's irrelevant. Being Turing complete does not mean being slow, and having templates with duck typing does not imply being slow. That's completely irrelevant and pretty fanboyish.

The main reason C++ templates are slow to compile is that they have to be header-based, meaning they get parsed again and again in every translation unit.

Which is by itself pretty insane, and when you realize that, you realize that C++ compilers are in fact pretty fast compared to the job they do.

Ideally, C++ modules might solve that in the long term.

> Having template with duck typing does not imply being slow

Right, the original Concepts designs in C++0x had type checking of template definitions. It had a huge negative impact on compilation time and was one reason why it was dropped.

Mind you, had it been accepted, it is possible it might have been optimizable, and certainly retrofitting it didn't help. But yes, you are right: not checking is not going to slow things down.

They are not parsed again and again when using pre-compiled headers.

But then touching one of those headers basically triggers a full rebuild.

Which, given the fact that a modular build makes proper use of binary dependencies across modules, has very little impact on what is being built, with the incremental compiler and linker giving a helping hand.

Really, package each module in its own lib/dll; there's no need to compile everything from scratch other than self-flagellation.

A full build only becomes necessary when a new version of a low level module gets released into the library staging area.

> I just wanted to point out that Rust doesn’t have anything like the full, accidentally Turing complete template metalanguage that C++ has.

Rust's generics are pretty complete/full/equivalent to C++ templates. The type system is already Turing complete, and even if we assume that generics alone aren't Turing complete on stable (I suspect this is a bad assumption!), they soon will be thanks to features like const_if_match, const generics, etc. Even ignoring those features, check out the typenum crate for some of the nonsense that can be done with stable Rust already.
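For a tiny taste of that type-level computation on stable Rust, here's a simplified sketch in the spirit of typenum (my own toy version, not typenum's actual API): Peano numbers encoded as types, with addition evaluated entirely by the trait system at compile time.

```rust
use std::marker::PhantomData;

// Type-level natural numbers: Zero and Succ<N>.
struct Zero;
struct Succ<N>(PhantomData<N>);

// Project a type-level number back down to a runtime usize.
trait ToUsize {
    const VALUE: usize;
}
impl ToUsize for Zero {
    const VALUE: usize = 0;
}
impl<N: ToUsize> ToUsize for Succ<N> {
    const VALUE: usize = N::VALUE + 1;
}

// Type-level addition, computed by trait resolution.
trait Add<Rhs> {
    type Output;
}
impl<Rhs> Add<Rhs> for Zero {
    type Output = Rhs;
}
impl<N, Rhs> Add<Rhs> for Succ<N>
where
    N: Add<Rhs>,
{
    type Output = Succ<<N as Add<Rhs>>::Output>;
}

type One = Succ<Zero>;
type Two = Succ<One>;
// 1 + 2, computed inside the type system during compilation.
type Three = <One as Add<Two>>::Output;

fn main() {
    assert_eq!(<Three as ToUsize>::VALUE, 3);
}
```

This is exactly the kind of trait-resolution work that the compiler has to grind through, which is part of why heavy type-level code slows builds down.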

I did get your fundamental point, but the counterpoint I'm trying to make here is that the differences are perhaps fewer than one might hope, and every single concern gpderetta raised about C++ templates can, at least theoretically, apply to Rust generics as well. While Rust generics might not be abused as much as C++ templates in practice (so far, anyway), that's mostly because Rust macros are way more powerful than C++'s, very much Turing complete, and are abused for complicated stuff instead.

At that point, all that matters is whether templates are monomorphized or not. Other things don't affect much. Your previous statement just comes across as fanboyism.

It does type checking of definitions. Apart from that, it does monomorphization like C++, and its generic type system is also Turing complete (and it is only going to get more complex). I don't know what the state of generic metaprogramming is in Rust; I guess that having a powerful macro system helps keep it under control.

Compilation time is mostly an issue during the edit, compile, debug cycle and inside IDEs where things are compiled on almost every key stroke.

I was reading about rust-analyzer recently, which is a new language server for Rust that is explicitly designed to address compilation latency in IDEs. This also happened with Java back in the day, when IBM developed their own incremental Java compiler for use inside Eclipse (technically this predates the IDE; I was using an early version in 1998). It gives that IDE an edge over things like IntelliJ in terms of compile latency, which is orders of magnitude slower in IntelliJ (measured in seconds instead of ms). IntelliJ does a lot of work to hide the issue through elaborate caching, lots of things happening asynchronously, etc. They even attempted to integrate the Eclipse compiler. But it's very noticeable if you are used to fast feedback on your code's correctness.

Another important aspect they are trying to address in rust-analyzer, which the Eclipse Java compiler also addresses, is handling partially correct code. If you are editing, the end state is of course correctly compiling code; but the time in between edits, when it doesn't compile, is exactly when you need your IDE to be helpful. Eclipse used to be really good at this and updated in real time on basically every keypress. The red squiggly lines disappearing basically means "now your code compiles fine". IntelliJ works a lot slower and worse, and often ends up actively lying, with both false positives and negatives being very common (the dreaded reset-caches feature is a top-level menu item for this reason).

So, it's an important problem. Rust is optimized for run time safety and performance. The same infrastructure that enables that should in principle also be able to enable a great developer experience when it comes to IDE friendly features.

[edit] This article is what I'm referring to above: https://www.infoq.com/news/2020/01/rust-analyser-ide-support... (interview with one of the Rust Analyzer devs)

The great thing about Rust is that when you switch to Go, the development experience feels incredibly fast and great.

Autocompletion in VSCode works, instantaneously, all the time. Errors are shown in real time. Compilation is fast. Error messages are human readable. Updating Go or dependencies doesn't break existing code.

If the code is modularized and put in libraries, how often do you need to build all 2 million lines of code after making a change. What am I missing?

He does say:

>Per-compilation-unit code-generation — rustc generates machine code each time it compiles a crate, but it doesn't need to — with most Rust projects being statically linked, the machine code isn't needed until the final link step. There may be efficiencies to be achieved by completely separating analysis and code generation.

It was my first thought as well, but I have no knowledge of how Rust linking works, so I don't know how reasonable caching module compilation units would be.

Maybe a slow linker is the problem

The system linker is pretty slow; many folks report massive speedups by using LLD.

Replacing BFD with Gold (ELF-only) or LLD can be a huge win.

On the last major production codebase I did this for, linking a single artifact went from ~60 seconds (BFD) to ~10 (Gold). Multiply by nearly 100 artifacts (several dozen libraries, several dozen tools, per-library unit tests, benchmarks, applications, etc. all statically linking some common code), and some basic incremental "I touched a single .cpp file" builds went from 30+ minutes to maybe 5 minutes for a single platform x config combination.

This depends on where you are in the stack. The perspective of someone working on a low-level, widely used library is different from someone working on a top-level application.

> TiKV is a relatively large Rust codebase, with 2 million lines of Rust.

I don't understand why so many people absolutely want to inflate this number, as it doesn't say anything about the product.

This inflation also makes part of the analysis in this article just wrong.

Running scc in TIKV's "src" folder:

    Lines  Blanks  Comments   Code
    78442    7259      5476  65707

The article says the 2 million includes vendored code. I assume that means all the dependencies and their dependencies. (So yes, the project itself is just a small fraction of the total lines, but I'm not sure if the measurement here was of compiling the dependencies too, or not.)

Walter Bright has said on several occasions that compilation speed was one of his design goals for D.

Recently on the "On the Metal" podcast (which is fantastic) Johnathon Blow had a mini rant on compilation speeds and mentioned that the Jai compiler is incredibly fast.

Jai apparently — I think it's mentioned in that podcast — has a dedicated backend for debug builds that's optimized for fast code generation and doesn't use LLVM. Sounds like what Cranelift is trying to do for Rust.

This seems to be the move for languages moving forward.

That is already the approach in Lisp, Scheme, Java, .NET, mainframe language environments, OCaml, Haskell, Eiffel, Dart, D and plenty others.

By having a mix of interpreters, repls, JIT and AOT compilers, one can mix and match, using the fastest ones for development and the slow ones for the final release build.

If I'm reaching for Rust, it's more than likely because I want its borrow checker and pattern matching on enumerated types.

To my knowledge D has neither of those so it's not really in the running here.

A borrow checker is currently in development for D, and pattern matching has been pseudo-implemented with metaprogramming.


It was a personal statement; that's not gatekeeping. A more detached way to make the same point: D didn't succeed (in the sense of market share) because it didn't reach far enough beyond what could already be done with C and C++. It brought real features to the table, but not "killer" ones. Rust did, which is why it's everyone's favorite new toy and why it's been largely (but not entirely, c.f. the linked article) forgiven for flaws like slow build times.

One of the main reasons I got into OCaml (coming from C++) was that it had bytecode compilation (and obviously a bytecode interpreter) next to the native one. It is really fast (5-7 times faster, last time I measured) and perfect during development, when you don't really care about performance.

I wanted to use OCaml several times, but the lack of parallelism and the GIL put me off.

Still waiting for Multicore OCaml.

It supports parallelism the same way UNIX clones did for their first two decades, which is quite good for plenty of workloads, and then there is Lwt for concurrency.

And it is safer. :)

In any case, the multicore runtime is getting closer: https://discuss.ocaml.org/t/multicore-ocaml-january-2020-upd...

If we have to mention inter-process communication where every process is single-threaded then IMO the battle has been lost.

Best IPC is no IPC, wouldn't you agree? In-process function calls can't be beaten.

Give me a mix of OCaml and Rust with the runtime of Erlang/Elixir and I ain't learning another programming language until the day I die!

Thank you for the link, I found it really interesting.

In the times of Spectre, Meltdown and in-process exploits due to threading issues, everyone doing microservices, I don't see the inter-process communication route as that bad.

Sure, pretty good point actually.

I'm just saying that in these same times many problems fall into the category of embarrassingly parallel and there's no reason to wait 4s for a result that can easily take 0.5s.

But you're also correct.

yes. if you need parallelism, you need to go via the process model. (we did this, but it's quite painful)

Anyway, the official road map states something along the lines of "Multicore: probably next release" (but that was also said for previous releases)

Sadly all you said has been confirmed by other people as well.

I really like OCaml. It's mind-bogglingly fast and well-made in basically almost every regard I can think of. The lack of proper parallelism nowadays however is a huge NOPE.

I know Jane Street and Inria have a lot of valid usages for it and don't care what the rest of us think, but it's very sad to have one of the highest quality languages and compilers left to fringe usage by a handful of organisations. :(

Could the slow compile times also be due to people using laptops with slow CPUs and drives compared to high-end desktop CPUs and NVMes? We're using desktop PCs at my work and it made a huge difference in our dev flow (C#, VS) compared to the Thinkpads we used before.

What I'm more interested in is their link times; how often do you build everything from scratch in a day?

You have a monorepo with 2 million lines of code? How long is the compilation supposed to take?

I don't know how long it is "supposed" to take, but you can compile a Linux kernel (27.8M lines of code, though a good chunk of it probably isn't going to be compiled anyway because you don't need it, and another good chunk is architecture specific) in under 10 minutes on relatively modest (but modern) hardware.

On the other hand, something like Chromium (25M lines of code) will take about 8 hours, and bring your machine to its knees as it consumes ALL available resources (granted, last I did this I only had 8GB of RAM, and I was running my desktop at the time... including Chromium). I don't remember exactly how long Firefox takes to build, but I remember it was significantly less time (maybe 3 hours?).

So... it depends? On a lot of things?

(btw, LoC numbers were pulled from the first legitimate-looking result I could find in a quick search... take with a grain of salt... also, compilation times are a rough approximation based on my observations... take that with a truckload of salt)

Linking can consume a huge amount of memory, especially for C++ code. For a large project, 8GB might be very low. On our codebase we saw huge differences between 16GB machines and 24GB machines, as the 16GB ones could not run some linking steps in parallel without swapping.

Does the Chromium build use LTO (I'm pretty sure FF does)? That's also a huge resource sink and doesn't parallelize as well (a lot of the optimization is delayed to link time).

Last time I checked (admittedly a few years ago), a full build of Firefox on my beefy laptop was ~2h while working on something else and Chromium was 10h+ bringing the system to its knees.

The 2 million figure includes vendored code. A quick count suggests the repo itself has less than 200K, https://codetabs.com/count-loc/count-loc-online.html

> When I worked daily on the Rust compiler, it was common for me to have at least three copies of the repository on the computer, hacking on one while all the others were building and testing. I would start building workspace 1, switch terminals, remember what's going on over here in workspace 2, hack on that for a while, start building in workspace 2, switch terminals, etc. Little flow, constant context switching.

And you didn't see the problem!?

When Prof. Wirth was making the Oberon compiler he had a heuristic that any language feature which made the compiler slower at compiling itself was reworked or discarded.

Of all the things mentioned in the article that keep Rust compiles from being fast, the one that sticks out to me is the multi-threaded compilation. That could result in some dramatic reductions in compile time.

With a lot of the other things that are a part of the design of the Rust language itself, I'm happy with how they are, even if that slows down compilation somewhat.

> Servo is a web browser built in Rust, and Rust was created with the explicit purpose of building Servo.

I didn't know this. Is Servo the best example of development/coding standard/features/implementation for Rust?

I'm glad the comments mostly get it: only incremental build granularity and parallelism matter. Full rebuilds I wholly don't care about, and neither should anyone else: the goal is to never need to do them.

Is it slow compared to C++?

No. My experience is that Rust compile times are pretty reasonable compared to typical C++ projects (and it's often an apples-to-oranges comparison: an initial Cargo build does a lot more, building all your dependencies in the configuration that you want, rather than using prebuilt binary dependencies as is common in the C/C++ world). To be honest, I don't really understand the self-flagellation in the Rust community about compile times.

> I don't really understand the self-flagellation in the Rust community about compile times.

I believe the two main factors are:

- many Rust users come from languages with a faster writing->running cycle (there is a surprising number of Python users starting to use Rust)

- Rust takes pride in its speed; if something is slow, it is thus seen as a failure that should be fixed (even if it has perfectly reasonable reasons to be slow)


C++ is horribly slow if you overuse templates and don't take advantage of binary dependencies.

If every module compiles to its own library (not object file), has proper translation units, and a fast linker, compiling the stuff you're currently working on is relatively fast.


Lots of companies have standardized on laptops as the development environment since the mid-2000s; their IT department is not going to change back to desktops, especially when the laptops are working just fine for their existing tools.

If adopting Rust means buying new hardware, then they will just keep using their existing options regarding programming languages.

You might not like it, but lots of developers use laptops as their primary computer. Performance is relative anyway.


Here's a disagreement: you're right that it would be nice if benchmarks included desktops, but I continue to want to do "serious work" on my (expensive, underpowered relative to a desktop) laptop.


Answering your question in good faith, though it seems obvious to me: people prefer laptops because they are portable. You can work from home, or the train, or another city, or a meeting room.

By the way, it is possible to connect a laptop to whatever monitors and keyboard you want

> why on Earth people would prefer laptops over large dual monitors, full sized keyboards, ... and so much better performance and cost effectiveness.

I think I'd struggle to bring that around meeting rooms all day. I have a monitor and keyboard to plug in on my desk, of course...

The reason is obviously portability. People like the ability to take their entire development workstation wherever they please, or perhaps to even sit on a sofa or bed.

I’m not sure why you’d need that explained.

For me it's because it's what the business mandates. I have a docking station so I still have two big monitors (three if I want to keep the laptop open) and a mechanical keyboard, but the business wants me to use a laptop because it can be locked away after work, it can be brought home for home working or on call, and it's part of the disaster recovery plan should the office become unusable.

Because a laptop is possible to carry with you when you go places, and a desktop is not.

Laptops can be carried to meetings, so you always have your whole setup with you, and they make it easier to do work someplace other than your desk.

> I'm not trying to excuse slow builds, but I'm frustrated that people basically always do benchmarks on laptops. Like, come on, if you're doing serious work, use a serious computer;

This reads as a dismissal. That you can only be a serious developer if you have a permanent setup where you sit down, and don't move from it.

Many devs are not just sitting at a desk or cubical writing code.

They may need to interact with clients, move between locations, even across cities. They may even be in the horrifying hot desk situation.

In all of those, working with a laptop makes sense.

They should be provided with SSH access to a proper build machine then.

> if you're doing serious work, use a serious computer

A dual core laptop with 2GB of RAM is a very serious computer, it's a supercomputer compared to what I had 20 years ago for doing essentially the same tasks.

Slow software is not the machine's fault.

If you read the article, one of the issues holding back performance is the poor parallelism of the compiler.

How does a single laptop core compare to a single core of the fastest workstation money can buy? It's the same silicon with a touch higher power budget. Unless you time it, the difference is imperceptible.

It is parallel at the crate level, so dependencies are compiled concurrently with each other.

A large Rust project should be split into multiple crates. It makes the code much cleaner and more flexible, and it compiles faster.

A stronger machine absolutely makes a difference. Single thread performance of a desktop compared to a laptop also does.

Which is something that doesn't help adoption if buying new hardware is a requirement to have a nice developer experience in Rust, while other AOT systems languages work perfectly well on the same hardware.

This is true, but how often are you recompiling 4+ crates simultaneously?

What desktop are you comparing to what laptop? The top end i9 has eight cores, and a max turbo just 100mhz shy of the top end desktop part. The turbo is less aggressive on the laptop, but there's just not that much difference between them for low thread counts.


Upgrading from a true quad core laptop to a true 12 core zen2 desktop sped up my rust builds by a factor of three. Just some anecdata.

That's good to know.

I wasn't really intending to get in to an argument about whether rustc threads well, because although I've used Rust a fair bit I haven't built any large projects in it. If people have and it uses all their cores, I totally believe them.

What I was trying to do is point out that the correct response to "Rust doesn't thread well" is not "but have you run it on a workstation?". Laptop chips for the first time are extremely competitive for single threaded workloads, because the individual cores in large chips now have equivalent power budgets, you just get a bunch more of them.

"Actually, it does thread well" is totally fine as a response, and I'm not going to claim otherwise.

Way lower L2 cache however. That is quite important.

Compiling multiple crates at the same time is very common in my project.

The workstation chips have tons more L2, but the change was made at the same time that they redesigned the L3 to be distributed over a mesh network, significantly reducing its own performance. The boosted L2 compensated to keep the total performance of the cache hierarchy roughly constant while unlocking the future scalability of the mesh network.

There will definitely be workloads that benefit from the new cache hierarchy, but there are also those that suffer.

If you've measured it on a high end laptop for comparison then, of course, the proof is in the pudding and I'm not going to argue (or if you see >~10 threads in use that's proof enough).

I did read the article, thank you very much, which is why I said that performance measurements without specifying the hardware are useless.

I think the 5 GHz i9 with fast desktop memory (hardly a big investment for a company paying programmer salaries) would absolutely demolish the average laptop CPU with slower memory and bus speed. Yes, even in single thread; probably factor of 2+.

Moreover, even though parallelism is limited in this case, it surely will use more than one thread, and in other cases it's less limited and you still have this real issue to confront: why pay more for a slower system? Seriously, what are the overriding considerations?

Why are you comparing a top of the line workstation with an average laptop? They're different markets.

The fastest laptop chips are extremely fast at executing lightly threaded programs. I'd own a workstation if it sped up my tools. But it won't.

> Why are you comparing a top of the line workstation with an average laptop?

Because it's something you can absolutely do to improve your computing life with a simple, rational calculation, which I will outline now for expensive (relative to US hardware prices) Germany.

PC dev box parts, prices sourced from geizhals.de:

* Ryzen 9 3900x, 490 euros: https://geizhals.de/amd-ryzen-9-3900x-100-100000023box-a2064...

* Top end X570 mobo, 200 euros: https://geizhals.de/gigabyte-x570-aorus-elite-a2078208.html

* 32 GB DDR4-3200, 144 euros: https://geizhals.de/g-skill-ripjaws-v-schwarz-dimm-kit-16gb-...

It's missing other stuff, but those can be quite cheap and don't need upgrading as often, and you maybe don't need the fanciest motherboard. (Going with Intel at this point is a complicated topic due to security issues and mitigation performance hits, to be balanced against having the most obscenely fast performance possible for a bit more money.)

Anyway, let's call it ~2k euros for a truly badass 12c24t 32GB memory dev box.

Compare this to, say, the common and much-loved Macbook Pro: that's 1.5k to 3.2k euros (just checked). The former for a super weak computer, the latter is 2.3 GHz 8c(16t?), 16 GB 2666 MHz memory. That's really weak computing power for the amount you're spending (yes, I understand you're not buying pure performance, but that's what's being compared here), and you don't even really "want to do better" with higher clocks etc because, speaking in purely physical terms, you need more volume and cooling to get the higher throughput. Laptops also start to throttle way sooner than computers with big coolers (that needn't cost much- I'm using a 15 euro cooler on my i7 8700k at 4.7 GHz which is passive without significant load).

So that's it really; for about the price of a weak Macbook Pro, you could have an ultra powerful dev or build box. You can get some really amazing screens for little money these days, and re-use it between upgrades, along with several other components; how much of your laptop do you usually re-use when upgrading?

I've now spent a truly excessive amount of time justifying my point of view, in the hope that I don't sound completely delusional. 1.5-2k euros once every few years for your developers to have peak dev performance is IMO an easy sell; if your laptop is a thin client you can have it additionally and it won't need upgrading for ages, because it doesn't need to do any heavy lifting.

> They're different markets.

False dichotomy, they can be (and usually are) additive, because you can rdesktop and use remote CI. What I'm questioning is the relevance of laptops as primary build computers.

I have a ~200K line Rust project split into about 50 crates with about 700 dependent crates. It is not much faster to build on an 18-core AWS machine (c5d.9xl) than on my quad-core laptop.

Most likely due to I/O limitations.

I kinda doubt it. That machine has 72GB of RAM and everything can fit in cache. Also, building a very much larger C++ project (e.g. Firefox) can saturate all the cores no problem.

That's one of the biggest problems with Rust. In many cases, the dependency graph is such that not a lot of crates can be compiled at the same time. Compound that with the fact that building some crates can take minutes, and overall build times can be terrible.

Just look at the CPU usage on a >= 16 cores machine while building the rust compiler itself. The only time all cores are used, essentially, is while building LLVM.

Laptops are also bad for benchmarks because there's a lot of state that has to be communicated to reproduce the conditions: plugged in or unplugged, power saving settings, and as you mentioned the risk of overheating. But none of that seems relevant here because the ratio of build times should be similar on any modern hardware.

> This "mobility über alles" weighting truly baffles me in the face of how much better and cost efficient PCs are.

I absolutely disagree. The mental overhead and time it takes to synchronize my work over multiple machines stands in no comparison to the few thousand dollars I'd save every few years by having desktop computers everywhere. If compilation is slow, use a distributed compiler infrastructure (if your language supports this, mostly writing C++ here).

If you are doing serious work, do it in the cloud. If you have an actually big codebase to compile, you need to scale that horizontally anyway and an expensive workstation under your desk will take hours anyway.

> If you are doing serious work, do it in the cloud.

In some sense I think we agree, except the cloud is just someone else's powerful computer. Laptop as thin client via rdesktop is how I developed (also C++) for years, but only when I was at home away from my work computer. With the cloud you're always away from your work computer.

> If you have an actually big codebase to compile, you need to scale that horizontally anyway and an expensive workstation under your desk will take hours anyway.

Agreed, and that's further along the same vector: if you have performance issues with compilation, consider also (not only, before someone jumps on me) using bigger machine(s) for the job.

Anyway, I'm going to stop here. Feels like I found the most hated tech opinion ever... at least eventually there were some thought-out replies (and thank you for yours).

Thanks, it seems we agree and I completely misunderstood the intent of your original post (still didn't downvote because it seemed like a helpful contribution to the discussion).

If you applied a liberal amount of benefit of the doubt you might assume that they are compiling on the fastest available hardware because they are already aware that developer time is more expensive than hardware.

Speaking of mobility, a 9980HK in a laptop is basically the same as a 9900KS for a lightly-threaded workload like (apparently) rustc. You might be able to reduce compile time by increasing fan speed, though.

I'm on your side pal! Laptops are a total waste of money, Ryzen has changed the game.


It is reasonable to spend a very long time on a particularly optimized build (via LTO, for example). But it is also important to make dev builds as fast as possible.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact