Rust is actually portable (ahgamut.github.io)
355 points by ahgamut on July 28, 2022 | 103 comments



> I’d like a bit more flexibility in specifying what I want cargo to do.

Check out Bazel for Rust.

It allows:

* caching of artifacts.

* shareable caches between {developers, build jobs} based on hashes.

* remote distributed builds (on very many cores).

https://github.com/google/cargo-raze


I really, really want to like Bazel, but I'm struggling with the amount of bookkeeping I need to do in Bazel build files. And then someone comes along saying something along the lines of "oh, but we just use <somecli> to do the updating of the Bazel files for us"… sometimes that is even internal tooling.

Something else is that most projects tend to build everything from source, even protobuf dependencies, so it takes me an hour to get the initial build of envoy done.


I don't know what it is these days, but I can remember waiting 12 hours to compile Chromium from scratch, while the entire X Windows + KDE and friends took only 4-5 hours to compile from scratch. This was back in 2014 on a little two-core laptop.


Back in Gentoo days, I remember that the largest behemoth when it came to compile times was OpenOffice.


In my Gentoo days, chromium would take longer to compile than the rest of @world combined.


On a ThinkPad X1 laptop with 8 hyperthreads it takes 6-7 hours to compile Chromium from sources these days.


Parallelising the build helps, but RAM is also a big factor here: if you come up short (which is more likely to happen when parallelising), swapping will make things grind.

If you're lucky enough to have enough RAM to spare (which would be, like, a lot!) you can also build in a tmpfs ramdisk!
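For example, something along these lines (a sketch; the size and mount point are illustrative, and the whole build tree has to fit in RAM):

    # create a RAM-backed filesystem and build inside it
    sudo mount -t tmpfs -o size=32G tmpfs /mnt/ramdisk
    # then point the build's work/output directory at /mnt/ramdisk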


The laptop has 32 GB and the amount of RAM is not a limit here. With sufficiently fast SSD the storage is not a limiting factor either. Even RAM speed is not a big factor, it is still single thread performance that affects compilation speed for C++ most. Well, at least for the type of C++ that Chromium uses.


Chromium does not use Bazel, but GN. I would assume they still build most of their dependencies from source, as this is the way Google tends to set up build systems.


One of these being libv8, which can take quite some time itself.

It can also be very memory-consuming, so as soon as you hit swap things slow down a lot.


Could it be that the majority of those 12 hours were taken by running tests?


Nope. This was with Gentoo, no tests were run.


The joys of C++.


You're saying that if rewritten in Rust, Chromium would compile much faster?


No, Rust is another slow language to compile. But there are faster ones, such as C and Zig. You can reduce C++ compilation times by following certain strict coding conventions, but it only gets you so far.


I actually use C++ to write business backends. They're of decent size but nothing monstrous. Compiles are almost always nearly instant for me, and a full rebuild is still under a minute. Sure, I use multithreaded builds and try not to expose unneeded stuff in headers.


You likely use "C with classes" C++, so C++03 for the most part, not the generics heavy C++ where you put everything into the header because it needs to be generic. This has massive advantages for compile time. Also "decent size" is a term with many definitions. E.g. Chromium has 12 million lines of C++.


>"You likely use "C with classes"

I do write templates, but not very often. Also, when I do, I try to consume them in a single file so that they can be defined there as well, without propagating into a header. Not a good idea for writing libraries, but it works for applications.

>"Chromium has 12 million lines of C++."

I am far short of that of course.


In such large projects linking is also a severe bottleneck, which I hear people have repeatedly tried to address with alternative linkers.


> the amount of bookkeeping

Any example?


There's also mozilla's sccache, which integrates with cargo (by wrapping rustc) to cache artifacts. A local cache is 2 lines of config in your .cargo/config.toml, and if you want to you can have shared caches in Redis or S3/Azure/GCP.

Not nearly as flexible or powerful as Bazel, but also vastly simpler to set up if all you want is caching.

https://github.com/mozilla/sccache
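For reference, those two lines of config look roughly like this (a sketch; assumes sccache is installed and on PATH):

    # ~/.cargo/config.toml
    [build]
    rustc-wrapper = "sccache"

A shared backend is then configured on the sccache side, e.g. via SCCACHE_BUCKET for S3.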


Is there any effort towards a public, trusted shared cache?


For single projects, it's possible to build such caches and it's done by some, e.g. Firefox uses it.

But for the "any project" use case, the combination of Rust's compilation model and how crates.io works makes this hard. Due to Rust's compilation model, if crate A depends on crate B, then the version of crate B it's been compiled against can't be changed: you have to use that version. It also means that the features of the crate that have been enabled can't be changed. "Normal" Rust projects can quickly get into hundreds of crates.io dependencies in their DAG. Those are the very dependencies you want to use sccache for. However, due to how crates.io works, basically every day you get a new updated crate in your DAG, and this invalidates all the crates that depend on it, including your own.

In a single project use case, this is no problem because everyone uses the same Cargo.lock so they use the same dependencies and the same features of those dependencies. But if you don't share a Cargo.lock, the problem is too wide.

There are some solutions like freezing crates.io according to a schedule and only pushing out updates once per week or so. But this still leaves the feature issue unsolved. Plus it is probably against the idea of crates.io to have updates available immediately. Another solution would be API only dependencies, but this is harder to pull off.


I'm using Bazel to build my Rust project (using the rules_rust rules) and it's become quite a pain to use in concert with Docker.

This is not a complaint about Bazel specifically; it's fantastic, and easily my favourite build system bar none.

However, it cannot cross-compile Rust. This means that if I'm developing on my MacBook and I want to compile a Rust binary and put it in an Ubuntu Docker container, I can't do it on my host machine. I need to copy the source into the container and build it there, using multi-stage builds.

But this is -extremely slow- because it cannot take advantage of Rust's build caching. I'm talking 10-15 minutes for my small Rust project.

Has anyone run into this? How do you work around it?

I’ve considered running a Bazel remote execution server on a local Ubuntu VM, but this feels like so much extra complexity just to use Rust, Bazel and containers.


Bazel can cross-compile Rust, but you need a linker that can produce target executables. Cargo has the same limitation.

Apple ld doesn't support Linux as an output target, so you need to use GNU ld or LLVM lld instead.

Code examples at https://john-millikin.com/notes-on-cross-compiling-rust#baze...
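On the Cargo side, the idea is roughly this (a sketch; assumes a musl cross toolchain is installed on the Mac, e.g. Homebrew's musl-cross, and the linker name depends on your toolchain):

    # .cargo/config.toml
    [target.x86_64-unknown-linux-musl]
    linker = "x86_64-linux-musl-gcc"   # cross linker capable of producing Linux binaries

Then `rustup target add x86_64-unknown-linux-musl` and `cargo build --target x86_64-unknown-linux-musl` should produce a Linux binary you can copy straight into the image.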


Why not just mount the code in the container instead of copying it over? Rust deals very cleanly with build artefacts from different architectures so there's no risk of corruption.


This is a good idea, but there is tension between this straightforward approach and the fact that, on macOS, Docker is running in a VM, so it's inherently slower.


Isn't Docker trying to use virtualization from the OS to run Linux on "bare metal"? If so, it should achieve similar speeds to native Linux.


The virtualized storage layer could actually be a bottleneck here. Though I think they've improved the performance of it to the point where it might no longer be a bottleneck but I'm not sure.


You might want to try using Docker's cache mounts? https://docs.docker.com/engine/reference/builder/#run---moun...

I've been using them locally of late, and they're excellent for storing state that's not obviously identical across invocations of the same layer.
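For a Rust build that might look something like this (a sketch; the image tag, paths, and binary name are illustrative, and it needs BuildKit):

    # syntax=docker/dockerfile:1
    FROM rust:1.62 AS build
    WORKDIR /app
    COPY . .
    # cache the cargo registry and the target dir across builds;
    # copy the binary out because the cache mount isn't part of the layer
    RUN --mount=type=cache,target=/usr/local/cargo/registry \
        --mount=type=cache,target=/app/target \
        cargo build --release && cp target/release/myapp /usr/local/bin/myapp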


You can cache the Rust target directory and make Docker use Rust's build cache properly.

There are two common ways I'm familiar with: cargo-chef, and doing it manually. The manual route can be a bit involved, but you basically just build with empty sources and prune your fake lib files from the cache dir. cargo-chef is very easy.

My Docker images build as fast as my local builds. Though if you have many crates in a workspace it can start to become difficult.
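For reference, the cargo-chef route looks roughly like this (a sketch; the image tag and project layout are illustrative):

    FROM rust:1.62 AS chef
    RUN cargo install cargo-chef
    WORKDIR /app

    FROM chef AS planner
    COPY . .
    RUN cargo chef prepare --recipe-path recipe.json

    FROM chef AS builder
    COPY --from=planner /app/recipe.json recipe.json
    # dependencies are built from the recipe alone, so this layer stays
    # cached until Cargo.toml/Cargo.lock change
    RUN cargo chef cook --release --recipe-path recipe.json
    COPY . .
    RUN cargo build --release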


I haven't used it yet but https://github.com/nadirizr/dazel might help. It runs bazel inside a docker container via a seamless proxy.


"MacBook, Rust, Bazel and containers." what is the essence of what you are trying to accomplish. focus on that.


I am always surprised how good and not-well-known Google's tooling is. Another example would be its Closure Compiler (with the accompanying j2cl/j2objc tools, which are all ridiculously cool).


What you might not have known is that Actually Portable Executable https://justine.lol/ape.html and Bazel's Closure Compiler tooling were written by the same person. https://github.com/bazelbuild/rules_closure


@jart, just wanted to say thanks for both APE and Cosmopolitan; those projects have made my life infinitely easier, as well as giving me a lot of my time back by not having to figure out Windows executable minutiae. It's a no-brainer for me to create either a WASM module or an APE to get the job done.


And also tysm @jart for derasterize https://github.com/csdvrx/derasterize, which was a great basis to work on, and helped me make sixel-tmux https://github.com/csdvrx/sixel-tmux

We achieved textmode supremacy, and jart achieved binary-portability supremacy.

There seems to be a pattern here :)


> I am always surprised how good and not-well-known google’s tooling is.

The problem is that Google is well-known for unceremoniously dumping stuff.

So my first question about any Google tool is: "Does it have enough non-Google people supporting it?" because, if it doesn't, it will die as soon as it becomes a political football.

My second question about any Google tool is: "Are there small projects using it and singing its praises?" Can I use the tool to do what I need to a handful of files after reading a couple of web pages for 15 minutes and have it work? I don't need to scale to 4 gazillion foobars--I do need to get stuff done without an IT staff of 100 people. From a cursory look, Bazel fails that question.

My final question about any tool is more personal: "Does it use C++?" because, if it does, then integrating with anything other than C++ is going to be a gigantic PITA. At this point there are plenty of fine alternatives to C++ that don't suck. If you don't have a clean, pleasant way of talking to things that only understand C library conventions, you fail and I'm moving on.


> My final question about any tool is more personal: "Does it use C++?" because, if it does, then integrating with anything other than C++ is going to be a gigantic PITA. At this point there are plenty of fine alternatives to C++ that don't suck. If you don't have a clean, pleasant way of talking to things that only understand C library conventions, you fail and I'm moving on.

I think you've got that the wrong way around; if you can talk to things using the C library, you automatically get reusability from Java, C#, Python, various Lisps, Ruby, Rust (Go?), and almost every halfway popular language in existence.

What alternative do you know of that is more widely supported across programming languages?


C API != C++ API

Vulkan development, for instance, drives me up a tree. Vulkan has probably the best C API I've ever seen.

Vulkan support libraries like the Vulkan Memory Allocator all tend to be C++ because that's what GameDev tends to use. And they're an absolute nightmare to integrate with anything not C++.

I mean, even the C++ guys eventually gave up. Look at the proliferation of "Header-only" C++ libraries because integrating with the C++ compiler and linker is simply too painful.


I think you are agreeing. C APIs are broadly useful. C++ APIs, not so much.


You are quite correct and I am confused why my comment was upvoted :-/

I'm in full agreement with the GP, I just misunderstood the text (that's all on me, he was very clear).


From the point of view of Java, .NET and JavaScript, the difficulty of integrating anything other than C++ is exactly the reason why I stick with C++ for native libraries used in those ecosystems.

It adds yet another indirection layer on what those runtimes expect, the IDEs aren't prepared for mixed language debugging with anything else, and their build systems also need some additional nurturing.


I think the newer version of this is https://github.com/bazelbuild/rules_rust which lets you either vendor the dependencies or pull them from your Cargo.toml directly every time.

Per the article: Bazel + rules_rust should have the flexibility to override the linker flags as required, since that would be a property of the Bazel toolchain used.

It's a nice amalgamation of how cargo works and how bazel works.

In general, Bazel supports hermetic builds, multiple toolchains, cross-compilation, and ways to compile multi-language projects.

I still wish that Cargo.toml didn't support build.rs as it can cause a lot of system-dependent problems that bazel sidesteps entirely by being hermetic.
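For anyone curious what a rules_rust target looks like, here's a minimal sketch (the names and the dependency label are illustrative; assumes rules_rust is already wired into the WORKSPACE):

    # BUILD.bazel
    load("@rules_rust//rust:defs.bzl", "rust_binary")

    rust_binary(
        name = "app",                              # illustrative target name
        srcs = ["src/main.rs"],
        deps = ["//third_party/crates:serde"],     # hypothetical vendored crate dependency
    )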


I do not know why you're downvoted for this (crazy!?) -- this is exactly what I wanted to know [1]. I have a Rust monorepo with a bunch of "library"-type crates and about a dozen binaries (jobs, servers, userland programs)

I need this in my life.

[1] https://news.ycombinator.com/item?id=29745426


Did you eventually get a ref to an example Bazel build for Rust in that thread? (Writing here because you'll probably not look at the other thread again.)

I see the linked resources are to docs only.

If it's any help, here's a bazelized Rust demo implementation:

https://github.com/mihaigalos/code_templates/tree/master/rus...


Because at first glance it lacked context. Now the OP has updated it with the relevant context by quoting a bit from the article's author.


Sorry about that, I realized it once I saw the downvote (thank you - it made the post better).

Empathy is essential.


I didn't downvote you, for what it's worth :P


These seem orthogonal to the flexibility desired in the post. My understanding is cargo-raze doesn't provide a way to trim -lm from the link line, for example -- it doesn't seem like it would provide any such features over what cargo provides.


Also worth noting is Buck, which is a clone of Bazel. Not only does it support Rust; interestingly, the next version seems to be moving toward using Rust as the language of choice for developing it [1].

[1] https://developers.facebook.com/blog/post/2021/07/01/future-...


Which makes sense given yesterday's announcement that Rust is now an endorsed server-side language at Meta, and the recommended language for CLI tools: https://engineering.fb.com/2022/07/27/developer-tools/progra...


> It allows:
>
> * caching of artifacts.
>
> * shareable caches between {developers, build jobs} based on hashes.

This sounds like something that Nix is optimised for. The inputs to building each package are captured, so having different feature flags would just create different artifacts.


Last thing I want is more things to use Bazel. I can do without the headache. Perhaps Cargo can improve.


I'm hoping that as more projects adopt Bazel, we get more powerful libraries and it becomes trivial to use.


I wish I knew about this earlier! It's interesting that this project doesn't have a much higher visibility. Also wondering, what's the current relationship of this project with Google? (if you are involved with it)


It does, it's just that it... doesn't work well in practice. It's another case of "this works well for Google but doesn't translate well outside of Google". Plus, seems like Bazel is a very neutered version of Blaze anyway.


buildfarm takes away a lot of the neutering.


Not sure what to say concerning buildfarm's remote execution.

Reading an issue I opened almost two years ago [1], it seems the backend requires the client to have a specific GCC version.

That's a strong limitation imho.

[1] https://github.com/bazelbuild/bazel-buildfarm/issues/545


This is related to the build graph being non-hermetic: the builds have a system-provided dependency on GCC, and keeping the build sane requires that they be in sync.


Never run into this issue in the real world so.... /shrug


Wow, I didn't know Bazel was this powerful, gotta try it out now.


I feel some of the OP's points. I was working on a profiling agent lately, and one of the issues was running it on multiple platforms (just the four big ones: Linux/Mac on x86/ARM) over FFI (because it'll be run directly from Python/Ruby/etc...), preferably having the thing just work without having to install or configure any dependencies.

Like OP I hit two walls: libunwind, and linking. For libunwind, I ended up downloading/compiling manually; and for linking there is auditwheel[1]. Although it is a Python tool, I did actually end up using it for Ruby (by creating a "fake python package", and then copying the linked dependencies).

It was at that time that I learned about linking dynamic libraries and patchelf, and that there is really no single, established tool to do this. I thought there should be something, but most people seem to just install the dependencies alongside the software. I also found, the hard way, that you still have to deal with gcc/C when working with Rust. It does isolate you from a lot of stuff, but for many things there is no workaround.

There is a performance hit to this strategy, however, since shared dynamic libraries will be used by all the running programs that need them; whereas my solution will run its own instance. It made me wonder if wasm will come up with something similar without affecting portability.

Finally, the project is open source and you can browse the code here: https://github.com/pyroscope-io/pyroscope-rs

[1]: https://github.com/pypa/auditwheel


Is compiling once and running on 6 platforms really that compelling? One of Rust’s super powers is that it’s really easy to write code once that can be compiled N times for N platforms without making any changes.

I’m all about writing code once. But compiling a few times doesn’t seem like that big of a deal to me?

The article says it runs on “six operating systems” but I can’t find them listed?


I'm not sure if the "actually portable executable" stuff is really practical for anything in its current state, but I find it neat the way the development of the project encourages people to try to find unifying abstractions between software environments and practice writing build tools for new software environments.


10 years ago when almost all computers except smartphones were x86 this would have been a gamechanger. Nowadays, we need to account for ARM no matter what. Either the developer machine or the deployment target could be ARM. Therefore the build system needs to be aware of cross compiling.


ARM isn't just a single architecture either, there's ARMv6, ARMv7, ARMv8/ARM64, etc. Most of the platforms folks are using are ARM64 but not all of them!


The sooner we ditch ARMv7 and predecessors, the better.


I'm not a big fan of how actually portable executable doubles down on x86-64. I think anyone really wanting to make run-anywhere executables that don't bind us to specific CPU architectures should consider something WASM based, like WASI on Wasmtime.


That requires a wasm runtime to be preinstalled everywhere. The beauty of APE is that it's just a valid executable file on all supported platforms - it doesn't have any dependencies or requirements.


Hopefully wasm will take off and a high quality wasm runtime (with JIT) will be pre-installed on every system, but we'll see if that happens.


The Java Runtime is pretty ubiquitous, so maybe compiling to Java bytecode would be a better approach for a "universal" executable, if runtimes are allowed.


I haven't seen a PC with JRE installed in many years now.


Just be sure to uninstall the browser toolbar Oracle bundled.


Sun*

And it has not been the case for close to a decade now.


Agreed, though I guess it's that I'm not too interested in the general use-case of generic independent executables that can run on any system by themselves, which APE is aimed at. It's probably good for me to be explicit that I'm interested in a more specific use-case that I think is shared by some people interested in very portable executables: where you have some application that contains redistributable plugins, and those plugins must be executable by the application on any OS/architecture that the application runs on and be sandboxable. WASM is very well-shaped for this specific use-case; it might be okay for the general independent-executable use-case if you embed a few small WASM interpreters or if your target platforms each include one (which seems increasingly reasonable on modern systems that have one because it's needed for the system webviews).


Go users seem to value a single static executable quite highly, a single portable executable would be even better in the same direction.


It should be noted Go hasn't spit out static executables by default for at least 5 years now (if not longer), and if you want static executables from Go it requires several arcane incantations (not unlike getting a static binary from GCC).

Basically every Go executable is dynamically linked against libc these days unless the builder goes seriously out of their way to get a static binary.


I'm not sure that's true.

    $ go version
    go version go1.18.3 linux/amd64
    
    $ cat static.go 
    package main
    
    import "fmt"
    
    func main() {
     fmt.Println("static")
    }
    
    $ go build static.go 
    
    $ file static
    static: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, Go BuildID=TSoFKMWZzy5G6qyFvxx5/dwEbvdQns9f-iffR-lbW/iTmqy3UF8tbT-GGwVH61/lUZUt7_XebTifpVvE_52, not stripped
    
    $ ldd static
     not a dynamic executable
But it might not be the case for all systems. And some things, like using CGO or performing DNS lookups, will necessitate dynamic linking.
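For reference, the usual way to force a fully static binary when cgo isn't actually needed looks something like this (a sketch; disabling cgo also switches DNS resolution to the pure-Go resolver):

    # build the example above without cgo, so nothing links against libc
    CGO_ENABLED=0 go build -o static static.go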


Mostly because many of them weren't around when building static executables was the only thing our compilers could spit out.


As a Go user who values statically linked executables I can definitely say I was around when this was the only thing our compilers could spit out.

The reason I value it is because it makes deployment so much easier. I've re-lived DLL Hell so many times on so many systems by now that I'll happily take the memory/filesystem/patching pain to have something I can deploy and know it will run on the system I deploy it to, any day of the week.


There is a big difference between valuing it, and acting as if it was a wonder that Go brought into the world never achieved before.

Which is how most in the community act.

There is also a lesson on why dynamic linking is used, and why in its absence one needs to resort to IPC.


The way Go uses system APIs is completely unportable - and largely unsupported.


Not just Go users. Nearly all products I've ever developed, be they desktop GUI applications that self-update or backend servers, are like this. Saves me a lot on deployments.


The python article mentions those: https://ahgamut.github.io/2021/07/13/ape-python/

> This post describes a proof-of-concept Python executable (2.7.18 and 3.6.14) built on Cosmopolitan Libc, which allows it to run on six different operating systems (Linux, Mac, Windows, NetBSD, FreeBSD, OpenBSD)


I too couldn't find the magic six, but furthermore, the portability dimension I'm most excited about is the transposed one: microprocessor architectures. I daily compile and swap between x64/arm64/rv64 without a hitch and know that there are other options too, but it's always Unix (90% Linux).

It would have been nice if the OP had spent a few words on the motivation here.


Good catch! I updated the post to mention the six operating systems (Linux, Windows, MacOS, FreeBSD, NetBSD, OpenBSD).


> Cosmopolitan Libc ... that runs natively on Linux + Mac + Windows + FreeBSD + OpenBSD + NetBSD + BIOS

So some subset of those


I’ve found rust incredibly portable. I’ve hacked around running the same server side app on the web (WASM), PC/Mac/Linux, iOS, and Android. Another project is a web app running on iOS and Android leveraging a SQLite DB.


     I’d change a configuration flag, some part of std would break because my 
    flag was wrong, and I’d learn something new about Rust and how std worked.
The project was probably worth doing just because of this. Breaking things in a safe environment is such a great way to learn how it all works.


While I love this sort of portability, and in particular how it just makes Rust even more useful to me, the library this is built on does have a bit of a weakness with respect to GUI software: https://github.com/jart/cosmopolitan/issues/35. If this can be fixed, this will be an amazing tool for building simple cross-platform utilities and tools.


Aside from the neatness factor and hacker street cred, I don't exactly get the practical point for the vast majority of software. What am I to do with such a binary? Do I put it live on my website and allow my clients to download it? If I leave it with an .exe extension so that it runs in the Windows shell, wouldn't that confuse users of other platforms? What if I need a directory structure, as 99% of programs do? Do I use a zip or a tgz? In the first case, how do I preserve permissions on Unix targets? Do I need to instruct my clients on how to use tgz on the command line and/or set permissions?

Software distribution is by its nature a very platform-specific problem; even if we accept the premise of an x64 world, a universal binary solves just a very small portion of the practical problems of portable deployment.

Ironically, the best use case I can imagine is creating a universal binary installer that can run on any Unix system and then proceed to make platform-specific decisions and extract files contained in itself, sort of like how Windows binary installers work. But that's an utterly broken distribution model compared to modern package managers.


Just one question, as suggested by the title: could the Rust compiler itself be made portable using this? I guess not, because of its use of multi-threading.


You probably could, but that would be less useful than you think.

There are two machines that you care about with a compiler: the machine the compiler is running on ("Host"), and the machine the compiler is producing code for ("Target").

Generally we use a compiler with the same Host and Target - if you use Rust on x64-Windows you get a binary that runs on x64-Windows. If you use it on ARM-Linux you get a binary that runs on ARM-Linux. What you are talking about is making a compiler that would run on all Hosts, but it would take different work to make it be able to produce code for all Targets. So you'd produce a compiler that targeted x86-Windows and it would run on x86-Linux but still produce code for x86-Windows. It would also NOT be able to run on ARM-Linux.

[For completeness there's actually three machines we talk about with compilers - in addition to Host and Target there is also "Build". This allows you to cross-compile your compiler. For example you want to build your compiler on x86, you want the resulting compiler to run on ARM, and when it runs it produces code for RISC-V. Here Build is x86, Host is ARM and Target is RISC-V.]


> What you are talking about is making a compiler that would run on all Hosts, but it would take different work to make it be able to produce code for all Targets.

This is already how rustc works, it is not like GCC. Any rustc can (cross-)compile for any target, as long as you have the rlibs for that target, your libllvm has those targets enabled, and you have the appropriate linker. Rustup usually manages all that for you.
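In practice that can look like this (a sketch; the target triple is illustrative, and you still need a linker that can emit binaries for it):

    # install the precompiled std/rlibs for the target, then cross-compile
    rustup target add aarch64-unknown-linux-gnu
    cargo build --target aarch64-unknown-linux-gnu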


But the “actually portable executable” is a single target and runs on many platforms, surely a rustc compiled as an APE for APE would be useful?


Oh I see! You mean "built as an APE and targeting APE". That would indeed be an interesting idea.


> I just built a Rust executable that runs on six operating systems

I help maintain a kernel in C that runs on nine architectures, some of which don't even have LLVM backends, much less stable rust toolchains.

"Portable" means rather different things. This blog post is focused on the easy stuff.


They didn't build six executables that run on six operating systems. Rather, it's a single natively compiled executable that runs on six different operating systems unmodified.

Cosmopolitan is an incredibly cool project that does more than you think.

https://github.com/jart/cosmopolitan


While this project might focus on the “easy” definition of portable, I’ve never seen this done before. This post was both interesting and informative.

I don’t think you comparing your (unlinked and unnamed) kernel to this is very constructive. It feels like you’re gate-keeping.


node.js is also portable with pkg


Nitpicking on terminology: "portable" used to mean that software can run on another platform with minimal modifications, typically by relying on abstraction layers that then have multiple implementations. It's cool that a single executable can run on both Windows and some Unixes, but that's something else than what "portable" used to mean.

portable = able to port


Thanks for the clarification. So I've learnt that a portable executable isn't the same thing as portable software.

In this sense, can we not say "C is portable", since we need to compile it for each platform? The same for Java: we need to run the bytecode on each platform's virtual machine, and the compiler/virtual machine isn't built in on most OSes.

Then back to node.js: since v8/bun is not built in on most OSes, we cannot say it's portable, right?



