
Building Tiny Rust Binaries for Embedded Linux - jamesmunns
https://jamesmunns.com/update/2018/04/01/tinyrocket.html
======
kibwen
Note that one of these steps, the one where you replace jemalloc with the
system allocator, is quite likely to become the default behavior sometime
after the global allocator API stabilizes (which should happen this year).
It's a bit of a shame, because jemalloc really is faster than the system
allocator, often by a significant margin. But in practice jemalloc is a very
large burden to package in-tree, it contributes significantly to the binary
bloat seen here, and tools like valgrind don't play nicely with anything
other than the system allocator. Of course, we won't make the switch until we
have an easy way to get jemalloc back via crates.io, for users who really
want jemalloc's performance benefits.

~~~
GrayShade
I think GLIBC 2.26 added a per-thread cache, so it might no longer be that
slow when compared to jemalloc.

~~~
kibwen
I'm under the impression that part of what makes jemalloc fast is that it has
a richer API than glibc malloc (in a way that isn't drop-in compatible, so a
libc-compatible malloc can't ever provide it), giving it more information to
use for optimization (but also stymying tools like valgrind).

~~~
GrayShade
That's sized deallocation, right? Unfortunately, GLIBC doesn't seem to provide
it.

------
jamesmunns
Hey all, I was able to attend the Rust All Hands this year, and this is a
write up of a cool project I worked on there. Let me know if you have any
questions!

~~~
Lerc
The article mentions the flash storage size as 8 MB. What was the RAM size of
the target?

UPX can be a bit of a mixed bag. It doesn't necessarily cause load delays; if
decompression is faster than reading the data from storage, it can actually
speed things up. You can't mmap the binary though, which could be an issue
depending on RAM. And if the underlying filesystem is already compressed, UPX
might be providing on-paper-only gains.

It would be interesting to see breakdowns of the size of the text and data
section at each stage of the size optimization.

------
arghwhat
I find the number for xargo, without stripping or UPX, to be by far the most
interesting. It almost gets you all the way there (1/6th of the default
release size) while not breaking anything.

Stripping removes useful things from your binary, and UPX tends to be a no-go
for many things.

~~~
steveklabnik
We've been talking about "unforking xargo" and treating libstd etc. more like
any other crate, which would mean you could do this more easily. It's gonna
take some time though.

~~~
nixpulvis
This sounds like a really good idea! Would there still be prebuilt versions
for the sake of faster builds out of the box?

~~~
kibwen
Details are still up in the air. All we know for sure (decided at the Rust
All-Hands last week) is that we definitely want to extend cargo in a way that
obviates xargo (a plan which xargo's maintainer has enthusiastically
endorsed).

------
dom96
For comparison's sake I just compiled the Jester hello world example using
Nim:

    
    
        import asyncdispatch
        import jester

        routes:
          get "/":
            resp "Hello World"

        runForever()
    

Command: nim c -d:release --opt:size file.nim

The size on my Mac is 349K and that's with very little tweaking.

~~~
littlestymaar
Many people have complained about the Rust Evangelism Strikeforce, but the few
nim users here on HN are taking it to the next level…

~~~
uryga
I don't think that's fair towards the GP – they only provided a data point
using a language that competes with Rust in some areas.

(Unless I misread your comment, and "RustEvStrFo" doesn't have the same
negative connotation to you...)

~~~
littlestymaar
You're right, it's not really fair on that specific comment. I just noticed
this trend over the past few months, and this time I reacted.

Edit: if you look at the commenter's profile, you'll see that he does that
really often, on threads talking about Rust, Go or C. I think the comparison
to the Rust Evangelism Strikeforce holds.

~~~
dom96
I think there is a difference between a core developer of Nim (me) trying to
raise some awareness by showing how easy it is to achieve what the article
achieves in Nim, vs. a random developer proclaiming "Why haven't you written
this in Rust?".

Isn't that what the Rust Evangelism Strikeforce is known for?

~~~
dbaupp
It's what people think the "RESF" does, but as is consistently mentioned
whenever it's brought up, there seem to be more people replying to any
comment about Rust with complaints of "RESF" (even if the original comment is
just "trying to raise some awareness", etc.) than people actually proposing
rewriting anything/everything in Rust.

In any case, I think being a core developer on something means a much higher
bar for accuracy/relevance/general public behaviour when discussing it and its
competitors: everything one says about the project is a semi-official
representation of it. Certainly the Rust core developers try to be careful
about conveying an accurate picture of costs as well as benefits when
answering questions or correcting misconceptions[1]. That is certainly
something I found weighed on my mind a lot when I was on the Rust teams:
anything I said about it could become part of "the Rust project" (as an idea
people have) itself.

[1]: One extra point is that, IME, this is often how Rust team members
interact with social media: they won't be the first person to bring up Rust in
a thread. Of course, it's fair to say that Rust has a bigger mindshare and
more people outside the project team members will bring it up/compare to it
than Nim, but one possible alternative approach to awareness is more whole
threads (i.e. submitted articles) about Nim rather than just comments within
threads about other technologies.

(To be clear, this is just a reply to the parent comment, I'm not trying to
say the original Nim comment was "NESF"-ish or "RiiN".)

~~~
doom_Oo7
> there seems to be more people replying to any comment about Rust with
> complaints of "RESF"

[https://www.reddit.com/r/programming/comments/83n32i/the_def...](https://www.reddit.com/r/programming/comments/83n32i/the_definitive_c_book_guide_and_list/)

~~~
dbaupp
I didn't say it _didn't_ happen. I acknowledge it, and I've even personally
written many, many comments calling out/correcting comments that are too
enthusiastic in their promotion of Rust.

In any case, for that specific comment, see Steve's analysis in response to
your ping:

 _> But a brand new account with three trolly comments is different than an
actual Rust advocate. Just like I'd ignore our local anti-rust trolls, I'd
also ignore any pro-Rust trolls._

------
ajross
This gets confused right off the bat between file size and mapped size. The
whole "strip" nonsense could have been skipped if the author had looked at
the .text/.data/.rodata sizes instead (and .bss too -- often the resource
you're trying to conserve is RAM, not storage!).

Likewise, the playing with UPX isn't really relevant to a Rust article per
se. That's a generic executable compressor that will work with anything.

But some stuff here was really interesting. I honestly had no idea that Rust
was statically linking its own heap implementation into every binary it
created! Come on, guys, jemalloc isn't even written in Rust and doesn't
benefit from static linkage. If you're going to distribute your own runtime
(and if you're using your own heap, you're distributing your own runtime!),
at least distribute it in a shared-by-default way.

~~~
pcwalton
> Come on, guys, jemalloc isn't even in rust and doesn't benefit from static
> linkage. If you're going to distribute your own runtime (and if you're using
> your own heap, you're distributing your own runtime!) at least distribute it
> in a shared-by-default way.

Our users consistently indicate that they prefer the convenience of being able
to copy all-in-one binaries from system to system to the space savings of
dynamic linking. Rust used to dynamically link by default, but based on
popular demand this was changed prior to 1.0.

Look at what people say they like about Go, for example. Static linking for
easy deployment is high on the list.

~~~
monocasa
[http://harmful.cat-v.org/software/dynamic-linking/](http://harmful.cat-v.org/software/dynamic-linking/)

~~~
pcwalton
I don't agree with this. Dynamic linking is perfectly reasonable for OS
libraries like kernel32.dll, for instance. Dynamic vs. static linking is a
tradeoff like any other.

~~~
monocasa
Sure, but NT is a totally different story, since you aren't given any stable
ABI other than linking against .dlls.

But on Unixen there is a lot of value in static linking (or linking only
against VDSO equivalents).

------
rhn_mk1
What is left in such a binary? 800K of code seems on the high side even for
an HTTP framework.

~~~
zokier
While not answering the question directly, you can peek into Cargo.lock to
see which libraries are used by the project:

[https://github.com/spacekookie/tinyrocket/blob/master/Cargo.lock](https://github.com/spacekookie/tinyrocket/blob/master/Cargo.lock)

Of course, which of those end up in the final binary is an open question. It
would be kinda cool to see a dependency tree with size annotations, but I
suppose that might be a bit tricky to produce.

But I did find the "cargo-bloat"[1] tool, which gives the following output:

    
    
         File  .text     Size Name
         4.8%  19.5%  98.2KiB core
         4.0%  16.3%  81.9KiB std
         3.9%  15.8%  79.4KiB rocket
         2.4%   9.9%  49.9KiB hyper
         2.0%   8.1%  40.6KiB [Unknown]
         1.7%   7.0%  35.3KiB alloc
         1.6%   6.5%  32.7KiB toml
         1.1%   4.5%  22.5KiB url
         1.0%   4.3%  21.4KiB yansi
         0.5%   2.0%   9.9KiB tinyrocket
         0.5%   1.8%   9.2KiB time
         0.3%   1.3%   6.4KiB ring
         0.2%   0.8%   3.8KiB ordermap
         0.1%   0.5%   2.4KiB idna
         0.1%   0.4%   1.8KiB percent_encoding
         0.1%   0.4%   1.8KiB unicode_normalization
         0.1%   0.3%   1.3KiB serde
         0.1%   0.2%   1.2KiB std_unicode
         0.0%   0.2%     880B pear
         0.0%   0.1%     576B log
         0.0%   0.1%     338B cookie
         0.0%   0.1%     307B unwind
         0.0%   0.0%     198B unicode_bidi
         0.0%   0.0%     161B state
         0.0%   0.0%     160B httparse
         0.0%   0.0%     124B memchr
         0.0%   0.0%      94B alloc_system
         0.0%   0.0%      77B typeable
         0.0%   0.0%      52B smallvec
         0.0%   0.0%      45B compiler_builtins
         0.0%   0.0%      28B rustc_tsan
         0.0%   0.0%      25B unicase
         0.0%   0.0%      11B panic_abort
         0.0%   0.0%      11B panic_unwind
        24.6% 100.0% 502.6KiB .text section size, the file size is 2.0MiB
    

The stripped binary on my system is 979K, so this accounts for about half of
the binary size. Of what is accounted for, I suppose yansi (an ANSI terminal
lib) is the most superfluous, followed by toml parsing. Other than those,
most of the stuff seems pretty relevant for a web framework.

I'm now a bit curious what the 979 - 502.6 = 476.4 remaining kilobytes are...
and totally nerdsniped. Running size gets me the following:

    
    
        [zokier@zarch tinyrocket]$ size -A -d target/release/tinyrocket
        target/release/tinyrocket  :
        section                size      addr
        .interp                  28       624
        .note.ABI-tag            32       652
        .note.gnu.build-id       36       684
        .gnu.hash                28       720
        .dynsym                2568       752
        .dynstr                1532      3320
        .gnu.version            214      4852
        .gnu.version_r          288      5072
        .rela.dyn             40320      5360
        .rela.plt              2328     45680
        .init                    23     48008
        .plt                   1568     48032
        .plt.got                  8     49600
        .text                519185     49664
        .fini                     9    568852
        .rodata              294652    568864
        .eh_frame_hdr         15580    863516
        .eh_frame             82544    879096
        .tdata                   48   3060064
        .tbss                   136   3060112
        .init_array               8   3060112
        .fini_array               8   3060120
        .data.rel.ro          34984   3060128
        .dynamic                576   3095112
        .got                    872   3095688
        .data                   201   3096576
        .bss                    464   3096784
        .comment                 69         0
        Total                998309
    

So almost 290k in .rodata; that seems suspicious. A quick look at it shows
about 17k of HTML, as steveklabnik mentioned, but other than that I can't
really identify any major blocks. I suppose this is the end of the dive,
unless someone knows some tools to get more insight into .rodata data. I
suppose in theory it should be possible to track down where in the code each
bit of .rodata is accessed from, but that seems a bit of a stretch.

[1] [https://github.com/RazrFalcon/cargo-bloat](https://github.com/RazrFalcon/cargo-bloat)

~~~
haberman
> I suppose this is the end of the dive, unless someone knows some tools to
> get more insight into .rodata data. I suppose in theory it should be
> possible to track down where in the code each bit of .rodata is accessed
> from, but that seems bit of a stretch.

My tool Bloaty
([https://github.com/google/bloaty](https://github.com/google/bloaty))
attempts to do exactly this. It even disassembles the binary looking for
instructions that reference other sections like .rodata.

It doesn't currently know anything about Rust's name mangling scheme. I'd be
happy to add this, though I suppose Rust's mangling is probably written in
Rust and Bloaty is written in C++.

~~~
zokier
Cool. So I ran bloaty (with -d sections,segments,rawsymbols) on tinyrocket and
used rustfilt[1] to demangle the symbols, and we have numbers for .rodata:

    
    
        demangled                                                   Filesize (KiB)
        idna::uts46::find_char                                            89.25
        unicode_normalization::normalize::d                               69.52
        [384 Others]                                                      43.48
        rocket::config::RocketConfig::override_from_env                   23.59
        unicode_bidi::char_data::bidi_class                               15.16
        idna::uts46::decode_slice                                         12.23
        unicode_normalization::normalize::compose                         10.29
        core::num::dec2flt::algorithm::power_of_ten                        5.97
        unicode_normalization::tables::normalization::canonical_comb       3.91
        [section .rodata]                                                  1.99
        idna::uts46::validate                                              1.95
        <hyper::status::StatusCode as core::fmt::Debug>::fmt               1.79
        tinyrocket::main                                                   1.46
        core::num::flt2dec::strategy::grisu::CACHED_POW10                  1.27
        rocket::config::config::Config::set_raw                            1.15
        time::display::parse_type                                          0.99
        hyper::server::listener::spawn_with::{{closure}}                   0.96
        percent_encoding::percent_encode_byte                              0.76
        <hyper::status::StatusCode as core::fmt::Display>::fmt             0.71
        rocket::catcher::defaults::get::handle_431                         0.66
        rocket::config::init::{{closure}}                                  0.65
    

Lots of unicode and idna stuff there. I think the binary size would be reduced
significantly if we could drop those somehow.

Most likely those embedded HTML (error) pages are accounted in
"rocket::config::RocketConfig::override_from_env", which sort of makes sense.

> It doesn't currently know anything about Rust's name mangling scheme. I'd be
> happy to add this, though I suppose Rust's mangling is probably written in
> Rust and Bloaty is written in C++.

I suppose you don't want an (optional) dependency on Rust code? It should be
pretty easy to provide a C interface to rustc-demangle[2], which would be
usable from Bloaty.

[1] [https://github.com/luser/rustfilt](https://github.com/luser/rustfilt)
[2] [https://github.com/alexcrichton/rustc-demangle](https://github.com/alexcrichton/rustc-demangle)

~~~
zokier
This was easier than I thought:

[https://github.com/google/bloaty/compare/master...zokier:rus...](https://github.com/google/bloaty/compare/master...zokier:rust_demangle?expand=1)

and here is the wrapper for rustc-demangle:

[https://github.com/zokier/rust-demangle-clib](https://github.com/zokier/rust-demangle-clib)

Seems to work just fine for my simple use at least:

    
    
        [zokier@zarch bloaty]$ ./bloaty -d sections,segments,symbols -C rust /tmp/tinyrocket/target/release/tinyrocket
             VM SIZE                                                                                        FILE SIZE
         --------------                                                                                  --------------
          52.0%   506Ki .text                                                                              506Ki  24.8%
             100.0%   506Ki LOAD [RX]                                                                          506Ki 100.0%
                  65.6%   332Ki [967 Others]                                                                       332Ki  65.6%
                   5.1%  25.7Ki hyper::server::listener::spawn_with::{{closure}}                                  25.7Ki   5.1%
                   3.6%  18.3Ki <yansi::Paint<T> as core::fmt::Display>::fmt                                      18.3Ki   3.6%
                   2.9%  14.5Ki rocket::ignite                                                                    14.5Ki   2.9%
                   2.5%  12.8Ki rocket::config::config::Config::set_raw                                           12.8Ki   2.5%
                   2.1%  10.7Ki core::ptr::drop_in_place                                                          10.7Ki   2.1%
                   1.9%  9.78Ki std::sys_common::backtrace::output                                                9.78Ki   1.9%
                   1.9%  9.62Ki tinyrocket::main                                                                  9.62Ki   1.9%
                   1.6%  8.16Ki <alloc::raw_vec::RawVec<T, A>>::double                                            8.16Ki   1.6%
        ...
    

I can try to clean it up a bit more if you want a proper pull request for
this?

~~~
haberman
Cool! If the dependency is optional I think this would be great and I'd love
to see a PR for it. It could be configured at CMake time.

I think I'd prefer to just make this part of shortsymbols/fullsymbols instead
of making a separate "rustsymbols". I assume that Rust symbols won't
successfully demangle as C++ (and vice-versa), so we can just try both
demanglers and use whatever works. That seems like it will be more graceful
for mixed C++/Rust binaries.

~~~
zokier
I created
[https://github.com/google/bloaty/issues/110](https://github.com/google/bloaty/issues/110)
to track this. I'll try to find time to clean up the patch, but no promises.
Let's continue the discussion in the GH issue.

------
_o_
I know it is about Rust, but I just don't see the point. If you want small
binaries, for whatever reason, Rust is just wrong; take C (or asm). The
closer you get to processor instructions, the smaller the binary, and Rust is
completely the wrong language for that.

Actually, I think Go has beaten Rust at everything relevant... (I am a C/C++
fan so I have no preference for either. Rust has become Hacker News click
bait.)

~~~
buster
Can you explain why? How is C closer to processor instructions than Rust?

~~~
ShroudedNight
My experience with using C was that, while it isn't necessarily closer to
processor instructions, the instructions it does generate are far more
predictable than the ones generated by Rust. And when it does surprise you,
it's usually either a pleasant occurrence or a compiler defect.

This is a combination of many things, the big ones being:

1) My temporal experience with C dwarfs my temporal experience with Rust by
at least an order of magnitude.

2) The Rust ABI is [still, last I checked] opaque.

3) Every operating system environment I have interacted with has been
_deeply_ C-oriented, so the platform utilities for exploring the C to ELF /
PE transition are much more accessible.

I would _love_ to have the same familiarity, predictability, and indigenous
feeling with Rust that I have with C, but each time I attempt to interact
with the ecosystem, I come away feeling alienated, like I've just interacted
with a 'the compiler knows best' cult.

~~~
buster
\- "the instructions it does generate are far more predictable than the ones
generated by Rust."

I am wondering how that might work? Isn't this extremely dependent on the
compiler and flags? To be honest, I would assume that a modern compiler with
many optimizations would generate a very different list of instructions than
some simple compiler would. C has a large list of available compilers.

\- "I would love to have the same familiarity, predictability, and indigenous
feeling with Rust that I have with C,"

To be honest, all that seems to be mostly time and experience. Surely you
won't get the same familiarity with Rust compared to C in a fraction of the
time. It just sounds like you feel comfortable enough with C and see no
reason to change, which is good for you in a way :)

~~~
ShroudedNight
The best I can come up with is examples:

    
    
      #include <stdint.h>

      typedef enum State {
        UNINITIALIZED,
        STARTING,
        RUNNING,
        SLEEPING,
        STOPPING,
        TERMINATED
      } State;

      /* opaque types defined elsewhere */
      typedef struct MemoryMap MemoryMap;
      typedef struct LockTable LockTable;

      struct Thread {
        State state;
        int32_t threadID;
        MemoryMap * memoryMap;
        LockTable * lockTable;
      };
    

The memory layout of a Thread object is highly predictable: +0 state, +4
threadID, +8 memoryMap, +(12/16) lockTable

I can predict the symbol name for a function:

    
    
      Thread *
      createThread(void *stackArea)
    

I know that stackArea is going to be in RDI, and the resulting Thread* is
going to be in RAX (or their well-specified equivalents on another platform)

    
    
      switch (thread->state) {
        case UNINITIALIZED:
        case STARTING:
          error("Not ready yet");
          break;
        case RUNNING:
          sleep(thread);
          break;
        case SLEEPING:
          error("Already asleep");
          break;
        case STOPPING:
        case TERMINATED:
          error("Thread is terminating");
          break;
        default:
          critical_error("The world is on fire");
          break;
      }
    

Given a compiled set of instructions, I can link the compiled instructions
corresponding to the switch statement (like a jump table) back to the C code.

    
    
      for(size_t i = 0; i < SOME_MAXIMUM; ++i) {
        unsigned char const working_value = input1[i];
        unsigned char const mask = ((unsigned char) -1) - (working_value >> 7);
        output[i] = input2[i] &= mask;
      }
    

I've probably screwed that up, and I don't claim that it's doing anything
useful here, but assuming it was, I would be able to identify the
instructions associated with it, figure out if there was some deficiency
present (say, a stray conditional), and have a decent idea of how to fix it.

Rust, on the other hand, does a bunch of things that are (so far) opaque to
me.

One example is passing a borrowed slice of something: somehow not just a
pointer to the thing itself gets passed, but also, seemingly, one or more
boundary indices, and possibly a lifetime (maybe?)

Another example is a match on the type of an object: clearly [on second
thought, perhaps not] there's some run-time state being kept around along
with the explicitly declared fields inside that object; what does that object
look like in a memory dump?

As a final example: generics. How and when do they get expanded into discrete
functions? What do the symbols get named? How do I know if a generic
expansion is likely to be overly prolific? Inlining and templates can do
wondrous things for performance, but they can also drown it.

Anyway, hopefully that wasn't too overwhelming or ranty. You asked nicely
twice; I felt compelled to give you a real answer.

