
How Rust is tested - brson
https://brson.github.io/2017/07/10/how-rust-is-tested
======
losvedir
I love the concept of the "crater run" (or I guess now "cargobomb"?), where
the entire crate (i.e. "package") ecosystem is compiled, as a test for
speculative compiler changes.
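
To make the idea concrete, here is a minimal sketch of the comparison step only: given build results for every crate under the current compiler and under the speculative one, report what newly broke. The crate names, the `Outcome` type, and `find_regressions` are all made up for illustration; this is not how crater or cargobomb is actually implemented.

```python
from enum import Enum


class Outcome(Enum):
    PASS = "pass"
    FAIL = "fail"


def find_regressions(baseline, candidate):
    """Return crates that built with the baseline compiler but not the candidate.

    Both arguments map a crate name to its Outcome. Crates that already
    failed on the baseline are ignored, since the change did not break them.
    """
    return sorted(
        name
        for name, outcome in baseline.items()
        if outcome is Outcome.PASS and candidate.get(name) is Outcome.FAIL
    )


# Hypothetical results for three made-up crates.
baseline = {"alpha": Outcome.PASS, "beta": Outcome.PASS, "gamma": Outcome.FAIL}
candidate = {"alpha": Outcome.PASS, "beta": Outcome.FAIL, "gamma": Outcome.FAIL}

print(find_regressions(baseline, candidate))  # ['beta']
```

Only `beta` is reported: `gamma` was already failing before the change, so it says nothing about the speculative compiler.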

It's one of the huge benefits of being both a statically-typed compiled
language and offering a modern package system. A lot of languages have the
types & compiling but no standard package ecosystem (Haskell, Go, Java), or
the package ecosystem but not the compiling (Ruby+Gems, JS+npm, etc). Rust is
the only language I can think of that has both, although I'm sure there are
others. I'm not sure if there's tooling for compiler writers in other
languages to compile them all, though.

And this feature will only get more and more powerful as the ecosystem grows.
I think it's a seriously important tool to keep the Rust language feeling
stable.

~~~
twic
What stops Java doing this is not that it doesn't have a "modern" package
system, it's that the package system ships compiled bytecode rather than
source code.

Conversely, what stops other languages with source-shipping package managers
doing this is that packages don't include tests. One notable exception to this
is Perl, where packages can, and typically do, include tests, and these are
run as part of the normal installation process. There are even some public-
spirited souls who download and test everything that gets uploaded:

[http://www.cpantesters.org/](http://www.cpantesters.org/)

This incorporation of tests is both a consequence of and a justification for
the fact that Perl inter-package dependencies aren't versioned, so you can
never really be sure that you're going to download a combination that works.

~~~
saghm
An interesting thing I learned recently about cpantesters is that they'll also
test packages against the bleeding edge versions of the interpreter, so
sometimes a package maintainer will get an issue filed (or a PR) saying "your
package is going to break in about eight months", which I find both funny and
cool.

------
Scaevolus
> There’s a big downside though in that landing patches to Rust is serialized
> on running the test suite on every patch, and it takes a particularly long
> time to run the Rust test suite in all the configurations we care about.

One solution is speculative batched testing. Kubernetes uses this to keep up
with its very high rate of PRs (>40 merges/day) and ~1hr maximum test times.

Given a queue of patches A, B, C, D, start the normal testing against A, but
_also_ start testing the merge of A+B+C+D against the master. If the batch
passes first, merge them all. If A is merged first, it's still "compatible"
with A+B+C+D, so they can go ahead. Doing a batch in parallel with a single PR
like this slightly increases testing load, but ideally doesn't cause any
slowdowns versus the fully serialized case.
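
A rough sketch of that decision, under simplifying assumptions: `decide_merge` and the `passes` callback are hypothetical names, and the two test runs happen one after the other here, whereas the real thing runs them concurrently and merges whichever finishes first. It is not the Kubernetes implementation.

```python
def decide_merge(queue, passes):
    """Pick which patches to merge from the front of the queue.

    `queue` is a list of patch names, oldest first. `passes(patches)` is a
    hypothetical callback reporting whether master plus those patches passes
    the test suite.
    """
    if not queue:
        return []
    head = queue[:1]
    batch = list(queue)
    # A one-element batch is the same as the head, so only test real batches.
    if len(batch) > 1 and passes(batch):
        return batch   # the whole batch lands at once
    if passes(head):
        return head    # fall back to the normal single-patch merge
    return []          # nothing is mergeable this round


# All four patches are fine: the batch lands together.
print(decide_merge(["A", "B", "C", "D"], lambda patches: True))

# C is broken: the batch fails, but A can still land on its own.
print(decide_merge(["A", "B", "C", "D"], lambda patches: "C" not in patches))
```

The first call returns all four patches and the second returns just `['A']`.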

It looks like bors already supports something like this with "rollups", where
simple fixes can be marked for testing together:
[https://internals.rust-lang.org/t/batched-merge-rollup-featu...](https://internals.rust-lang.org/t/batched-merge-rollup-feature-has-landed-on-bors/1019)

~~~
loonyphoenix
Doesn't this process break when, e.g., master + A passes tests, master + A + B
fails, but master + A + B + C + D passes again (having been fixed by C + D
accidentally)?

I mean, it's pretty unlikely to happen, but it's possible. Plus, it doesn't
matter if all you care about is that the HEAD of master passes tests, rather
than the whole history.

However, the current approach, if rebasing is used to squash every merge
request into a single commit on top of master, allows you to have a linear
history of the master branch where every commit is known to pass the whole
test suite that was in existence at that moment, which, I think, helps a lot
when bisecting.

I'm not sure if Rust compresses merge requests into a single commit on top of
master, though.
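
To illustrate the bisecting point, here is a toy binary search over a linear history. It is in the spirit of `git bisect` but is not its implementation; `first_failing` and the commit names are made up. It assumes the first commit passes, the last one fails, and (crucially) that nothing in between is broken for an unrelated reason.

```python
def first_failing(commits, passes):
    """Binary-search for the first commit whose tests fail.

    Assumes commits[0] passes and commits[-1] fails.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] passes, commits[hi] fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if passes(commits[mid]):
            lo = mid
        else:
            hi = mid
    return commits[hi]


history = ["m", "A", "B", "C", "D", "E"]

# Every commit was tested before landing; only E has the regression.
clean = {"E"}
print(first_failing(history, lambda c: c not in clean))  # E

# B and C were broken in between and accidentally fixed by D.
messy = {"B", "C", "E"}
print(first_failing(history, lambda c: c not in messy))  # B
```

With the messy history the search blames `B`, even though the regression being hunted arrived in `E`.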

~~~
Scaevolus
Yes, it's theoretically possible. When it happens in Kubernetes, it's far more
common that m+A+B failed because of a flaky test than because B was actually
broken, and C+D fixed it.

What we often see is m+A fails because of a flake, and m+A+B+C+D+E passes and
merges as a batch. We have a lot of flaky tests. :(

~~~
saghm
I remember my first experience with Rust's flaky tests (or, more accurately,
Cargo's):

[https://github.com/rust-lang/cargo/pull/3715](https://github.com/rust-lang/cargo/pull/3715)

------
JoshTriplett
> At some point though LLVM began doing a valid optimization that valgrind was
> unable to recognize as valid, and that made its application useless for us,
> or at least too difficult to maintain.

What optimization was that? Some searching didn't turn up any relevant results
(other than this article).

~~~
steveklabnik
I couldn't quite remember either, and brson told me that
[https://github.com/rust-lang/rust/issues/11710](https://github.com/rust-lang/rust/issues/11710)
was probably it. Can't seem to find where it was actually finally disabled
though.

~~~
kibwen
Money quotes:

_"This is a known valgrind false positive and Rust already has a lot of
suppressions for it. It's reported upstream on the LLVM bug tracker but is not
really an LLVM issue, just a limitation of valgrind. LLVM is allowed to
generate undefined reads if it can not change the program behavior."_

[...]

_"Just for the record, the undefined read doesn't change the behaviour
because it's just a check whether free should not be called, i.e. either it
jumps over the free call right away, or it performs the second check."_

------
wfunction
> The thing we do differently from most is that we run the full test suite
> against every patch, as if it were merged to master, _before_ committing it
> into the master branch, whereas most CI setups test _after_ committing, or
> if they do run tests against every PR, they do so before merging, leaving
> open the possibility of regressions introduced during the merge.

Hm, Travis CI on GitHub runs tests on a pull request before the merge. Is what
they're doing really that unusual?

~~~
saghm
As it says in the quote you provided, the difference is that it tests what the
result would be of the merge, which differs in that a merge might introduce
regressions; if you've never had a bug introduced by a merge not doing what
you expected, consider yourself one of the lucky ones.

~~~
wfunction
I'm not sure I'm understanding things correctly. What I meant was that the
test chronologically occurs before the merge takes place. But the test
_includes_ the merge itself; it's testing what the result _would_ be after
the merge. Have you used the system I'm talking about/do you know what I mean?
Am I still misunderstanding something?

~~~
tyoverby
Here's an example scenario:

1. Your PR is tested with the merge against Master. All green!

2. Another PR is tested with the merge against Master. All green!

3. The other PR goes in first. All good!

4. Your PR goes in (merge happens, but introduces a bug). Oh no!

~~~
wfunction
Thanks! Someone pointed it out in a sibling comment afterwards too. Kind of
sucks, I wish they had a "do final check and merge" button that would merge
PRs sequentially into master after re-testing when there's a collision like
this. Any idea why they don't?

------
pornel
I appreciate the thoroughness of this. I've been using Rust for 2 years now,
and the compiler has been rock solid.

------
hsivonen
It would be really nice if Travis had Linux-on-ARMv7 and Linux-on-aarch64
options on real ARM hardware. I wonder what the economics of Travis building
such a thing on top of an existing ARM cloud offering like Scaleway would be.

------
JoshTriplett
> Today the longest-running configuration takes over 2 hours.

I'm curious which configuration that is, and how it could be accelerated.

~~~
steveklabnik
Looking at the build that's building right now, all of them have passed but
one, which has been going on for 2 hours and 23 minutes:
[https://travis-ci.org/rust-lang/rust/builds/252052639](https://travis-ci.org/rust-lang/rust/builds/252052639)

> "env": "RUST_CHECK_TARGET=dist
> RUST_CONFIGURE_ARGS=\"--target=aarch64-apple-ios,armv7-apple-ios,armv7s-apple-ios,i386-apple-ios,x86_64-apple-ios
> --enable-extended --enable-sanitizers --enable-profiler\" SRC=. DEPLOY=1
> RUSTC_RETRY_LINKER_ON_SEGFAULT=1 SCCACHE_ERROR_LOG=/tmp/sccache.log
> MACOSX_DEPLOYMENT_TARGET=10.7\n",

iOS, it seems.

Looking at the other builds,

* s390x Linux

* i686-apple-darwin

* x86_64-apple-darwin

Seems Apple is an overall slowpoke.

~~~
ihsw2
This has more to do with Travis-CI than Darwin et al.

[https://www.traviscistatus.com/](https://www.traviscistatus.com/)

Notice how the number of active OS X builds reaches the limit of 216 jobs
while the backlog of OS X jobs keeps growing. This is a capacity issue
that is likely related to high OS X build costs.

Another thing to notice is that the backlog correlates strongly with time of
day in the EST timezone, where the backlog starts to take off after 9am EST.

------
sudeepj
I have been taking inspiration from the Rust project and the way they go about
things in general. In my workplace, we have a bot similar to bors, which does
the merge request testing and is the gatekeeper.

Tools like bors act as a "force-multiplier".

------
TotallyGod
Heads up that your Google fonts (in site.css) trigger a warning about insecure
scripts, and in Chrome at least they therefore don't load by default.

------
crncosta
I dream of the day we will have a GCC Rust compiler.

~~~
pas
What would the benefits of that be?

------
Outrageous
Anyone care to enlighten me why fuzz testing and not mutation testing?

------
nercury
I don't get it, because you have just explained how to unit test everything
else. Which is way better than nothing. Not sure how it is moot, also not sure
how it is related to the article :)

------
jorgec
Unit testing in a nutshell, for a business system:

Practically every business system consists of a view, the logic, and the
database, plus some web services and whatnot, but mainly it's those three
parts (the so-called three layers).

Can you unit test the view layer? Not really. Can you unit test the database
with real tests? Again, not really, and it's not always possible. So unit
testing is mainly focused on the layer between the view and the database,
and both of those, by their nature, can't be automatically tested.

I like the idea of unit testing but, for business systems, it's moot.
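
For reference, the in-between layer is the part that can be exercised in isolation. A minimal sketch, assuming a made-up business rule (`customer_total`) and a hypothetical in-memory stand-in for the database (`InMemoryOrders`); neither comes from any real system:

```python
class InMemoryOrders:
    """Hypothetical stand-in for the database layer."""

    def __init__(self, rows):
        self._rows = list(rows)

    def amounts_for(self, customer):
        return [amount for name, amount in self._rows if name == customer]


def customer_total(orders, customer):
    """Business logic under test: sum a customer's order amounts."""
    return sum(orders.amounts_for(customer))


def test_customer_total():
    orders = InMemoryOrders([("ann", 10), ("bob", 5), ("ann", 7)])
    assert customer_total(orders, "ann") == 17
    assert customer_total(orders, "nobody") == 0


test_customer_total()
```

The view and the real database are not touched at all here, which is exactly the limitation described above.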

