
Bugs You'll Probably Only Have in Rust - Gankro
https://gankro.github.io/blah/only-in-rust/
======
erickt
One of the most important tools when writing unsafe rust is compiletest [1].
It's a tool extracted from the compiler project that lets you write tests that
are supposed to fail compilation. Since safe abstractions rely on the type
system to make unsafe code safe, it's critical to make sure the compiler is
properly rejecting code. I wrote a post about this years ago when I got hit by
one of the bugs Gankro wrote about [2].

[1]: [https://github.com/laumann/compiletest-rs](https://github.com/laumann/compiletest-rs)

[2]: [http://erickt.github.io/blog/2015/09/22/if-you-use-unsafe/](http://erickt.github.io/blog/2015/09/22/if-you-use-unsafe/)
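For anyone curious what such a test looks like: a compile-fail test under compiletest-rs is just a Rust file annotated with the error the compiler is required to emit (the file path and exact message here are illustrative); the harness fails if the snippet ever starts compiling:

```rust
// tests/compile-fail/not-sync.rs (hypothetical path)
// compiletest-rs asserts this file is REJECTED with the annotated
// error; if a compiler change ever lets it compile, the test fails.
use std::cell::Cell;

fn require_sync<T: Sync>(_: T) {}

fn main() {
    let c = Cell::new(0);
    require_sync(c); //~ ERROR `Cell<i32>` cannot be shared between threads safely
}
```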

------
wyldfire
> Making unsafe a big scary "all bets are off" button is only compelling if
> most of our users don't need to use that button. Rust is trying to be a
> language for writing concurrent applications, so sharing your type between
> threads requiring unsafe would be really bad.

It would be neat if we could decompose unsafe like so:
"unsafe[this_feature,that_feature] {}". The unqualified "unsafe" could still
refer to a global "free rein", but you could opt in to "only let me violate
these specific rules." It would be a hint to maintainers and might help the
std lib and other core libraries be/remain defect-free.

Another interesting "oh shoot" w/unsafe that I'm curious about: when I
intentionally/unintentionally alias two variables in my unsafe block, this
will invalidate assumptions made elsewhere in safe code. This is my unsafe
block's bug, but it seems like something that could take a good while
debugging to attribute back to my unsafe block. I don't think there's a good
resolution to this one other than perhaps documentation/best practices.

~~~
Gankro
Re: parameterized unsafe -- I think it's been discussed and rejected, I don't
remember where. I think it was mostly a matter of "yes this would be more
powerful, but the complexity isn't worth it".

Note that we sort of made a "new" kind of unsafe with the UnwindSafe trait:
[https://doc.rust-lang.org/std/panic/trait.UnwindSafe.html](https://doc.rust-lang.org/std/panic/trait.UnwindSafe.html)

That's probably how we intend to solve these kinds of problems in the future.

Re: aliasing -- if it's a serious enough problem, one of two things will
happen:

* Someone will develop a version of asan/ubsan for Rust.

* The Rust devs will be forced to reduce the extent to which they apply alias analysis by default (possibly with a flag to opt into it). At least temporarily.

The rust devs have backed off optimizations in the past when they break stuff
in the ecosystem (struct layout optimization). But they also work with the
affected devs to fix those bugs so they can turn the optimization on.

~~~
wyldfire
> Someone will develop a version of asan/ubsan for Rust.

This already happened (japaric [1]). But ASan won't save you from a bug due to
optimization-because-I-assumed-these-locations-dont-alias (maybe TSan might?).

[1] [https://users.rust-lang.org/t/howto-sanitize-your-rust-code/9378](https://users.rust-lang.org/t/howto-sanitize-your-rust-code/9378)

~~~
dbaupp
As you say, none of the existing sanitisers catch Rust-specific problems,
which is, I assume, what the parent meant by "for Rust". That said, they will
likely catch many of the consequences of such violations, just not pinpoint
the cause as precisely.

~~~
comex
A Rust-specific sanitizer has been proposed, though. See my other reply (which
I was in the middle of writing when this thread popped up, so I didn't see
it):

[https://news.ycombinator.com/item?id=14553679](https://news.ycombinator.com/item?id=14553679)

~~~
dbaupp
That isn't yet an _existing_ sanitizer. :)

It is definitely an important missing piece to bridge the gap to ASan/TSan,
but it's still just a proposal/work in progress, not least because, AIUI, the
precise rules it needs to enforce aren't yet entirely clear.

------
kibwen
So happy that Gankro is back writing things about Rust, and especially
delighted to hear that the Rustonomicon is going to be fleshed out more. :)

------
barsonme
If you—like me—were interested in Diesel ORM's zero sized types thing, here's
a pretty decent explanation:
[https://np.reddit.com/r/rust/comments/3ur9co/announcing_dies...](https://np.reddit.com/r/rust/comments/3ur9co/announcing_diesel_a_safe_extensible_orm_and_query/cxi143z/)

edit: Go also has zero-sized types (struct{}), so I wonder if this is also
possible? Probably not, I don't think, since the compiler doesn't see through
interfaces.

~~~
tatterdemalion
> Go also has zero-sized types (struct{}), so I wonder if this is also
> possible?

No. It specifically uses Rust's generics system, and the fact that generics
are monomorphized at compile time, whereas Go interfaces are not.

C++ templates can be used in similar ways.

~~~
dom0
Yes, but differently; C/C++ explicitly prohibit zero-sized types due to object
identity.

~~~
Gankro
To clarify: C says it's UB, C++ rounds up to 1.

------
bluejekyll
I have to say, these RCA's of the various bugs are great for getting a better
understanding of the internals of the language.

In a lot of ways it makes me trust Rust even more, because there is a deeper
understanding of exactly how these guarantees are made.

------
halestock
Question for the rust folks - are there any features that wouldn't have been
possible without "unsafe"? That is, if rust never had unsafe, would it have
been fundamentally limited in any way? Or is it required for e.g.
interoperability with C?

~~~
steveklabnik
I'll give you the shortest example: in order to build an operating system in
Rust for x86, you need to do this:

    
    
      let p = 0xb8000 as *mut u8;
    

VGA drivers use the memory mapped at 0xb8000 to drive the device. This creates
a pointer, p, at that address.

To demonstrate that this is safe (okay, so `unsafe` isn't in this example:
creating p is safe, but writing to or reading from it is not), a language
would have to know:

1. That your code is running in kernel mode, that is, the entire concept of
ring 0 vs ring 3.

2. That the VGA spec specifies that location in memory.

Yeah, in _theory_, you could have a language that does this, but that'd tie
your language so, so, so deeply to each platform, that it's not feasible.

This can be extrapolated to all kinds of other low-level things.

~~~
jacquesm
> That your code is running in kernel mode, that is the entire concept of ring
> 0 vs ring 3.

That need not be the case though. You could have a kernel side allocator that
sets up the MMU to map that memory to a pointer that you return which lives in
the space of the process. The MMU would take care of the required arithmetic
to access the memory at its actual location using an offset.

That way you can map resources from real addresses into arbitrary addresses on
the user side.

I think the correct term for this mechanism is 'system address translation'.

~~~
steveklabnik
The language would still have to understand all of that in order to write that
kernel side allocator in safe code.

~~~
jacquesm
I don't see how that follows. The language can't possibly understand the
intricacies of what the MMU is capable of (besides, every MMU is different),
and as far as the language is concerned what is returned is simply a valid
offset and a length to go with it to indicate where the allocated segment
ends.

~~~
steveklabnik
I think you're strongly agreeing with me. It's not feasible to have in the
language.

------
mcguire
"_The bug was a missing annotation, and the result was that users of Rust's
stdlib could compile some incorrect programs that violated memory safety._"

IIUC, technically, the bug was a missing _implementation_ of a trait and the
result was a data race (which I (weirdly, maybe) don't think of as memory
safety).

In other words, TL;DR: magic is neat, except that sometimes it really sucks.

I may have misunderstood Ralf's bug. Is it really the case that MutexGuard<T>
was seen as Sync if T was _Send_, rather than Sync? Wouldn't that be a bigger
problem than just the case of MutexGuard?

~~~
vitalyd
> I may have misunderstood Ralf's bug. Is it really the case that
> MutexGuard<T> was seen as Sync if T was Send, rather than Sync? Wouldn't
> that be a bigger problem than just the case of MutexGuard?

So T: Sync if &T: Send. MutexGuard internally contains a &Mutex<T> (and
Poison, but that's irrelevant here). T was Cell<i32>. If you follow the rabbit
hole, you'll net out that T was Send, and therefore MutexGuard was Sync.

~~~
grogers
My confusion (and I suspect others) is about what it means for &T to be Sync.
Cell<T> isn't safe to be shared across threads (so isn't Sync) but it is Send
if T:Send. But that means &Cell<T> is Sync? You can share a reference to
something across threads but not the thing itself? What does that even mean?

You could imagine an alternate world where MutexGuard is Send, to allow
transfer of ownership of a lock to a different thread while keeping the mutex
locked. But that would mean &MutexGuard is Sync, WTF?

~~~
dbaupp
The syntax was a bit confusing: T: Sync means (&T): Send and also (&T): Sync.
T being Send or not doesn't affect the threadsafety of &T (Send is about
transferring ownership which cannot happen with a &T).

You are correct to be confused about (&Cell<i32>) possibly being Sync, because
the assumptions that were implied were wrong: when Sync talks about sharing a
T, that can be entirely thought of as transferring a &T to another thread (aka
Sending the &T). In this sense, sharing a &T between threads (as in, (&T):
Sync) is the same as transferring a &&T to another thread, but the inner &T
can be copied out so the original &T was also transferred (not just shared)
between threads; that is to say, (&T): Sync is 100% equivalent to T: Sync.

Anyway, back to the example here, Cell<i32> is not Sync, so neither is
&Cell<i32>, but M = Mutex<Cell<i32>> is Sync (this is a major reason Mutex
exists in that form: allowing threadsafe _shared_ mutation/manipulation of
types that do not automatically support it), and thus &M is Sync too. Since
MutexGuard<Cell<i32>> contains &M, it was thus automatically, incorrectly
Sync.

For your second confusion, it is okay for &MutexGuard to be Sync, if
MutexGuard itself is. The problem here was MutexGuard was Sync incorrectly in
some cases. (MutexGuard is semantically just a fancy wrapper around a &mut T,
and so should behave the same as that for traits like Send and Sync.)

------
mcguire
Wait just a minute. Ralf Jung writes,

"_This means that the compiler considers a type like MutexGuard<T> to be
Sync if all its fields are Sync._"

Is that true in general? Is a type thread safe if all its fields are thread
safe individually?

~~~
dbaupp
Send and Sync are about data races, which lead to memory unsafety, not other
forms of thread safety (like deadlock freedom, or maintaining non-unsafe
relationships between fields). If there's no unsafe code, then there's no way
to have a data race when the individual components are also data race free.

~~~
vitalyd
Somewhat tangential, but what ensures memory visibility in Rust? Say I
allocate a struct (heap or stack), and then pass an immutable reference to a
function that takes T: Sync. Assume the struct itself is Sync (e.g. bunch of
integer fields). What ensures that the other thread sees all writes to this
struct prior to the handoff?

~~~
dbaupp
It is the responsibility of cross-thread communication abstractions to use the
right fencing (if it is touting itself as safe), probably with the various
things in std::sync (especially ...::atomics) if it is pure Rust. For
instance, spawning a thread, using a channel (std::sync::mpsc) or a mutex all
do such things.

Just calling a function taking T: Sync doesn't need to do any of this, since
that call happens all on a single thread. The function might do it internally
if it needs to, but that is its own explicit implementation decision.

~~~
vitalyd
Ok, that's what I figured - thanks.

That does bring up the question, though, whether it's correct to say that a
Sync type doesn't permit data races. In the example I gave above, publishing a
Sync struct incorrectly can exhibit data race like symptoms on the receiving
thread. So even though the type itself is Sync, that's not enough of a
guarantee in the face of "unsafe" publication.

~~~
burntsushi
It is a guarantee---or else it's a bug (which is the same as every other safe
foundation). A type T gets to be `Sync` in one of two ways:

1. It is "auto derived" when all of its constituent types are Sync.

2. It is explicitly implemented using `unsafe impl Sync for T {}`. Note the use of the `unsafe` keyword.

~~~
vitalyd
Right, but my question isn't about T itself, but rather how it's published to
another thread. The example I gave is of a plain struct with no atomics or any
other synchronization types internally. A &T is auto-derived to be Sync. But,
if a publisher incorrectly publishes this reference, the other thread may see
a partially initialized value.

~~~
Manishearth
There are three ways of sharing data across threads.

One is by sharing the data with the thread when it is spawned via a closure.
Spawning will fence. No problem there.

The second is to use a good 'ol Sender/Receiver channel pair. These are
effectively a shared ring buffer that you can push to and pop from. They also
have a fence somewhere.

Finally, you can stick your data into a mutex shared between threads (and let
the other thread wait and read it). This will IIRC fence, or do something
equivalent.

You can of course build your own ways to do this, but they will need unsafe
code to be built (the three APIs above are also built with unsafe code). It is
up to you to ensure you handle the fences right when doing this.

The responsibility here is on the publishing mechanism. Most folks use one of
the three ways above using primitives from the stdlib depending on the use
case.

~~~
vitalyd
Yeah, I understand and what I expected to be the answer. My point is that when
people talk about Sync not allowing data races, there's the asterisk attached
to that statement. That footnote is that publishing code, which is completely
separate from the type itself, needs to uphold its responsibility. Unsafe code
is usually discussed in light of raw pointers and more generally raw memory
ops, but I rarely see this aspect mentioned.

~~~
burntsushi
My point above was subtle but important: the asterisk you're mentioning here
is _not specific_ to Sync. This is true of _all_ safety guarantees in Rust.
unsafe code must uphold the invariants that safe code will rely on, otherwise
it's buggy. For example, if the implementation of `Vec<T>` accidentally got
its internal `length` out-of-sync with the data on the heap, then nothing bad
in and of itself necessarily happens immediately. The bad thing only happens
the next time you try to do something with the `Vec<T>`, which will be in safe
code.

The safety of things in Rust is built on abstraction. If abstraction gets
something wrong in its `unsafe` details, then there is a bug there. In other
words, the asterisk you're mentioning is "You can trust that safe Rust is free
of memory safety, _unless there are bugs_." I feel like that's discussed and
acknowledged quite a bit.

> Unsafe code is usually discussed in light of raw pointers and more generally
> raw memory ops, but I rarely see this aspect mentioned.

I guess I don't see a difference. The safeness of Rust code depends on the
correct use of `unsafe`, and this applies to everything, not just Sync.

This idea of `unsafe` being "trusted code" is one of the first things that the
Rustonomicon covers: [https://doc.rust-lang.org/nomicon/safe-unsafe-meaning.html](https://doc.rust-lang.org/nomicon/safe-unsafe-meaning.html)

~~~
vitalyd
> I guess I don't see the difference

At a high and general level, yeah, it's all "unsafe". But, most conversations
about unsafe don't talk about this aspect. So while what you say is true, I'm
merely pointing out that this concurrency aspect doesn't seem to be mentioned
much. And while it's implied at a high level, I think it's worth mentioning.

Basically, there's no issue - I just think this should be called out more when
concurrency is discussed.

~~~
Manishearth
You need raw pointer ops (or, well, dealing with UnsafeCell, which does raw
pointer stuff), or syscalls for these too.

Concurrency is not special here. There are all kinds of invariants unsafe code
might be required to uphold. So yeah, we could mention concurrency, but then
we could also mention UTF8, noalias, initialization, the vector length
invariant, the HashMap robin hood invariant, various BTreeMap invariants, etc
etc. "Make sure you have fences" is just another semi-specific invariant.

I disagree that "most conversations about unsafe don't talk about this
aspect", compartmentalizing unsafe invariants is a major part of these
discussions (it's like the first chapter of the nomicon, even)

~~~
vitalyd
> Concurrency is not special here

I beg to differ. Concurrency comes with its own bag of hazards, as I
mentioned in my reply to burntsushi. Comparing its invariants with Vec's
length invariant/HashMap's RH invariant, and any other single-threaded/internal
invariants misses the point.

Unsafe discussions that I've seen rarely talk about fences - they tend to
focus on raw pointer ops, ffi, transmutes, unbound lifetimes, and in general,
are single thread focused.

~~~
Manishearth
Right, but I can make the same point about other invariants. Each comes with
its own bag of hazards. You can write pages about the robin hood invariants.

Concurrency is particularly complex, perhaps. I think one of the reasons you
don't see that much discussion of this is that in general folks in Rust don't
write that many internally-unsafe concurrent abstractions. There are a bunch
of great safe building blocks out there (stdlib ones, rayon, crossbeam) which
folks use for concurrency; it's very rare to build your own. So that might be
it.

At least with the stuff I work on like 50% of the unsafe Rust discussions have
been around thread safety and ordering and fences, but we're in that
relatively rare situation where we need to build those abstractions, so
perhaps it's just me who sees these discussions happening.

-----

It's also probably just that discussions introducing unsafe will deal with
problems people are used to -- and memory safety is a far more "normal"
problem than thread safety.

------
lightedman
And this is why I stick with ASM - I don't have to rely upon everyone else not
screwing the pooch when it comes to them developing a language - I just talk
straight to the computer, nothing gets lost in translation, my programs are
200x smaller and 400x faster than anything written in Rust.

2D Second Life clone, with full programming capability with built-in database
- 2 megabytes. Solid ASM. Rust can't even come close, and never will.

~~~
runeks
> [..] my programs are 200x smaller and 400x faster than anything written in
> Rust.

And take 100x longer to develop :)

~~~
lightedman
Nope. Once I start typing the code simply flies.

