
The Rustonomicon: The Dark Arts of Advanced and Unsafe Rust Programming - jvns
https://doc.rust-lang.org/nomicon/README.html
======
Gankro
Hey there, I wrote pretty much all of this. Sadly I wasn't ever able to finish
it, and won't ever be able to due to contractual obligations. It has since
languished due to lack of maintainership, even though it's part of the
official rust docs.

If anyone wants to take up the mantle and clean it up, it would be greatly
appreciated!

~~~
binarycrusader
Are you writing a book instead? If so, looking forward to it.

If not, I'm completely at a loss to understand why contractual obligations would
prevent you from writing something.

~~~
loeg
Perhaps the contractual obligations tie up all of his or her time.

~~~
Gankro
These answers basically cover it.

~~~
coolsunglasses
I, too, fear the wrath of Lord Cthulhu.

------
Animats
_" However BTreeMap is implemented using a modest spoonful of Unsafe Rust
(most collections are)."_

Unsafe pure Rust code reflects a lack of expressive power. How raw memory
becomes a typed object is a touchy area for arrays. To do collections safely,
you have to be able to talk about an array which is partially valid. This is
quite possible, but you need formal verification like constructs to do it. You
need to be able to express that a slice of an array is valid, even though the
whole array isn't.

If you can talk about a slice being valid, and an element being valid, you're
almost there. It's a theorem that if an element is adjacent to the end of a
slice, the element has some property, and all elements in the slice have that
property, then a slice containing the original slice and the element has that
property. Using that, you can prove by induction that as you grow a
collection, the elements in use remain valid. If the access primitives don't
let you access outside the valid range, they're safe.

With some extra predicates and a modest theorem prover, much unsafe code in
the collection area could probably be proven safe.

Rather than describing unsafe code as a "dark art", it would be more useful to
try to formalize safety in this way. The theory is well understood, and there
are many proof of correctness systems around for other languages. It might
require a lot of annotation in unsafe code, to help the prover along, but
that's not a bad thing.

~~~
hinkley
Ouch. If data structures require unsafe operations, doesn't this make Rust's
safety a 90/10 solution, and leave a small but credible threat of what are
effectively buffer overflow attacks?

Look at all of the programming languages that have had to reimplement their
hash tables because someone figured out that you can DDOS a web server by
sending requests with just the right query parameters. If there's a bug in
hash resizing and it's unsafe code, what could someone make of that?

~~~
dbaupp
_> If data structures require unsafe operations, doesn't this make Rust's
safety a 90/10 solution, and leave a small but credible threat of what are
effectively buffer overflow attacks?_

Somewhat glibly, every solution is a 90/10 solution (with different exact
numbers of course): there has to be some assumptions/assertions buried in the
system somewhere. For instance, correctness of the runtime/built-in types in
"safe" languages like Python or Java or Haskell, behaviour of the operating
system when doing syscalls (or behaviour of specific machine instructions), or
even bug-free-ness of a theorem prover used. Obviously different classes have
different rates of errors, but it is still very useful to reduce and focus the
places where incorrect assumptions can lead to "critical failures" (memory
safety ones) even if one doesn't/can't sink the effort/money into reducing it
to (essentially) zero. All of these systems are a tradeoff in some sense,
between guaranteed-correctness and things like performance, "productivity",
and cost of development.

Rust's power is its ability to significantly narrow the places where memory
unsafety is a risk without imposing a cost ("zero cost abstractions"), while
still giving all the conventional control and power needed for systems
programming. There is of course the risk of those places having bugs, but
they're explicitly marked, and exhaustively testing/auditing the 100 lines of
code that build a safe abstraction is easier than doing the same for the whole
1000- or 10000-line application that uses it.
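
As a small illustration of that narrowing (a hypothetical helper, not from any library): the `unsafe` block below is two calls to `get_unchecked`, and the length check just above it is the entire argument for why they are in bounds. Callers only ever see a safe signature, so auditing this one function covers every use of it.

```rust
/// Hypothetical helper: returns references to the first and last elements.
/// The `unsafe` is confined to one block, justified by the check above it.
pub fn first_and_last<T>(v: &[T]) -> Option<(&T, &T)> {
    if v.is_empty() {
        return None;
    }
    // SAFETY: v is non-empty, so indices 0 and len - 1 are both in bounds.
    unsafe { Some((v.get_unchecked(0), v.get_unchecked(v.len() - 1))) }
}
```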

 _> that you can DDOS a web server by sending requests with just the right
query parameters_

The HashDOS problem is not a memory safety one. In some sense, it isn't even a
bug in the implementation but rather the design. The problem arises from using
a poor/predictable hash function that allows an attacker to construct many
values with the same hash, even if the hash table and the hash function itself
are implemented 100% to spec. It is... difficult for a programming language to
protect against spec bugs, especially because what is a bug/not helpful
sometimes might be desirable at other times.

(Incidentally, Rust's HashMap actually defends against this by default, using
SipHash with random keys, which is why it lags behind, say, C++ in some
benchmarks that use the default data structures.)
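
A rough sketch of where that default lives, assuming Rust 1.71+ for `BuildHasher::hash_one`: `HashMap::new()` builds a `HashMap<K, V, RandomState>`, and each `RandomState` is seeded with random keys (the std docs say the algorithm is currently SipHash 1-3, but that this is subject to change). The `FixedKeyMap` alias is a hypothetical opt-out shown only for contrast: every `DefaultHasher::new()` uses the same keys, so its hashes are predictable, which is the property a HashDoS attack relies on.

```rust
use std::collections::hash_map::{DefaultHasher, RandomState};
use std::collections::HashMap;
use std::hash::{BuildHasher, BuildHasherDefault};

/// What `HashMap::new()` gives you: each `RandomState` carries random keys.
pub type SeededMap = HashMap<String, u32, RandomState>;

/// Hypothetical opt-out, for contrast only: all `DefaultHasher::new()`
/// instances share the same fixed keys, so hashes are predictable.
pub type FixedKeyMap = HashMap<String, u32, BuildHasherDefault<DefaultHasher>>;

/// Hash a key the way a map built from `builder` would.
pub fn hash_with<S: BuildHasher>(builder: &S, key: &str) -> u64 {
    builder.hash_one(key)
}
```

The randomized default costs some speed, which is the benchmark gap mentioned above; programs that trust their inputs can swap in a faster `BuildHasher`.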

~~~
XMPPwocky
For example:

I found a trivially exploitable buffer overrun in the Source game engine. The
cause? Somebody used strcpy instead of strncpy... when adding an animated
ellipsis at the end of a string.

You really have to _try_ in Rust to get pwned adding an ellipsis to a string.

(And yes, tools should (and probably would) have caught that bug. But they
either weren't used, or didn't catch it, so...)
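
To make that concrete, here is a hypothetical `animated_label` helper (the name and frame logic are made up for illustration). `String` tracks its own length and capacity, and `push_str` reallocates when it runs out of room, so even a deliberately undersized buffer can't be overrun; the worst case is an extra allocation.

```rust
/// Hypothetical helper: "Loading", "Loading.", "Loading..", "Loading..."
/// for successive animation frames.
pub fn animated_label(base: &str, frame: usize) -> String {
    // Deliberately undersized: capacity is a hint, not a hard limit.
    let mut label = String::with_capacity(base.len());
    label.push_str(base);
    // push_str grows the allocation if needed, so the dots can't overrun it.
    label.push_str(&".".repeat(frame % 4));
    label
}
```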

------
rattray
I really hope nobody mistakes the sexy title of this book as a reason to learn
unsafe rust. If it can be done without this advanced feature, it probably
should.

~~~
pjmlp
Looking at Rust code with the eyes of a Wirth and Xerox Parc language fan, I
still get the feeling that Rust requires too many uses of unsafe code compared
to Ada, Modula-3 and similar.

Especially for basic data structures like trees, graphs and doubly linked
lists.

~~~
geofft
How do Ada and Modula-3 let you implement doubly-linked lists without unsafe
code? I'd guess they don't enforce the constraint Rust does that mutable
references must be unique, but my experience with both C and C++ is that you
want this rule anyway for your code to have any chance of being correct.

In particular, removing a node from a doubly-linked list has three steps:
repoint prev->next at next, repoint next->prev at prev, deallocate. If you can
write this in safe code, it's equally possible to write a procedure that fixes
prev->next and deallocates, but leaves next->prev dangling. What do Ada and
Modula-3 do to prevent this?

(If you're willing to sacrifice efficiency, you can always use reference-
counting or actual GC. Or you can just stuff each node in a dynamically-
allocated vector, and have the prev and next pointers be indexes into that
vector, instead; you'll have semantically-dangling numbers, but you won't have
memory unsafety. Both of these approaches can be done in safe Rust with the
normal standard library.)
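
A minimal sketch of the index-based approach, with hypothetical names (`IndexList`, `Node`): nodes live in a `Vec`, `prev`/`next` are indices, and `remove` performs the same three steps described above. Removed slots are left as `None` and never reused, to keep it short. If a bug skipped one of the repointing steps, a stale index would at worst hit an `unwrap` panic or a `None`, not freed memory.

```rust
/// Hypothetical node type: links are indices into `IndexList::nodes`.
struct Node<T> {
    value: T,
    prev: Option<usize>,
    next: Option<usize>,
}

/// Hypothetical index-based doubly linked list. Invariant: `head`, `tail`,
/// and every live node's `prev`/`next` point at live (`Some`) slots.
pub struct IndexList<T> {
    nodes: Vec<Option<Node<T>>>,
    head: Option<usize>,
    tail: Option<usize>,
}

impl<T> IndexList<T> {
    pub fn new() -> Self {
        IndexList { nodes: Vec::new(), head: None, tail: None }
    }

    /// Appends a value and returns its index (a handle, not a pointer).
    pub fn push_back(&mut self, value: T) -> usize {
        let idx = self.nodes.len();
        self.nodes.push(Some(Node { value, prev: self.tail, next: None }));
        match self.tail {
            Some(t) => self.nodes[t].as_mut().unwrap().next = Some(idx),
            None => self.head = Some(idx),
        }
        self.tail = Some(idx);
        idx
    }

    /// Free the slot, repoint prev.next, repoint next.prev. An index that
    /// was already removed just yields None.
    pub fn remove(&mut self, idx: usize) -> Option<T> {
        let node = self.nodes.get_mut(idx)?.take()?;
        match node.prev {
            Some(p) => self.nodes[p].as_mut().unwrap().next = node.next,
            None => self.head = node.next,
        }
        match node.next {
            Some(n) => self.nodes[n].as_mut().unwrap().prev = node.prev,
            None => self.tail = node.prev,
        }
        Some(node.value)
    }

    /// Walks head to tail, collecting clones of the values.
    pub fn to_vec(&self) -> Vec<T>
    where
        T: Clone,
    {
        let mut out = Vec::new();
        let mut cur = self.head;
        while let Some(i) = cur {
            let node = self.nodes[i].as_ref().unwrap();
            out.push(node.value.clone());
            cur = node.next;
        }
        out
    }
}
```

A longer-lived version would usually keep a free list so slots get reused; at that point a stale index can silently refer to a different node, which is why some arena designs add generation counters to each handle.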

~~~
moosingin3space
If I'm not mistaken, Modula-3 is garbage-collected, so doubly-linked lists are
implemented with GCed pointers like in Go.

------
arc0re
I'm a Lovecraft fan, never really touched Rust, and a name like "Rustonomicon"
could really get me into it, hehe.

