Oh, that's interesting. I follow Rust development with interest but have only built a few toy projects in it myself, and I was definitely under the impression that these sorts of issues resided solely in `unsafe { ... }` blocks. But, yeah, with this example it makes sense.
I'm relieved to hear that at least it doesn't pollute beyond module boundaries.
But now I'm questioning what the difference is between `unsafe {...}` code and within-module code that accesses it, if the latter also has to deal with memory safety issues? Should `unsafe` only be declared at the module level then?
The idea with unsafe{} is that, while you might temporarily break an invariant (which Rust assumes for its memory-safety) while inside the unsafe{} block, you will restore that invariant before the end of the block.
In other words, when you use unsafe{} code idiomatically, there shouldn't be anything like uninitialized-yet-accessible memory buffers "leaked out of" the unsafe{} code into safe code, even within the same module. The safe code should only see things it's possible to get from talking to safe code.
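For instance (a minimal sketch of my own, not taken from the article), appending through a raw pointer temporarily breaks Vec's "the first `len` elements are initialized" invariant, but restores it before the unsafe{} block ends:

    // Sketch only: the invariant is broken in the middle of the block and
    // restored before the block ends, so safe code never observes the gap.
    fn append_copy<T: Copy>(v: &mut Vec<T>, src: &[T]) {
        v.reserve(src.len()); // invariant still intact here
        unsafe {
            let old_len = v.len();
            // Invariant broken: `len` now claims slots that aren't written yet.
            v.set_len(old_len + src.len());
            std::ptr::copy_nonoverlapping(src.as_ptr(), v.as_mut_ptr().add(old_len), src.len());
            // Invariant restored: all `len` elements are initialized again.
        }
    }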
The problem, of course, is that there's no way to enforce only this idiomatic (and rather strict) usage of unsafe{}.
Well, that, and there may be some data structures that are fully impossible to implement with their promised complexity guarantees if you have to restore memory-safety invariants after every atomic step. This is justification enough for leaving unsafe{} the way it is, from my perspective; this is the same "leave it to the professionals" kind of code you tend to see in every standard library's data structure implementations, and not at all similar to the kind of code you'll ever need to write yourself (in 99% of cases.)
I guess there's a lesson there, though: no, Rust will not magically protect you any more than C will when you wade into the "thorns" of domains like scientific computing or GIS. If you're implementing data structures that "must" reveal dirty or indeterminate state to the world to achieve full efficiency, there's not much you can do to be "fully" memory-safe. You can be safe at the higher level of your module boundary, but there's really no way to achieve the same elegant lexical bound on your invariants that you can with idiomatic unsafe{} code. When you leak things out of an unsafe{} block, you're taking the type system into your own hands, and it's up to you to re-establish the invariant at all dynamic module entry- and exit-points.
No, I don't think that it should. Note that the only reason the semantic invariant that len/cap aren't mutated is relevant to type safety is that the unsafe blocks depend on it to do the right thing.
The reason for the phenomenon described in the article is not that it's possible to do unsafe things outside unsafe blocks. It's that, in practice, unsafe blocks often depend on invariants that are only enforced by module-level privacy.
It could easily go past module boundaries. The point of the article is that the "unsafety" extends however far the invariants the unsafe code depends on extend.
I could have an unsafe function that depends on a set being non-empty. And it could share that set with, say, a factory in a different module. So someone could mess with that code and cause the invariants to be violated, and suddenly the code is broken.
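A hypothetical sketch of what I mean (all the names here are made up):

    mod queue {
        pub struct Queue(pub(crate) Vec<u32>);

        impl Queue {
            /// Safety: the queue must be non-empty; nothing in the types enforces it.
            pub unsafe fn front_unchecked(&self) -> u32 {
                // caller promised non-emptiness
                unsafe { *self.0.get_unchecked(0) }
            }
        }
    }

    mod factory {
        // Safe code in a different module that shares the queue can empty it,
        // invalidating the documented assumption without touching `unsafe`.
        pub fn reset(q: &mut crate::queue::Queue) {
            q.0.clear();
        }
    }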
Unless you have a language that is capable of expressing proofs (e.g., Idris), you have to depend on good documentation and careful/tasteful design to prevent this kind of problem.
This is not a counter-example to the claim you are citing.
If you added this form of `evil` to `Vec`, then since `evil` is within the module, the proof of correctness has to cover `evil`. Doing that proof will fail. So, it is not necessary to look outside of the module to notice that something is wrong.
Of course, if `len` were public, that would be a different story. Unsafety doesn't pollute beyond module boundaries if all the extra invariants are about private fields.
I've been wondering if it would make sense to encapsulate internal fields that are part of the unsafe machinery in a type forcing you to use unsafe blocks, even inside the type impl. [edit: Removed note about UnsafeCell since it's not really like the thing described above]
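Roughly what I have in mind (just a sketch; an `UnsafeField` like this doesn't exist in std, and it would live in its own tiny module so its inner field stays private):

    // Sketch: reads are safe, but constructing or mutating requires `unsafe`,
    // even from code inside the data structure's own impl.
    pub struct UnsafeField<T>(T);

    impl<T> UnsafeField<T> {
        /// Safety: `value` must satisfy whatever invariants the field carries.
        pub unsafe fn new(value: T) -> Self {
            UnsafeField(value)
        }

        pub fn get(&self) -> &T {
            &self.0
        }

        /// Safety: the caller must restore the invariants before safe code
        /// can observe the value again.
        pub unsafe fn get_mut(&mut self) -> &mut T {
            &mut self.0
        }
    }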
I haven't really played with it though, most of my unsafe use is FFI.
You can, but IMO it's not so necessary. The module boundary thing doesn't cause many issues in practice and is straightforward to reason about if you're aware of it.
(And if you weren't aware of it you would probably not use `UnsafeField` either, since the compiler can't force you to do that)
I'm more thinking about the documentation effect of such a wrapper type. Both in the data structure and in the parts of the code that are forced to use unsafe.
So, one question might be whether it's possible to encode some of these implicit invariants into the type so that the "evil" function would actually fail to compile.
Another possibility that doesn't require such heavy language machinery is something like OCaml's private types, where the scope of unrestricted access is unsafe blocks rather than the body of a module. In this approach there is no type-level correspondence between the value of the length member and the extent of the array: private types only ensure that any potentially invariant-breaking mutation is restricted to a certain scope, not that the code is correct.
That's quite possible, and most proof of correctness systems can do it, but there's no syntax in Rust for talking about the necessary object invariants.
Object invariants are assertions which must be true whenever control is not within the object. The object invariants must be true when control leaves the object by exiting any public function. Nothing outside the object must be able to alter the variables involved in the object invariant. Only one execution thread can be in the object at a time.
If you can demonstrate all of those conditions, assuming the invariant is true at entry to the object, you have proved the object invariant.
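A minimal sketch of mine (single-threaded) of those conditions in Rust terms:

    pub struct Even {
        n: u64, // object invariant: `n` is always even
    }

    impl Even {
        pub fn new() -> Even {
            Even { n: 0 } // invariant established at construction
        }

        pub fn add_two(&mut self) {
            self.n += 1; // invariant temporarily broken while control is inside the object...
            self.n += 1; // ...and re-established before this public function returns
        }
    }

    // `n` is private, so nothing outside this module can alter it; together with
    // single-threaded access, that covers the remaining conditions above.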
This is a useful and well-understood concept in the proof world, but isn't seen much in programming languages.
Since Rust supports concurrency, it is worth mentioning that these techniques have been scaled to handle objects that can be accessed from multiple threads at once - while maintaining the modularity of the approach. See, for example, <http://plv.mpi-sws.org/iris/> and <http://software.imdea.org/fcsl/>.
Does it make sense to think that if something takes a mutable instance of the "unsafe" structure, then whatever it does to it could be "unsafe"?
So it seems that, to fix it by hand, one could wrap all the calls that touch instances of this Vec with safe validation code -- basically guard code that runs at runtime to prevent corruption from spreading. Can this be easily done at compile time?
&mut vs. & isn't fundamentally about mutability: having unique access (as in &mut) does imply that mutation will be "safe" (as in, the act of mutating won't cause problems, although it may violate invariants in whatever is being mutated), but types with internal mutability allow safe mutation even when shared (as in &); examples are Cell, RefCell, Mutex and the atomics.
Fields wrapped in these types can have invariants that can be violated even through a shared reference, e.g. the sketch below.
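A contrived example of mine (not from the article): a counter whose `Cell` field is supposed to stay at or below a cap, yet safe code holding only `&` can push it past:

    use std::cell::Cell;

    // `count` is meant to stay <= `cap`, and unsafe code elsewhere might rely
    // on that, yet safe code holding only `&Counter` can still break it.
    struct Counter {
        cap: usize,
        count: Cell<usize>,
    }

    fn bump(c: &Counter) {
        // Mutation through a shared reference: the compiler is happy, but the
        // `count <= cap` invariant is silently violated once `count == cap`.
        c.count.set(c.count.get() + 1);
    }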
> So it seems that, to fix it by hand, one could wrap all the calls that touch instances of this Vec with safe validation code -- basically guard code that runs at runtime to prevent corruption from spreading
Yes, but unfortunately one can't always detect the corruption easily or cheaply: there's no guarantee of a detectable runtime difference between push_good and push_bad without introducing overhead, since there's no way to look at an arbitrary memory location and tell whether it has been initialised or not:
    /// Push the value x onto the end of the vector
    fn push_good(&mut self, x: T) {
        // need space for one more element
        self.reserve(1);
        unsafe {
            // move x into the space we just reserved
            ptr::write(self.ptr.offset(self.len as isize), x);
        }
        // register that there's a new element
        self.len += 1;
    }

    // just like `evil`
    fn push_bad(&mut self, x: T) {
        self.reserve(1);
        // don't bother writing `x`, too slow!
        self.len += 1;
    }
Also, there are various opt-in modes that will try to detect problems in this vein automatically (opt-in because they have non-trivial overhead). For example, the debug_assertions mode built into rustc (which enables integer overflow checking, and any `debug_assert!`s in libraries), and sanitisers/checkers designed for C/C++ such as Valgrind (which works with Rust), TSan and ASan (which don't yet work).
That's called "Design By Contract". You formally specify pre/post-conditions of functions and invariants of classes (such as "cap >= len" in this case), and have the compiler enforce that automatically, e.g. by inserting runtime checks.
I don't think this is very surprising; if you do something unsafe haphazardly within an unsafe block, you can put the program in a state a safe program would not be able to reach, so further computation is in an unsafe state.
In fact, isn't this why we have unsafe in the first place; so that such code can be more easily verified? I disagree with the author's suggestion that the problem isn't in the unsafe code.
The thing is that it's not always possible to scope unsafety to unsafe blocks. The example of vec in the post above isn't about haphazard unsafe code, it's about code within Vec's module boundary being able to mess up invariants without needing to use unsafe.
This could be fixed with things like "unsafe fields" and "unsafe types", but IMO that's not really necessary.
To reach a conclusion here, we will have to agree on what it means to be the "cause of the problem". It is true that if a program crashes, then there has to be some unsafe code in that program. This statement is however not very helpful, because pretty much all programs use libstd which contains unsafe code.
So, I'd define the "cause" of a bug to be the piece of code that you fix to get rid of the bug. If `evil` was in libstd, the fix would be to remove `evil` (or somehow make it take a default value, grow the `Vec` to be large enough and fill the two new slots with that default value). In that sense, the problem here clearly was in the safe code `evil`, since it's that code which had to be changed to get rid of the problem.
The guarantee I prefer is "If there's a segfault and you didn't write `unsafe`, it's not your fault". This is not necessarily referring to a single change, but rather the application as a whole. Dependencies like libstd are ignored here.
As soon as there's any `unsafe` in your code, you've unlocked hard mode and everything is probably your fault.
I tried suggesting this directly on GitHub. Here was the response:
> There's an old idea of unsafe fields that would come up for datatypes. It can be emulated with a Unsafe<T> wrapper that has unsafe methods to get references to its interior, or to read/write. It's not widely used, or used at all afaik, since the gains aren't that great. People writing unsafe code generally know the issues, and that privacy is the #1 way the abstraction boundary is maintained.
How exactly is this notion of “semantic types” formalized? Is it something like NuPRL, where the base language's type system isn't sufficiently precise to rule out all misbehaviors, and on top of that a more precise type system is added, whose typing rules might be defined in terms of the base language's evaluation/execution rules?
> The scope of unsafe ends at the next abstraction boundary. This means that
> everything outside of the std::vec module does not have to worry about Vec.
> ...
> Of course, this also means that everything inside std::vec is potentially
> dangerous and needs to be proven to respect the semantics of Vec.
So wait, in a nutshell, a module is 'safe' if and only if you formally prove that every public interface into the module and every way that public interface can be invoked... is error free.
How is that any improvement on any other language?
If the burden of 'safety' is formal proof of the entire module, then you're (surely) no better off than using C++ and doing exactly the same thing.
I mean, obviously the borrow checker can help to some extent, but what you're basically saying is that it's not enough; you can't trust the borrow checker for safety; you must formally verify a module in order to know it's safe, if it contains any unsafe code.
In other words, if your rust program has any module in any dependency that has unsafe code (i.e. every rust program), it is potentially unsafe, regardless of the borrow checker (because some code path may invoke a 'safe' function that has not been formally verified to be safe, and results in undefined behaviour despite being safe).
I'm not really sure what the problem is here: having to formally verify a module doesn't seem fundamentally different from formally verifying a single function. I mean, sure, it's some more code, but there's still well-defined containment. And, being "allowed" to reason about a whole module (well, usually one just cares about a whole type, but these often match, especially for unsafe code) seems far more useful: one can build far more interesting abstractions. If one was forced to reason about a single function at a time, Vec couldn't exist in a useful way, as every function would have to assume the incoming Vec value could be arbitrarily invalid.
In any case, there are many, many modules with no `unsafe` code, e.g. std::option has none [1], iron::response has none [2], image::jpeg has none [3] (just some random examples). These are automatically safe, if you assume that the code they call is safe, which seems to me like the only assumption that can make sense: if the safe modules call unsafe ones that aren't safe... well, the problem is in the unsafe ones, not the safe ones.
What I'm trying to say is this paragraph doesn't seem to change anything fundamental about the objections to safe vs. unsafe you sometimes write about; it is not introducing anything new, and so the refutations you often receive still apply.
Well, given the most common refutation of my concerns is 'you just have to make sure your unsafe blocks are verified to be correct', and the point of this article is explicitly that this isn't the case...
Now it's that you have to make sure all your code is correct, not just the unsafe blocks.
/me shrugs
I think that's lame, and it breaks the promises rust is trying to make about writing safe, secure software.
If safe code may not be safe, what's the point of it at all?
> Now it's that you have to make sure all your code is correct, not just the unsafe blocks.
You appear to be misunderstanding the article, which likely explains all the downvotes.
Rust keeps you from doing memory-unsafe operations unless you use an `unsafe` block somewhere. Once you do, then it is your duty to ensure that those `unsafe` blocks don't cause memory unsafety, regardless of what the rest of your program does. Following from that, if the correctness of your `unsafe` block relies on invariants in your data, then it is your duty to keep your data private so that those invariants cannot be broken by consumers (or alternatively, provide safe accessor functions that allow consumers to fiddle with things while performing runtime checks to ensure invariants are upheld).
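For example (a sketch with hypothetical names, not from the article), a field with an invariant stays private and consumers go through an accessor that enforces the invariant at runtime:

    pub struct Buffer {
        data: Vec<u8>,
        valid_len: usize, // invariant relied upon by unsafe code: valid_len <= data.len()
    }

    impl Buffer {
        pub fn set_valid_len(&mut self, n: usize) -> Result<(), &'static str> {
            if n <= self.data.len() {
                self.valid_len = n; // invariant upheld
                Ok(())
            } else {
                Err("valid_len must not exceed the underlying buffer")
            }
        }
    }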
No, it doesn't break the promises in any way. If all your code does potentially dangerous memory manipulation, then there's very little hope that there will ever be an automatic checker for the safety of your program.
The promise of Rust rests on the assumption that most code is not like that. There are a few fundamental data structures that do crazy stuff, and okay, we have to use `unsafe` there - and if we want to have any formal guarantees, we have to do a formal proof. If we are happy with less, we just manually audit that data structure extra carefully. We also put every such data structure in its own module, which carefully limits the amount of code that is exposed to this unsafety.
However, most of the code will just use those data structures, and that's where Rust gives you a safety guarantee. Combine `Vec` and `HashMap` and all the other types from the standard library in any way you want, hammer them from multiple threads, send stuff across channels, whatever - you can do all of that safely, and if your code crashes, you don't have to look at all this code because you did not write `unsafe`, only the standard library did - and you are in a separate module.
> Now it's that you have to make sure all your code is correct, not just the unsafe blocks.
This is not correct. As I wrote in the post, you have to check all code within a module that contains `unsafe`. Most of the modules people write do not contain `unsafe`, and hence do not need to be checked.
I'd be curious to learn why you think this is the case.
The point I'm making, and this surprised me, is that the onus on someone writing a module with unsafe code to ensure it does not violate memory safety is much higher than I realized.
It is not 'all rust' you must verify; just all rust that touches unsafe code.
You might argue this is nothing new, and indeed if you read this comment thread you'll see the suggestion that the difference between verifying an entire module and verifying only the unsafe blocks in it is insignificant.
My point is that there is at least an order of magnitude more safe code than unsafe code in an unsafe module, and I suspect it is not scrutinized nearly as much as the unsafe code.
There are a lot of crates that use unsafe code in one form or another; and it troubles me (although no one else cares) that they are probably more dangerous than I realized.
'Just verify the unsafe blocks' is something I've literally heard people say.
> There are a lot of crates that use unsafe code in one form or another; and it troubles me (although no one else cares) that they are probably more dangerous than I realized.
Be careful here, module != crate. Most crates consist of many modules. Unsafe leaks into the module, not into the entire crate.
> 'Just verify the unsafe blocks' is something I've literally heard people say.
Well, so did I - that's why I wrote the post.
> My point is that there is at least an order of magnitude more safe code than unsafe code in an unsafe module, and I suspect it is not scrutinized nearly as much as the unsafe code.
I think many developers are aware that the safe code inside an "unsafe module" needs just as much scrutiny. The Rustonomicon documents this, now my post does, too.
If you used to assume that only literally those blocks need verification - yes, you underestimated the effort. I think it is still manageable though, modules (like Java classes) are kind of a unit of programming that people can get in their head "as a whole". This is important not just for the safe/unsafe discussion, this is important because modules form the privacy boundary in Rust. Even when you only do safe stuff, you care about the abstraction boundary because you want to hide implementation details behind it.
A lot of people care: e.g. the Rustonomicon[1] spends a lot of time caring about unsafe code (that is one of the main reasons for it to exist), and the massive RustBelt project[2], which Ralf (author of the blog post) is part of, is because people care. Even the lowest level developers (working on OSes) care about being very careful about how they use `unsafe`, e.g. [3]. One thing that some of us are hopeful will come out of RustBelt are more advanced `unsafe` checkers (i.e. some sort of proof assistant, possibly with lints built into the compiler).
In any case, talking about "order of magnitude" misses the point somewhat: there might be far more code, but a lot of it is trivial to verify, e.g. the declaration of a struct doesn't need any verification, nor do comments, and if T, the type with invariants, has no internal mutability (which is fairly typical for unsafe code outside std, IME), then any safe function that takes &T is automatically safe.
As people keep saying, one only has to verify functions that can break invariants, e.g. if you define a type that has invariants needed by `unsafe` blocks, then one has to take a close look at any functions that can modify its internals (i.e. things inside the module where the type is defined).
Functions that can't break these invariants because they don't have access to the internals of a type do not need to be checked: they can be assumed safe.
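Concretely (a sketch with invented names): the first function below needs auditing because it can touch the private fields; the second can't break anything and can be trusted on the strength of the type system alone.

    mod raw_buf {
        pub struct RawBuf {
            data: Vec<u8>,
            init: usize, // invariant that unsafe code in this module would rely on: init <= data.len()
        }

        impl RawBuf {
            pub fn init_len(&self) -> usize {
                self.init
            }

            // Needs a close look: it can modify the fields the invariant is about.
            pub fn mark_initialised(&mut self, n: usize) {
                assert!(n <= self.data.len());
                self.init = n;
            }
        }
    }

    // Outside the module there is no access to `data`/`init`, so this cannot
    // break the invariant and does not need to be audited.
    fn report(buf: &raw_buf::RawBuf) {
        println!("{} bytes initialised", buf.init_len());
    }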
> Well, given the most common refutation of my concerns is 'you just have to make sure your unsafe blocks are verified to be correct', and the point of this article is explicitly that this isn't the case...
And, as my comment says twice, my point is just that the two cases are not very different: verifying all unsafe blocks and verifying the modules that contain `unsafe` aren't that different.
One of the fundamental differences is that in Rust, you only have to prove correct every module that contains unsafe code, and the proof then automatically expands to your entire program. The experience in practice is that most modules - the ones that are "higher up" in the stack of abstraction - do not need unsafe code, and hence they are automatically safe.
If someone proves the Rust libstd correct, and you write a Rust program using only libstd that does not contain unsafe, then your code is guaranteed to be safe.
If someone proves the C++ libstd correct, and you write a program using only the C++ libstd - you still have to prove your code correct. Rust provides safe encapsulation, C++ does not.
It seems like you're assuming the post is saying that you need formal verification to write safe Rust? I don't think that's the case.
Note that the blog post is explicitly in the context of formally verifying Rust and its stdlib. This is not something that's necessary for a "good enough" guarantee that stuff works and is safe. The conclusion you could draw is that formally verifying Rust is just as hard as formally verifying C++ (I'm not sure I agree with this either, but it's really irrelevant), but this doesn't make Rust a worthless language, since formal verification is not what you care about for everyday safety guarantees.
> If the burden of 'safety' is formal proof of the entire module, then you're (surely) no better off than using C++ and doing exactly the same thing.
Well, formal verification is nice, but it's not necessary.
Take the Vec<T> interface for example. There are a couple of unsafe methods in it, and the invariant "len is always a valid length". It's easy to check if len is being modified elsewhere. It's easy to reason about these invariants. And it's easy to reason that it upholds the invariants that Rust expects for safe functions. Formal verification is harder, but practical verification is easier since it's contained in a small surface area.
On the other hand, there's no "containment of unsafety" in C++ and other languages and even practical verification can be hard especially when generics get in the way.
> In other words, if your rust program has any module in any dependency that has unsafe code (ie. every rust program), it is potentially unsafe, regardless of the borrow checker (because some code path may invoke a 'safe' function that has not be formally verified to be safe, and results in undefined behaviour despite being safe).
"Potentially unsafe" for a very small probability. You don't need formal verification to be convinced that something is safe.
It's not practical to formally verify everything. We'd get nothing done if that was the case.
Rust's memory safety promises can be reasoned about clearly without formal verification (which is an arduous process requiring a team of researchers -- basically what RustBelt is doing). This is enough to be convinced that Rust is protecting you from memory corruption. Formal correctness proofs are valuable, but not necessary to ship software.
Also, as others have mentioned, Rust is a strict plus over C++ in formal verification as well since once you've verified a module containing unsafe code, you can forget that it contains unsafe code and use it wherever, and modules with safe code can be used without verification. Once verified, Vec<T> is safe to use however you want as long as it's not used in an unsafe block (if it is, just verify that block or module depending on the context).
> How is that any improvement on any other language?
Rust provides more facilities for exposing efficient, safe interfaces to unsafe code than do many other languages. The existence of safe and unsafe is not particularly interesting by itself.
You don't have to formally verify the entire module. The module "just" has to be correct :P.
Additionally, very often, unsafe code is used in ways that cannot contaminate the entire module (that's the ideal, in fact), and the module that the unsafe code is used in is made deliberately small.
> How is that any improvement on any other language?
In Rust, outside of the std::vec module, unless you type "unsafe" there is no way to corrupt a vector that you have. In C++, that's quite easy: undefined behavior can corrupt anything, private fields or not.