If anyone wants to take up the mantle and clean it up, it would be greatly appreciated!
If not, I'm completely at a loss to understand why contractual obligations would prevent you from writing something.
Unsafe pure Rust code reflects a lack of expressive power. How raw memory becomes a typed object is a touchy area for arrays. To do collections safely, you have to be able to talk about an array which is only partially valid. This is quite possible, but you need formal-verification-like constructs to do it. You need to be able to express that a slice of an array is valid, even though the whole array isn't.
If you can talk about a slice being valid, and an element being valid, you're almost there. It's a theorem that if an element is adjacent to the end of a slice, the element has some property, and all elements in the slice have a property, then a slice containing the original slice and the element has that property. Using that, you can prove by induction that as you grow a collection, the elements in use remain valid. If the access primitives don't let you access outside the valid range, they're safe.
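That induction argument is exactly the invariant real unsafe collection code maintains. Here's a minimal illustrative sketch (a hypothetical `FixedVec`, not std's `Vec`) where the safety invariant is "the first `len` slots are initialized" -- each `push` is the induction step that extends the valid slice by one adjacent element:

```rust
use std::mem::MaybeUninit;

/// Hypothetical sketch: a fixed-capacity stack whose safety invariant
/// is the "valid slice" described above: buf[..len] is initialized.
struct FixedVec<T> {
    buf: [MaybeUninit<T>; 8],
    len: usize, // invariant: buf[..len] is initialized
}

impl<T> FixedVec<T> {
    fn new() -> Self {
        // An array of MaybeUninit may be left uninitialized; len = 0
        // makes the invariant trivially true (the valid slice is empty).
        FixedVec {
            buf: unsafe { MaybeUninit::uninit().assume_init() },
            len: 0,
        }
    }

    fn push(&mut self, value: T) {
        assert!(self.len < 8, "capacity exceeded");
        // The slot adjacent to the end of the valid slice becomes
        // initialized, so the invariant now holds for buf[..len + 1]:
        // the induction step from the argument above.
        self.buf[self.len].write(value);
        self.len += 1;
    }

    fn get(&self, i: usize) -> Option<&T> {
        if i < self.len {
            // Sound because i is inside the valid slice.
            Some(unsafe { self.buf[i].assume_init_ref() })
        } else {
            None
        }
    }
    // A real implementation would also impl Drop to run destructors
    // for buf[..len]; omitted here to keep the invariant visible.
}
```

The access primitive (`get`) checks against `len`, so it never reads outside the valid range, which is the "safe if the primitives don't let you escape" half of the argument.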
With some extra predicates and a modest theorem prover, much unsafe code in the collection area could probably be proven safe.
Rather than describing unsafe code as a "dark art", it would be more useful to try to formalize safety in this way. The theory is well understood, and there are many proof of correctness systems around for other languages. It might require a lot of annotation in unsafe code, to help the prover along, but that's not a bad thing.
On the other hand, collections are embarrassingly testable. It's never been clear to me that formal methods gain much over thorough fuzzing/unit testing in this context.
(Rust's testing falls a fair bit short of thorough, unfortunately, and there have been a handful of memory safety errors in their impls, although I'm not aware of this ever leading to an exploit in an application; most were caught quite early, with the release trains as a buffer.)
Yes, you can asymptotically approach correct with testing, but it turns out that surprises can lurk for quite a long time if that's your approach. Of course, that's how most collections code works and what most of the world is built on, so yes, it is generally "good enough", but it is still worth pointing out that it is not necessarily correct, even after being out in the field for decades.
> I was shocked to learn that the binary search program that Bentley proved correct and subsequently tested in Chapter 5 of Programming Pearls contains a bug.
Rust does a great job of error checking in the 99% case and the 1% of unsafe should be from well vetted libs. No different than trusting the kernel to do the right thing.
Formal proofs: there are specific applications (see Amazon's use of TLA+). I personally feel nearly every company has a small piece of code that you can justify spending a few weeks/months modeling to ensure its correctness. Also, engineers working on medical devices, avionics, robotics, finance, etc. probably get to prove the correctness of their code.
Some might, but the relevant ISO standards are ok with testing if you reach sufficient coverage, so generally testing is all that's happening.
However, the average writer of unsafe code will not use formal verification. For them, the nomicon is a very useful resource. And describing it as a dark art is the best way to do it, because unsafe code can be hard and fraught with pitfalls.
Look at all of the programming languages that have had to reimplement their hash tables because someone figured out that you can DDOS a web server by sending requests with just the right query parameters. If there's a bug in hash resizing and it's unsafe code, what could someone make of that?
I examined every MS security bulletin over a 2 year period. Nearly all the serious ones were regular C/C++ code memory corruption issues. From what I could tell, Rust would have prevented every one.
MS's .NET had a few serious bugs. They were basically all class loader logic bugs (XBAPs getting more permissions, ways to bypass the CAS system). It's similar to the Java plugin issues: Java applets themselves were never exploitable. But the plugin might accidentally let you do things you're not supposed to. The most user-impacting was stuff in ASP.NET, which is more like an app framework, not a runtime. (There were some "load up arbitrary code/read arbitrary file" bugs IIRC, caused by bad crypto and logic.)
I actually found probably the first memory safety CLR bug. If you loaded a function pointer of an un-jitted function, you could end up with a safe/verified pointer (delegate) to unmanaged code. So there's a true issue, where the implementation is letting you escape and do unsafe things while saying it's verified. But, there's basically no way for this to turn into an app-level security issue. The only use was to escape the .NET code-level sandboxing (CAS).
1: Well to my knowledge -- none were published before. But neither was this one so who knows how many there actually were.
Somewhat glibly, every solution is a 90/10 solution (with different exact numbers, of course): there have to be some assumptions/assertions buried in the system somewhere. For instance, correctness of the runtime/built-in types in "safe" languages like Python or Java or Haskell, behaviour of the operating system when doing syscalls (or behaviour of specific machine instructions), or even bug-free-ness of a theorem prover used. Obviously different classes have different rates of errors, but it is still very useful to reduce and focus the places where incorrect assumptions can lead to "critical failures" (memory safety ones), even if one doesn't/can't sink the effort/money into reducing it to (essentially) zero. All of these systems are a tradeoff in some sense, between guaranteed correctness and things like performance, "productivity", and cost of development.
Rust's power is its ability to significantly narrow the places where memory unsafety is a risk without imposing a cost ("zero cost abstractions"), while still giving all the conventional control and power needed for systems programming. There is of course the risk of those places having bugs, but they're explicitly marked, and exhaustively testing/auditing the 100 lines of code that build a safe abstraction is easier than auditing the whole 1000- or 10000-line application that uses it.
> that you can DDOS a web server by sending requests with just the right query parameters
The HashDOS problem is not a memory safety one. In some sense, it isn't even a bug in the implementation but rather the design. The problem arises from using a poor/predictable hash function that allows an attacker to construct many values with the same hash, even if the hash table and the hash function itself are implemented 100% to spec. It is... difficult for a programming language to protect against spec bugs, especially because what is a bug/not helpful sometimes might be desirable at other times.
(Incidentally, Rust's HashMap actually defends against this by default, using SipHash with random keys, which is why it lags behind, say, C++ in some benchmarks that use the default data structures.)
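This defense is observable with std alone: `RandomState` (the default `BuildHasher` behind `HashMap`) draws fresh random keys per instance, so the same input hashes differently in different maps and an attacker can't precompute colliding keys offline. A small sketch (`hash_with` is a hypothetical helper, the rest is std's actual API):

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

// Hash one u64 under a given RandomState. Within a state the result is
// deterministic; across states the keys differ, so outputs (almost
// certainly) differ too -- that's what breaks HashDOS precomputation.
fn hash_with(state: &RandomState, x: u64) -> u64 {
    let mut h = state.build_hasher();
    h.write_u64(x);
    h.finish()
}
```

Two `RandomState::new()` instances in the same program use different SipHash keys, which is the whole point: the collision set an attacker would need is different for every process and every map.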
I found a trivially exploitable buffer overrun in the Source game engine. The cause? Somebody used strcpy instead of strncpy... when adding an animated ellipsis at the end of a string.
You really have to try in Rust to get pwned adding an ellipsis to a string.
(And yes, tools should (and probably would) have caught that bug. But they either weren't used, or didn't catch it, so...)
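For contrast, here's roughly what the same operation looks like in Rust (a hypothetical helper, not code from any engine): the destination is a growable `String`, so there's no fixed-size buffer for the dots to run off the end of.

```rust
// String::push_str reallocates the heap buffer as needed; there is no
// fixed-size destination, so no strcpy-style overrun is possible.
fn with_ellipsis(msg: &str, dots: usize) -> String {
    let mut s = String::with_capacity(msg.len() + dots);
    s.push_str(msg);
    s.push_str(&".".repeat(dots));
    s
}
```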
How many production server side apps do you know of that have been compromised due to memory safety problems in the core hash table data structure?
Agreed, however I think the latter part of the intro is worthwhile to anyone who wants a deeper understanding of Rust.
I've always found that an in-depth understanding of what you use pays off in unforeseen ways.
Though I do have another project that exports a C API, and that requires unsafe, though it's literally only for wrapping/unwrapping pointers.
It's a WIP, but mostly feature complete at this point. I'm working on DNSCrypt at the moment.
[ed: just found src/config/test - looks more like a "better bind" than "djbdns in rust"]
My inspiration started with yet another BIND exploit, and then grew from there. Unlike djbdns, I want this really to have as little manual operation as possible. After DNSCrypt I'm going to look at implementing a version of Raft on top of DNS updates/notifies, etc., to make server resiliency even better. DNSCrypt is next, mostly so that I can use it for private comms between DNS nodes.
djb's is amazing, but what I'm looking for is to build something more flexible and more feature rich.
The code is well documented ;)
Operational docs will come, but I haven't really had an opportunity to focus on those yet.
Additionally, if nobody writes unsafe Rust, who's going to maintain the abstractions that the rest of us rely on? :)
Especially for basic data structures like trees, graphs, and doubly linked lists.
These data structures are write-once: you write them once, verify them, and use them as much as you want. The doubly linked list is in the stdlib. A tree without parent pointers is possible with zero cost in safe Rust. A tree with parent pointers IIRC exists in a crate somewhere. So you will rarelyly be implementing these, just using them from safe code.
While we use datastructures to teach the "basics" of languages like C and C++, that doesn't mean that they are basic and any language where they are hard is crap. Rust makes them hard so that the rest of your code is easy to write. And since these are write-once, they have a minimal to zero impact on the amount of unsafe code in your codebase.
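To make the parent-pointer-free case concrete, here's a sketch of a binary search tree in entirely safe Rust (hypothetical names). Ownership follows the tree shape -- each node uniquely owns its children via `Box` -- so the borrow checker is perfectly happy:

```rust
// A child-pointer-only binary search tree: no unsafe, no Rc, no
// lifetimes beyond the defaults, because ownership is itself a tree.
enum Tree {
    Leaf,
    Node(Box<Tree>, i32, Box<Tree>),
}

impl Tree {
    // Consumes the tree and returns a new one with v inserted.
    fn insert(self, v: i32) -> Tree {
        match self {
            Tree::Leaf => Tree::Node(Box::new(Tree::Leaf), v, Box::new(Tree::Leaf)),
            Tree::Node(l, x, r) => {
                if v < x {
                    Tree::Node(Box::new(l.insert(v)), x, r)
                } else {
                    Tree::Node(l, x, Box::new(r.insert(v)))
                }
            }
        }
    }

    fn contains(&self, v: i32) -> bool {
        match self {
            Tree::Leaf => false,
            Tree::Node(l, x, r) => {
                if v == *x {
                    true
                } else if v < *x {
                    l.contains(v)
                } else {
                    r.contains(v)
                }
            }
        }
    }
}
```

It's adding the parent back-pointer that breaks this picture (two owners per node), which is where `Rc`/`Weak`, arenas, or unsafe come in.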
Keep in mind basically all of Rust's libcollections uses unsafe code for perf and low-level representation reasons. They can all be made safe using the strategies every other safe language uses (and eating the associated downsides).
Also keep in mind Rust's data structures are basically language primitives in a lot of other languages!
So unless proven otherwise by the profiler, no need to make use of unsafe code.
Ada requires explicit use of unsafe code for memory deallocation, but it is not required when using RAII pools.
So unless a custom allocator is required, the standard pools can be used.
In Rust, even a basic doubly linked list requires unsafe.
Then there are those that make use of unsafe to try to express code that cannot be proven by the type system, instead of plain memory corruption issues.
Thankfully the core team is against this trend.
> So unless proven otherwise by the profiler, no need to make use of unsafe code.
> Ada requires explicit use of unsafe code for memory deallocation, but it is not required when using RAII pools.
So, if you want to write hash tables like this without unsafe code, you're in luck: Rust lets you do it!
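As a sketch of what that can look like (hypothetical names; noticeably slower than std's `HashMap`, which uses unsafe for layout and performance): a chained hash table in 100% safe Rust, built from nothing but `Vec`s and std's randomized hasher.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash};

// A chained hash table with zero unsafe code. Every access goes through
// bounds-checked Vec indexing, so bugs are logic bugs, not memory bugs.
struct SafeMap<K, V> {
    buckets: Vec<Vec<(K, V)>>,
    state: RandomState,
}

impl<K: Hash + Eq, V> SafeMap<K, V> {
    fn new() -> Self {
        SafeMap {
            buckets: (0..16).map(|_| Vec::new()).collect(),
            state: RandomState::new(),
        }
    }

    fn bucket(&self, k: &K) -> usize {
        (self.state.hash_one(k) as usize) % self.buckets.len()
    }

    fn insert(&mut self, k: K, v: V) {
        let b = self.bucket(&k);
        for pair in &mut self.buckets[b] {
            if pair.0 == k {
                pair.1 = v; // overwrite existing key
                return;
            }
        }
        self.buckets[b].push((k, v));
    }

    fn get(&self, k: &K) -> Option<&V> {
        let b = self.bucket(k);
        self.buckets[b].iter().find(|p| &p.0 == k).map(|p| &p.1)
    }
}
```

(A real version would also resize; the point is only that none of it needs unsafe.)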
(As an aside, I keep seeing these fantastical claims about Ada that, when you dig into them, come with the exact same or more restrictive semantics than Rust. Ada is a good language, but it isn't magical, and the region/lifetime/uniqueness systems in Rust actually solve real problems that Ada doesn't have a solution to.)
I've been looking forward to seeing such a point-by-point comparison of safety features between Rust, Ada, and SPARK. I gave you or someone else involved in Rust the Safe and Secure Ada 2012 book listing those features to help. Just link to it in a reply to me if and when you publish such analyses.
Btw, even if I agree on restrictive semantics, the thing you didn't mention is results where Ada and especially SPARK are immune by default to so many error classes. Rust definitely has it beat on dynamic safety and probably concurrency. However, unsafe programming in SPARK is still quite a bit safer than in Rust. :)
Well, this just isn't true. You can write a doubly linked list using Rc and Weak pointers and no unsafe code. This is the same as saying you can write a doubly linked list using GC'd pointers.
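A minimal sketch of that approach (hypothetical names): `next` owns the following node via `Rc`, while `prev` is a non-owning `Weak` back-edge, so there are no strong reference cycles and no unsafe code anywhere.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Doubly linked node in safe Rust: strong links forward, weak links back.
struct Node {
    value: i32,
    next: Option<Rc<RefCell<Node>>>,
    prev: Option<Weak<RefCell<Node>>>,
}

// Walk to the tail and append a new node; returns the new node.
fn push_back(head: &Rc<RefCell<Node>>, value: i32) -> Rc<RefCell<Node>> {
    let mut tail = Rc::clone(head);
    loop {
        let next = tail.borrow().next.clone();
        match next {
            Some(n) => tail = n,
            None => break,
        }
    }
    let node = Rc::new(RefCell::new(Node {
        value,
        next: None,
        prev: Some(Rc::downgrade(&tail)),
    }));
    tail.borrow_mut().next = Some(Rc::clone(&node));
    node
}
```

The cost relative to an unsafe pointer-based list is the reference counts and the `RefCell` runtime borrow checks, which is exactly the "GC'd pointers" tradeoff.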
In particular, removing a node from a doubly-linked list has three steps: repoint prev->next at next, repoint next->prev at prev, deallocate. If you can write this in safe code, it's equally possible to write a procedure that fixes prev->next and deallocates, but leaves next->prev dangling. What do Ada and Modula-3 do to prevent this?
(If you're willing to sacrifice efficiency, you can always use reference-counting or actual GC. Or you can just stuff each node in a dynamically-allocated vector, and have the prev and next pointers be indexes into that vector, instead; you'll have semantically-dangling numbers, but you won't have memory unsafety. Both of these approaches can be done in safe Rust with the normal standard library.)
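The index-based variant mentioned above can be sketched like this (hypothetical names, using `usize::MAX` as a nil sentinel). Note how `unlink` does both halves of the repointing; if you forgot one, you'd get a stale index and a wrong answer, but never a dangling pointer:

```rust
// prev/next are indices into a Vec, so a stale index is at worst a
// logic bug caught by bounds checks, never memory unsafety.
const NIL: usize = usize::MAX;

struct Node { value: i32, prev: usize, next: usize }

struct List { nodes: Vec<Node>, head: usize, tail: usize }

impl List {
    fn new() -> Self {
        List { nodes: Vec::new(), head: NIL, tail: NIL }
    }

    fn push_back(&mut self, value: i32) -> usize {
        let idx = self.nodes.len();
        self.nodes.push(Node { value, prev: self.tail, next: NIL });
        if self.tail != NIL {
            self.nodes[self.tail].next = idx;
        } else {
            self.head = idx;
        }
        self.tail = idx;
        idx
    }

    fn unlink(&mut self, idx: usize) {
        let (prev, next) = (self.nodes[idx].prev, self.nodes[idx].next);
        if prev != NIL { self.nodes[prev].next = next; } else { self.head = next; }
        if next != NIL { self.nodes[next].prev = prev; } else { self.tail = prev; }
        // The slot stays in the Vec (semantically dangling, memory-safe);
        // a real implementation would keep a free list to reuse it.
    }

    fn values(&self) -> Vec<i32> {
        let mut out = Vec::new();
        let mut i = self.head;
        while i != NIL {
            out.push(self.nodes[i].value);
            i = self.nodes[i].next;
        }
        out
    }
}
```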
There's also formal verification work on linked lists. If any have simple VC's, they could be encoded in containers like above as well. Tools exist to do similar stuff for C, too.