Safety and Soundness in Rust (jacko.io)
90 points by oconnor663 on March 5, 2023 | 90 comments



Oh, if only that worked in practice.

There's far too much use of "unsafe" in Rust crates. Sometimes, for really good reasons. Too often, because someone can't figure out a safe way to do something.

Here's a thread I started on Reddit on this subject: "We're not really game yet".[1]

Someone commented there: "There are people here who would volunteer to help you out, dig up the internals and address those issues, given you provide them with sufficient information."

My reply:

I've done that three times now:

* jpeg2000-decoder -- JPEG 2000 decoder test fixture. This exercises jpeg2k->jpeg2000-sys->OpenJPEG. The first two are in Rust, the last one is in C, and valgrind shows it referencing uninitialized memory. It randomly segfaults. OpenJPEG has a long history of doing this, and has been the subject of several CERT security advisories. The author of jpeg2k has managed to contain the problem by running OpenJPEG in a WASM sandbox. This keeps the program from crashing, but there is a 2.6x performance penalty. A bug report has been submitted to the OpenJPEG maintainers, who are funded by universities and companies but are over 200 issues behind.

* ui-mock -- game GUI test fixture. This exercises rfd->egui->rend3->wgpu. It's a game GUI with menus and dialogs, but no game behind it, just a 3D drawing of a cube. It's useful for making bugs in that stack repeatable. That's been helpful in wringing out obscure bugs in egui.

* render-bench -- scene update performance test fixture. This exercises rend3->wgpu->Vulkan. It draws a city of identical buildings, then, from a second thread, periodically deletes half of them and re-creates them. If the stack is performing as intended, the updates from the second thread should not impact the frame rate on the main thread. But due to lock problems at the WGPU level, the frame time goes from 16ms to 700ms when the update happens.

(Those are all projects on my Github [2], by the way, if anyone wants them.)

Each time I have to do one of those bug-reproduction test fixture projects, it costs me substantial time not spent on the main project.

Right now, I'm totally stopped by a race condition crash bug not in the list above, one for which I don't have a standalone project which can duplicate the bug. They're trying to fix it, but for now I'm stuck. I may have to build a custom test project. But it will be tough, because it's a timing-dependent bug that only appears under load.

This is why no one has successfully done a major game on Rust. The foundations are too weak to support it. The right stuff is there, but much of it is stuck at "sort of works". It's a real question whether game development in Rust will ever have tools that Just Work, or whether parts of the ecosystem will go directly from "sort of works" to "abandoned" without ever reaching Just Works.

Safe Rust really works. My own code is 100% safe Rust, 36,000 lines of it. No obscure crashes in my own code. When I've needed gdb or valgrind, it's always been due to a problem in someone's unsafe Rust or C code.

There are people who think they're so good they can write unsafe code.

They're not.

[1] https://www.reddit.com/r/rust_gamedev/comments/11b0brr/were_...

[2] https://github.com/John-Nagle


Your story points out the unfortunate corollary of the OP's point. He states that if called unsafe code misbehaves, it's never the caller's fault. Conversely, that means the caller can't do anything about it, and must wait for an upstream fix.


I don't want to trivialize anything, but most of my time in gamedev was spent dealing with similar things: either engine uplevel-integration hell (the team I was on still has scars from having to rebaseline UE3 on a 3-month cadence, which took... 3 months every time we did it, so it was back-to-back uplevels) or these same sorts of issues even in established engines.

Yes, Unity does better, but there's a ton of momentum behind established commercial engines that have tens of millions of dollars invested in them. It's only recently that Godot has started making some inroads on the indie side of the engine space.


> This is why no one has successfully done a major game on Rust. The foundations are too weak to support it.

The Finals open beta is happening on the 6th. Check it out https://www.reachthefinals.com/

Here is a blog post about their usage of Rust: https://medium.com/embarkstudios/inside-rust-at-embark-b82c0...


This game uses Unreal Engine, so most of it, if not all, is most likely in C++.

We don't really know the state of things, or which game is built in Rust at Embark, but the two major games they work on are in C++.


A member of the team explained that they have a game called Creative Playground (beta testing is open now), 90% of which is written in Rust; they still rely on PhysX and MoltenVK in C++.

So it seems like their other games are more like 10% Rust and 90% C++.

But I'm assuming that Creative Playground is a testing ground for using Rust in larger capacities down the road.


Meanwhile, the original commenter starts his rant by stating he can't get a safe Rust wrapper for a memory-corrupting C library, and blames that on "far too much use of 'unsafe' in Rust crates". Same thing.

Consider this: maybe now the game's servers won't crash easily, even if the graphical parts might.


> This is why no one has successfully done a major game on Rust. The foundations are too weak to support it.

That's patently false. https://veloren.net/

People have made games in it. And in Zig, and C, and C++, and so on. And assembly. And machine code. Unsafety isn't the problem; it's lack of libs and integration.

If you refer to AAA, the issue is the lack of engines where Rust is a must. I.e., network effect: you need engines in Rust to make games in Rust. And to make engines in Rust, more games need to require Rust.


Veloren is not a major game, it's not even considered a successful indie game.


Not sure what you consider successful, but it's still in pre-alpha and has had over 30k downloads in the last week.


If major game means popularity/financial success (and not complexity), the argument is even weaker. Success boils down to luck and number of games for language X.

You could go back in time and say - well no one ever made a major game in C/C++/Java, etc. And you would be right at that moment, but wrong in general.


C and UNIX were created so the authors could port Space Travel to new systems.

So that point in time is very narrow


Yeah, but the original point was that a major game means something like AAA or "successful".

If Veloren isn't successful, neither is Space Travel.


Space Travel was more successful in the sense that 90% of the "gamers" of that era played it.

Barely 1% of today's gamers even know what Veloren is.

:)


I might have misunderstood what you've written, but is there a reason you can't use https://github.com/etemesi254/zune-image as a JPEG decoder? It has minimal use of unsafe and is performant.


That's a classic JPEG decoder, not a JPEG 2000 decoder. They're completely different compression systems. JPEG is discrete cosine transform, while JPEG 2000 is wavelet. JPEG 2000 never really caught on; it's a good compression algorithm but has too many features. JPEG 2000 is used mostly for medical imagery, because you can efficiently zoom in on an area and the most detailed level can be compressed losslessly. It's also used for some virtual world content, because you can read the beginning of the file and get a lower-res version.


Yeah, fair enough. It might be worth requesting that the maintainer of zune-image add a decoder for jpeg2000. The repo already supports a whole bunch of formats.


JPEG 2000 is not the same codec as JPEG; it was an attempt to make a more modern image format, but it never really caught on outside of science and medical imaging. For example, satellite images are often JPEG 2000.


This is a very good explanation.

It's sad that these terms are so counterintuitive. I still regularly have to clarify the confusion between undefined and unspecified behavior. And to be fair, "define" and "specify" are basically synonyms in everyday language; it's evil that these terms have such wildly different meanings. Quite a bit of confusion is caused by people not fully understanding that "unsafe" in Rust actually just means "unchecked", i.e. not verified by the compiler, and not necessarily "insecure".

Now we add "unsound" to the mix, and I fear that all of this is simply too hard to learn.

Maybe we should reverse all those terms to be positive, and also use words in their intuitive meaning, and also avoid confusing almost-synonyms. Here's a quick attempt:

1. "Well-behaved" code would be code that only accesses memory correctly. (That's not fully precise enough. What I actually mean of course is "no UB", but that's hard to put into words if you don't already know what UB is. Let's say well-behaved code only does "allowed stuff".)

2. "Checked" code is code verified by Rust's compiler to be well-behaved if certain invariants are fulfilled. Code is always checked by default unless you opt out with a special keyword (confusingly called "unsafe").

3. "Sound" code is well-behaved code (that may or may not be checked by the compiler) with the additional restriction that it will remain well-behaved no matter how it's embedded into checked code. This is what enables "safe abstractions". Checked code is automatically sound by this definition, but alternatively, unchecked code could also be sound, which would have to be manually verified somehow. To do this, you obviously need to know which assumptions the checker makes in its reasoning.

(Edit: Reworded a bit to address an unexpected source of confusion. What's now called "well-behaved" was called "safe", but that has multiple meanings. Some of the responses in the thread apply to the old version.)


I don't think your attempt would be quite accurate: safe code, even though it is checked, can cause UB. This can only occur if some earlier unsafe code was unsound, by the current definition.

For instance, suppose you have an ordinary Box<i32>. Then, you dereference it to create a local &i32 reference, and use unsafe code to turn it into a &'static i32 reference. Then, you drop the Box<i32>, which deallocates the backing memory. Finally, you attempt to read the value from the &'static i32.

The UB occurs only at the last step, when you read from deallocated memory. But reading from a &'static i32 reference is completely allowed in safe, checked code. That's why we use the term "unsound" to cast fault on the earlier unsafe code which allowed us to do this.
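Roughly, as a sketch (the transmute here stands in for whatever unsafe code produced the &'static i32; Miri flags the final read as use-after-free):

    fn main() {
        let boxed: Box<i32> = Box::new(42);
        let local: &i32 = &*boxed;
        // Unsound: nothing justifies extending this lifetime to 'static.
        let leaked: &'static i32 = unsafe { std::mem::transmute(local) };
        drop(boxed); // safe code deallocates the backing memory
        println!("{}", *leaked); // UB happens only here, in safe code
    }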


Yeah this is a great example of why formally defining "soundness" is so tricky. Not only do we have to worry about unsafe code committing UB, we also have to worry about it setting up a situation where safe code might commit UB later. Soundness is a property of functions and modules, but that property is a statement about the entire program that contains them, or actually about any program that could contain them.

In retrospect, I kind of wish I had asked somebody like Ralf Jung to review this post before I published it. On the other hand, the fastest way to get an answer on the internet is to post the wrong answer :-D


That's a very good example. I think that's what I meant by:

> "Sound" code is safe code (that may or may not be checked by the compiler), that will remain safe no matter how it's called by checked code.

Maybe "called" was too specific, it could be embedded in other ways. But it's clear in your example that there's a way of integrating the unsafe cast into otherwise fully checked code that leads to UB. Hence, it is not sound.


Just holding an invalid reference is UB, even if you don't do anything with it (the example still works as the reference gets invalidated by safe code).


A reference is never really "held": from a language-semantics standpoint, it only exists when it is actually used. In this example, copying, reborrowing, or accessing the reference would be UB, but simply letting it fall out of scope would not be UB (modulo the UCG issue oconnor mentioned; but I personally doubt that this status quo will change). You can try this yourself with Tools > Miri on the Playground (https://play.rust-lang.org/?version=stable&mode=debug&editio...). The distinction is far more relevant for unsafe code than for safe code.


I think there are some aspects of this rule that are still undecided. See for example:

- https://github.com/rust-lang/unsafe-code-guidelines/issues/8...

- https://github.com/rust-lang/miri/issues/2732


[flagged]


As a former member of the Rust mod team, and a continuing member of the Rust project, this narrative is complete bullshit.


It'd be awesome to hear the real narrative around the infighting of Rust's former maintainers and moderators. So far, I've just been watching the sparks fly from the outside. Like your resignation note where the Core Team was alleged to be "unaccountable" https://github.com/rust-lang/team/pull/671 and implied to be untrustworthy. Or like how maintainers called Zig a "massive step backward" for the industry https://news.ycombinator.com/item?id=32783244.

Memories change awfully quickly when positions are up in the air though. If there's an investigative report, I'm sure it can settle some of the open questions around nepotism and the downfall of the original leadership https://news.ycombinator.com/item?id=28633113.


You're obviously a troll and have already made up your own mind. And I've already said publicly what I'm going to say. So I'm going to take a hard pass buddy. Stop spreading misinformation.

Even your comment here where you're "just asking questions" contains a pile of bullshit.


Labeling the past year as a "troll" is one way to deal with the pain of failure. For the previous maintainers and moderators, giving up on managing Rust and being sidelined from Rust's leadership must have been painful. But it's necessary since they just weren't mature enough nor honest enough to be trusted with Rust.


You continue to troll by psychoanalyzing huge groups of people you don't even know. More bullshit.


> To follow your recommendation would require defining Rust as a memory-safer language, not as the language of memory safety.

I'm actually suggesting changes in explanation only, hoping to make these terms less confusing for beginners. Rust veterans know that "safe" code just means "code not in an unsafe block", beginners are often confused by the more common meaning of the word.


This is absolute nonsense.


> Rust is designed around safety and soundness. Roughly speaking, safe code is code that doesn't use the unsafe keyword, and sound code is code that can't cause memory corruption or other undefined behavior.

I'd be a bit more precise here. Safety is indeed a property of some particular lexical part of the code, but soundness is really a property of an entire interface. A sound interface never causes UB when you invoke it according to its documented preconditions; an unsound interface may possibly cause UB in some situations even when invoked according to its preconditions.

Some take a narrower definition of soundness, treating it primarily as a property of a safe interface: a sound interface that can be invoked from safe code can never cause UB under any circumstances. But personally, I prefer the broader definition, since it also includes unsafe interfaces which don't document all the preconditions they rely on.

Knowing your interface boundary is an important part of writing unsafe code in Rust. For instance, suppose you have a data type with a method that unsafely makes a certain assumption about its fields. Then you have to be very careful to uphold this invariant in every other safe method that creates or modifies those fields. Furthermore, you have to ensure that no language-level operations on the object as a whole (e.g., swapping, dropping, forgetting, or covariant transformations) can break this invariant. But once you have locally checked all of those things, then you don't need to check any other part of the code base to know that your data type is sound.
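As a made-up sketch of that kind of type (names are hypothetical):

    // The unsafe internals rely on an invariant that every safe method
    // must preserve: cursor <= buf.len() at all times.
    pub struct Window {
        buf: Vec<u8>,
        cursor: usize,
    }

    impl Window {
        pub fn new(buf: Vec<u8>) -> Self {
            Window { buf, cursor: 0 }
        }

        // Safe method: must keep the invariant when it changes `cursor`.
        pub fn advance(&mut self, n: usize) {
            self.cursor = (self.cursor + n).min(self.buf.len());
        }

        // Sound only because every other method upholds the invariant.
        pub fn rest(&self) -> &[u8] {
            // SAFETY: cursor <= buf.len(), maintained by all safe methods.
            unsafe { self.buf.get_unchecked(self.cursor..) }
        }
    }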

As another example, contrary to what some people say, it's not necessarily unsound to have a safe function that can cause UB on some inputs. You just have to ensure that external code cannot call that function (without having previously unsafely promised not to call it incorrectly), and no internal code calls it incorrectly. Here, the interface boundary is the entire section of your code that has access to the function.

In particular, you sometimes see this with data types that have to implement safe traits using unsafe operations (for performance or for other reasons). Often, the function to create an object is unsafe, since you have to promise that the rest of your safe code does not invoke those trait implementations in such a way as to cause UB. Practically, you can do this by limiting how much of your code has access to the created object.


> ...soundness is really a property of an entire interface.

Not really. I mean, maybe... if you limit the scope of your consideration to language design itself.

If Rust calls out to C code and they have incompatible understandings of an array length due to mismatching build rules (like preprocessor definitions), you have a soundness bug that won't show up obviously in any interface anywhere.

Soundness is a property of a single build of a program. You cannot generalize it. That is why in safety critical software, you cannot buy "safe" libraries or tools. You can buy libraries and tools with certification packets but context is very important when it comes to program correctness in all its forms.


> If Rust calls out to C code and they have incompatible understandings of an array length due to mismatching build rules (like preprocessor definitions), you have a soundness bug that won't show up obviously in any interface anywhere.

In that scenario, I'd say that the build rules are a poorly-documented part of the C code's interface, and by calling out to it with mismatching build rules the Rust code is unsound.

In general, C interfaces tend to have a lot of poorly-documented or undocumented preconditions, which is one of the things that make Rust-to-C FFI so tricky. One of Rust's big ideas is to make all these preconditions fully explicit, either through typestate or through "Safety" sections on unsafe functions and traits.
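For example, a hedged sketch of that convention on a made-up function:

    /// Returns the byte at `index` without bounds checking.
    ///
    /// # Safety
    ///
    /// `index` must be less than `data.len()`.
    pub unsafe fn byte_at(data: &[u8], index: usize) -> u8 {
        // SAFETY: the caller promises `index < data.len()`.
        unsafe { *data.get_unchecked(index) }
    }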


Bad docs and brittle build expectations make C to C "ffi" tricky as well. But it's realistically how this kind of programming, generally speaking, tends to happen.

I don't expect we'll have a great end to end solution until there are language agnostic build and dependency management systems available. As much as I like cargo, I don't expect porting everything to cargo will cut it. At least not in its current form.


> Soundness is a property of a single build of a program. You cannot generalize it.

I think it's important to be able to talk about sound functions, without needing to know anything about their callers, other than what's expressed in the function signature.


I'm saying even for that case, you have to know how the implementation of that function was compiled, linked, and possibly even deployed. You cannot promise an interface is "sound". You can promise certain expectations were communicated clearly within some parameters (like object types and lifetimes assuming a coherent build process).


I think "soundness" is meaningful and useful in terms of the Abstract Machine that Rust is defined against. Or you know, the one that it will be defined against someday :) It's a way of separating things that are "under the language's control" in a sense from things that aren't.

Just to pick an extreme example, if the `add` instruction is broken on my CPU, and it gives the wrong answer 0.001% of the time, and that causes my code to crash, is my code "unsound"? I think soundness/unsoundness just won't be a very useful idea if that's what we mean by it. Of course I do want my program to work, and I do need to fix things one way or another. I'm probably going to download a microcode update, or else throw my CPU in the trash, or possibly even turn on some special compiler workaround that avoids emitting `add`. But I think it's important and interesting to be able to talk about how I'm probably not going to fix my program by changing how I use the addition operator. There's really nothing wrong with my application code, even though my program isn't working. This is the sort of distinction that soundness is trying to draw.


There's something to that, but when an application and the libraries it uses don't agree on interfaces (ABIs, APIs, codecs, etc.) then the problem is unique to that project, and possibly that specific build of that project, so it's a little different than a buggy CPU.

It's fair to try to simplify and tackle sets of distinct problems one at a time. Hence an abstract machine to assist in reasoning. But it's not the same as practical soundness or safety or whatever adjective people want to give. It's important to make that distinction because lightly technical people seem to ignore the subtlety here and think "well, it's a Rust program, therefore it's safe!". Reality is a lot more complicated. Unfortunately.


> Knowing your interface boundary is an important part of writing unsafe code in Rust. For instance, suppose you have a data type with a method that unsafely makes a certain assumption about its fields. Then you have to be very careful to uphold this invariant in every other safe method that creates or modifies those fields. Furthermore, you have to ensure that no language-level operations on the object as a whole (e.g., swapping, dropping, forgetting, or covariant transformations) can break this invariant. But once you have locally checked all of those things, then you don't need to check any other part of the code base to know that your data type is sound.

This is simply incorrect. If it is possible, in any safe code, to violate the invariants in a way that causes undefined behavior, then your code is unsound. Unsafe code _cannot_ rely on safe code invoking it correctly. ALL invariants must be covered at the boundary between safe and unsafe code; anything else is a soundness bug in the code.

> As another example, contrary to what some people say, it's not necessarily unsound to have a safe function that can cause UB on some inputs. You just have to ensure that external code cannot call that function (without having previously unsafely promised not to call it incorrectly), and no internal code calls it incorrectly. Here, the interface boundary is the entire section of your code that has access to the function.

This is again completely incorrect. If it is possible to invoke unsound behavior from safe code then the code is fundamentally unsound. This is the problem that the Actix developer kept trying to justify that got him in a lot of hot water with the community. Please never write code this way. If you have any public repositories please mention them so bugs can be filed against them.

Put another way, a bug in any piece of safe code must never cause any piece of unsafe code to commit undefined behavior. That's the fundamental guarantee that the entire language is based on. Please don't violate that by lying to developers that code is safe when it's not.


> This is simply incorrect. If it is possible, in any safe code, to violate the invariants in a way that causes undefined behavior, then your code is unsound. Unsafe code _cannot_ rely on safe code invoking it correctly. ALL invariants must be covered at the boundary between safe and unsafe code; anything else is a soundness bug in the code.

That's roughly what I'm saying? The point is that you have to carefully design your interface so that safe code from the outside is unable to break your invariants, no matter how it invokes your interface.

> This is again completely incorrect. If it is possible to invoke unsound behavior from safe code then the code is fundamentally unsound. This is the problem that the Actix developer kept trying to justify that got him in a lot of hot water with the community. Please never write code this way. If you have any public repositories please mention them so bugs can be filed against them.

If you have a public safe function that can cause UB, then it is indeed unsound. But it is possible to have a private safe function that can only possibly be accessed from a small part of your code base. If you are careful in preserving this invariant within the parts of the code that can directly access the function, then creating a sound interface without direct access to the function is entirely possible. (But this obviously gets more risky, the more code that has access to the function. I'll grant that it's not an uncommon pitfall to assume a crate-wide invariant somewhere that you later forget about.)


> But it is possible to have a private safe function that can only possibly be accessed from a small part of your code base.

This would also be unsound. Multiple people develop software in a library. Any future developer on your library will automatically assume that any safe Rust function can be called with any parameters without invoking undefined behavior. That is the default assumption of any Rust developer. If you have such code in your libraries, you should fix it, or mark those functions clearly as unsafe so that any caller ensures they uphold the invariants required by the underlying unsafe code, propagated upward. You're falling back to old ideas from C.

> If you are careful in preserving this invariant

This is C/C++ mentality. The entire point about Rust is that you do not need to be careful. It's guaranteed and checked by the compiler. Again, if you can invoke undefined behavior by calling ANY safe function with ANY form of input, then the function is unsound. This is simply how Rust assumptions work.

> But this obviously gets more risky, the more code that has access to the function. I'll grant that it's not an uncommon pitfall to assume a crate-wide invariant somewhere that you later forget about.

I suggest you re-learn how to program Rust, personally. You've picked up some really bad habits that don't line up with general consensus on how to code in Rust.


Since unsoundness is a property of an interface, that private safe function would be unsound but could be used as part of the implementation details of a sound public API.

It's not something I would do personally, and would be something I'd put down as a black mark against using a library as a dependency, but it's nowhere near as bad as doing the same thing on a public API.


Every single function call is using an interface. Your public API is not the only interface.

It's an extremely bad habit to pick up, as this ruins one of the main purposes of Rust, namely its ability to allow collaboration without causing undefined behavior. Any sort of internal "don't do this or it'll cause undefined behavior" is just falling back to C/C++ land.

Some developer will modify that code later, maybe even the same developer, and will forget about the invariants that need to be held if they're not documented in safety comments around unsafe code. Anything that relies on code review to catch unsoundness outside of unsafe code is automatically wrong.


> This would also be unsound. Multiple people develop software in a library. Any future developer on your library will automatically assume that any safe Rust function can be called with any parameters and not invoke undefined behavior. That is the default assumption of any Rust developer.

That's quite the broad generalization. Obviously, if you're at all sane, you're documenting your internal invariants, and keeping the section of code upholding them as small and tightly-packed as possible. Rust's unsafe keyword is simply a tool that lets us use interfaces without being fully familiar with their implementations.

It's generally a great idea to encapsulate unsafe code into a sound internal interface whenever possible, but this doesn't necessarily require that every single function be its own standalone interface. What matters is that it's clear to the maintainers that the callers (which should best be nearby in the code) are bound by its requirements.

> Again, if you can invoke undefined behavior by calling ANY safe function with ANY form of input, then the function is unsound. This is simply how Rust assumptions work.

Suppose I have a function that internally creates an Option<T> in such a way that a certain invariant is upheld. I call Option::map() on it, passing it an inner function with an unsafe block assuming that the invariant is upheld. The inner function is called nowhere else. By necessity, this inner function is not marked unsafe, since Option::map() requires a safe FnOnce.

Is there unsoundness in my code? I say not, since the inner function is not a standalone interface; it is an implementation detail inseparable from its outer function. There is absolutely no way to invoke my code as it stands so as to cause UB.
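As a made-up sketch of that shape:

    // The closure handed to Option::map() is "safe" by signature, but its
    // unsafe block relies on an invariant set up by the enclosing function,
    // and it is called nowhere else.
    fn first_byte(s: &str) -> Option<u8> {
        // Invariant: the Option is Some only if `s` is non-empty.
        let nonempty = if s.is_empty() { None } else { Some(s) };
        nonempty.map(|s| {
            // SAFETY: `s` is non-empty, so index 0 is in bounds.
            unsafe { *s.as_bytes().get_unchecked(0) }
        })
    }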

> I suggest you re-learn how to program Rust, personally. You've picked up some really bad habits that don't line up with general consensus on how to code in Rust.

I would prefer that you not make baseless allegations about my code. I mention that pitfall only from experience reviewing others' unsafe code. Again, if you're sane, you keep the parts of the code manually upholding your invariants as small as possible. Marking your internal functions unsafe and clearly documenting their preconditions is a perfectly fine way to go, and that's how I write my own code, being a sane person.

But having an invariant held across multiple functions does not automatically make your code unsound. In certain situations (internal implementations of external traits, callbacks from external code), it is even necessary to have an unsafe precondition pass through the call of a function or closure marked safe. All it takes is the recognition that not every single function is its own standalone interface.


> It's generally a great idea to encapsulate unsafe code into a sound internal interface whenever possible, but this doesn't necessarily require that every single function be its own standalone interface. What matters is that it's clear to the maintainers that the callers (which should best be nearby in the code) are bound by its requirements.

If a safe function has requirements to maintain soundness then it's either an unsafe function or an unsound improperly-labeled safe function. It's a contradiction in terms to label a function safe while also having unsoundness that can be created by misuse of the function.

> Suppose I have a function that internally creates an Option<T> in such a way that a certain invariant is upheld. I call Option::map() on it, passing it an inner function with an unsafe block assuming that the invariant is upheld. The inner function is called nowhere else. By necessity, this inner function is not marked unsafe, since Option::map() requires a safe FnOnce.

The inner function (assuming it's within the scope of the outer function) I would say is still "part of" the outer function. Inner functions are in effect just more complex scoping within the outer function. They're not first-class citizens. As long as it is impossible for the inner function to ever escape its scope, then I would still consider this safe, as long as the outer function can safely accept any input. Again, making sure that all unsafe blocks are properly commented. I would make it extremely clear by putting the unsafe {} as starting at the beginning of the function and ending at the end.

However there's a slight problem here after thinking about this a bit more. It must be solely the responsibility of the outer function that the inner function is used correctly. It cannot rely on the implementation of any piece of code outside the inner function.

The acid test here is whether, if the implementation of Option::map() were ever to change (while itself still being sound), it could expose unsoundness without violating the FnOnce preconditions, i.e. without causing a compiler error. If a language (or library) version upgrade can expose unsoundness in your code, then your function is unsound. It's unclear from your example if that would be the case.

In general, though, I would say this is a poorly written function, and this type of thing should be avoided.

> But having an invariant held across multiple functions does not automatically make your code unsound.

Again, to that I would disagree.

> In certain situations (internal implementations of external traits, callbacks from external code), it is even necessary to have an unsafe precondition pass through the call of a function or closure marked safe.

I would need a more concrete example to look at. However the callbacks from external code example sounds especially fraught.

> All it takes is the recognition that not every single function is its own standalone interface.

I would go even further and say it is an invariant of the Rust language itself that any safe function will never expose unsoundness.


Sorry for taking a while to respond; I've compiled a list of examples of this pattern in common crates [0], including the standard library. For each crate, safe functions and closures above the horizontal rule depend on the behavior of other functions, and those below the horizontal rule depend only on the behavior of their containing function. Closures* marked with an asterisk capture data from their environment, so it would take some effort to refactor those to use unsafe fn() pointers.

One example of the safe trait implementation scenario is in tracing-log [1]. The <LogVisitor<'a> as Visit>::record_str() method would leak dangling &str references if the value argument didn't live as long as 'a. LogVisitor<'a> is a private type in the tracing-log crate, and Visit is a public trait in the tracing-core crate. The Visit trait had existed before tracing-log was first released, so it could not be modified without breaking backward compatibility.

As written, there is no actual way for someone to cause UB with this library. tracing-log is written so that LogVisitor<'a> values only ever go into Event::<'a>::record(), and tracing-core is written so that it passes the visitor &str references directly from the Event<'a>, that live at least as long as it. If someone added a function to tracing-log that otherwise called <LogVisitor<'a> as Visit>::record_str(), or if someone modified tracing-core to pass a copy of the string to Visit::record_str(), then someone could cause UB with this library; but no one has done so, so no one can cause UB.

> The acid test here is whether, if the implementation of Option::map() were ever to change (while itself still being sound), it could expose unsoundness without violating the FnOnce preconditions, i.e. without causing a compiler error. If a language (or library) version upgrade can expose unsoundness in your code, then your function is unsound.

Are you saying that unsafe invariants must never be broken, even if external functions have behavior completely different from how they are documented? In that case, it would seem like nearly all nontrivial unsafe code would disappoint you, since it depends at least on the safe standard library functions like Option::map() doing what they say on the tin. Public documentation of behavior is a binding promise under the SemVer regime that Cargo follows.

More generally, I think we both agree that a sound library is one that "cannot be used to cause UB from safe code". But I say that it's fine as long as the public interface as it stands (or well-defined internal interface) cannot be used to cause UB, whereas you say that it must not cause UB even if someone were to hypothetically add another function that calls any safe private function.

To me, that standard seems somewhat arbitrary. Suppose I were to add a new function inside the alloc::vec module:

  pub fn safe_set_len<T, A: Allocator>(vec: &mut Vec<T, A>, new_len: usize) {
      vec.len = new_len;
  }

This function is perfectly safe to write, and it would be perfectly safe for external users to call. However, it would allow half of the safe Vec methods to cause UB. So since I could hypothetically add this safe function, does that mean that the Vec API is already unsound? I'd say not, since there is an invariant attached to Vec that the length is no greater than the capacity, and adding this function would allow this invariant to be violated.

Would you disagree, and say that the Vec API as a whole is unsound, since the soundness of some of its unsafe code depends on the nonexistence of certain functions that would be safe to write? Or would you instead say that such an invariant can be attached to a type, but only begrudgingly to an outer function, and never to a module or a crate? I can't see the latter distinction as being anything but arbitrary.

You claim that your position "is the default assumption of any Rust developer", but I think my list of examples provides evidence against that claim: plenty of large crates' maintainers have little issue defining "safe" private functions or closures that reach outward for their invariants. What's important is that the maintainers don't actually merge any PRs that break the invariants. (Ideally, this would be by way of thorough documentation, but that is far from strictly necessary.)

[0] https://gist.github.com/LegionMammal978/7654f484b495d67c63d0...

[1] https://github.com/tokio-rs/tracing/blob/tracing-log-0.1.3/t...


You even have this phenomenon inside single functions. Let's say you write a function that's supposed to be a safe/sound abstraction. You might be setting up a pointer, maybe determined by a dozen lines of code, that is then finally dereferenced. You'd typically only put the unsafe block around this final pointer access, but its safety depends on the whole calculation being correct.
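Something like this, as a rough sketch:

    // The unsafe block wraps only the final dereference, but its soundness
    // depends on all of the safe checks and arithmetic before it.
    fn sum_window(data: &[u32], start: usize, len: usize) -> u32 {
        // Safe setup: compute and validate the range the pointer will cover.
        let end = start.checked_add(len).expect("window overflows usize");
        assert!(end <= data.len(), "window out of bounds");
        let base = data.as_ptr();
        let mut total = 0;
        for i in start..end {
            // SAFETY: base.add(i) stays inside `data` because i < end <= len.
            total += unsafe { *base.add(i) };
        }
        total
    }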


Unsafe blocks are very different from unsafe-marked functions. A function that can indirectly cause UB when called in safe code must be marked as unsafe, so that the invariants it relies on can be properly documented in a Safety: comment, and manually checked by callers.


Only if it's public, and you don't know who the callers are. If you do know exactly who the callers are (since it's a private function), and you have internal documentation that lets contributors know of the proper invariants, then it's entirely possible to prevent UB from ever occurring even if the function is marked as safe.

By itself, this is usually not a good idea. But in some cases, it's necessary, e.g., an unsafe implementation of an external safe trait on an internal type that only your internal code will ever see.


Why? If you have a non-public function and all its potential callers are known, what's the harm in marking it as unsafe and documenting its invariants in the idiomatic way (a Safety comment)? Your case of implementing a safe trait via unsafe code seems like it should be quite rare, and even then the trait implementation should document the assumptions it's relying on as part of an unsafe block. If you neglect to mark other possibly-UB functions as unsafe, this becomes harder to ensure.


I agree that it's rarely a good idea to do this; it's a red flag at minimum when you see it. I'm just trying to reject the notion that it's categorically unsound to do this. I've personally seen the safe-trait example while reviewing a crate, and there were indeed plenty of comments documenting the assumptions. As long as there's absolutely no way for anyone to cause these assumptions to be broken from the outside, there's no way to cause UB, and it's not really useful to call the interface unsound.


It doesn't matter if it's public or not. Every function is "public" to the person modifying the code. The purpose of unsafety/safety is primarily for the developer, not for users of a library.


I completely agree. The contract of calling any safe function is that it is impossible to cause undefined behavior by calling that function (assuming there are no unsoundness bugs). That is the inherent built-in contract in any piece of Rust code.


I think what you're describing might be similar to footnote/marginnote #6?


Sorry, I was reading the article from my phone and didn't notice that footnote. It definitely ties in to the point I'm making here, but I still feel that the distinction is important enough to elaborate on a bit.


> Then you have to be very careful

The problem.


Well, in principle, you could place each field in its own wrapper type that is unsafe to access. But you'd still have to be nearly as careful: there's no way around that, short of formal verification. Rust is ultimately a pragmatic language, and it lets the user pick their preferred balance between decreased performance, extra care, or extra line noise that people will get tired of reading. At best, it can give suggestions in the design of its standard library of where that balance ought to be.
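A rough sketch of that wrapper idea (made up, not a real crate):

    // Every access to the wrapped field goes through an `unsafe` accessor,
    // so each touch point gets its own SAFETY justification.
    struct Guarded<T>(T);

    impl<T> Guarded<T> {
        // SAFETY: caller promises `value` satisfies the field's invariant.
        unsafe fn new(value: T) -> Self {
            Guarded(value)
        }

        // SAFETY: caller must not break the invariant via this reference.
        unsafe fn get_mut(&mut self) -> &mut T {
            &mut self.0
        }
    }

But you still have to write a correct justification at every call site, which is the same care as before, just with more ceremony.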


I feel like #![forbid(unsafe_code)] deserves a mention?


This starts to open up a whole topic that I thought about including but ultimately decided not to: How do we deal with unsafe code in our dependencies? This gets into all sorts of tools like cargo-geiger and blessed.rs and things like that.

Luckily, I think the Python/Java metaphors work well here. Rust apps have to worry about unsafe code in their dependencies, like Python and Java apps have to worry about C code in their dependencies. So hopefully not _too_ much is lost by leaving those questions for another article. Maybe I should add another footnote though.


> Instead, the only way I can think of to make foo1 commit UB is to give it an uninitialized index

Nah a `fclose(stderr)` before calling it will do it!

> A safe caller can't be "at fault" for memory corruption or other UB.

An example of where a safe caller can be at fault is after fork, or in a signal handler. Functions like malloc must not be called, but Rust does not model this and freely invokes malloc from safe functions.

In these regimes safety is inverted: File::open is NOT safe, and may invoke UB; while the "unsafe" libc::open IS safe and should be used instead.


I wouldn't agree with this. In order to register a signal handler in Rust, you have to utter unsafe somewhere. That's the point at which you form the contract that your signal handler is correct. The fault is not with the safe code that allocates. The fault is with the creation of the signal handler.

It is true that Rust could model signal safety in some richer way, but that's just a matter of shifting where the contract is formed.


The author gives the following function:

    static BYTES: &[u8] = b"hello world";
    fn foo1(index: usize) -> u8 {
        BYTES[index]
    }

and then writes: "we can't make foo1 commit UB just by giving it a large index ... foo1 will never cause UB without the caller committing UB first."

But strictly speaking this is wrong! If I plug foo1(large_index) into `process::Command::pre_exec()`, then the resulting panic may invoke UB, without the caller having committed UB first.

So you have to decide whether your function is designed to be run within these environments. If it is, and you add an allocation, then that's your fault!
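Hedged sketch of that scenario (Unix-only; foo1 copied from the snippet above):

    use std::os::unix::process::CommandExt;
    use std::process::Command;

    static BYTES: &[u8] = b"hello world";
    fn foo1(index: usize) -> u8 {
        BYTES[index]
    }

    fn main() -> std::io::Result<()> {
        let mut cmd = Command::new("true");
        // The `unsafe` contract of pre_exec is where the fault lies if the
        // closure does anything that isn't safe between fork() and exec().
        unsafe {
            cmd.pre_exec(|| {
                let _ = foo1(1_000_000); // out-of-bounds panic, post-fork
                Ok(())
            });
        }
        cmd.spawn()?.wait()?;
        Ok(())
    }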


That's exactly the case I responded to, but with different operations. Instead of a signal handler, you're talking about calling pre_exec[1], which is marked unsafe.

This is the key bit:

> A safe caller can't be "at fault" for memory corruption or other UB.

The fault is in the programmer writing 'unsafe' to call pre_exec and not upholding the contract dictated by pre_exec. If the caller uses 'unsafe' to invoke code that doesn't uphold what pre_exec demands, then it's the fault of the point at which 'unsafe' is written.

The word "fault" is obviously overloaded, and one can reasonably use it to describe the safe code that allocates or panics as where the "fault" lies. But I feel like it's pretty clear that isn't what is meant in the OP, and is not really what is meant when one talks about using unsafe markers in Rust code to delineate where UB might occur.

If what you said was true, then (outside of Rust bugs and other environment trickery like /proc/self/mem) "safe by default" becomes meaningless. But it clearly isn't. And it does unfortunately depend on what you mean by "fault." In this context, it has to mean somewhere where you utter unsafe. If UB occurs in your program and you can't trace it back to a place where unsafe was uttered incorrectly (again, outside of Rust bugs and environment trickery), then something is seriously wrong.

[1]: https://doc.rust-lang.org/std/os/unix/process/trait.CommandE...


The "fault" lies with any function which fails to uphold its contract. I would argue that if `foo1` were documented as async-signal safe, then it would be at fault for the memory corruption it causes, if called from a signal handler.


> I would argue that if `foo1` were documented as async-signal safe

Agreed there, certainly. But that doesn't have much to do with Rust's safety/soundness stuff, and is more just about adhering to documented contracts. And 'foo1' is not documented as async-signal safe here.

I generally wouldn't rely on any function to be async-signal safe unless it's explicitly documented to be. I'm sure there's probably a heap of std methods where it would be reasonable to assume async-signal safety, particularly ones that don't do I/O, wouldn't reasonably be expected to allocate and are documented to never panic. (Possibly other things, I don't have the requirements for async-signal safety memorized. I just know to be super paranoid and carefully pay attention to it if I'm writing a signal handler. Which is... one of the many reasons I avoid signal handlers if I can help it and instead use something like the `signal-hook` crate.)


Then your code calling `pre_exec` is unsound: it is calling into `foo1`, which is not documented to be async-signal-safe. Without any documentation saying otherwise, you can't assume a function can be called within weird contexts with non-standard restrictions, and if they update it to do something like add an allocation that makes your code UB, that's allowed by their documented API.


> But strictly speaking this is wrong!

I agree, and I'm making a correction. Instead of "foo1 will never cause UB without the caller committing UB first", the article now reads "foo1 will never cause UB without the caller writing unsafe first." Thank you!


IMO that correction isn't needed, because it goes without saying in every circumstance anyway. You always need to write 'unsafe' in order to introduce UB. (Modulo Rust bugs and environment trickery.)

But I do agree that your correction is, in fact, correct. :-)


> fclose(stderr)

Good point!


I really like that in Rust the API safety is defined by the language, not by each library individually.

There are no gotchas to be found in small print in the library documentation, no disclaimers that the code is meant only for single-threaded programs, or haggling over how rare the edge cases are. If it doesn't say `unsafe`, it has to meet Rust's universal requirements for safety.


That holds only for properties which are describable by Rust's type system, including the thread and memory safety mentioned above. Things get worse when it comes to async and futures, since RAII doesn't really work with them, and you have to consider the cancellation safety of libraries' futures, which is noted in documentation only.


Safety includes everything that could cause UB. Things that cannot be safely expressed are forbidden.

Async too: you're not allowed to make cancellation-unsafe futures that could keep using a buffer after being dropped. This is why io_uring abstractions only work with custom buffer pools and not slices.


Would it be feasible to create a type which signifies that unsafe computation has happened? Like the IO Monad in Haskell, for example. This type would have to have a constructor that wraps a value and no way to take the value out of the type.

That way, any code “infected” by unsafe computation will also be infected by the type.
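Something like this, maybe (purely a sketch of the type-level idea):

    // A marker wrapper with no way to take the inner value back out, so
    // anything derived from it stays "infected".
    struct Tainted<T>(T);

    impl<T> Tainted<T> {
        fn new(value: T) -> Self {
            Tainted(value)
        }

        // You can keep computing "inside" the wrapper, but never escape it.
        fn map<U, F: FnOnce(T) -> U>(self, f: F) -> Tainted<U> {
            Tainted(f(self.0))
        }
    }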


Practically everything would be marked by it. Using `Vec`? Marked. Reading a file? Marked. Printing something to stdout? Marked.

The idea behind Rust's unsafe is to cover the unsafe bits in a safe API which (assuming correctness of the unsafe code) cannot be misused to cause memory issues in safe Rust.


Got it, thank you


You could just use cargo-geiger and #![forbid(unsafe_code)] in order to only have safe code. We use this in our CI/CD in order to ensure that no unsafe code is used.


I have a library in which I set #![forbid(unsafe_code)]. It relies on a small number of carefully chosen dependencies, almost all of which contain unsafe code. I mean first-rate libraries like bincode, bstr, and once_cell. It feels a bit disingenuous for me to emphasize the safety of the top layer that I've written. I guess what I can actually express is that I'm setting risk boundaries and, in a way, taking myself out of the equation.


That's why cargo-geiger exists.


I replied to a related comment here: https://news.ycombinator.com/item?id=35034184


What happens if you find unsafe deeply nested in some dependency that you need? Like a web framework or maybe a graphics engine that's not so easy to find a replacement for?


Well, the simple answer is you don't use it. The more complex answer is that you do use it but have safeguards in place, which `unsafe` already provides to some extent, more so than C or C++ code. You could also ask the creators of said dependencies why there is `unsafe` in their code and perhaps submit PRs to mitigate or remove that, as in the case of actix-web a few years ago.

But as Rust crates become more prevalent, I do expect that we'll see fewer and fewer crates with `unsafe`, or have enough safe alternatives to `unsafe` crates.


> you do use it but have safe guards in place

Isn't this the same thing that everyone else is doing, except they're not pretending to be special by falsely claiming that they're using no unsafe?

My impression of `forbid(unsafe)` users in practice is that they think they've cleverly discovered a way to encapsulate unsafe that everyone else is missing, when in fact they've missed the point of unsafe to begin with and then belatedly rediscovered it.


I'm not sure what you're referring to with your 2 sentences exactly. What is "the same thing that everyone else is doing?" And what does this mean: "they've cleverly discovered a way to encapsulate unsafe that everyone else is missing, when in fact they've missed the point of unsafe to begin with and then belatedly rediscovered it?"


> Rust is designed around safety and soundness.

Then why are crates automatically updated from the internet? This is not safety, this is remote code execution.


Cargo doesn't update crates automatically.


`cargo install` updates crates automatically, unless you add the non-default --locked parameter.


`cargo install` is a one-shot build of a program and all of its dependencies from source; in what sense does it "update" any crates?


`cargo install` doesn't install the locked dependency versions tested by the program author, but ignores Cargo.lock entirely and automatically updates all dependencies to the latest semver-compatible versions allowed by the Cargo.toml file.



