Secure Rust Guidelines (anssi-fr.github.io)
171 points by xmmrm 15 days ago | 58 comments



I personally strongly disagree with:

Functions or instructions that can cause the code to panic at runtime must not be used.

First of all, they kind of dodge this later by saying "Array indexing must be properly tested" (else it can panic) -- everyone thinks they write code which is "properly tested".

Personally, I often write panicking code -- if the code gets into a state where I have no idea how to fix it, I panic. For some code it is important it does quit, but I'd much prefer more code panicked than tried to carry on and ended up creating other problems.

Also, in rust if we don't want to panic we need to never use array indexing and never use integer division, just to start.

Of course, it's fine to write code which plans to never panic, but I think it would be better to say "Have a formal 'panic plan'", where either one panics early, or tries to never panic.


Panicking is, in many ways, the best-case scenario for code that contains a bug. A bug that causes a panic is much easier to find through fuzzing, property testing, even static analysis or analyzing failures in production. It also prevents the bug from "infecting" other code by allowing the program to proceed in an invalid state.

If you don't want a panic to take down the whole system, you can isolate the code in a thread and use a supervision tree, or use `catch_unwind` to let the thread perform cleanup and then continue from a known state.
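For instance, a minimal sketch of the `catch_unwind` approach (my own illustration, not from the guidelines), assuming the risky work is a plain closure with no shared mutable state:

    use std::panic;

    fn main() {
        // The closure either completes or unwinds; either way we get a Result.
        let result = panic::catch_unwind(|| {
            let v = vec![1, 2, 3];
            v[10] // out-of-bounds indexing panics here
        });
        match result {
            Ok(n) => println!("got {}", n),
            Err(_) => println!("worker panicked; continuing from a known state"),
        }
    }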

Using `slice.get()` and returning `Option` or `Result` for failures that should never happen in correct code is not an improvement. It leads to the same unwinding behavior, as the failure gets passed up the call stack, but with more manually-written code, and more error-handling paths that are hard or impossible to test (because if the program is correct then they are unreachable). It infects calling functions and changes public APIs.

Clearly, the best practice is "don't write code with bugs." If your code is bug-free, then it is also panic-free. But we don't really have the tools and techniques to do that all the time in general. The second-best option may be to write code that panics when there is a bug. (People saying "write code that can't panic" are often really just saying "don't write bugs.")

(See also "crash-only software," from the Erlang school of reliability engineering.)


> If you don't want a panic to take down the whole system, you can isolate the code in a thread and use a supervision tree, or use `catch_unwind` to let the thread perform cleanup and then continue from a known state.

Playing devil's advocate: with unwinding panics (which are necessary for these two approaches), it's harder to make sure all the data structures the thread was using are left in a coherent state. It's not as bad as exception safety in C++, but it does have some similarities. Just take a look at the tricks the Rust standard library uses to keep everything sane even if the stack unwinds (structs implementing Drop normally called SomethingOnDrop, for instance CopyOnDrop), or std::sync::Mutex poisoning.
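To make that concrete, here's a small sketch of Mutex poisoning (my own example, not from the parent):

    use std::sync::{Arc, Mutex};
    use std::thread;

    fn main() {
        let data = Arc::new(Mutex::new(0u32));
        let d2 = Arc::clone(&data);
        // The spawned thread panics while holding the lock, which poisons it.
        let _ = thread::spawn(move || {
            let mut guard = d2.lock().unwrap();
            *guard += 1;
            panic!("invariant violated mid-update");
        })
        .join();
        // Later users are warned that the protected data may be inconsistent,
        // instead of silently observing a half-finished update.
        assert!(data.lock().is_err());
    }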


Agreed. There are definitely good arguments for using abort-on-panic, and doing isolation and recovery at the process level rather than the thread level. This is what we do with all the Rust code in Firefox, for example.

Unwind safety is a real issue. I have some personal experience with fixing panic-safety issues in unsafe Rust code:

https://github.com/servo/rust-smallvec/pull/103

I wrote a bit more about it here:

https://users.rust-lang.org/t/c-pitfalls-hard-to-avoid-that-...


Even isolation and recovery at the process level does not guarantee that a multi-process application will have no invalid state! By the time invalid state is detected in one process and a panic happens, the invalid state may have already propagated to another process via IPC messaging.

And even taking down an entire multi-process application doesn't fully protect against invalid state, if that invalid state has wound up in persistent storage.

All recoveries from panics are ultimately heuristics.


Your goal should be for your software to behave correctly regardless of persistent state. That is, all persistent states are valid, or else that's a grave bug.

If the situation is "Program fails when restoring from state X" the priority for "Don't fail when restoring from state X" is higher than for "Don't cause whatever happened that results in state X".

Example: Let's say your code believes 'foo' is supposed to be a file with an XML structure in it. A user reports that something went wrong and now the program crashes, the file 'foo' is now exactly 4096 bytes (one page on most architectures) of binary noise.

Correct priority: #1 Make the program work when 'foo' is not an XML file. If possible (maybe 'foo' is storing animated profile pictures for a chat program) it should carry on, if that's impractical (maybe 'foo' defines which model of X-ray machine we're hooked up to, best not to press on without knowing) it should give a clear error explaining what's wrong.

#2 only after fixing #1 figure out how 'foo' gets corrupted and try to solve that.


Nitpick: this isn't really about persistent state specifically, but any shared (and especially mutable) state, including but not limited to state shared across multiple temporally-nonoverlapping instances of the same program. Eg, compare when 'foo' is provided as input or fetched across a network.


> if that invalid state has wound up in persistent storage.

The same can be said for a program recovering from a power loss?


> much easier to find through fuzzing

Note that you can fuzz with debug asserts enabled.
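For example (a sketch of my own), an invariant that only costs something in debug builds, and that a fuzzer built with debug assertions will turn into a visible panic:

    /// Midpoint of an interval [lo, hi].
    fn midpoint(lo: usize, hi: usize) -> usize {
        // Checked only in debug builds; a fuzzer with debug assertions enabled
        // turns a violated precondition into a panic instead of a wrong result.
        debug_assert!(lo <= hi, "interval is inverted: {} > {}", lo, hi);
        lo + (hi - lo) / 2
    }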

Crash-only software is a fantastic paper that teaches another school of thought: always panic gracefully and be quick to recover. This assumes that you have a quick recovery plan for panics; if you do, panicking is fine (but that's not always the case).


> Panicking is, in many ways, the best-case scenario for code that contains a bug.

The very opposite: it's among the worst behaviors, short of creating a vulnerability.

From any modern language we should expect an exception to be raised.


"Panic" is essentially the Rust name for exceptions. They are even implemented the same as C++ exceptions in terms of codegen. The real differences are cultural rather than technical, as I touch on in this thread:

https://users.rust-lang.org/t/c-pitfalls-hard-to-avoid-that-...


No, it does not have the same semantics as exceptions.


Care to give an example or two?

Rust panics, like C++ exceptions, can unwind the stack and invoke Drop/destructors.

Rust panics, like C++ exceptions, can be caught, inspected, and recovered from - using std::panic::catch_unwind instead of try/catch statements.

Rust panics can be rethrown with std::panic::resume_unwind, just as you can rethrow C++ exceptions with "throw;".

Rust panic payloads can be a wide variety of types - anything that conforms to "dyn Any + Send + 'static" - just as you can throw a wide variety of exception types in C++. While Rust panic payloads are typically a &str or String, they're not limited to that, and C++ lets you throw C-strings or std::string too.

Unhandled Rust panics terminate the application; unhandled C++ exceptions invoke std::terminate (which by default calls abort).

It's discouraged to use Rust panics for general control flow, but that's cultural rather than semantic - you can totally use them for control flow. Nothing is stopping you, except hopefully your code reviewer.

Rust panics can be configured to abort instead of unwinding, but that's just a cleaner alternative to C++ compilers typically giving you the option to disable exceptions entirely - and it's not unheard of for a C++ library to wrap exception throwing with macros, such that these can abort instead when built without exception handling support.
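A quick sketch of those parallels (my own illustration, assuming the default unwinding panic strategy):

    use std::panic;

    fn main() {
        // "try": run code that may panic and capture the unwind.
        let result = panic::catch_unwind(|| {
            panic!("something went wrong");
        });
        if let Err(payload) = result {
            // "catch": inspect the payload (a &'static str for this panic!).
            if let Some(msg) = payload.downcast_ref::<&str>() {
                println!("caught panic: {}", msg);
            }
            // "throw;": rethrow the same payload up the stack.
            panic::resume_unwind(payload);
        }
    }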


> First of all, they kind of dodge this later by saying "Array indexing must be properly tested" (else it can panic) -- everyone thinks they write code which is "properly tested".

You're skipping half the recommendation though:

> Array indexing must be properly tested, or the get method should be used to return an Option.

emphasis mine

> Also, in rust if we don't want to panic we need to never use array indexing and never use integer division, just to start.

I mean, that's literally the block above the one you quote:

> Common patterns that can cause panics are:

> * using unwrap or expect,

> * using assert,

> * an unchecked access to an array,

> * integer overflow (in debug mode),

> * division by zero,

> * large allocations,

> * string formatting using format!.

>> Rule LANG-NOPANIC

>> Functions or instructions that can cause the code to panic at runtime must not be used.

The bit you quote is really an additional reminder that array indexing is not panic-safe.


Saying "properly tested" doesn't seem useful to me. What is "properly tested"? I'm sure people thing most array overflows in C libraries are "properly tested", as noone wants to cause memory corruption.


> Saying "properly tested" doesn't seem useful to me. I'm sure people thing most array overflows in C libraries are "properly tested"

It is useful in the sense that it applies to a restricted set of constructs, in the same sense that `unsafe` blocks are not outright forbidden but should be justified, and because they're restricted in span they can be more easily tested than your entire C codebase.

Since array indexing would be recommended against by default (and either iterators or `get` would be the normal way to handle it), the number of places where it is used should be small and thus easy to check for, and test extensively if not exhaustively.
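In other words, something like this trivial sketch (mine, not from the guideline) would be the default style, with raw indexing reserved for the few audited spots:

    /// Out-of-range access becomes a value the caller must consider,
    /// rather than a potential panic.
    fn third(xs: &[u32]) -> Option<u32> {
        xs.get(2).copied()
    }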


What do you do when your algorithm works with array indexing and you are never supposed to handle the None within the Option?

Ie if you have indexed array out of bounds that’s a developer’s mistake and not a “condition to be handled up the stack”.

It’s ok to panic if you agree that it’s better to panic rather than end up in a state that will probably cause even more issues down the line


> What do you do when your algorithm works with array indexing and you are never supposed to handle the None within the Option?

That's where the first clause comes into play:

> Array indexing must be properly tested

Although

> Ie if you have indexed array out of bounds that’s a developer’s mistake

Which developer would be my question.

If it's the person who reified the "algorithm [which] works with array indexing" then fair's fair, it's a bug in the implementation and should not happen, which is why the guideline specifically says:

> should never use functions or instructions that can fail and cause the code to panic

which, assuming the word is used in the RFC 2119 sense, means there can be cases where calling panicking functions is the right thing to do.

If it's the developer who calls the function implementing the algorithm, then either the algorithm is fallible (and thus so is the function) or the algorithm should take in more specific data structures which statically exclude failing cases. For instance `max` on an empty collection will fail, so either `max` should be fallible (which e.g. `Iterator::max` is) or it should only be callable on some sort of `NonEmpty*`.
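As a sketch of that last idea (the `NonEmptyVec` name is hypothetical, not something the guideline or std defines):

    /// A collection that cannot be empty, so `max` needs no failure case.
    struct NonEmptyVec<T> {
        head: T,
        tail: Vec<T>,
    }

    impl<T: Ord + Copy> NonEmptyVec<T> {
        fn max(&self) -> T {
            // `head` always exists, so there is no None to handle or panic on.
            self.tail
                .iter()
                .copied()
                .max()
                .map_or(self.head, |m| m.max(self.head))
        }
    }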


> You're skipping half the recommendation though:

> Array indexing must be properly tested, or the get method should be used to return an Option.

Fwiw, for anyone who’s dabbled with Haskell this is SOP. Any result that could be undefined is returned as a Maybe (Haskell’s version of Option). If this seems odd to anyone, understanding it in Haskell will probably help understand it better in Rust too:

http://learnyouahaskell.com/a-fistful-of-monads#getting-our-...

https://en.wikibooks.org/wiki/Haskell/Understanding_monads/M...


Er, don't Haskell's head and (!!) functions default to throwing an exception?

    $ ghci
    Prelude> let a = [3, 4, 5]
    Prelude> a !! 0
    3
    Prelude> a !! 3
    *** Exception: Prelude.(!!): index too large
    
    Prelude> head []
    *** Exception: Prelude.head: empty list
I don't think Haskell is really different from Rust in this respect. Both have a wrapper type for optional values with good syntax, but in both, using it still takes some extra syntax, so common operations sometimes choose to skip it.

In fact, Hoogle doesn't seem to find me a function like Rust's .get() in the standard library, just a handful of third-party packages: https://hoogle.haskell.org/?hoogle=%5Ba%5D+-%3E+Int+-%3E+May...


This was posted on /r/rust a couple days ago: https://notes.iveselov.info/cheatsheet-rust-option-vs-haskel...


> Hoogle doesn't seem to find me a function like Rust's .get() in the standard library

In practice, you don't really need one - the safe alternative to "xs !! n" is pattern-matching on the result of "drop n xs", as that's [] if xs has ≤n elements:

https://www.haskell.org/onlinereport/standard-prelude.html#$...


Sure, that seems syntactically more cumbersome than 'if let Some(foo) = xs.get(n)' or 'xs.get(n).map(|foo| ...)' in Rust, but yes, you can do it. As I said, because both the Rust and Haskell versions are more cumbersome than using the version that raises an exception/panics, both Rust and Haskell's standard libraries choose to give you a syntactically-easier alternative that isn't a total function.

All I'm saying is that Haskell doesn't seem to do anything different here - Rust has incorporated the lessons from Haskell's type system. (As someone who fell in love with Haskell a long time ago but never got to use it professionally, this is basically why I like Rust.) Is there something Haskell does that Rust does not do? I'm not trying to say Haskell is insufficient - I'm just refuting the claim that Rust is insufficient and should act more like Haskell.


Sure, I don't disagree - I meant only to add context for anyone reading who was unfamiliar with Haskell, lest they come away with the impression that the lack of a .get()-equivalent was some kind of egregious oversight.


> In practice, you don't really need one - the safe alternative to "xs !! n" is pattern-matching on the result of "drop n xs", as that's [] if xs has ≤n elements:

Great so instead of `xs !! n` you're supposed to write

    case drop n xs of
        [] -> …
        x : _ -> …
that seems… less than likely?


> Fwiw, for anyone who’s dabbled with Haskell this is SOP. Any result that could be undefined is returned as a Maybe (Haskell’s version of Option).

It would be nice if that were true but that ain't exactly the case is it?

AFAIK exhaustive pattern matching still isn't enabled by default, and the prelude is full of partial functions, especially on lists (to such an extent that GHC has a utility function just for blowing up on empty lists: https://hackage.haskell.org/package/base-4.12.0.0/docs/GHC-L...)


There are a number of functions in std that panic if misused, for example copy_from_slice.


Yes?


I agree wholeheartedly. If we tell people "Don't write code that panics" then we collectively start writing code that assumes panics don't happen. Suddenly panics become a footgun in Rust, and it isn't the safe language we want it to be.

If I'm feeling in a mood, I think we should panic all the time. We should have a "panic monkey" tool that places panics at random parts of the code, to see how panics are handled. Eg to see if they interact poorly with unsafe code.


I can only agree; Diesel had (maybe still has) a severe bug around connection reuse and panics.

Panics in Rust were originally a pretty nice way to have an error kernel pattern, and as long as you always kill the whole error kernel in which the panic occurred and don't share mutable state between error kernels, this fully saves you from any inconsistent state.

Sure, panics across FFI boundaries can't be done nicely at all, but this can still be handled by making sure there is no FFI boundary inside of an error kernel, just in between.

So I would push panic usage for error kernels, and just that. But the unwind-safety traits also need a refresh: enforce them (by linting against inconsistencies) and provide better defaults for what does and doesn't implement them.

For example, a type which is Sync but doesn't implement the referential unwind-safety trait (RefUnwindSafe) is broken/bad in 99% of cases. (Either it should implement the unwind-safety marker or not be Sync.) The remaining 1% are edge cases between the usage of catch_unwind and thread-local storage.
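For illustration (my own sketch), the existing marker already shows up when you catch a panic around `&mut` state, and `AssertUnwindSafe` is the explicit opt-out:

    use std::panic::{catch_unwind, AssertUnwindSafe};

    fn main() {
        let mut counter = 0u32;
        // Capturing `&mut counter` is not UnwindSafe by default; wrapping the
        // closure in AssertUnwindSafe is us promising the state stays coherent.
        let result = catch_unwind(AssertUnwindSafe(|| {
            counter += 1;
            panic!("interrupted after the update");
        }));
        assert!(result.is_err());
        println!("counter = {}", counter);
    }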


You don't need unwind safety (for anything, ever); an error kernel is more commonly known as a "process"; when a panic occurs

> you always kill the whole [process] in which the panic occurred and don't share mutable state between [process]s

Processes are designed for this, including making it very hard to accidentally share mutable state (using e.g. mmap(MAP_SHARED) explicitly).


Sure, but having process error kernels isn't at all appropriate for many applications. E.g. you don't start a process for every http request (or actor request) your server handles (even starting a thread for every one is considered too much overhead).

But not only is the overhead of "starting"/entering an error kernel a problem; you also normally do share _immutable_ state (e.g. configs, lookup tables), as well as some _minor_ amount of mutable but well known to be panic-safe state (e.g. simple performance counters and some caches). Which is now much harder to get right.

E.g. with a cache, you potentially want it to be shared across error kernels and re-used after a kernel crashed, but only if the panic didn't come from the cache. So the cache needs to be handled in its own error kernel, which makes it important to make the intercommunication fast.

Sure, you _can_ do all this with processes (by using both IPC and mmap together), but it's much harder to get right. Often this leads to either severe performance limitations or breaking of the kernel's surface (by using mmap in a bad way). It also tends to lead to code which is very dependent on the features of a specific operating system, and to problems with managing startup/shutdown (no, systemd isn't a solution), moving resources between kernels, etc.

So all in all, processes make things more complex and less portable, with very little to no benefit for many use cases. (Sandboxing of very large error kernels, like done in browsers, is one point where it makes sense.)


No, if you are writing secure code then a panic is a denial of service. Their point is that you should never panic, unless you do reach an unrecoverable error.


It seems like, in the absence of panics, there is a dichotomy between "You should know everything that can happen to your program and account for it." and "If you hit some unknown state, carry on." The first one is impossible, the second sounds dangerous. When you get into this dichotomy, panicking seems like a reasonable third option. We should strive for the first, but avoid the second. Nobody wants their code to panic, but we need to be able to deal with unexpected states.


Yeah that's what is lacking from my comment: expect panics to happen and attempt to gracefully recover whenever that happens. This is not easy though.


(This is neither direct disagreement nor agreement, but a sort of elaboration of my own.)

The practical semantics of a panic become increasingly ill-defined as the number of threads in the program and the depth of their stacks increase. I emphasize "practical" because mathematically, no matter how many threads you have or how deep you get, the semantics are perfectly well defined; it's going to do whatever it's going to do. But those mathematical semantics get more and more complicated as the program grows. If you call X() and it panics, its behavior depends on its stack above. If that X() is called in a "server" thread and it ends up terminating it (or restarting it, or anything else), it may impact any number of other threads in weird ways, even if you "handle" the panic in some manner, e.g., even if you "clean up the locks" on the way up the panic handler that still doesn't mean the program's in a "clean state".

When a programmer goes to write the aforementioned X(), it's impossible in general for that programmer to know what their "panic environment" is going to be. (In specific, it may be an unexposed/private method that does have a good idea what environment it will be called in, but in general a function does not know.) Working via panic handlers shares the responsibility between the caller and the callee in a difficult-to-describe manner. At small scales this is completely negligible, but as you scale up, mismatches between the programmer of X() and the user of X() very gradually start accumulating, and interacting with the other mismatches, and at some point you get to the point where some junior programmer doesn't fully understand the complicated maze and just starts bashing on code until it superficially seems to work, and then you're in real trouble.

By contrast, the contract with error-type returns like Result<> or Either or whatever has a clean contract between the caller and the callee; the callee is responsible for reporting failure, and the caller is responsible for dealing with it. This kind of contract does not compose up into a complicated "panic environment", because it does not cross the stack frames like a panic/exception can. The frames don't mix with each other.
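A minimal sketch of that contract (my own example):

    use std::num::ParseIntError;

    // The callee declares failure in its signature...
    fn parse_port(s: &str) -> Result<u16, ParseIntError> {
        s.parse::<u16>()
    }

    fn main() {
        // ...and the caller deals with it in its own frame, rather than in some
        // distant handler further up the stack.
        match parse_port("80480") {
            Ok(p) => println!("port {}", p),
            Err(e) => eprintln!("bad port: {}", e),
        }
    }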

In theory it may be possible to completely exclude exceptions; in practice right now we don't really know how to do it 100% in practical languages, so regardless of what I wrote and regardless of what you may like, you will have some sort of "panic environment". However, I think you are in general better off to try to avoid complicating it as much as possible, because it gets exponentially complicated. I do mean exponentially as in 2^x and not x^2 as it is commonly abused. But the x is fairly small, which makes it easy to fail to notice. Also if you know your program is going to stay reasonably small it means it's a valid choice to just not care and use it anyhow. But if you have any reason to think your program may scale, I'd suggest trying very hard to stick to error-type handling as much as possible, as you only pay that exponential term as you add things into your panic environment.

(Also, I know that "using errors" doesn't fix the problems I mentioned, as I myself have had plenty of days where my "error based" program terminated a critical internal server and made the program "go all weird". I'm just saying it makes it better to not add the complications of a complicated panic-handling environment in addition to the already-unavoidable other tasks you have to do.)


I'm having problems fulfilling this requirement in my libs: "Crates providing libraries should never use functions or instructions that can fail and cause the code to panic."

The Rust standard library's Vec, HashMap, etc. can cause a panic if the device (such as a mobile phone with limited memory) runs out of memory.

C and C++ standard libraries (malloc, std::vector, std::map..) can handle those situations by returning null or throwing an exception.

I wish Rust had some easy way to recover from out-of-memory situations when using the standard library. I have been considering writing my own out-of-memory safe Vec, HashMap etc, but it can't be the right way to do it..


https://www.youtube.com/watch?v=ARYP83yNAWk mentioned that the notion that C++ could handle out-of-memory was purely theoretical wishful thinking. In practice it could not.


Yes that is one of the primary failures of Rust at the moment: to my knowledge it currently has no good way to safely manage allocation failures (it also has serious issues with stack overflows).

This is an issue with all heap-allocating constructs, not just collections but also Box or Rc.

> I wish Rust had some easy way to recover from out-of-memory situations when using the standard library. I have been considering writing my own out-of-memory safe Vec, HashMap etc, but it can't be the right way to do it..

Maybe look at the embedded space there? There might be no_std third-party libraries which handle these issues. Possibly on top of alloc as the (unstable[0] and obviously unsafe) `Alloc` trait does have a concept of allocation failure.

[0] https://github.com/rust-lang/rust/issues/32838


> Yes that is one of the primary failures of Rust at the moment: to my knowledge it currently has no good way to safely manage allocation failures

So, sort of yes and sort of no.

The data structures that allocate in the standard library do not let you handle allocation failure. However, if you write your own, the global allocator lets you determine if failure happened, and then you can do whatever you want with it.
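A rough sketch of that approach (my own; `RawBuf` is a made-up type), assuming a non-zero allocation size:

    use std::alloc::{alloc, dealloc, Layout};

    /// A hand-rolled buffer that reports allocation failure instead of aborting.
    struct RawBuf {
        ptr: *mut u8,
        layout: Layout,
    }

    impl RawBuf {
        fn new(size: usize) -> Result<RawBuf, ()> {
            // Assumes size > 0: zero-sized allocations are not allowed here.
            let layout = Layout::from_size_align(size, 1).map_err(|_| ())?;
            // The global allocator returns a null pointer on failure.
            let ptr = unsafe { alloc(layout) };
            if ptr.is_null() {
                Err(())
            } else {
                Ok(RawBuf { ptr, layout })
            }
        }
    }

    impl Drop for RawBuf {
        fn drop(&mut self) {
            unsafe { dealloc(self.ptr, self.layout) }
        }
    }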


I think Rust got it right, here.

Dealing with allocation failure gracefully is hard and requires a lot of extra code.

For most applications the best default is to panic and handle it up the stack rather than pay the programming overhead of handling allocation failure explicitly in every last nook and cranny. The Rust standard library rightly optimizes for this use case.

For the embedded or safety-critical application spaces, where you really do want to handle allocation failure gracefully, you need something other than the standard library. Letting that "something" develop slowly out in the community is a good call.


The problem with allocation failure is that it's non-local. Suppose thread A does a huge allocation of several hundred megabytes. The best case scenario is that this allocation fails cleanly; the worst case scenario is that this allocation succeeds but causes a small 1024-byte allocation in an unrelated thread B to fail (and it doesn't have to be multiple threads, this can happen even within a single thread). I don't know of any solution for this problem, other than statically reserving the memory which will be used for each part of the system.


The horrible part is that this isn't even limited to threads; a completely separate process (even, e.g., a browser JS engine running 'completely' untrusted remote code) can cause your program to hit an allocation failure, and then not have enough memory to perform nontrivial error handling.


Actually, I have found FallibleVec written by Mozilla [1] but I haven't found anything for other containers, yet.

[1] https://github.com/mozilla/mp4parse_fallible


Honestly all C programs that fail to allocate panic as well. What else can you do?


> Honestly all C programs that fail to allocate panic as well.

I'd say that most C programs that fail to allocate panic (if only because e.g. they're running with overcommit anyway so allocations never fail), but it's not true that all of them do. There are systems which do handle heap allocation failures "properly", usually ones which run on hardware that is both resource-constrained enough that allocation failure is a possibility yet large enough that heap allocation is a possibility, e.g. modern embedded-ish systems running with dozens (!) or even hundreds (!!!) of megabytes of RAM.

Or the software might be designed for running with a low ulimit for some reason (I think that also triggers allocation failures, I remember that being a case where Python throws MemoryError).

Even on larger machines, I'd also guess it is / was also more common on 32b systems where address space exhaustion could be a real concern.

> What else can you do?

You might just abort the current task because it's not that important when resource-constrained, or you might have caches you can clear. Or possibly (if that's a possibly routine issue, e.g. a background task which might run in cases of both plentiful and constrained resources) you might switch to a less CPU-efficient but more memory-efficient algorithm.


I've looked at these types of systems and they would still crash. Actually, even if you want to gracefully handle failed allocations, it is possible that your program will crash due to lack of memory.


Oh, I never even considered this... But how would Rust ever be able to signal allocation failures in such cases?


Same as any other failure; functions that may try to allocate memory and fail will return a Result<T, E> instead of just a T.


Which would be pretty painful for everyone not working in a memory-constrained environment (at a guess, that's most developers). Much like std::println! panicking instead of returning an error, ergonomics are important too a lot of the time.


It obviously wouldn't be the default (it couldn't be, in fact, since the existing APIs are set in stone short of memory unsafety).

The way it'd work is either you'd have two different collections entirely, one which'd signal and one which'd panic (the latter possibly being built atop the former) or every method which can panic on allocation failure would also have a panic-safe alternative.

Example: https://github.com/rust-lang/rust/issues/48043 similarly Vec could gain `try_push`, `try_append`, `try_insert`, `try_split_off`, …
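For instance, a `try_push` could presumably be a thin wrapper over the `try_reserve` API tracked in that issue (sketch only; `try_push` is a hypothetical helper and `try_reserve` may not be available on your toolchain):

    use std::collections::TryReserveError;

    // Hypothetical helper, not a std API: surface allocation failure to the
    // caller instead of aborting the process.
    fn try_push<T>(v: &mut Vec<T>, value: T) -> Result<(), TryReserveError> {
        v.try_reserve(1)?; // allocation failure becomes an Err here
        v.push(value);     // capacity is now guaranteed, so this cannot allocate
        Ok(())
    }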


Which is why you offer two APIs: one that panics, & one which returns Result or Option


Eg. with Vec, every operation that may trigger reallocation should return Result.


This is published by the ANSSI[1], the National Cybersecurity Agency of France. They do quite a good job at publishing security guides.

[1] https://www.ssi.gouv.fr/en/


I found at least one problematic section when scanning:

> The environment variables RUSTC, RUSTC_WRAPPER and RUSTFLAGS must not be overriden when using Cargo to build project.

This is simply not true at all. Notably, build cache systems like sccache work by wrapping rustc; there is no reason why using a build cache should be forbidden.

(Though currently Rust support in sccache is still not perfect and there are some limitations; that doesn't change that the rule is too broad.)

I also wouldn't be surprised if there are some rustc flags not exposed in Cargo profiles which allow triggering some security mechanisms in LLVM that are not enabled by default but are beneficial for your project.

Like always, the important thing is that you understand what you do and what implications it has.


The context that I was missing at first:

> The Agence nationale de la sécurité des systèmes d'information (ANSSI; English: National Cybersecurity Agency of France) is a French service created on 7 July 2009 with responsibility for computer security [1].

I didn't know about "cargo-outdated". I liked having "npm outdated" within the JavaScript ecosystem, so I'll give this a try.

1. https://en.wikipedia.org/wiki/Agence_nationale_de_la_s%C3%A9...


`cargo-outdated` is mint. If you haven't already, consider installing `cargo-edit` too! It adds `cargo add`/`cargo rm` to add/remove crates, as well as `cargo upgrade` to bump semver versions.

https://github.com/killercup/cargo-edit


There's also cargo-crev which is pretty neat.



