As long as we're plugging our projects, I'll mention the scpptool-enforced memory-safe subset of C++. Fil-C would be generally more practical, more compatible and more expedient, but the scpptool-enforced subset of C++ is more directly comparable to Rust.
scpptool demonstrates enforcement (in C++) of a subset of Rust's static restrictions required to achieve complete memory and data race safety [1]. Probably most notably, the restriction against the aliasing of mutable references is not imposed "universally" the way it is in (Safe) Rust, but instead is only imposed in cases where such aliasing might endanger memory safety.
This is a surprisingly small set of cases that essentially consists of calls to methods that can arbitrarily destroy objects owned by dynamic owning pointers or containers (like vectors) while references to the owned contents exist. Because the set is so small, the restriction does not conflict with the vast majority of (lines of) existing C++ code, making migration to the enforced safe subset much easier.
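To illustrate the kind of case that is restricted (a sketch of the hazard; the exact diagnostics are scpptool's, not shown here):

    #include <vector>

    void example() {
        std::vector<int> v{ 1, 2, 3 };
        int& first = v[0]; // reference to the vector's owned contents
        v.push_back(4);    // may reallocate while 'first' still exists; this is
                           // the sort of access a conforming program can't make
        // first = 5;      // ...because this could be a use-after-free
    }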
The scpptool-enforced subset also has better support for cyclic and non-hierarchical pointers/references that (unlike Safe Rust) doesn't impose any requirements on how the referenced objects are allocated. This means that, in contrast to Rust, there is a "reasonable" (if not performance-optimal) one-to-one mapping from "reasonable" code in the "unsafe subset" of C++ (i.e. traditional C++ code) to the enforced safe subset.
So, relevant to the subject of the post, this permits the scpptool to have a (not yet complete) feature that automatically converts traditional C/C++ code to the safe subset of C++ [2]. (One that is deterministic and doesn't try to just punt the problem to LLMs.)
The problem isn't dedicating public resources to trying to get LLMs to convert C to Safe Rust after investments in the more traditional approach failed to deliver. The problem is the lack of simultaneous investment in at least the consideration and evaluation of (under-resourced) alternative approaches that have already demonstrated results that the (comparatively well-funded) translate-to-Rust approach thus far hasn't been able to match.
> Profiles cannot achieve the same level of safety as Rust
So the claim is that the scpptool approach[1] can, while remaining closer to traditional C++, and not requiring the introduction of new language elements. Since the scpptool-enforced safe subset of C++ is an actual subset of C++, conforming code continues to build with your existing compiler. It just uses an additional static analyzer to check conformance.
For the 90% or whatever of C++ code that is not actually performance sensitive, the associated SaferCPlusPlus library provides drop-in and "one-to-one" safe replacements for unsafe C++ elements (like standard library containers and raw pointers). (For example, if you're worried about potentially invalid vector iterators, you can just replace your std::vector<>s with mse::mstd::vector<>s.) With these elements, most of the safety is enforced in the type system and not reliant on the static analyzer.
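For example (header name copied from the SaferCPlusPlus repo as best I recall; treat the specifics as assumptions):

    #include "msemstdvector.h" // SaferCPlusPlus

    void example() {
        mse::mstd::vector<int> v{ 1, 2, 3 }; // drop-in for std::vector<int>
        auto it = v.begin();
        v.clear();
        // *it; // with std::vector this would be UB; here the invalid
        //      // iterator use is caught (at run time) instead
    }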
Conforming implementations of performance-sensitive code would be more restricted and more reliant on the static analyzer for safety enforcement. They also sometimes require the use of library elements, like "borrowing objects", which may not have analogues in traditional C++. But overall, even high-performance conforming code remains very recognizable C++.
The claim is that the scpptool approach is a straightforward path to full memory (and data race) safety for C++, and the one that requires the least code migration effort. (And again, as an actual subset of existing C++, not technically dependent on standard committees or compiler vendors for its implementation or deployment.)
> From what I'm aware of, Rust has poor ergonomics for programs that have non-hierarchical ownership model (ie. not representable by trees)
Yeah, non-hierarchical references don't really lend themselves to static safety enforcement, so the question is what kind of run-time support the language has for non-hierarchical references. But here Rust has a disadvantage in that its moves are (necessarily) trivial and destructive.
For example, the scpptool-enforced memory-safe subset of C++ has non-owning smart pointers that safely support non-hierarchical (and even cyclical) referencing.
They work by wrapping the target object's type in a transparent wrapper that adds a destructor that informs any targeting smart pointers that the object is about to become invalid (or, optionally, any other action that can ensure memory safety). (You can avoid needing to wrap the target object's type by using a "proxy" object.)
Since they're non-owning, these smart pointers don't impose any restrictions on when/where/how they, or their target objects, are allocated, and can be used more-or-less as drop-in replacements for raw pointers.
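A sketch of how they're used (type names as I recall them from the SaferCPlusPlus documentation, so treat the specifics as assumptions):

    #include "mseregistered.h" // SaferCPlusPlus

    struct CPoint { int x = 0; int y = 0; };

    void example() {
        mse::TRegisteredPointer<CPoint> ptr; // non-owning smart pointer
        {
            mse::TRegisteredObj<CPoint> obj; // transparent wrapper: usable as a CPoint
            ptr = &obj;
            ptr->x = 5; // fine, the target is alive
        }
        // obj's destructor has notified ptr that the target is gone, so a
        // dereference here is caught (throws) rather than accessing freed memory:
        // ptr->x = 7;
    }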
Unfortunately, this technique can't be duplicated in Rust. One reason being that in Rust, if an object is moved, its original memory location becomes invalid without any destructor/drop function being called. So there's no opportunity to inform any targeting (smart) pointers of the invalidation. So, as you noted, the options in Rust are less optimal. (Not just "ergonomically", but in terms of performance, memory efficiency, and/or correctness checking.) And they're intrusive. They require that the target objects be allocated in certain ways.
Rust's policy of moves being (necessarily) trivial and destructive has some advantages, but it is not required (or arguably even helpful) for achieving "minimal-overhead" memory safety. And it comes with this significant cost in terms of non-hierarchical references.
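A minimal sketch (hypothetical type, not the library's actual implementation) of why non-trivial moves are what make the technique possible in C++:

    // Because a C++ move runs user code (a move constructor), and the
    // moved-from object is still destructed, there's an opportunity to
    // update anything pointing at the old location.
    struct Tracked {
        Tracked** observer = nullptr; // a single back-pointer, for brevity
        Tracked() = default;
        Tracked(Tracked&& other) noexcept : observer(other.observer) {
            if (observer) { *observer = this; } // re-point observer at the new location
            other.observer = nullptr;
        }
        ~Tracked() {
            if (observer) { *observer = nullptr; } // notify observer of invalidation
        }
    };
    // In Rust, a move is a bitwise copy and no code runs for the moved-from
    // value, so nothing gets the chance to perform these updates.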
So it seems to me that, at least in theory, an enforced memory-safe subset of C++ that does not add any requirements regarding moves being trivial or destructive would be a more natural progression from traditional C++.
> Yeah, non-hierarchical references don't really lend themselves to static safety enforcement, so the question is what kind of run-time support the language has for non-hierarchical references.
Yes. Back references are a big problem.
I just wrote a bidirectional transitive closure algorithm that uses many back references, with heavy use of Rc, RefCell, Weak, and ".borrow()". It's 100% safe rust.
This is the "proper" Rust way to write this sort of thing. The nice thing about doing it the "right" way was that, once it compiled, it needed few changes to work correctly. No mysterious errors at all.
But it took a lot of work to get it to compile. Some sections had to be rewritten to get the ownership plumbing right.
I put it up on the Rust forums for comments, and got replies that I should stop doing all that fancy stuff and just use indices into arrays.[1] Or arena allocation. Things that bypass the Rust ownership system. Those approaches would probably have more bugs.
(I'm starting to see a way to do compile time checking for this sort of thing. The basic concept is that run time borrows must be disjoint as to type, disjoint as to scope, or disjoint as to instance. The first is easy. The second requires inspecting the call chain, and there are problems with templates due to ambiguity over what a type parameter does in .borrow() activity. The third is almost a theorem proving problem, but if you restrict compile time checks for disjoint instances to a single function (or maybe a "class", a struct and its functions), it might be manageable. All this might take too much cleverness to use in practice. Too much time getting the ownership plumbing right, even with compiler support.
But I should write this up.)
> I put it up on the Rust forums for comments, and got replies that I should stop doing all that fancy stuff and just use indices into arrays.[1] Or arena allocation. Things that bypass the Rust ownership system. Those approaches would probably have more bugs.
I ran into this years ago as well. It was very unsatisfying. Maybe Rust is just missing a good GC type?
It's not an allocation problem. It's a back-reference problem. When struct A owns struct B, and B needs to be able to find A, that's surprisingly difficult to set up in Rust. Even for structures where B is inside of A's struct.
A couple of solutions in development (but already usable) that more effectively address UB:
i) "Fil-C is a fanatically compatible memory-safe implementation of C and C++. Lots of software compiles and runs with Fil-C with zero or minimal changes. All memory safety errors are caught as Fil-C panics."
"Fil-C only works on Linux/X86_64."
ii) "scpptool is a command line tool to help enforce a memory and data race safe subset of C++. It's designed to work with the SaferCPlusPlus library. It analyzes the specified C++ file(s) and reports places in the code that it cannot verify to be safe. By design, the tool and the library should be able to fully ensure "lifetime", bounds and data race safety."
"This tool also has some ability to convert C source files to the memory safe subset of C++ it enforces"
Fil-C is interesting because, as you'd expect, it takes a significant performance penalty to deliver this property. If it's broadly adopted, that would suggest that - at least in this regard - C programmers genuinely do prioritise their simpler language over mundane ideas like platform support or performance.
The resulting language doesn't make sense for commercial purposes but there's no reason it couldn't be popular with hobbyists.
Well, you could also treat Fil-C as a sanitiser, like memory-san or ub-san:
Run your test suite and some other workloads under Fil-C for a while, fix any problems reported, and if it doesn't report any problems after a while, compile the whole thing with GCC afterwards for your release version.
Right. And of course there are still less-performance-sensitive C/C++ applications (curl, postfix, git, etc.) that could have memory-safe release versions.
But the point is also to dispel the conventional wisdom that C/C++ is necessarily intrinsically unsafe. It's a tradeoff between safety, performance and flexibility/compatibility. And you don't necessarily need to jump to a completely different language to get a different tradeoff.
Fil-C sacrifices some performance for safety and compatibility. The traditional compilers sacrifice some safety for performance and flexibility/compatibility. And scpptool aims to provide the option of sacrificing some flexibility for safety and performance. (Along with the other two tradeoffs available in the same program). The claim is that C++ turns out to be expressive enough to accommodate the various tradeoffs. (Though I'm not saying it's always gonna be pretty :)
Even with UB holes plugged, C (and C++) are still unsafe, because there are many assumptions you might want to make that you cannot encode in the language.
To give an example that's easy to understand: before the introduction of the 'const' keyword, you just couldn't express that some variable should never be changed. And no amount of UB sanitisers would have fixed this for you: you just couldn't express the concept. Lots of other areas of these languages are still in a similar state.
E.g. there's no way to express that a function should be pure, i.e. not have side effects (but be allowed to use mutation internally).
Yeah, but C++ now supports "user-defined" annotations which effectively allow you to add the equivalent of any keyword you need, right? (Even if it's not the prettiest syntax.) For example, the scpptool static analyzer supports (and enforces) lifetime annotations with similar meaning to Rust's lifetime annotations.
I believe gcc actually does support `__attribute__ ((pure))` to indicate function purity. (I assume it doesn't actually enforce it, but presumably it theoretically could at some point.)
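For reference, the attribute looks like this (GCC and Clang accept it; as noted, it's taken on trust rather than enforced, and since optimizers may assume it, lying here invites miscompilation):

    // 'pure' promises no side effects and a return value that depends only on
    // the arguments and (readable) global memory; internal mutation is fine.
    __attribute__((pure)) int checksum(const unsigned char* buf, int n) {
        int sum = 0;
        for (int i = 0; i < n; ++i) { sum += buf[i]; }
        return sum;
    }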
Might I suggest that the scpptool-enforced safe subset of C++ has a better solution for such data structures with cyclic or complex reference graphs, which is run-time checked non-owning pointers [1] that impose no restrictions on how or where the target objects are allocated. Unlike indices, they are safe against use-after-destruction, and they don't require the additional level of indirection either.
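Roughly what back references look like with them (a sketch, with type names as I recall them from the SaferCPlusPlus documentation):

    struct CNode {
        mse::TRegisteredPointer<CNode> m_parent; // non-owning back reference
        mse::TRegisteredPointer<CNode> m_child;
    };

    void example() {
        mse::TRegisteredObj<CNode> parent; // could just as well be heap- or member-allocated
        mse::TRegisteredObj<CNode> child;
        parent.m_child = &child;
        child.m_parent = &parent; // cycle is fine: neither pointer owns its target
        // if either node is destroyed first, any use of the dangling pointer is caught
    }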
Hmm, I take it that the situation is that there are a number of vendors/providers/distros/repos who could be distributing your memory-safe builds, but are currently still distributing unsafe builds?
I wonder if an organization like the Tor project [1] would be more motivated to "officially" distribute a Fil-C build, being that security is the whole point of their product. (I'm talking just their "onion router" [2], not (necessarily) the whole browser.)
I could imagine that once some organizations start officially shipping Fil-C builds, adoption might accelerate.
Also, have you talked to the Ladybird browser people? They seemed to be taking an interest in Fil-C.
Tor wants to move to Rust, and they aren't happy with their C codebase. They want to expand use of multi-threading, and C has been too fragile for that.
Makes sense. But maybe the fact that that post is 4 years old serves to bolster the argument for Fil-C's value proposition. However much people may want to move away from their C code bases, the resources it takes to do so in a timely manner are often not so readily available.
This particular memory vulnerability, as I understand it, was a result of a `ReadonlySpan<>` targeting a resizable vector. A simple technique used by the scpptool-enforced safe subset of C++ to address this situation is to temporarily move the contents of the resizable vector into a non-resizable vector [1] and target the span at the non-resizable vector instead.
Upon destruction, the non-resizable vector will automatically return the contents back to the original resizable vector. (It's somewhat analogous to borrowing a slice in Rust.)
While it wouldn't necessarily prevent you from doing the flawed/buggy thing you were trying to do, it would prevent it from resulting in a memory vulnerability.
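A generic sketch of the pattern (illustrative only; not the library's actual interface):

    #include <cstddef>
    #include <utility>
    #include <vector>

    // Move the contents out of the resizable vector for the span's lifetime,
    // then automatically move them back on destruction.
    template <typename T>
    class BorrowingFixedVector { // hypothetical name, for illustration
        std::vector<T>* m_src;
        std::vector<T> m_storage;
    public:
        explicit BorrowingFixedVector(std::vector<T>* src)
            : m_src(src), m_storage(std::move(*src)) {}
        ~BorrowingFixedVector() { *m_src = std::move(m_storage); }
        // no push_back()/resize()/etc., so spans into m_storage can't be invalidated
        T* data() { return m_storage.data(); }
        std::size_t size() const { return m_storage.size(); }
    };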
So the pointer (iterator) targeting an existing (stack-allocated) array declared on line 2 gets translated to an owning pointer (Box) targeting a (heap-allocated) new copy of the array. So if the original code was somehow counting on the fact that the pointer/iterator was actually targeting the array it was assigned to, the translated code may (quietly) not behave correctly.
For comparison, the scpptool (my project) auto-translation (to a memory safe subset of C++) feature would translate it to something like:
1 mse::lh::TNativeArrayReplacement<uint8_t, 1> x = { 0 };
2 mse::lh::TNativeArrayReplacement<uint8_t, 1>::iterator y = x; // implicit conversion from array to iterator
3 *y = 1;
4 assert(*x == 1); /* SUCCESS */ // dereferencing of array supported for compatibility
or if y is subsequently retargeted at another type of array, then line 2 may end up as something like:
2 mse::TAnyRandomAccessIterator<uint8_t> y = x; // implicit conversion from array to iterator
So the OP project may only be converting C code that is already amenable to being converted to safe Rust. But given the challenge of the problem, I can respect the accomplishment and see some potential utility in it.
edit: added translation for line 2 in an alternate hypothetical situation.
> the translated code may (quietly) not behave correctly.
The whole point of them showing that example is that they say they catch this case and bring it to the attention of the programmer:
> If the original C program further relies on x, our translation will error out, and will ask the programmer to fix their source code. This is another area where we adopt a "semi-active" approach to verification, and declare that some patterns are poor enough, even for C, that they ought to be touched up before the translation takes place.
Thanks for clarifying. The issue is what code would be rejected for auto-translation, not the correctness of an "accepted" translation (as my comment may have implied).
The point of noting that the example translation quietly does the wrong thing is that that's the reason it would have to be ("unconditionally") rejected.
While the paper does suggest that their example translation would be rejected:
> If the original C program further relies on x, our translation will error out
note that precisely determining whether or not the program "further relies on x" statically (at compile/translation-time) is, in general, a "Halting Problem" (i.e. it cannot be done reliably with finite compute resources). So they would presumably have to be conservative and reject any cases where they cannot prove that the program does not "further rely on x". So it's notable that they choose to use a (provisional) translation that has to be rejected in a significant set of false-positive cases.
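For example (a contrived sketch with a hypothetical helper function), whether the program "further relies on x" can hinge on arbitrary run-time behavior:

    #include <cstdint>
    #include <cstdio>

    bool some_arbitrary_computation(); // hypothetical: could be anything

    void example() {
        std::uint8_t x[1] = { 0 };
        std::uint8_t* y = x;
        *y = 1;
        if (some_arbitrary_computation()) { // statically undecidable in general
            std::printf("%d\n", (int)x[0]); // the program "relies on x" only on this path
        }
    }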
And at least on initial consideration, it seems to me that an alternative translation could have, for example, used RefCell<>s or whatever and avoided the possibility of "quietly doing the wrong thing". (And thus, depending on your/their requirements, avoided the need for unconditional rejection.) Now, one might be in a situation where they'd want to avoid the run-time overhead and/or potential unreliability of RefCell<>s, but even then it seems to me that their translation choice does not technically avoid either of those things. Their solution allocates on the heap, which has at least some theoretical run-time overhead, and could theoretically fail/panic.
Now I'm not concluding here that their choice is not the right one for their undertaking. I'm just suggesting that choosing a (provisional) translation that has to be rejected with significant false positives (because it might quietly do the wrong thing) is at least initially notable. And that there are other solutions out there that demonstrate translation of C to a (high-performance, deterministic) memory-safe language/dialect that don't have the same limitations.
[1] https://github.com/duneroadrunner/scpptool/blob/master/appro...
[2] https://github.com/duneroadrunner/SaferCPlusPlus-AutoTransla...