Last time I looked at it, c2rust translated from C to unsafe Rust, right? I'll point out a (neglected) project[1] to (partially) auto convert from C to a safe subset of C++. For example, if you need a png encoder/decoder library written in C++, perhaps the safest one is here [2].
Both libraries do the same thing. Boost lets you set your exception policy and I'm unsure if this Bounded Integer library gives the same level of control.
The bounded::integer types accept three template parameters: `integer<min, max, overflow_policy>`. That final parameter can be bounded::throw_policy, which is itself templated on the exception type thrown. The default exception policy is "overflow is undefined behavior". The other two policies supported out of the box are wrapping / modulo and saturation / clamping.
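To make the three policy flavors concrete, here is a minimal sketch of a range-checked integer with a pluggable overflow policy. All names here (`ranged_int`, `throw_on_overflow`, `saturate`, `wrap`) are made up for illustration and are not the actual bounded::integer API. The "undefined behavior" default is not modeled, and the arithmetic assumes the ranges are small enough that the intermediate 64-bit sum cannot itself overflow.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical policy: throw a caller-chosen exception type when out of range.
template <typename Exception = std::range_error>
struct throw_on_overflow {
    static std::int64_t apply(std::int64_t v, std::int64_t lo, std::int64_t hi) {
        if (v < lo || v > hi) throw Exception("value out of bounds");
        return v;
    }
};

// Hypothetical policy: clamp to the nearest bound (saturation).
struct saturate {
    static std::int64_t apply(std::int64_t v, std::int64_t lo, std::int64_t hi) {
        return v < lo ? lo : (v > hi ? hi : v);
    }
};

// Hypothetical policy: wrap around modulo the size of the range.
struct wrap {
    static std::int64_t apply(std::int64_t v, std::int64_t lo, std::int64_t hi) {
        const std::int64_t range = hi - lo + 1;
        std::int64_t offset = (v - lo) % range;
        if (offset < 0) offset += range;  // keep the offset non-negative
        return lo + offset;
    }
};

// Illustrative integer restricted to [Min, Max]; every construction goes
// through the policy, so results of arithmetic are re-checked as well.
template <std::int64_t Min, std::int64_t Max, typename Policy>
class ranged_int {
    static_assert(Min <= Max, "empty range");
public:
    explicit ranged_int(std::int64_t v) : value_(Policy::apply(v, Min, Max)) {}
    std::int64_t value() const { return value_; }
    ranged_int operator+(ranged_int other) const {
        return ranged_int(value_ + other.value_);
    }
private:
    std::int64_t value_;
};
```

With these definitions, adding 7 and 8 in a `ranged_int<0, 10, saturate>` clamps to 10, the same sum in a `ranged_int<0, 9, wrap>` wraps to 5, and the throwing policy raises the chosen exception type instead.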
If maximum (or just more) flexibility wrt memory safety is the goal, I might suggest the author take a look at SaferCPlusPlus. In particular, it supports memory safe pointer/references[1][2] that can target objects associated with different allocators (including stack allocated objects) without, as the author desires, imposing the "sometimes confining" restrictions that Rust does. And in terms of data race safety, it allows you to hold "read-lock" and "write-lock" pointer/references simultaneously in the same thread [3], which adds a little extra functionality (and maybe convenience).
It's good that you include "shameless plug" in these posts to clarify, but it would be much much clearer if you included more explicit phrasing along the lines of "I only post to HN to plug my project called SaferCPlusPlus, not ever to talk about the submitted article. You can see here: https://news.ycombinator.com/threads?id=duneroadrunner."
> I only post to HN to plug my project called SaferCPlusPlus
From this account, yes. Too much? Sorry, (you can see) I haven't gotten much feedback. Or is having a separate, project-specific account in itself not cool?
> not ever to talk about the submitted article.
I try to post/plug only when I think it's relevant. I think. For example, in the "gmm.pdf" linked in the comment I responded to, the author says that specific types of references can only target objects owned/allocated by the associated allocator. There is no available reference type that can target objects from different allocators. References in Rust, for example, can target any object regardless of how it was allocated, but impose strict restrictions that the author implies he's trying to avoid.
So I tried to point out that SaferCPlusPlus has pointer types that can safely target objects allocated by different allocators and do not have the strict restrictions of Rust's references. As far as I know, these types of pointers are (still) unique to SaferCPlusPlus, and I assume I am one of the few people who are familiar with them. But there's nothing proprietary about them. If the author is constructing a language with a goal of flexibility wrt memory safety, I thought he might consider whether such pointer/reference types might be compatible with his language design. I think they unquestionably increase flexibility (while maintaining memory safety).
> There is no available reference type that can target objects from different allocators.
Cone actually does support Rust-like, lifetime-constrained borrowed references which can do exactly that safely. Cone also supports raw pointers (however de-reference safety is the responsibility of the programmer).
I appreciate the chance to learn about your language's unique form of reference type. I am less likely to call them safe than you, no doubt because I use different criteria for safety. A key requirement I have placed on references (vs. pointers) is that you can always de-reference them and get a valid value with no chance of exception. I don't think your references would comply with this.
A Cone programmer would need to use raw pointers to throw off the shackles of lifetime constraints but, unlike with your references, they could not expect such pointers to turn into nullptr if the object they refer to has been freed. Given the nature of Cone's design, there is no way to accomplish this mechanic with decent performance, especially given that borrowed references and raw pointers are both able to point inside an allocated object.
I do appreciate your bringing it to my attention and wish you all the best with getting others to learn about and adopt your language.
> Cone actually does support Rust-like, lifetime-constrained borrowed references
Ah, so kind of a super-set of Rust functionality. Presumably these would require a "borrow checker" or equivalent? Is that already implemented? So how do you address the safety of, say, taking a reference to an element in a (resizable) vector? Rust's "exclusivity of mutable references" restriction intrinsically makes the vector immutable while the borrowed reference exists, but do I understand that Cone doesn't have that restriction? The "C++ lifetime profile checker", on the other hand, makes the vector non-resizable (but leaves the data mutable).
> A key requirement I have placed on references (vs. pointers) is that you can always de-reference them and get a valid value
SaferCPlusPlus provides both a pointer that throws an exception if you attempt an invalid memory access (though it could just as easily return an optional<>, and you can always query if the target is valid), and one that terminates the program if its target is ever deallocated prematurely (and thus (technically) satisfies your criteria). (The latter has less/minimal overhead.)
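For readers who haven't seen this kind of pointer, here is a rough sketch of the first variety (the one that throws), under my own simplifying assumptions. `Tracked` and `WeakRef` are hypothetical names invented for this comment, not the library's types, and the real implementation differs. The idea is just that the target's destructor notifies every pointer still aimed at it, so a later dereference can be detected instead of touching freed memory.

```cpp
#include <algorithm>
#include <stdexcept>
#include <utility>
#include <vector>

template <typename T> class WeakRef;

// Hypothetical wrapper for a target object. It records which WeakRefs point
// at it and clears them on destruction. Non-copyable/non-movable for brevity.
template <typename T>
class Tracked {
public:
    explicit Tracked(T v) : value_(std::move(v)) {}
    Tracked(const Tracked&) = delete;
    Tracked& operator=(const Tracked&) = delete;
    ~Tracked() {
        for (WeakRef<T>* r : refs_) r->target_ = nullptr;  // invalidate watchers
    }
private:
    friend class WeakRef<T>;
    T value_;
    std::vector<WeakRef<T>*> refs_;
};

// Hypothetical non-owning pointer. The target may live on the stack or the
// heap; dereferencing after the target is gone throws instead of being UB.
template <typename T>
class WeakRef {
public:
    explicit WeakRef(Tracked<T>& t) : target_(&t) { attach(); }
    WeakRef(const WeakRef& o) : target_(o.target_) { attach(); }
    WeakRef& operator=(const WeakRef& o) {
        if (this != &o) { detach(); target_ = o.target_; attach(); }
        return *this;
    }
    ~WeakRef() { detach(); }

    bool valid() const { return target_ != nullptr; }  // query without throwing
    T& operator*() const {
        if (!target_) throw std::runtime_error("target was destroyed");
        return target_->value_;
    }
private:
    friend class Tracked<T>;
    void attach() { if (target_) target_->refs_.push_back(this); }
    void detach() {
        if (!target_) return;
        auto& v = target_->refs_;
        v.erase(std::remove(v.begin(), v.end(), this), v.end());
        target_ = nullptr;
    }
    Tracked<T>* target_;
};
```

This sketch is single-threaded and pays for a vector per target; it is only meant to show why no borrow-checker-style lifetime constraint is needed for this kind of pointer to stay memory safe.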
> A Cone programmer would need to use raw pointers to throw off the shackles of lifetime constraints
Or reference counting pointers or GC, right? The features needed to implement the pointers I mentioned are either support for calling a destructor on move operations or the ability to make an object non-movable, and support for copy constructors or the ability to make an object uncopyable. Does/could your language support some combination of those features?
I explain the reason the pointers are important in an article called "Implications of the Core Guidelines lifetime checker restrictions" [1]. Specifically, I give an example of reasonable C++ code [2] that historically had no corresponding efficient implementation in Safe Rust. (I think it's still the case, but I haven't fully investigated the implications of Rust's new "pinning" feature.) It can, however, be implemented in a memory safe way using the SaferCPlusPlus pointers in question [3]. Basically the example just temporarily inserts a reference to a (stack allocated) local variable into a list (or whatever dynamic container), given that the local variable does not outlive the container.
> especially given that borrowed references and raw pointers are both able to point inside an allocated object
SaferCPlusPlus has the equivalent of "borrowed references"[4] (though even more restricted until C++'s "borrow checker" (the aforementioned "lifetime profile checker") is completed), and they can safely point "inside" (allocated) objects. Note that the second safe pointer type (the one that potentially terminates the program), is a "strong" pointer, and there is a simple mechanism for obtaining a "borrowed reference" from a strong pointer [5]. And from there, a simple mechanism for obtaining a reference to an (interior?) member [6].
The other pointer (the one that potentially throws an exception) would be considered a "weak" pointer, and you cannot obtain a "borrowed reference" directly from a weak pointer. But often, the weak pointer is used to target an object that can yield a borrowed reference (like a strong pointer, for example), or a borrowed reference directly.
> all the best with getting others to learn about and adopt your language.
You too :) I can see the appeal of this sort of clean, flexible language. But in the case of SaferCPlusPlus the goal is not necessarily (just) for programmers to adopt it. In part it's maybe a demonstration (to language designers such as yourself :) of a set of language elements that use run-time safety enforcement mechanisms but are a little more flexible than (and might be a good / (unintuitively) needed complement to) their counterparts that rely on strictly compile-time safety enforcement.
Oh and if you do get around to checking out SaferCPlusPlus in more depth, apologies for the inadequate documentation in advance. Feel free to post any questions you might have. :)
Yes, borrowed references being lifetime-constrained means that I have a "borrow checker" that ensures that. It is only partially implemented.
You are correct that Cone supports a static, shared, mutability permission, including on borrowed references into resizable arrays. The short safety answer is array resizing is only possible when you have a unique reference to the array, so you can't run into the trouble you describe. I wrote a post about it.[1]
You left out an important clause I specified in my criteria: "with no chance of exception". Terminating the program in the event of dereferencing a reference does not meet the safety requirements I set for Cone references.
Yes, only borrowed references have lifetime constraints. I did not mention the allocator-based reference in that quote because of context. Cone does support a distinction between move vs. copy types. Unlike with Rust, the distinction is typically inferred from the definition of the type. Currently, all memory is "pinned", but that may become more flexible in the future.
The safety strategy for Cone involves versatility: giving the programmer a curated collection of permissions and memory allocators, each with distinct advantages and disadvantages. The safety of certain options can be completely determined statically, making them inflexible but fast. Others will use a mix of static and runtime mechanisms, which offer greater flexibility but incur a runtime cost.
That said, I admit I am somewhat uncomfortable temporarily injecting a borrowed reference into a longer-lived container as snippet 4 shows. I feel like any logic able to ensure this is only done safely would be too complicated for my taste, at least for now. I understand how your mechanism would address this scenario, but again that does not subscribe to my more restrictive notion of safety. If the program does it wrong, it crashes.
I just read it, and I thought it was great. I had a similar, if perhaps not-as-well-thought-out, reaction to Manish's (I agree, excellent) post. I think SaferCPlusPlus basically implements the permission mechanisms you listed in the summary (as well as the preceding "Race-Safe Strategies" post). (Although with some of the restrictions enforced at run-time rather than compile-time.) Looking forward to Cone 1.0. :)
p.s.: btw, the link on your post to the preceding "Race-Safe Strategies" post is broken
No, I just read it. SaferCPlusPlus does support all the permission "modes" listed, except the "opaque" one, and transitioning between them (though being stuck with C++'s move semantics). The permissions modes that aren't intrinsic to C++ are enforced at run-time. Objects to be mutably (non-atomically) shared need to be put in an "access control" wrapper, which is kind of like a generalized version of Rust's RefCell<> wrapper.
As I noted in my original comment, in the "mutex"/"RwLock" case, SaferCPlusPlus allows you to simultaneously hold read-locks and write-locks in the same thread. Which seems natural, since SaferCPlusPlus (and Cone) allow const and non-const pointer/references to coexist in the same thread. But in this case it actually provides increased functionality. It is the functional equivalent of replacing your mutex (and Rust's RwLock) with an "upgradable mutex", which facilitates better resource utilization in some cases, right? It also provides new opportunities to create deadlocks, so the mutex has to detect those.
Btw, I am certainly a pot talking to a kettle here, but your "mutex1" urgently needs a better name, right?
Yeah, this is interesting. They're saying they can't determine whether a pointer targets an array buffer or not? Perhaps they might want to take a look at the (long neglected) "C to SaferCPlusPlus" translator[1] which can do this. (It was an unexpectedly taxing undertaking though.) It converts C arrays and allocated buffers used as arrays into memory safe implementations of std::array<>s and std::vector<>s, so failure to properly identify them would generally result in output code that wouldn't compile.
The examples they give of problematic code in the paper:
void f(int* a) {
    *(int**)a = a;
}
and
f1(((int*) 0x8f8000));
don't strike me as the kind you would often encounter in real-world code.
> The syntax they use is rather clunky
The output code of the "C to SaferCPlusPlus" translator replaces the types and declarations with macros[2] that can be redefined with a compile-time directive to either use the safe C++ implementation, or revert to the original unsafe native C implementation. The argument being that using macros instead of custom syntax makes the source code more versatile. And existing C programmers already "get" macros.
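To illustrate the macro idea, here is a toy version under my own assumptions. The macro name `DEMO_FIXED_ARRAY`, the switch `DEMO_DISABLE_SAFETY` and the `CheckedArray` type are all hypothetical stand-ins made up for this comment; the translator's real macros and their safe implementations live in the library. The point is only that the function body stays the same while one compile-time switch decides whether the declaration expands to a checked type or to the original native array.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical safe replacement: a fixed-size array whose operator[] is
// bounds checked. It is an aggregate, so C-style brace initialization works.
template <typename T, std::size_t N>
struct CheckedArray {
    T elems[N];
    T& operator[](std::size_t i) {
        if (i >= N) throw std::out_of_range("index out of range");
        return elems[i];
    }
};

#ifdef DEMO_DISABLE_SAFETY
  // Revert to the original (unsafe) native C declaration.
  #define DEMO_FIXED_ARRAY(type, name, size) type name[size]
#else
  // Default: expand to the memory safe C++ substitute.
  #define DEMO_FIXED_ARRAY(type, name, size) CheckedArray<type, size> name
#endif

// The body is written once and compiles under either macro definition.
inline int sum_first(std::size_t n) {
    DEMO_FIXED_ARRAY(int, values, 4) = {1, 2, 3, 4};
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) total += values[i];
    return total;
}
```

In the default (safe) configuration, `sum_first(5)` throws `std::out_of_range`; with `DEMO_DISABLE_SAFETY` defined the same call would read past the end of a native array, which is the original C behavior.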
What happened there? Where are the array types? Wrong place to look?
If inference can't make a definitely good decision, maybe translators should guess, conservatively. That is, if it looks like something needs an array type parameter, make it an array type parameter with subscript checking. Then run tests on the translated program and see if that works. That's what humans do on such code. Machine learning has potential here. For any array in a working program, there must be some expression of some variables that expresses the size of the array. If humans can't find that expression, the program is unmaintainable and probably has a bug.
There are really 3 cases.
1. this is a pointer, and it's never subscripted or offset. That's a pointer to a single instance of something.
2. this is a pointer which is subscripted or offset, and we can tell from context how big the array is.
3. This is a pointer which is subscripted or offset, but auto-translation fails to figure out how big the array is supposed to be.
The problem is to convert (3) into (2).
I tend to think that a good metric for C code quality is how hard that is. If it's not obvious by looking how big something is supposed to be, there's probably a potential bug.
Thanks for noticing :) It's been quite a while since I worked on the code, but I believe that the translator intentionally left types declared as "char *" unmodified, assuming that they were being used as strings [1] rather than "regular" array buffers. I'm guessing that dealing with strings would have been a lot more work because it would require providing safe compatible replacements for all the standard C library string functions.
I think you should find that array buffers of other types, like "unsigned char" or "const unsigned char", and their associated pointer iterators are translated to their corresponding macros. I'd be interested if you find otherwise. If you're interested, the relevant code for the translator is in the "safercpp" subdirectory [2]. It's not super-well commented so if you have any questions feel free to post them in the "issues" section of the repository.
OK, here's a non-string function where the translator is trying to deal with C written like it's 1980:
static unsigned countZeros(MSE_LH_ARRAY_ITERATOR_TYPE(const unsigned char) data,
                           size_t size, size_t pos)
{
    MSE_LH_ARRAY_ITERATOR_TYPE(const unsigned char) start = data + pos;
    MSE_LH_ARRAY_ITERATOR_TYPE(const unsigned char) end = start +
        MAX_SUPPORTED_DEFLATE_LENGTH;
    if(end > data + size) end = data + size;
    data = start;
    while(data != end && *data == 0) ++data;
    /*subtracting two addresses returned as 32-bit number
      (max value is MAX_SUPPORTED_DEFLATE_LENGTH)*/
    return (unsigned)(data - start);
}
What guarantees that the "while" loop will not run away and take "data" outside the array bounds?
I proposed a version of C with slices and references, where you could write that like this:
The "data" parameter has size info, so the language knows how big it is.
The "work" variable is a slice of "data". This eliminates the need for pointer arithmetic. Much pointer arithmetic in C, especially where you have a pointer partway into an array, is an attempt to emulate a slice.
Automatically extracting slice usage from code with pointer arithmetic is a tough problem.
But not impossible. When you see code constructing something like
data = start;
while(data != end && *data == 0) ++data;
The slice is the same pointer, but there's now valid size information associated with it.
If you do transformations like that, you get a version of C where subscript checking is possible.
You can then hoist or prove out many of the subscript checks. Here, the compiler would be expected to
understand that if an array subscript is less than LENGTH of the array, it's safe. LENGTH here, as
I wrote in my paper, refers to the length of the array as known to the compiler from the array declaration.
Here, array lengths can be expressions evaluated at declaration time. That's how length info gets passed around.
const unsigned char &(data)[size]
as a parameter means "this is an array of size 'size'". "size" comes in via another parameter. The function can assume "size" is valid, and all callers must check that, either at compile time or run time.
If you can't write an expression for the size of something, you have a big problem with your program.
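As a rough picture of the slice approach, here is the zero-counting function rewritten in C++ rather than in the proposed C syntax (which is not reproduced here). `ByteSlice`, `countZerosSlice` and `kWindow` are hypothetical names for this sketch, and the window size 258 is only an assumed stand-in for MAX_SUPPORTED_DEFLATE_LENGTH. The pointer and its valid length travel together, so the loop needs no pointer arithmetic and every subscript can be checked.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical slice: a pointer bundled with the number of valid elements.
struct ByteSlice {
    const unsigned char* ptr;
    std::size_t len;

    // Checked subscript: the slice knows its own length.
    unsigned char operator[](std::size_t i) const {
        if (i >= len) throw std::out_of_range("subscript outside slice");
        return ptr[i];
    }

    // Sub-slice starting at `from`, at most `count` long, clamped to the end.
    ByteSlice sub(std::size_t from, std::size_t count) const {
        if (from > len) throw std::out_of_range("slice start outside slice");
        const std::size_t avail = len - from;
        return ByteSlice{ptr + from, count < avail ? count : avail};
    }
};

// Assumed stand-in for MAX_SUPPORTED_DEFLATE_LENGTH.
const std::size_t kWindow = 258;

// Counts the run of zero bytes starting at `pos`, looking at most kWindow
// bytes ahead. "work" is a slice of "data", so the loop bound is work.len.
inline unsigned countZerosSlice(ByteSlice data, std::size_t pos) {
    const ByteSlice work = data.sub(pos, kWindow);
    std::size_t n = 0;
    while (n < work.len && work[n] == 0) ++n;
    return static_cast<unsigned>(n);
}
```

Because the loop condition already establishes `n < work.len`, a compiler could in principle prove the subscript check redundant and remove it, which is the hoisting argument made above.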
> What guarantees that the "while" loop will not run away and take "data" outside the array bounds?
What do you mean "the array bounds"? The code is memory safe. "data" is an iterator that knows exactly what array/container it's pointing to, and that container knows its own size. Dereferences are bounds checked (by default).
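To show what "an iterator that knows its container" means in practice, here is a stripped-down sketch. `CheckedByteIter` is a hypothetical class written for this comment, not the type the macro actually expands to, and `kMaxZeroRun` is an assumed stand-in for MAX_SUPPORTED_DEFLATE_LENGTH. The body of `countZeros` is the same as the translated code quoted above; only the iterator type differs.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical checked iterator: remembers its container and its index, and
// checks the index against the container's size on every dereference.
class CheckedByteIter {
public:
    CheckedByteIter(const std::vector<unsigned char>& c, std::size_t i)
        : c_(&c), i_(i) {}

    unsigned char operator*() const {
        if (i_ >= c_->size()) throw std::out_of_range("dereference outside container");
        return (*c_)[i_];
    }
    CheckedByteIter& operator++() { ++i_; return *this; }
    CheckedByteIter operator+(std::size_t n) const { return CheckedByteIter(*c_, i_ + n); }
    std::ptrdiff_t operator-(const CheckedByteIter& o) const {
        return static_cast<std::ptrdiff_t>(i_) - static_cast<std::ptrdiff_t>(o.i_);
    }
    bool operator!=(const CheckedByteIter& o) const { return i_ != o.i_; }
    bool operator>(const CheckedByteIter& o) const { return i_ > o.i_; }

private:
    const std::vector<unsigned char>* c_;
    std::size_t i_;
};

// Assumed stand-in for MAX_SUPPORTED_DEFLATE_LENGTH.
const std::size_t kMaxZeroRun = 258;

// Same statements as the translated function; the safety comes from the type.
inline unsigned countZeros(CheckedByteIter data, std::size_t size, std::size_t pos) {
    CheckedByteIter start = data + pos;
    CheckedByteIter end = start + kMaxZeroRun;
    if (end > data + size) end = data + size;
    data = start;
    while (data != end && *data == 0) ++data;
    return static_cast<unsigned>(data - start);
}
```

If a caller passes a `size` larger than the real buffer, the loop does not run away: the first dereference past the end throws `std::out_of_range` in this sketch.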
This translated code is not intended to be performance optimal. The translator does not add, remove or rearrange any of the original source code elements, it simply replaces some of them with macros that are defined as functionally equivalent, memory safe C++ substitutes for the original element. Doing it this way has the benefit of allowing you to "disable" the memory safety mechanisms by reverting the macro definitions to the original (unsafe) elements.
I have not yet gotten around to addressing performance of the translated code. In order to preserve the ability to revert back to pure C code, there would need to be an additional set of macros (like maybe an "array view" macro) that could be mapped to their (safe) high performance C++ counterparts but that would be more restricted in their usage.
But at this point I think the value of that is questionable. If you need your code to be memory safe and high performance, the most expedient thing to do is to just accept the translated code as C++ code (or SaferCPlusPlus code) and re-optimize the performance bottlenecks as idiomatic SaferCPlusPlus code. SaferCPlusPlus is, along with Rust, the fastest [1] option for memory (and data race) safe programming.
And if you don't like the C++ language as a whole, just (define and) stick to a subset you're comfortable with, right? I mean, (I think your proposal is fine as an extension of C, but) I don't see the point in extending the C language with things like views/slices/spans, when the C language is already extended with those. It's called C++ (or some subset thereof) right? And with C++ you can solve the memory (and data race) issues much more comprehensively and performantly (if that's a word :) than with any extension to C. No?
For cases where the platform supports C++ (and its standard library), there is kind of a corresponding "checked C++"[1] that also supports the "completely incremental" migration approach. (And obviously supports "array view" type objects.)
Yes, it's hard to deny the intuitive appeal, but what's notably missing from that blog post, and seemingly any other article about it, is a consideration of the cost/downsides of universal imposition of the "exclusivity of mutable references" restriction. Rust provides the RefCell wrapper to essentially circumvent the restriction on demand, but i) that also essentially circumvents the "invariant protection" benefits of the policy, and ii) you can just as easily use an equivalent wrapper[1] in C++ to impose the same restriction. At which point the difference between Rust and C++ just kind of becomes which policy do you want to be the zero-overhead default.
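As a sketch of what an opt-in wrapper of that kind could look like in C++, here is a minimal RefCell-style cell that enforces "many readers or one writer" at run time. `BorrowCell`, `ReadGuard` and `WriteGuard` are hypothetical names for this comment, not the wrapper linked above, and the sketch is single-threaded only.

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>

// Hypothetical C++ analogue of Rust's RefCell: the exclusivity rule is
// checked when a guard is created, not at compile time.
template <typename T>
class BorrowCell {
public:
    explicit BorrowCell(T v) : value_(std::move(v)) {}
    BorrowCell(const BorrowCell&) = delete;
    BorrowCell& operator=(const BorrowCell&) = delete;

    // Shared (read-only) access; any number may coexist.
    class ReadGuard {
    public:
        explicit ReadGuard(const BorrowCell& c) : cell_(&c) {
            if (c.writer_) throw std::logic_error("already mutably borrowed");
            ++c.readers_;
        }
        ReadGuard(const ReadGuard&) = delete;
        ReadGuard& operator=(const ReadGuard&) = delete;
        ~ReadGuard() { --cell_->readers_; }
        const T& operator*() const { return cell_->value_; }
    private:
        const BorrowCell* cell_;
    };

    // Exclusive (mutable) access; refused while any other guard exists.
    class WriteGuard {
    public:
        explicit WriteGuard(BorrowCell& c) : cell_(&c) {
            if (c.writer_ || c.readers_ > 0) throw std::logic_error("already borrowed");
            c.writer_ = true;
        }
        WriteGuard(const WriteGuard&) = delete;
        WriteGuard& operator=(const WriteGuard&) = delete;
        ~WriteGuard() { cell_->writer_ = false; }
        T& operator*() const { return cell_->value_; }
    private:
        BorrowCell* cell_;
    };

private:
    T value_;
    mutable std::size_t readers_ = 0;  // number of live ReadGuards
    mutable bool writer_ = false;      // true while a WriteGuard is live
};
```

Objects that are not wrapped keep ordinary C++ aliasing rules, which is the "which policy is the default" difference described above.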
I mean you could imagine a hypothetical future scenario where it is demonstrated that the optimizers are good enough to essentially eliminate the run-time cost of RefCell wrappers, and a lot of Rust programmers just start wrapping everything with RefCells by default.
Oh yeah, there are plenty of reasons to prefer Rust over C++. I think it's also reasonable to favor Rust's "exclusivity of mutable references" default as a matter of personal preference. I'm just not sure the formal or technical argument has yet been made that universally applying that restriction is necessarily the "better" default overall.
Rust's usability is advancing, but I'll note that so are C++'s memory safety facilities. Arguably, modern C++ programming is becoming intrinsically more safe than more traditional coding styles (though arguably the use of string_views and spans outside of function parameters is a step backwards), but progress is also being made on the lifetime profile checker (C++'s borrow checker analogue), and there are libraries[1] that allow you to avoid potentially unsafe C++ elements, to the degree that (memory and data race) safety is a priority.
At present, C++ is significantly lacking in safety enforcement tooling and "community enthusiasm" for memory safety compared to Rust. But to me, it is not obvious that that will be the case indefinitely. It's even plausible that Rust and C++ will sufficiently converge in flexibility and safety that at some point automated translation between the two languages will be a thing.
At the moment, I think the main relevant difference in fundamental (as opposed to not-yet-available tooling or whatever) capability between the two languages is Rust's lack of support for move constructors/destructors. I think it prevents the safe, efficient, unintrusive, robust implementation of a small but significant set of algorithms / data structures.
But I guess my point is that I think that C++, and its existing codebases, are not necessarily condemned indefinitely to be memory unsafe in the way they have been historically. And that could be a factor that contributes to the justification of deciding to continue to use C++ in some cases.
They are working on it. The analogue to the borrow checker in C++ is called the "lifetime profile checker" and (an incomplete version) is included in MS Visual C++, but last time I checked (in January) it seemed to still have too many false positives to be practical.
In the meantime, I think "the minimum necessary changes" to achieve memory and data race safety is to replace all your unsafe C++ elements (pointers, arrays, vectors, string_views, etc.) with compatible substitutes from the SaferCPlusPlus library [1]. You don't even need to replace them all at once. You can replace them incrementally and your code will continue to compile and run throughout the process. And where needed, maintain maximal performance as well [2].
I suggest that the most expedient (cheapest) language to migrate the existing code base to would be a memory safe subset of C++ [1]. In practice most of the safety benefit could be obtained from just a partial migration. Specifically, just banning raw pointers/views/spans and non-bounds-checked arrays and vectors. From a quick glance at the code in the (quite small) patch diff, the code in question includes:
DOMArrayBuffer* result = DOMArrayBuffer::Create(raw_data_->ToArrayBuffer());
...
raw_data_.reset();
...
return result;
I'd imagine the effort/time/money it would take to replace those raw pointers with memory safe substitutes [2][3][4] (and enforce the ban going forward) would be relatively modest. Performance shouldn't be an issue [5].
Ug. After closer inspection, it looks like those particular raw pointers seem to be managed by a garbage collector. (Specifically, the "Blink GC" [1].) As others have pointed out, this particular bug may not actually be a C++ issue. (Or at least not a typical one.)
Not so sure. I haven't worked on this code for a while and have no non-public knowledge of the bug, but ArrayBufferBuilder does not inherit from any of the GCed base classes, and has the USING_FAST_MALLOC macro which is used for non-GC classes.
https://chromium.googlesource.com/chromium/src/+/refs/heads/...
ArrayBufferBuilder isn't, but DOMArrayBuffer seems to be a GC managed type [1], right? And, before the patch, the DOMArrayBuffer held a "refcounting pointer" potentially targeting raw_data_'s reference counted ArrayBuffer, right? I don't see any immediately apparent use-after-free with this, so I assume raw_data_'s ArrayBuffer is being messed with elsewhere? As someone who worked on the code, do you have an idea/hunch about where the invalid memory access actually occurs?
Yes, DOMArrayBuffer inherits via ScriptWrappable from GarbageCollectedFinalized<> so it's on the GCed heap. I don't understand the UAF yet, I'm hoping someone will write a blog post on it later :-).
[1] https://github.com/duneroadrunner/SaferCPlusPlus-AutoTransla...
[2] https://github.com/duneroadrunner/SaferCPlusPlus-AutoTransla...