Nice. If this can be reasonably retrofitted to existing libraries and projects so that the safety properties compose from local to global, then this could actually be a meaningful improvement to the safety of real-world C code.
There would be many more steps required "toward" memory safety, such as eliminating all forms of UB including uninitialized memory, out of bounds pointers, data races, etc. but if this direction is to be pursued it has to start somewhere.
Not sure I agree with that premise, as the Cake source would have been written to be compatible with ownership annotations from the get-go vs. retrofitting an existing codebase. Help me understand how something like this composes:
Now open_file callers would need to know that ownership is being returned which means that local variables would need to have the owner annotation propagated. That’s what I mean when I say it’s not composable - the ownership has to propagate fully throughout the codebase for a specific resource. Of course maybe you know better as this is just an initial glimpse on my part.
Not sure if I understood. The usage of old and new (checked and unchecked) code is a challenge. We may have the same headers used in both kinds of code.
The other challenge is that the same source may need to compile in compilers with or without support.
Ownership Feature Strategy (Inspired by stdbool.h)
If the compiler supports ownership checks and qualifiers such as _Owner, _View, _Obj_view, etc., it must define __STDC_OWNERSHIP__.
However, even if the compiler implements ownership, it is not active by default. The objective is to have a smooth transition, allowing some files without checks, for instance third-party code inside your project.
For instance, when compiling this file, even if the compiler supports ownership we don't have errors or warnings because the checks are not enabled by default.
#include <stdlib.h>
int main() {
void * p = malloc(1);
}
A second define, __OWNERSHIP_H__, is used to enable ownership. This define is set when we include <ownership.h> at the beginning of the file.
#include <ownership.h>
#include <stdlib.h>
int main() {
void * p = malloc(1); //error: missing owner qualifier
}
Another advantage of having an <ownership.h> is that owner is a macro that can be defined as empty when the compiler does not support ownership, allowing the same code to be compiled by compilers without ownership support.
To me ownership composability means you can express ownership locally without it infecting anything outside of that local scope. However, ownership is not always tied to lexical scope and in those cases it doesn’t compose. In other words, you can add all the annotations you want locally and a) the code will still be incorrect b) the code may not compile as you show in your other snippet because now the function signature needs to contain the ownership sigil which then results in all callers needing the ownership sigil. Disabling it partially only emulates composition if the callers are in external files compiled with alternate options / skipping ownership validation. If you have local calls to the newly annotated function, you’ll be back into needing to fix the entire file’s annotations.
Unless I misunderstood what OP meant about ownership composition.
I might be missing something, but this seems to require ownership annotations on all functions, e.g. a compatible and correct prototype for `fclose` to correctly note that the owned `FILE *` is moved into the call.
If that's correct, then this is somewhat practically limited: either pre-existing codebases will need to be retrofitted with an essentially bespoke set of macros, or the compiler will need to be "fail open" by default. The tradeoffs between these two are hard (substantial developer pain versus being ineffective against the bulk of a compiled program's API surface).
(Also, this design appears to be for temporal safety only, not spatial safety. But again I might have missed something.)
Compilers cheat when working with known libraries: they don't need annotations on known functions, much like they don't need the specific implementation of them to know the semantics. This occasionally goes wrong when the compiler assumes any function with a given name must be that function from libc; e.g. if the programmer writes a function called `sin`, there's a risk of it being mistaken for the libm function.
Right. It's less known libraries I'm worried about (libc can easily be layered over, as you've said) and more unknown ones.
In particular: there are a lot of ~universally used libraries with ludicrously complex C APIs that undergo significant own/borrow semantic changes between releases. Things like OpenSSL. These libraries would need to carry these annotations upstream to stay correct and up to date.
Currently Cake uses the existing MSVC and GCC headers.
These headers do not have any owner qualifiers.
The temporary solution is to re-declare malloc etc. when compiling with Cake, and not to complain when function signatures differ only by owner qualifiers.
If this ownership were standard, then the GCC and MSVC headers would have the qualifiers, enabled or not, but they would be there.
> new methods of communication with the compiler have been established.
From what I understand, this appears to be a separate binary from GCC/Clang that does static analysis and outputs C99.
Can this be a GCC plugin? I know we can write plugins that are activated when a specific macro is provided, and the GCC plugin event list allows intercepting the AST at every function declaration/definition. Unless you're rewriting the AST substantially, I feel this could be a compiler plugin. I'd like to know a bit more about what kinds of AST transformations/checks are run as part of Cake.
Cake is a C23 front end, but it can also be used as a static analysis tool.
The qualifiers can be empty macros; then the same code can be compiled with gcc and clang, while the static analysis of ownership is done with Cake.
Inside Visual Studio, for instance, we can add it under External Tools.
The main annotations are qualifiers (similar to const). C23 attributes were considered instead of qualifiers, but qualifiers have better integration with the type system. In any case, macros are used so the qualifiers can be declared as empty when necessary.
The qualifiers and the rules can be applied to any compiler.
Something harder to specify (but not impossible) is the flow analysis.
Sample of a rule for compilers:
int * owner a;
int * b;
a = b;
We cannot assign a view to an owner object.
This kind of rule does not require flow analysis.
Cake implements defer as an extension, where ownership and defer work together. The flow analysis must be prepared for defer.
int * owner p = calloc(1, sizeof(int));
defer free(p);
However, with ownership checks, the code is already safe. This may also change the programmer's style, as C code generally avoids returns in the middle of a function.
In this scenario, defer makes the code more declarative and saves some lines of code. It can be particularly useful when the compiler supports defer but not ownership.
One difference between defer and ownership checks, in terms of safety, is that the compiler will not prompt you to create the defer. But, with ownership checks, the compiler will require an owner object to hold the result of malloc, for instance. It cannot be ignored.
The same happens with C++ RAII. If you forget to free something in your destructor, or forget to create the destructor, the compiler will not complain.
In cake ownership this cannot be ignored.
struct X {
FILE * owner file;
};
int main(){
struct X x = {};
//....
} //error x.file not freed
The difference between Cake ownership and RAII is that with C++ RAII, the destructor is unconditionally called at the end of the scope.
Flow analysis is therefore not required for RAII.
Cake requires flow analysis because the "destructor" is not unconditionally called.
When the compiler can see that the owner is not owning an object (because the pointer is null, for instance), then the "destructor" is not necessary.
To understand the difference:
With flow analysis (how it works today)
int main()
{
FILE *owner f = fopen("file.txt", "r");
if (f)
fclose(f);
}
Without flow analysis (or with a very simple one, where the destroy must be the last statement)
void fclose2(FILE * owner p) {
if (p) fclose(p);
}
int main()
{
FILE *owner f = fopen("file.txt", "r");
if (f){
}
fclose2(f);
}
The problem with C safety addons like this (there have been many) is that they don't prevent extracting raw pointers from controlled pointers. Optional memory safety isn't.
> If this can be reasonably retrofitted to existing libraries and projects
That's the problem.
If you want to fool around in this space, consider revisiting C++ to Rust conversion.
There's something called Corrode, which compiles C to a weird subset of Rust full of objects that implement C raw pointers. The output is verbose and unmaintainable.
What's needed is something that can figure out how big things are and who owns what, possibly guessing, and generate appropriate idiomatic Rust. Now that LLMs are sort of working, that might be possible.
Can you ask Github Co-pilot to look at C code and answer the question "What is the length of the array 'buf' passed to this function"? That tells you how to express the array in a language where arrays have enforced lengths, which includes both C++ and Rust. With hints like that, idiomatic translation becomes possible. Bad guesses will result in programs that subscript out of range, which is caught at run time. But guesses should be correct most of the time, because C programmers tend to use the same idioms for arrays with lengths. Forms such as "int read(int fd, char* buf, size_t buf_l)" show up often.
Using LLMs to help with tightening up existing code might work.
Optional memory safety is fine when you can opt an entire project into a "strict" mode, and when this becomes trivially verifiable by others. I imagine that's the goal here.
Optional security is a problem only when you need to remember a million different rules and gotchas, because you will inevitably miss a spot. But if it's a global toggle, it's pretty good. "Use '-fmemsafe' for C/C++" is as tractable as "don't use 'unsafe' in Rust".
Yeah, as you note, library compatibility is an issue. But it's an even bigger issue when bootstrapping a new, safe language: you gotta implement the libraries from scratch, and you never really get to full parity with C/C++. Getting it done for the top 10 most-used libraries would make a spectacular difference in itself.
I should note that I'm not a huge believer in "saving" C/C++ as the memory-safe language of the future - I think there are lingering cultural problems around the standards that we had no luck overcoming for decades - but I also don't think the duo is going away any time soon, so might as well expend some effort on making it a safer tool.
> Can you ask Github Co-pilot to look at C code and answer the question "What is the length of the array 'buf' passed to this function"? That tells you how to express the array in a language where arrays have enforced lengths, which includes both C++ and Rust
void f(int n, int a[n]) {
}
This is the way you tell C what the size of the array is.
You can write that in C, but it doesn't really do anything. It's equivalent to
void f(int n, int a[]) {
}
Why? So that you can write
void f(int n, int m, int a[n][m]) {
}
which declares a 2-dimensional array parameter. In that case, the "m" is used to compute the position in the array for a 2D array. The "n" doesn't do anything.
This is equivalent to writing
void f(int n, int m, int a[][m]) {
}
This is C's minimal multidimensional array support, known by few and used by fewer.
Over a decade ago, I proposed that sizes in parameters should be checkable and readable, and I worked out how to make it work.[1] But I didn't have time for the politics of C standards.
Do you have source on this syntax? Does the `[n]` actually do anything here? Fooling around in godbolt, `void f(int n, int a[n]) {` is the same as `void f(int n, int a[]) {` and doesn't appear to change assembly or generate any warnings/errors with improper usage.
The major difference is when the array is multi-dimensional. If you don't have VLAs then you can only set the inner dimensions at compile time, or alternatively use pointer-based work-arounds.
Even in the case of one-dimensional arrays, a compiler or a static analyzer can take advantage of the VLA size information to insert run-time checks in debug mode, or to perform compile-time checks.
The Cake implementation cannot be mapped directly to Rust. I am not a Rust specialist, but one difference, for instance, is that an owner pointer owns two resources at the same time, the memory and the object. In Rust it is one concept.
Owner pointers take on the responsibility of owning the pointed object and its associated memory, treating them as distinct entities. A common practice is to implement a delete function to release both resources, as illustrated in Listing 7:
Listing 7 - Implementing the delete function
#include <ownership.h>
#include <stdlib.h>
struct X {
char *owner text;
};
void x_delete(struct X *owner p) {
if (p) {
/*releasing the object*/
free(p->text);
/*releasing the memory*/
free(p);
}
}
int main() {
struct X * owner pX = calloc(1, sizeof * pX);
if (pX) {
/*...*/;
x_delete( pX);
}
}
let p: Box<X> = Box::new(X { ... });
let x2 = X { ... };
// Moves x2 into the same memory as the first X.
// The first X is automatically dropped as part of this assignment.
// Also consumes x2 so x2 is not available any more.
*p = x2;
// Drops the X that was originally assigned to x2 and then moved into p.
drop(p);
// No need, nor is it possible, to destroy x2.
Thanks for the Rust sample. It looks very similar.
Can the allocator be customized?
As I said, I am not a Rust specialist.
Also, my understanding is that in Rust, sometimes dynamic state is created when the object may or may not be moved.
In Cake ownership this needs to be explicit (and the destructor is not generated).
I also had a look at Rust's lifetime annotations.
This concept may be necessary, but I am avoiding it.
Consider this sample.
struct X {
struct Y * pY;
};
struct Y {
char * owner name;
};
The object Y pointed to by pY must live longer than the object X.
(Cake is not checking this scenario yet)
Also (classic Rust sample)
#include <stdio.h>

int * max(int * p1, int * p2) {
return *p1 > *p2 ? p1 : p2;
}
int main(){
int * p = NULL;
int a = 1;
{
int b = 2;
p = max(&a, &b);
}
printf("%d", *p);
}
This is not implemented yet but I want to make the lifetime of p be the smallest scope. (this is to avoid lifetime annotations)
int * p = NULL;
int a = 1;
{
int b = 2;
p = max(&a, &b);
} //p cannot be used beyond this point
`Box<T>` is the type of an owning pointer that uses the default global allocator, and `Box<T, A>` is the type of an owning pointer that uses an allocator of type `A`. The latter is unstable, ie it can only be used in nightly Rust.
(Also the fact that the latter changes the type means a large part of existing third-party code as well as a bunch of code in libstd itself becomes unusable if you want to use a type-level custom allocator because they only work with `Box<T>`. But that's a different discussion...)
> An object Y pointed by pY must live longer than object X.
Yes, the py field in Rust would use a reference type instead of a pointer, and the reference would need to have a lifetime annotation, and the compiler would work to prevent the situation you describe:
struct X<'a> { py: &'a Y }
let y = Y { ... };
let x = X { py: &y };
drop(y); // error: y is borrowed by x so it cannot be moved.
But to be clear, the `'a` lifetime syntax is not what's making this work. What's making this work is that the compiler tracks the lifetimes of references in general. This works in the same way even though there are no lifetime annotations:
let y: String = "a".to_owned();
let x = &y;
drop(y); // error: y is borrowed by x so it cannot be moved.
do_something_with(x);
The explicit lifetime annotations are just for a) readability, and b) because sometimes you want to name them to be able to express relationships between them. Eg if two lifetimes 'a and 'b are in play and you want to express that 'a is at least as long as 'b, then you have to write a `'a: 'b` bound. In many cases they can be omitted and the compiler infers them automatically.
(A question about Rust; this is not implemented in Cake yet.)
Let's say I have two objects on the heap,
A and B. A has a "view" of B.
Then we prompt the user (or have some dynamic condition):
"Which object do you want to delete first, A or B?"
The user selects B.
How can this be checked at compile time?
The code path that drops B will not compile unless that code path drops A first. It doesn't matter if that code path is in response to user input or not. Again, as I said, the point is that the compiler tracks the lifetime of all references. In this case A contains a reference to B, so any code that drops B without dropping A will not compile.
> Also, in my understanding is that in Rust, sometimes a dynamic state is created when the object may or may not be moved.
Yes?
if cond {
drop(p)
}
// p may or may not be dropped here
or
let p;
if cond {
p = something();
}
// p may or may not be set here
These trigger dynamic drop semantics, in which case the stackframe has a hidden set of drop flags going alongside any variable with dynamic drop semantics, to know if they do or don’t need to be dropped. The flags are automatically updated when the corresponding variables are set or moved-from.
In Cake there is no temporal hole: we cannot reuse the deleted object.
This prevents double free and use after free.
int main() {
struct X * owner p = malloc(sizeof(struct X));
p->text = malloc(10);
free(p->text); //object text destroyed
//p->text is in an uninitialized state.
//cannot be used (except assignment)
struct X x2 = {0};
*p = x2; //x2 MOVED TO *p
x_delete(p);
}
These are tools. How well they work largely depends on the discipline and processes followed by the development team using them. If the concern is that raw pointers can be extracted from controlled pointers, then the development team needs to check for this. No tool is perfect, but tools like these can be used effectively to reduce attack surfaces and improve safety.
Even languages like Rust make memory safety optional. One can drop into an unsafe block and perform all sorts of abominable things. Such escape hatches are necessary to color outside of the lines when one must do system software development or optimize software beyond what the compiler can do on its own. At some point, the developer must be trusted to learn the tool or to use discretion when considering something like unsafe. In both cases, a development team can peer review these choices.
What makes me interested in tools like Cake and similar tools is that these bring us closer to being able to use proof assistants to build up reasoning about the times when we must color outside of the lines. Whether C, C++, or Rust, being able to import code into a proof assistant or extract efficient code from a proof assistant can further assist us when our use cases exceed what is possible with the safety features in our language or tooling.
That it can translate C23 to C89 means it has most of the work in place to translate C23 to C23, or C99 to C99 etc. If that is done in a (mostly) reversible fashion - successfully re-encode back to the original, where you `preprocess -> parse -> unparse -> re-preprocess` which is a nuisance but possible, then it opens the door to much more aggressive type systems.
In particular, the input can be C with the ownership annotations, and if they're valid, the output can be C with those annotations dropped to be fed into some other compiler. Or whatever other invariant systems the compiler dev is interested in.
Or the input could be C extended with namespace {} syntax, C++ style lambdas, contract checking - whatever you wish really, and the output can be the extensions desugared into C. Templates (possibly the D style ones) can be implemented as instantiating normal functions from said template.
That the output is C means this is usable in all the pipelines that already work with C.
A 'C' -> C compiler which preserves most source code unchanged (i.e. would be the identity transform on some input) and which implements something like constexpr on functions (by running the interpreter during the transform) could be argued to be a forward looking C implementation. Specifically C23 has constexpr, but in an extremely limited form, and aspires to extend that to be more useful later.
Equally one which replaces 'auto' with the name of the type (and similar desugaring games) is still a C to C compiler, just running as a C23 to C99 or whatever. Resolve the branch in _Generic before emitting code as part of downgrading C11.
The lifetime annotations are an interesting one because they're a different language which, if it typechecks, can be losslessly converted into C (by dropping the annotations on the way out).
I'm not sure where in that design space the current implementation lies. In particular folding preprocessed code back into code that has the #defines and #includes in is a massive pain and only really valuable if you want to lean into the round trip capability.
auto, typeof, and _Generic are implemented in Cake.
Sometimes, when they are used inside macros, the macros need to be expanded.
For this task, Cake has:
#pragma expand MACRO
Sample: a macro NEW using C23 typeof.
#include <stdlib.h>
#include <string.h>
static inline void* allocate_and_copy(void* s, size_t n) {
void* p = malloc(n);
if (p) {
memcpy(p, s, n);
}
return p;
}
#define NEW(...) (typeof(__VA_ARGS__)*) allocate_and_copy(&(__VA_ARGS__), sizeof(__VA_ARGS__))
#pragma expand NEW
struct X {
const int i;
};
int main() {
auto p = NEW((struct X) {});
}
The generated code is
#include <stdlib.h>
#include <string.h>
static inline void* allocate_and_copy(void* s, size_t n) {
void* p = malloc(n);
if (p) {
memcpy(p, s, n);
}
return p;
}
#define NEW(...) (typeof(__VA_ARGS__)*) allocate_and_copy(&(__VA_ARGS__), sizeof(__VA_ARGS__))
#pragma expand NEW
struct X {
const int i;
};
int main() {
struct X * p = (struct X*) allocate_and_copy(&((struct X) {0}), sizeof((struct X) {0}));
}
I've been dabbling in embedded programming. Everything is written in C. I just don't understand why. C++ solves pretty much all the problems if you want it to (RAII, smart pointers, move semantics), and framework writers wouldn't need to implement their bespoke OOP systems on top of opaque pointers and callbacks.
Maybe it was bad luck on my part, and other embedded frameworks are better; but I got into both ESP32 and STM32, both frameworks are the worst spaghetti code I have ever seen. You need to jump through at least one, often two layers of indirection to understand what a particular function call will do. Here's an example of what I mean:
And that's an easy example. Macros everywhere, you need to grok what's happening in four different files to understand what the hell a single function call will actually do. Sure, the code is super efficient, because once it's compiled all the extraneous information is pre-processed away if you don't use such and such peripheral or configuration option. But all this could be replaced by an abstract class, perhaps some templates... And if you disable stuff you may not need (RTTI, exceptions) then you'd get just as efficient compiled code. It would be much easier to understand what's going on, and you wouldn't be able to call DoSomething on uninitialized data... Because you'd have to call the constructor first to even have access to the method.
Anyway, thank god for debuggers, step-by-step execution, and IDEs.
> But all this could be replaced by an abstract class, perhaps some templates... And if you disable stuff you may not need (RTTI, exceptions) then you'd get just as efficient compiled code.
Isn't it just that your personal in-head GPT has been trained on C++ and wants to see it everywhere? It's not so easy to make very small embedded implementations, and there's a reason C++ has not made inroads there after 25+ years.
I'd appreciate it if you made your point in a less condescending and dismissive manner. Anyway.
No, C++ is not even my programming language of predilection. Not sure why you would make assumptions about my background while knowing nothing about me. But I can recognize OOP patterns when I see them. There's even a book about that https://www.cs.rit.edu/~ats/books/ooc.pdf C-styled OOP is not a new concept. C++ just does it better.
The reason C++ has "not made inroads" may just be inertia, you know. And look at Arduino - if C++ code can run on an 8bit ATmega MCU, it can run anywhere. The whole language is designed around "pay for what you use and nothing else".
C doesn't need to look like this. Some of it does, because it comes from the days where function inlining and dead code elimination were aspirational, but your C compiler is probably derived from clang or gcc now and totally capable of folding away branches on constant data.
An abstract class is a struct with function pointers in it. Mark the fields const and the instance const and it'll be devirtualised and optimised away. If you miss overloading, `static inline __attribute__((overloadable))` wrappers in a header will bring it back.
Code generators can be better for debugging than built in templates. At the source level they look the same, but if it's behaving weirdly, you can look at the generated C instead of the templated layer.
C code can look rather like modern C++. If you're up for feeding it to a custom preprocessor to implement templates, or especially if you've gone as hardcore as the compiler front end under discussion here, C++ starts to look a lot like a syntactic obfuscation over C.
[it's not quite syntax over C, the languages play divergent games with semantics as well, but picking a different set of syntax abstractions over C to the C++ one is an interesting way to go]
You're working with embedded C that is a rat's nest of macros. It could instead be sanely factored and readable C without the macros, if it were written with slightly more trust in the compiler.
Static analysis has a significant advantage over runtime checks for memory leaks, especially in code that is almost never executed, because bugs can remain hidden until they appear in production. The code where I found the bug last year was executed occasionally, and to create a unit test, it was necessary to integrate with another server. So it wasn't easy to check at runtime.
Static analysis, on the other hand, will catch the error at the first compilation, even in code that is almost never executed.
Do I need to use the Cake frontend to use the ownership library or is it actually macros (or an extension?) I could use in code compiled with gcc or clang?
The answer: you can have the same source code and compile it with gcc, but only Cake implements the checks at this moment.
To have the same source code compiling in any compiler, a header ownership.h is used to define owner etc. as empty macros.
This strategy is used in the Cake source itself, which is checked with Cake but compiled with gcc, MSVC, and clang.
This doesn’t help for a lot of things, including some of the examples described in the article. For example trying to segregate file descriptors (or similar resource handles) to an isolated heap would be mostly worthless because a UAF through a “stale reference” would let you mess with a completely different file. In general the problem of figuring out which objects are safe to confuse with each other is very difficult. There are a handful of “obvious” types (collections, ports, etc.) that it’s clear cannot be allocated together because they hold capabilities but doing this in general is not tractable.
If all you give me is half measures, then I’ll either just use plain old C/C++ or I’ll switch to a totally different language. Maybe one with a GC so I don’t have to please some ownership thingy.
I think it's a common misconception that ownership is there to make you suffer compiler shenanigans, when in my experience it changes the way you model programs. It turns out that structuring your program in a way where it's clear who owns what makes for easier-to-understand and easier-to-debug programs. It's a bit analogous to static typing: saying "I'll use a language like Python without type hints because it's going to let me avoid compiler errors" is a bit short-sighted when I plan on developing the piece of code for a long time.
If you want to keep programming in a way you are already familiar with, and are not willing to change your way of thinking about programs, then yes, it's a bad fit. If you want to write reliable programs, there is evidence that changing the way we think about and express programming problems can have substantial effects on reliability.
You don't need a borrow checker to write reliable programs. If anything the Rust obsession with memory safety has been harmful since it detracts from general safety. But don't take my word for it, maybe consider what the co-author of The Rust Programming Language, 2nd edition has to say [1].
If you really care about writing robust programs then focus on improving your testing methodology rather than fixating on the programming language.
> I think it's a common misconception that ownership is there to make you suffer compiler shenanigans.
I don't think it's a misconception. When I tried Rust I tried to implement a cyclic data structure but couldn't because there is no clear "owner" in a cyclic data structure. The "safe" solution recommended by the rustaceans was to use integer handles. So, instead of juggling pointers I was juggling integers which made the code harder to debug and find logic errors. At least when I was troubleshooting C I could rely on the debugger to break on a bad pointer, but finding where an integer became "bad" was more time consuming.
> When in my experience it changes the way you model programs.
Having to rearchitect my code to please the borrow checker gave me flashbacks to Java where I'd be forced to architect my code as a hierarchy of objects. In both languages you need design patterns and other work-arounds since both force a specific model onto your code whether it makes sense or not.
You seem to have a preset opinion, and I'm not sure you are interested in re-evaluating it. So this is not written to change your mind.
I've developed production code in C, C++, Rust, and several other languages. And while like pretty much everything, there are situations where it's not a good fit, I find that the solutions tend to be the most robust and require the least post release debugging in Rust. That's my personal experience. It's not hard data. And yes occasionally it's annoying to please the compiler, and if there were no trait constraints or borrow rules, those instances would be easier. But way more often in my experience the compiler complained because my initial solution had problems I didn't realize before. So for me, these situations have been about going from building it the way I wanted to -> compiler tells me I didn't consider an edge case -> changing the implementation and or design to account for that edge case. Also using one example, where Rust is notoriously hard and or un-ergonomic to use, and dismissing the entire language seems premature to me. For those that insist on learning Rust by implementing a linked list there is https://rust-unofficial.github.io/too-many-lists/.
The GP commented once, politely disagreeing and describing their own experience. Looking over their past comments, I also don't see hostility to the ideals of memory safety or using Rust.
Seems like you made a passive-aggressive presumption.
I don't think it's very compelling to convert C code to a thing that gives you a safety half-measure. You'll still have security bugs, so it'll just feel like theatre.
huh? There are also security bugs in Rust, so it is theatre as well?
Pointer ownership could eliminate a class of bugs. And such an approach can be combined with run-time checks for bounds and signed overflow, and then you have a memory-safe C, more or less (some minor pieces are still missing, but nothing essential).
While I don't personally like Rust, I believe Rust achieves this. In Rust, if you don't use the unsafe escape hatch, then your bugs are at worst logic bugs. There won't be any weirdness like getting some math wrong in an array access and all of a sudden an attacker can make your program execute arbitrary code.
On the other hand, this Cake thing just adds some ownership, and when folks say it's problematic the first answer is "oh, just tell it to ignore your function". That doesn't sound like memory safety to me. It's nowhere near Rust in that regard.
Rust does the same thing, though? If you are having trouble pleasing the compiler, you can use unsafe to get around it. Of course, the Rust people are a lot more active at telling you that what you wanted was actually wrong and bad, but it's essentially the same position.
Code that's full of memory bugs is likely full of other bugs too. Improving testing methodology, perhaps establishing official guidelines, would address ownership issues and more. The goal should be to write robust software, because robustness implies memory safety but the reverse is not true.
> Sure ownership protects against more things. But some coding patterns are impossible under it.
In Rust you're told that if it's impossible under ownership then you should find a different way to express it rather than trying to circumvent ownership. I guess it's different in C.
It works both ways.
We can just tell the compiler to ignore some function.
We can also be creative, writing code that is good and at the same time makes the static analysis happy.
A good sample is linked lists.
Ownership also works for non-pointers. We can have integers (handles) that are owners. This allows custom allocators, for instance.
The concept of an owner is a value that works as a reference to an object and manages its lifetime.
Converting code can be challenging. The Cake source itself has been successfully converted. Null checks are not ready yet, and something similar already happened with C#.
The experience is similar to changing a header file to use a const argument where previously the argument was non-const. This change will propagate everywhere.
I also think converting JavaScript to TypeScript is a similar experience.
The type system will complain before it stabilizes.
The ownership checks are new in Cake, less than one year old. So the answer is no, only Cake itself is using them. And there is a lot of work to do, in flow analysis etc.
The Cake source itself is moving to another experimental feature: nullable types.
> But you could get most of the benefit by just isoheaping (strictly allocate different types in different heaps).
Can you elaborate on this? I've been reading up on memory allocation algorithms and most of them seem to favor segregation of blocks by element size instead. Are there additional benefits to coming up with a complex typing scheme for a custom memory allocation interface?
>If it were that simple, someone would have had success at scale by now
A lot of code in that article doesn't use mempools, and furthermore, just because a double free exists doesn't mean that it's always exploitable. And even if it's exploitable, it doesn't mean that you can gain a shell or exfiltrate data; sometimes it means you can just crash the program.
Fundamentally, if you write a wrapper around memory management that keeps track of allocated resources, much in the same way how rust includes some runtime code during compilation for memory safety, you gain the same functionality.
> Fundamentally, if you write a wrapper around memory management that keeps track of allocated resources, much in the same way how rust includes some runtime code during compilation for memory safety, you gain the same functionality.
Can you substantiate that? There are commonly employed tracking allocators; ASAN can catch certain kinds of UB, UBSAN catches others, and with special interpreters you can catch even more. But even basic ASAN is more exhaustive than what you are suggesting, and it provably can't provide the same guarantees that safe and sound Rust gives you https://stackoverflow.com/a/48902567:
> And that is not accounting for the fact that sanitizers are incompatible with each others. That is, even if you were willing to accept the combined slow-down (15x-45x?) and memory overhead (15x-30x?), you would still NOT manage for a C++ program to be as safe as a Rust one.
Also, I think you misunderstand the way Rust works, it does compile-time ownership checking, which allows it to avoid run-time checking, so this part "same way how rust includes some runtime code during compilation for memory safety" is factually wrong.
Rust needs to add some runtime checks when calling destructors in scenarios where some object may or may not be moved.
In C++, for instance, a smart pointer's destructor will have an "if (p != NULL)" check. If the smart pointer was moved from, the pointer is set to null, and the destructor checks for that at runtime.
>so this part "same way how rust includes some runtime code during compilation for memory safety" is factually wrong.
RefCell includes runtime code. Fundamentally, because of Rice's theorem, the compiler cannot predict the state of memory at all points in time, so runtime checks are needed.
>Can you substantiate that?
I mean, double free relies on calling free() twice. A mempool malloc()'s once, and free()'s once at exit. Use after free is mitigated by making sure that the pointer to the memory is set to zero (the mempool either returns a struct or a pointer to a pointer on allocation, and you access the requested memory through that).
Furthermore, you can have multiple mempools, and keep critical data separate, so if the pointer doesn't get zeroed out in the implementation, use after free won't leak anything critical.
Is anyone using this at scale and having success with avoiding all memory safety problems, just use mempool trust me bro. I made a copy of the thing the mempool pointer points to and that wasn't zeroed by free, I now have UAF, just use mempool trust me bro. I was using C for performance, I now have double pointer indirection everywhere, just use mempool trust me bro. I went out-of-bounds, just use mempool trust me bro. I violated strict aliasing, just use mempool trust me bro. I violated pointer provenance, just use mempool trust me bro. My program uses more than one thread, just use mempool trust me bro.
malloc doesn't keep track accurately in all cases, which is why double free is possible in the first place.
With a mempool implementation, you shouldn't be able to release a previously released chunk, because the pointer to that chunk will be zeroed out. This requires one more level of indirection in accessing the memory, i.e. a pointer to a struct that contains the pointer to the memory, but it is otherwise safe.
As a note, since it may be causing some confusion: I'm not referencing the standard Linux mempool implementation. I have written custom ones with a lot of helper functions for safe memory access.
If you are talking about a very naive version of a mempool, then you are correct, but that's why I said a good implementation.
The whole point of a good mempool is that you malloc once, and only call free when you exit the program. The data structures for memory allocation will never get corrupted. And the memory pool will never release a chunk twice because it keeps track of allocated chunks.
Use after free is mitigated in the same way. When you allocate, you get a struct back that contains a pointer to the data. When you release, that pointer is zeroed out.
> If you are talking about a very naive version of a mempool, then you are correct, but that's why I said a good implementation.
No true Scotsman.
> The whole point of a good mempool is that you malloc once, and only call free when you exit the program. The data structures for memory allocation will never get corrupted. And the memory pool will never release a chunk twice because it keeps track of allocated chunks.
Then you've just moved the same problem one layer up - "use after returned to mempool" takes the place of "use after free" and causes the same kind of problems.
> When you allocate, you get a struct back that contains a pointer to the data. When you release, that pointer is zeroed out.
And the program - or, more likely, library code that it called - still has a copy of that pointer that it made when it was valid?
It's not about comparing implementations; it's about the fact that a correct mempool implementation solves the problem without the need for complex borrow checkers.
For example, in that implementation, you request memory from a mempool, it returns a chunk-struct with the pointer to allocated memory, the size of the chunk, and optionally some convenience functions for safe access (making sure that the pointer is not incremented or decremented beyond the limits). It also keeps its own pointer to the chunk-struct, along with the chunk that it was allocated. When you release the chunk, it zeros out the pointer in the chunk-struct. Now any access to it will cause a segfault.
You can of course write code that bypasses all those checks, but in Rust, that's equivalent to using unsafe when you wanna be lazy. Also you could argue that Rust is better because instead of segfaulting, the check will be caught during compile time, which is true but only for fairly simple programs. Once you start using RefCells, you cannot guarantee everything during compile time.
> You can of course write code that bypasses all those checks, but in Rust, that's equivalent to using unsafe when you wanna be lazy.
The difference is that most of the Rust ecosystem is set up to allow you to not use unsafe. Whereas whenever you use a library in C, you need to pass it a pointer, so bypassing these checks has to be routine. (Note that the article claims as a key merit that it's possible to add annotations to existing libraries)
> When you release the chunk, it zeros out the pointer in the chunk-struct. Now any access to it will cause a segfault.
Only if you're very lucky. Null pointer dereference is undefined behaviour, so it may cause a different thread to segfault on a seemingly unrelated line, or your program may silently continue with subtly corrupted state in memory, or...
> Also you could argue that Rust is better because instead of segfaulting, the check will be caught during compile time, which is true but only for fairly simple programs. Once you start using RefCells, you cannot guarantee everything during compile time.
Using RefCells should be (and, idiomatically, is) the exception rather than the rule. And incorrect use of RefCell results in a safe panic rather than undefined behaviour.
Null pointer dereference in the vast majority of cases will segfault. In the cases where it doesn't, that's fully on you for running some obscure OS on some obscure hardware.
>Whereas whenever you use a library in C, you need to pass it a pointer,
When it comes to developing with Rust, any performance oriented project is necessarily going to have lots of unsafe for interacting with C libraries in the linux kernel in the same way that C code does.
As for a comparison to fully safe Rust code outside the unsafes, you can largely accomplish analogous behavior in C with a good mempool implementation. Or if you don't need to pass around huge amounts of data, you can also do it by simply never mallocing and using stack variables. There are still some things you have to worry about (using safe length-bounded memory copy/move functions, using [type]* const pointer values to essentially make them act like references for function parameters, some other small things).
The point is Rust isn't the defacto standard for memory safety, and while it can exist as its own project, porting its semantics to other languages is not worth it.
> Null pointer dereference in the vast majority of cases will segfault.
Attempting access to a zero address will segfault on most hardware, but unfortunately common C compilers in common configurations will not reliably compile a null pointer dereference to an access to the zero address. Look up why the Linux kernel builds with -fno-delete-null-pointer-checks (sadly, most applications and libraries don't).
> When it comes to developing with Rust, any performance oriented project is necessarily going to have lots of unsafe for interacting with C libraries in the linux kernel in the same way that C code does.
I'm not talking about performance-oriented projects. I'm talking about regular use of libraries, e.g. I need to talk to PostgreSQL so I'll call libpq, I need to uncompress some data so I'll use zlib, I need to make an HTTP call so I'll use libcurl...
> The point is Rust isn't the defacto standard for memory safety
It absolutely is though. It's got clear, easy-to-assess rules for whether a project is memory-safe or not, and a substantial ecosystem that follows them; so far it's essentially unique in that regard, unless you include GCed languages.
I mean you just proved your own point - compile with -fno-delete-null-pointer-checks.
And whatever criticism you have of that is surpassed by the fact that in all cases for regular software (i.e. run on a server, laptop, or desktop) that would be normal to write in either Rust or C, if it was written in C and a null pointer is dereferenced, it would absolutely crash (i.e. Rust is not really being used to develop embedded-system software in non-experimental workflows where the zero address is a valid memory address).
And whatever criticism you have of that is surpassed by the fact that if you can write Rust code with all the borrowing semantics, you can also write a quick macro for any dereference of a mempool region that checks if the pointer is null and use that everywhere in your code.
So TLDR, not hard to write memory safe code. Rust is just one way to do it, not the only way. It's great for enterprise projects, much in the same way that Java came up because of its strictness, GC, and multi-platform capability. And just like Java today, eventually nobody is going to take it seriously; people who want to get shit done will be writing something that looks like Python except even higher level, with AI assistants that replace text, and then LLMs will translate that code into the most efficient machine code.
Most people don't though. Even if your code was compiled with it, libraries you use may not have been compiled that way. And even if you do, it doesn't cover all cases.
> And whatever criticism you have of that is surpassed by the fact that in all cases for regular software (i.e. run on a server, laptop, or desktop) that would be normal to write in either Rust or C, if it was written in C and a null pointer is dereferenced, it would absolutely crash
No it won't. Not reliably, not consistently. It's undefined behaviour, so a C compiler can do random other things with your code, and both GCC and Clang do.
> And whatever criticism you have of that is surpassed by the fact that if you can write Rust code with all the borrowing semantics, you can also write a quick macro for any dereference of a mempool region that checks if the pointer is null and use that everywhere in your code.
"Everywhere in your code" only if you're not using any libraries.
> So TLDR, not hard to write memory safe code.
If it's that easy why has no-one done it? Where can I find published C programs written this way? Like most claims of "safe C", this is vaporware.
>It's undefined behaviour, so a C compiler can do random other things with your code, and both GCC and Clang do.
Give me an example of a null pointer dereference in a program that one compiles with -fdelete-null-pointer-checks that doesn't crash when it's run on any smartphone, x64 CPU in modern laptops/desktops/servers, or Apple Silicon.
> Give me an example of a null pointer dereference in a program that one compiles with -fdelete-null-pointer-checks that doesn't crash when it's run on any smartphone, x64 CPU in modern laptops/desktops/servers, or Apple Silicon.
https://blog.llvm.org/2011/05/what-every-c-programmer-should... has an example under "Debugging Optimized Code May Not Make Any Sense" - in that case the release build fortuitously did what the programmer wanted, but the same behaviour could easily cause disaster (e.g. imagine you have two different global "init" functions and your code is set up to call one or other of them depending on some settings or something, and you forget to set one of your global function pointers in one of those init functions. Now instead of crashing, calls via that global function pointer will silently call the wrong version of the function).
> The whole point of a good mempool is that you malloc once, and only call free when you exit the program
So you're describing fork() and _exit(). That's my favorite memory manager. For example, chibicc never calls free() and instead just forks a process for each item of work in the compile pipeline. It makes the codebase infinitely simpler. Rui literally solved memory leaks! No idea what you're talking about.
One issue I see with this approach (the compiler leaking memory) is, for instance, if the requirements change and you need to use the compiler as a library or a service.
For example, if the Cake source is used within a web browser compiled with Emscripten, leaking memory with each compilation would lead to a continuous increase in memory usage.
Additionally, compilers often offer the option to compile multiple files. Therefore, we cannot afford to leak memory with each file compilation.
Initially I was planning a global allocator for the Cake source.
It had a lot of memory leaks that would be fixed in the future.
When ownership checks were added, it was a perfect candidate for fixing those leaks.
(Actually, I also had this in mind.)
True, but with some stuff you just ain't gonna need it. For example, chibicc forks a process for each input file. They're all ephemeral. So the fork/_exit model does work well for chibicc. You could compile a thousand files and all its subprocesses would just clean things up. Now needless to say, I have compiled some juicy files with chibicc. Memory does get a bit high. It's manageable though. I imagine it'd be more of an issue if it were a c++ compiler.