thradams · 2024-04-28T22:02:52

Cake is an open source compiler and static analyzer in development. (Not production quality yet.)

This video shows how Cake can help programmers create safe code just by fixing warnings.

We copy and paste the code, then add pragma safety enable.

This enables two features: ownership and nullable checks. Ownership checks verify, for instance, that fclose is called, and also catch double frees and similar errors, while nullable checks catch dereferences of null pointers.

New qualifiers _Opt and _Owner are used but they can be empty macros, allowing the same code to be compiled without cake.
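
A minimal sketch of the "empty macros" idea, assuming the analyzer predefines a macro named `__CAKE__` (an assumption here; substitute whatever macro identifies the tool). The helper `close_if_open` is hypothetical and only illustrates that an ordinary compiler sees plain C once the qualifiers expand to nothing.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: Cake predefines __CAKE__. Any other compiler takes this
   branch and sees the qualifiers as empty macros. */
#ifndef __CAKE__
#define _Owner
#define _Opt
#endif

/* Hypothetical helper: closes the file if one was opened.
   Returns 1 if a file was closed, 0 if f was null. */
static int close_if_open(FILE *_Owner _Opt f)
{
    if (f == NULL)
        return 0;
    fclose(f);
    return 1;
}
```

With the macros empty, the parameter declaration is just `FILE *f`, so the same source builds with GCC or Clang unchanged.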

thradams · 2024-02-28T19:51:04

I think at "Keep the language small and simple" it should say avoid "two ways of doing something"

( The sample I have is 0, NULL and nullptr where nullptr is something new. Two ways of doing something makes the language complex. )

Jorengarenar · 2024-02-28T21:07:31

Yeah, we didn't copy that one over precisely because it was kind of a blocker to introducing replacements for outdated design.

But I think it can be weaseled into that principle. Thanks!

thradams · 2024-02-20T12:05:13

The idea is to keep cake aligned with C, not a language fork. But Cake itself could have a fork to Cake++. :D

JonChesterfield · 2024-02-20T12:16:00

A 'C' -> C compiler which preserves most source code unchanged (i.e. would be the identity transform on some input) and which implements something like constexpr on functions (by running the interpreter during the transform) could be argued to be a forward looking C implementation. Specifically C23 has constexpr, but in an extremely limited form, and aspires to extend that to be more useful later.

Equally one which replaces 'auto' with the name of the type (and similar desugaring games) is still a C to C compiler, just running as a C23 to C99 or whatever. Resolve the branch in _Generic before emitting code as part of downgrading C11.
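
As an illustration of the `_Generic` downgrade mentioned above, here is a hedged sketch. The names `abs_int`, `abs_double` and `ABS` are made up for the example, and `after` shows what a lowering pass could plausibly emit, not what any particular tool actually generates.

```c
#include <assert.h>

static int abs_int(int x) { return x < 0 ? -x : x; }
static double abs_double(double x) { return x < 0 ? -x : x; }

/* C11 input: the branch is selected by the static type of x. */
#define ABS(x) _Generic((x), int: abs_int, double: abs_double)(x)

/* What the programmer writes. */
static int before(void) { return ABS(-3); }

/* What a C11-to-C99 pass could emit once the int branch is resolved. */
static int after(void) { return abs_int(-3); }
```

Both functions compute the same value; only the second is valid C99.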

The lifetime annotations are an interesting one because they're a different language which, if it typechecks, can be losslessly converted into C (by dropping the annotations on the way out).

I'm not sure where in that design space the current implementation lies. In particular folding preprocessed code back into code that has the #defines and #includes in is a massive pain and only really valuable if you want to lean into the round trip capability.

thradams · 2024-02-20T12:27:08

auto, typeof and _Generic are implemented in Cake. Sometimes, when they are used inside macros, the macros need to be expanded. Cake has #pragma expand MACRO for this task.

Sample macro NEW using c23 typeof.

    #include <stdlib.h>
    #include <string.h>

    static inline void* allocate_and_copy(void* s, size_t n) {
        void* p = malloc(n);
        if (p) {
            memcpy(p, s, n);
        }
        return p;
    }

    #define NEW(...) (typeof(__VA_ARGS__)*) allocate_and_copy(&(__VA_ARGS__), sizeof(__VA_ARGS__))
    #pragma expand NEW

    struct X {
        const int i;
    };

    int main() { 
        auto p = NEW((struct X) {});     
    }

The generated code is

    #include <stdlib.h>
    #include <string.h>

    static inline void* allocate_and_copy(void* s, size_t n) {
        void* p = malloc(n);
        if (p) {
            memcpy(p, s, n);
        }
        return p;
    }

    #define NEW(...) (typeof(__VA_ARGS__)*) allocate_and_copy(&(__VA_ARGS__), sizeof(__VA_ARGS__))
    #pragma expand NEW

    struct X {
        const int i;
    };

    int main() { 
        struct X  * p =  (struct X*) allocate_and_copy(&((struct X) {0}), sizeof((struct X) {0}));     
    }

thradams · 2024-02-20T11:47:55

(by the way, embed is not working on web version because of include directory bug - it is an open issue and regression)

thradams · 2024-02-20T11:42:50

Rust needs to add some runtime checks when calling destructors in scenarios where some object may or may not be moved.

In C++, for instance, the destructor of a smart pointer will have an "if (p != NULL)" check. When the smart pointer is moved, the move makes the pointer null, and the destructor checks for that at runtime.
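
The runtime check described here can be imitated in plain C. This is only a sketch: `struct unique_int` and its functions are hypothetical stand-ins for a C++ smart pointer, written by hand instead of being generated by a compiler.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a smart pointer to int. */
struct unique_int { int *p; };

/* "Move": the source is left null so its destructor becomes a no-op. */
static struct unique_int unique_int_move(struct unique_int *src)
{
    struct unique_int dst = { src->p };
    src->p = NULL;
    return dst;
}

/* "Destructor" with the runtime null check.
   Returns 1 if it actually freed something, 0 otherwise. */
static int unique_int_destroy(struct unique_int *u)
{
    if (u->p != NULL) {
        free(u->p);
        u->p = NULL;
        return 1;
    }
    return 0;
}

/* Moves a into b, then destroys both; returns the number of real frees. */
static int demo_move_then_destroy(void)
{
    struct unique_int a = { malloc(sizeof(int)) };
    if (a.p == NULL)
        return -1;
    struct unique_int b = unique_int_move(&a);
    int freed = unique_int_destroy(&a);   /* no-op: a was moved from */
    freed += unique_int_destroy(&b);      /* frees the int */
    return freed;
}
```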

thradams · 2024-02-20T11:00:14

Cake implements defer as an extension, where ownership and defer work together. The flow analysis must be prepared for defer.

    int * owner p = calloc(1, sizeof(int));
    defer free(p);

However, with ownership checks, the code is already safe. This may also change the programmer's style, as generally, C code avoids returns in the middle of the code.

In this scenario, defer makes the code more declarative and saves some lines of code. It can be particularly useful when the compiler supports defer but not ownership.
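
For comparison, this is a sketch in standard C of the single-exit cleanup style that defer would shorten. The function `first_byte` is hypothetical; the point is that every exit path has to be routed through the cleanup labels by hand.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Returns the first byte of the file, or -1 on any failure. */
static int first_byte(const char *path)
{
    int result = -1;               /* pessimistic default */
    unsigned char *buf = NULL;
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        goto done;

    buf = malloc(1);
    if (buf == NULL)
        goto close_file;

    if (fread(buf, 1, 1, f) == 1)
        result = buf[0];

    free(buf);                     /* with defer, written next to malloc */
close_file:
    fclose(f);                     /* with defer, written next to fopen */
done:
    return result;
}
```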

One difference between defer and ownership checks, in terms of safety, is that the compiler will not prompt you to create the defer. But, with ownership checks, the compiler will require an owner object to hold the result of malloc, for instance. It cannot be ignored.

The same happens with C++ RAII. If you forget to free something in your destructor, or forget to create the destructor, the compiler will not complain.

In cake ownership this cannot be ignored.

    struct X {
      FILE * owner file;
    };

    int main(){
       struct X x = {};
       //....
       
    } //error x.file not freed

thradams · 2024-02-20T04:38:18

> Can you ask Github Co-pilot to look at C code and answer the question "What is the length of the array 'buf' passed to this function"? That tells you how to express the array in a language where arrays have enforced lengths, which includes both C++ and Rust

This is the way you tell C the size of an array.

    void f(int n, int a[n]) {
    }

Animats · 2024-02-20T07:06:23

You can write that in C, but it doesn't really do anything. It's equivalent to

    void f(int n, int a[]) {
    }

Why? So that you can write

    void f(int n, int m, int a[n][m]) {
    }

which declares a 2-dimensional array parameter. In that case, the "m" is used to compute the position of an element in the 2D array. The "n" doesn't do anything. This is equivalent to writing

    void f(int n, int m, int a[][m]) {
    }

This is C's minimal multidimensional array support, known by few and used by fewer.

Over a decade ago, I proposed that sizes in parameters should be checkable and readable, and I worked out how to make it work.[1] But I didn't have time for the politics of C standards.

[1] http://animats.com/papers/languages/safearraysforc43.pdf

a_t48 · 2024-02-20T06:35:12

Do you have source on this syntax? Does the `[n]` actually do anything here? Fooling around in godbolt, `void f(int n, int a[n]) {` is the same as `void f(int n, int a[]) {` and doesn't appear to change assembly or generate any warnings/errors with improper usage.

unnah · 2024-02-20T07:31:03

It looks like standard C99 variable-length array (VLA) syntax: https://en.cppreference.com/w/c/language/array#Variable-leng...

The major difference is when the array is multi-dimensional. If you don't have VLAs then you can only set the inner dimensions at compile time, or alternatively use pointer-based work-arounds.

Even in the case of one-dimensional arrays, a compiler or a static analyzer can take advantage of the VLA size information to insert run-time checks in debug mode, or to perform compile-time checks.
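
A small sketch of the multi-dimensional case, using hypothetical functions `sum2d` and `row_bytes`. It assumes a compiler that supports variably modified types (C99, optional in C11).

```c
#include <assert.h>
#include <stddef.h>

/* n and m must be declared before a so they are in scope in its type. */
static long sum2d(int n, int m, int a[n][m])
{
    long total = 0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < m; j++)
            total += a[i][j];   /* the row stride m comes from the type */
    return total;
}

/* sizeof on a row is evaluated at run time because the row type is int[m]. */
static size_t row_bytes(int n, int m, int a[n][m])
{
    (void)n;                    /* the outer dimension is not part of the type */
    return sizeof a[0];
}
```

The inner dimension is carried by the parameter's type, which is the information a checker could use; the outer dimension `n` is only documentation as far as the language is concerned.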

a_t48 · 2024-02-20T16:55:13

Thank you - that makes total sense.

JonChesterfield · 2024-02-20T12:21:28

you're missing the word "static" to have that work as intended. Option (2) at https://en.cppreference.com/w/c/language/array

A parameter like `const double b[static restrict 10]` says that b has at least 10 elements and doesn't alias other parameters.

Syntactically this is pretty weird.
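
A short sketch of that syntax in use; `sum4` is a made-up function. The `static 4` is a promise made by the caller, so passing a shorter array is undefined behaviour and not a guaranteed diagnostic, although some compilers warn when they can see the size.

```c
#include <assert.h>

/* [static 4]: the caller promises at least 4 valid elements.
   restrict: the caller promises v is not aliased by other parameters. */
static double sum4(const double v[static restrict 4])
{
    return v[0] + v[1] + v[2] + v[3];
}
```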

thradams · 2024-02-20T04:35:18

A mempool does not solve double free, use after free (at least at compile time) or the fopen sample. But mempool and ownership can be complementary.

ActorNightly · 2024-02-20T05:50:57

If you are talking about a very naive version of mempool, then you are correct, but thats why I said a good implementation.

The whole point of a good mempool is that you malloc once, and only call free when you exit the program. The data structures for memory allocation will never get corrupted. And the memory pool will never release chunk twice cause it keeps tracks of allocated chunks.

Use after free is mitigated in the same way. When you allocate, you get a struct back that contains a pointer to the data. When you release, that pointer is zeroed out.

lmm · 2024-02-20T06:31:02

> If you are talking about a very naive version of mempool, then you are correct, but thats why I said a good implementation.

No true Scotsman.

> The whole point of a good mempool is that you malloc once, and only call free when you exit the program. The data structures for memory allocation will never get corrupted. And the memory pool will never release chunk twice cause it keeps tracks of allocated chunks.

Then you've just moved the same problem one layer up - "use after returned to mempool" takes the place of "use after free" and causes the same kind of problems.

> When you allocate, you get a struct back that contains a pointer to the data. When you release, that pointer is zeroed out.

And the program - or, more likely, library code that it called - still has a copy of that pointer that it made when it was valid?

ActorNightly · 2024-02-20T18:09:18

It's not about comparing implementations, it's about the fact that a correct mempool implementation solves the problem without the need for complex borrow checkers.

For example, in that implementation, you request memory from a mempool, it returns a chunk-struct with the pointer to allocated memory, the size of the chunk, and optionally some convenience functions for safe access (making sure that the pointer is not incremented or decremented beyond the limits). It also keeps its own pointer to the chunk-struct, along with the chunk that it was allocated. When you release the chunk, it zeros out the pointer in the chunk-struct. Now any access to it will cause a segfault.
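
A rough sketch of the chunk-struct design described above. Every name here (`struct pool`, `struct chunk`, `pool_acquire`, `pool_release`, `chunk_write`) is hypothetical, the pool is a simple bump allocator that never reuses memory, and alignment is ignored for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POOL_BYTES  1024
#define POOL_CHUNKS 8

/* Handle given to callers: they keep a pointer to this, not to the data. */
struct chunk {
    unsigned char *data;   /* NULL once released */
    size_t size;
};

struct pool {
    unsigned char storage[POOL_BYTES];   /* the one up-front allocation */
    size_t used;
    struct chunk chunks[POOL_CHUNKS];    /* the pool's own record of handles */
    size_t chunk_count;
};

/* Bump-allocates size bytes; returns NULL when the pool is exhausted. */
static struct chunk *pool_acquire(struct pool *p, size_t size)
{
    if (p->chunk_count == POOL_CHUNKS || size > POOL_BYTES - p->used)
        return NULL;
    struct chunk *c = &p->chunks[p->chunk_count++];
    c->data = p->storage + p->used;
    c->size = size;
    p->used += size;
    return c;
}

/* Zeroes the handle's pointer; releasing twice is a harmless no-op. */
static void pool_release(struct chunk *c)
{
    c->data = NULL;
    c->size = 0;
}

/* Liveness- and bounds-checked write; returns 0 on success, -1 otherwise. */
static int chunk_write(struct chunk *c, size_t off, const void *src, size_t n)
{
    if (c->data == NULL || off > c->size || n > c->size - off)
        return -1;
    memcpy(c->data + off, src, n);
    return 0;
}
```

The checks only protect accesses that go through the handle. Code that copied `c->data` before the release still holds a usable address, which is the objection raised earlier in the thread.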

You can of course write code that bypasses all those checks, but in Rust, thats equivalent to using unsafe when you wanna be lazy. Also you could argue that Rust is better because instead of segfaulting, the check will be caught during compile time, which is true but only for fairly simple programs. Once you start using RefCells, you cannot guarantee everything during compile time.

lmm · 2024-02-20T21:26:58

> You can of course write code that bypasses all those checks, but in Rust, thats equivalent to using unsafe when you wanna be lazy.

The difference is that most of the Rust ecosystem is set up to allow you to not use unsafe. Whereas whenever you use a library in C, you need to pass it a pointer, so bypassing these checks has to be routine. (Note that the article claims as a key merit that it's possible to add annotations to existing libraries)

> When you release the chunk, it zeros out the pointer in the chunk-struct. Now any access to it will cause a segfault.

Only if you're very lucky. Null pointer dereference is undefined behaviour, so it may cause a different thread to segfault on a seemingly unrelated line, or your program may silently continue with subtly corrupted state in memory, or...

> Also you could argue that Rust is better because instead of segfaulting, the check will be caught during compile time, which is true but only for fairly simple programs. Once you start using RefCells, you cannot guarantee everything during compile time.

Using RefCells should be (and, idiomatically, is) the exception rather than the rule. And incorrect use of RefCell results in a safe panic rather than undefined behaviour.

ActorNightly · 2024-02-21T18:45:36

Null pointer dereference in the vast majority of cases will segfault. In the cases where it doesn't, that's fully on you for running some obscure OS on some obscure hardware.

>Whereas whenever you use a library in C, you need to pass it a pointer,

When it comes to developing with Rust, any performance oriented project is necessarily going to have lots of unsafe for interacting with C libraries in the linux kernel in the same way that C code does.

As for comparison to fully safe Rust code outside the unsafes, you can largely accomplish analogous behavior in C with good mempool implementation. Or if you don't need to pass around huge amount of data, you can also do it by simply just never mallocing and using stack variables. There is still some things you have to worry about (using safe length bounded memory copy/move functions, using [type]* const pointer values to essentially make them act like references for function parameters, some other small things).

The point is Rust isn't the defacto standard for memory safety, and while it can exist as its own project, porting its semantics to other languages is not worth it.

lmm · 2024-02-22T03:21:42

> Null pointer dereference in the vast majority of cases will segfault.

Attempting access to a zero address will segfault on most hardware, but unfortunately common C compilers in common configurations will not reliably compile a null pointer dereference to an access to the zero address. Look up why the Linux kernel builds with -fno-delete-null-pointer-checks (sadly, most applications and libraries don't).
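
A sketch of the pattern that flag is about; both function names are hypothetical. Because the first version dereferences `p` before testing it, an optimizing compiler is allowed to assume `p` is not null and drop the test.

```c
#include <assert.h>
#include <stddef.h>

/* The dereference comes first, so the later check may be treated as dead code. */
static int read_or_default(const int *p)
{
    int value = *p;        /* undefined behaviour when p is NULL */
    if (p == NULL)         /* may be optimized away */
        return -1;
    return value;
}

/* Checking before the dereference keeps the test meaningful. */
static int read_or_default_fixed(const int *p)
{
    if (p == NULL)
        return -1;
    return *p;
}
```

Whether the check in the first version survives depends on the compiler, the optimization level and the flags, which is why the outcome of calling it with a null pointer cannot be predicted from the source alone.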

> When it comes to developing with Rust, any performance oriented project is necessarily going to have lots of unsafe for interacting with C libraries in the linux kernel in the same way that C code does.

I'm not talking about performance oriented projects. I'm talking about regular use of libraries e.g. I need to talk to PostgreSQL so I'll call libpq, I need to uncompress some data so I'll use zlib, I need to make a HTTP call so I'll use libcurl...

> The point is Rust isn't the defacto standard for memory safety

It absolutely is though. It's got clear, easy-to-assess rules for whether a project is memory-safe or not, and a substantial ecosystem that follows them; so far it's essentially unique in that unless you include GCed languages.

ActorNightly · 2024-02-22T05:00:23

I mean you just proved your own point - compile with -fno-delete-null-pointer-checks.

And whatever criticism you have of that is surpassed by the fact that in all cases for regular software (i.e. run on a server or laptop or desktop) that would be normal to write in either Rust or C, if it was written in C and a null pointer is dereferenced, it would absolutely crash (i.e. Rust is not really being used to develop embedded system software in non-experimental workflows where the zero address is a valid memory address).

And whatever criticism you have of that is surpassed by the fact that if you can write Rust code with all the borrowing semantics, you can also write a quick macro for any dereference of a mempool region that checks if the pointer is null and use that everywhere in your code.

So TLDR, not hard to write memory safe code. Rust is just a way to do it, but not the only way. It's great for enterprise projects, much in the same way that Java came up because of its strictness, GC and multi platform capability. And just like Java today, eventually nobody is going to take it seriously, people who want to get shit done will be writing something that looks like Python except even higher level, with AI assistants that replace text, and then LLMs will translate that code into the most efficient machine code.

lmm · 2024-02-22T06:20:08

> compile with -fno-delete-null-pointer-checks

Most people don't though. Even if your code was compiled with it, libraries you use may not have been compiled that way. And even if you do, it doesn't cover all cases.

> And whatever criticism you have of that is surpassed by the fact that in all cases for regular software (i.e. run on a server or laptop or desktop) that would be normal to write in either Rust or C, if it was written in C and a null pointer is dereferenced, it would absolutely crash

No it won't. Not reliably, not consistently. It's undefined behaviour, so a C compiler can do random other things with your code, and both GCC and Clang do.

> And whatever criticism you have of that is surpassed by the fact that if you can write Rust code with all the borrowing semantics, you can also write a quick macro for any dereference of a mempool region that checks if the pointer is null and use that everywhere in your code.

"Everywhere in your code" only if you're not using any libraries.

> So TLDR, not hard to write memory safe code.

If it's that easy why has no-one done it? Where can I find published C programs written this way? Like most claims of "safe C", this is vaporware.

ActorNightly · 2024-02-22T11:09:55

>It's undefined behaviour, so a C compiler can do random other things with your code, and both GCC and Clang do.

Give me an example of a null pointer dereference in a program that one compiles with -fdelete-null-pointer-checks that doesn't crash when it's run on any smartphone, x64 cpu in modern laptops/desktops/servers or Apple Silicon.

lmm · 2024-02-25T22:28:26

> Give me an example of a null pointer dereference in a program that one compiles with -fdelete-null-pointer-checks that doesn't crash when it's run on any smartphone, x64 cpu in modern laptops/desktops/servers or Apple Silicon.

https://blog.llvm.org/2011/05/what-every-c-programmer-should... has an example under "Debugging Optimized Code May Not Make Any Sense" - in that case the release build fortuitously did what the programmer wanted, but the same behaviour could easily cause disaster (e.g. imagine you have two different global "init" functions and your code is set up to call one or other of them depending on some settings or something, and you forget to set one of your global function pointers in one of those init functions. Now instead of crashing, calls via that global function pointer will silently call the wrong version of the function).

thradams · 2024-02-21T20:56:58

Cake is not porting Rust semantics. It works on classical C code, like the first sample using fopen.

    #include <ownership.h>
    #include <stdio.h>

    int main()
    {
      FILE *owner f = fopen("file.txt", "r"); 
      if (f)
        fclose(f);
    }

But comparisons are inevitable, and I also think there are lessons learned in Rust.

C programmers use contracts; these contracts are part of the documentation of an API. For instance, if you call fopen you must call fclose.

All we need is to create contracts that the compiler can read and verify automatically.

jart · 2024-02-20T07:06:41

> The whole point of a good mempool is that you malloc once, and only call free when you exit the program

So you're describing fork() and _exit(). That's my favorite memory manager. For example, chibicc never calls free() and instead just forks a process for each item of work in the compile pipeline. It makes the codebase infinitely simpler. Rui literally solved memory leaks! No idea what you're talking about.

thradams · 2024-02-20T11:37:10

One issue I see with this approach (compiler leaking memory) is, for instance, if the requirements change and you need to utilize the compiler as a lib or service. For example, if the Cake source is used within a web browser compiled with Emscripten, leaking memory with each compilation would lead to a continuous increase in memory usage.

Additionally, compilers often offer the option to compile multiple files. Therefore, we cannot afford to leak memory with each file compilation.

Initially I was planning a global allocator for cake source. It had a lot of memory leaks that would be solved in the future.

When ownership checks were added it was a perfect candidate for fixing leaks. (actually I also had this in mind)

jart · 2024-02-20T14:40:37

True, but with some stuff you just ain't gonna need it. For example, chibicc forks a process for each input file. They're all ephemeral. So the fork/_exit model does work well for chibicc. You could compile a thousand files and all its subprocesses would just clean things up. Now needless to say, I have compiled some juicy files with chibicc. Memory does get a bit high. It's manageable though. I imagine it'd be more of an issue if it were a c++ compiler.
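
A POSIX-only sketch of the fork/_exit model being discussed. `leaky_job` and `run_isolated` are hypothetical names, and this is not taken from chibicc's source; it only shows why a work item that never calls free() does not grow the parent's memory.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical work item: allocates and deliberately never frees. */
static int leaky_job(int input)
{
    int *scratch = malloc(1000 * sizeof *scratch);
    if (scratch == NULL)
        return 1;
    scratch[0] = input;
    return scratch[0] == input ? 0 : 1;
}

/* Runs job(input) in a child process; returns its exit status, or -1. */
static int run_isolated(int (*job)(int), int input)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(job(input));   /* the OS reclaims everything the child allocated */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The trade-off raised in the thread still applies: this only works when a process per work item is acceptable, which is not the case for a library or an Emscripten build.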

thradams · 2024-02-20T16:18:55

(I think preprocessor is the place where memory is used and released all the time while expanding macros.)

jart · 2024-02-20T18:06:37

It is.

thradams · 2024-02-20T04:13:06

I think gsl::Owner is related to RAII.

The difference between Cake ownership and C++ RAII is that with RAII the destructor is unconditionally called at the end of scope, so flow analysis is not required.

Cake requires flow analysis because the "destructor" is not unconditionally called.

When the compiler can see that the owner is not owning an object (because the pointer is null, for instance), the "destructor" is not necessary.

To understand the difference.

With flow analysis (how it works today)

    int main() 
    {
      FILE *owner f = fopen("file.txt", "r"); 
      if (f)
        fclose(f);
    }

Without flow analysis (or with a very simple one, where the destroy must be the last statement)

    void fclose2(FILE * owner p) {
       if (p) fclose(p);
    }

    int main() 
    {
      FILE *owner f = fopen("file.txt", "r"); 
      if (f){
      }
      fclose2(f);
    }

thradams · 2024-02-20T04:17:29

The other difference is that in RAII the destructor cannot be turned off. In Cake the same object can be a "view".

    struct X x = {0};
    //...
    view struct X x2 = x;
    destroy(&x);
    //x2 does not need destructor

thradams · 2024-02-20T03:52:27

The Cake implementation cannot be mapped to Rust. I am not a Rust specialist, but one concept, for instance, is that an owner pointer owns two resources at the same time, the memory and the object. In Rust it is one concept.

Owner pointers take on the responsibility of owning the pointed object and its associated memory, treating them as distinct entities. A common practice is to implement a delete function to release both resources, as illustrated in Listing 7:

Listing 7 - Implementing the delete function

    #include <ownership.h>
    #include <stdlib.h>

    struct X {
      char *owner text;
    };

    void x_delete(struct X *owner p) {
      if (p) {
        /*releasing the object*/
        free(p->text);

        /*releasing the memory*/
        free(p);
      }
    }

    int main() {
      struct X *owner pX = calloc(1, sizeof *pX);
      if (pX) {
        /*...*/;
        x_delete(pX);
      }
    }

pitaj · 2024-02-20T04:23:01

I don't see why that couldn't be represented like this in Rust:

    struct X {
        text: Option<Box<str>>,
    }
    fn main() {
        let pX = Box::new(X { text: None });
        // automatically dropped (freed) at end of scope
    }

thradams · 2024-02-20T04:31:51

In Cake, the object and the memory are two resources. We can, for instance, delete the object and reuse the same memory.

For instance, this code is correct.

    #include <ownership.h> 
    #include <stdlib.h>

    struct X {
       char * owner text;
    };

    void x_delete(struct X * owner p)
    {
        if (p)
        {
           free(p->text);
           free(p);    
        }
    }

    int main() {   
       struct X * owner p = malloc(sizeof(struct X));
        
       p->text = malloc(10);

       free(p->text); //object text destroyed

       struct X x2 = {0};

       *p = x2; //x2 MOVED TO *p

       x_delete(p);   

       //no need to destroy x2
    }

Arnavion · 2024-02-20T05:10:51

    let p: Box<X> = Box::new(X { ... });

    let x2 = X { ... };

    // Moves x2 into the same memory as the first X.
    // The first X is automatically dropped as part of this assignment.
    // Also consumes x2 so x2 is not available any more.
    *p = x2;

    // Drops the X that was originally assigned to x2 and then moved into p.
    drop(p);

    // No need, nor is it possible, to destroy x2.

thradams · 2024-02-20T11:16:33

Thanks for the rust sample. It looks very similar. Can the allocator be customized?

As I said, I am not a Rust specialist.

Also, my understanding is that in Rust a dynamic state is sometimes created when the object may or may not be moved.

In Cake ownership this needs to be explicit (and the destructor is not generated).

I also had a look at Rust's lifetime annotations. This concept may be necessary but I am avoiding it.

Consider this sample.

    struct X {
      struct Y * pY;
    };
    struct Y {
      char * owner name;
    };

An object Y pointed to by pY must live longer than object X. (Cake is not checking this scenario yet.)

Also (classic Rust sample)

    int * max(int * p1, int * p2) {  
      return *p1 > *p2 ? p1 : p2;
    }

    int main(){  
       int * p = NULL;
       int a  = 1;
      {
         int b = 2;
         p = max(&a,  &b);
      }
      printf("%d", *p);
    }

This is not implemented yet but I want to make the lifetime of p be the smallest scope. (this is to avoid lifetime annotations)

    int * p = NULL;
    int a  = 1;
    {
       int b = 2;
       p = max(&a,  &b);
    } //p cannot be used beyond this point

Arnavion · 2024-02-20T19:12:20

>Can the allocator be customized?

`Box<T>` is the type of an owning pointer that uses the default global allocator, and `Box<T, A>` is the type of an owning pointer that uses an allocator of type `A`. The latter is unstable, ie it can only be used in nightly Rust.

(Also the fact that the latter changes the type means a large part of existing third-party code as well as a bunch of code in libstd itself becomes unusable if you want to use a type-level custom allocator because they only work with `Box<T>`. But that's a different discussion...)

>An object Y pointed by pY must live longer than object X.

Yes, the py field in Rust would use a reference type instead of a pointer, and the reference would need to have a lifetime annotation, and the compiler would work to prevent the situation you describe:

    struct X<'a> { py: &'a Y }

    let y = Y { ... };
    let x = X { py: &y };
    drop(y); // error: y is borrowed by x so it cannot be moved.

But to be clear, the `'a` lifetime syntax is not what's making this work. What's making this work is that the compiler tracks the lifetimes of references in general. This works in the same way even though there are no lifetime annotations:

    let y: String = "a".to_owned();
    let x = &y;
    drop(y); // error: y is borrowed by x so it cannot be moved.
    do_something_with(x);

The explicit lifetime annotations are just for a) readability, and b) because sometimes you want to name them to be able to express relationships between them. Eg if two lifetimes 'a and 'b are in play and you want to express that 'a is at least as long as 'b, then you have to write a `'a: 'b` bound. In many cases they can be omitted and the compiler infers them automatically.

thradams · 2024-02-20T19:32:17

(question about rust.. this is not implemented in cake yet)

Let's say I have two objects on the heap, A and B. A has a "view" of B.

Then we put a prompt to the user (or some dynamic condition): "Which object do you want to delete first, A or B?" Then the user selects B. How can this be checked at compile time?

Arnavion · 2024-02-20T20:15:14

The code path that drops B will not compile unless that code path drops A first. It doesn't matter if that code path is in response to user input or not. Again, as I said, the point is that the compiler tracks the lifetime of all references. In this case A contains a reference to B, so any code that drops B without dropping A will not compile.

masklinn · 2024-02-20T13:49:18

> Also, in my understanding is that in Rust, sometimes a dynamic state is created when the object may or may not be moved.

Yes?

    if cond {
        drop(p)
    }
    // p may or may not be dropped here

or

    let p;
    if cond {
        p = something();
    }
    // p may or may not be set here

These trigger dynamic drop semantics, in which case the stackframe has a hidden set of drop flags going alongside any variable with dynamic drop semantics, to know if they do or don’t need to be dropped. The flags are automatically updated when the corresponding variables are set or moved-from.
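
The hidden drop flag can be written out by hand in C to show what it amounts to. This is only an illustration of the idea, with a hypothetical function `conditional_drop`; it is not how rustc's generated code actually looks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Returns how many times free() ran for p; it should be 1 on every path. */
static int conditional_drop(bool cond)
{
    int frees = 0;
    int *p = malloc(sizeof *p);
    if (p == NULL)
        return -1;
    bool p_needs_drop = true;    /* the hidden flag, made explicit */

    if (cond) {
        free(p);                 /* the early drop(p) */
        p_needs_drop = false;    /* flag updated at the drop site */
        frees++;
    }

    /* End of scope: drop only if the flag says the value is still live. */
    if (p_needs_drop) {
        free(p);
        frees++;
    }
    return frees;
}
```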

thradams · 2024-02-20T14:13:51

This "dynamic drop semantics" does not exist in cake.

    int * owner p = malloc(sizeof(int));
    if (condition) 
       free(p);
    free(p); // error: p may be uninitialized/moved.

to fix

    int * owner p = malloc(sizeof(int));
    if (condition) 
    { 
      free(p);
      p = 0;
    }
    free(p);

masklinn · 2024-02-20T07:51:50

And as a bonus there is no temporal hole where you could access a null or dangling text.

Here it is as a runnable snippet: https://godbolt.org/z/fc4Gfxrfd

thradams · 2024-02-20T11:22:24

In Cake there is no temporal hole: we cannot reuse the deleted object. This prevents double free and use after free.

    int main() {   
       struct X * owner p = malloc(sizeof(struct X));
        
       p->text = malloc(10);

       free(p->text); //object text destroyed

       //p->text is in an uninitialized state.
       //cannot be used (except assignment)

       struct X x2 = {0};

       *p = x2; //x2 MOVED TO *p

       x_delete(p);