Circle C++ with memory safety (circle-lang.org)



Sean (the author of Circle) is an impressive guy. He started pursuing this work at about the same point several of the "C++ Successor Languages" were created, but although all of them claimed to be about solving this problem, especially when first announced, none of them actually has a solution, unlike this Circle work. Let me briefly enumerate:

Val (now HyLo) says it wants to solve this problem... but it doesn't yet have the C++ interop stuff, so it's really just a completely different language nobody uses.

Carbon wants to ship a finished working Carbon language, then bolt on safety (somehow), but only for some things, not data races for example, so you still don't actually have Rust's Memory Safety.

Cpp2 explicitly says it isn't even interested in solving the problem; Herb says if he can produce measurements saying it's "safer" somehow, that's good enough.

It's interesting how many good ideas from Rust just come along free for the ride when you try to do this. Do you like Destructive Move? Good news, that's how you make this work in Rust, so that's how Circle does it. Exhaustive Pattern Matching? Again, Circle does that for the same reason Rust needs it.
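
For readers who haven't met the idea, here is a minimal Rust sketch of destructive move, the semantics Circle adopts: once a value has been moved from, the compiler refuses any further use of the old name.

    fn main() {
        let s = String::from("hello");
        let t = s; // destructive move: `s` is dead from this point on
        // println!("{}", s); // error[E0382]: borrow of moved value: `s`
        println!("{}", t);
    }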

It is certainly true that this is not the Only Possible Way to get Safety. It would be profoundly weird if Rust had stumbled onto such a thing, but "Let's just copy the thing that worked in Rust" is somehow at once the simplest possible plan that could work and an astonishing achievement for one man.


Right now, Circle looks like the only TypeScript-like evolution path for existing C++, with a production-quality compiler.

Unfortunately WG21 seems to have some issues with any ideas coming out of Circle, going back to its early days, and I don't see them being willing to adopt Sean's work.

Which is really a pity, as he single-handedly managed to deliver more than the whole C++ compiler community, stuck in endless discussions about the philosophical meaning of the word safety.

Maybe at least some vendor in the high-integrity computing domain will adopt his work.


> Unfortunately WG21 seems to have some issues with any ideas coming out of Circle, going back to its early days, and I don't see them being willing to adopt Sean's work.

What reasons? Are those valid?


It was due to the metaprogramming capabilities, i.e. how Circle lets you use full C++ at compile time instead of constexpr/constinit/consteval. David Sankel has a talk where he jokes about the WG21 decision process that was behind it,

"Don't constexpr All the Things", from CppNow 2021,

https://youtu.be/NNU6cbG96M4?t=2045


Well, GCC supports -fimplicit-constexpr these days: https://gcc.gnu.org/onlinedocs/gcc/C_002b_002b-Dialect-Optio...


From what I understand, there's a price Sean is asking for his work which no one is willing to pay at this moment.


I'm very confused by the love for Circle - there's a GitHub repository that hasn't been updated in 7 months and doesn't have a license.

When I first heard about it, a lot of the ideas seemed interesting, but are there users of it?

I think my biggest question is "what is the goal here?"

For Carbon they're pretty explicit that it's a research prototype. If anything is to come of it, it will need to be usable at Google's scale (they have real issues with build times, they build from head so ABI compatibility isn't required, etc.)

Herb wasn't really designing cpp2 as a successor language so much as a playground to understand what features might make sense to propose for adoption in C++.

What is Circle? It's more than just some project, but the ideas haven't been adopted in C++ and the compiler repository isn't being updated.


Circle is not open source, that’s correct.


I think that's automatically a dead end, then. People have increasingly abandoned closed source compilers; they create a huge risk if the maker decides to stop maintaining them. Most languages that people pick up have an open source implementation.


In most cases, I’d probably agree with your point here, but in this case, I think you’re wrong. If Circle can truly accomplish its stated goals, the value proposition of a memory-safe superset of C++ is ginormous. Lots of companies with critical software written in C++ won’t care that Circle isn’t open-source as long as it ticks all of their boxes (certifications, audits, etc.) and they have a strong enterprise support story. This isn’t your average project.


Even Ada is not that closed. But yeah, Ada proves that some will pay, a lot.


Enough that there are still seven vendors selling Ada compilers.


I hope that's right, but I don't really get it.

From a practical standpoint, when you run your tests with MSan and ASan (and you have decent test coverage), I'm not convinced of the benefits of the memory safety that e.g. Rust provides.

Supposing that it is worth it though, why not migrate to Rust? I have a friend starting a startup / consultancy to do just that, and it makes more sense than using Circle. Even Carbon says "if you can use Rust, use Rust, not Carbon"


The problem with testing is that you need a very large number of tests to cover a moderately complex piece of software: you need to test most branches obviously, but often you need to test combinations of behaviours such that 100% line testing is still not even close to enough testing.

The advantage of compile time verification is that you can prove that certain paths and behaviours are impossible and don't need to be tested. This reduces the space of required tests combinatorially. We all already do that: no one bothers testing (for example) that standard library functions work correctly in their own code base. In Rust (and strongly typed languages in general) there are entire classes of test and assertions that aren't needed anymore.
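
To make the "entire classes of test aren't needed" point concrete, here's a tiny Rust sketch: because `match` must be exhaustive, there is no "forgot to handle a state" code path to write a test for.

    enum State { Idle, Running, Done }

    fn next_action(s: State) -> &'static str {
        // Deleting any arm is a compile error ("non-exhaustive patterns"),
        // so no test needs to cover an unhandled-state branch.
        match s {
            State::Idle => "start",
            State::Running => "wait",
            State::Done => "clean up",
        }
    }

    fn main() {
        println!("{}", next_action(State::Running));
    }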


> Supposing that it is worth it though, why not migrate to Rust? I have a friend starting a startup / consultancy to do just that, and it makes more sense than using Circle. Even Carbon says "if you can use Rust, use Rust, not Carbon"

A key selling point for Circle is the "superset of C++" aspect. What you're proposing, while feasible for startups, is entirely unreasonable for larger companies, especially those with existing [usually massive] C++ codebases. With enough evangelizing, you might get them to agree to start using something like Rust in new, smaller, and internal projects. However, suggesting a rewrite of anything mission-critical that can directly impact the bottom line to an entirely different language, ecosystem, behaviors, guarantees, community, etc. is one of the scariest proposals you can make to a company. Are the existing engineers proficient in this new language? How long will it take to ramp up? Are there any new costs (not just financial) to adopting this new language (hint: there always are)? Are there legal concerns? How long will the rewrite take (hint: likely longer than the engineers think)? The list goes on. It's simply too risky a proposition.


I don't believe your claim about migrating to Rust being untenable, primarily because I worked at Google, where I've seen them migrate the codebase consistently through years of development and language updates. If an enormous company like Google can switch CPU architectures, change their numeric types, change out their hash maps, etc., then yes, you can migrate to using a whole new language. (If Google thought this was impossible, why bother with Carbon?)

I say this confidently because I've worked directly with people doing this and seen their work.

I will go one better and say: you can migrate to idiomatic Rust (with help from some custom libraries).

Should companies do this? Depends on the industry and the need for the kind of safety Rust provides.


The greatest value of Circle isn't in the compiler and tooling, it's in the design. Designing a C++ superset with Rust-like safety properties is hard. Once Circle gains traction, there's 110% chance that it gets reimplemented elsewhere.

That said, I'm wholly uninterested in any proprietary language, too.


It won't gain traction because it's not open source


Sean has said he would consider open-sourcing it later, but doing so now would defeat the purpose of the project. He makes a lot of progress simply because he's not watching all the issues and PRs on GitHub.


There are plenty of open source projects that aren’t developed in the open and just throw a source tarball over the wall periodically. Lua for example.


Sure, but as soon as you put out source you're going to have suggestions/comments/criticisms/PRs/etc.




I think it's more complicated than that, but I agree that it's a factor that would give some folks pause about adopting Circle right now.


Isn't MSVC closed source?


Yes, but there are several arguments in favor of MSVC that don't apply in general.

MSVC is for and from the same people as Windows. Windows is large enough and popular enough that it isn't going away (Microsoft would have to die - and even if that happens you can bet organizations like the US government would take over Windows), thus betting that MSVC won't go away is safe enough. If it doesn't work out, your company is in trouble, but so is everyone else. It is when you are betting on something less popular that you get into potential trouble, as the thing you depend on can be canceled.

MinGW isn't as good as MSVC, but if forced you could use it instead. Which means you have options, and so a bet on closed source has an understandable worst-case risk that is a lot better.

Small closed source projects like Circle C++ should be used either as a small experiment - if this fails you can rewrite everything in something open source in a few months, so the risk is low - or via one other common option (though I don't know what Circle C++ offers): you can bet on Circle C++ after your lawyers get a contract stating that if it becomes unavailable you get the source code and rights, so your worst case is that you maintain it yourself. These contracts are made in business all the time, and a good lawyer will have no problem getting the risk terms into a contract.


There is still a healthy C and C++ commercial compiler market, especially in embedded, high-integrity computing and games.


Yes, but the whole value proposition of Circle is rewriting existing C++ libraries in safe C++. Because if not, you could "just" use Rust and call them from there. And without an open source compiler that won't happen, even if it would be free as in beer.


Regulation and certified compilers also help to reach decisions.


No, the value is you can use existing C++ with Circle. Rust might be a great language, but if I have several million lines of C++ and I just want to work with a std::vector<MyCppClass>, Rust will have a lot of trouble.


>Carbon wants to ship a finished working Carbon language, then bolt on safety (somehow), but only for some things, not data races for example, so you still don't actually have Rust's Memory Safety.

I'm not sure this is correct. As I understand it, Carbon's plan is to add a borrow checker like Rust's.

From a recent talk[0][1] by one of the lead developers:

>Best candidate for C++ is likely similar to Rust’s borrow checker

[0] slides: https://chandlerc.blog/slides/2023-cppnow-carbon-strategy/in...

[1] relevant timestamps:

https://youtube.com/watch?v=1ZTJ9omXOQ0&t=1h31m34s

https://youtube.com/watch?v=1ZTJ9omXOQ0&t=1h9m49s


Chandler has explicitly said that he doesn't see a reason to solve data races.

The borrow checker isn't enough on its own to solve this in Rust; Sean explains (probably deeper in Circle's documentation) that you need to track a new property of types which is infectious. Rust does this with the Send and Sync traits.
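
To illustrate what that infectious property looks like in practice, a minimal Rust sketch: a type with a non-atomic reference count (Rc) is not Send, so the compiler refuses to let it cross a thread boundary, while its atomic counterpart (Arc) is fine.

    use std::rc::Rc;
    use std::sync::Arc;
    use std::thread;

    fn main() {
        let rc = Rc::new(1);
        // Rc's reference count is not thread-safe, so Rc is not Send:
        // thread::spawn(move || println!("{}", rc));
        // ^ error[E0277]: `Rc<i32>` cannot be sent between threads safely
        drop(rc);

        let arc = Arc::new(1); // atomic refcount: Arc<i32> is Send + Sync
        thread::spawn(move || println!("{}", arc)).join().unwrap();
    }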


"Chandler has explicitly said that he doesn't see a reason to solve data races."

Er, the slide title says that solving this is highly desirable, just not a strict requirement for security purposes.

Not sure how that's the same as "doesn't see a reason to solve data races". I see lots of reasons. I just think it is possible to achieve the security goals without it.

FWIW, I'm hopeful we'll end up including this in whatever model we end up with for Safe Carbon. It's just a degree of freedom we also shouldn't ignore when designing it.


> Not sure how that's the same as "doesn't see a reason to solve data races". I see lots of reasons. I just think it is possible to achieve the security goals without it.

If Carbon doesn't prevent data races, then how exactly will it achieve memory safety? Will it implement something like OCaml's "Bounding Data Races in Space and Time"? [0]

If we ignore compiler optimizations, the problem with data races is that it may make you observe tearing (incomplete writes) and thus it's almost impossible to maintain safety invariants with them. But the job of a safe low level language is to give tools for the programmer to guarantee correctness of the unsafe parts. In the presence of data races, this is infeasible. So even if you find a way to ensure that data races aren't technically UB, data races happening in a low level language surely lead to UB elsewhere.

Ultimately this may end up showing as CVEs related to memory safety so I don't think you can achieve your security goals without preventing data races.

[0] https://kcsrk.info/papers/pldi18-memory.pdf


It is possible to have a memory model that blocks word tearing without full logical data race prevention. Java does it, although it benefits from not having to deal with packed types etc.


I'm not sure, but I don't think this is the case. https://openjdk.org/projects/valhalla/design-notes/state-of-...

> Tearing

> For the primitive types longer than 32 bits (long and double), it is not guaranteed that reads and writes from different threads (without suitable coordination) are atomic with respect to each other. The result is that, if accessed under data race, a long or double field or array component can be seen to “tear”, where a read might see the low 32 bits of one write, and the high 32 bits of another. (Declaring the containing field volatile is sufficient to restore atomicity, as is properly coordinating with locks or other concurrency control.)

> This was a pragmatic tradeoff given the hardware of the time; the cost of atomicity on 1995 hardware would have been prohibitive, and problems only arise when the program already has data races — and most numeric code deals with thread-local data. Just like with the tradeoff of nulls vs. zeros, the design of primitives permits tearing as part of a tradeoff between performance and correctness, where primitives chose “as fast as possible” and objects chose more safety.

> Today’s JVMs give us atomic loads and stores of 64-bit primitives, because the hardware makes them cheap enough. But primitive classes bring us back to 1995; atomic loads and stores of larger-than-64-bit values are still expensive, leaving us with a choice of “make operations on primitives slower” or permitting tearing when accessed under race. For the new primitive types, we choose to mirror the behavior of the existing primitives.

> Just as with null vs. zero, this choice has to be made by the author of a class. For classes like Complex, all of whose bit patterns are valid, this is very much like the choice around long in 1995. For other classes that might have nontrivial representational invariants, the author may be better off declaring a value class, which offers tear-free access because loads and stores of references are atomic.

The key here is the last phrase: "For other classes that might have nontrivial representational invariants, the author may be better off declaring a value class, which offers tear-free access because loads and stores of references are atomic.". This implies that to avoid tearing you would need to introduce a runtime cost to every access, which is unacceptable for a language aiming to replace C++.

And you can assume that a low level language like Carbon has a lot of types with nontrivial invariants. Just like in Java, data races WILL make one thread observe a partially written value in another thread.

In the presence of data races, you can only avoid tearing when writing to fields whose size is smaller than or equal to the word length (typically, 64 bits). If all you have are small primitives or pointers, then it might work. But Carbon can't abide by this restriction either.


Thanks for clarifying that point. It's worth pointing out that the safety strategy doc[0] mentions that

>A key subset of safety categories Carbon should address are:

>[...]

>Data race safety protects against racing memory access: when a thread accesses (read or write) a memory location concurrently with a different writing thread and without synchronizing

But then later in the doc it says

>It's possible to modify the Rust model several ways in order to reduce the burden on C++ developers:

>Don't offer safety guarantees for data races, eliminating RefCell.

>[...]

>Overall, Carbon is making a compromise around safety in order to give a path for C++ to evolve. [...]

One could read this as saying that guaranteed safety against data races is not a goal. Perhaps this doc could be reworded? Maybe something like "Carbon does not see guaranteed safety against data races as strictly necessary to achieve its security goals but we still currently aim for a model that will prevent them."

[0] https://github.com/carbon-language/carbon-lang/blob/trunk/do...


You're right. In fact it was in the previous slide[0] from that same talk. Thanks for pointing that out.

[0] https://youtube.com/watch?v=1ZTJ9omXOQ0?t=1h8m19s


Many of the criticisms are of the C++ standard library design and implementation rather than C++ the language per se, particularly with respect to undefined behavior. Much of this, in turn, is because the C++ standard library is very old and anything new added to it must be compatible and interoperable with all of the older parts, whether or not it is a good idea. Borrow checking is a separate matter.

Modern C++ provides all the tools to build an alternative standard library from the ground up with most behaviors and interactions being defined and safe. This always seemed like lower-hanging fruit for a first attempt than changing the language.

C++ is commonly used in contexts where object lifetimes and ownership are inherently unknowable at compile-time; safety can only be determined at runtime. This is why memory safety annotations like 'std::launder' exist in C++. Trying to force these cases into a compile-time memory safety box is a source of much friction when using languages like Rust. They handle the 80% case where compile-time safety, destructive moves, etc. make things easy for the developer but then significantly worsen the complexity, and therefore safety, of the other 20%. This isn't necessarily a bad thing but it intrinsically creates a market for C++, which explicitly allows you to handle these cases in a reasonable way.

Systems programming is full of edge cases that break tidy programming language models. This has to be accommodated reasonably in languages that want to wear the mantle of "systems language". Zig is an example of a new systems language that does this well in my opinion.


> Modern C++ provides all the tools to build an alternative standard library from the ground up with most behaviors and interactions being defined and safe.

It also goes to great lengths to embed the standard library into the base language as deeply as it can, and forces unsafe constructs. Just see how many magic types from the std namespace are emitted by the compiler when you use range-for, initializers, or, worst of all, coroutines. Lambdas with capture by reference replace old-fashioned dangling pointers with much more modern dangling references (you are free to copy lambdas that reference local variables :smiling virus with thumbs-up emoji:).

There is no way to get "another library" without making it not-C++. Those guys cannot even make no-exceptions/no-rtti a part of the standard.


This dependence of C++ on its standard library is one of the reasons I think C and C++ need a divorce.

Because while you'll never be able to create a new standard lib for C++, it wouldn't be hard for C.


It's more like C++ made by that ISO committee and C++ used by actual users of C++ that need a divorce. They are so far up in the clouds adding features to the language that mostly only help them write less embarrassing standard library code, it hurts to watch sometimes.

Remember the discussions that led to std::bit_cast? How absolutely everyone who processes binary data is wrong, because an object of type, say, "unsigned" does not exist before it is created, so everyone casting pointers to parse packets is wrong, and only the committee is right?

It is doubly sad to see them claim compatibility with C (C++ is absolutely not compatible with C) as a reason for some features, or single-pass compilation as a reason for others...

[edit] To clarify, on one hand, a lot of terrible misfeatures are kept in the name of compatibility, e.g. "int x = x;" ("fixing that will break too many existing exploits") or not being able to add friends without editing source ("debugging? we think it is overrated"), but then they go and proclaim basically all existing code that processes structured binary data as non-conformant.


> a lot of terrible misfeatures are kept in the name of compatibility, e.g. "int x = x;" ("fixing that will break too many existing exploits")

What does this code do? Is this referring to a previously defined x, now shadowed by a new x? Like this

    int x = 1;
    int x = x;
If yes, Rust has this feature too. And in Rust, shadowing variables is very idiomatic and is sometimes used to create safe abstractions.
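
For anyone unfamiliar with the Rust idiom, a minimal sketch of shadowing used to "upgrade" a value: rebinding the name means the unvalidated form is no longer reachable.

    fn main() {
        let input = "42\n".to_string();
        // Shadowing: rebind `input` to the parsed value; the raw,
        // unvalidated String is no longer accessible under that name.
        let input: u32 = input.trim().parse().unwrap();
        println!("{}", input + 1);
    }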


No, it is referring to the x that is being initialized, and uses uninitialized memory to initialize x. -Wall will even print a helpful reminder. Compare:

    int f(int x) {
        int x = x; // second x is not argument
        return x;
    }
and:

    int f(int x){
        // here second x is argument
        return [x=x] { return x; }();
    }


I don't understand, in the two cases there is a local x shadowing a function parameter also named x.. in the first case it is a local variable that shadows the parameter, in the second it's a closure binding.

Are you sure that, in the first example, in int x = x, the local variable x doesn't get initialized with the function parameter x, but rather gets initialized with itself (uninitialized memory)? Is this instant UB?

Is the second example actually legit, and initializes the closure bound variable with the x function parameter?


It's even more fun than I remember. This compiles without warnings with "gcc -Wall", and produces an error (declaration of x shadows a parameter) if f4 is uncommented

    struct S {
        int x;
        int f1() { int x = x; return x; }
        int f2(int x) { return [x=x]{return x;}(); }
        int f3(int x) { if (int x=x) return x; return x;}
    //  int f4(int x) { int x=x; return x; }
    };
    int main(){}
clang warns a bit with -Wall

    a.cpp:5:26: warning: variable 'x' is uninitialized when used within its own initialization [-Wuninitialized]
    5 |     int f3(int x) { if (int x=x) return x; return x;}
      |                             ~ ^


The claim that Rust makes the safety of 20% of code worse requires some justification. It is utterly contradictory to my experience. Having to do a dynamic check (an array bounds check, or `Option::unwrap`) does not make code "less safe".


This post makes a cogent argument that Rust's rules that are normally enforced by the compiler make unsafe Rust harder to write without UB than other languages that are not memory safe by default.

https://zackoverflow.dev/writing/unsafe-rust-vs-zig/


In my Rust projects (including Pernosco, 200K lines), less than 0.1% of code directly uses "unsafe". It's a long way from there to 20%.


> They handle the 80% case where compile-time safety, destructive moves, etc make things easy for the developer but then significantly worsen the complexity, and therefore safety, of the other 20%.

Thankfully we have actual data on this, not just random numbers, and no, this is not the case in practice.


> Thankfully we have actual data on this, not just random numbers, and no, this is not the case in practice.

Mind sharing that data? I think it would be hard to get quantitative data either way.


70% of high severity security problems in Chrome are memory safety bugs: https://www.chromium.org/Home/chromium-security/memory-safet...

You can find several papers about Rust in particular by searching for [rust memory safety analysis].


I am familiar with that number but this does not address the point the top-level comment is making. The commenter's claim is that there exists a class of problems where lifetime and ownership are not knowable at compile time, and that for these problems Rust is worse than C++.


Well, I don't know what "worse" means in this context. If it means "less memory safe", that's clearly not true: RefCell and Mutex are formally memory safe, while the C++ equivalents are not. If it means "less correct" in a more vague sense, then that's essentially an argument that Rust programs have more bugs than C++ programs, which seems dubious and certainly not something I'd accept at face value.


See for example https://suslib.cc/


Zig is basically Modula-2 with a revamped syntax for C folks, plus compile time metaprogramming.

And yes, many of these issues were already solved by other systems programming languages outside the UNIX lineage of programming languages.


> C++ is commonly used in contexts where object lifetimes and ownership are inherently unknowable at compile-time

What specific contexts are these?


For example, anything that pages live objects to disk from user space. This is a common feature of high-performance/high-scale database engines and other data intensive software. Modern storage is larger than the addressable virtual memory supported by silicon so you can't mmap() it, and even if you could you wouldn't want to because using virtual memory is significantly slower than doing it from user space.

A single memory address can contain several unrelated live objects depending on when you look; a single live object can appear at different memory addresses over its lifetime without being moved; and live objects may not have a memory address at all. The hardware implicitly owns read and write references to these objects, which are not visible to the compiler and free to ignore the ownership model.

Consequently, you need a way to tell the compiler that the object it thinks exists at an address no longer does, that the object has neither been moved nor destroyed, and that there may be an unrelated object at that address that was never constructed. In these cases, compiler optimizations can generate incorrect code based on a naive analysis of what is visible at compile-time. C++ provides ways to give this defined behavior and to indicate to the compiler when its lifetime model is incomplete so that it doesn't produce incorrect code. You can build ownership models on top of this that incorporate ownership that is not visible at compile-time.

Some C++ devs don't know these features exist, other C++ devs use them routinely. But the point is that C++ has specific features to address the case when lifetime analysis cannot be correctly done by the compiler.


Userspace-paged data is a tiny fraction of the amount of data that C++ programs, taken as a whole, process, and is a very weak argument for dropping static memory safety. Besides, because of unsafe, any Rust-like language, like Circle C++, will have the features needed to deal with such data too.


> Consequently, you need a way to tell the compiler that the object it thinks exists at an address no longer does, and that object has neither been moved nor destroyed, and that there may be an unrelated object at that address that was never constructed.

If anything, Rust is even more expressive than C++ in this regard: it has no concept at all of "an object at an address". You can (unsafely) cast a pointer from any type to any other type and read it, as long as the underlying memory is properly initialized. And if you're in a context where you know it will always be valid, or if you can check the validity at runtime, you can put a safe wrapper over the conversion.

For instance, the safe function str::from_utf8() converts a byte-array pointer into a string pointer, after checking that the encoding is correct. Nothing in memory is changed, and the old and new pointers form two different views that can be used simultaneously. And it's not magic: you could copy its implementation yourself, and your function would be just as valid as the standard-library version. Contrast with C++, where all memory must have one single type at any time, regardless of how much laundering you do.
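
A minimal sketch of that last point (the conversion is checked once, costs no copy, and both views stay usable):

    fn main() {
        let bytes: &[u8] = &[104, 105]; // "hi" in UTF-8
        // Checked, zero-copy view change: the bytes are validated once,
        // then reinterpreted as a string slice.
        let s: &str = std::str::from_utf8(bytes).unwrap();
        println!("{} / {:?}", s, bytes); // both views usable simultaneously
    }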

The only kinds of semantic "ownership" in Rust are access restrictions caused by the aliasing rules (which just depend on the pointers in use, not on objects in memory), and local variables and temporaries being dropped at the end of their scope (if they haven't been moved elsewhere). All the high-level lifetime rules of safe Rust are built on top of those semantic rules, and a smidgeon of unsafe Rust can subvert them as necessary.


> Modern storage is larger than the addressable virtual memory supported by silicon so you can't mmap() it

Larger than the 48 bits address space of most 64 bits architectures?


48 bits is “only” 256 terabytes. Totally possible to max that nowadays.


I think the "unknowable" lifetimes are C++'s own making.

It's like types in statically vs dynamically typed languages. Types are unknowable at compile time when the compiler doesn't force them to be static.

And similarly ownership and lifetimes are "dynamically typed" in C++, because the compiler doesn't force them to be rigidly declared like in Rust.


Unknowable lifetimes are a necessary evil. Rust has them too. It's called RefCell. Without RefCell there are very useful things that are impossible to write in safe Rust.
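
For context, RefCell moves the aliasing check from compile time to runtime. A minimal sketch, using nothing beyond the standard library:

    use std::cell::RefCell;

    fn main() {
        let cell = RefCell::new(vec![1, 2]);
        cell.borrow_mut().push(3);    // exclusive borrow, checked at runtime
        let r = cell.borrow();        // shared borrow, checked at runtime
        // let w = cell.borrow_mut(); // would panic: already borrowed
        println!("{:?}", *r);
    }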


But this needs to be explicit, and doesn't undermine other statically-known lifetimes (pointers to the inner cell can't escape it).

Just like statically typed languages have some dynamic dispatch features, but despite that still most of their types can be known with certainty.


First came across this 2 days ago and found it extremely impressive. There's also a YouTube presentation where Sean goes over the main features of Circle's memory-safe C++: https://www.youtube.com/watch?v=5Q1awoAwBgQ.

This seems to be adding Rust borrow semantics and borrow checking to C++. The delivery vehicle seems to be a C++ compiler that the author has been working on for a number of years. I couldn't find a ton more on the background of this.

From a technical perspective this looks promising as a C++ successor language. The project will have to attract other members in the C++ community though.


He’s active on the C++ Slack, regularly asking questions about the minutiae of the C++ spec. It seems like a massive headache.


Haven’t read through it in enough detail yet to fully understand the language changes, but the authors are absolutely correct on some basic background that other folks don’t always understand:

> Memory-safe languages are predicated on a basic observation of human nature: people would rather try something, and only then ask for help if it doesn't work. For programming, this means developers try to use a library, and only then read the docs if they can't get it to work. This has proven very dangerous, since appearing to work is not the same as working.

100% 100% 100%


You already wrote "100%" enough times, so I'll add that Rust's technology, and Rust's culture, still aren't enough; you have to really put the work in to counteract this very powerful danger. Rust's technology + culture should mean you won't blow your foot off with this (entirely human) approach in their language, but you can definitely give yourself a nasty splinter, destroy your customer's data, and a million other dumb things.

For example, Rust provides types like core::cmp::Ordering, core::time::Duration and core::ops::ControlFlow, so sometimes your API will be harder to misuse than it might have been, because, you know, your timeout parameter was a Duration, not an integer count of seconds (or was it milliseconds?)
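
A minimal sketch of the Duration point; `connect_with_timeout` is a made-up function name for illustration, not any real API:

    use std::time::Duration;

    // Hypothetical API: the parameter type forces the caller to state units.
    fn connect_with_timeout(timeout: Duration) {
        println!("waiting up to {:?}", timeout);
    }

    fn main() {
        connect_with_timeout(Duration::from_millis(500));
        // connect_with_timeout(500); // compile error: expected `Duration`
    }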

But, although eventually Clippy will express doubts, Rust won't force you to rewrite that function which took N booleans and now after modification takes N+1 booleans, even though all your callers are probably a mess of true, false, true, true, false undecipherable flag nonsense and a re-factor was called for.

It's surprisingly hard to give new programmers the right instincts on this stuff. I'm pretty sure I was terrible (thirty years ago) too, so this isn't a humble brag it's just an observation.


It's an elegant sentence, but isn't it incorrect to say memory-safe languages are predicated on that? Even a room full of C++ experts who understand this completely and write their code strictly based on formal contracts will still eventually write memory bugs.

Memory safe languages are just predicated on the memory-safety problem being difficult to avoid for humans, because nobody has a 0% error rate. They would still be incredibly necessary and relevant even if nobody relied on "appears to work" as the measure of correctness.


I think the point is that Rust encodes more rules in its interfaces (ownership, lifetimes, thread safety). If you misunderstand how a Rust library works, your code most likely won't compile instead of silently causing UB.

The rules for safe interfaces are the same for all Rust programs, so users know what to expect. Whereas in C++ the library author has more to say what usage is supported and what isn't (e.g. Rust requires all globals to be thread-safe unconditionally, but a C++ library may say it's thread safe if you set a config option or it's your problem to synchronize access).


Nice to see Sean cite Fil-C, though he does it much more in passing than it deserves, considering that Fil-C gives you memory-safe C and C++ without requiring any annotations whatsoever. He cites it as a sanitizer and references undefined behavior, which isn't really how I would describe it (unlike a sanitizer, it catches all memory safety bugs, and also unlike a sanitizer, it turns a lot of what would have been UB in C++ into defined-but-safe behavior). It's a very different approach from Sean's.

For example, Circle implies broad overhaul to tooling and libraries while Fil-C implies no such thing. Also, Circle is all about not using existing compilers on the grounds that they are hard to experiment with, while Fil-C is a surgical change to clang/LLVM.

The main advantage of Circle over Fil-C is that if you pay the cost of that overhaul, you'll get performance that is better than what Fil-C could ever do even with a lot of optimization. Not obvious to me if that makes it net better, though.


For the performance, there are a bunch of people, some of them probably wrong and others definitely right, who believe they need the best possible performance from software.

You can sell these people something like Rust because you can very often either show why the "better performance" C++ they have is wrong (and if they wanted a wrong answer here's zero already, pay me) or sometimes actually worse performance. Not every time, but often enough to make a real difference. The Circle safety feature should be in the same ballpark.

You can't sell them anything that's just anyway going to have worse performance, if you could they'd be writing Java already. So that's counting against Fil-C.


Java is a totally different language, so it’s not even remotely a competitor in this space. Also Java is quite fast, even compared to C or Rust.

Fil-C is all about being able to run existing C/C++ code that nobody is going to rewrite, not even in a dialect like Circle, since the burden of annotations will be too great.


For existing C++ just using a checked std::vector and Boehm GC can get you quite a long way.


Nowhere near to memory safety. There are so many exploits left on the table if you do what you say.

Not to mention that Boehm isn’t sound on modern C compilers. Conservative stack scanning can be foiled by optimizations that are valid from the compiler’s perspective.


No, but it can be done without rewriting code. There's a lot of C++ out there that was perhaps once performance sensitive but hasn't been for years due to hardware improvements, or was perhaps never sensitive but used C++ just for team consistency etc. Windows is full of code like that for example. But, there's no funding to rewrite it. For situations like that, things which boost safety but don't require rewrites of any kind are waaaay under-rated.

Compilers should definitely have a mode that stops them breaking conservative stack scanning. GC is a drop-in fix for so many memory safety problems it's a vital weapon in the toolbox. Combined with checked array and vector dereferences, you can get a long way without needing Rust or Circle style rewrites.


You get all the way to memory safety if you use Fil-C and you don’t have to rewrite code.

I don’t think it’s reasonable to expect compilers not to break conservative stack scanning. Fil-C uses accurate stack scanning instead, and it’s properly wired into the compiler and the rest of Fil-C’s type system.


Yes but from your web page about it, Fil-C does things like allocate all local variables on the heap and runs code 50x slower. Bounds checks+GC don't impose such a high level of overhead.


And my web page about it also says that I haven’t optimized it yet.

It’s not necessary to allocate all locals in the heap and have 50x overhead. It’s that way now because I’m only just getting started.

Bounds checks + an unsound conservative GC isn’t worth the overhead it imposes since it still leaves your code memory unsafe.


Sean Baxter was on cppcast recently, highly recommend anyone interested to give it a listen: https://cppcast.com/safe-borrow-checked-cpp/


> There's only one systems-level/non-garbage collected language that provides rigorous memory safety. That's the Rust language.

Honest question: what about Ada? It was specifically designed to be a safe language for critical systems, and for a while was mandated for some military systems. Did the author not consider it, or are its protections just not as expansive as Rust's?


I’m not the author, but there’s a few reasons Ada tends to be forgotten in these discussions:

Back when Ada was new, people just didn’t actually like programming in it much. Some did, of course, but many did not. This is the reason the Ada Mandate was abandoned.

This led to a situation with a small, walled-off community that didn't really communicate with the outside world much. This has a compounding effect over time.

Ada, while designed for safety-critical systems, was not actually memory safe until fairly recently. Deallocating memory at runtime wasn't safe, and in my understanding, may only be safe in the presence of SPARK? Hopefully someone can chime in here. Now, that's fine for the systems Ada tended to be used for, which often have either no dynamic allocation or a singular allocation at program startup, but as inspiration for other language designs, given that it forgoes a hard problem, it's not really as useful to those who are trying to solve those problems. This doesn't mean Ada is useless for inspiration, but for "how do I implement memory safety," it doesn't have many unique things to offer.

None of this means Ada is a bad language, but these are the main contributing factors that I see with regards to your specific question.


That is the usual outdated image of Ada. Just like unsafe code blocks in Rust, those Unchecked_Deallocation instances are wrapped in safe calls, and since Ada95 there are controlled types, allowing for RAII patterns.

Additionally, it is common to use arenas, and many types can be dynamically stack allocated with a retry operation in case there is not enough space, so that the call can be redone with a smaller size.

Memory Management with Ada 2012 from FOSDEM 2016,

https://archive.fosdem.org/2016/schedule/event/ada_memory/

Doesn't go much into SPARK related improvements though, given its date.


Rust's affine types and borrow checking give safe Rust a lot more power than the safe subset of Ada95. A trivial getter method that returns a reference (i.e. pointer) to a field of its object is safe in Rust, but in Ada95 (and C++ and most other non-GC languages) in general you can't ensure the object outlives the reference.

It is true that Ada has a safe suitable-for-systems-programming subset that's much better than most languages. I think when people say "Rust is the first safe systems programming language" they implicitly mean "that is expressive enough for me to replace C++ with".


Except we are into Ada 2022 nowadays, so SPARK is also part of the story.


How does "redo the call with a smaller size" possibly help you? If you could get by with less memory, you should have asked for less memory in the first place.


For some types of applications, it is a doable use case; it is the same as asking how malloc() returning NULL helps.


malloc() returning NULL isn't that helpful. Most applications just react to that by aborting, and most applications that don't abort take poorly-tested error-handling paths that often don't work. I don't remember ever seeing an application that reacted by trying the allocation again but requesting less memory.

I have seen applications react by pausing and trying the allocation again a few seconds later --- this can be shockingly effective! https://hacks.mozilla.org/2022/11/improving-firefox-stabilit...


Sure, none of that is particularly novel though, so in terms of citing it as something specifically to add to the conversation, there isn't any real reason.


It was the article author's choice to specifically say there is no safe systems programming language other than Rust. If that is not true then it's worth citing in discussion of the article.


My parent confirmed that deallocating memory is still unsafe, even though there are common usage patterns that help make sure you're doing it correctly.


Just like using doubly linked lists in Rust is unsafe, unless special libraries are used, this is no different.

Also we aren't talking about Ada83 any longer.


Ada itself doesn't provide you the same guarantees as Safe Rust. You can use SPARK to grant Ada more memory safety capabilities. However as a language (rather than comments which may or may not be ignored by your Ada compiler) SPARK is from 2014, so now we're close to Rust's age.

I assume that the totality of SPARK's guarantees would get you to the same place as Rust but I don't know.

A big thing which counts against Ada (and SPARK) in practice is that it's not popular. You'll trip over Rust programmers everywhere; two of my friends are getting paid to write Rust, for completely unrelated companies and unrelated reasons - Rust seemed like a reasonable fit, so those companies picked it. You don't see that with Ada and SPARK.


Rust gets the hype/talk. However there are a lot of Ada programmers out there who just do their 8 hours and go home and live their life. I doubt rust is anywhere near as popular as Ada, but I don't know how to measure that.


> I doubt rust is anywhere near as popular as Ada, but I don't know how to measure that.

I would have the reverse guess, but like you I can't think of a reliable way to get reliable numbers. Here's how I arrive at my guess though: there just aren't that many defence IT people proportionally, and only some of them are writing Ada. I saw no sign of non-defence Ada in noticeable quantities.

Sometimes people are just upfront, we're a defence contractor, all our clients are military, everybody who works here must get clearance. Sometimes they're a bit more shy, please come talk to our specialist recruiter, and fill out all these forms that are surprisingly interested in you as a person not an employee (because they're pre-screening you, you will get security cleared). Nobody was like "Ada has been so great for our CRUD web sites" or "Come join our team writing a trading platform in Ada" or anything like that.

So I think it just can't work statistically.


I think Ada not being very popular shows that the need for a memory safe non-garbage collected language is kind of overstated. If there was really that much of a need, people would have flocked to Ada despite its shortcomings.

Most people solved the safety problem a long time ago by switching to garbage collected languages. As time goes on and garbage collector designs get better while hardware gets beefier, you can get away with using a garbage collected language in more and more cases. Case in point: Go marketing itself as a systems programming language and succeeding, not to mention even game dev accepting C# these days.

To be clear, there is a niche and a need for safe, non-garbage-collected languages and there always will be. I am just saying the niche is much smaller than people might think.

I think it is a mistake for Rust evangelists to market safety as the killer feature of the language. Rust is not popular because of the safety but because it lowered the barriers of entry for a new generation of systems programmers by providing excellent tooling, error messages, and a type system that takes at least the lower-hanging fruit of the academic functional world. That is all cool stuff beyond the safety circle jerk.


> I think Ada not being very popular shows that the need for a memory safe non-garbage collected language is kind of overstated.

It only shows that Ada is very old and verbose.


It seems like if this can be the next version of C++ that would be a good thing, but if it's just another C++-like language, such as Carbon (even with proposed benefits), aren't we just going in circles here?

Wouldn't a better approach be to instead focus on adding better C++ interop into Rust? Isn't that a more forward thinking approach?

On that line of thinking, what about Swift and its C++ interop? Or is Swift not ready for systems level programming?


> On that line of thinking, what about Swift and its C++ interop? Or is Swift not ready for systems level programming?

A lot of Swift behavior is dictated by the standard library. There have been Micro-Swift and Embedded Swift implementations to cut down overhead, along with work on move-only types.

I think Swift could be a good candidate for a post-C++ language, but despite the open community it is highly Apple-dominated in its scope and direction. That's probably the biggest trouble with Swift.


I think a challenge to this evolution path is the same as what motivated Sutter’s cpp2: defaults

I think the premise that a lot of experienced programmers can write good C++ is at least somewhat valid. They know not to use raw pointers, which APIs to use for bounds checking, whatever. The issue is that new users don't, and the defaults are bad.

If I have to write #feature on safety in every file, it becomes possible to forget. Opting in vs opting out


Memory-safe languages are IMHO a distraction from making people write bug-free code.

There are much more important bugs than memory safety bugs, including all sorts of performance bugs.

We need to address the whole spectrum instead of compromising all areas in order to tackle niche bugs that only matter in security-critical contexts.


This isn't true in practice.

When you don't have to worry about memory safety, your brain is freed up to worry about other kinds of bugs.

The other bit of good news is that the sorts of things Rust does to have memory safety also enable the systematic elimination of many other kinds of bugs. For example, iterator invalidation.
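
Iterator invalidation is a nice concrete case: in C++ the equivalent mistake compiles and is UB, while in Rust it's a compile error. A minimal sketch:

    fn main() {
        let mut v = vec![1, 2, 3];
        for x in &v {
            println!("{}", x);
            // v.push(*x);
            // ^ error[E0502]: cannot borrow `v` as mutable because it is
            //   also borrowed as immutable (by the live iterator)
        }
        v.push(4); // fine once the iteration borrow has ended
    }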


For example, the easiest way to make programs avoid use-after-free errors is to never free things.

Would you rather get a program that constantly uses all of your RAM until you restart it, or one that consistently uses very little, but sometimes crashes in edge cases?


I'd rather use a program that uses as little memory as possible and that also never crashes. With Rust you can just have both.


That's not really what happens. Most Rust code actually does whatever is easiest to get the borrow checker to stop complaining, including extending lifetimes beyond the necessary scope, and putting reference counting everywhere.


I'm not one to boast, but at least my Rust code doesn't do that much. Feel free to have a look, basically all the professional Rust code I've ever written is free software.

Rust is a lot better than GC languages at memory usage, and (because borrowing is actually safe) a bit better than C++.


Sure, if you never use unsafe, or anything that uses unsafe, transitively.


Memory unsafety is a property of von Neumann architectures, and explicit unsafe limits the scope of damage significantly.

I have made major contributions to several Rust projects that have collectively saved organizations over $100MM in production, if not more. I worry about lots of things, but crashes just aren't one of them.


Good luck not releasing stack frames.


Spoken like somebody who doesn't understand the stack at all.

On Linux, the stack grows automatically and never shrinks.


Sure, the stack as storage never shrinks, but it's constantly reused for different kinds of objects. Having said that, dynamic memory is also rarely given back to the OS, and you are lucky to get a segfault from a use-after-free there too. This is not the problem.

The problem is accessing objects that are outside of their lifetime, not storage. The objects are gone, even if the storage might still be there.


That's not as big of a problem as using more resources than you need.


Use-after-return is a subset of use-after-free bugs. Allocate a temporary on the stack, pass it by reference to somewhere that outlives that function, and then watch your program explode. No dynamic memory allocation at all.


Escape analysis can easily determine if a temporary/local value that would by default be allocated on the stack outlives the function in which it was declared, and in that case can move the allocation to the heap instead.


Sort of. If your approach is "never free things" then you probably want to know pretty clearly what allocations will produce leaks and a sound escape analysis is going to definitely report some allocations that are unexpected.

But this isn't the hard part.

Since you need to modify allocations during compilation, this needs to be a global analysis. If you've got a function that takes a parameter by reference and returns it by reference, you need to adjust the allocations at the call site to be on the heap if the return value outlives the allocation. But... now you've broken separate compilation and need this to run as a global pass with LTO. And enjoy all of your false positives caused by sound call graph analysis in a world where you've got virtual and indirect function calls. And what about a local that is stored by reference as a field of an object, and then that object is passed by reference to a function? Now you need an object-sensitive pointer analysis. During your compile.

And then there are all of the ordinary problems with C++ that make sound static analysis nightmarish. One of these things havocs and blows everything up to Top and you are quickly yelling at your compiler.


I dunno, Go does it pretty effectively, are they cheating or something?

It’s not true that memory safety bugs only matter in security-critical contexts. If a program segfaults, it’s going to make the user unhappy regardless of security implications.


And don't forget that a segfault is the lucky case in matters of memory safety bugs. I've worked on many memory safety bugs that manifested themselves by silently corrupting data.


What kind of niche bugs are you speaking of?

For what it's worth, Rust (and other languages of ML heritage) are really good at getting rid of huge categories of bugs, not just memory. Not all bugs, of course.


The problem with memory safety bugs is that the capacity for unwanted behavior is much higher. This is why they are useful as security vulnerabilities when crafting exploits. They can turn your program into an unbounded one that can compute anything. It also means "uh, we just stored garbage data where there should be a key and now the entire encrypted blob is irrecoverable."


> all sorts of performance bugs

A lot of code is written in safe, slow languages that could be written in a safe, fast language.


Making all vector accesses bounds-checked, as all memory-safe languages do, is a recipe for slow code.


You can try turning on hardened libc++ and see what performance hit you get. It isn't really too bad. And a considerable portion of that is waste caused by other design choices in C++ making it difficult for the compiler to remove redundant bounds checks.
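
To give a feel for why memory-safe languages usually don't pay a check per element either, a small Rust sketch (Rust only because it's the safe-language comparison point in this thread): the indexed form carries a bounds check the optimizer can often remove, and the iterator form never needs one.

    fn sum_indexed(v: &[i64]) -> i64 {
        let mut total = 0;
        for i in 0..v.len() {
            total += v[i]; // checked access; often hoisted/elided by LLVM
        }
        total
    }

    fn sum_iter(v: &[i64]) -> i64 {
        v.iter().sum() // no per-element bounds check needed at all
    }

    fn main() {
        let v = vec![1, 2, 3];
        assert_eq!(sum_indexed(&v), sum_iter(&v));
    }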


The term "Memory safety" also heavily undersells what it actually implies in Rust.


Memory safety bugs are second only to segfaults in terms of importance.


Segfaults are a result of violating memory safety. They are the BEST case since you can actually detect that the program yolo'd.

Far worse is when your program starts corrupting your data and suddenly you are executing a routine you had no intention of ever executing.


Not really. Not all memory safety violations result in segfaults, and not all segfaults are the result of memory safety violations.


Which segfaults are NOT the result of memory safety violations?


Uh, most? Memory safety is about data races and concurrent access violations. De-referencing a nil pointer isn't a memory safety violation, for example. But I guess if you re-define memory safety to mean anything that can produce a segfault then sure.


In terms of the actual bugs encountered when developing modern C++ applications, logic bugs vastly outnumber segfaults and memory safety bugs, which are quite rare when you're writing business logic.


I fully agree. Memory safety issues are a small minority of C++ issues. Memory safety issues when you get them are very difficult to track down, and often have much broader/worse effects.


With an MMU, segfaults cause pretty abrupt and obvious failures for straying outside the heap or stack. Violating memory safety may cause subtle and unpredictable errors even in correct functions, which is a lot more dangerous.


No argument there!


> Institutional users of C++ should be worried. The security community is baying for this language's extinction. The government is on our case about it, and legislation banning C++'s use in certain sectors looks like a possibility. I don't think nitpicking the NSA is a compelling response.

I am a bit worried about this. Should I be?


Also, check out Google's in-development Carbon Language, designed to address the same issues with a Kotlin-like approach. It's an entirely new language that could interoperate with existing C++ code/library.

https://github.com/carbon-language/carbon-lang


I'm not convinced Carbon will become popular at all. I was a Google employee when it was first announced and even internally the project was shrouded in secrecy because there was a ton of resistance. Google employees who cared about safety wanted to write Rust.


Isn't a memory-safe language subset one of their long term goals?


Does Carbon have any undefined behavior? Some of the wording in their goals document suggests that they do/will, but it isn't clear to me.


My understanding is that if you want a high level of semantic compatibility with C++, a certain degree of UB is quite hard to avoid. But at least unlike C++, they prefer not to leave 100s of different possibilities on the table.


Unless you solve the halting problem, every language has undefined behavior.


Nope.

A language which has semantic requirements and isn't willing to reject programs for which it has been unable to determine whether they meet the requirements, will have these cases by Rice's Theorem. This is why C++ is a complete disaster. C++ specifically does not allow the compiler to reject such programs.

You can defuse this, like (safe) Rust does, by writing conservative checking. You will reject some programs you would like to have compiled successfully, but that's actually OK; programmers who hate it are inspired to improve the checking. That's what Rust's "Non-Lexical Lifetimes", mentioned by Sean, are about.
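
A minimal sketch of that trade-off: this program is correct, the early, purely lexical borrow checker conservatively rejected it, and the Non-Lexical Lifetimes work taught the checker to accept it.

    fn main() {
        let mut v = vec![1, 2, 3];
        let first = &v[0];
        println!("{}", first); // last use of the shared borrow
        v.push(4); // accepted under non-lexical lifetimes; the older
                   // lexical checker conservatively rejected this
    }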

Even more radically, you can be a special-purpose language and allow only a relatively small subset of possible programs, all of which you know are correct. That's why WUFFS gets to emit indexing with no runtime bounds checks - it did all the bounds checking at compile time, and any source where that wouldn't be possible isn't legal WUFFS code.
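
You can taste that approach even in C++, in miniature: std::get<I> on a std::array rejects out-of-range indices at compile time, so the checked form has zero runtime cost:

    #include <array>

    std::array<int, 3> a = {1, 2, 3};
    int ok = std::get<2>(a);       // fine: index checked at compile time
    // int bad = std::get<3>(a);   // does not compile: static_assert fires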


I suppose you can make that assertion by moving the goalposts. But Rice's theorem specifically addresses such behavior: every non-trivial semantic property of Turing machines is undecidable in general.

You can well argue that Rust may be better than C++ in some dimension, just as others can argue that those qualities have an unreasonable or intractable cost (note that about 20% of all crates resort to `unsafe`). But your blanket assertion is not even supported by the reference you made.

Also the user is free to tell a C++ compiler to reject programs that use many sorts of undefined behavior, though indeed nobody would claim that any current compiler can identify all such cases. But Rice's theorem says that no Rust compiler could either.


That is incorrect. A C++ compiler is perfectly allowed to reject any program that contains undefined behaviour.

It's just not required to do so.


That would be easy; the problem is that Rice's Theorem tells us the compiler definitionally cannot be sure whether every possible source file represents a valid program, if the language has any non-trivial semantic requirements (C++, like Rust, has a whole lot of such requirements).

So while "What should we do when we're sure it's not a valid program?" is easy (and it's very silly that some C++ compilers will compile such programs anyway), it's not enough; you also need to decide "What should we do when we're not sure?", and there the ISO document provides a firm answer, which all the compiler vendors follow: press on anyway, too bad. "Ill-formed, no diagnostic required" is how the ISO document describes this situation. The diagnostic isn't required because it's technically impossible to guarantee one.

You might insist that "Not sure" can't happen. Unfortunately the reason it's called Rice's Theorem is that a guy named Henry Rice got himself a Maths PhD proving this in 1951.
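
The canonical concrete case of IFNDR is an ODR violation across translation units - neither file is wrong on its own, and a compiler working file-by-file can't see the conflict:

    // a.cpp
    inline int f() { return 1; }
    int a() { return f(); }

    // b.cpp
    inline int f() { return 2; }  // same name, different body: ODR violation
    int b() { return f(); }

    // Ill-formed, no diagnostic required; in practice the linker silently
    // picks one definition of f for both callers.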


It's allowed to reject programs that will evaluate an undefined operation on every possible execution. This is pretty much impossible, so no compiler even attempts this.
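
One narrow exception where compilers must catch UB: constant evaluation. UB inside a constant expression makes the program ill-formed with a required diagnostic, e.g.:

    constexpr int deref_null() {
        int* p = nullptr;
        return *p;                    // UB, so not a constant expression
    }
    constexpr int v = deref_null();   // hard compile error, not a warning

But that only applies to code forced to run at compile time.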


It can replace any path with undefined behaviour with one that calls std::abort.


Which is very different from diagnosing at compile time.


It can, but in practice no one does it.


When you think about it, that's what -fsanitize=undefined is doing. It detects undefined behaviour at runtime and reports it (and aborts, with -fno-sanitize-recover).
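
A minimal example (compile with clang++ or g++ and -fsanitize=undefined):

    #include <climits>

    int main() {
        int x = INT_MAX;
        return x + 1;   // signed overflow: UB; UBSan reports it at runtime
    }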


It detects some, but not all, UB. AFAIK it can't detect UB due to unsequenced access to the same variable.


There are different sanitizers that detect different types of UB.

ASan, MSan, TSan and UBSan all detect different ones.
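
Roughly, the division of labor (these are the actual clang/gcc flag names):

    -fsanitize=address     # ASan: out-of-bounds, use-after-free, leaks
    -fsanitize=memory      # MSan: reads of uninitialized memory (clang-only)
    -fsanitize=thread      # TSan: data races
    -fsanitize=undefined   # UBSan: overflow, bad shifts, null/misaligned access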


Yes, but it seems that we can't satisfy all reasonable real-world use cases this way, or else Rust wouldn't have the "unsafe" escape hatch that's actually used by real code.


That’s not true. The halting problem / Rice’s theorem only mean that the behavior can’t be statically determined by the compiler, not that it’s undefined.


No? The CPU doesn't have undefined behaviour, thus any code running on it doesn't have undefined behaviour either. There are certain operations whose result is not defined in certain states - I think on Intel some SSE operations had undefined results - but even then the same order of operations will lead to the same result. C/C++ compilers generally let you turn off almost all possible undefined-behaviour assumptions. I think the only one you can't turn off is the assumption that a pointer to an object is properly aligned, which is understandable: if every pointer had to be checked for alignment before every access, it would ruin performance.

In general I would be much happier if I could assert to the compiler that my data fits certain parameters and it would optimise based on these explicit assertions rather than on implicit assumptions (undefined behaviour). Those assertions could be turned off in the final build if we determine they actually impact performance, but I expect I'd be able to keep them on all the time.
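
For what it's worth, C++23's [[assume]] is a step in that direction: an explicit assertion the optimizer may rely on, which you can pair with a checked assert in debug builds. A sketch:

    #include <cassert>

    int div16(int x) {
        assert(x >= 0);       // checked in debug builds
        [[assume(x >= 0)]];   // optimizer may rely on it: x / 16 becomes a shift
        return x / 16;
    }

Caveat: if the assumption is ever false, [[assume]] itself is UB, so it moves the trust boundary rather than removing it.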


CPUs do have undefined behavior.


> The CPU doesn't have undefined behaviour, thus any code running on it doesn't have undefined behaviour either.

Have you read Rice's theorem? Every nontrivial semantic property of programs on a Turing machine (which has only defined behavior) is undecidable.


That's not what C++ UB is.


Real computers are not Turing machines; they are FSMs.


C++ has a specific definition of UB. Most languages don't suffer from it, regardless of the halting problem.


It's an impressive project. One of the key problems for any safe language interoperating with C++ (even a safe C++ subset) is that you really want to be able to interact with C++ standard library types (string, vector, etc.) safely, because they appear at interface boundaries. Circle introduces its own safe standard library, so it isn't clear to me how Circle fares at that boundary.


This sounds a lot like C++/CLI which has Microsoft backing and has a whole .NET sub language.

The problem was that even when the two were written together, the impedance mismatch remained.


Who is using Circle C++ in production?


There was once a time when nobody used C++ in production.


Sorry, are you saying nobody is using Circle? Also, it seems closed source?


I'm not sure that last paragraph had to be so toxic. We've had enough toxic egos in this industry, we don't need any more. I've never heard of this person before, but I do not think I wish to be part of his project.


> rust: rigorous memory safety

> circle : unsafe printf

Come on, people: if you allow unsafety, you cannot call your language safe. There are safe systems languages, but don't lie and call unsafe languages safe. Partial safety is not full safety.


> circle: rigorous memory safety

> rust: unsafe { println!("{}", *r1); }

Come on, people: if you allow unsafety, you cannot call your language safe. There are safe systems languages, but don't lie and call unsafe languages safe. Partial safety is not full safety.


Notice that "unsafe" isn't a magic "off" switch; it's just super powers. Things you would otherwise be forbidden from doing are now legal, but things you could have done before have the same behaviour - and if nothing in an unsafe block actually needs super powers, you'll get a compiler warning (unused_unsafe) saying the block was futile.

So, this only does anything interesting if r1 was a raw pointer so that dereferencing it would be prohibited without the unsafe block. If it's just a reference or some smart pointer type then that's fine anyway.


Unsafe in Rust is not unsafe in the same sense that C/C++ UB is unsafe.

Unsafe in Rust means "soundness cannot be statically verified, the language will insert runtime checks for you and perform a clearly defined action (panic) if they are violated".

Much ink has been spilled about "unsafe" in Rust being unfortunately named.


> Unsafe in Rust is not unsafe in the same sense that C/C++ UB is unsafe. Unsafe in Rust means "soundness cannot be statically verified

Right.

But as the sibling points out, the rest of your sentence is incorrect. The language mostly does not insert additional runtime checks, and you are allowed to create UB-level bad behavior in unsafe blocks.


No. No. And no.

> It is the programmer's responsibility when writing unsafe code to ensure that any safe code interacting with the unsafe code cannot trigger these behaviors.

https://doc.rust-lang.org/reference/behavior-considered-unde...

And the violations are transitive:

> However, violations of these constraints generally will just transitively lead to one of the above problems.

https://doc.rust-lang.org/nomicon/what-unsafe-does.html


A lot of coders already do this. My STL replacement has Vector and VectorUnsafe. Vector is checked to the hilt for bounds safety, stack safety, UB safety, etc. and is slower. But if I have a tight loop, I can use VectorUnsafe and just make sure I'm being careful, and it has no checks at all.


Care to share your Vector implementation that is "stack safe" and "UB safe"?


It's simple: whether the backing memory is heap or stack, it's bounds checked. And overriding all the operators and only returning safe types prevents many types of undefined behavior.

Of course you can fuck with it enough to make it unsafe, but at that point you know exactly what you're doing.


Without seeing it, it's of course hard to write examples, but typically for this type of thing it turns out that "fucking with it" enough to be unsafe is easy to do by mistake, and so you end up basically resorting to classic C++ "nobody will make mistakes" safety, which we know doesn't work.

In some cases you can even "fuck with it" less than std::vector and cause memory unsafety, because std::vector was implemented by people who've been fucked with before and this "safe" collection maybe was not. Pushing an item from the collection back into the collection when it's full is one classic way to cause this - std::vector promises this works correctly.
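
The trap in a hand-rolled "safe" vector tends to look like this (names hypothetical):

    // Naive push_back: grow() may free the old buffer while `value` still
    // points into it, so v.push(v[0]) on a full vector copies from freed
    // memory. std::vector is specified to handle exactly this case.
    template <typename T>
    void Vector<T>::push(const T& value) {
        if (size_ == capacity_) grow();  // reallocates, may invalidate value
        data_[size_++] = value;          // potential use-after-free
    }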


You can assume the collection doesn't have good coverage, but what I'm saying is that the constructs are there in the C++ language to give it good coverage. Pair this with some Clang tooling (like banning raw pointers) and you'd have to go out of your way to make it unsafe.


Does it stop you from writing code like this?

    Vector<int> v {1, 2, 3};
    int *p = &v[0];
    v.push_back(4);
    printf("%d\n", *p); // UB: push_back may have reallocated, so p dangles


That is not an issue with the safety of the Vector; it is an issue with the safety of 'int' and raw pointers. If the Vector grows, that pointer points to freed memory.

But yes, in my implementation I have a safe version of int called 'i32' which overrides the & operator and doesn't allow it to return raw pointers.
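
A minimal sketch of that idea, assuming this is roughly the mechanism:

    struct i32 {
        int value;
        int* operator&() = delete;  // &x no longer compiles
    };
    // Note std::addressof(x) still works, so this raises the bar for
    // accidental misuse rather than closing the hole completely.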


It absolutely is a concern about vector. Iterator invalidation is a property of a type and its interfaces. One could design a vector implementation that doesn't invalidate pointers and doesn't provide this footgun to users.

There are significant costs to this safety, of course, just like adding bounds checks.
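
std::deque is an existing point in that design space: push_back/push_front never invalidate pointers or references to existing elements (iterators, yes), at the cost of non-contiguous storage:

    #include <cstdio>
    #include <deque>

    int main() {
        std::deque<int> d = {1, 2, 3};
        int* p = &d[0];
        d.push_back(4);            // pointers to existing elements stay valid
        std::printf("%d\n", *p);   // OK for deque; UB with std::vector
        return 0;
    }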


FALSE - this is so misleading - these statements imply that 68% to 70% of ALL SECURITY ISSUES are due to memory safety - that's wrong and those quotes leave out the important context.

From the post:

  Microsoft's bug telemetry reveals that 70% of its vulnerabilities would be stopped by memory-safe programming languages.

  Google's research has found 68% of 0day exploits are related to memory corruption.
These sorts of false statements make people everywhere think that memory safety is a bigger issue than it is and that it impacts almost all application development, which is completely wrong.


Disclaimer: Speaking only for myself.

  memory safety is a bigger issue than it is and that it impacts almost all application development which is completely wrong.
The claim is specifically scoped to vulnerabilities, so I'm not sure why it's considered insufficiently qualified.

With that said, while you're right to question the precision of such statements, Google says the same about their own data:

https://www.chromium.org/Home/chromium-security/memory-safet...

> Around 70% of our high severity security bugs are memory unsafety problems (that is, mistakes with C/C++ pointers). Half of those are use-after-free bugs.

...so it's not just Microsoft saying that.

You'll also see this mentioned on Wikipedia:

https://en.wikipedia.org/wiki/Memory_safety

Going further, the numbers vary, sometimes running even higher:

https://www.memorysafety.org/docs/memory-safety/#how-common-...

> Extremely. A recent study found that 60-70% of vulnerabilities in iOS and macOS are memory safety vulnerabilities. Microsoft estimates that 70% of all vulnerabilities in their products over the last decade have been memory safety issues. Google estimated that 90% of Android vulnerabilities are memory safety issues. An analysis of 0-days that were discovered being exploited in the wild found that more than 80% of the exploited vulnerabilities were memory safety issues.

In short, I don't think the context here will change most folks' interpretation of the results--nor do I think it should. Any network-connected device is inherently multi-user and the security of that device is only as effective as the least secure program executing on that device at a particular permission level.

I can think of very few applications today that don't interact with a network in some way. In addition to that, my opinion is that most memory-safety issues that create vulnerabilities are also reliability issues. Secure applications are generally inherently more reliable applications.


The full context is that MS and Google said these things - SPECIFICALLY about their compiled consumer applications.

NOT as a general statement about all security vulnerabilities.

So it's false and wrong.


It's quite ironic that on a page about safety they use an if statement without {} braces:

        if(x % 2)
          vec^.push_back(x);

        unsafe printf("%d\n", x);
Skipping the braces for single-line conditionals in C++ is a lazy practice that almost inevitably leads to bugs, like Apple's famous goto fail bug: https://www.synopsys.com/blogs/software-security/understandi... . While memory bugs are difficult to prevent, this particular class of logic bug can be eliminated entirely just by always writing the two {} around the statement, so a language concerned with correctness should promote such good practices.

        if(x % 2){
          vec^.push_back(x);
        }


That feels like a "this bit me when I was learning how to program and now I'm always scared of it" type of bug. I've never seen it myself. But then again, some of my friends complain about OCD coworkers who can't not obsessively remove blank lines from code.

I have seen

   for(size_t i=0; i<n; i++);  // stray semicolon: the loop body is empty
       foo(i);                 // runs once, after the loop finishes

   if(a>b);   // same bug: the condition guards an empty statement
     a=b;     // always executes
The famous goto bug was because the assholes had no tests for that module.


This is nitpicking style; there are no real safety implications. Compilers will happily inform you if the parse is ambiguous or code is unreachable.


If you write in this style then there's the possibility that someone else (or yourself in future) will add a second line, like:

        if(x % 2)
          vec^.push_back(x);
          do_something_else();
under the mistaken assumption that the second line will also only run in the x % 2 case. People make mistakes; this particular one can happen, does happen, and has happened many times, and the compiler won't necessarily warn you (outside opt-in warnings like -Wmisleading-indentation), because C++ isn't whitespace sensitive and it can't know you intended the second line to belong to the if. Is ruling out an entire class of bug really not worth the effort of typing two more characters?



Indentation can be fixed with a code formatter in a pre-commit hook.


Or even an on-save or (and more likely to prevent this kind of error in a large file) as-you-type feature in your IDE/editor.


Available autoformatting is great, but IIRC the Apple issue happened in an automatic merge. Autoformatting would still help it get noticed - someone would wonder why the formatter changed a bit of code they hadn't touched - but a formatting check run alongside the automated tests would catch it faster.



